Open Speech and Language Resources

High quality TTS data for four South African languages (af, st, tn, xh)

Identifier: SLR32

Summary: Multi-speaker TTS data for four South African languages: Afrikaans, Sesotho, Setswana and isiXhosa.

Category: Speech

License: Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

af_za.tar.gz [950M]   (Audio files and transcriptions for Afrikaans)
st_za.tar.gz [724M]   (Audio files and transcriptions for Sesotho)
tn_za.tar.gz [729M]   (Audio files and transcriptions for Setswana)
xh_za.tar.gz [907M]   (Audio files and transcriptions for isiXhosa)

About this resource:

This data set contains high-quality, multi-speaker transcribed audio data for four languages of South Africa. It consists of wave files and a TSV file that transcribes the audio. In each folder, the file line_index.tsv lists, for each recording, a FileID (which in turn contains the UserID) and the Transcription of the audio in that file.
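
As a rough illustration of how line_index.tsv might be read, here is a minimal Python sketch. The exact layout is an assumption, not something this page specifies: the sketch assumes one tab-separated record per line, with the FileID in the first column and the Transcription in the last, and it assumes the UserID is the second underscore-separated field of the FileID. The names Utterance and parse_line_index are hypothetical helpers introduced only for this example. Check a few lines of the real file before relying on it.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class Utterance(NamedTuple):
    """One record from line_index.tsv (hypothetical helper type)."""
    file_id: str
    user_id: str
    transcription: str


def parse_line_index(lines: Iterable[str]) -> Iterator[Utterance]:
    """Yield one Utterance per usable line of a line_index.tsv file.

    Assumptions (not confirmed by the data set description):
      * columns are tab-separated, FileID first, Transcription last;
      * the FileID looks like <prefix>_<UserID>_<suffix>.
    """
    # QUOTE_NONE keeps quote characters in transcriptions as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank or malformed lines rather than failing on them.
        if len(row) < 2 or not row[0].strip():
            continue
        file_id = row[0].strip()
        transcription = row[-1].strip()
        parts = file_id.split("_")
        # Leave user_id empty when the FileID does not match the assumed shape.
        user_id = parts[1] if len(parts) >= 3 else ""
        yield Utterance(file_id, user_id, transcription)
```

To use it, open the index file from an extracted archive with `open(path, encoding="utf-8", newline="")` and pass the file object to `parse_line_index`. The wave file for each record is presumably named after its FileID, but that too should be verified against the extracted folder.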

The data set has had some quality checks, but there might still be errors.

This data set was collected as a collaboration between North West University and Google.

See LICENSE.txt file for license information.

Copyright 2017 Google, Inc.