Open Speech and Language Resources

Phone: 425 247 4129
(Daniel Povey)

Large Sundanese ASR training data set

Identifier: SLR36

Summary: Sundanese ASR training data set containing ~220K utterances.

Category: Speech

License: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

asr_sundanese.sha256 [1.3K]   (Checksum for the files)
LICENSE [20K]   (License information for the data set)
utt_spk_text.tsv [15M]   (All utterances in the data set)
[1.4G]   (Data set, file 0/15)
[1.4G]   (Data set, file 1/15)
[1.4G]   (Data set, file 2/15)
[1.4G]   (Data set, file 3/15)
[1.4G]   (Data set, file 4/15)
[1.4G]   (Data set, file 5/15)
[1.4G]   (Data set, file 6/15)
[1.4G]   (Data set, file 7/15)
[1.4G]   (Data set, file 8/15)
[1.4G]   (Data set, file 9/15)
[1.4G]   (Data set, file 10/15)
[1.4G]   (Data set, file 11/15)
[1.4G]   (Data set, file 12/15)
[1.4G]   (Data set, file 13/15)
[1.4G]   (Data set, file 14/15)
[1.4G]   (Data set, file 15/15)
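Because the archives are large, it can help to check the downloads against asr_sundanese.sha256 before unpacking them. The sketch below assumes the checksum file uses the common GNU `sha256sum` layout (`<hex digest>  <filename>` per line); that layout is an assumption, not something this page states. The function names `sha256_of` and `verify_checksums` are hypothetical helpers, and only the Python standard library is used.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte archives are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file: Path, data_dir: Path) -> dict[str, bool]:
    """Return {filename: matches} for every entry in a checksum file.

    ASSUMPTION: each non-empty line is "<hex digest><whitespace><filename>",
    as written by GNU `sha256sum`; a leading '*' (binary-mode marker) on the
    filename is stripped. Listed files missing from data_dir are reported as False.
    """
    results: dict[str, bool] = {}
    for line in checksum_file.read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")
        target = data_dir / name
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

For example, `verify_checksums(Path("asr_sundanese.sha256"), Path("downloads"))` would report which archives in a (hypothetical) `downloads` directory match their listed digests; any `False` entry suggests re-downloading that file.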

About this resource:

This data set contains transcribed audio data for Sundanese. It consists of wave files and a TSV file, utt_spk_text.tsv, which lists the FileID, the UserID, and the transcription of the audio in each file.
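A minimal sketch of reading utt_spk_text.tsv follows. It assumes UTF-8 text, no header row, and exactly three tab-separated columns per line (FileID, UserID, transcription); these are assumptions drawn from the column description above, not a documented specification. `Utterance`, `read_utterances`, and `group_by_speaker` are hypothetical names introduced for illustration.

```python
import csv
from collections import defaultdict
from typing import Dict, List, NamedTuple, TextIO


class Utterance(NamedTuple):
    file_id: str
    user_id: str
    text: str


def read_utterances(f: TextIO) -> List[Utterance]:
    """Parse rows of the form: FileID <TAB> UserID <TAB> transcription.

    ASSUMPTION: no header row and exactly three columns per line.
    Blank lines are skipped; malformed lines raise ValueError.
    QUOTE_NONE keeps any quote characters in transcriptions literally.
    """
    utterances: List[Utterance] = []
    reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines
        if len(row) != 3:
            raise ValueError(f"line {reader.line_num}: expected 3 fields, got {len(row)}")
        utterances.append(Utterance(*row))
    return utterances


def group_by_speaker(utterances: List[Utterance]) -> Dict[str, List[str]]:
    """Map each UserID to the FileIDs it recorded, in file order."""
    by_speaker: Dict[str, List[str]] = defaultdict(list)
    for utt in utterances:
        by_speaker[utt.user_id].append(utt.file_id)
    return dict(by_speaker)
```

Opening the file with `open("utt_spk_text.tsv", encoding="utf-8", newline="")`, as the `csv` module documentation recommends, and passing it to `read_utterances` would yield one record per utterance. Grouping by UserID is one way to build speaker-disjoint training and test splits, though the data set itself does not prescribe a split.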

The data set has been manually quality-checked, but it may still contain errors.

This data set was collected by Google in Indonesia.

See the LICENSE file for license information.

Copyright 2016, 2017 Google, Inc.