Large Sundanese ASR training data set
Identifier: SLR36
Summary: Sundanese ASR training data set containing ~220K utterances.
Category: Speech
License: Attribution-ShareAlike 4.0 International
Downloads (use a mirror closer to you):
asr_sundanese.sha256  [1.3K]  (Checksum for the files)  Mirrors: [EU] [EU] [CN]
LICENSE  [20K]  (License information for the data set)  Mirrors: [EU] [EU] [CN]
utt_spk_text.tsv  [15M]  (All utterances in the data set)  Mirrors: [EU] [EU] [CN]
asr_sundanese_0.zip  [1.4G]  (Data set, file 0/15)  Mirrors: [EU] [EU] [CN]
asr_sundanese_1.zip  [1.4G]  (Data set, file 1/15)  Mirrors: [EU] [EU] [CN]
asr_sundanese_2.zip  [1.4G]  (Data set, file 2/15)  Mirrors: [EU] [EU] [CN]
asr_sundanese_3.zip  [1.4G]  (Data set, file 3/15)  Mirrors: [EU] [EU] [CN]
asr_sundanese_4.zip  [1.4G]  (Data set, file 4/15)  Mirrors: [EU] [EU] [CN]
asr_sundanese_5.zip  [1.4G]  (Data set, file 5/15)  Mirrors: [EU] [EU] [CN]
asr_sundanese_6.zip  [1.4G]  (Data set, file 6/15)  Mirrors: [EU] [EU] [CN]
asr_sundanese_7.zip  [1.4G]  (Data set, file 7/15)  Mirrors: [EU] [EU] [CN]
asr_sundanese_8.zip  [1.4G]  (Data set, file 8/15)  Mirrors: [EU] [EU] [CN]
asr_sundanese_9.zip  [1.4G]  (Data set, file 9/15)  Mirrors: [EU] [EU] [CN]
asr_sundanese_a.zip  [1.4G]  (Data set, file 10/15)  Mirrors: [EU] [EU] [CN]
asr_sundanese_b.zip  [1.4G]  (Data set, file 11/15)  Mirrors: [EU] [EU] [CN]
asr_sundanese_c.zip  [1.4G]  (Data set, file 12/15)  Mirrors: [EU] [EU] [CN]
asr_sundanese_d.zip  [1.4G]  (Data set, file 13/15)  Mirrors: [EU] [EU] [CN]
asr_sundanese_e.zip  [1.4G]  (Data set, file 14/15)  Mirrors: [EU] [EU] [CN]
asr_sundanese_f.zip  [1.4G]  (Data set, file 15/15)  Mirrors: [EU] [EU] [CN]
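The checksum file can be used to confirm that each archive downloaded intact. The following is a minimal sketch, assuming asr_sundanese.sha256 follows the common `sha256sum` layout of one `<hex digest>  <file name>` pair per line; that layout is an assumption, not something stated on this page, and the helper names `sha256_of`, `parse_checksums`, and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text: str) -> dict:
    """Parse lines of the ASSUMED form '<hex digest>  <file name>'."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, name = parts
        # sha256sum marks binary mode with a leading '*' on the file name.
        expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify(checksum_file: Path, data_dir: Path) -> dict:
    """Map each listed file name to 'ok', 'mismatch', or 'missing'."""
    expected = parse_checksums(checksum_file.read_text(encoding="utf-8"))
    report = {}
    for name, digest in expected.items():
        target = data_dir / name
        if not target.is_file():
            report[name] = "missing"
        elif sha256_of(target) == digest:
            report[name] = "ok"
        else:
            report[name] = "mismatch"
    return report
```

With all files in the current directory, `verify(Path("asr_sundanese.sha256"), Path("."))` would return one status per listed file; any archive reported as `mismatch` is worth downloading again, possibly from a different mirror.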
About this resource:
The data set has been manually quality checked, but there might still be errors.
This data set was collected by Google in Indonesia.
See the LICENSE.txt file for license information.
Copyright 2016, 2017 Google, Inc.
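Because the data set may still contain errors despite manual checking, a light sanity pass over utt_spk_text.tsv before training can be useful. The sketch below assumes, based only on the file name, that each row holds three tab-separated fields in the order utterance ID, speaker ID, transcript; the `Utterance` type and the function names are hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import NamedTuple


class Utterance(NamedTuple):
    utt_id: str
    speaker_id: str
    text: str


def load_utterances(tsv_path: Path):
    """Split rows into well-formed utterances and rows needing review.

    ASSUMED column order: utterance ID, speaker ID, transcript.
    """
    good, bad = [], []
    with tsv_path.open(encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks in transcripts as literal text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for line_no, row in enumerate(reader, start=1):
            if len(row) == 3 and all(field.strip() for field in row):
                good.append(Utterance(*(field.strip() for field in row)))
            else:
                bad.append((line_no, row))  # wrong field count or empty field
    return good, bad


def utterances_per_speaker(utterances):
    """Count utterances per speaker ID."""
    return Counter(u.speaker_id for u in utterances)
```

Calling `load_utterances(Path("utt_spk_text.tsv"))` returns the usable rows plus a list of `(line number, row)` pairs to inspect by hand, and `utterances_per_speaker` gives a quick view of how evenly speakers are represented.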
If you use this data in publications, please cite it as follows:
  @inproceedings{kjartansson-etal-sltu2018,
    title = {{Crowd-Sourced Speech Corpora for Javanese, Sundanese, Sinhala, Nepali, and Bangladeshi Bengali}},
    author = {Oddur Kjartansson and Supheakmungkol Sarin and Knot Pipatsrisawat and Martin Jansche and Linne Ha},
    booktitle = {Proc. The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU)},
    year  = {2018},
    address = {Gurugram, India},
    month = aug,
    pages = {52--55},
    URL   = {http://dx.doi.org/10.21437/SLTU.2018-11},
  }