name: LibriSpeech ASR corpus
summary: Large-scale (1000 hours) corpus of read English speech
category: speech
license: CC BY 4.0
file: dev-clean.tar.gz development set, "clean" speech
file: dev-other.tar.gz development set, "other" (more challenging) speech
file: test-clean.tar.gz test set, "clean" speech
file: test-other.tar.gz test set, "other" speech
file: train-clean-100.tar.gz training set of 100 hours of "clean" speech
file: train-clean-360.tar.gz training set of 360 hours of "clean" speech
file: train-other-500.tar.gz training set of 500 hours of "other" speech
file: intro-disclaimers.tar.gz extracted LibriVox announcements for some of the speakers
file: original-mp3.tar.gz LibriVox MP3 files from which the corpus audio was extracted
file: original-books.tar.gz Project Gutenberg texts against which the corpus audio was aligned
file: raw-metadata.tar.gz some extra metadata produced during the creation of the corpus
file: md5sum.txt MD5 checksums for the archive files
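The archives listed above can be checked against md5sum.txt after downloading. The Python sketch below assumes that md5sum.txt uses the usual GNU md5sum layout ("<digest>  <filename>", one archive per line). The helper names (md5_of_file, parse_md5sum, verify_archives) are hypothetical and are not part of any official LibriSpeech tooling. Only archives that are actually present in the directory are checked, so you can verify a partial download, such as dev-clean.tar.gz alone.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sum(text: str) -> dict[str, str]:
    """Parse lines of the ASSUMED form '<hexdigest>  <filename>' into {filename: digest}."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        # GNU md5sum marks binary-mode entries with a leading '*' on the filename
        expected[name.lstrip("*")] = digest.lower()
    return expected


def verify_archives(directory: Path) -> dict[str, bool]:
    """Compare each downloaded archive listed in md5sum.txt; skip archives not present."""
    expected = parse_md5sum((directory / "md5sum.txt").read_text())
    results = {}
    for name, digest in expected.items():
        archive = directory / name
        if archive.exists():
            results[name] = md5_of_file(archive) == digest
    return results


if __name__ == "__main__":
    # Run from the directory that holds md5sum.txt and the downloaded archives
    for name, ok in sorted(verify_archives(Path(".")).items()):
        print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```

The archives are read in chunks because they are large: train-other-500.tar.gz alone covers 500 hours of audio, and hashing in chunks keeps memory use flat regardless of file size.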