Open Speech and Language Resources

Phone: 425 247 4129
(Daniel Povey)


Identifier: SLR7

Summary: English speech recognition training corpus from TED talks, created by the Laboratoire d’Informatique de l’Université du Maine (LIUM) (mirrored here)

Category: Speech

License: Creative Commons BY-NC-ND 3.0 (attribution/non-commercial/no-derivatives).

Download: TEDLIUM_release1.tar.gz [21G] (the first release)

About this resource:

The TED-LIUM corpus (mirrored here) consists of English-language TED talks with transcriptions, sampled at 16 kHz. It contains about 118 hours of speech.
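Because the talks come with time-aligned transcriptions, the "about 118 hours" figure can be sanity-checked by summing segment durations. The sketch below assumes, without confirmation from this page, that the transcriptions use the NIST STM format (one segment per line: file, channel, speaker, start time, end time, an optional `<...>` label, then the text) and that non-speech segments are marked with the sclite convention `ignore_time_segment_in_scoring`. The function names and the directory argument are hypothetical.

```python
from pathlib import Path

# sclite convention for segments excluded from scoring (assumed to apply here)
IGNORE_MARKER = "ignore_time_segment_in_scoring"


def parse_stm_line(line):
    """Parse one STM line into (file_id, channel, speaker, start, end, label, text).

    Returns None for blank lines and ';;' comment lines. The optional
    '<...>' label field is returned separately (None if absent).
    """
    line = line.strip()
    if not line or line.startswith(";;"):
        return None
    fields = line.split(maxsplit=5)
    if len(fields) < 5:
        raise ValueError(f"malformed STM line: {line!r}")
    file_id, channel, speaker = fields[:3]
    start, end = float(fields[3]), float(fields[4])
    if end < start:
        raise ValueError(f"segment ends before it starts: {line!r}")
    rest = fields[5] if len(fields) == 6 else ""
    label = None
    # Split off a leading '<...>' label, keeping the transcript text.
    if rest.startswith("<") and ">" in rest:
        close = rest.index(">")
        label, rest = rest[: close + 1], rest[close + 1 :].strip()
    return file_id, channel, speaker, start, end, label, rest


def total_hours(lines):
    """Sum segment durations in hours, skipping ignore-marked segments."""
    seconds = 0.0
    for line in lines:
        seg = parse_stm_line(line)
        if seg is None:
            continue
        start, end, text = seg[3], seg[4], seg[6]
        if text.lower() == IGNORE_MARKER:
            continue
        seconds += end - start
    return seconds / 3600.0


def corpus_hours(stm_dir):
    """Hypothetical helper: total hours over every *.stm file in stm_dir."""
    lines = []
    for path in sorted(Path(stm_dir).glob("*.stm")):
        lines.extend(path.read_text(encoding="utf-8").splitlines())
    return total_hours(lines)
```

For example, `corpus_hours("TEDLIUM_release1/train/stm")` would total the training transcripts if the unpacked archive uses that layout (also an assumption). How segment boundaries and non-speech regions are marked determines how closely the result matches the quoted figure.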

The original page requests that you cite the following paper if you make use of this corpus:

A. Rousseau, P. Deléglise, and Y. Estève, "TED-LIUM: an automatic speech recognition dedicated corpus",
in Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), May 2012.

External URL: Original source