MediaSpeech is a dataset of French, Arabic, Turkish and Spanish media speech built for evaluating the performance of Automated Speech Recognition (ASR) systems. The dataset contains 10 hours of speech for each of the provided languages.

The dataset consists of short speech segments automatically extracted from media videos available on YouTube and manually transcribed, with some pre- and post-processing.

Baseline models and a WAV version of the dataset can be found in the following Git repository: https://github.com/NTRLab/MediaSpeech
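As a minimal sketch of how the segments might be consumed, the snippet below pairs each audio file with its transcription, assuming the common convention of matching base names (e.g. `1.wav` next to `1.txt`); the directory name in the usage note is hypothetical:

```python
from pathlib import Path

def pair_segments(root):
    """Pair each audio segment with its transcription text,
    assuming matching base names (e.g. 1.wav / 1.txt)."""
    pairs = []
    for wav in sorted(Path(root).glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if txt.exists():
            pairs.append((wav, txt.read_text(encoding="utf-8").strip()))
    return pairs

# Hypothetical usage on an extracted per-language archive:
# for audio_path, transcript in pair_segments("FR"):
#     print(audio_path.name, transcript)
```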

To cite the dataset, please use the following BibTeX entry:

@misc{mediaspeech2021,
      title={MediaSpeech: Multilanguage ASR Benchmark and Dataset}, 
      author={Rostislav Kolobov and Olga Okhapkina and Olga Omelchishina and Andrey Platunov and Roman Bedyakin and Vyacheslav Moshkin and Dmitry Menshikov and Nikolay Mikhaylovskiy},
      year={2021},
      eprint={2103.16193},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}