Open Speech and Language Resources



Contact
dpovey@gmail.com
Phone: 425 247 4129
(Daniel Povey)

ALFFA (African Languages in the Field: speech Fundamentals and Automation)

Identifier: SLR25

Summary: Amharic, Swahili and Wolof data, mirrored from the ALFFA git repository

Category: Speech

License: MIT

Downloads:
data_readspeech_am.tar.bz2 [1.0G]   ( Amharic speech and transcripts )
data_broadcastnews_sw.tar.bz2 [1.2G]   ( Swahili speech and transcripts )
data_readspeech_wo.tar.bz2 [1.7G]   ( Wolof speech and transcripts )
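
A minimal sketch of unpacking one of the archives listed above with Python's standard library, assuming the file has already been downloaded locally. The helper name extract_archive and the destination directory are illustrative and not part of this resource.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.bz2 archive into dest_dir.

    Returns the sorted top-level names found in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:bz2" opens the tarball for reading with bzip2 decompression.
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        names = tar.getnames()
        tar.extractall(path=dest)
    # Keep only the first path component of each member.
    return sorted({name.split("/")[0] for name in names})
```

For example, extract_archive("data_readspeech_am.tar.bz2", "alffa_data") would unpack the Amharic archive into a hypothetical alffa_data directory. Only extract archives obtained from a source you trust.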

About this resource:

This resource contains transcribed speech data in Amharic, Swahili and Wolof.

This repository is a result of the ALFFA project (http://alffa.imag.fr).
We distribute ready-to-use (or ready-to-train) Kaldi ASR systems and, when possible, the associated corpora.
A summary of these resources and their ASR performance, as well as a description of the ALFFA project, has been published in the following paper:

Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof. 
Elodie Gauthier, Laurent Besacier, Sylvie Voisin, Michael Melese and Uriel Pascal Elingui. 
To appear at LREC 2016.
So far, the ASR directory contains Kaldi recipes for four languages: Amharic, Swahili, Hausa and Wolof.

  • AMHARIC
    In ASR/AMHARIC/ you will find Kaldi recipes and resources. See the README file for more details and for ASR performance results that you should be able to reproduce. Please cite this paper if you publish work using these resources:
    @article {tachbelie2014, 
    	Author = {Martha Tachbelie and Solomon Teferra Abate and Laurent Besacier}, 
    	Date-Added = {2015-04-14 08:08:31 +0000}, 
    	Date-Modified = {2015-04-14 10:56:28 +0000}, 
    	Journal = {Speech Communication}, 
    	Publisher = {Elsevier}, 
    	Title = {Using different acoustic, lexical and language modeling units for ASR of an under-resourced language - Amharic}, 
    	Volume = {56}, 
    	Year = {2014}
    }
    
  • SWAHILI
    (This Swahili ASR system is now available by default in the Kaldi trunk when you install Kaldi.)
    In ASR/SWAHILI/ you will find Kaldi recipes and resources. See the README file for more details and for ASR performance results that you should be able to reproduce. Please cite this paper if you publish work using these resources:
    @InProceedings {gelas:hal-00954048, 
    	author = {Gelas, Hadrien and Besacier, Laurent and Pellegrino, Francois}, 
    	title = {{D}evelopments of {S}wahili resources for an automatic speech recognition system}, 
    	booktitle = {{SLTU} - {W}orkshop on {S}poken {L}anguage {T}echnologies for {U}nder-{R}esourced {L}anguages}, 
    	year = {2012}, 
    	address = {Cape-Town, Afrique Du Sud}, 
    	abstract = {no abstract}, 
    	x-international-audience = {yes}, 
    	url = {http://hal.inria.fr/hal-00954048}
    }
    
  • WOLOF
    In ASR/WOLOF/ you will find Kaldi recipes and resources. See the README file for more details and for ASR performance results that you should be able to reproduce. Please cite this paper if you publish work using these resources:
    @article {gauthier2016collect, 
    	Author = {Gauthier, Elodie and Besacier, Laurent and Voisin, Sylvie and Melese, Michael and Elingui, Uriel Pascal}, 
    	Journal = {LREC}, 
    	Title = {Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof}, 
    	Year = {2016} 
    }
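
Kaldi data directories conventionally store transcripts in a file named text, with one utterance per line: an utterance ID followed by whitespace and the transcript. A minimal sketch of reading such a file, assuming the extracted ALFFA data follows this convention (check each language's README for the actual layout); load_transcripts is a hypothetical helper, not part of the recipes.

```python
def load_transcripts(path):
    """Map utterance IDs to transcripts from a Kaldi-style text file."""
    transcripts = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Split off the utterance ID at the first run of whitespace.
            parts = line.strip().split(maxsplit=1)
            if not parts:
                continue  # skip blank lines
            utt_id = parts[0]
            # An utterance may legitimately have an empty transcript.
            transcripts[utt_id] = parts[1] if len(parts) > 1 else ""
    return transcripts
```

The returned dictionary can be used, for instance, to count utterances or to build a word list before training.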
    

External URLs:
https://github.com/besacier/ALFFA_PUBLIC/tree/master/ASR/AMHARIC   (Amharic data)
https://github.com/besacier/ALFFA_PUBLIC/tree/master/ASR/SWAHILI   (Swahili data)
https://github.com/besacier/ALFFA_PUBLIC/tree/master/ASR/WOLOF   (Wolof data)