Veracruz Orizaba Nahuatl Endangered Language
Summary: Audio corpus of Orizaba (Veracruz) Nahuatl speech (Glottocode: oriz1235; ISO 639-3: nlv)
License: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
Downloads (use a mirror closer to you):
info.pdf [61K] (Document with information about this corpus) Mirrors: [US] [EU] [CN]
Tequila.zip [39G] (Speech data of Veracruz Orizaba Nahuatl, recorded at 48 kHz, 16-bit) Mirrors: [US] [EU] [CN]
Veracruz-Orizaba-Nahuat_Collaborators.txt [5.4K] (List of all native-speaker collaborators for this corpus) Mirrors: [US] [EU] [CN]
Veracruz-Orizaba-Nahuatl_File-list.txt [69K] (List of all filenames with their durations) Mirrors: [US] [EU] [CN]
Plant-observations_Veracruz.csv [6.1K] (List of all plant observations with observation number, family, scientific name, date collected, and name of the person who identified the plant) Mirrors: [US] [EU] [CN]
Plant-Labels_Tequila-Orizaba-ethnobotanical-field-trip_2023-10-22.pdf [307K] (Labels for the 81 plant observations whose audio is included in this corpus) Mirrors: [US] [EU] [CN]
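After extracting Tequila.zip, it can be useful to confirm that the audio matches the stated format (48 kHz, 16-bit) and to total its duration before training. The sketch below uses only Python's standard `wave` and `pathlib` modules. It assumes, without confirmation from this page, that the archive contains uncompressed PCM WAV files with a `.wav` extension (any case); the internal directory layout of the archive is not documented here. The function names `check_wav` and `scan_directory` are hypothetical helpers for illustration only.

```python
import wave
from pathlib import Path

# Assumed target format, taken from the Tequila.zip description (48 kHz, 16-bit).
EXPECTED_RATE_HZ = 48_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM samples are 2 bytes wide


def check_wav(path):
    """Hypothetical helper: return (matches_expected_format, duration_seconds) for one WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()
    ok = rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH_BYTES
    return ok, frames / rate


def scan_directory(root):
    """Hypothetical helper: check every WAV file under root (assumes a .wav suffix, any case).

    Returns (total_hours, list_of_paths_not_matching_48kHz_16bit).
    """
    total_seconds = 0.0
    mismatched = []
    wav_paths = sorted(p for p in Path(root).rglob("*") if p.suffix.lower() == ".wav")
    for path in wav_paths:
        ok, seconds = check_wav(path)
        total_seconds += seconds
        if not ok:
            mismatched.append(path)
    return total_seconds / 3600, mismatched
```

For example, `hours, bad = scan_directory("Tequila")` (where `Tequila` is a hypothetical extraction directory) would report the total hours of audio found and any files recorded at a different sample rate or bit depth. The durations computed this way could also be cross-checked against Veracruz-Orizaba-Nahuatl_File-list.txt, although the format of that file is not described on this page.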
About this resource:
Fieldwork was coordinated locally by Gabriela Citlahua Zepahua, who also participated as a speaker in some of the recordings. Citlahua Zepahua was responsible for contacting the native speakers who generously participated in this research.
Please note that this initial OpenSLR deposit focuses on the audio corpus. Five future enhancements to the metadata for this corpus are currently planned: (1) completed metadata, particularly a description of the content of each recording; (2) 10 hours of hand transcription in ELAN, which will provide the initial basis for transfer learning in ASR; (3) a final deposit of the ASR transcriptions; (4) corrections to the ASR transcriptions by Amith and native speakers of Orizaba Nahuatl; (5) a reference to the end-to-end ASR recipe (on GitHub) used to generate the ASR transcriptions.
The fieldwork for developing this corpus was supported by NSF Dynamic Language Infrastructure grant #2123578 entitled “Collaborative Research: Improving Techniques of Automatic Speech Recognition and Transfer Learning using Documentary Linguistic Corpora” (Jonathan D. Amith, PI). The speech processing facet of this research (Award #2123624) will be carried out by Shinji Watanabe (PI) and his team at Carnegie Mellon University.
All material is made available under the Creative Commons license CC BY-SA 3.0 (Attribution-ShareAlike). Please cite any use of this material as follows (corresponding author: Jonathan D. Amith, firstname.lastname@example.org):
Amith, Jonathan D., Amelia Domínguez Alcántara, Ceferino Salgado Castañeda, Gabriela Citlahua Zepahua, Mariano Gorostiza Salazar, and Miriam Jiménez Chimil, 2022–23, Audio corpus of Orizaba (Veracruz) Nahuatl speech (Glottocode: oriz1235; ISO 639-3: nlv). Accessed [date] at https://www.openslr.org/147