This dataset contains Nepali text and speech data with both female and male voices. It consists of .wav audio files and two .tsv files, one for the male recordings and one for the female recordings. Each .tsv file lists an audio_id and its corresponding sentence, and each audio_id matches the filename of a .wav file. The data was manually checked for quality, but some errors may remain. It was recorded for the fine-tuning phase of Nepali text-to-speech synthesis research.
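
The pairing between the .tsv rows and the .wav files can be sketched roughly as follows. This is a minimal illustration, not an official loader: it assumes each .tsv row holds the audio_id in the first column and the sentence in the second, and that the audio file is named `<audio_id>.wav`. The function name `load_pairs` and the example paths in the usage note are hypothetical.

```python
import csv
from pathlib import Path


def load_pairs(tsv_path, audio_dir):
    """Return (wav_path, sentence) pairs for every row whose audio file exists.

    Assumptions (not confirmed by the dataset description):
      * column 0 is the audio_id, column 1 is the sentence
      * the audio file is named "<audio_id>.wav" inside audio_dir
    """
    audio_dir = Path(audio_dir)
    pairs = []
    with open(tsv_path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps quotation marks inside sentences untouched.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            audio_id, sentence = row[0].strip(), row[1].strip()
            # Tolerate ids that already carry the ".wav" extension.
            stem = audio_id[:-4] if audio_id.lower().endswith(".wav") else audio_id
            wav_path = audio_dir / f"{stem}.wav"
            # Rows without a matching file (including a possible header row)
            # are dropped rather than raising an error.
            if wav_path.exists():
                pairs.append((wav_path, sentence))
    return pairs
```

A call such as `load_pairs("female.tsv", "wavs/")` (both paths are placeholders) would return the list of matched pairs. Comparing its length against the number of rows in the .tsv file is a quick way to spot missing or misnamed audio files.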

You can cite the data using the following BibTeX entry:

@inproceedings{khadka2023tts,
    title={Nepali Text-to-Speech Synthesis using Tacotron2 for Melspectrogram Generation},
    author={Khadka, Supriya and G.C., Ranju and Paudel, Prabin and Shah, Rahul and Joshi, Basanta},
    booktitle={SIGUL 2023, 2nd Annual Meeting of the Special Interest Group on Under-resourced Languages: a Satellite Workshop of Interspeech 2023},
    year={2023}
}