High-quality open-source Kazakh speech corpus.

The corpus contains about 554 hours of transcribed audio recordings, comprising 204,250 utterances spoken by male and female participants from different regions and age groups. All audio was recorded on mobile devices (iOS and Android). To ensure high quality, the corpus was spot-checked by native speakers of Kazakh. The dataset is intended primarily for training automatic speech recognition (ASR) systems.

Technical characteristics of the audio files: .wav format, 16 kB, 22 and 44 kHz sample rates.

Corpus creators: Nurgali Kadyrbek (https://orcid.org/0000-0002-5461-8899) and Madina Mansurova (https://orcid.org/0000-0002-9680-2758).

To cite the dataset, please use the following BibTeX entry:

@inproceedings{mansurova-kadyrbek-2023-kazakh-speech-dataset,
  title     = "The Development of a Kazakh Speech Recognition Model Using a Convolutional Neural Network with Fixed Character Level Filters",
  author    = "Madina Mansurova and Nurgali Kadyrbek",
  booktitle = "Proceedings of the Big Data and Cognitive Computing",
  month     = "July 20",
  year      = "2023",
  pages     = "5--9",
  url       = "https://doi.org/10.3390/bdcc7030132"
}
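The audio is described as .wav files at 22 and 44 kHz, so it can be useful to verify sample rates and total duration after downloading, for example before resampling everything to a single rate for ASR training. The following is a minimal sketch that uses Python's standard-library `wave` module. It assumes the files are uncompressed PCM WAV stored somewhere under one root directory. The directory layout and the helper names `describe_wav` and `summarize_corpus` are hypothetical illustrations and are not part of the dataset.

```python
import wave
from collections import Counter
from pathlib import Path


def describe_wav(path):
    """Read basic header fields of a PCM .wav file with the stdlib wave module."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
        return {
            "sample_rate_hz": rate,
            "channels": wf.getnchannels(),
            "bits_per_sample": wf.getsampwidth() * 8,
            "duration_s": frames / rate,
        }


def summarize_corpus(root):
    """Recursively scan `root` for .wav files (layout is an assumption) and aggregate stats."""
    rates = Counter()
    total_seconds = 0.0
    n_files = 0
    skipped = 0
    for path in Path(root).rglob("*.wav"):
        try:
            info = describe_wav(path)
        except wave.Error:
            # wave only reads PCM encodings; other formats are counted and skipped.
            skipped += 1
            continue
        rates[info["sample_rate_hz"]] += 1
        total_seconds += info["duration_s"]
        n_files += 1
    return {
        "files": n_files,
        "skipped": skipped,
        "hours": total_seconds / 3600,
        "sample_rates": dict(rates),
    }
```

Usage: call `summarize_corpus("path/to/corpus")` on the extracted data and compare the returned `hours` and `sample_rates` with the figures stated above. For reference, the stated totals imply an average of roughly 9.8 seconds per utterance (554 h × 3600 s/h ÷ 204,250).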