Application of Fischer semi discriminant analysis for speaker diarization in Costa Rican radio broadcasts

 

Bibliographic Details
Authors: Sánchez Cárdenas, Roberto; Coto-Jiménez, Marvin
Format: original article
Publication Date: 2022
Description: Automatic segmentation and classification of audio streams is a challenging problem with many applications, such as indexing multimedia digital libraries, information retrieval, and building speech corpora (or spoken corpora) for particular languages and accents. Such a corpus is a database of speech audio files and their corresponding text transcriptions. Among the steps and tasks required for any of these applications, speaker diarization is one of the most relevant, because it aims to find boundaries in the audio recordings according to who speaks in each fragment. Speaker diarization can be performed in a supervised or unsupervised way and is commonly applied to audio consisting of pure speech. This work presents a first annotated dataset and analysis of speaker diarization for Costa Rican radio broadcasting, using two approaches: a classic one based on k-means clustering, and the more recent Fischer Semi Discriminant. We chose publicly available radio broadcasts and compared the applicability of those systems to the complete audio files, which also contain some segments of music and challenging acoustic conditions. The results depend on the number of speakers in each broadcast, especially the average cluster purity. The results also show the need for further exploration and for combination with other classification and segmentation algorithms to better extract useful information from the dataset and allow further development of speech corpora.
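The description names average cluster purity as the measure most affected by the number of speakers, but the record does not define it. The sketch below assumes one definition commonly used in the diarization literature: the purity of a cluster is the sum of squared per-speaker frame counts divided by the squared cluster size, and the average weights each cluster by its size. The function name `average_cluster_purity` and the toy labels are hypothetical and are not taken from the article.

```python
from collections import Counter, defaultdict


def average_cluster_purity(cluster_ids, speaker_ids):
    """Size-weighted average cluster purity (assumed definition, see lead-in).

    cluster_ids: cluster label assigned to each frame or segment.
    speaker_ids: reference speaker label for the same frames or segments.
    """
    if len(cluster_ids) != len(speaker_ids):
        raise ValueError("cluster_ids and speaker_ids must have equal length")
    if not cluster_ids:
        raise ValueError("at least one labeled frame is required")

    # Count how many frames of each speaker fall into each cluster.
    per_cluster = defaultdict(Counter)
    for cluster, speaker in zip(cluster_ids, speaker_ids):
        per_cluster[cluster][speaker] += 1

    weighted_sum = 0.0
    for counts in per_cluster.values():
        size = sum(counts.values())
        # Purity is 1.0 when the cluster holds a single speaker.
        purity = sum(n * n for n in counts.values()) / (size * size)
        weighted_sum += purity * size

    return weighted_sum / len(cluster_ids)


# Toy example: cluster 0 mixes speakers A and B, cluster 1 holds only B.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_speakers = ["A", "A", "B", "B", "B", "B"]
toy_acp = average_cluster_purity(toy_clusters, toy_speakers)
```

Under this assumed definition, a clustering that mixes speakers within a cluster scores below 1.0, which is consistent with the reported sensitivity to the number of speakers per broadcast.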
Country: RepositorioTEC
Institution: Instituto Tecnológico de Costa Rica
Repository: RepositorioTEC
Language: English
OAI Identifier:oai:repositoriotec.tec.ac.cr:2238/14156
Online Access: https://revistas.tec.ac.cr/index.php/tec_marcha/article/view/6464
https://hdl.handle.net/2238/14156
Keywords: Broadcasting
clustering
speaker diarization
speech technologies
Radiodifusión
agrupación
registro de locutores
tecnologías del habla