Uncertainty estimation for a speech recognition system

Bibliographic details
Authors: Morales-Muñoz, Walter; Calderón-Ramírez, Saúl
Format: Original article
Status: Published version
Publication date: 2024
Description: Whisper is a speech recognition system developed by OpenAI and trained on 680,000 hours of multilingual, multitask supervised data collected from the web. This research adapts Monte Carlo Dropout, combined with the Levenshtein distance, to estimate the uncertainty of this system's scores, using Spanish-labeled audio data contaminated with some amount of noise. Preliminary results show a linear relationship between the estimated uncertainty and the Word Error Rate (WER) of the transcriptions. The number of insertions and omissions in the transcriptions also tends to be low.
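The description does not state exactly how Monte Carlo Dropout and the Levenshtein distance are combined, so the following is only a minimal sketch under stated assumptions. It computes WER as the word-level Levenshtein distance divided by the reference length, and defines a hypothetical uncertainty score, `dropout_uncertainty`, as the mean pairwise normalized edit distance between transcriptions obtained from repeated decoding passes with dropout left active. It does not call Whisper; the sampled transcriptions are assumed to be produced elsewhere, and the function names are illustrative, not the authors' code.

```python
from itertools import combinations
from statistics import mean


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance; substitution, insertion and deletion each cost 1."""
    prev = list(range(len(hyp) + 1))  # distance from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref word omitted from hyp
                curr[j - 1] + 1,         # insertion: extra word in hyp
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: edit distance normalized by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


def dropout_uncertainty(samples: list[str]) -> float:
    """Hypothetical score (assumption): mean pairwise normalized edit distance
    among transcriptions from repeated dropout-enabled passes over one clip."""
    if len(samples) < 2:
        return 0.0
    scores = []
    for a, b in combinations(samples, 2):
        wa, wb = a.split(), b.split()
        scores.append(word_edit_distance(wa, wb) / max(len(wa), len(wb), 1))
    return mean(scores)


if __name__ == "__main__":
    print(wer("el gato negro", "el perro negro"))
    print(dropout_uncertainty(["hola mundo", "hola mundo", "hola mando"]))
```

If the stochastic passes agree, the score is 0; the more the sampled transcriptions diverge, the higher it gets. Under this assumption, a linear relationship like the one reported would show up as this score rising roughly in step with per-clip WER.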
Country: Costa Rica
Institution: Instituto Tecnológico de Costa Rica
Repository: Portal de Revistas TEC
Language: Spanish
OAI identifier: oai:ojs.pkp.sfu.ca:article/7305
Online access: https://revistas.tec.ac.cr/index.php/tec_marcha/article/view/7305
Keywords: Uncertainty
Speech Recognition
ASR
Whisper
Monte Carlo Dropout