Improving Balanced Accuracy for Minority Plant Species under Data Imbalance

 

Bibliographic details
Authors: Gonzalez-Villanueva, Ruben; Carranza-Rojas, Jose
Format: original article
Status: Published version
Publication date: 2024
Description: Despite the widely known success of deep learning in classification, such models are commonly evaluated with metrics that do not account for data imbalance, especially in terms of predictions per class, and so overlook minority classes. This is a problem because minority classes are often the hardest to collect data for and to predict. In the plant domain, for example, species with fewer samples are often the ones that are hardest to collect and predict in the field. As more plant species are identified, more of them become minority species, making it increasingly difficult to classify them accurately with traditional machine learning methods. To address this issue, we explore combining traditional data and machine learning approaches with deep learning techniques such as self-supervision in a preprocessing stage. By using self-supervised training together with different sampling algorithms and class weights, we improved the balanced accuracy metric for minority plant species by between 7.9% and 13% without affecting general accuracy. This shows that combining deep learning techniques with traditional machine learning methods can help improve prediction accuracy for minority classes, even in domains where data is limited.
Country: Portal de Revistas TEC
Institution: Instituto Tecnológico de Costa Rica
Repository: Portal de Revistas TEC
Language: English
OAI Identifier: oai:ojs.pkp.sfu.ca:article/7293
Online access: https://revistas.tec.ac.cr/index.php/tec_marcha/article/view/7293
Keywords: Imbalanced datasets
long-tail distribution
automatic plant identification
balanced metrics
deep learning
minority classes
classification.
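
The description contrasts general accuracy with balanced accuracy and mentions class weights. As an illustration only, and not the authors' code, the sketch below shows why the two metrics diverge on imbalanced data. It assumes balanced accuracy is the mean of per-class recall and uses an inverse-frequency weighting scheme; the record does not state which weighting the authors used, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly, regardless of class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


def inverse_frequency_weights(y_true):
    """Assumed weighting scheme: n_samples / (n_classes * class_count)."""
    total = Counter(y_true)
    n, k = len(y_true), len(total)
    return {c: n / (k * count) for c, count in total.items()}


# Hypothetical toy labels: nine samples of a common species, one of a rare one.
y_true = ["common"] * 9 + ["rare"]
# A classifier that never predicts the rare species.
y_pred = ["common"] * 10

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
print(inverse_frequency_weights(y_true))
```

On this toy data, general accuracy looks high even though the rare species is never recognized, while balanced accuracy exposes the failure. The weights show how a rare class could be given more influence during training under the assumed scheme.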