Assessment of methods for predicting eukaryotic promoter sequences

 

Bibliographic Details
Authors: Jiménez Oviedo, Byron, Arroyo Hernández, Jorge, Solano-González, Stefany
Format: article
Publication Date: 2023
Summary: Identifying promoters is challenging because they are short, poorly conserved, and subject to complex regulation. Historically, identification relied on slow and expensive experimental methods. Efficient pattern recognition and statistical approaches have revolutionized this process, offering a faster and more cost-effective alternative. Accurate promoter identification is vital for experimental biologists and biotech applications because it enables precise regulation of gene expression. This document evaluates traditional machine learning methods (SVM, MLP, LDA, PSFN and DenseNet) for promoter recognition and confirms their suitability. The data were partitioned 80/20 into training and analysis sets, respectively. The training set was used to optimize the parameters by 10-fold cross-validation (k = 10); the optimized parameters were then used to analyze the remaining 20 percent of the data. The F1 score is the harmonic mean of precision (positive predictive value) and sensitivity. Our findings align with [2]: the F1 score was above 85% for SVM and both PSFN methods, affirming them as the most reliable options for promoter prediction. The objective was to find, by comparing methods, one that accurately predicts promoters and non-promoters. To achieve this, we used human sequences validated with these characteristics, so that the workflow can be applied to sequences from other organisms in the future.
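The evaluation protocol described in the summary (an 80/20 partition, 10-fold cross-validation on the training portion, and the F1 score) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the helper names `f1_score`, `train_test_split`, and `k_fold_indices`, the fixed random seed, and the placeholder data of 100 items are all assumptions made for the example.

```python
import random


def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and sensitivity (recall)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle the items and hold out test_fraction of them (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n, k=10):
    """Yield (train_idx, val_idx) pairs for k contiguous, roughly equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


# Placeholder data: 100 sequence identifiers (assumed, not from the study).
train, test = train_test_split(range(100), test_fraction=0.2)
folds = list(k_fold_indices(len(train), k=10))

# Example confusion counts (assumed): 8 true positives, 2 false positives,
# 2 false negatives give precision = recall = 0.8.
example_f1 = f1_score(tp=8, fp=2, fn=2)
```

In this sketch each of the 10 folds serves once as the validation set while the other nine are used for fitting; the held-out 20 percent is touched only after parameter selection, which matches the order of steps given in the summary.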
Country: Repositorio UNA
Institution: Universidad Nacional de Costa Rica
Repository: Repositorio UNA
Language: English
OAI Identifier:oai:null:11056/27199
Online Access: http://hdl.handle.net/11056/27199
Keywords: TATA
EUCARYOTE
PREDICTIVE METHODS
MACHINE LEARNING
PROMOTER IDENTIFICATION
ORGANISMS