G. Tzanis, C. Berberidis, I. Vlahavas, “A Novel Data Mining Approach for the Accurate Prediction of Translation Initiation Sites”, 7th International Symposium on Biological and Medical Data Analysis, Nicos Maglaveras et al. (Ed.), Springer-Verlag, pp. 92-103, Thessaloniki, Greece, 2006.
In an mRNA sequence, predicting the exact codon at which translation starts (the Translation Initiation Site, TIS) is a particularly important problem. Several researchers have tackled it with various statistical and machine learning techniques, achieving high accuracy, often over 90%. In this paper we propose a machine learning approach that can further improve the prediction accuracy. First, we provide a concise review of the literature in this field. Then we propose a novel feature set. We perform extensive experiments on a publicly available, real-world dataset for various vertebrate organisms, using a variety of novel features and classification setups. We evaluate our results against a reference study and show that our approach, which involves new features and a combination of the Ribosome Scanning Model with a meta-classifier, achieves higher accuracy in most cases.
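To illustrate the general idea of combining the Ribosome Scanning Model with a classifier, the following minimal Python sketch scans a sequence from the 5' end and returns the first ATG whose score reaches a threshold. It is only an illustration under stated assumptions: the function names, the 0.5 threshold, and in particular `toy_score` (a crude Kozak-like rule) are hypothetical stand-ins, not the feature set or meta-classifier used in this paper.

```python
from typing import Callable, List, Optional


def candidate_atg_positions(seq: str) -> List[int]:
    """Return the 0-based positions of every ATG (AUG) in the sequence."""
    seq = seq.upper().replace("U", "T")
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def scan_for_tis(
    seq: str,
    score: Callable[[str, int], float],
    threshold: float = 0.5,  # assumed cut-off, chosen only for illustration
) -> Optional[int]:
    """Ribosome-scanning style search: walk candidate ATGs from the 5' end
    and return the first one whose score reaches the threshold."""
    for pos in candidate_atg_positions(seq):
        if score(seq, pos) >= threshold:
            return pos
    return None  # no candidate was accepted


def toy_score(seq: str, pos: int) -> float:
    """Hypothetical stand-in for a trained classifier: rewards a purine at
    position -3 and a G at position +4 relative to the A of the ATG."""
    s = seq.upper().replace("U", "T")
    minus3 = s[pos - 3] if pos >= 3 else ""
    plus4 = s[pos + 3] if pos + 3 < len(s) else ""
    return 0.5 * (minus3 in ("A", "G")) + 0.5 * (plus4 == "G")


if __name__ == "__main__":
    mrna = "CCTATGCCACCATGGCC"
    print(candidate_atg_positions(mrna))
    print(scan_for_tis(mrna, toy_score))
```

In a realistic setting, `toy_score` would be replaced by the probability output of a trained classifier evaluated on features extracted around each candidate ATG; the scanning loop itself would stay the same.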