Feature Evaluation Metrics for Population Genomic Data

I. Kavakiotis, A. Triantafyllidis, G. Tsoumakas, I. Vlahavas, “Feature Evaluation Metrics for Population Genomic Data”, Proceedings of the 8th Hellenic Conference on Artificial Intelligence (SETN 2014), A. Likas, K. Blekas and D. Kalles (Eds.), Springer, Artificial Intelligence: Methods and Applications, LNCS 8445, pp. 436-441, Ioannina, Greece, 2014.

Machine learning, Bioinformatics, SNPs, Single nucleotide polymorphism, Feature selection.

Single Nucleotide Polymorphisms (SNPs) are now considered one of the most important classes of genetic markers, with a wide range of applications of both scientific and economic interest. Although advances in biotechnology have made the production of genome-wide SNP datasets feasible, the cost of production remains high. Reducing the initial dataset to a smaller one that retains the same genetic information is therefore a crucial task, and it is performed through feature selection. Biologists evaluate features using methods originating from the field of population genetics. Although several studies have compared the existing biological methods, comparisons between methods originating from biology and those originating from machine learning are lacking. In this study we present early results suggesting that biological methods perform slightly better than machine learning methods.
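
To make the idea of scoring SNPs with two families of metrics concrete, here is a minimal sketch. The abstract does not name the metrics that were compared, so both choices are assumptions made only for illustration: delta (the absolute allele-frequency difference between two populations) stands in for a population-genetic metric, and information gain stands in for a machine learning metric. The toy genotypes, SNP names and population labels are hypothetical, not data from the study.

```python
from collections import Counter
from math import log2


def allele_freq(genotypes):
    """Alternate-allele frequency; each genotype is 0, 1 or 2 copies (diploid)."""
    return sum(genotypes) / (2 * len(genotypes))


def delta(genotypes, labels, pop_a, pop_b):
    """Assumed 'biological' metric: absolute allele-frequency difference."""
    a = [g for g, y in zip(genotypes, labels) if y == pop_a]
    b = [g for g, y in zip(genotypes, labels) if y == pop_b]
    return abs(allele_freq(a) - allele_freq(b))


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(genotypes, labels):
    """Assumed 'machine learning' metric: H(labels) - H(labels | genotype)."""
    n = len(labels)
    conditional = 0.0
    for g in set(genotypes):
        subset = [y for x, y in zip(genotypes, labels) if x == g]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def top_k(scores, k):
    """Names of the k highest-scoring SNPs, i.e. the reduced feature set."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 8 individuals from two populations, 3 SNPs.
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
snps = {
    "snp1": [0, 0, 0, 0, 2, 2, 2, 2],  # fully separates the populations
    "snp2": [0, 1, 0, 1, 1, 2, 1, 2],  # partially informative
    "snp3": [1, 0, 2, 1, 1, 0, 2, 1],  # same distribution in both populations
}

delta_scores = {name: delta(g, labels, "A", "B") for name, g in snps.items()}
ig_scores = {name: information_gain(g, labels) for name, g in snps.items()}

print("delta ranking:", top_k(delta_scores, 2))
print("information gain ranking:", top_k(ig_scores, 2))
```

On this toy input both metrics happen to select the same two SNPs; on real genome-wide data the rankings can diverge, and a comparison of the kind described above would then assess how well each reduced SNP panel assigns individuals to their populations.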