Pre-processing for noise detection in gene expression classification data


Author(s): LIBRALON, Giampaolo Luiz; CARVALHO, André Carlos Ponce de Leon Ferreira de; LORENA, Ana Carolina
Contributor(s)

UNIVERSIDADE DE SÃO PAULO

Date(s)

26/03/2012

26/03/2012

2009

Abstract

Due to the imprecise nature of biological experiments, biological data sets often contain redundant and noisy instances. This may be due to errors that occurred during data collection, such as contamination of laboratory samples. This is the case for gene expression data, where the equipment and tools currently used frequently produce noisy measurements. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. The evaluation analyzes how effectively the investigated techniques remove noisy data, measured by the accuracy that different Machine Learning classifiers obtain on the pre-processed data.
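
This record does not name the specific distance-based techniques the paper evaluates. As a rough illustration of the general idea only, the sketch below flags an instance as noisy when its label disagrees with the majority label of its k nearest neighbours, in the style of an edited nearest-neighbour filter. The function name `enn_noise_filter`, the Euclidean distance, the value k=3, and the toy data are all assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter

def enn_noise_filter(X, y, k=3):
    """Return indices of instances whose label disagrees with the majority
    label of their k nearest neighbours (Euclidean distance).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    noisy = []
    for i, xi in enumerate(X):
        # Distances from instance i to every other instance.
        dists = [(math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i]
        dists.sort()
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority_label, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority_label != y[i]:
            noisy.append(i)
    return noisy

# Toy data (assumed): two well-separated groups, one deliberately mislabelled point.
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2],
     [5.0, 5.1], [5.1, 5.0], [5.2, 5.1], [5.1, 5.2]]
y = ["A", "A", "A", "B", "B", "B", "B", "B"]  # index 3 lies in group A but is labelled B

noisy_idx = enn_noise_filter(X, y, k=3)

# Remove the flagged instances before training a classifier.
X_clean = [x for i, x in enumerate(X) if i not in noisy_idx]
y_clean = [c for i, c in enumerate(y) if i not in noisy_idx]
```

In a setting like the one the abstract describes, the filtered `X_clean` and `y_clean` would then be passed to a classifier, and accuracy with and without filtering would be compared.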

São Paulo State Research Foundation (FAPESP)

CNPq

Identifier

Journal of the Brazilian Computer Society, v.15, n.1, p.3-11, 2009

0104-6500

http://producao.usp.br/handle/BDPI/11824

10.1590/S0104-65002009000100002

http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0104-65002009000100002

http://www.scielo.br/pdf/jbcos/v15n1/v15n1a02.pdf

Language(s)

eng

Publisher

Sociedade Brasileira de Computação

Relation

Journal of the Brazilian Computer Society

Rights

openAccess

Copyright Sociedade Brasileira de Computação

Keywords

#Noise detection #Machine learning #Distance-based techniques #Gene expression analysis

Type

article

original article

publishedVersion