A feature selection approach for identification of signature genes from SAGE data
Contributor(s) |
UNIVERSIDADE DE SÃO PAULO |
---|---|
Date(s) |
26/08/2013
26/08/2013
01/05/2007
|
Abstract |
Background: One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using microarray techniques. Because the probabilistic models of microarray and SAGE technologies differ in important ways, suitable techniques are needed to select specific genes from SAGE measurements. Results: A new framework is proposed to select specific genes that distinguish different biological states based on the analysis of SAGE data. The framework applies the bolstered error to identify strong genes that separate the biological states in a feature space defined by the gene expression of a training set. Credibility intervals defined from a probabilistic model of SAGE measurements are used to identify, among all gene groups selected by the strong genes method, the genes that distinguish the different states with more reliability. A score that takes into account the credibility and bolstered error values is proposed to rank the groups of considered genes. Results obtained using SAGE data from gliomas are presented, corroborating the introduced methodology. Conclusion: The model representing counting data, such as SAGE, provides additional statistical information that allows a more robust analysis, and this information is incorporated in the methodology described in the paper. The introduced method is suitable for identifying signature genes that lead to a good separation of the biological states using SAGE, and it may be adapted to other counting methods such as Massively Parallel Signature Sequencing (MPSS) or the recent Sequencing-By-Synthesis (SBS) technique. Some of the genes identified by the proposed method may be useful for generating classifiers.
The authors are grateful to Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) (proc. 300722/98-2, 52.1097/01-0 and 468413/00-6), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) (proc. 05/00587-5, 01/09401-0 and 04/03967-0) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for financial support. This work was partially supported by grant 1 D43 TW07015-01 from the National Institutes of Health, USA. We are grateful to the reviewers, who greatly helped us improve the paper. Finally, we especially thank Ricardo Z. N. Vêncio for his explanations of and considerations about the concept of credibility intervals. |
Identifier |
BMC Bioinformatics. 2007 May 22;8(1):169 1471-2105 http://www.producao.usp.br/handle/BDPI/32735 http://dx.doi.org/10.1186/1471-2105-8-169 10.1186/1471-2105-8-169 |
Language(s) |
eng |
Relation |
BMC Bioinformatics |
Rights |
openAccess Barrera et al; licensee BioMed Central Ltd. - This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
Type |
article original article |