Simcluster: clustering enumeration gene expression data on the simplex space


Author(s): Vêncio, Ricardo ZN; Varuzza, Leonardo; Pereira, Carlos A de B; Brentani, Helena; Shmulevich, Ilya
Contributor(s)

UNIVERSIDADE DE SÃO PAULO

Date(s)

26/08/2013

26/08/2013

01/07/2007

Abstract

Background: Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data and exhibit properties particular to the simplex space, where the sum of the components is constrained. These properties are absent from regular Euclidean spaces, on which hybridization-based microarray data are often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be uninformative for data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space.

Results: Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster.

Conclusion: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.

We thank Dr. Jared Roach (ISB) and Dr. João C. Barata (USP) for constructive discussions and Dr. Alistair Rust (ISB) for help with the web server. LV is supported by CAPES. CABP is partially supported by CNPq. This work is partially supported by NIH/NIAID grants U19AI057266 and U54AI54253 and NIH/NIGMS P50GM076547.

Identifier

1471-2105

http://www.producao.usp.br/handle/BDPI/32736

10.1186/1471-2105-8-246

http://www.biomedcentral.com/1471-2105/8/246

Language(s)

eng

Relation

BMC Bioinformatics

Rights

openAccess

Vêncio et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Type

article

original article