Multi-dimensional scaling applied to histogram-based DNA analysis


Author(s): Costa, António Cardoso; Machado, J. A. Tenreiro; Quelhas, Dulce
Date(s)

22/04/2013

22/04/2013

2012

Abstract

This paper studies the relationships between the chromosomal DNA sequences of twenty species. We propose a methodology combining DNA-based word frequency histograms, correlation methods, and an MDS technique to visualize structural information underlying chromosomes (CRs) and species. Four statistical measures are tested (Minkowski, Cosine, Pearson product-moment, and Kendall τ rank correlations) to analyze the information content of 421 nuclear CRs from twenty species. The proposed methodology is built on mathematical tools and allows the analysis and visualization of very large amounts of stream data, such as DNA sequences, with almost no assumptions other than the predefined DNA “word length.” This methodology produces comprehensible three-dimensional visualizations of CR clustering and related spatial and structural patterns. The results of the four correlation test scenarios show that the high-level information clusterings produced by the MDS tool are qualitatively similar, with small variations due to the characteristics of each correlation method, and that the clusterings are a consequence of the input data and not artifacts of the method.
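
The first two stages named in the abstract (word frequency histograms, then a pairwise measure between them) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' procedure: words are taken as overlapping windows over the alphabet A, C, G, T, histograms are normalized to relative frequencies, the cosine measure is turned into a dissimilarity as 1 minus the cosine, and the three toy sequences are invented for illustration. The resulting matrix is the kind of input an MDS routine would take.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"  # assumption: only the four standard bases are counted


def word_histogram(sequence, word_length):
    """Relative frequency of each overlapping DNA word of the given length."""
    words = ["".join(p) for p in product(ALPHABET, repeat=word_length)]
    counts = Counter(
        sequence[i:i + word_length]
        for i in range(len(sequence) - word_length + 1)
    )
    # Words containing other symbols (e.g. N) are left out of the total.
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def cosine_dissimilarity(h1, h2):
    """1 - cosine similarity between two histograms (assumed conversion)."""
    dot = sum(a * b for a, b in zip(h1, h2))
    norm = sqrt(sum(a * a for a in h1)) * sqrt(sum(b * b for b in h2))
    return 1.0 - dot / norm if norm else 1.0


def dissimilarity_matrix(sequences, word_length):
    """Pairwise dissimilarities between the histograms of all sequences."""
    hists = [word_histogram(s, word_length) for s in sequences]
    return [[cosine_dissimilarity(a, b) for b in hists] for a in hists]


# Hypothetical toy sequences standing in for chromosomes.
toy = {
    "seq_a": "ACGTACGTACGTACGT",
    "seq_b": "ACGTACGTACGAACGT",
    "seq_c": "GGGGCCCCGGGGCCCC",
}
matrix = dissimilarity_matrix(list(toy.values()), word_length=2)
```

In this toy case seq_a and seq_b differ by one base and end up much closer to each other than either is to seq_c. Swapping `cosine_dissimilarity` for another measure leaves the rest of the sketch unchanged, which mirrors the abstract's comparison of several measures over the same histograms.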

Identifier

http://dx.doi.org/10.1155/2012/289694

1531-6912

http://hdl.handle.net/10400.22/1452

Language(s)

eng

Publisher

Hindawi Publishing Corporation

Relation

Comparative and Functional Genomics; Vol. 2012

http://www.hindawi.com/journals/ijg/2012/289694/

Rights

openAccess

Keywords #MDS #Histogram #Analysis #DNA
Type

article