Supplementary information : weighted tree kernels for sequence analysis


Author(s): Bowles, Christopher J.; Hogan, James M.
Date(s)

23/04/2014

Abstract

Genomic sequences are fundamentally text documents, admitting various representations according to need and tokenization. Gene expression depends crucially on binding of enzymes to the DNA sequence at small, poorly conserved binding sites, limiting the utility of standard pattern search. However, one may exploit the regular syntactic structure of the enzyme's component proteins and the corresponding binding sites, framing the problem as one of detecting grammatically correct genomic phrases. In this paper we propose new kernels based on weighted tree structures, traversing the paths within them to capture the features which underpin the task. Experimentally, we find that these kernels provide performance comparable with state-of-the-art approaches for this problem, while offering significant computational advantages over earlier methods. The methods proposed may be applied to a broad range of sequence or tree-structured data in molecular biology and other domains.

Format

application/pdf

Identifier

http://eprints.qut.edu.au/67877/

Publisher

Springer Verlag

Relation

http://eprints.qut.edu.au/67877/1/BowlesHoganSupp.pdf

Bowles, Christopher J. & Hogan, James M. (2014) Supplementary information : weighted tree kernels for sequence analysis. Proceedings of ESANN 2014. (In Press)

Rights

Copyright 2014 Please consult the authors

Source

School of Electrical Engineering & Computer Science; Science & Engineering Faculty

Keywords

#060102 Bioinformatics #080100 ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING #080301 Bioinformatics Software #Bioinformatics #Kernel methods #Machine learning #Genomics

Type

Other