New Contig Creation Algorithm for the de novo DNA Assembly Problem


Author(s): Goodarzi, Mohammad
Contributor(s)

Department of Computer Science

Date(s)

25/02/2014

25/02/2014

25/02/2014

Abstract

DNA assembly is among the most fundamental and difficult problems in bioinformatics. Near-optimal assembly solutions are available for bacterial and other small genomes. Assembling large and complex genomes, especially the human genome, with next-generation sequencing (NGS) technologies has nevertheless proven very difficult, owing to the highly repetitive and complex nature of the human genome, short read lengths, uneven data coverage, and tools that are not specifically built for human genomes. Moreover, many algorithms do not scale to human genome datasets containing hundreds of millions of short reads. The DNA assembly problem is usually divided into several subproblems, including error detection and correction in the DNA data, contig creation, scaffolding, and contig orientation; each can be seen as a distinct research area. This thesis focuses specifically on creating contigs from short reads and combining them with the outputs of other tools in order to obtain better results. Three assemblers, SOAPdenovo [Li09], Velvet [ZB08] and Meraculous [CHS+11], are selected for comparison. The results show that the contig creation algorithm developed in this thesis is comparable to the other assemblers, and that combining its contigs with the outputs of other tools produces the best results, outperforming all of the other assemblers investigated.

Identifier

http://hdl.handle.net/10464/5235

Language(s)

eng

Publisher

Brock University

Keywords

#DNA Assembly Problem #de-novo #Contig Creation Algorithm #Bioinformatics

Type

Electronic Thesis or Dissertation