Consensus σ70 promoter prediction using hadoop
Date(s) |
2013
|
---|---|
Abstract |
MapReduce frameworks such as Hadoop are well suited to handling large sets of data which can be processed separately and independently, with canonical applications in information retrieval and sales record analysis. Rapid advances in sequencing technology have ensured an explosion in the availability of genomic data, with a consequent rise in the importance of large-scale comparative genomics, often involving operations and data relationships which deviate from the classical MapReduce structure. This work examines the application of Hadoop to patterns of this nature, using as our focus a well-established workflow for identifying promoters (binding sites for regulatory proteins) across multiple gene regions and organisms, coupled with the unifying step of assembling these results into a consensus sequence. Our approach demonstrates the utility of Hadoop for problems of this nature, showing how the tyranny of the "dominant decomposition" can be at least partially overcome. It also demonstrates how load balance and the granularity of parallelism can be optimized by pre-processing that splits and reorganizes input files, allowing a wide range of related problems to be brought under the same computational umbrella. |
Identifier | |
Publisher |
IEEE |
Relation |
DOI:10.1109/eScience.2013.42 Hogan, James M., Kelly, Wayne A., & Newell, Felicity S. (2013) Consensus σ70 promoter prediction using hadoop. In Proceedings of the 2013 IEEE 9th International Conference on e-Science, IEEE, 22 - 25 October 2013, pp. 35-44. |
Rights |
Copyright 2013 by The Institute of Electrical and Electronics Engineers, Inc. |
Source |
School of Electrical Engineering & Computer Science; Science & Engineering Faculty |
Keywords | #Biology computing #Data handling #Genomics #Parallel programming #Proteins #Public domain software |
Type |
Conference Paper |