Probabilistic base calling of Solexa sequencing data.

Author(s)	Rougemont J.; Amzallag A.; Iseli C.; Farinelli L.; Xenarios I.; Naef F.
Date(s)	2008
Abstract	BACKGROUND: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data poses new challenges; currently a fair proportion of the tags are routinely discarded due to an inability to match them to a reference sequence, thereby reducing the effective throughput of the technology. RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads. CONCLUSION: We show that the method improves genome coverage and number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.
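The two ideas in the abstract (coding ambiguous bases with IUPAC symbols, and trimming uncertain read ends with an information-content score) can be illustrated with a small Python sketch. It is not the authors' R package or their exact algorithm: the paper derives base probabilities by model-based clustering of fluorescence intensities, whereas this sketch simply assumes per-position probabilities for A, C, G, T are already given. The cumulative-probability cutoff (`coverage`), the per-position score 2 - H(p), the per-base cost `min_bits` and the maximum-subarray (Kadane) window search are all hypothetical choices, as are the function names `iupac_call`, `information_bits` and `best_subtag`.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"

# IUPAC nucleotide codes, keyed by the set of bases each symbol stands for.
IUPAC: Dict[frozenset, str] = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_call(probs: Sequence[float], coverage: float = 0.9) -> str:
    """Hypothetical rule: take the most probable bases, in decreasing order,
    until their probabilities sum to at least `coverage`; return the IUPAC
    symbol for that set (a single base if the call is unambiguous)."""
    ranked = sorted(zip(BASES, probs), key=lambda bp: bp[1], reverse=True)
    chosen: List[str] = []
    total = 0.0
    for base, p in ranked:
        chosen.append(base)
        total += p
        if total >= coverage:
            break
    return IUPAC[frozenset(chosen)]


def information_bits(probs: Sequence[float]) -> float:
    """Information content of one position, 2 - H(p), in bits (0 to 2)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 - entropy


def best_subtag(columns: List[Sequence[float]],
                min_bits: float = 1.0) -> Tuple[int, int]:
    """Return the half-open window [start, end) that maximises
    sum(information_bits - min_bits), via Kadane's maximum-subarray scan.
    `min_bits` is a hypothetical per-base cost; (0, 0) means no useful window."""
    best_score, best = 0.0, (0, 0)
    run_score, run_start = 0.0, 0
    for i, probs in enumerate(columns):
        gain = information_bits(probs) - min_bits
        if run_score <= 0.0:
            run_score, run_start = gain, i  # restart the window here
        else:
            run_score += gain  # extend the current window
        if run_score > best_score:
            best_score, best = run_score, (run_start, i + 1)
    return best


if __name__ == "__main__":
    # Per-position probabilities for A, C, G, T (made-up illustrative values).
    read = [
        [0.97, 0.01, 0.01, 0.01],
        [0.50, 0.02, 0.46, 0.02],
        [0.01, 0.01, 0.01, 0.97],
        [0.25, 0.25, 0.25, 0.25],
    ]
    start, end = best_subtag(read)
    print("".join(iupac_call(p) for p in read[start:end]))
```

On the made-up read, the uninformative last position (uniform probabilities, 0 bits) is trimmed and the second position, split between A and G, is reported as R, so the script prints ART. Scoring each position by information minus a fixed cost lets a moderately uncertain base survive when it sits between confident ones, while low-information runs at the ends are dropped.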
Identifier	https://serval.unil.ch/?id=serval:BIB_54200D66E5BC issn:1471-2105 (Electronic) pmid:18851737 doi:10.1186/1471-2105-9-431 isiid:000260490200001 http://my.unil.ch/serval/document/BIB_54200D66E5BC.pdf http://nbn-resolving.org/urn/resolver.pl?urn=urn:nbn:ch:serval-BIB_54200D66E5BC8
Language(s)	en
Rights	info:eu-repo/semantics/openAccess
Source	BMC Bioinformatics, vol. 9, article 431
Keywords	Bacteriophage phi X 174/genetics; Base Sequence/genetics; Chromosome Mapping/methods; Cluster Analysis; DNA, Viral/analysis; Expressed Sequence Tags; Pattern Recognition, Automated/methods; Quality Control; Sequence Analysis, DNA/methods; Software; Spectrometry, Fluorescence/methods
Type	info:eu-repo/semantics/article (journal article)