867 results for second-generation sequencing
Abstract:
With the advent of cheaper and faster DNA sequencing technologies, assembly methods have greatly changed. Instead of outputting reads that are thousands of base pairs long, new sequencers parallelize the task by producing reads between 35 and 400 base pairs long. Reconstructing an organism’s genome from these millions of reads is a computationally expensive task. Our algorithm solves this problem by organizing and indexing the reads using n-grams, which are short, fixed-length DNA sequences of length n. These n-grams are used to efficiently locate putative read joins, thereby eliminating the need to perform an exhaustive search over all possible read pairs. Our goal was to develop a novel n-gram method for the assembly of genomes from next-generation sequencers. Specifically, a probabilistic, iterative approach was used to determine the most likely reads to join, through the development of a new metric that models the probability of any two arbitrary reads being joined together. Tests were run using simulated short-read data based on randomly created genomes ranging in length from 10,000 to 100,000 nucleotides with 16 to 20x coverage. We were able to successfully re-assemble entire genomes up to 100,000 nucleotides in length.
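The indexing idea in the abstract above can be illustrated with a minimal sketch. It only shows how an n-gram index narrows the search to reads that share at least one n-gram; the probabilistic join metric and the iterative assembly are not described in enough detail here to reproduce, so they are left out. The function names (`ngrams`, `build_index`, `candidate_pairs`) and the toy reads are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def ngrams(read, n):
    """Yield every length-n substring of a read."""
    for i in range(len(read) - n + 1):
        yield read[i:i + n]


def build_index(reads, n):
    """Map each n-gram to the set of read ids that contain it."""
    index = defaultdict(set)
    for read_id, read in enumerate(reads):
        for gram in ngrams(read, n):
            index[gram].add(read_id)
    return index


def candidate_pairs(reads, n):
    """Return pairs of read ids that share at least one n-gram.

    Only these pairs need to be scored as putative joins, which
    avoids comparing every read against every other read.
    """
    pairs = set()
    for ids in build_index(reads, n).values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical toy reads: the first two share the 4-gram "GTAC".
reads = ["ACGTAC", "GTACGG", "TTTTTT"]
pairs = candidate_pairs(reads, 4)
```

In this toy input only reads 0 and 1 become a candidate pair; the third read shares no 4-gram with the others and is never compared.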
Abstract:
Introduction: Amplicon deep-sequencing using second-generation sequencing technology is an innovative molecular diagnostic technique that enables highly sensitive detection of mutations. As an international consortium, we previously investigated the robustness, precision, and reproducibility of 454 amplicon next-generation sequencing (NGS) across 10 laboratories from 8 countries (Leukemia, 2011;25:1840-8).
Aims: In Phase II of the study, we established distinct working groups for various hematological malignancies, i.e. acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), chronic lymphocytic leukemia (CLL), chronic myelogenous leukemia (CML), myelodysplastic syndromes (MDS), myeloproliferative neoplasms (MPN), and multiple myeloma. Currently, 27 laboratories from 13 countries are part of this research consortium. In total, 74 gene targets were selected by the working groups and amplicons were developed for an NGS deep-sequencing assay (454 Life Sciences, Branford, CT). A data analysis pipeline was developed to standardize mutation interpretation, covering both access to raw data (Roche Amplicon Variant Analyzer, 454 Life Sciences) and variant interpretation (Sequence Pilot, JSI Medical Systems, Kippenheim, Germany).
Results: We will report on the design, standardization, quality control aspects, landscape of mutations, as well as the prognostic and predictive utility of this assay in a cohort of 8,867 cases. Overall, 1,146 primer sequences were designed and tested. For example, in AML, 924 cases have been screened for CEBPA mutations. RUNX1 mutations were analyzed in 1,888 cases, applying the deep-sequencing read counts to study the stability of such mutations at relapse and their utility as a biomarker to detect residual disease. Analyses of DNMT3A (n=1,041) focused on landscape investigations and on prognostic relevance. Additionally, this working group is focusing on TET2, ASXL1, and TP53 analyses. A novel prognostic model is being developed that allows stratification of AML into prognostic subgroups based on molecular markers only. In ALL, 1,124 pediatric and adult cases have been screened, including 763 assays for TP53 mutations both at diagnosis and at relapse of ALL. Pediatric and adult leukemia expert labs developed additional content to study the mutation incidence of other B and T lineage markers such as IKZF1, JAK2, IL7R, PAX5, EP300, LEF1, CRLF2, PHF6, WT1, JAK1, PTEN, AKT1, NOTCH1, CREBBP, or FBXW7. Further, the molecular landscape of CLL is changing rapidly. As such, a separate working group focused on analyses including NOTCH1, SF3B1, MYD88, XPO1, FBXW7 and BIRC3. To date, 922 cases have been screened to investigate the range of mutational burden of NOTCH1 mutations and their prognostic relevance. In MDS, RUNX1 mutation analyses were performed in 977 cases. The prognostic relevance of TP53 mutations in MDS was assessed in an additional 327 cases, including isolated deletions of chromosome 5q. Next, content was developed targeting genes of the cellular splicing component, e.g. SF3B1, SRSF2, U2AF1, and ZRSR2.
In BCR-ABL1-negative MPN, nine genes of interest (JAK2, MPL, TET2, CBL, KRAS, EZH2, IDH1, IDH2, ASXL1) have been analyzed in a cohort of 155 primary myelofibrosis cases searching for novel somatic mutations and addressing their relevance for disease progression and leukemia transformation. Moreover, an assay was developed and applied to CMML cases allowing the simultaneous analysis of 25 leukemia-associated target genes in a single sequencing run using just 20 ng of starting DNA. Finally, nine laboratories are studying CML, applying ultra-deep sequencing of the BCR-ABL1 tyrosine kinase domain. Analyses were performed on 615 cases investigating the dynamics of expansion of mutated clones under various tyrosine kinase inhibitor therapies.
Conclusion: Molecular characterization of hematological malignancies today requires high diagnostic sensitivity and specificity. As part of the IRON-II study, a network of laboratories analyzed a variety of disease entities applying amplicon-based NGS assays. Importantly, the consortium not only standardized assay design for disease-specific panels, but also achieved consensus on a common data analysis pipeline for mutation interpretation. Distinct working groups have been formed to address scientific tasks, and in total 8,867 cases have been analyzed thus far.
Abstract:
Khaya senegalensis (African mahogany or dry-zone mahogany) is a high-value hardwood timber species with great potential for forest plantations in northern Australia. The species is distributed across the sub-Saharan belt from Senegal to Sudan and Uganda. Because of heavy exploitation and constraints on natural regeneration and sustainable planting, it is now classified as a vulnerable species. Here, we describe the development of microsatellite markers for K. senegalensis using next-generation sequencing to assess its intra-specific diversity across its natural range, which is key to successful breeding programs and effective conservation management of the species. Next-generation sequencing yielded 93,943 sequences with an average read length of 234 bp. The assembled sequences contained 1,030 simple sequence repeats, with primers designed for 522 microsatellite loci. Twenty-one microsatellite loci were tested, with 11 showing reliable amplification and polymorphism in K. senegalensis. The 11 novel microsatellites, together with one previously published, were used to assess 73 accessions belonging to the Australian K. senegalensis domestication program, sampled from across the natural range of the species. STRUCTURE analysis shows two major clusters, one comprising mainly accessions from west Africa (Senegal to Benin) and the second based in the far eastern limits of the range in Sudan and Uganda. Higher levels of genetic diversity were found in material from western Africa. This suggests that new seed collections from this region may yield more diverse genotypes than those originating from Sudan and Uganda in eastern Africa.
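As a rough illustration of what locating simple sequence repeats in assembled sequences can look like, the sketch below scans a DNA string for tandem repeats with a regular expression. The abstract does not name the tool or the thresholds that were used, so the motif length range (2 to 6 bp), the minimum of five repeats, and the name `find_ssrs` are all assumptions chosen for the example.

```python
import re

# Assumed thresholds: motif of 2-6 bases repeated at least 5 times in tandem.
# The lazy quantifier makes the engine try the shortest motif first.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{4,}")


def find_ssrs(sequence):
    """Return (motif, start, repeat_count) for each tandem repeat found.

    Caveat: a homopolymer run such as "AAAAAAAAAA" is reported with
    motif "AA", because this simple pattern does not exclude it.
    """
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((motif, match.start(), repeat_count))
    return hits


# Hypothetical toy sequence containing (AC) repeated five times.
ssrs = find_ssrs("TTGACACACACACATT")
```

Dedicated microsatellite-mining tools also handle compound and imperfect repeats, which this sketch ignores.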
Abstract:
BACKGROUND: Several approaches can be used to determine the order of loci on chromosomes and hence develop maps of the genome. However, all mapping approaches are prone to errors arising either from technical deficiencies or from a lack of statistical support to distinguish between alternative orders of loci. The accuracy of the genome maps could be improved, in principle, if information from different sources was combined to produce integrated maps. The publicly available bovine genomic sequence assembly with 6x coverage (Btau_2.0) is based on whole genome shotgun sequence data and limited mapping data; however, it is recognised that this assembly is a draft that contains errors. Correcting the sequence assembly requires extensive additional mapping information to improve the reliability of the ordering of sequence scaffolds on chromosomes. The radiation hybrid (RH) map described here has been contributed to the international sequencing project to aid this process. RESULTS: An RH map for the 30 bovine chromosomes is presented. The map was built using the Roslin 3000-rad RH panel (BovGen RH map) and contains 3966 markers, including 2473 new loci in addition to 262 amplified fragment-length polymorphisms (AFLP) and 1231 markers previously published with the first generation RH map. Sequences of the mapped loci were aligned with published bovine genome maps to identify inconsistencies. In addition to differences in the order of loci, several cases were observed where the chromosomal assignment of loci differed between maps. All the chromosome maps were aligned with the current 6x bovine assembly (Btau_2.0) and 2898 loci were unambiguously located in the bovine sequence. The order of loci on the RH map for BTA 5, 7, 16, 22, 25 and 29 differed substantially from the assembled bovine sequence. Of the 2898 loci unambiguously identified in the bovine sequence assembly, 131 mapped to different chromosomes in the BovGen RH map.
CONCLUSION: Alignment of the BovGen RH map with other published RH and genetic maps showed higher consistency in marker order and chromosome assignment than with the current 6x sequence assembly. This suggests that the bovine sequence assembly could be significantly improved by incorporating additional independent mapping information.
Abstract:
Background Human papillomavirus (HPV) is the aetiological agent for cervical cancer and genital warts. Concurrent HPV and HIV infection in the South African population is high. HIV positive (+) women are often infected with multiple, rare and undetermined HPV types. Data on HPV incidence and genotype distribution are based on commercial HPV detection kits, but these kits may not detect all HPV types in HIV + women. The objectives of this study were to (i) identify the HPV types not detected by commercial genotyping kits present in a cervical specimen from an HIV positive South African woman using next generation sequencing, and (ii) determine if these types were prevalent in a cohort of HIV-infected South African women. Methods Total DNA was isolated from 109 cervical specimens from South African HIV + women. A specimen within this cohort representing a complex multiple HPV infection, with 12 HPV genotypes detected by the Roche Linear Array HPV genotyping (LA) kit, was selected for next generation sequencing analysis. All HPV types present in this cervical specimen were identified by Illumina sequencing of the extracted DNA following rolling circle amplification. The prevalence of the HPV types identified by sequencing, but not included in the Roche LA, was then determined in the 109 HIV positive South African women by type-specific PCR. Results Illumina sequencing identified a total of 16 HPV genotypes in the selected specimen, with four genotypes (HPV-30, 74, 86 and 90) not included in the commercial kit. The prevalences of HPV-30, 74, 86 and 90 in 109 HIV positive South African women were found to be 14.6 %, 12.8 %, 4.6 % and 8.3 % respectively. Conclusions Our results indicate that there are HPV types, with substantial prevalence, in HIV positive women not being detected in molecular epidemiology studies using commercial kits. The significance of these types in relation to cervical disease remains to be investigated.
Abstract:
Rationale Although the advent of atypical, second-generation antipsychotics (SGAs) has resulted in reduced likelihood of akathisia, this adverse effect remains a problem. Extrapyramidal adverse effects are associated with increased drug occupancy of dopamine 2 receptors (DRD2). The A1 allele of the DRD2/ANKK1 polymorphism, rs1800497, is associated with decreased striatal DRD2 density. Objectives The aim of this study was to identify whether the A1(T) allele of DRD2/ANKK1 was associated with akathisia (measured with the Barnes Akathisia Rating Scale) in a clinical sample of 234 patients treated with antipsychotics. Results Definite akathisia (a score ≥ 2 for the global clinical assessment of akathisia) was significantly less common in subjects prescribed SGAs (16.8%) than in those prescribed FGAs (47.6%), p<0.0001. Overall, 24.1% of A1+ (A1A2/A1A1) patients treated with SGAs had akathisia compared to 10.8% of A1- (A2A2) patients. A1+ (A1A2/A1A1) patients administered SGAs also had higher global clinical assessment of akathisia scores than A1- subjects (p=0.01). SGAs maintained their advantage over FGAs regarding akathisia even in A1+ patients treated with SGAs. Conclusions These results strongly suggest that A1+ variants of the DRD2/ANKK1 Taq1A allele confer risk for akathisia in patients treated with SGAs and may explain inconsistencies across prior studies comparing FGAs and SGAs.
Abstract:
In a previous study, we demonstrated that mouse adult F(1) offspring, exposed to a vitamin D deficiency during pregnancy, developed a less severe and delayed Experimental Autoimmune Encephalomyelitis (EAE), when compared with control offspring. We then wondered whether a similar response was observed in the subsequent generation. To answer this question, we assessed F(2) females whose F(1) parents (males or females) were vitamin D-deprived when developing in the uterus of F(0) females. Unexpectedly, we observed that the vitamin D deficiency affecting the F(0) pregnant mice induced a precocious and more severe EAE in the F(2) generation. This paradoxical finding led us to assess its implications for the epidemiology of Multiple Sclerosis (MS) in humans. Using the REFGENSEP database for MS trios (the patient and his/her parents), we collected the parents' dates of birth and assessed a possible season-of-birth effect that could be indicative of the vitamin D status of the pregnant grandmothers. A trend for a reduced number of births in the Fall for the parents of MS patients was observed, but statistical significance was not reached. Further well-powered studies are warranted to validate the latter finding.
Abstract:
This item provides supplementary materials for the paper mentioned in the title, specifically a range of organisms used in the study. The full abstract for the main paper is as follows: Next Generation Sequencing (NGS) technologies have revolutionised molecular biology, allowing clinical sequencing to become a matter of routine. NGS data sets consist of short sequence reads obtained from the machine, given context and meaning through downstream assembly and annotation. For these techniques to operate successfully, the collected reads must be consistent with the assumed species or species group, and not corrupted in some way. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans, with some strains exhibiting antibiotic resistance. In this paper, we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from alternative pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.
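To make the phrase "sequence k-mer representation" concrete, here is a minimal sketch of turning a DNA sequence into a fixed-length vector of k-mer frequencies, which is the kind of input a classifier such as an SVM can consume. The abstract does not give the value of k, the normalisation, or the feature selection used, so the choices below (relative frequencies over the alphabet A, C, G, T, and the name `kmer_vector`) are assumptions for illustration only.

```python
from collections import Counter
from itertools import product


def kmer_vector(sequence, k):
    """Return relative k-mer frequencies in a fixed lexicographic order.

    The vector has 4**k entries. Windows containing characters other
    than A, C, G, T (for example N) are ignored.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]


# Hypothetical toy read; real inputs would be whole sequencing projects.
vec = kmer_vector("ACGTACGT", 2)
```

Because every sequence maps to the same 4**k coordinates, vectors from different sequencing projects can be compared or passed to a classifier directly.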
Abstract:
The Beauty Leaf tree (Calophyllum inophyllum) is a potential source of non-edible vegetable oil for producing future generation biodiesel because of its ability to grow in a wide range of climate conditions, easy cultivation, high fruit production rate, and the high oil content in the seed. This plant naturally occurs in the coastal areas of Queensland and the Northern Territory in Australia, and is also widespread in south-east Asia, India and Sri Lanka. Although Beauty Leaf is traditionally used as a source of timber and as an ornamental plant, its potential as a source of second generation biodiesel is yet to be exploited. In this study, the extraction process from the Beauty Leaf oil seed has been optimised in terms of seed preparation, moisture content and oil extraction methods. The two methods that have been considered to extract oil from the seed kernel are mechanical oil extraction using an electric powered screw press, and chemical oil extraction using n-hexane as an oil solvent. The study found that seed preparation has a significant impact on oil yields, especially in the screw press extraction method. Kernels prepared to 15% moisture content provided the highest oil yields for both extraction methods. Mechanical extraction using the screw press can produce oil from correctly prepared product at a low cost; however, overall this method is ineffective, with relatively low oil yields. Chemical extraction was found to be a very effective method for oil extraction because of its consistent performance and high oil yield, but the cost of production was relatively higher due to the high cost of solvent. However, a solvent recycle system can be implemented to reduce the production cost of Beauty Leaf biodiesel. The findings of this study are expected to serve as the basis from which industrial scale biodiesel production from Beauty Leaf can be made.
Abstract:
Taka ‘i fonua mahu is a Tongan proverb, which means: "Going about or living in a fruitful land". This thesis analyses the experiences and impacts of migration on being Tongan, particularly for Tongan youth in an adopted fruitful land, South East Queensland. The thesis argues that being Tongan in Tonga has new meaning in the diaspora because of remittances, job prospects, educational opportunity, adapting to a multicultural society, and social justice. These issues are revealed by comparisons made with the experiences of the first generation Tongan migrants, and second generation Tongan migrants, as well as those in New Zealand and America. It argues that the Church, the family and kainga (extended family) impact on the anga fakatonga (Tongan way) and the essence of community as experienced by the first and second generation Tongan migrants. The framework for this analysis is a study of transnationalism, and being Tongan as it is maintained and changed in the diaspora.
Abstract:
Next Generation Sequencing (NGS) has revolutionised molecular biology, allowing routine clinical sequencing. NGS data consists of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans, with some strains exhibiting antibiotic resistance. Here we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from other pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.
Abstract:
We isolated and characterized 21 microsatellite loci in the vulnerable and iconic Australian lungfish, Neoceratodus forsteri. Loci were screened across eight individuals from the Burnett River and 40 individuals from the Pine River. Genetic diversity was low with between one and six alleles per locus within populations and a maximum expected heterozygosity of 0.774. These loci will now be available to assess effective population sizes and genetic structure in N. forsteri across its natural range in South East Queensland, Australia.
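For readers unfamiliar with the expected heterozygosity statistic reported above, the sketch below computes it for one locus from allele counts using the standard definition, one minus the sum of squared allele frequencies. The abstract does not say whether a sample-size correction was applied, so this uncorrected form and the name `expected_heterozygosity` are assumptions, and the counts are made up for the example.

```python
def expected_heterozygosity(allele_counts):
    """Expected heterozygosity at one locus: 1 - sum(p_i ** 2).

    allele_counts holds the number of copies observed for each allele.
    This is the uncorrected estimator; an unbiased version would also
    scale by a sample-size factor.
    """
    total = sum(allele_counts)
    if total == 0:
        raise ValueError("allele_counts must contain at least one observation")
    return 1.0 - sum((count / total) ** 2 for count in allele_counts)


# Hypothetical counts: two equally common alleles, and a fixed locus.
balanced = expected_heterozygosity([5, 5])
monomorphic = expected_heterozygosity([10])
```

Two equally frequent alleles give 0.5 and a locus with a single allele gives 0.0, which is why loci with only one allele per population contribute no diversity.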
Abstract:
Next Generation Sequencing (NGS) has revolutionised molecular biology, resulting in an explosion of data sets and an increasing role in clinical practice. Such applications necessarily require rapid identification of the organism as a prelude to annotation and further analysis. NGS data consist of a substantial number of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. Highly accurate results have been obtained for restricted sets using SVM classifiers, but such methods are difficult to parallelise and success depends on careful attention to feature selection. This work examines the problem at very large scale, using a mix of synthetic and real data with a view to determining the overall structure of the problem and the effectiveness of parallel ensembles of simpler classifiers (principally random forests) in addressing the challenges of large scale genomics.