12 resultados para SEQUENCING DATA
em BORIS: Bern Open Repository and Information System - Berna - Suiça
Resumo:
Current methods for detection of copy number variants (CNV) and aberrations (CNA) from targeted sequencing data are based on the depth of coverage of captured exons. Accurate CNA determination is complicated by uneven genomic distribution and non-uniform capture efficiency of targeted exons. Here we present CopywriteR, which eludes these problems by exploiting 'off-target' sequence reads. CopywriteR allows for extracting uniformly distributed copy number information, can be used without reference, and can be applied to sequencing data obtained from various techniques including chromatin immunoprecipitation and target enrichment on small gene panels. CopywriteR outperforms existing methods and constitutes a widely applicable alternative to available tools.
Resumo:
As pituitary function depends on the integrity of the hypothalamic-pituitary axis, any defect in the development and organogenesis of this gland may account for a form of combined pituitary hormone deficiency (CPHD). Although pit-1 was 1 of the first factors identified as a cause of CPHD in mice, many other homeodomain and transcription factors have been characterized as being involved in different developmental stages of pituitary gland development, such as prophet of pit-1 (prop-1), P-Lim, ETS-1, and Brn 4. The aims of the present study were first to screen families and patients suffering from different forms of CPHD for PROP1 gene alterations, and second to define possible hot spots and the frequency of the different gene alterations found. Of 73 subjects (36 families) analyzed, we found 35 patients, belonging to 18 unrelated families, with CPHD caused by a PROP1 gene defect. The PROP1 gene alterations included 3 missense mutations, 2 frameshift mutations, and 1 splice site mutation. The 2 reported frameshift mutations could be caused by any 2-bp GA or AG deletion at either the 148-GGA-GGG-153 or 295-CGA-GAG-AGT-303 position. As any combination of a GA or AG deletion yields the same sequencing data, the frameshift mutations were called 149delGA and 296delGA, respectively. All but 1 mutation were located in the PROP1 gene encoding the homeodomain. Importantly, 3 tandem repeats of the dinucleotides GA at location 296-302 in the PROP1 gene represent a hot spot for CPHD. In conclusion, the PROP1 gene seems to be a major candidate gene for CPHD; however, further studies are needed to evaluate other genetic defects involved in pituitary development.
Resumo:
BACKGROUND A cost-effective strategy to increase the density of available markers within a population is to sequence a small proportion of the population and impute whole-genome sequence data for the remaining population. Increased densities of typed markers are advantageous for genome-wide association studies (GWAS) and genomic predictions. METHODS We obtained genotypes for 54 602 SNPs (single nucleotide polymorphisms) in 1077 Franches-Montagnes (FM) horses and Illumina paired-end whole-genome sequencing data for 30 FM horses and 14 Warmblood horses. After variant calling, the sequence-derived SNP genotypes (~13 million SNPs) were used for genotype imputation with the software programs Beagle, Impute2 and FImpute. RESULTS The mean imputation accuracy of FM horses using Impute2 was 92.0%. Imputation accuracy using Beagle and FImpute was 74.3% and 77.2%, respectively. In addition, for Impute2 we determined the imputation accuracy of all individual horses in the validation population, which ranged from 85.7% to 99.8%. The subsequent inclusion of Warmblood sequence data further increased the correlation between true and imputed genotypes for most horses, especially for horses with a high level of admixture. The final imputation accuracy of the horses ranged from 91.2% to 99.5%. CONCLUSIONS Using Impute2, the imputation accuracy was higher than 91% for all horses in the validation population, which indicates that direct imputation of 50k SNP-chip data to sequence level genotypes is feasible in the FM population. The individual imputation accuracy depended mainly on the applied software and the level of admixture.
Resumo:
OBJECTIVES: We investigated whether Acinetobacter baumannii isolates of veterinary origin shared common molecular characteristics with those described in humans. METHODS: Nineteen A. baumannii isolates collected in pets and horses were analysed. Clonality was studied using repetitive extragenic palindromic PCR (rep-PCR) and multilocus sequence typing (MLST). PCR and DNA sequencing for various beta-lactamase, aminoglycoside-modifying enzyme, gyrA and parC, ISAba1 and IS1133, adeR and adeS of the AdeABC efflux pump, carO porin and class 1/2/3 integron genes were performed. RESULTS: Two main clones [A (n = 8) and B (n = 9)] were observed by rep-PCR. MLST indicated that clone A contained isolates of sequence type (ST) ST12 (international clone II) and clone B contained isolates of ST15 (international clone I). Two isolates of ST10 and ST20 were also noted. Seventeen isolates were resistant to gentamicin, 12 to ciprofloxacin and 3 to carbapenems. Isolates of ST12 carried bla(OXA-66), bla(ADC-25), bla(TEM-1), aacC2 and IS1133. Strains of ST15 possessed bla(OXA-69), bla(ADC-11), bla(TEM-1) and a class 1 integron carrying aacC1 and aadA1. ISAba1 was found upstream of bla(ADC) (one ST10 and one ST12) and/or bla(OXA-66) (seven ST12). Twelve isolates of different STs contained the substitutions Ser83Leu in GyrA and Ser80Leu or Glu84Lys in ParC. Significant disruptions of CarO porin and overexpressed efflux pumps were not observed. The majority of infections were hospital acquired and in animals with predisposing conditions for infection. CONCLUSIONS: STs and the molecular background of resistance observed in our collection have been frequently described in A. baumannii detected in human patients. Animals should be considered as a potential reservoir of multidrug-resistant A. baumannii.
Resumo:
With the advent of high through-put sequencing (HTS), the emerging science of metagenomics is transforming our understanding of the relationships of microbial communities with their environments. While metagenomics aims to catalogue the genes present in a sample through assessing which genes are actively expressed, metatranscriptomics can provide a mechanistic understanding of community inter-relationships. To achieve these goals, several challenges need to be addressed from sample preparation to sequence processing, statistical analysis and functional annotation. Here we use an inbred non-obese diabetic (NOD) mouse model in which germ-free animals were colonized with a defined mixture of eight commensal bacteria, to explore methods of RNA extraction and to develop a pipeline for the generation and analysis of metatranscriptomic data. Applying the Illumina HTS platform, we sequenced 12 NOD cecal samples prepared using multiple RNA-extraction protocols. The absence of a complete set of reference genomes necessitated a peptide-based search strategy. Up to 16% of sequence reads could be matched to a known bacterial gene. Phylogenetic analysis of the mapped ORFs revealed a distribution consistent with ribosomal RNA, the majority from Bacteroides or Clostridium species. To place these HTS data within a systems context, we mapped the relative abundance of corresponding Escherichia coli homologs onto metabolic and protein-protein interaction networks. These maps identified bacterial processes with components that were well-represented in the datasets. In summary this study highlights the potential of exploiting the economy of HTS platforms for metatranscriptomics.
Resumo:
Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client’s site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
Resumo:
Next-generation sequencing (NGS) is a valuable tool for the detection and quantification of HIV-1 variants in vivo. However, these technologies require detailed characterization and control of artificially induced errors to be applicable for accurate haplotype reconstruction. To investigate the occurrence of substitutions, insertions, and deletions at the individual steps of RT-PCR and NGS, 454 pyrosequencing was performed on amplified and non-amplified HIV-1 genomes. Artificial recombination was explored by mixing five different HIV-1 clonal strains (5-virus-mix) and applying different RT-PCR conditions followed by 454 pyrosequencing. Error rates ranged from 0.04-0.66% and were similar in amplified and non-amplified samples. Discrepancies were observed between forward and reverse reads, indicating that most errors were introduced during the pyrosequencing step. Using the 5-virus-mix, non-optimized, standard RT-PCR conditions introduced artificial recombinants in a fraction of at least 30% of the reads that subsequently led to an underestimation of true haplotype frequencies. We minimized the fraction of recombinants down to 0.9-2.6% by optimized, artifact-reducing RT-PCR conditions. This approach enabled correct haplotype reconstruction and frequency estimations consistent with reference data obtained by single genome amplification. RT-PCR conditions are crucial for correct frequency estimation and analysis of haplotypes in heterogeneous virus populations. We developed an RT-PCR procedure to generate NGS data useful for reliable haplotype reconstruction and quantification.
Resumo:
BACKGROUND Whole genome sequencing (WGS) is increasingly used in molecular-epidemiological investigations of bacterial pathogens, despite cost- and time-intensive analyses. We combined strain-specific single nucleotide polymorphism (SNP)-typing and targeted WGS to investigate a tuberculosis cluster spanning 21 years in Bern, Switzerland. METHODS Based on genome sequences of three historical outbreak Mycobacterium tuberculosis isolates, we developed a strain-specific SNP-typing assay to identify further cases. We screened 1,642 patient isolates, and performed WGS on all identified cluster isolates. We extracted SNPs to construct genomic networks. Clinical and social data were retrospectively collected. RESULTS We identified 68 patients associated with the outbreak strain. Most were diagnosed in 1991-1995, but cases were observed until 2011. Two thirds belonged to the homeless and substance abuser milieu. Targeted WGS revealed 133 variable SNP positions among outbreak isolates. Genomic network analyses suggested a single origin of the outbreak, with subsequent division into three sub-clusters. Isolates from patients with confirmed epidemiological links differed by 0-11 SNPs. CONCLUSIONS Strain-specific SNP-genotyping allowed rapid and inexpensive identification of M. tuberculosis outbreak isolates in a population-based strain collection. Subsequent targeted WGS provided detailed insights into transmission dynamics. This combined approach could be applied to track bacterial pathogens in real-time and at high resolution.
Resumo:
Oligonucleotides comprising unnatural building blocks, which interfere with the translation machinery, have gained increased attention for the treatment of gene-related diseases (e.g. antisense, RNAi). Due to structural modifications, synthetic oligonucleotides exhibit increased biostability and bioavailability upon administration. Consequently, classical enzyme-based sequencing methods are not applicable to their sequence elucidation and verification. Tandem mass spectrometry is the method of choice for performing such tasks, since gas-phase dissociation is not restricted to natural nucleic acids. However, tandem mass spectrometric analysis can generate product ion spectra of tremendous complexity, as the number of possible fragments grows rapidly with increasing sequence length. The fact that structural modifications affect the dissociation pathways greatly increases the variety of analytically valuable fragment ions. The gas-phase dissociation of oligonucleotides is characterized by the cleavage of one of the four bonds along the phosphodiester chain, by the accompanying loss of nucleases, and by the generation of internal fragments due to secondary backbone cleavage. For example, an 18-mer oligonucleotide yields a total number of 272’920 theoretical fragment ions. In contrast to the processing of peptide product ion spectra, which nowadays is highly automated, there is a lack of tools assisting the interpretation of oligonucleotide data. The existing web-based and stand-alone software applications are primarily designed for the sequence analysis of natural nucleic acids, but do not account for chemical modifications and adducts. Consequently, we developed a software to support the interpretation of mass spectrometric data of natural and modified nucleic acids and their adducts with chemotherapeutic agents.
Resumo:
Background: Tef (Eragrostis tef), an indigenous cereal critical to food security in the Horn of Africa, is rich in minerals and protein, resistant to many biotic and abiotic stresses and safe for diabetics as well as sufferers of immune reactions to wheat gluten. We present the genome of tef, the first species in the grass subfamily Chloridoideae and the first allotetraploid assembled de novo. We sequenced the tef genome for marker-assisted breeding, to shed light on the molecular mechanisms conferring tef's desirable nutritional and agronomic properties, and to make its genome publicly available as a community resource. Results: The draft genome contains 672 Mbp representing 87% of the genome size estimated from flow cytometry. We also sequenced two transcriptomes, one from a normalized RNA library and another from unnormalized RNASeq data. The normalized RNA library revealed around 38000 transcripts that were then annotated by the SwissProt group. The CoGe comparative genomics platform was used to compare the tef genome to other genomes, notably sorghum. Scaffolds comprising approximately half of the genome size were ordered by syntenic alignment to sorghum producing tef pseudo-chromosomes, which were sorted into A and B genomes as well as compared to the genetic map of tef. The draft genome was used to identify novel SSR markers, investigate target genes for abiotic stress resistance studies, and understand the evolution of the prolamin family of proteins that are responsible for the immune response to gluten. Conclusions: It is highly plausible that breeding targets previously identified in other cereal crops will also be valuable breeding targets in tef. The draft genome and transcriptome will be of great use for identifying these targets for genetic improvement of this orphan crop that is vital for feeding 50 million people in the Horn of Africa.