959 results for Genome sequencing
Abstract:
Since the advent of high-throughput DNA sequencing technologies, the ever-increasing rate at which genomes have been published has generated new challenges, notably at the level of genome annotation. Even though gene predictors and annotation software are increasingly efficient, the ultimate validation still lies in the observation of the predicted gene product(s). Mass spectrometry (MS)-based proteomics offers the necessary high-throughput technology to provide evidence of protein presence and, from the identified sequences, to confirm or invalidate predicted annotations. We review here different strategies used to perform an MS-based proteogenomics experiment with a bottom-up approach. We start from the strengths and weaknesses of the different database construction strategies, based on different genomic information (whole genome, ORF, cDNA, EST or RNA-Seq data), which are then used for matching mass spectra to peptides and proteins. We also review the important points to be considered for a correct statistical assessment of the peptide identifications. Finally, we provide references for tools used to map and visualize the peptide identifications back to the original genomic information.
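As an illustration of the whole-genome database construction strategy mentioned above, the following is a minimal sketch, not the method of any specific tool covered by the review. It assumes the standard genetic code, a plain six-frame translation, and a simple trypsin rule (cleave after K or R unless followed by P); the function names and the `min_len` cut-offs are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no alternative tables).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(genome, min_len=3):
    """Yield (strand, frame, protein) for stop-to-stop segments in all six frames."""
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            for orf in translate(seq[frame:]).split("*"):
                if len(orf) >= min_len:
                    yield strand, frame, orf


def tryptic_peptides(protein, min_len=2):
    """In-silico trypsin digest: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_len]


if __name__ == "__main__":
    # Toy sequence, purely illustrative.
    for strand, frame, orf in six_frame_orfs("ATGGCCAAGCGTCCAGGTAAACTGCTGTAA"):
        print(strand, frame, orf, tryptic_peptides(orf))
```

A six-frame database of this kind makes no assumption about existing annotations, at the cost of a much larger search space than ORF-, cDNA- or RNA-Seq-derived databases, which is one reason the statistical assessment of identifications matters.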
Abstract:
We describe an original case of disseminated infection with Histoplasma capsulatum (Hc) var. duboisii in an African patient with AIDS who migrated to Switzerland. The diagnosis of histoplasmosis was suggested using direct examination of tissues and confirmed in 24 h with a panfungal polymerase chain reaction assay. The variety duboisii of Hc was established using DNA sequencing of the polymorphic genomic region OLE. Molecular tools allow diagnosis of histoplasmosis in 24 h, which is drastically shorter than culture procedures.
Abstract:
Nowadays, genome-wide association studies (GWAS) and genomic selection (GS) methods, which use genome-wide marker data for phenotype prediction, are of much potential interest in plant breeding. However, to our knowledge, no studies have yet been performed on the predictive ability of these methods for structured traits when using training populations with high levels of genetic diversity. Grapevine is an example of such a highly heterozygous, perennial species. The present study compares the accuracy of models based on GWAS or GS alone, or in combination, for predicting simple or complex traits, linked or not with population structure. In order to explore the relevance of these methods in this context, we performed simulations using approximately 90,000 SNPs on a population of 3,000 individuals structured into three groups and corresponding to published grapevine diversity data. To estimate the parameters of the prediction models, we defined four training populations of 1,000 individuals, corresponding to these three groups and a core collection. Finally, to estimate the accuracy of the models, we also simulated four breeding populations of 200 individuals. Although prediction accuracy was low when breeding populations were too distant from the training populations, high accuracy levels were obtained using the core collection alone as the training population. The highest prediction accuracy (up to 0.9) was obtained using the combined GWAS-GS model. We thus recommend using the combined prediction model and a core collection as the training population for grapevine breeding or for other economically important crops with the same characteristics.
Abstract:
BACKGROUND: Coronary artery calcification (CAC) detected by computed tomography is a noninvasive measure of coronary atherosclerosis, which underlies most cases of myocardial infarction (MI). We sought to identify common genetic variants associated with CAC and further investigate their associations with MI. METHODS AND RESULTS: Computed tomography was used to assess quantity of CAC. A meta-analysis of genome-wide association studies for CAC was performed in 9961 men and women from 5 independent community-based cohorts, with replication in 3 additional independent cohorts (n=6032). We examined the top single-nucleotide polymorphisms (SNPs) associated with CAC quantity for association with MI in multiple large genome-wide association studies of MI. Genome-wide significant associations with CAC for SNPs on chromosome 9p21 near CDKN2A and CDKN2B (top SNP: rs1333049; P=7.58×10⁻¹⁹) and 6p24 (top SNP: rs9349379, within the PHACTR1 gene; P=2.65×10⁻¹¹) replicated for CAC and for MI. Additionally, there is evidence for concordance of SNP associations with both CAC and MI at a number of other loci, including 3q22 (MRAS gene), 13q34 (COL4A1/COL4A2 genes), and 1p13 (SORT1 gene). CONCLUSIONS: SNPs in the 9p21 and PHACTR1 gene loci were strongly associated with CAC and MI, and there are suggestive associations with both CAC and MI of SNPs in additional loci. Multiple genetic loci are associated with development of both underlying coronary atherosclerosis and clinical events.
Abstract:
Large, rare copy number variants (CNVs) have been implicated in a variety of psychiatric disorders, but the role of CNVs in recurrent depression is unclear. We performed a genome-wide analysis of large, rare CNVs in 3106 cases of recurrent depression, 459 controls screened for lifetime absence of psychiatric disorder and 5619 unscreened controls from phase 2 of the Wellcome Trust Case Control Consortium (WTCCC2). We compared the frequency of cases with CNVs against the frequency observed in each control group, analysing CNVs over the whole genome and over genic, intergenic, intronic and exonic regions. We found that deletion CNVs were associated with recurrent depression, whereas duplications were not. The effect was significant when comparing cases with WTCCC2 controls (P=7.7 × 10⁻⁶, odds ratio (OR)=1.25 (95% confidence interval (CI) 1.13-1.37)) and with screened controls (P=5.6 × 10⁻⁴, OR=1.52 (95% CI 1.20-1.93)). Further analysis showed that CNVs deleting protein-coding regions were largely responsible for the association. Within an analysis of regions previously implicated in schizophrenia, we found an overall enrichment of CNVs in our cases when compared with screened controls (P=0.019). We observe an ordered increase of samples with deletion CNVs, with the lowest proportion seen in screened controls, the next highest in unscreened controls and the highest in cases. This may suggest that the absence of deletion CNVs, especially in genes, is associated with resilience to recurrent depression.
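For readers unfamiliar with how an odds ratio and its confidence interval are read from carrier counts, here is a minimal sketch. The abstract does not state which estimator the authors used, so the Wald (log-scale) interval below is an assumption, and the function name `odds_ratio_ci` and the counts in the usage example are hypothetical rather than taken from the study.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    All four counts must be positive; z=1.96 corresponds to a 95% interval.
    """
    a, b, c, d = case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 20 of 100 cases and 10 of 100 controls carry a deletion.
    or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
    print(f"OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An interval that excludes 1, as in both comparisons reported above, indicates that carriers are over-represented among cases at the chosen confidence level.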
Abstract:
We present a high-quality (>100× depth) Illumina genome sequence of the leaf-cutting ant Acromyrmex echinatior, a model species for symbiosis and reproductive conflict studies. We compare this genome with three previously sequenced genomes of ants from different subfamilies and focus our analyses on aspects of the genome likely to be associated with known evolutionary changes. The first is the specialized fungal diet of A. echinatior, where we find gene loss in the ant's arginine synthesis pathway, loss of detoxification genes, and expansion of a group of peptidase proteins. One of these is a unique ant-derived contribution to the fecal fluid, which otherwise consists of "garden manuring" fungal enzymes that are unaffected by ant digestion. The second is multiple mating of queens and ejaculate competition, which may be associated with a greatly expanded nardilysin-like peptidase gene family. The third is sex determination, where we could identify only a single homolog of the feminizer gene. As other ants and the honeybee have duplications of this gene, we hypothesize that this may partly explain the frequent production of diploid male larvae in A. echinatior. The fourth is the evolution of eusociality, where we find a highly conserved ant-specific profile of neuropeptide genes that may be related to caste determination. These first analyses of the A. echinatior genome indicate that considerable genetic changes are likely to have accompanied the transition from hunter-gathering to agricultural food production 50 million years ago, and the transition from single to multiple queen mating 10 million years ago.
Abstract:
It is generally accepted that the extent of phenotypic change between humans and great apes is dissonant with the rate of molecular change. Between these two groups, proteins are virtually identical, cytogenetically there are few rearrangements that distinguish ape and human chromosomes, and rates of single-base-pair change and retrotransposon activity have slowed, particularly within hominid lineages, when compared to rodents or monkeys. Studies of gene family evolution indicate that gene loss and gain are enriched within the primate lineage. Here, we perform a systematic analysis of the duplication content of four primate genomes (macaque, orang-utan, chimpanzee and human) in an effort to understand the pattern and rates of genomic duplication during hominid evolution. We find that the ancestral branch leading to human and African great apes shows the most significant increase in duplication activity, both in terms of base pairs and in terms of events. This duplication acceleration within the ancestral species is significant when compared to lineage-specific rate estimates, even after accounting for copy-number polymorphism and homoplasy. We discover striking examples of recurrent and independent gene-containing duplications within the gorilla and chimpanzee that are absent in the human lineage. Our results suggest that the evolutionary properties of copy-number mutation differ significantly from other forms of genetic mutation and, in contrast to the hominid slowdown of single-base-pair mutations, there has been a genomic burst of duplication activity at this period during human evolution.
Abstract:
Human papillomavirus type 6 (HPV6) is the major etiological agent of anogenital warts and laryngeal papillomas and has been included in both the quadrivalent and nonavalent prophylactic HPV vaccines. This study investigated the global genomic diversity of HPV6, using 724 isolates and 190 complete genomes from six continents, and the association of HPV6 genomic variants with geographical location, anatomical site of infection/disease, and gender. Initially, a 2,800-bp E5a-E5b-L1-LCR fragment was sequenced from 492/530 (92.8%) HPV6-positive samples collected for this study. Among them, 130 exhibited at least one single nucleotide polymorphism (SNP), indel, or amino acid change in the E5a-E5b-L1-LCR fragment and were sequenced in full. A global alignment and maximum likelihood tree of 190 complete HPV6 genomes (130 fully sequenced in this study and 60 obtained from sequence repositories) revealed two variant lineages, A and B, and five B sublineages: B1, B2, B3, B4, and B5. HPV6 (sub)lineage-specific SNPs and a 960-bp representative region for whole-genome-based phylogenetic clustering within the L2 open reading frame were identified. Multivariate logistic regression analysis revealed that lineage B predominated globally. Sublineage B3 was more common in Africa and North and South America, and lineage A was more common in Asia. Sublineages B1 and B3 were associated with anogenital infections, indicating a potential lesion-specific predilection of some HPV6 sublineages. Females had higher odds for infection with sublineage B3 than males. In conclusion, a global HPV6 phylogenetic analysis revealed the existence of two variant lineages and five sublineages, showing some degree of ethnogeographic, gender, and/or disease predilection in their distribution. IMPORTANCE: This study established the largest database of globally circulating HPV6 genomic variants and contributed a total of 130 new, complete HPV6 genome sequences to available sequence repositories. 
Two HPV6 variant lineages and five sublineages were identified and showed some degree of association with geographical location, anatomical site of infection/disease, and/or gender. We additionally identified several HPV6 lineage- and sublineage-specific SNPs to facilitate the identification of HPV6 variants and determined a representative region within the L2 gene that is suitable for HPV6 whole-genome-based phylogenetic analysis. This study complements and significantly expands the current knowledge of HPV6 genetic diversity and forms a comprehensive basis for future epidemiological, evolutionary, functional, pathogenicity, vaccination, and molecular assay development studies.
Abstract:
Ultra-high-throughput sequencing (UHTS) techniques are evolving rapidly and may soon become an affordable and routine tool for sequencing plant DNA, even in smaller plant biology labs. Here we review recent insights into intraspecific genome variation gained from UHTS, which offers a glimpse of the rather unexpected levels of structural variability among Arabidopsis thaliana accessions. The challenges that will need to be addressed to efficiently assemble and exploit this information are also discussed.
Abstract:
The discovery of genes implicated in familial forms of Parkinson's disease (PD) has provided new insights into the molecular events leading to neurodegeneration. Clinically, patients with genetically determined PD can be difficult to distinguish from those with sporadic PD. Monogenic causes include autosomal dominantly (SNCA, LRRK2, VPS35, EIF4G1) as well as recessively (PARK2, PINK1, DJ-1) inherited mutations. Additional recessive forms of parkinsonism present with atypical signs, including very early disease onset, dystonia, dementia and pyramidal signs. New techniques in the search for phenotype-associated genes (next-generation sequencing, genome-wide association studies) have expanded the spectrum of both monogenic PD and variants that alter the risk of developing PD. Examples of risk genes include the two lysosomal enzyme-coding genes GBA and SMPD1, which are associated with a 5-fold and 9-fold increased risk of PD, respectively. It is hoped that further knowledge of the genetic makeup of PD will allow the design of treatments that alter the course of the disease.
Abstract:
Autosomal Recessive Osteopetrosis is a genetic disorder characterized by increased bone density due to lack of resorption by the osteoclasts. Genetic studies have widely unraveled the molecular basis of the most severe forms, while cases of intermediate severity are more difficult to characterize, probably because of a large heterogeneity. Here, we describe the use of exome sequencing in the molecular diagnosis of 2 siblings initially thought to be affected by "intermediate osteopetrosis", which identified a homozygous mutation in the CTSK gene. Prompted by this finding, we tested by Sanger sequencing 25 additional patients referred to us for recessive osteopetrosis and found CTSK mutations in 4 of them. In retrospect, their clinical and radiographic features were found to be compatible with, but not typical for, Pycnodysostosis. We sought to identify modifier genes that might have played a role in the clinical manifestation of the disease in these patients, but our results were not informative. In conclusion, we underline the difficulties of differential diagnosis in some patients whose clinical appearance does not fit the classical malignant or benign picture and recommend that the CTSK gene be included in the molecular diagnosis of high bone density conditions.
Abstract:
In conducting genome-wide association studies (GWAS), analytical approaches leveraging biological information may further understanding of the pathophysiology of clinical traits. To discover novel associations with estimated glomerular filtration rate (eGFR), a measure of kidney function, we developed a strategy for integrating prior biological knowledge into the existing GWAS data for eGFR from the CKDGen Consortium. Our strategy focuses on single nucleotide polymorphisms (SNPs) in genes that are connected by functional evidence, determined by literature mining and gene ontology (GO) hierarchies, to genes near previously validated eGFR associations. It then requires association thresholds consistent with multiple testing, and finally evaluates novel candidates by independent replication. Among the samples of European ancestry, we identified a genome-wide significant SNP in FBXL20 (P = 5.6 × 10⁻⁹) in meta-analysis of all available data, and additional SNPs at the INHBC, LRP2, PLEKHA1, SLC3A2 and SLC7A6 genes meeting multiple-testing corrected significance for replication and overall P-values ranging from 4.5 × 10⁻⁴ to 2.2 × 10⁻⁷. Neither the novel PLEKHA1 nor FBXL20 associations, both further supported by association with eGFR among African Americans and with transcript abundance, would have been implicated by eGFR candidate gene approaches. LRP2, encoding the megalin receptor, was identified through connection with the previously known eGFR gene DAB2 and extends understanding of the megalin system in kidney function. These findings highlight integration of existing genome-wide association data with independent biological knowledge to uncover novel candidate eGFR associations, including candidates lacking known connections to kidney-specific pathways. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in general.
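The step "association thresholds consistent with multiple testing" can be illustrated with a short sketch. The abstract does not say which correction was applied, so the Bonferroni rule used here is an assumption, and the function names and the P-values in the usage example are hypothetical. The point it illustrates is that restricting the analysis to a smaller, biologically motivated SNP set relaxes the per-SNP threshold compared with a genome-wide scan.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_snps(p_values, alpha=0.05):
    """Return the SNPs whose P-value falls below the Bonferroni threshold.

    p_values maps a SNP identifier to its association P-value; the number of
    tests is taken to be the number of SNPs supplied.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {snp: p for snp, p in p_values.items() if p < threshold}


if __name__ == "__main__":
    # Hypothetical candidate set of three SNPs selected by prior biological knowledge.
    candidates = {"snp_a": 1e-4, "snp_b": 0.02, "snp_c": 0.5}
    print(bonferroni_threshold(0.05, len(candidates)))
    print(significant_snps(candidates))
```

With one million tests the same rule gives the conventional genome-wide threshold of about 5 × 10⁻⁸, which shows how much stricter an unrestricted scan is than a focused candidate set.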
Abstract:
ABSTRACT Pneumocystis jirovecii is a fungus that causes severe pneumonia in immunocompromised patients. However, its study is hindered by the lack of an in vitro culture method. We report here the genome of P. jirovecii that was obtained from a single bronchoalveolar lavage fluid specimen from a patient. The major challenge was the in silico sorting of the reads from a mixture representing the different organisms of the lung microbiome. This genome lacks virulence factors and most amino acid biosynthesis enzymes and presents reduced GC content and size. Together with epidemiological observations, these features suggest that P. jirovecii is an obligate parasite specialized in the colonization of human lungs, which causes disease only in immune-deficient individuals. This genome sequence will boost research on this deadly pathogen. IMPORTANCE Pneumocystis pneumonia is a major cause of mortality in patients with impaired immune systems. The availability of the P. jirovecii genome sequence allows new analyses to be performed which open avenues to solve critical issues for this deadly human disease. The most important ones are (i) identification of nutritional supplements for development of culture in vitro, which is still lacking 100 years after discovery of the pathogen; (ii) identification of new targets for development of new drugs, given the paucity of present treatments and emerging resistance; and (iii) identification of targets for development of vaccines.
Abstract:
MOTIVATION: High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. RESULTS: We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays.
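The two filters described above (low read complexity, significant excess of clonal reads) can be sketched as follows. This is not the authors' implementation: the definition of a clonal read as one sharing start position and strand with an earlier read, the one-sided binomial test against a background duplication rate, and every threshold and name (`site_passes`, `background_dup_rate`, `min_complexity`, `alpha`) are assumptions made for illustration.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def site_passes(reads, background_dup_rate=0.1, min_complexity=0.5, alpha=0.01):
    """Decide whether a heterozygous site is kept for allele-specific analysis.

    reads: list of (start_position, strand) tuples for reads covering the site.
    A read is counted as clonal if an identical (start, strand) was already seen.
    """
    n = len(reads)
    if n == 0:
        return False
    unique = len(set(reads))
    clonal = n - unique
    # Filter 1: read complexity, i.e. the fraction of distinct reads at the site.
    if unique / n < min_complexity:
        return False
    # Filter 2: is the clonal count significantly higher than the background rate predicts?
    p_excess = binom_upper_tail(clonal, n, background_dup_rate)
    return p_excess >= alpha


if __name__ == "__main__":
    diverse = [(1000 + i, "+") for i in range(20)]           # all reads distinct
    amplified = [(1000, "+")] * 18 + [(1005, "-"), (1010, "+")]  # dominated by one clone
    print(site_passes(diverse), site_passes(amplified))
```

In practice the background duplication rate would have to be estimated from the library itself, and the clonal excess could be assessed per allele rather than per site; both refinements are left out of this sketch.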