953 results for Genome Annotation Assessment
Sequencing, annotation and comparative analysis of nine BACs of giant panda (Ailuropoda melanoleuca)
Abstract:
A 10-fold BAC library for giant panda was constructed and nine BACs were selected to generate finished sequences. These BACs could be used as a validation resource for the de novo assembly accuracy of the whole genome shotgun sequencing reads of giant panda newly generated by the Illumina GA sequencing technology. Complete Sanger sequencing, assembly, annotation and comparative analysis were carried out on the selected BACs, which have a combined length of 878 kb. Homologue search and de novo prediction methods were used to annotate genes and repeats. Twelve protein-coding genes were predicted, seven of which could be functionally annotated. The seven genes have an average gene size of about 41 kb, an average coding size of about 1.2 kb and an average of 6 exons per gene. In addition, seven tRNA genes were found. About 27 percent of the BAC sequence is composed of repeats. A phylogenetic tree was constructed using the neighbor-joining algorithm across five species (giant panda, human, dog, cat and mouse), which reconfirms dog as the species most closely related to giant panda. Our results provide detailed sequence and structure information for new genes and repeats of giant panda, which will be helpful for further studies on the giant panda.
Abstract:
Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human genomes showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than on its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility of using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.
Abstract:
Background: Giardia are a group of widespread intestinal protozoan parasites found in a number of vertebrates. Much evidence from G. lamblia indicates that they might be the most primitive extant eukaryotes. When and how such a group of the earliest-branching unicellular eukaryotes developed the ability to successfully parasitize the latest-branching higher eukaryotes (vertebrates) is an intriguing question. Gene duplication has long been thought to be the most common mechanism producing the raw material for evolutionary novelties. To trace the evolutionary trajectory of the Giardia parasitic lifestyle, we carried out a genome-wide analysis of gene duplication patterns in G. lamblia. Results: Although genomic comparison showed that in G. lamblia the contents of many fundamental biological pathways are simplified and the whole genome is very compact, in our study 40% of its genes were identified as duplicated genes. Evolutionary distance analyses of these duplicated genes indicated that two rounds of large-scale duplication events had occurred in the G. lamblia genome. Functional annotation further showed that the majority of recently duplicated genes are VSPs (Variant-specific Surface Proteins), which are essential for the successful parasitic life of Giardia in hosts. Based on evolutionary comparison with their hosts, it was found that the rapid expansion of VSPs in G. lamblia is consistent with the evolutionary radiation of placental mammals. Conclusions: Based on the genome-wide analysis of duplicated genes in G. lamblia, we found that gene duplication was essential for the origin and evolution of the Giardia parasitic lifestyle. The recent expansion of VSPs, which occurred uniquely in G. lamblia, is consistent with the increase in the number of its hosts. We therefore propose the hypothesis that the increase in Giardia hosts might be the driving force for the rapid expansion of VSPs.
Abstract:
Full-length and partial genome sequences of four members of the genus Aquareovirus, family Reoviridae (Golden shiner reovirus, Grass carp reovirus, Striped bass reovirus and golden ide reovirus) were characterized. Based on sequence comparison, the unclassified Grass carp reovirus was shown to be a member of the species Aquareovirus C. The status of golden ide reovirus, another unclassified aquareovirus, was also examined. Sequence analysis showed that it did not belong to the species Aquareovirus A or C, but assessment of its relationship to the species Aquareovirus B, D, E and F was hampered by the absence of genetic data from these species. In agreement with previous reports of ultrastructural resemblance between aquareoviruses and orthoreoviruses, genetic analysis revealed homology in the genes of the two groups. This homology concerned eight of the 11 segments of the aquareovirus genome (amino acid identity 17-42%), and similar genetic organization was observed in two other segments. The conserved terminal sequences in the genomes of members of the two groups were also similar. These data are undoubtedly an indication of the common evolutionary origin of these viruses. This clear genetic relatedness between members of distinct genera is unique within the family Reoviridae. Such a genetic relationship is usually observed between members of a single genus. However, the current taxonomic classification of aquareoviruses and orthoreoviruses in two different genera is supported by a number of characteristics, including their distinct G+C contents, unequal numbers of genome segments, absence of an antigenic relationship, different cytopathic effects and specific econiches.
Abstract:
The overall aims of this study were to investigate the differences between raw/farm milk and pasteurised milk with respect to potential immune-modifying effects following consumption, and to compare the bacterial composition of raw milk with that of pasteurised milk. Furthermore, in this thesis, panels of potential probiotic bacteria from the Bifidobacterium and Lactobacillus genera were investigated. The overall bacterial composition of raw milk was compared with pasteurised milk using samples obtained from commercial milk producers around Ireland using next generation sequencing technology (454 pyrosequencing). Here the presence of previously unrecognised and diverse bacterial populations in unpasteurised cow’s milk was identified. Furthermore, the bacterial content of pasteurised milk was found to be more diverse than previously thought. The global response of the adenocarcinoma cell line HT-29 to raw milk and pasteurised milk exposures was also characterised using whole genome microarray technology. Over one thousand differentially expressed genes were identified, which were found to be involved in a plethora of cellular functions. Interestingly, a reduction in immune-related activity (e.g. major histocompatibility complex class II signalling and T and B cell proliferation) was identified in cells exposed to pasteurised milk compared with raw milk exposures. Further studies comparing human cell response to raw versus pasteurised milk were performed using peripheral blood mononuclear cells (PBMC) from healthy donors. A reduction in CD14 was identified following raw milk exposures compared with pasteurised milk, and the pattern of cytokine production may indicate that Gram-positive bacteria in the raw milk were contributing to the differences in the cellular response to raw versus pasteurised milk.
Panels of potentially probiotic bacteria (comprising lactobacilli and bifidobacteria) were further assessed for immunomodulatory capabilities using cell culture based models. Gene expression and cytokine production were used to evaluate stimulated and unstimulated (LPS) cellular responses as well as interaction mechanisms
Abstract:
BACKGROUND: Genetic association studies are conducted to discover genetic loci that contribute to an inherited trait, identify the variants behind these associations and ascertain their functional role in determining the phenotype. To date, functional annotations of the genetic variants have rarely played more than an indirect role in assessing evidence for association. Here, we demonstrate how these data can be systematically integrated into an association study's analysis plan. RESULTS: We developed a Bayesian statistical model for the prior probability of phenotype-genotype association that incorporates data from past association studies and publicly available functional annotation data regarding the susceptibility variants under study. The model takes the form of a binary regression of association status on a set of annotation variables whose coefficients were estimated through an analysis of associated SNPs in the GWAS Catalog (GC). The functional predictors examined included measures that have been demonstrated to correlate with the association status of SNPs in the GC and some whose utility in this regard is speculative: summaries of the UCSC Human Genome Browser ENCODE super-track data, dbSNP function class, sequence conservation summaries, proximity to genomic variants in the Database of Genomic Variants and known regulatory elements in the Open Regulatory Annotation database, PolyPhen-2 probabilities and RegulomeDB categories. Because we expected that only a fraction of the annotations would contribute to predicting association, we employed a penalized likelihood method to reduce the impact of non-informative predictors and evaluated the model's ability to predict GC SNPs not used to construct the model. We show that the functional data alone are predictive of a SNP's presence in the GC. 
Further, using data from a genome-wide study of ovarian cancer, we demonstrate that their use as prior data when testing for association is practical at the genome-wide scale and improves power to detect associations. CONCLUSIONS: We show how diverse functional annotations can be efficiently combined to create 'functional signatures' that predict the a priori odds of a variant's association to a trait and how these signatures can be integrated into a standard genome-wide-scale association analysis, resulting in improved power to detect truly associated variants.
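The idea of an annotation-informed prior can be illustrated with a minimal Python sketch. It assumes a logistic (binary regression) prior over annotation indicators and a Bayes factor summarising the association data; the annotation names, coefficient values and intercept below are hypothetical placeholders, not estimates from the study, and the penalized-likelihood fitting step is not shown.

```python
import math

# Hypothetical coefficients for illustration only. In the abstract, the
# coefficients are estimated by penalized likelihood from GWAS Catalog SNPs.
COEFFICIENTS = {"conserved": 1.2, "in_regulatory_element": 0.8}
INTERCEPT = -9.0


def prior_probability(annotations, coefficients=COEFFICIENTS, intercept=INTERCEPT):
    """Logistic prior: P(SNP is associated | its functional annotations)."""
    linear = intercept + sum(
        coefficients[name] * value for name, value in annotations.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def posterior_probability(prior, bayes_factor):
    """Posterior odds = prior odds x Bayes factor, converted to a probability."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


# Two SNPs with the same evidence from the data but different annotations.
annotated = prior_probability({"conserved": 1, "in_regulatory_element": 1})
unannotated = prior_probability({"conserved": 0, "in_regulatory_element": 0})
same_evidence = 1000.0  # hypothetical Bayes factor from the association test
post_annotated = posterior_probability(annotated, same_evidence)
post_unannotated = posterior_probability(unannotated, same_evidence)
```

With identical evidence from the association data, the SNP carrying the favourable annotations ends up with the higher posterior probability, which is the mechanism by which such a prior could raise power for truly associated variants.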
Abstract:
A preclinical safety study was conducted to evaluate the short- and long-term toxicity of a recombinant adeno-associated virus serotype 8 (AAV2/8) vector that has been developed as an immune-modulatory adjunctive therapy to recombinant human acid α-glucosidase (rhGAA, Myozyme) enzyme replacement treatment (ERT) for patients with Pompe disease (AAV2/8-LSPhGAApA). The AAV2/8-LSPhGAApA vector at 1.6 × 10^13 vector particles/kg, after intravenous injection, did not cause significant short- or long-term toxicity. Recruitment of CD4+ (but not CD8+) lymphocytes to the liver was elevated in the vector-dosed male animals at study day (SD) 15, and in group 8 animals at SD 113, in comparison to their respective control animals. Administration of the vector, either prior to or after the one ERT injection, uniformly prevented the hypersensitivity induced by subsequent ERT in males, but not always in female animals. The vector genome persisted in all tissues except blood through 16 weeks post-dosing, with similar tissue tropism in males and females. Administration of the vector alone, or combined with the ERT, was effective in producing significantly increased GAA activity and consequently decreased glycogen accumulation in multiple tissues, and the urine biomarker, Glc4, was significantly reduced. The efficacy of the vector (alone or with ERT) was better in males than in females, as demonstrated both by the number of tissues showing significantly effective responses and the extent of response in a given tissue. Given the lack of toxicity for AAV2/8-LSPhGAApA, further consideration of clinical translation is warranted in Pompe disease.
Abstract:
Burkholderia cenocepacia are opportunistic Gram-negative bacteria that can cause chronic pulmonary infections in patients with cystic fibrosis. These bacteria demonstrate a high level of intrinsic resistance to most clinically useful antibiotics, complicating treatment. We previously identified 14 genes encoding putative Resistance-Nodulation-Cell Division (RND) efflux pumps in the genome of B. cenocepacia J2315, but the contribution of these pumps to the intrinsic drug resistance of this bacterium remains unclear.
Abstract:
In this article, we focus on the analysis of competitive gene set methods for detecting the statistical significance of pathways from gene expression data. Our main result is to demonstrate that some of the most frequently used gene set methods, GSEA, GSEArot and GAGE, are so severely influenced by the filtering of the data that such an analysis is no longer reconcilable with the principles of statistical inference, rendering the obtained results, in the worst case, uninformative. A possible consequence of this is that these methods can increase their power by the addition of unrelated data and noise. Our results are obtained within a bootstrapping framework that allows a rigorous assessment of the robustness of results and enables power estimates. Our results indicate that when using competitive gene set methods, it is imperative to apply a stringent gene filtering criterion. However, even when genes are filtered appropriately, for gene expression data from chips that do not provide genome-scale coverage of the expression values of all mRNAs, this is not enough for GSEA, GSEArot and GAGE to ensure the statistical soundness of the applied procedure. For this reason, for biomedical and clinical studies, we strongly advise against using GSEA, GSEArot and GAGE for such data sets.
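Why a competitive test depends on which genes are kept after filtering can be seen in a small Python sketch. This is a generic gene-sampling test written for illustration, not an implementation of GSEA, GSEArot or GAGE, and the gene names and scores are made-up toy values.

```python
import random


def competitive_p_value(scores, gene_set, n_resamples=2000, seed=0):
    """Compare a gene set's mean score with random sets of equal size
    drawn from the background, i.e. all genes present in `scores`."""
    rng = random.Random(seed)
    background = list(scores)
    members = [g for g in gene_set if g in scores]
    observed = sum(scores[g] for g in members) / len(members)
    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(background, len(members))
        if sum(scores[g] for g in sample) / len(members) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Toy data: 100 genes with scores 0.00 .. 0.99 (hypothetical values).
all_scores = {f"g{i}": i / 100 for i in range(100)}
gene_set = [f"g{i}" for i in range(60, 70)]

# Same gene set and same scores; only the background differs.
p_unfiltered = competitive_p_value(all_scores, gene_set)
filtered_scores = {g: s for g, s in all_scores.items() if s >= 0.5}
p_filtered = competitive_p_value(filtered_scores, gene_set)
```

The gene set and its scores are identical in both calls, yet the p-value changes markedly because the reference distribution is built from whatever genes survive filtering. In this toy case, keeping the low-scoring genes in the background makes the set look more significant.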
Abstract:
Empirically derived phenotypic measurements have the potential to enhance gene-finding efforts in schizophrenia. Previous research based on factor analyses of symptoms has typically included schizoaffective cases. Deriving factor loadings from analysis of only narrowly defined schizophrenia cases could yield more sensitive factor scores for gene pathway and gene ontology analyses. Using an Irish family sample, this study 1) factor analyzed clinician-rated Operational Criteria Checklist items in cases with schizophrenia only, 2) scored the full sample based on these factor loadings, and 3) implemented genome-wide association, gene-based, and gene-pathway analysis of these SCZ-based symptom factors (final N = 507). Three factors emerged from the analysis of the schizophrenia cases: a manic, a depressive, and a positive symptom factor. In gene-based analyses of these factors, multiple genes had q < 0.01. Of particular interest are findings for PTPRG and WBP1L, both of which were previously implicated by the Psychiatric Genomics Consortium study of SCZ; results from this study suggest that variants in these genes might also act as modifiers of SCZ symptoms. Gene pathway analyses of the first factor indicated over-representation of glutamatergic transmission, GABA-A receptor, and cyclic GMP pathways. Results suggest that these pathways may have differential influence on affective symptom presentation in schizophrenia.
Abstract:
To assess factors influencing the success of whole-genome sequencing for mainstream clinical diagnosis, we sequenced 217 individuals from 156 independent cases or families across a broad spectrum of disorders in whom previous screening had identified no pathogenic variants. We quantified the number of candidate variants identified using different strategies for variant calling, filtering, annotation and prioritization. We found that jointly calling variants across samples, filtering against both local and external databases, deploying multiple annotation tools and using familial transmission above biological plausibility contributed to accuracy. Overall, we identified disease-causing variants in 21% of cases, with the proportion increasing to 34% (23/68) for mendelian disorders and 57% (8/14) in family trios. We also discovered 32 potentially clinically actionable variants in 18 genes unrelated to the referral disorder, although only 4 were ultimately considered reportable. Our results demonstrate the value of genome sequencing for routine clinical diagnosis but also highlight many outstanding challenges.
Abstract:
The work presented in this thesis describes the functional characterization of hydrogenases in the overall energy metabolism of the sulfate-reducing bacterium Desulfovibrio gigas. With the complete annotation of the D. gigas genome, we were able to verify that only the two previously described hydrogenases are present in this organism, the periplasmic [NiFe] HynAB and the cytoplasmic membrane-bound [NiFe] Ech.(...)
Abstract:
The European Mouse Mutagenesis Consortium is the European initiative contributing to the international effort on functional annotation of the mouse genome. Its objectives are to establish and integrate mutagenesis platforms, gene expression resources, phenotyping units, storage and distribution centers and bioinformatics resources. The combined efforts will accelerate our understanding of gene function and of human health and disease.
Abstract:
Bioinformatics is a multidisciplinary field that draws on biology, computer science, physics and mathematics to solve problems posed by biology. One of its themes is the analysis of genomic sequences and the prediction of non-coding RNA genes. Non-coding RNAs (ncRNAs) are RNA molecules that are transcribed but not translated into protein and that have a function in the cell. Finding non-coding RNA genes with biochemical and molecular biology techniques is quite difficult and relatively costly. Predicting ncRNA genes with bioinformatics methods is therefore an important challenge. This research describes a computational analysis to search for new ncRNAs in the pathogen Candida albicans, together with an experimental validation. Our strategy was a computational analysis combining several ncRNA identification programs. We validated a subset of the computational predictions with a DNA microarray experiment covering 1979 regions of the genome. Through this experiment we identified 62 new transcripts in Candida albicans. This work also led to the development of an analysis method for tiling-array DNA microarrays. Finally, it presents an attempt to improve ncRNA prediction with a method based on searching for RNA motifs in sequences.
Abstract:
Consumption habits for psychoactive substances, stress, obesity and the associated cardiovascular traits are thought to be partly linked to the same genetic factors. To explore this hypothesis, we carried out, in 119 multi-generational Quebec families from the Saguenay-Lac-St-Jean region, genome-wide association and linkage studies of the genetic components of: usual consumption of alcohol, tobacco and coffee; the response to physical and psychological stress; anthropometric traits related to obesity; and measures of heart rate (HR) and blood pressure (BP). A total of 58,000 SNPs and 437 microsatellite markers were used, and functional annotation of the candidate genes identified was then performed. We detected significant phenotypic correlations between psychoactive substances, stress, obesity and hemodynamic traits. For example, alcohol and tobacco users showed a significantly decreased HR in response to psychological stress. In addition, tobacco users had lower BP than non-users. Hypertensive individuals also had increased HR and systolic BP in response to psychological stress, and a high body mass index (BMI), compared with normotensive individuals. Furthermore, tobacco use is thought to increase body levels of epinephrine, and high epinephrine levels have been associated with decreased BMI. Thus, in agreement with the inter-phenotype correlations, we identified several genes associated with or linked to psychoactive substance use, the response to physical and psychological stress, obesity-related traits and hemodynamic traits, including CAMK4, CNTN4, DLG2, DAG1, FHIT, GRID2, ITPR2, NOVA1, NRG3 and PRKCE.
These genes encode proteins that form an interaction network, are involved in synaptic plasticity, and are highly expressed in the brain and its associated tissues. Moreover, pathway analysis of the identified genes (P = 0.03) revealed an induction of Long-Term Potentiation mechanisms. Variation in the traits studied appears to be largely related to sex and hypertension status. For tobacco use, we noted that the degree and direction of the correlations with obesity, hemodynamic traits and stress are specific to sex and blood pressure. For example, while variations were detected between male smokers and non-smokers (former and never), no difference was observed in women. We also identified numerous obesity-related traits whose correlation with tobacco use appears to be linked more to genetic factors than to smoking itself. For sex and hypertension, differences in the heritability of numerous traits were also observed. Indeed, genetic analyses of specific subgroups revealed additional genes sharing synaptic functions: CAMK4, CNTN5, DNM3, KCNAB1 (hypertension-specific), CNTN4, DNM3, FHIT, ITPR1 and NRXN3 (sex-specific). These genes encode proteins that interact with the proteins of genes detected in the overall analysis. In addition, for the subgroup genes, the results of the pathway and gene expression profile analyses showed characteristics similar to those of the overall analysis.
The substantial convergence between the genetic determinants of psychoactive substance use, stress, obesity and hemodynamic traits supports the notion that genetic variation in synaptic plasticity pathways constitutes a common interface with the genetic differences related to sex and hypertension. We also think that synaptic plasticity may be involved in many complex phenotypes influenced by lifestyle. Ultimately, these results indicate that subgroup-based and network-based approaches would improve understanding of the polygenic nature of complex phenotypes and of the common molecular processes that define them.