26 results for whole genome sequencing
in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
BACKGROUND: The only known albino gorilla, named Snowflake, was a male wild-born individual from Equatorial Guinea who lived at the Barcelona Zoo for almost 40 years. He was diagnosed with non-syndromic oculocutaneous albinism, i.e. white hair, light eyes, pink skin, photophobia and reduced visual acuity. Despite previous efforts to explain the genetic cause, it is still unknown. Here, we study the genetic cause of his albinism and, making use of whole genome sequencing data, find a higher inbreeding coefficient compared to other gorillas. RESULTS: We successfully identified the causal genetic variant for Snowflake's albinism, a non-synonymous single nucleotide variant located in a transmembrane region of SLC45A2. This transporter is known to be involved in oculocutaneous albinism type 4 (OCA4) in humans. We provide experimental evidence that this amino acid replacement alters the membrane-spanning capability of this transmembrane region. Finally, we provide a comprehensive study of genome-wide patterns of autozygosity revealing that Snowflake's parents were related, this being the first report of inbreeding in a wild-born Western lowland gorilla. CONCLUSIONS: In this study we demonstrate how the use of whole genome sequencing can be extended to link genotype and phenotype in non-model organisms, and how it can become a powerful tool in conservation genetics (e.g., inbreeding and genetic diversity) with the expected decrease in sequencing cost.
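As an illustration of how an inbreeding coefficient can be derived from genome-wide autozygosity, the sketch below computes F_ROH, the fraction of the genome covered by runs of homozygosity (ROH). The abstract does not state which estimator the authors used, so the choice of F_ROH, the function name `f_roh`, the 1 Mb minimum run length and the toy coordinates are all assumptions made for this example.

```python
def f_roh(runs, genome_length, min_run_length=1_000_000):
    """Fraction of the genome covered by runs of homozygosity (ROH).

    runs: iterable of (start, end) coordinates, assumed non-overlapping.
    min_run_length: hypothetical cutoff; shorter runs are ignored.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = sum(
        end - start for start, end in runs if end - start >= min_run_length
    )
    return covered / genome_length


# Toy coordinates, invented for illustration (not data from the study).
example_runs = [(0, 5_000_000), (20_000_000, 22_000_000), (40_000_000, 40_500_000)]
example_f = f_roh(example_runs, genome_length=100_000_000)
print(example_f)
```

In this toy input the third run is shorter than the cutoff, so only 7 Mb out of 100 Mb counts towards the estimate.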
Abstract:
Colorectal cancer (CRC) is the third most common cancer and the fourth leading cause of cancer death worldwide. About 85% of CRC cases are known to show chromosomal instability, allelic imbalance at several chromosomal loci, and chromosome amplification and translocation. The aim of this study is to determine the recurrent copy number variant (CNV) regions present in stage II CRC through whole exome sequencing, a rapidly developing targeted next-generation sequencing (NGS) technology that provides an accurate alternative approach for accessing genomic variation. 42 normal-tumor paired samples were sequenced on an Illumina Genome Analyzer. Data were analyzed with Varscan2 and segmentation was performed with the R package R-GADA. The segments were summarized across all samples and the result was overlapped with DEG data for the same samples from a previous study in the group [1]. The major and most recurrent CNV segments were: gain of chromosome 7pq (13%), 13q (31%) and 20q (75%), and loss of 8p (25%), 17p (23%) and 18pq (27%). These results agree with the known literature on CNV in CRC and other cancers, but our methodology should be validated by array comparative genomic hybridisation (aCGH) profiling, which is currently the gold standard for genetic diagnosis of CNV.
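The recurrence percentages quoted in the abstract above are, in essence, the share of samples carrying a given gain or loss. A minimal sketch of that tally follows; the function name `recurrence`, the event labels and the toy samples are hypothetical, and the study's actual summary was produced from Varscan2 and R-GADA output, not from this code.

```python
from collections import Counter


def recurrence(events_per_sample):
    """Return, for each event, the fraction of samples in which it occurs."""
    n_samples = len(events_per_sample)
    if n_samples == 0:
        return {}
    # set() ensures an event is counted at most once per sample.
    counts = Counter(
        event for events in events_per_sample for event in set(events)
    )
    return {event: count / n_samples for event, count in counts.items()}


# Toy input, invented for illustration (not the study's data).
toy_samples = [
    ["gain_20q", "loss_18pq"],
    ["gain_20q"],
    ["gain_13q"],
    ["loss_8p", "loss_8p"],
]
toy_freq = recurrence(toy_samples)
print(toy_freq)
```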
Abstract:
With the advent of high-performance computing (HPC), it is now possible to achieve orders-of-magnitude gains in performance and computational efficiency over conventional computer architectures. This thesis explores the potential of using high-performance computing to accelerate whole genome alignment (WGA). A parallel technique is applied to an algorithm for whole genome alignment; the technique is explained and experiments are carried out to test it. The technique is based on fair usage of the available resources to execute genome alignment and on how this can be exploited in HPC clusters. This work is a first approximation to whole genome alignment, and it shows the advantages of parallelism as well as some of the drawbacks of our technique. It describes the resource limitations of current WGA applications when dealing with large quantities of sequences, and it proposes a parallel heuristic to distribute the load while ensuring that alignment quality is maintained.
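The abstract above does not describe its load-distribution heuristic in detail. As a generic illustration of distributing alignment work fairly across workers, the sketch below uses the classic greedy "longest processing time first" rule: each sequence, taken from longest to shortest, goes to the currently least-loaded worker. This is a swapped-in textbook technique, not the thesis's method, and the function name `distribute` and the use of sequence length as a proxy for cost are assumptions.

```python
import heapq


def distribute(sequence_lengths, n_workers):
    """Greedy LPT assignment: returns one list of sequence indices per worker."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    # Min-heap of (current load, worker id); the least-loaded worker pops first.
    heap = [(0, worker) for worker in range(n_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]
    longest_first = sorted(
        enumerate(sequence_lengths), key=lambda pair: pair[1], reverse=True
    )
    for index, length in longest_first:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(index)
        heapq.heappush(heap, (load + length, worker))
    return assignment


# Toy sequence lengths, invented for illustration.
toy_lengths = [5, 3, 3, 2, 2, 1]
toy_assignment = distribute(toy_lengths, n_workers=2)
toy_loads = [sum(toy_lengths[i] for i in worker) for worker in toy_assignment]
print(toy_assignment, toy_loads)
```

The greedy rule is not guaranteed to be optimal in general, but it is simple and keeps the loads close when there are many more sequences than workers.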
Abstract:
Background: Cells have the ability to respond and adapt to environmental changes through activation of stress-activated protein kinases (SAPKs). Although p38 SAPK signalling is known to participate in the regulation of gene expression, little is known about the molecular mechanisms used by this SAPK to regulate stress-responsive genes or about the overall set of genes regulated by p38 in response to different stimuli. Results: Here, we report a whole genome expression analysis of mouse embryonic fibroblasts (MEFs) treated with three different p38 SAPK-activating stimuli, namely osmostress, the cytokine TNFα and the protein synthesis inhibitor anisomycin. We have found that the activation kinetics of p38α SAPK in response to these insults differ and lead to a complex gene response pattern specific for a given stress, with a restricted set of overlapping genes. In addition, we have analysed the contribution of p38α, the major p38 family member present in MEFs, to the overall stress-induced transcriptional response by using both a chemical inhibitor (SB203580) and p38α-deficient (p38α-/-) MEFs. We show here that p38 SAPK dependency ranged between 60% and 88% depending on the treatment and that there is a very good overlap between the inhibitor treatment and the knockout cells. Furthermore, we have found that the dependency on SAPK varies with the time the cells are subjected to osmostress. Conclusions: Our genome-wide transcriptional analyses show a selective response to specific stimuli and a restricted common response of up to 20% of the stress up-regulated early genes that involves an important set of transcription factors, which might be critical for either cell adaptation or preparation for continuous extracellular changes. Interestingly, up to 85% of the up-regulated genes are under the transcriptional control of p38 SAPK. Thus, activation of p38 SAPK is critical to elicit the early gene expression program required for cell adaptation to stress.
Abstract:
We summarize the progress in whole-genome sequencing and analyses of primate genomes. These emerging genome datasets have broadened our understanding of primate genome evolution revealing unexpected and complex patterns of evolutionary change. This includes the characterization of genome structural variation, episodic changes in the repeat landscape, differences in gene expression, new models regarding speciation, and the ephemeral nature of the recombination landscape. The functional characterization of genomic differences important in primate speciation and adaptation remains a significant challenge. Limited access to biological materials, the lack of detailed phenotypic data and the endangered status of many critical primate species have significantly attenuated research into the genetic basis of primate evolution. Next-generation sequencing technologies promise to greatly expand the number of available primate genome sequences; however, such draft genome sequences will likely miss critical genetic differences within complex genomic regions unless dedicated efforts are put forward to understand the full spectrum of genetic variation.
Abstract:
There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low-coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR is most closely related to populations surrounding the Orinoco River basin, confirming the South American ancestry of the Taíno people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a demographic model for three ancestral Native American populations. These ancestral populations likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), supports that the MXL ancestors split 12.2 kya, with a subsequent split of the ancestors of CLM and PUR 11.7 kya. The model also features effective populations of 62,000 in Mexico, 8,700 in Colombia, and 1,900 in Puerto Rico. Modeling identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest effective size and the earlier migration from Europe.
Finally, we compare IBD and ancestry assignments to find evidence for relatedness among European founders to the three populations.
Abstract:
Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems.
Abstract:
Background: Recent studies in pigs have detected copy number variants (CNVs) using the Comparative Genomic Hybridization technique in arrays designed to cover specific porcine chromosomes. The goal of this study was to identify CNV regions (CNVRs) in swine based on whole genome SNP genotyping chips. Results: We used predictions from three different programs (cnvPartition, PennCNV and GADA) to analyze data from the Porcine SNP60 BeadChip. A total of 49 CNVRs were identified in 55 animals from an Iberian x Landrace cross (IBMAP) according to three criteria: detected in at least two animals, containing three or more consecutive SNPs, and called by at least two programs. Mendelian inheritance of CNVRs was confirmed in animals belonging to several generations of the IBMAP cross. Subsequently, a segregation analysis of these CNVRs was performed in 372 additional animals from the IBMAP cross and their distribution was studied in 133 unrelated pig samples from different geographical origins. Five out of seven analyzed CNVRs were validated by real-time quantitative PCR, some of which coincide with well-known examples of CNVs conserved across mammalian species. Conclusions: Our results illustrate the usefulness of the Porcine SNP60 BeadChip for detecting CNVRs and show that structural variants cannot be neglected when studying the genetic variability in this species.
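The three CNVR acceptance criteria listed in the abstract above can be sketched as a simple filter. The record layout (`CnvCall`), the function name `passing_regions` and the toy calls are hypothetical, and the sketch assumes that calls have already been merged into named regions; how the authors merged calls from cnvPartition, PennCNV and GADA is not described in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CnvCall:
    region: str   # hypothetical CNVR identifier
    animal: str
    program: str  # e.g. "cnvPartition", "PennCNV" or "GADA"
    n_snps: int   # number of consecutive SNPs covered by the call


def passing_regions(calls, min_animals=2, min_snps=3, min_programs=2):
    """Keep regions seen in enough animals, SNPs and programs."""
    by_region = {}
    for call in calls:
        if call.n_snps >= min_snps:  # drop calls with too few SNPs
            by_region.setdefault(call.region, []).append(call)
    kept = []
    for region, region_calls in by_region.items():
        animals = {c.animal for c in region_calls}
        programs = {c.program for c in region_calls}
        if len(animals) >= min_animals and len(programs) >= min_programs:
            kept.append(region)
    return sorted(kept)


# Toy calls, invented for illustration (not the study's data).
toy_calls = [
    CnvCall("R1", "pig1", "PennCNV", 5),
    CnvCall("R1", "pig2", "GADA", 4),
    CnvCall("R2", "pig1", "PennCNV", 6),   # one animal, one program
    CnvCall("R3", "pig1", "PennCNV", 2),   # too few SNPs
    CnvCall("R3", "pig2", "GADA", 2),      # too few SNPs
]
toy_kept = passing_regions(toy_calls)
print(toy_kept)
```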
Abstract:
Plesiomonas shigelloides, the only species of the genus, is an emergent pathogenic bacterium associated with human diarrheal and extraintestinal disease. We present the whole-genome sequence analysis of the representative strain for the O1 serotype (strain 302-73), providing a tool for studying bacterial outbreaks, virulence factors, and accurate diagnostic methods.
Abstract:
We investigated two siblings with granulomatous histiocytosis prominent in the nasal area, mimicking rhinoscleroma and Rosai-Dorfman syndrome. Genome-wide linkage analysis and whole-exome sequencing identified a homozygous frameshift deletion in SLC29A3, which encodes human equilibrative nucleoside transporter-3 (hENT3). Germline mutations in SLC29A3 have been reported in rare patients with a wide range of overlapping clinical features and inherited disorders including H syndrome, pigmented hypertrichosis with insulin-dependent diabetes, and Faisalabad histiocytosis. With the exception of insulin-dependent diabetes and mild finger and toe contractures in one sibling, the two patients with nasal granulomatous histiocytosis studied here displayed none of the many SLC29A3-associated phenotypes. This mild clinical phenotype probably results from a remarkable genetic mechanism. The SLC29A3 frameshift deletion prevents the expression of the normally coding transcripts. It instead leads to the translation, expression, and function of an otherwise noncoding, out-of-frame mRNA splice variant lacking exon 3 that is eliminated by nonsense-mediated mRNA decay (NMD) in healthy individuals. The mutated isoform differs from the wild-type hENT3 by the modification of 20 residues in exon 2 and the removal of another 28 amino acids in exon 3, which include the second transmembrane domain. As a result, this new isoform displays some functional activity. This mechanism probably accounts for the narrow and mild clinical phenotype of the patients. This study highlights the "rescue" role played by a normally noncoding mRNA splice variant of SLC29A3, uncovering a new mechanism by which frameshift mutations can be hypomorphic.
Abstract:
DNA cytosine methylation has been demonstrated to be a central epigenetic modification with essential roles in a myriad of cellular processes. Examples include gene regulation, DNA-protein interactions, cellular differentiation, X-inactivation, maintenance of genome integrity by suppressing transposable elements and viruses, embryogenesis, genomic imprinting and tumourigenesis. This list is steadily growing thanks to recent advances in genome-wide technologies such as Whole Genome Bisulfite Sequencing (WGBS-Seq). The development of this technology has allowed the identification of new features of the DNA methylation landscape, such as Partially Methylated Domains (PMDs), that could not be detected with previous technologies. PMDs have been found in several cell lines, as well as in both healthy and cancer primary samples. They have been described as regions with high variability in methylation levels across individual CpG sites and intermediate methylation levels on average with respect to the genome. Here, we performed an extensive search for PMDs in a large dataset of different haematopoietic primary cells from both myeloid and lymphoid lineages. We found and characterized significant PMDs in plasma B cells, confirming that PMDs are a phenomenon restricted to certain differentiated cells. Additionally, we found aberrantly hypomethylated loci in a myeloma sample that overlapped with plasma B cell PMDs. Genome-wide comparison of the myeloma and plasma B cell samples revealed that this is probably also the case for other loci.
Abstract:
Annotation of protein-coding genes is a key goal of genome sequencing projects. In spite of tremendous recent advances in computational gene finding, comprehensive annotation remains a challenge. Peptide mass spectrometry is a powerful tool for researching the dynamic proteome and suggests an attractive approach to discover and validate protein-coding genes. We present algorithms to construct and efficiently search spectra against a genomic database, with no prior knowledge of encoded proteins. By searching a corpus of 18.5 million tandem mass spectra (MS/MS) from human proteomic samples, we validate 39,000 exons and 11,000 introns at the level of translation. We present translation-level evidence for novel or extended exons in 16 genes, confirm translation of 224 hypothetical proteins, and discover or confirm over 40 alternative splicing events. Polymorphisms are efficiently encoded in our database, allowing us to observe variant alleles for 308 coding SNPs. Finally, we demonstrate the use of mass spectrometry to improve automated gene prediction, adding 800 correct exons to our predictions using a simple rescoring strategy. Our results demonstrate that proteomic profiling should play a role in any genome sequencing project.
Abstract:
Background: The understanding of whole genome sequences in higher eukaryotes depends to a large degree on the reliable definition of transcription units including exon/intron structures, translated open reading frames (ORFs) and flanking untranslated regions. The best currently available chicken transcript catalog is the Ensembl build, based on the mappings of a relatively small number of full-length cDNAs and ESTs to the genome as well as on in silico gene predictions derived from the genome sequence. Results: We use Long Serial Analysis of Gene Expression (LongSAGE) in bursal lymphocytes and the DT40 cell line to verify the quality and completeness of the annotated transcripts. 53.6% of the more than 38,000 unique SAGE tags (unitags) match full-length bursal cDNAs, the Ensembl transcript build or the genome sequence. The majority of all matching unitags show single matches to the genome, but no matches to the genome-derived Ensembl transcript build. Nevertheless, most of these tags map close to the 3' boundaries of annotated Ensembl transcripts. Conclusions: These results suggest that rather few genes are missing in the current Ensembl chicken transcript build, but that the 3' ends of many transcripts may not have been accurately predicted. The tags with no match in the transcript sequences can now be used to improve gene predictions, pinpoint the genomic location of entirely missed transcripts and optimize the accuracy of gene finder software.
Abstract:
Background: Despite its pervasiveness, the genetic basis of adaptation resulting in variation directly or indirectly related to temperature (climatic) gradients is poorly understood. By using 3-fold replicated laboratory thermal stocks covering much of the physiologically tolerable temperature range for the temperate (i.e., cold-tolerant) species Drosophila subobscura, we have assessed whole-genome transcriptional responses after three years of thermal adaptation, when the populations had already diverged for inversion frequencies, pre-adult life history components, and morphological traits. Total mRNA from each population was compared to a reference pool mRNA in a standard, highly replicated two-colour competitive hybridization experiment using cDNA microarrays. Results: A total of 306 (6.6%) cDNA clones were identified as 'differentially expressed' (following a false discovery rate correction) after contrasting the two furthest-apart thermal selection regimes (i.e., 13°C vs. 22°C), including four previously reported candidate genes for thermotolerance in Drosophila (Hsp26, Hsp68, Fst, and Treh). On the other hand, correlated patterns of gene expression were similar in cold- and warm-adapted populations. Analysis of functional categories defined by the Gene Ontology project points to an overrepresentation of genes involved in carbohydrate metabolism, nucleic acid metabolism and regulation of transcription, among other categories. Although the location of differentially expressed genes was approximately random with respect to chromosomes, a physical mapping of 88 probes to the polytene chromosomes of D. subobscura has shown that a larger than expected number mapped inside inverted chromosomal segments. Conclusion: Our data suggest that a sizeable number of genes appear to be involved in thermal adaptation in Drosophila, with a substantial fraction implicated in metabolism.
This apparently illustrates the formidable challenge of understanding the adaptive evolution of complex trait variation. Furthermore, some clustering of genes within inverted chromosomal sections was detected. Disentangling the effects of inversions will obviously be required in any future approach if we want to identify the relevant candidate genes.
Abstract:
Several studies over the last few years have shown that newly arising (de novo) mutations contribute to the genetics of schizophrenia (SZ), autism (ASD) and other developmental disorders. The strongest evidence comes from studies of de novo Copy Number Variation (CNV), where the rate of new mutations is shown to be increased in cases when compared to controls [23, 24]. Research on de novo point mutations and small insertion-deletions (indels) has been more limited, but with the development of next-generation sequencing (NGS) technology, such studies are beginning to provide preliminary evidence that de novo single-nucleotide mutations (SNVs) might also increase the risk of SZ and ASD [25, 26]. Advanced paternal age is a major source of new mutations in human beings [27] and could thus be associated with increased risk of developing SZ, ASD or other developmental disorders. Indeed, advanced paternal age is found to be a risk factor for developing SZ and ASD in the offspring [28, 29], and new mutations related to advanced paternal age have been implicated as a cause of sporadic cases in several autosomal dominant diseases, some neurodevelopmental diseases, including SZ and ASD, and social functioning. New single-base substitutions occur at higher rates in males than in females, and this difference increases with paternal age. This is because sperm cells go through a much higher number of cell divisions (~840 by the age of 50), which increases the risk of DNA copy errors in the male germ line [30]. By contrast, the female eggs (oocytes) undergo only 24 cell divisions and all but the last occur during foetal life. The aim of my project is to determine the parent-of-origin of de novo SNVs, using large samples of parent-offspring trios affected with schizophrenia (SZ).
From whole exome sequencing of 618 affected Bulgarian proband-offspring trios, nearly 1000 de novo mutations (SNVs or small indels) have been identified and, of these, the parent-of-origin of at least 60% (N=600) can be established. This project is part of a larger one that aims to determine the parental origin of different types of de novo mutations (SNVs, small indels and large CNVs).