931 results for wide genome sequencing
Abstract:
Insects are the most diverse group of animals on the planet, comprising over 90% of all metazoan life forms, and have adapted to a wide diversity of ecosystems in nearly all environments. They have evolved highly sensitive chemical senses that are central to their interaction with their environment and to communication between individuals. Understanding the molecular bases of insect olfaction is therefore of great importance from both a basic and an applied perspective. Odorant binding proteins (OBPs) are some of the most abundant proteins found in insect olfactory organs, where they are the first component of the olfactory transduction cascade, carrying odorant molecules to the olfactory receptors. We carried out a search for OBPs in the genome of the parasitoid wasp Nasonia vitripennis and identified 90 sequences encoding putative OBPs. This is the largest OBP family so far reported in insects. We report unique features of the N. vitripennis OBPs, including the presence and evolutionary origin of a new subfamily of double-domain OBPs (consisting of two concatenated OBP domains), the loss of conserved cysteine residues and the expression of pseudogenes. This study also demonstrates the extremely dynamic evolution of the insect OBP family: (i) the number of different OBPs can vary greatly between species; (ii) the sequences are highly diverse, sometimes as a result of positive selection pressure, with even the canonical cysteines being lost; (iii) new lineage-specific domain arrangements can arise, such as the double-domain OBP subfamily of wasps and mosquitoes.
Abstract:
DNA cytosine methylation is a central epigenetic modification with essential roles in a myriad of cellular processes. Examples include gene regulation, DNA-protein interactions, cellular differentiation, X-inactivation, maintenance of genome integrity by suppressing transposable elements and viruses, embryogenesis, genomic imprinting and tumourigenesis. This list keeps growing thanks to recent advances in genome-wide technologies such as Whole Genome Bisulfite Sequencing (WGBS-Seq). This technology has allowed the identification of features of the DNA methylation landscape that could not be detected with previous technologies, such as Partially Methylated Domains (PMDs). PMDs have been found in several cell lines, as well as in both healthy and cancer primary samples. They have been described as regions with high variability in methylation levels across individual CpG sites and with intermediate average methylation levels relative to the rest of the genome. Here, we performed an extensive search for PMDs in a large dataset of different haematopoietic primary cells from both myeloid and lymphoid lineages. We found and characterized significant PMDs in plasma B cells, confirming that PMDs are a phenomenon restricted to certain differentiated cells. Additionally, we found loci aberrantly hypomethylated in a myeloma sample that overlapped with plasma B cell PMDs. Genome-wide comparison of the myeloma and plasma B cell samples revealed that this is probably also the case for other loci.
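The description of PMDs given in this abstract (intermediate average methylation combined with high variability across individual CpG sites) can be illustrated with a minimal Python sketch. It is not the method used in the study: the function name `pmd_like_windows`, the fixed non-overlapping windows, and the thresholds `lo`, `hi` and `min_sd` are all assumptions chosen for illustration, and the methylation levels are made-up toy values.

```python
from statistics import mean, pstdev


def pmd_like_windows(levels, window=5, lo=0.3, hi=0.7, min_sd=0.2):
    """Return (start, end) index pairs of non-overlapping windows whose
    per-CpG methylation levels look PMD-like: intermediate mean and
    high site-to-site variability. All thresholds are illustrative."""
    hits = []
    for start in range(0, len(levels) - window + 1, window):
        chunk = levels[start:start + window]
        intermediate = lo <= mean(chunk) <= hi   # not fully (un)methylated
        variable = pstdev(chunk) >= min_sd       # disordered across CpGs
        if intermediate and variable:
            hits.append((start, start + window))
    return hits


# Toy per-CpG methylation fractions (hypothetical, not from the study).
levels = [
    0.90, 0.95, 0.90, 0.92, 0.88,  # highly methylated, uniform
    0.10, 0.90, 0.20, 0.80, 0.50,  # intermediate mean, highly variable
    0.50, 0.50, 0.50, 0.50, 0.50,  # intermediate mean, but uniform
]
print(pmd_like_windows(levels))
```

Only the middle window satisfies both criteria; the third window shows why an intermediate mean alone is not enough under this definition.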
Abstract:
Social insects are promising model systems for epigenetics due to their immense morphological and behavioral plasticity. Reports that DNA methylation differs between the queen and worker castes in social insects [1-4] have implied a role for DNA methylation in regulating division of labor. To better understand the function of DNA methylation in social insects, we performed whole-genome bisulfite sequencing on brains of the clonal raider ant Cerapachys biroi, whose colonies alternate between reproductive (queen-like) and brood care (worker-like) phases [5]. Many cytosines were methylated in all replicates (on average 29.5% of the methylated cytosines in a given replicate), indicating that a large proportion of the C. biroi brain methylome is robust. Robust DNA methylation occurred preferentially in exonic CpGs of highly and stably expressed genes involved in core functions. Our analyses did not detect any differences in DNA methylation between the queen-like and worker-like phases, suggesting that DNA methylation is not associated with changes in reproduction and behavior in C. biroi. Finally, many cytosines were methylated in one sample only, due to either biological or experimental variation. By applying the statistical methods used in previous studies [1-4, 6] to our data, we show that such sample-specific DNA methylation may underlie the previous findings of queen- and worker-specific methylation. We argue that there is currently no evidence that genome-wide variation in DNA methylation is associated with the queen and worker castes in social insects, and we call for a more careful interpretation of the available data.
Abstract:
Colorectal cancer (CRC) is the third most common cancer and the fourth leading cause of cancer death worldwide. About 85% of CRC cases are known to show chromosomal instability, allelic imbalance at several chromosomal loci, and chromosome amplification and translocation. The aim of this study is to determine the recurrent copy number variant (CNV) regions present in stage II CRC through whole exome sequencing, a rapidly developing targeted next-generation sequencing (NGS) technology that provides an accurate alternative approach for assessing genomic variation. Forty-two paired normal-tumor samples were sequenced on an Illumina Genome Analyzer. Data were analyzed with VarScan2, and segmentation was performed with the R package R-GADA. The segments were summarized across all samples, and the result was overlapped with DEG data for the same samples from a previous study by the group [1]. The major and most recurrent CNV segments were gains of chromosomes 7pq (13%), 13q (31%) and 20q (75%) and losses of 8p (25%), 17p (23%) and 18pq (27%). These results are consistent with the published literature on CNV in CRC and other cancers, but our methodology should be validated by array comparative genomic hybridisation (aCGH) profiling, which is currently the gold standard for genetic diagnosis of CNV.
Abstract:
Activated T helper (Th) cells have the ability to differentiate into functionally distinct Th1, Th2 and Th17 subsets through a series of overlapping networks, including signaling, transcriptional control and epigenetic mechanisms, to direct immune responses. However, inappropriate execution of the differentiation process and abnormal function of these Th cells can lead to the development of several immune-mediated diseases. Therefore, the thesis aimed to identify genes and gene regulatory mechanisms responsible for Th17 differentiation and to study epigenetic changes associated with the early stage of Th1/Th2 cell differentiation. Genome-wide transcriptional profiling during early stages of human Th17 cell differentiation demonstrated differential regulation of several novel and previously known genes associated with Th17 differentiation. Selected candidate genes were further validated at the protein level, and their specificity for Th17 as compared to other T helper subsets was analyzed. Moreover, a combination of RNA interference-mediated downregulation of gene expression, genome-wide transcriptome profiling and chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq), together with computational data integration, led to the identification of direct and indirect target genes of STAT3, which is a pivotal upstream transcription factor for Th17 cell polarization. Results indicated that STAT3 directly regulates the expression of several genes that are known to play a role in activation, differentiation, proliferation, and survival of Th17 cells. These results provide a basis for constructing a network regulating gene expression during early human Th17 differentiation. Th1 and Th2 lineage-specific enhancers were identified from genome-wide maps of histone modifications generated from cells differentiating towards the Th1 and Th2 lineages at 72 h.
Further analysis of lineage-specific enhancers revealed known and novel transcription factors that potentially control lineage-specific gene expression. Finally, we found an overlap of a subset of enhancers with SNPs associated with autoimmune diseases through GWASs suggesting a potential role for enhancer elements in the disease development. In conclusion, the results obtained have extended our knowledge of Th differentiation and provided new mechanistic insights into dysregulation of Th cell differentiation in human immune mediated diseases.
Abstract:
A 40-kb DNA region containing the major cluster of nif genes has been isolated from the Azospirillum brasilense Sp7 genome. In this region three nif operons have been identified: nifHDKorf1Y, nifENXorf3orf5fdxAnifQ and orf2nifUSVorf4. The operons containing the nifENX and nifUSV genes are separated from the structural nifHDKorf1Y operon by about 5 kb and 10 kb, respectively. The present study reports the sequence analysis of the 6045-bp DNA region containing the nifENX genes. The amino acid sequences deduced from the open reading frames were compared to the nif gene products of other diazotrophic bacteria and indicate the presence of seven ORFs, all reading in the same direction as the nifHDKorf1Y operon. Consensus σ54 and NifA-binding sites are present only in the promoter region upstream of the nifE gene. This promoter is activated by the NifA protein and is approximately twofold less active than the nifH promoter, as indicated by β-galactosidase assays. This result suggests differential expression of the nif genes and their respective products in Azospirillum.
Abstract:
Calves born persistently infected with non-cytopathic bovine viral diarrhea virus (ncpBVDV) frequently develop a fatal gastroenteric illness called mucosal disease. Both the original virus (ncpBVDV) and an antigenically identical but cytopathic virus (cpBVDV) can be isolated from animals affected by mucosal disease. Cytopathic BVDVs originate from their ncp counterparts by diverse genetic mechanisms, all leading to the expression of the non-structural polypeptide NS3 as a discrete protein. In contrast, ncpBVDVs express only the large precursor polypeptide, NS2-3, which contains the NS3 sequence within its carboxy-terminal half. We report here the investigation of the mechanism leading to NS3 expression in 41 cpBVDV isolates. An RT-PCR strategy was employed to detect RNA insertions within the NS2-3 gene and/or duplication of the NS3 gene, two common mechanisms of NS3 expression. RT-PCR amplification revealed insertions in the NS2-3 gene of three cp isolates, with the inserts being similar in size to that present in the cpBVDV NADL strain. Sequencing of one such insert revealed a 296-nucleotide sequence with a central core of 270 nucleotides coding for an amino acid sequence highly homologous (98%) to the NADL insert, a sequence corresponding to part of the cellular J-Domain gene. One cpBVDV isolate contained a duplication of the NS3 gene downstream from the original locus. In contrast, no detectable NS2-3 insertions or NS3 gene duplications were observed in the genome of 37 cp isolates. These results demonstrate that processing of NS2-3 without bulk mRNA insertions or NS3 gene duplications seems to be a frequent mechanism leading to NS3 expression and BVDV cytopathology.
Abstract:
When compared to other model organisms whose genome is sequenced, the number of mutations identified in the mouse appears extremely reduced and this situation seriously hampers our understanding of mammalian gene function(s). Another important consequence of this shortage is that a majority of human genetic diseases still await an animal model. To improve the situation, two strategies are currently used: the first makes use of embryonic stem cells, in which one can induce knockout mutations almost at will; the second consists of a genome-wide random chemical mutagenesis, followed by screening for mutant phenotypes and subsequent identification of the genetic alteration(s). Several projects are now in progress making use of one or the other of these strategies. Here, we report an original effort where we mutagenized BALB/c males, with the mutagen ethylnitrosourea. Offspring of these males were screened for dominant mutations and a three-generation breeding protocol was set to recover recessive mutations. Eleven mutations were identified (one dominant and ten recessives). Three of these mutations are new alleles (Otop1mlh, Foxn1sepe and probably rodador) at loci where mutations have already been reported, while 4 are new and original alleles (carc, eqlb, frqz, and Sacc). This result indicates that the mouse genome, as expected, is far from being saturated with mutations. More mutations would certainly be discovered using more sophisticated phenotyping protocols. Seven of the 11 new mutant alleles induced in our experiment have been localized on the genetic map as a first step towards positional cloning.
Abstract:
The recent rapid development of biotechnological approaches has enabled the production of large whole genome level biological data sets. In order to handle these data sets, reliable and efficient automated tools and methods for data processing and result interpretation are required. Bioinformatics, as the field of studying and processing biological data, tries to answer this need by combining methods and approaches across computer science, statistics, mathematics and engineering to study and process biological data. The need is also increasing for tools that can be used by the biological researchers themselves who may not have a strong statistical or computational background, which requires creating tools and pipelines with intuitive user interfaces, robust analysis workflows and strong emphasis on result reporting and visualization. Within this thesis, several data analysis tools and methods have been developed for analyzing high-throughput biological data sets. These approaches, covering several aspects of high-throughput data analysis, are specifically aimed for gene expression and genotyping data although in principle they are suitable for analyzing other data types as well. Coherent handling of the data across the various data analysis steps is highly important in order to ensure robust and reliable results. Thus, robust data analysis workflows are also described, putting the developed tools and methods into a wider context. The choice of the correct analysis method may also depend on the properties of the specific data set and therefore guidelines for choosing an optimal method are given. The data analysis tools, methods and workflows developed within this thesis have been applied to several research studies, of which two representative examples are included in the thesis. The first study focuses on spermatogenesis in murine testis and the second one examines cell lineage specification in mouse embryonic stem cells.
Abstract:
The relative ease of concentrating and purifying adenoviruses, their well-characterized mid-sized genome, and the ability to delete non-essential regions from their genome to accommodate foreign genes have made adenoviruses suitable candidates for the construction of vectors. The use of adenoviral vectors in gene therapy, vaccination, and as a general vector system for expressing foreign genes has been documented for some time. In this study, the objective was to rescue a BAV3 E1 or E3 recombinant vector carrying the kanamycin resistance gene, a dominant selectable marker with useful applications in studying vectored gene expression in mammalian cells. To accomplish this objective, more information about BAV3 DNA sequences was required in order to make manipulation of the virus genome feasible. Therefore, sequencing of the BAV3 genome from 11.7% to 30.8% was carried out. Analysis of the determined sequences revealed the primary structure of important viral gene products encoded by E2, including the BAV3 DNA polymerase and the precursor to terminal protein. Comparative analysis of these proteins with their counterparts from human and non-human adenoviruses provided important insights into the evolutionary lineage of BAV3. In order to insert the kanamycin resistance gene in either E1 or E3, it was necessary to delete BAV3 sequences to accommodate the foreign gene so as not to exceed the packaging capacity of the virus. To construct a recombinant BAV3 in which a foreign gene was inserted in the deleted E1 region, an E1 shuttle vector was constructed. This involved deleting the region between 1.3% and 9% from the viral sequences and inserting the kanamycin resistance gene in place of the deletion. The E1 shuttle vector contained the left segment (0% to 53.9%) of the genome and was expected to generate BAV3 recombinants that can be grown and propagated in cells that complement the missing E1 functions.
To construct a similar shuttle vector for the E3 deletion, DNA sequences extending from 78.9% to 82.5% (1281 bp) were deleted from within the E3 region, which had been cloned into a plasmid vector. The deleted region corresponds to sequences that have been shown to be non-essential for viral replication in cell culture. The resulting plasmid was used to construct another recombinant plasmid with BAV3 DNA sequences extending from 37.1% to 100% and with the deleted E3 sequences replaced by the kanamycin resistance gene. This shuttle plasmid was used in cotransfections with digested viral DNA in an attempt to rescue a recombinant BAV3 carrying the kanamycin resistance gene in place of the deleted E3. In spite of repeated transfection attempts, E1 or E3 recombinant BAV3 were not isolated. It seems that other approaches should be applied before a final conclusion on BAV3 infectivity can be drawn.
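The genome coordinates in this abstract are given as percentages of genome length. As a back-of-the-envelope illustration, the Python sketch below converts between percentage spans and base pairs using the one pair of figures the abstract states together (78.9% to 82.5%, 1281 bp). The function names are hypothetical, and the genome length it derives is only what those rounded percentages imply, not a measured value.

```python
def implied_genome_len(start_pct, end_pct, span_len_bp):
    """Genome length (bp) implied by a span given both in % and in bp."""
    return round(span_len_bp / ((end_pct - start_pct) / 100))


def span_bp(start_pct, end_pct, genome_len_bp):
    """Length in bp of a span given as start/end percentages of the genome."""
    return round((end_pct - start_pct) / 100 * genome_len_bp)


# E3 deletion as stated in the abstract: 78.9% to 82.5% corresponds to 1281 bp.
genome_len = implied_genome_len(78.9, 82.5, 1281)
print(genome_len)  # rough estimate only; the percentages are rounded

# Rough size of the E1 deletion (1.3% to 9%) under the same assumption.
print(span_bp(1.3, 9.0, genome_len))
```

Because the percentages are rounded to one decimal place, an error of 0.1% in the span width shifts the implied genome length by roughly 1 kb, so these numbers are useful only as an order-of-magnitude check.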
Abstract:
Adenoviruses are nonenveloped, icosahedral particles. The double-stranded DNA viral genome is divided into 5 major early transcription units, designated E1A, E1B, and E2 to E4, which are expressed in a regulated manner soon after infection. The gene products of the early region 3 (E3), shown to be nonessential for viral replication in vitro, are believed to be involved in counteracting host immunosurveillance. In order to sequence the E3 region of bovine adenovirus type 2 (BAV2), it was necessary to determine the restriction map of the plasmid pEA48. A physical restriction endonuclease map for BamHI, ClaI, EcoRI, HindIII, KpnI, PstI, SalI, and XbaI was constructed. The DNA insert in pEA48 was determined to be viral in origin using Southern hybridization. A human adenovirus type 5 recombinant plasmid, containing partial DNA fragments of the two transcription units L4 and L5 that lie just outside E3, was used to localize this region. The recombinant plasmid pEA was subcloned to facilitate sequencing. The DNA sequences between 74.8 and 90.5 map units, containing E3, the hexon-associated protein (pVIII) gene, and the fibre gene, were determined. Homology comparison revealed that the genes for the hexon-associated pVIII and the fibre protein are conserved. The last 70 amino acids of the BAV2 pVIII were the most conserved, showing a similarity of 87 percent with Ad2 pVIII. A comparison between the predicted amino acid sequences of BAV2 and Ad40, Ad41, Ad2 and Ad5 revealed that they have an identical secondary structure consisting of a tail, a shaft and a knob. The shaft is composed of 22 motifs of 15 amino acids each, with periodic glycines and hydrophobic residues. The E3 region was found to consist of about 2.3 kbp and to encode four proteins of more than 60 amino acids. However, these four open reading frames did not show significant homology to any other known adenovirus DNA or protein sequence.
Abstract:
Recombinant adenoviruses (Ads) have been shown to have potential applications in three areas: gene therapy, high-level protein expression and recombinant vaccines. At least three different locations within the Ad genome can be deleted and subsequently used for the insertion of foreign sequences. These include the Early 3 (E3), Early 1 (E1) and Early 4 (E4) regions. Viral vectors of this type have been well studied in human Ads 2 and 5; however, one has not yet been constructed for bovine adenovirus type 2 (BAV2). The E3 region is located between 76.6 and 86 m.u. on the r-strand and is transcribed in a rightward direction. The gene products of the Early 3 region (E3) have been shown to be non-essential for viral replication in vitro, but are required for host immunosurveillance. This study describes the cloning and reconstitution of a BAV2 E3 deletion mutant. A deletion of 1800 bp was made within the E3 region of BAV2, and the thymidine kinase gene was subsequently inserted in the deleted area. The plasmid pdlE3-4tk1 (23.4 kbp) was constructed and used to facilitate homologous recombination with wild-type BAV2 to produce a mutant. Southern blotting and hybridization results suggest the presence of a BAV2 E3 deletion mutant carrying thymidine kinase sequences. The E4 region of human adenovirus types 2 and 5 is located at the extreme right end of the genome (91.3 to 99.1 map units) and is transcribed in a leftward direction, giving rise to a complicated set of differentially spliced mRNAs. Essentially, there are 7 open reading frames (ORFs) encoding at least 7 polypeptides. The gene products encoded by the E4 region have been shown to be essential for the expression of late viral genes, host cell shutoff and normal viral growth. We have cloned and sequenced the right-end segment of the BAV2 genome between 90.5 and 100 map units.
The results show several open reading frames which encode polypeptides exhibiting homology to three polypeptides encoded by the E4 region of human adenovirus type 2. These include the 14kDa protein encoded by ORF1, the 34kDa protein encoded by ORF6 and the 13kDa protein encoded by ORF3. The nucleotide sequence, restriction enzyme map, and ORF map of the E4 region could be very useful in future molecular manipulation of this region and could possibly explain the slow growth rate of BAV2 in MDBK cells.
Abstract:
Autosomal recessive ataxias are a heterogeneous group of neurological disorders characterized by gross incoordination of muscle movements, resulting from dysfunction of the cerebellum, which coordinates movement. Several hereditary forms have been described, the best known being Friedreich's ataxia. In this thesis we report the identification and characterization of a new form in the Quebec population. Recessive spastic ataxia with leukoencephalopathy (ARSAL; also known as autosomal recessive spastic ataxia type 3 (SPAX3); OMIM 611390) is the second spastic ataxia described in the French-Canadian population. Indeed, nearly 50% of our cases originate from the Portneuf region. In 2006, we described the clinical features of this new form of ataxia. An initial whole-genome screen, comprising more than 500 microsatellite markers, localized the locus to chromosome 2q33-34. After sequencing more than 37 candidate genes, and in order to narrow this candidate interval, we used a DNA microarray of single nucleotide polymorphism (SNP) markers and identified a second candidate interval of 0.658 Mb at locus 2q33 containing fewer than 9 genes. Identifying and characterizing the mutations required the use of various cutting-edge technologies. Three mutations (one deletion and two complex rearrangements) in the mitochondrial tRNA synthetase gene (MARS2) were identified in our cohort. We hypothesize that the complex nature of the mutations causes a dysregulation of the gene's transcription, which has a deleterious impact on mitochondrial function and neuronal tissue.
Abstract:
Parkinson's disease (PD) is a disabling and incurable neurodegenerative disorder. It is now clearly established that important genetic determinants predispose to its onset. Genetic research on familial forms of PD has led to the discovery of at least six causative genes (SNCA, LRRK2, Parkin, PINK1, DJ-1 and GBA), and some of them, for example LRRK2, contain genetic variants that also predispose to sporadic forms. Characterization of the proteins encoded by these genes has led to a better understanding of the underlying molecular mechanisms. However, despite these efforts, the causes leading to the onset of PD remain unknown for the majority of patients. The general objective of the present work was to identify mutations predisposing to PD in the French-Canadian population of Quebec, using a cohort composed mainly of sporadic patients. The first part of this project consisted of determining the presence of LRRK2 mutations in our cohort by directly sequencing the exons containing the majority of pathogenic mutations and by carrying out an association study. We identified no mutations and the association study was negative, suggesting that LRRK2 is not a significant cause of PD in the French-Canadian population. The second part of the project aimed to identify new causative genes by directly sequencing candidate genes chosen for their involvement in different molecular mechanisms underlying PD. Our research hypothesis was based on the idea that PD is mainly due to individually rare mutations in a large number of different genes. We identified rare mutations in the PICK1 and MFN1 genes.
The former encodes a protein involved in regulating glutamate transmission, while the latter is one of the key players in the mitochondrial fusion process. Our results, which will need to be replicated, suggest that large-scale sequencing could be a promising method for elucidating genetic predisposition factors for PD; they highlight the value of using a founder population such as French Canadians for this type of study and should help deepen knowledge of the molecular pathogenesis of PD.
Abstract:
ChIP-seq is a technology that combines chromatin immunoprecipitation with high-throughput sequencing and allows large-scale in vivo analysis of transcription factors. Processing the large quantities of data generated in this way requires high-performance computing resources, and many tools have appeared recently. However, this proliferation of software packages, each carrying out one step of the analysis, creates compatibility problems and complicates analyses. There is therefore a strong need for a powerful and flexible software suite for motif identification. Here we propose a complete ChIP-seq data analysis suite, freely available in R and composed of three modules: PICS, rGADEM and MotIV. Through the analysis of four data sets for the transcription factors CTCF, STAT1, FOXA1 and ER, we demonstrated the effectiveness of our analysis suite and highlighted its novel features, in particular the processing of results by MotIV, which led to the discovery of motifs not detected by other algorithms.