324 results for MULTILOCUS SEQUENCE-ANALYSIS
Abstract:
Hidden Markov models (HMMs) are probabilistic models that are well adapted to many tasks in bioinformatics, for example, for predicting the occurrence of specific motifs in biological sequences. MAMOT is a command-line program for Unix-like operating systems, including MacOS X, that we developed to allow scientists to apply HMMs more easily in their research. One can define the architecture and initial parameters of the model in a text file and then use MAMOT for parameter optimization on example data, decoding (like predicting motif occurrence in sequences) and the production of stochastic sequences generated according to the probabilistic model. Two examples for which models are provided are coiled-coil domains in protein sequences and protein binding sites in DNA. A wealth of useful features include the use of pseudocounts, state tying and fixing of selected parameters in learning, and the inclusion of prior probabilities in decoding. AVAILABILITY: MAMOT is implemented in C++, and is distributed under the GNU General Public Licence (GPL). The software, documentation, and example model files can be found at http://bcf.isb-sib.ch/mamot
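The decoding step described above (predicting where a motif occurs in a sequence) can be illustrated with a minimal Viterbi sketch in Python. This is not MAMOT code and does not use its model file format; the two-state model, the state names and all probabilities below are invented purely for illustration.

```python
import math


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Return the most probable state path for seq under a discrete HMM."""
    # Work in log space to avoid numerical underflow on long sequences.
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(seq)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for being in state s at position t.
            prev, best = max(
                ((p, score[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            score[t][s] = best + math.log(emit_p[s][seq[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for t in range(len(seq) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model: a GC-rich "motif" state against a uniform background.
STATES = ("bg", "motif")
START = {"bg": 0.9, "motif": 0.1}
TRANS = {"bg": {"bg": 0.9, "motif": 0.1}, "motif": {"bg": 0.2, "motif": 0.8}}
EMIT = {
    "bg": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "motif": {"A": 0.05, "C": 0.45, "G": 0.45, "T": 0.05},
}
```

Calling `viterbi("AAGCGCGCAA", STATES, START, TRANS, EMIT)` returns one state label per position. Parameter optimization and prior probabilities in decoding, which the abstract also mentions, are outside the scope of this sketch.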
Abstract:
DNA microarray technology has arguably caught the attention of the worldwide life science community and is now systematically supporting major discoveries in many fields of study. The majority of the initial technical challenges of conducting experiments are being resolved, only to be replaced with new informatics hurdles, including statistical analysis, data visualization, interpretation, and storage. Two systems of databases, one containing expression data and one containing annotation data, are quickly becoming essential knowledge repositories of the research community. This paper surveys several databases that are considered "pillars" of research and important nodes in the network. It focuses on a generalized workflow scheme typical for microarray experiments, using two examples related to cancer research. The workflow is used to reference appropriate databases and tools for each step in the process of array experimentation. Additionally, benefits and drawbacks of current array databases are addressed, and suggestions are made for their improvement.
Abstract:
The development of targeted treatment strategies adapted to individual patients requires identification of the different tumor classes according to their biology and prognosis. We focus here on the molecular aspects underlying these differences, in terms of sets of genes that control pathogenesis of the different subtypes of astrocytic glioma. By performing cDNA-array analysis of 53 patient biopsies, comprising low-grade astrocytoma, secondary glioblastoma (respective recurrent high-grade tumors), and newly diagnosed primary glioblastoma, we demonstrate that human gliomas can be differentiated according to their gene expression. We found that low-grade astrocytoma have the most specific and similar expression profiles, whereas primary glioblastoma exhibit much larger variation between tumors. Secondary glioblastoma display features of both other groups. We identified several sets of genes with relatively highly correlated expression within groups that (a) can be associated with specific biological functions and (b) effectively differentiate tumor class. One prominent gene cluster discriminating primary versus nonprimary glioblastoma comprises mostly genes involved in angiogenesis, including VEGF fms-related tyrosine kinase 1 but also IGFBP2, which has not yet been directly linked to angiogenesis. In situ hybridization demonstrating coexpression of IGFBP2 and VEGF in pseudopalisading cells surrounding tumor necrosis provided further evidence for a possible involvement of IGFBP2 in angiogenesis. The separating groups of genes were found by the unsupervised coupled two-way clustering method, and their classification power was validated by a supervised construction of a nearly perfect glioma classifier.
Abstract:
There is a significant potential to improve the plant-beneficial effects of root-colonizing pseudomonads by breeding wheat genotypes with a greater capacity to sustain interactions with these bacteria. However, the interaction between pseudomonads and crop plants at the cultivar level, as well as the conditions which favor the accumulation of beneficial microorganisms in the wheat rhizosphere, is largely unknown. Therefore, we characterized the three Swiss winter wheat (Triticum aestivum) cultivars Arina, Zinal, and Cimetta for their ability to accumulate naturally occurring plant-beneficial pseudomonads in the rhizosphere. Cultivar performance was measured also by the ability to select for specific genotypes of 2,4-diacetylphloroglucinol (DAPG) producers in two different soils. Cultivar-specific differences were found; however, these were strongly influenced by the soil type. Denaturing gradient gel electrophoresis (DGGE) analysis of fragments of the DAPG biosynthetic gene phlD amplified from natural Pseudomonas rhizosphere populations revealed that phlD diversity substantially varied between the two soils and that there was a cultivar-specific accumulation of certain phlD genotypes in one soil but not in the other. Furthermore, the three cultivars were tested for their ability to benefit from Pseudomonas inoculants. Interestingly, Arina, which was best protected against Pythium ultimum infection by inoculation with Pseudomonas fluorescens biocontrol strain CHA0, was the cultivar which profited the least from the bacterial inoculant in terms of plant growth promotion in the absence of the pathogen. Knowledge gained of the interactions between wheat cultivars, beneficial pseudomonads, and soil types allows us to optimize cultivar-soil combinations for the promotion of growth through beneficial pseudomonads. Additionally, this information can be implemented by breeders into a new and unique breeding strategy for low-input and organic conditions.
Abstract:
Gene expression signatures are used in the clinic as prognostic tools to determine the risk of individual patients with localized breast tumors developing distant metastasis. We lack a clear understanding, however, of whether these correlative biomarkers link to a common biological network that regulates metastasis. We find that the c-MYC oncoprotein coordinately regulates the expression of 13 different "poor-outcome" cancer signatures. In addition, functional inactivation of MYC in human breast cancer cells specifically inhibits distant metastasis in vivo and invasive behavior in vitro of these cells. These results suggest that MYC oncogene activity (as marked by "poor-prognosis" signature expression) may be necessary for the translocation of poor-outcome human breast tumors to distant sites.
Abstract:
Expression of human leucocyte antigen (HLA) Class I molecules is essential for the recognition of malignant melanoma (MM) cells by CD8(+) T lymphocytes. A complete or partial loss of HLA Class I molecules is a potent strategy for MM cells to escape from immunosurveillance. In 2 out of 55 melanoma cell cultures we identified a complete phenotypic loss of HLA allospecificities. Both patients have been treated unsuccessfully with HLA-A2 peptides. To identify the reasons underlying the loss of single HLA-A allospecificities, we searched for genomic alterations at the locus for HLA Class I alpha-chain on chromosome 6 in melanoma cell cultures established from 2 selected patients with MM in advanced stage. This deficiency was associated with alterations of HLA-A2 gene sequences as determined by polymerase chain reaction-sequence specific primers (PCR-SSP). Karyotyping revealed a chromosomal loss in Patient 1, whereas melanoma cell cultures established from Patient 2 displayed 2 copies of chromosome 6. Loss of heterozygosity (LOH) using markers located around position 6p21 was detected in both cases. By applying group-specific primer-mixes spanning the 5'-flanking region of the HLA-A2 gene locus the relevant region was amplified by PCR and subsequent sequencing allowed alignment with the known HLA Class I reference sequences. Functional assays using HLA-A2-restricted cytotoxic T-cell clones were performed in HLA-A2 deficient MM cultures and revealed a drastically reduced susceptibility to CTL lysis in HLA-A2 negative cells. We could document the occurrence of selective HLA-A2 deficiencies in cultured advanced-stage melanoma metastases and identify their molecular causes as genomic alterations within the HLA-A gene locus.
Abstract:
Populations of the marble trout (Salmo marmoratus) have declined critically due to introgression by brown trout (Salmo trutta) strains. In order to define strategies for long-term conservation, we examined the genetic structure of the 8 known pure populations using 15 microsatellite loci. The analyses reveal extraordinarily strong genetic differentiation among populations separated by < 15 km, and extremely low levels of intrapopulation genetic variability. As natural recolonization seems highly unlikely, appropriate management and conservation strategies should comprise the reintroduction of pure populations from mixed stocks (translocation) to avoid further loss of genetic diversity.
Abstract:
Background: Gene expression analysis has emerged as a major biological research area, with real-time quantitative reverse transcription PCR (RT-QPCR) being one of the most accurate and widely used techniques for expression profiling of selected genes. In order to obtain results that are comparable across assays, a stable normalization strategy is required. In general, the normalization of PCR measurements between different samples uses one to several control genes (e.g. housekeeping genes), from which a baseline reference level is constructed. Thus, the choice of the control genes is of utmost importance, yet there is no generally accepted standard technique for screening a large number of candidates and identifying the best ones. Results: We propose a novel approach for scoring and ranking candidate genes for their suitability as control genes. Our approach relies on publicly available microarray data and allows the combination of multiple data sets originating from different platforms and/or representing different pathologies. The use of microarray data allows the screening of tens of thousands of genes, producing very comprehensive lists of candidates. We also provide two lists of candidate control genes: one which is breast cancer-specific and one with more general applicability. Two genes from the breast cancer list which had not been previously used as control genes are identified and validated by RT-QPCR. Open source R functions are available at http://www.isrec.isb-sib.ch/~vpopovic/research/ Conclusion: We proposed a new method for identifying candidate control genes for RT-QPCR which was able to rank thousands of genes according to some predefined suitability criteria and we applied it to the case of breast cancer. We also empirically showed that translating the results from microarray to PCR platform was achievable.
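The general idea of ranking candidate control genes across several data sets can be sketched as follows. The abstract does not state the actual suitability criteria, so this Python sketch substitutes a simple, commonly used stand-in: the coefficient of variation (CV) of each gene within each data set, combined across data sets by averaging ranks. The function names, the data layout and the choice of CV are all assumptions made for illustration, and the sketch is unrelated to the authors' R functions.

```python
from statistics import mean, pstdev


def cv(values):
    """Coefficient of variation of positive, non-log expression values."""
    return pstdev(values) / mean(values)


def rank_control_candidates(datasets):
    """Rank genes by expression stability, most stable first.

    datasets: list of dicts mapping gene name -> list of expression values.
    Only genes present in every data set are ranked (an assumption of this sketch).
    """
    genes = set(datasets[0])
    for data in datasets[1:]:
        genes &= set(data)
    avg_rank = {g: 0.0 for g in genes}
    for data in datasets:
        # Lower CV means more stable; the gene name breaks ties deterministically.
        ordered = sorted(genes, key=lambda g: (cv(data[g]), g))
        for rank, gene in enumerate(ordered):
            avg_rank[gene] += rank / len(datasets)
    return sorted(genes, key=lambda g: (avg_rank[g], g))


# Invented example values for two hypothetical data sets.
DATASET_1 = {"A": [100, 101, 99, 100], "B": [50, 150, 20, 200], "C": [80, 100, 90, 110]}
DATASET_2 = {"A": [10, 10.2, 9.9], "B": [5, 20, 1], "C": [8, 10, 12], "D": [7, 7, 7]}
```

Averaging ranks rather than raw scores is one simple way to combine data sets measured on different platforms, because ranks do not depend on each platform's intensity scale.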
Abstract:
Investigating macro-geographical genetic structures of animal populations is crucial to reconstruct population histories and to identify significant units for conservation. This approach may also provide information about the intraspecific flexibility of social systems. We investigated the history and current structure of a large number of populations in the communally breeding Bechstein's bat (Myotis bechsteinii). Our aim was to understand which factors shape the species' social system over a large ecological and geographical range. Using sequence data from one coding and one noncoding mitochondrial DNA region, we identified the Balkan Peninsula as the main and probably only glacial refugium of the species in Europe. Sequence data also suggest the presence of a cryptic taxon in the Caucasus and Anatolia. In a second step, we used seven autosomal and two mitochondrial microsatellite loci to compare population structures inside and outside of the Balkan glacial refugium. Central European and Balkan populations both were more strongly differentiated for mitochondrial DNA than for nuclear DNA, had higher genetic diversities and lower levels of relatedness at swarming (mating) sites than in maternity (breeding) colonies, and showed more differentiation between colonies than between swarming sites. All these suggest that populations are shaped by strong female philopatry, male dispersal, and outbreeding throughout their European range. We conclude that Bechstein's bats have a stable social system that is independent from the postglacial history and location of the populations. Our findings have implications for the understanding of the benefits of sociality in female Bechstein's bats and for the conservation of this endangered species.
Abstract:
BACKGROUND: Accurate catalogs of structural variants (SVs) in mammalian genomes are necessary to elucidate the potential mechanisms that drive SV formation and to assess their functional impact. Next-generation sequencing methods for SV detection are an advance on array-based methods, but are almost exclusively limited to four basic types: deletions, insertions, inversions and copy number gains. RESULTS: By visual inspection of 100 Mbp of genome to which next-generation sequence data from 17 inbred mouse strains had been aligned, we identify and interpret 21 paired-end mapping patterns, which we validate by PCR. These paired-end mapping patterns reveal a greater diversity and complexity in SVs than previously recognized. In addition, Sanger-based sequence analysis of 4,176 breakpoints at 261 SV sites reveals additional complexity at approximately a quarter of the structural variants analyzed. We find micro-deletions and micro-insertions at SV breakpoints, ranging from 1 to 107 bp, and SNPs that extend breakpoint micro-homology and may catalyze SV formation. CONCLUSIONS: An integrative approach using experimental analyses to train computational SV calling is essential for the accurate resolution of the architecture of SVs. We find considerable complexity in SV formation; about a quarter of SVs in the mouse are composed of a complex mixture of deletion, insertion, inversion and copy number gain. Computational methods can be adapted to identify most paired-end mapping patterns.
Abstract:
Peptide signaling presumably occupies a central role in plant development, yet only a few concrete examples of receptor-ligand pairs that act in the context of specific differentiation processes have been described. Here we report that second-site null mutations in the Arabidopsis leucine-rich repeat receptor-like kinase gene barely any meristem 3 (BAM3) perfectly suppress the postembryonic root meristem growth defect and the associated perturbed protophloem development of the brevis radix (brx) mutant. The roots of bam3 mutants specifically resist growth inhibition by the CLAVATA3/ENDOSPERM SURROUNDING REGION 45 (CLE45) peptide ligand. WT plants transformed with a construct for ectopic overexpression of CLE45 could not be recovered, with the exception of a single severely dwarfed and sterile plant that eventually died. By contrast, we obtained numerous transgenic bam3 mutants transformed with the same construct. These transgenic plants displayed a WT phenotype, however, supporting the notion that CLE45 is the likely BAM3 ligand. The results correlate with the observation that external CLE45 application represses protophloem differentiation in WT, but not in bam3 mutants. BAM3, BRX, and CLE45 are expressed in a similar spatiotemporal trend along the developing protophloem, up to the end of the transition zone. Induction of BAM3 expression upon CLE45 application, ectopic overexpression of BAM3 in brx root meristems, and laser ablation experiments suggest that intertwined regulatory activity of BRX, BAM3, and CLE45 could be involved in the proper transition of protophloem cells from proliferation to differentiation, thereby impinging on postembryonic growth capacity of the root meristem.
Abstract:
The complexity of mammalian genome organization demands a complex interplay of DNA and proteins to orchestrate proper gene regulation. CTCF, a highly conserved, ubiquitously expressed protein, has been postulated as a primary organizer of genome architecture because of its roles in transcriptional activation/repression, insulation and imprinting. Diverse regulatory functions are exerted through genome-wide binding via a central eleven zinc finger DNA binding domain and an array of diverse protein-protein interactions through N- and C-terminal domains. CTCFL has been identified as a paralog of CTCF expressed only in spermatogenic cells of the testis. CTCF and CTCFL have a highly homologous DNA-binding domain, while the flanking amino acid sequences exhibit no significant similarity. Genome-wide mapping of CTCF binding sites has been carried out in many cell types, but no data exist for CTCFL apart from a few identified loci. The lack of high quality antibodies prompted us to generate an endogenously FLAG-tagged CTCFL mouse model using BAC recombination. IHC staining using anti-FLAG antibodies confirmed CTCFL localization to type B spermatogonia and preleptotene spermatocytes and a mutually exclusive pattern of expression with CTCF. ChIP followed by high-throughput sequencing identified 10,382 binding sites showing 70% overlap but representing only 20% of CTCF sites. Consensus sequence analysis identified a significantly longer binding motif with prominently less ambiguity of base calling at every position. The significant difference between CTCF and CTCFL genomic binding patterns suggests that their binding to DNA is differentially regulated. Analysis of CTCFL binding to methylated regions on a genome-wide scale identified approximately 1,000 loci. Methylation-independent binding of CTCFL might be at least one of the mechanisms that ensures distinct binding patterns of CTCF and CTCFL, since CTCF binding is methylation-sensitive.
Co-localization of CTCF with cohesin has been well established, and analysis of CTCFL and SMC3 overlap identified around 3,300 binding sites from which two related but distinct consensus sequence motifs were derived. Because virtually all data for cohesin binding originate from mitotically proliferating cells, the overlap is expected to be considerably higher in meiotic cells. Meiosis-specific cohesin subunit Rec8 is specific for spermatocytes, and 6 out of the 12 identified binding sites are also bound by CTCFL. In conclusion, this was the first genome-wide mapping of CTCFL binding sites in spermatocytes, the only cell type where CTCF is not expressed. CTCFL has a unique binding site repertoire distinct from CTCF, binds to methylated sequences and shows a significant overlap with cohesin binding sites. Future efforts will be oriented towards deciphering the role CTCFL plays in conversion of chromatin structure and function from mitotic to meiotic chromosomes.
Abstract:
Ants are some of the most abundant and familiar animals on Earth, and they play vital roles in most terrestrial ecosystems. Although all ants are eusocial, and display a variety of complex and fascinating behaviors, few genomic resources exist for them. Here, we report the draft genome sequence of a particularly widespread and well-studied species, the invasive Argentine ant (Linepithema humile), which was accomplished using a combination of 454 (Roche) and Illumina sequencing and community-based funding rather than federal grant support. Manual annotation of >1,000 genes from a variety of different gene families and functional classes reveals unique features of the Argentine ant's biology, as well as similarities to Apis mellifera and Nasonia vitripennis. Distinctive features of the Argentine ant genome include remarkable expansions of gustatory (116 genes) and odorant receptors (367 genes), an abundance of cytochrome P450 genes (>110), lineage-specific expansions of yellow/major royal jelly proteins and desaturases, and complete CpG DNA methylation and RNAi toolkits. The Argentine ant genome contains fewer immune genes than Drosophila and Tribolium, which may reflect the prominent role played by behavioral and chemical suppression of pathogens. Analysis of the ratio of observed to expected CpG nucleotides for genes in the reproductive development and apoptosis pathways suggests higher levels of methylation than in the genome overall. The resources provided by this genome sequence will offer an abundance of tools for researchers seeking to illuminate the fascinating biology of this emerging model organism.
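The ratio of observed to expected CpG dinucleotides mentioned above can be computed with a short Python sketch. It uses one common normalization, CpG o/e = (number of CpG × sequence length) / (number of C × number of G); the abstract does not say which normalization the authors used, so this formula and the function name are assumptions. A ratio well below 1 is conventionally read as CpG depletion, which is consistent with historical methylation of the sequence.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio of a DNA sequence, or None if undefined."""
    seq = seq.upper()
    length = len(seq)
    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if n_c == 0 or n_g == 0:
        return None  # the expected count is zero, so the ratio is undefined
    return n_cpg * length / (n_c * n_g)
```

In practice the ratio would be computed per gene or per window and compared against the genome-wide distribution, rather than interpreted for a single short sequence.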
Abstract:
The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters, experimentally defined by a transcription start site (TSS). There may be multiple promoter entries for a single gene. The underlying experimental evidence comes from journal articles and, starting from release 73, from 5' ESTs of full-length cDNA clones used for so-called in silico primer extension. Access to promoter sequences is provided by pointers to TSS positions in nucleotide sequence entries. The annotation part of an EPD entry includes a description of the type and source of the initiation site mapping data, links to other biological databases and bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. Web-based interfaces have been developed that enable the user to view EPD entries in different formats, to select and extract promoter sequences according to a variety of criteria and to navigate to related databases exploiting different cross-references. Tools for analysing sequence motifs around TSSs defined in EPD are provided by the signal search analysis server. EPD can be accessed at http://www.epd.isb-sib.ch.
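The idea of a promoter being defined by a pointer to a TSS position in a nucleotide sequence entry can be made concrete with a small Python sketch. It does not use the EPD entry format or any EPD interface. The function name, the 1-based forward-strand coordinate convention and the default window sizes are assumptions chosen for illustration.

```python
# Translation table for complementing DNA bases (case-preserving).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def promoter_window(sequence, tss, strand="+", upstream=10, downstream=5):
    """Extract a promoter window around a TSS pointer.

    sequence: the nucleotide sequence entry (forward strand, 5'->3').
    tss: 1-based position of the TSS on the forward strand.
    Returns `upstream` bases before the TSS followed by `downstream` bases
    starting at the TSS, read 5'->3' on the strand of the promoter.
    """
    i = tss - 1  # convert to a 0-based index
    if strand == "+":
        start, end = i - upstream, i + downstream
    else:
        # On the minus strand, upstream lies at higher forward coordinates.
        start, end = i - downstream + 1, i + upstream + 1
    if start < 0 or end > len(sequence):
        raise ValueError("window extends past the sequence entry")
    window = sequence[start:end]
    if strand == "-":
        window = window.translate(COMPLEMENT)[::-1]  # reverse complement
    return window
```

For example, `promoter_window("AAAAACGTTTTT", 6, "+", upstream=2, downstream=3)` returns the two bases before position 6 followed by the three bases starting at it.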