124 results for BIOINFORMATICS DATABASES
Abstract:
Hidden Markov models (HMMs) are probabilistic models that are well suited to many tasks in bioinformatics, for example predicting the occurrence of specific motifs in biological sequences. MAMOT is a command-line program for Unix-like operating systems, including Mac OS X, that we developed to allow scientists to apply HMMs more easily in their research. One can define the architecture and initial parameters of the model in a text file and then use MAMOT for parameter optimization on example data, for decoding (e.g. predicting motif occurrences in sequences), and for generating stochastic sequences according to the probabilistic model. Two examples for which models are provided are coiled-coil domains in protein sequences and protein binding sites in DNA. Useful features include pseudocounts, state tying, the fixing of selected parameters during learning, and the inclusion of prior probabilities in decoding. AVAILABILITY: MAMOT is implemented in C++ and is distributed under the GNU General Public License (GPL). The software, documentation, and example model files can be found at http://bcf.isb-sib.ch/mamot
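As a concrete illustration of the decoding task MAMOT performs, here is a minimal sketch of Viterbi decoding for a two-state motif/background HMM. The states, alphabet and probabilities are invented for the example and are unrelated to MAMOT's actual model file format.

```python
import numpy as np

# Illustrative two-state HMM (background vs. motif); all parameters
# are made up for this example, not taken from MAMOT.
states = ["background", "motif"]
alphabet = {"A": 0, "C": 1, "G": 2, "T": 3}
start = np.log([0.9, 0.1])
trans = np.log([[0.95, 0.05],
                [0.10, 0.90]])
emit = np.log([[0.25, 0.25, 0.25, 0.25],   # background: uniform
               [0.40, 0.10, 0.10, 0.40]])  # motif: AT-rich

def viterbi(seq):
    """Return the most probable state path for a DNA sequence."""
    obs = [alphabet[c] for c in seq]
    n, k = len(obs), len(states)
    v = np.full((n, k), -np.inf)            # log-probability table
    ptr = np.zeros((n, k), dtype=int)       # backpointers
    v[0] = start + emit[:, obs[0]]
    for t in range(1, n):
        for j in range(k):
            scores = v[t - 1] + trans[:, j]
            ptr[t, j] = np.argmax(scores)
            v[t, j] = scores[ptr[t, j]] + emit[j, obs[t]]
    # Trace back the best path from the best final state.
    path = [int(np.argmax(v[-1]))]
    for t in range(n - 1, 0, -1):
        path.append(ptr[t, path[-1]])
    return [states[s] for s in reversed(path)]

print(viterbi("CGCGATATATCGCG"))
```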
Abstract:
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases individually. The latest release of InterPro contains 5629 entries describing 4280 families, 1239 domains, 95 repeats and 15 post-translational modifications. Currently, the combined signatures in InterPro cover more than 74% of all proteins in SWISS-PROT and TrEMBL, an increase of nearly 15% since the inception of InterPro. New features of the database include improved searching capabilities and enhanced graphical user interfaces for visualisation of the data. The database is available via a webserver (http://www.ebi.ac.uk/interpro) and anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
Abstract:
Background: Gene expression analysis has emerged as a major biological research area, with real-time quantitative reverse transcription PCR (RT-QPCR) being one of the most accurate and widely used techniques for expression profiling of selected genes. To obtain results that are comparable across assays, a stable normalization strategy is required. In general, the normalization of PCR measurements between different samples uses one to several control genes (e.g. housekeeping genes), from which a baseline reference level is constructed. The choice of control genes is therefore of utmost importance, yet there is no generally accepted standard technique for screening a large number of candidates and identifying the best ones. Results: We propose a novel approach for scoring and ranking candidate genes for their suitability as control genes. Our approach relies on publicly available microarray data and allows the combination of multiple data sets originating from different platforms and/or representing different pathologies. The use of microarray data allows the screening of tens of thousands of genes, producing very comprehensive lists of candidates. We also provide two lists of candidate control genes: one that is breast cancer-specific and one with more general applicability. Two genes from the breast cancer list that had not previously been used as control genes are identified and validated by RT-QPCR. Open-source R functions are available at http://www.isrec.isb-sib.ch/~vpopovic/research/. Conclusion: We proposed a new method for identifying candidate control genes for RT-QPCR, able to rank thousands of genes according to predefined suitability criteria, and applied it to the case of breast cancer. We also showed empirically that translating the results from the microarray to the PCR platform is achievable.
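The paper's actual scoring criteria are implemented in its open-source R functions; as a rough illustration of the idea, the sketch below ranks genes by expression stability (here, average coefficient of variation) across several microarray data sets. The criterion, the toy data and the gene names are assumptions for the example, not the published method.

```python
import numpy as np

def rank_control_candidates(expr_matrices, gene_ids):
    """Rank genes as control-gene candidates across several microarray
    data sets.  The stability criterion (low average coefficient of
    variation) is an illustrative stand-in for the paper's scoring."""
    scores = []
    for g, gene in enumerate(gene_ids):
        cvs = []
        for m in expr_matrices:          # one matrix per data set/platform
            row = m[g]
            cvs.append(np.std(row) / np.mean(row))
        # Lower average CV across data sets = more stable = better control.
        scores.append((np.mean(cvs), gene))
    return [gene for _, gene in sorted(scores)]

# Toy example: 3 genes x 5 samples, two "platforms" (random data).
rng = np.random.default_rng(0)
m1 = rng.gamma(5, 20, size=(3, 5))
m2 = rng.gamma(5, 20, size=(3, 5))
print(rank_control_candidates([m1, m2], ["GAPDH", "ACTB", "MYC"]))
```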
Abstract:
SUMMARY: Large sets of data, such as expression profiles from many samples, require analytic tools to reduce their complexity. The Iterative Signature Algorithm (ISA) is a biclustering algorithm designed to decompose a large set of data into so-called 'modules'. In the context of gene expression data, these modules consist of subsets of genes that exhibit a coherent expression profile only over a subset of microarray experiments. Genes and arrays may be attributed to multiple modules, and the level of required coherence can be varied, resulting in different 'resolutions' of the modular mapping. In this short note, we introduce two Bioconductor software packages written in GNU R: the isa2 package includes an optimized implementation of the ISA, and the eisa package provides a convenient interface to run the ISA, visualize its output and put the biclusters into biological context. Potential users of these packages are all R and Bioconductor users dealing with tabular (e.g. gene expression) data. AVAILABILITY: http://www.unil.ch/cbg/ISA CONTACT: sven.bergmann@unil.ch
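For readers unfamiliar with the ISA, the following is a simplified sketch of one signature run on a genes-by-arrays matrix: arrays are scored against the current gene set and thresholded, then genes are re-scored against the selected arrays, iterating toward a fixed point. The thresholding scheme and parameters are illustrative; the optimized implementation lives in the isa2 package.

```python
import numpy as np

def isa_module(E, t_g=2.0, t_c=2.0, n_iter=50, seed=0):
    """Simplified Iterative Signature Algorithm run on a standardized
    genes-by-arrays matrix E.  Illustrative sketch only, not the
    optimized isa2 implementation."""
    rng = np.random.default_rng(seed)
    genes = rng.random(E.shape[0]) < 0.1       # random seed gene set
    if not genes.any():
        genes[rng.integers(E.shape[0])] = True
    arrays = np.ones(E.shape[1], dtype=bool)
    for _ in range(n_iter):
        # Score arrays by mean expression over the current gene set;
        # keep arrays scoring t_c standard deviations above the mean.
        s = E[genes].mean(axis=0)
        arrays = s > s.mean() + t_c * s.std()
        if not arrays.any():
            break
        # Re-score genes over the selected arrays, same thresholding.
        s = E[:, arrays].mean(axis=1)
        new_genes = s > s.mean() + t_g * s.std()
        if not new_genes.any() or (new_genes == genes).all():
            genes = new_genes                  # converged (or emptied)
            break
        genes = new_genes
    return genes, arrays

rng = np.random.default_rng(1)
E = rng.standard_normal((200, 50))
E[:20, :10] += 2.0                             # planted bicluster
genes, arrays = isa_module(E)
print(genes.sum(), "genes,", arrays.sum(), "arrays in the module")
```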
Abstract:
Protein kinase genes account for almost 10% of all currently known cancer genes, highlighting the role of signal transduction in oncogenesis. A reexamination of the literature and available databases shows that E3 ubiquitin ligases are also key mediators of tumorigenesis. Altogether, kinase and E3 genes represent more than 15% of the known cancer genes, underlining the importance of phosphorylation and ubiquitylation signaling pathways in cancer formation. Considering the recent literature reporting correlations between alterations in ubiquitylation processes and oncogenesis, this percentage is likely to increase even further in the future. Finally, E3 genes could serve as baits for the identification of additional cancer genes (e.g. their interacting partners). In contrast, deubiquitinases, like phosphatases, are not overrepresented among cancer genes, and the same holds for E1 and E2 genes. Thus, kinase and E3 genes represent primary targets as cancer susceptibility genes for mutation screening and for the design of novel therapies.
Abstract:
BACKGROUND: The early detection of medullary thyroid carcinoma (MTC) can improve patient prognosis, because histological stage and patient age at diagnosis are highly relevant prognostic factors. As a consequence, delay in the diagnosis and/or incomplete surgical treatment should correlate with a poorer prognosis for patients. Few papers have evaluated the specific capability of fine-needle aspiration cytology (FNAC) to detect MTC, and small series have been reported. This study conducts a meta-analysis of published data on the diagnostic performance of FNAC in MTC to provide more robust estimates. RESEARCH DESIGN AND METHODS: A comprehensive computer literature search of the PubMed/MEDLINE, Embase and Scopus databases was conducted by searching for the terms 'medullary thyroid' AND 'cytology', 'FNA', 'FNAB', 'FNAC', 'fine needle' or 'fine-needle'. The search was updated until 21 March 2014, and no language restrictions were used. RESULTS: Fifteen relevant studies and 641 MTC lesions that had undergone FNAC were included. The detection rate (DR) of FNAC in patients with MTC (diagnosed as 'MTC' or 'suspicious for MTC') on a per-lesion-based analysis ranged from 12.5% to 88.2%, with a pooled estimate of 56.4% (95% CI: 52.6-60.1%). The included studies were statistically heterogeneous in their estimates of DR (I² > 50%). Egger's regression intercept for DR pooling was 0.03 (95% CI: -3.1 to 3.2, P = 0.9). The study that reported the largest MTC series had a DR of 45%. Data on immunohistochemistry for calcitonin in diagnosing MTC were inconsistent for the meta-analysis. CONCLUSIONS: The presented meta-analysis demonstrates that FNAC is able to detect approximately one-half of MTC lesions. These findings suggest that other techniques may be needed in combination with FNAC to diagnose MTC and avoid false-negative results.
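As background for the pooled detection-rate estimate, the sketch below shows generic inverse-variance pooling of proportions with Cochran's Q and I² heterogeneity, using invented per-study counts; the paper's exact meta-analytic model and data are not reproduced here.

```python
import math

def pool_proportions(events, totals):
    """Inverse-variance fixed-effect pooling of proportions with an
    I^2 heterogeneity estimate.  Generic textbook formulas; the paper's
    exact model may differ.  Assumes 0 < events < totals per study."""
    p = [e / n for e, n in zip(events, totals)]
    w = [n / (pi * (1 - pi)) for pi, n in zip(p, totals)]  # 1 / variance
    pooled = sum(wi * pi for wi, pi in zip(w, p)) / sum(w)
    se = math.sqrt(1 / sum(w))
    # Cochran's Q and I^2 quantify between-study heterogeneity.
    q = sum(wi * (pi - pooled) ** 2 for wi, pi in zip(w, p))
    df = len(p) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return pooled, ci, i2

# Hypothetical per-study detection counts (not the paper's data).
print(pool_proportions([10, 30, 45], [25, 50, 60]))
```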
Abstract:
During my PhD, I used model vertebrate species, such as mouse and zebrafish, to study the factors affecting the evolution of genes and their expression. More precisely, I showed that anatomy and development are key factors to take into account, as they influence the rate of gene sequence evolution, the impact of mutations (i.e. is the deletion of a gene lethal?), and the propensity of a gene to duplicate. Where and when a gene is expressed imposes constraints on it, or on the contrary leaves it some opportunity to evolve. We analyzed these patterns in relation to classical models of morphological evolution in vertebrates, which were previously thought to directly reflect constraints on the genome. We showed that the patterns of evolution at these two levels of organization do not translate smoothly: there is no direct link between the conservation of the genotype and of phenotypes such as morphology. This work was made possible by the development of bioinformatics tools. Notably, I worked on the development of the database Bgee, which aims at comparing gene expression between different species in an automated and large-scale way. This involves the formalization of anatomy, development, and concepts related to homology through the use of ontologies. A coherent integration of heterogeneous expression data (microarrays, expressed sequence tags, in situ hybridizations) is also required. This database is regularly updated and freely available. It should help extend the possibilities for comparing gene expression between species in evo-devo (evolution of development) and genomics studies.
Abstract:
CONTEXT: Several genetic risk scores to identify asymptomatic subjects at high risk of developing type 2 diabetes mellitus (T2DM) have been proposed, but it is unclear whether they add extra information to risk scores based on clinical and biological data. OBJECTIVE: The objective of the study was to assess the extra clinical value of genetic risk scores in predicting the occurrence of T2DM. DESIGN: This was a prospective study, with a mean follow-up time of 5 yr. SETTING AND SUBJECTS: The study included 2824 nondiabetic participants (1548 women, 52 ± 10 yr). MAIN OUTCOME MEASURE: Six genetic risk scores for T2DM were tested. Four were derived from the literature and two were created by combining all (n = 24) or the shared (n = 9) single-nucleotide polymorphisms of the previous scores. A previously validated clinical + biological risk score for T2DM was used as reference. RESULTS: Two hundred seven participants (7.3%) developed T2DM during follow-up. On bivariate analysis, no differences were found for all but one genetic score between nondiabetic and diabetic participants. After adjusting for the validated clinical + biological risk score, none of the genetic scores improved discrimination, as assessed by changes in the area under the receiver-operating characteristic curve (range -0.4 to -0.1%), sensitivity (-2.9 to -1.0%), specificity (0.0-0.1%), and positive (-6.6 to +0.7%) and negative (-0.2 to 0.0%) predictive values. Similarly, no improvement in T2DM risk prediction was found: net reclassification index ranging from -5.3 to -1.6% and nonsignificant (P ≥ 0.49) integrated discrimination improvement. CONCLUSIONS: In this study, adding genetic information to a previously validated clinical + biological score does not seem to improve the prediction of T2DM.
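For context, a genetic risk score of the kind tested here is typically a (possibly odds-ratio-weighted) sum of risk-allele counts over a SNP panel. The sketch below is a generic version with invented SNP weights; it is not one of the six published scores.

```python
import numpy as np

def genetic_risk_score(genotypes, weights=None):
    """Per-subject genetic risk score from a subjects-by-SNP matrix of
    risk-allele counts (0, 1 or 2).  Unweighted counting by default,
    or log-odds-ratio weighting; the paper's SNP panels and weights
    are not reproduced here."""
    g = np.asarray(genotypes, dtype=float)
    if weights is None:
        weights = np.ones(g.shape[1])      # simple allele-count score
    return g @ np.asarray(weights)

# Toy example: 3 subjects, 4 SNPs, weights = invented log odds ratios.
geno = [[0, 1, 2, 1],
        [1, 1, 0, 0],
        [2, 2, 1, 1]]
log_or = np.log([1.15, 1.10, 1.25, 1.08])
print(genetic_risk_score(geno, log_or))
```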
Abstract:
Animal toxins are of interest to a wide range of scientists, due to their numerous applications in pharmacology, neurology, hematology, medicine, and drug research. This interest and, to a lesser extent, the development of powerful new tools in transcriptomics and proteomics have led to an increase in toxin discovery. In this context, providing publicly available data on animal toxins has become essential. The UniProtKB/Swiss-Prot Tox-Prot program (http://www.uniprot.org/program/Toxins) plays a crucial role by providing such access to venom protein sequences and functions from all venomous species. The program has so far curated more than 5000 venom proteins to the high-quality standards of UniProtKB/Swiss-Prot (release 2012_02). Proteins targeted by these toxins are also available in the knowledgebase. This paper describes in detail the type of information provided by UniProtKB/Swiss-Prot for toxins, as well as the structured format of the knowledgebase.
Abstract:
The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters, experimentally defined by a transcription start site (TSS). There may be multiple promoter entries for a single gene. The underlying experimental evidence comes from journal articles and, starting from release 73, from 5' ESTs of full-length cDNA clones used for so-called in silico primer extension. Access to promoter sequences is provided by pointers to TSS positions in nucleotide sequence entries. The annotation part of an EPD entry includes a description of the type and source of the initiation site mapping data, links to other biological databases and bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. Web-based interfaces have been developed that enable the user to view EPD entries in different formats, to select and extract promoter sequences according to a variety of criteria and to navigate to related databases exploiting different cross-references. Tools for analysing sequence motifs around TSSs defined in EPD are provided by the signal search analysis server. EPD can be accessed at http://www.epd.isb-sib.ch.
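As an illustration of what 'pointers to TSS positions' enable, the sketch below extracts a strand-aware promoter window around a TSS from a chromosome sequence. The coordinate conventions and the -499/+100 window are assumptions for the example, not necessarily EPD's own.

```python
def promoter_window(chrom_seq, tss, strand, upstream=499, downstream=100):
    """Extract the promoter region around a 0-based TSS position.
    Window size and coordinate conventions are illustrative."""
    if strand == "+":
        start = max(0, tss - upstream)
        return chrom_seq[start : tss + downstream + 1]
    # Minus strand: take the mirror-image window, then reverse-complement
    # so the result reads 5'->3' in transcript orientation.
    comp = str.maketrans("ACGT", "TGCA")
    start = max(0, tss - downstream)
    region = chrom_seq[start : tss + upstream + 1]
    return region.translate(comp)[::-1]

# Toy usage on an invented sequence.
seq = "ACGT" * 300
print(promoter_window(seq, 600, "+")[:20], "...")
```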
Abstract:
Plasmodium falciparum is the parasite responsible for the most acute form of malaria in humans. Recently, the serine repeat antigen (SERA) of P. falciparum has attracted attention as a potential vaccine and drug target, and it has been shown to be a member of a large gene family. To clarify the relationships among the numerous P. falciparum SERAs and to identify orthologs of SERA5 and SERA6 in Plasmodium species affecting rodents, gene trees were inferred from nucleotide and amino acid sequence data for 33 putative SERA homologs in seven different species. (A distance method for nucleotide sequences specifically designed to accommodate differing GC content yielded results largely compatible with the amino acid tree; standard-distance and maximum-likelihood methods for nucleotide sequences, on the other hand, yielded gene trees that differed in important respects.) To infer the pattern of duplication, speciation, and gene-loss events in the SERA gene family history, the resulting gene trees were then "reconciled" with two competing Plasmodium species tree topologies identified by previous phylogenetic studies. Parsimony of reconciliation was used as a criterion for selecting a gene tree/species tree pair and provided (1) support for one of the two species trees and for the core topology of the amino acid-derived gene tree, (2) a basis for critiquing fine detail in a poorly resolved region of the gene tree, (3) a set of predicted "missing genes" in some species, (4) clarification of the relationships among the P. falciparum SERAs, and (5) some information about SERA5 and SERA6 orthologs in the rodent malaria parasites. Parsimony of reconciliation and a second criterion, the implied mutational pattern at two key active sites in the SERA proteins, were also seen to be useful supplements to standard "bootstrap" analysis for inferred topologies.
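To make the "reconciliation" step concrete, the following is a minimal LCA-mapping sketch that counts the duplications implied by embedding a gene tree into a species tree. This is the textbook reconciliation rule; the toy trees are illustrative, not the study's SERA data.

```python
# Trees are nested tuples; leaves are species names (species tree) or
# species_gene labels (gene tree).

def clades(tree, out):
    """Collect every clade of a tree as a frozenset of leaf names."""
    if isinstance(tree, str):
        c = frozenset([tree])
    else:
        c = clades(tree[0], out) | clades(tree[1], out)
    out.append(c)
    return c

def lca_map(gene_tree, species_clades, dups):
    """Map each gene-tree node to the smallest species clade containing
    its species; increment dups[0] when a node maps to the same clade
    as one of its children (the LCA-reconciliation duplication rule)."""
    if isinstance(gene_tree, str):
        sp = gene_tree.split("_")[0]          # leaves labeled species_gene
        return min((c for c in species_clades if sp in c), key=len)
    left = lca_map(gene_tree[0], species_clades, dups)
    right = lca_map(gene_tree[1], species_clades, dups)
    node = min((c for c in species_clades if left | right <= c), key=len)
    if node == left or node == right:
        dups[0] += 1
    return node

# Toy species tree ((falciparum, reichenowi), berghei) and a gene tree
# implying one duplication within P. falciparum.
species_clades = []
clades((("falciparum", "reichenowi"), "berghei"), species_clades)
gene_tree = ((("falciparum_SERA5", "falciparum_SERA6"), "reichenowi_SERA"),
             "berghei_SERA")
dups = [0]
lca_map(gene_tree, species_clades, dups)
print("duplications:", dups[0])               # -> 1
```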
Abstract:
MOTIVATION: Regulatory gene networks contain generic modules, such as feedback loops, that are essential for the regulation of many biological functions. The study of the stochastic mechanisms of gene regulation is instrumental for understanding how cells maintain expression at levels commensurate with their biological role, as well as for engineering gene expression switches of appropriate behavior. When the steady-state distribution of gene expression is not known exactly, one must resort to Gillespie algorithms and Monte-Carlo approximations. METHODOLOGY: In this study, we provide new exact formulas and efficient numerical algorithms for computing the steady state of a class of self-regulated genes, and we use them to model the stochastic expression of a gene of interest in an engineered network introduced into mammalian cells. The behavior of the genetic network is then analyzed experimentally in living cells. RESULTS: Stochastic models often reveal counter-intuitive experimental behaviors, and we find that this genetic architecture displays a unimodal behavior in mammalian cells, which was unexpected given its known bimodal response in unicellular organisms. We provide a molecular rationale for this behavior and incorporate it into the mathematical model to explain the experimental results obtained with this network.
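As a concrete example of the Monte-Carlo side of the problem, the sketch below runs a Gillespie simulation of a toy negatively self-regulated gene and samples its steady state. The propensity functions and rate constants are invented for the illustration; the paper's contribution is precisely the exact formulas that make such simulation unnecessary for its gene class.

```python
import numpy as np

def gillespie_selfregulated(k_max=20.0, K=10.0, gamma=1.0,
                            t_end=5000.0, seed=0):
    """Gillespie simulation of a toy self-regulated gene: protein is
    produced at rate k_max*K/(K+p) (negative autoregulation) and
    degraded at rate gamma*p.  Model and rates are illustrative."""
    rng = np.random.default_rng(seed)
    t, p = 0.0, 0
    samples = []
    while t < t_end:
        birth = k_max * K / (K + p)       # repressed production
        death = gamma * p                 # first-order degradation
        total = birth + death             # > 0, since birth > 0
        t += rng.exponential(1.0 / total) # waiting time to next event
        if rng.random() < birth / total:
            p += 1
        else:
            p -= 1
        if t > t_end / 2:                 # discard burn-in, then record
            samples.append(p)
    return np.array(samples)

s = gillespie_selfregulated()
print("steady-state mean, sd:", s.mean(), s.std())
```

Note this records one sample per reaction event rather than time-weighting, which is a rough approximation adequate for a sketch.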
Abstract:
The ENCyclopedia Of DNA Elements (ENCODE) Project aims to identify all functional elements in the human genome sequence. The pilot phase of the Project is focused on a specified 30 megabases (approximately 1%) of the human genome sequence and is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function. The results of this pilot phase will guide future efforts to analyze the entire human genome.
Abstract:
NanoImpactNet (NIN) is a multidisciplinary European Commission-funded network on the environmental, health and safety (EHS) impact of nanomaterials. The 24 founding scientific institutes are leading European research groups active in the fields of nanosafety, nanorisk assessment and nanotoxicology. This 4-year project is the new focal point for information exchange within the research community. Contact with other stakeholders is vital and their needs are being surveyed. NIN is communicating with hundreds of stakeholders: businesses; internet platforms; industry associations; regulators; policy makers; national ministries; international agencies; standard-setting bodies and NGOs concerned with labour rights, EHS or animal welfare. To improve this communication, internet research, a questionnaire distributed via partners and targeted phone calls were used to identify stakeholders' interests and needs. The knowledge gaps and needs for further data mentioned by representatives of all stakeholder groups in the targeted phone calls concerned:
• the potential toxicity and safety hazards of nanomaterials throughout their lifecycles;
• the fate and persistence of nanoparticles in humans, animals and the environment;
• the risks associated with nanoparticle exposure;
• greater participation in the preparation of nomenclature, standards, methodologies, protocols and benchmarks;
• the development of best-practice guidelines;
• voluntary schemes on responsibility;
• databases of materials, research topics and themes, but also of expertise.
These findings suggested that stakeholders and NIN researchers share very similar knowledge needs, and that open communication and the free movement of knowledge will benefit both researchers and industry. Subsequently, NIN organised a workshop focused on building a sustainable multi-stakeholder dialogue, in which specific questions were put to different stakeholder groups to encourage discussion and open communication.
1. What information do stakeholders need from researchers, and why? The discussions of this question confirmed the needs identified in the targeted phone calls.
2. How should information be communicated? While it was agreed that reporting should be enhanced, commercial confidentiality and economic competition were identified as major obstacles. It was recognised that expertise in commercial law and economics is needed for a well-informed treatment of this communication issue.
3. Can engineered nanomaterials be used safely? The idea that nanomaterials are probably safe because some of them have been produced 'for a long time' was questioned, since many materials in common use have proved to be unsafe. The question of safety is also one of public confidence, and new legislation like REACH could help with this issue. Hazards do not materialise if exposure can be avoided or at least significantly reduced; thus, there is a need for information on what can be regarded as acceptable levels of exposure. Finally, it was noted that there is no such thing as a perfectly safe material, only boundaries, and at this moment we do not know where these boundaries lie. The matter of labelling products containing nanomaterials was raised, as in the public mind safety and labelling are connected. This may need to be addressed, since nanomaterials in food, drink and food packaging may be the first safety issue to attract public and media attention, which could have an impact on nanotechnology as a whole.
4. Do we need more or different regulation? Any decision-making process should accommodate the changing level of uncertainty. To address the uncertainties, adaptations of frameworks such as REACH may be indicated for nanomaterials. Regulation is often needed even where voluntary measures are welcome, because it mitigates the effects of competition between industries; data cannot be collected on a voluntary basis, for example. NIN will continue an active stakeholder dialogue to further build interdisciplinary relationships towards a healthy future with nanotechnology.