926 results for Genomic data integration
Abstract:
This project is aimed at helping a group of researchers from the Departamento de Ciencia Animal y de los Alimentos (UAB) who collect genomic data obtained from experiments. The application will consist of two parts. The first is the user side, where users can create projects; insert, modify, or delete data; and query the information already stored in the application. The second is the administrator side, where the administrator can register new users, restore previous versions of the database, delete users, and review the actions performed by users.
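For illustration only, the sketch below shows one way the two-part design described in this abstract could be laid out as a relational schema: regular users create projects and records, and every action is logged so the administrator can audit it. All table, column, and function names are hypothetical; the actual application may be organised quite differently.

```python
import sqlite3

# In-memory database standing in for the application's backing store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users   (id INTEGER PRIMARY KEY, name TEXT, is_admin INTEGER DEFAULT 0);
CREATE TABLE projects(id INTEGER PRIMARY KEY, owner_id INTEGER REFERENCES users(id), title TEXT);
CREATE TABLE records (id INTEGER PRIMARY KEY, project_id INTEGER REFERENCES projects(id), payload TEXT);
CREATE TABLE actions (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id),
                      verb TEXT, target TEXT, at TEXT DEFAULT CURRENT_TIMESTAMP);
""")

def log(user_id, verb, target):
    # Every insert/update/delete is recorded so the administrator can review it later.
    conn.execute("INSERT INTO actions(user_id, verb, target) VALUES (?, ?, ?)",
                 (user_id, verb, target))

conn.execute("INSERT INTO users(name, is_admin) VALUES ('admin', 1)")
conn.execute("INSERT INTO projects(owner_id, title) VALUES (1, 'pilot experiment')")
log(1, "create", "project:1")
conn.commit()
```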
Abstract:
The current drug options for the treatment of chronic Chagas disease have not been sufficient, and high hopes have been placed on the use of genomic data from the human parasite Trypanosoma cruzi to identify new drug targets and develop appropriate treatments for both acute and chronic Chagas disease. However, the lack of a complete assembly of the genomic sequence and the presence of many predicted proteins with unknown or uncertain functions have hampered our complete view of the parasite's metabolic pathways. Moreover, pinpointing new drug targets has proven to be more complex than anticipated and has revealed large holes in our understanding of metabolic pathways and their integrated regulation, not only for this parasite, but for many other similar pathogens. Using an in silico comparative study of pathway annotation and searching for analogous and specific enzymes, we have been able to predict a considerable number of additional enzymatic functions in T. cruzi. Here we focus on the energetic pathways, such as glycolysis, the pentose phosphate shunt, the Krebs cycle and lipid metabolism. We point out many enzymes that are analogous to those of the human host, which could be potential new therapeutic targets.
Abstract:
Major climatic and geological events, as well as population history (secondary contacts), have generated cycles of population isolation and connection over both long and short periods. Recent empirical and theoretical studies suggest that fast evolutionary processes might be triggered by such events, as commonly illustrated in ecology by the adaptive radiation of cichlid fishes (isolation and reconnection of lakes and watersheds) and in epidemiology by the fast adaptation of the influenza virus (isolation and reconnection in hosts). We test whether cyclic population isolation and connection provide the raw material (standing genetic variation) for species evolution and diversification. Our analytical results demonstrate that population isolation and connection can provide populations with a large excess of genetic diversity compared with what is expected at equilibrium. This excess is either cyclic (high allele turnover) or accumulates over time, depending on the duration of the isolation and connection periods and on the mutation rate. We show that diversification rates of animal clades are associated with specific periods of climatic cycles in the Quaternary. We finally discuss the importance of our results for macroevolutionary patterns and for the inference of population history from genomic data.
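The abstract contrasts the observed diversity with "what is expected at equilibrium". Purely as a point of reference, the standard mutation-drift equilibrium heterozygosity under the infinite-alleles model is shown below; the paper's own analytical model is not reproduced here and may differ.

```latex
% Expected heterozygosity at mutation--drift equilibrium (infinite-alleles model),
% with effective population size N_e and mutation rate \mu.
H_{\mathrm{eq}} = \frac{\theta}{1+\theta}, \qquad \theta = 4 N_e \mu
```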
Abstract:
One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set is of considerable importance for human health and medicine. Though progress has been made on computational gene identification in terms of both methods and accuracy evaluation measures, most of the sequence sets on which the programs are tested are short genomic sequences, and there is concern that these accuracy measures may not extrapolate well to larger, more challenging data sets. Given the absence of experimentally verified large genomic data sets, we constructed a semiartificial test set comprising a number of short single-gene genomic sequences with randomly generated intergenic regions. This test set, which should still present an easier problem than real human genomic sequence, mimics the approximately 200 kb long BACs being sequenced. In our experiments with these longer genomic sequences, the accuracy of GENSCAN, one of the most accurate ab initio gene prediction programs, dropped significantly, although its sensitivity remained high. Conversely, the accuracy of similarity-based programs, such as GENEWISE, PROCRUSTES, and BLASTX, was not affected significantly by the presence of random intergenic sequence, but depended on the strength of the similarity to the protein homolog. As expected, the accuracy dropped if the models were built using more distant homologs, and we were able to quantitatively estimate this decline. However, the specificities of these techniques are still rather good even when the similarity is weak, which is a desirable characteristic for driving expensive follow-up experiments. Our experiments suggest that though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methodology.
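A minimal sketch of the kind of test-set construction described above: single-gene sequences are concatenated with randomly generated intergenic spacers until a roughly BAC-sized (about 200 kb) region is obtained. The spacer lengths and placeholder gene sequences are invented for illustration and are not the study's actual parameters.

```python
import random

random.seed(0)

def random_intergenic(length):
    # Uniform random nucleotides as a simple stand-in for intergenic sequence.
    return "".join(random.choice("ACGT") for _ in range(length))

def build_semiartificial(gene_seqs, target_len=200_000):
    # Interleave random spacers and gene sequences until the target size is reached.
    parts, total = [], 0
    for seq in gene_seqs:
        spacer = random_intergenic(random.randint(5_000, 20_000))
        parts.extend([spacer, seq])
        total += len(spacer) + len(seq)
        if total >= target_len:
            break
    return "".join(parts)

genes = [random_intergenic(2_000) for _ in range(20)]  # placeholders for real single-gene sequences
test_sequence = build_semiartificial(genes)
print(len(test_sequence))
```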
Abstract:
This review describes the advances in malaria antigen discovery and vaccine development using the long synthetic peptide platforms that have been made available during the past 5 years. The most recent technical developments regarding peptide synthesis with the optimized production of large synthetic fragments are discussed. Clinical trials of long synthetic peptides are also reviewed. These trials demonstrated that long synthetic peptides are safe and immunogenic when formulated with various adjuvants. In addition, long synthetic peptides can elicit an antibody response in humans and have demonstrated inhibitory activity against parasite growth in vitro. Finally, new approaches to exploit the abundance of genomic data and the flexibility and speed of peptide synthesis are proposed.
Abstract:
Psoriasis is one of the most common chronic, inflammatory, T-cell-mediated autoimmune diseases. Over the past decade, increased knowledge of disease pathogenesis has fundamentally changed psoriasis treatment, with the introduction of biologics, and this has led to a multitude of improved selective targets providing potential therapeutic options. Indeed, numerous pathogenesis-based treatments are currently in development, as psoriasis has also become increasingly relevant for proof-of-concept studies. The purpose of this review was to summarize current knowledge of psoriasis immunopathogenesis, focusing on the T-cell-mediated immune response and its initiation. The authors describe recent advances in psoriasis treatment and discuss pathogenesis-based therapies that are currently in development or which could be envisioned for the future. Although current biologics are well tolerated, several issues such as long-term efficacy, long-term safety, and high costs keep driving the search for new and better therapies. With further advances in understanding disease pathogenesis, more genomic data from psoriasis patients becoming available, and potentially the identification of autoantigens in psoriasis, current research should lead to the development of a growing arsenal of improved targeted treatments and to further breakthrough immunotherapies.
Abstract:
Genome-scale metabolic network reconstructions are now routinely used in the study of metabolic pathways, their evolution and design. The development of such reconstructions involves the integration of information on reactions and metabolites from the scientific literature as well as public databases and existing genome-scale metabolic models. The reconciliation of discrepancies between data from these sources generally requires significant manual curation, which constitutes a major obstacle in efforts to develop and apply genome-scale metabolic network reconstructions. In this work, we discuss some of the major difficulties encountered in the mapping and reconciliation of metabolic resources and review three recent initiatives that aim to accelerate this process, namely BKM-react, MetRxn and MNXref (presented in this article). Each of these resources provides a pre-compiled reconciliation of many of the most commonly used metabolic resources. By reducing the time required for manual curation of metabolite and reaction discrepancies, these resources aim to accelerate the development and application of high-quality genome-scale metabolic network reconstructions and models.
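The core reconciliation step can be pictured as mapping every (database, identifier) pair onto a shared namespace before reactions are compared or merged. The toy cross-reference table below is invented for illustration; resources such as MNXref maintain curated tables of this kind at far larger scale.

```python
# Hypothetical cross-references from source namespaces to a reconciled identifier.
xrefs = {
    ("kegg",  "C00031"): "GLC_reconciled",
    ("chebi", "17634"):  "GLC_reconciled",
    ("kegg",  "C00002"): "ATP_reconciled",
}

def reconcile(reaction):
    # Replace each (namespace, id) pair with its reconciled identifier,
    # keeping the original pair when no cross-reference is known.
    return [xrefs.get(m, m) for m in reaction]

r1 = [("kegg", "C00031"), ("kegg", "C00002")]
r2 = [("chebi", "17634"), ("kegg", "C00002")]
print(reconcile(r1) == reconcile(r2))  # True: the two source reactions now match
```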
Abstract:
The following work discusses the ETL (Extract, Transform and Load) process and the tools associated with it. A theoretical framework for the process is presented, distinguishing its main stages (Extract, Transform and Load) and exploring the concept in more depth. It also covers ETL tools, both commercial and open source, with emphasis on Talend Open Studio for Data Integration, since this tool is used to implement ETL systems at Unitel T+; a practical case study of these systems is presented.
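As a language-agnostic reminder of the three stages, the sketch below implements a tiny extract-transform-load flow in plain Python. In the work described, these steps would instead be built as Talend Open Studio jobs; the file name, cleaning rule, and target table here are invented.

```python
import csv
import sqlite3

def extract(path):
    # Extract: read raw rows from a source file.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: normalise names and drop rows without an amount.
    return [
        {"customer": r["customer"].strip().upper(), "amount": float(r["amount"])}
        for r in rows if r.get("amount")
    ]

def load(rows, db):
    # Load: write the cleaned rows into the target store.
    conn = sqlite3.connect(db)
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (:customer, :amount)", rows)
    conn.commit()
    conn.close()

# load(transform(extract("sales.csv")), "warehouse.db")
```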
Abstract:
Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation and/or inference algorithms. This paper considers a data-driven approach to modelling uncertainty in spatial predictions. The proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic features and to describe the stochastic variability and non-uniqueness of spatial properties. It is able to capture and preserve key spatial dependencies such as connectivity, which is often difficult to achieve with two-point geostatistical models. Semi-supervised SVR is designed to integrate various kinds of conditioning data and to learn dependencies from them. A stochastic semi-supervised SVR model is integrated into a Bayesian framework to quantify uncertainty with multiple models fitted to dynamic observations. The developed approach is illustrated with a reservoir case study. The resulting probabilistic production forecasts are described by uncertainty envelopes.
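The sketch below shows only the basic building block: a support vector regression fitted to conditioning data at known locations and evaluated on a grid, using plain supervised SVR from scikit-learn. It is not the paper's stochastic semi-supervised extension or its Bayesian uncertainty framework, and the synthetic well data are invented.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
wells = rng.uniform(0, 1000, size=(30, 2))             # (x, y) of conditioning points
porosity = 0.2 + 0.05 * np.sin(wells[:, 0] / 150.0)    # synthetic "observed" property

# Fit the regression to the conditioning data.
model = SVR(kernel="rbf", C=10.0, epsilon=0.005).fit(wells, porosity)

# Predict the property on a regular grid covering the model area.
gx, gy = np.meshgrid(np.linspace(0, 1000, 50), np.linspace(0, 1000, 50))
grid = np.column_stack([gx.ravel(), gy.ravel()])
prediction = model.predict(grid).reshape(gx.shape)
```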
Abstract:
BACKGROUND: Fourmidable is an infrastructure to curate and share the emerging genetic, molecular, and functional genomic data and protocols for ants. DESCRIPTION: The Fourmidable assembly pipeline groups nucleotide sequences into clusters before independently assembling each cluster. Subsequently, assembled sequences are annotated via Interproscan and BLAST against general and insect-specific databases. Gene-specific information can be retrieved using gene identifiers, searching for similar sequences or browsing through inferred Gene Ontology annotations. The database will readily scale as ultra-high throughput sequence data and sequences from additional species become available. CONCLUSION: Fourmidable currently houses EST data from two ant species and microarray gene expression data for one of these. Fourmidable is publicly available at http://fourmidable.unil.ch.
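The control flow of the "cluster first, then assemble each cluster independently" strategy can be illustrated with a toy grouping of sequences by shared k-mers, as below. Real pipelines such as Fourmidable's rely on dedicated clustering and assembly tools; this sketch only shows the overall structure.

```python
def kmers(seq, k=8):
    # All overlapping substrings of length k.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def cluster(seqs, k=8):
    # Greedy grouping: a sequence joins the first cluster it shares a k-mer with.
    groups = []
    for s in seqs:
        for g in groups:
            if kmers(s, k) & g["kmers"]:
                g["members"].append(s)
                g["kmers"] |= kmers(s, k)
                break
        else:
            groups.append({"members": [s], "kmers": kmers(s, k)})
    return [g["members"] for g in groups]

reads = ["ACGTACGTACGTAA", "CGTACGTACGTAAT", "TTTTGGGGCCCCAA"]
for i, members in enumerate(cluster(reads)):
    # Each cluster would then be passed to an assembler separately.
    print(f"cluster {i}: {len(members)} sequences")
```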
Abstract:
Advanced neuroinformatics tools are required for methods of connectome mapping, analysis, and visualization. The inherent multi-modality of connectome datasets poses new challenges for data organization, integration, and sharing. We have designed and implemented the Connectome Viewer Toolkit - a set of free and extensible open source neuroimaging tools written in Python. The key components of the toolkit are as follows: (1) The Connectome File Format is an XML-based container format to standardize multi-modal data integration and structured metadata annotation. (2) The Connectome File Format Library enables management and sharing of connectome files. (3) The Connectome Viewer is an integrated research and development environment for visualization and analysis of multi-modal connectome data. The Connectome Viewer's plugin architecture supports extensions with network analysis packages and an interactive scripting shell, to enable easy development and community contributions. Integration with tools from the scientific Python community allows the leveraging of numerous existing libraries for powerful connectome data mining, exploration, and comparison. We demonstrate the applicability of the Connectome Viewer Toolkit using Diffusion MRI datasets processed by the Connectome Mapper. The Connectome Viewer Toolkit is available from http://www.cmtk.org/
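Purely as an illustration of the idea of an XML-based container that bundles multi-modal data with structured metadata, the snippet below builds a small XML document. The element names are invented; this is not the actual Connectome File Format schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical container: metadata plus references to multi-modal data files.
root = ET.Element("container", version="0.1")
meta = ET.SubElement(root, "metadata")
ET.SubElement(meta, "subject").text = "sub-01"

data = ET.SubElement(root, "data")
ET.SubElement(data, "volume", modality="dMRI", src="dwi.nii.gz")
ET.SubElement(data, "network", modality="connectome", src="network.gpickle")

print(ET.tostring(root, encoding="unicode"))
```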
Abstract:
To identify common variants influencing body mass index (BMI), we analyzed genome-wide association data from 16,876 individuals of European descent. After previously reported variants in FTO, the strongest association signal (rs17782313, P = 2.9 × 10^-6) mapped 188 kb downstream of MC4R (melanocortin-4 receptor), mutations of which are the leading cause of monogenic severe childhood-onset obesity. We confirmed the BMI association in 60,352 adults (per-allele effect = 0.05 Z-score units; P = 2.8 × 10^-15) and 5,988 children aged 7-11 (0.13 Z-score units; P = 1.5 × 10^-8). In case-control analyses (n = 10,583), the odds for severe childhood obesity reached 1.30 (P = 8.0 × 10^-11). Furthermore, we observed overtransmission of the risk allele to obese offspring in 660 families (pedigree disequilibrium test average, PDT-avg: P = 2.4 × 10^-4). The SNP location and patterns of phenotypic associations are consistent with effects mediated through altered MC4R function. Our findings establish that common variants near MC4R influence fat mass, weight and obesity risk at the population level and reinforce the need for large-scale data integration to identify variants influencing continuous biomedical traits.
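For readers unfamiliar with per-allele effect sizes, the sketch below simulates the additive model behind a figure such as "0.05 Z-score units per allele": the BMI Z-score is regressed on the count of risk alleles (0, 1 or 2). The data are simulated; real analyses also adjust for covariates such as age, sex and ancestry.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 10_000
allele_count = rng.binomial(2, 0.24, size=n)             # risk-allele dosage per individual
bmi_z = 0.05 * allele_count + rng.normal(0, 1, size=n)   # simulated per-allele effect of 0.05

# Simple additive (per-allele) regression of the trait on allele dosage.
res = stats.linregress(allele_count, bmi_z)
print(f"per-allele effect = {res.slope:.3f}, P = {res.pvalue:.2e}")
```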
Abstract:
Simulated-annealing-based conditional simulations provide a flexible means of quantitatively integrating diverse types of subsurface data. Although such techniques are being increasingly used in hydrocarbon reservoir characterization studies, their potential in environmental, engineering and hydrological investigations is still largely unexploited. Here, we introduce a novel simulated annealing (SA) algorithm geared towards the integration of high-resolution geophysical and hydrological data which, compared to more conventional approaches, provides significant advancements in the way that large-scale structural information in the geophysical data is accounted for. Model perturbations in the annealing procedure are made by drawing from a probability distribution for the target parameter conditioned on the geophysical data. This is the only place where geophysical information is utilized in our algorithm, which is in marked contrast to other approaches where model perturbations are made through the swapping of values in the simulation grid and agreement with soft data is enforced through a correlation coefficient constraint. Another major feature of our algorithm is the way in which available geostatistical information is utilized. Instead of constraining realizations to match a parametric target covariance model over a wide range of spatial lags, we constrain the realizations only at smaller lags, where the available geophysical data cannot provide enough information. This allows the larger-scale subsurface features resolved by the geophysical data to exert much greater control over the output realizations. Further, since the only component of the SA objective function required in our approach is a covariance constraint at small lags, our method has improved convergence and computational efficiency over more traditional methods. Here, we present the results of applying our algorithm to the integration of porosity log and tomographic crosshole georadar data to generate stochastic realizations of the local-scale porosity structure. Our procedure is first tested on a synthetic data set, and then applied to data collected at the Boise Hydrogeophysical Research Site.
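A schematic version of the annealing loop described above is sketched below: at each step one cell is redrawn from a distribution conditioned on soft (geophysically derived) information, and the move is accepted or rejected against an objective that only penalises mismatch of the short-lag covariance. The conditional distribution, target covariance and cooling schedule are invented placeholders, not the authors' actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
target_lag1_cov = 0.005                 # desired covariance at lag 1 (small lags only)
soft_mean = np.linspace(0.1, 0.3, n)    # stand-in for geophysically derived local means

def objective(m):
    # Penalise mismatch of the lag-1 covariance only.
    lag1 = np.cov(m[:-1], m[1:])[0, 1]
    return (lag1 - target_lag1_cov) ** 2

model = rng.normal(soft_mean, 0.1)
temp = 1.0
for step in range(5000):
    i = rng.integers(n)
    candidate = model.copy()
    candidate[i] = rng.normal(soft_mean[i], 0.1)   # draw from the conditional distribution
    delta = objective(candidate) - objective(model)
    if delta < 0 or rng.random() < np.exp(-delta / temp):
        model = candidate                          # accept the perturbation
    temp *= 0.999                                  # geometric cooling schedule
```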
Abstract:
The main objective of this final degree project (TFC) is the construction and exploitation of a data warehouse. The work is based on a practical case study presenting a scenario in which a data warehouse must be developed for the Fundació d'Estudis per a la Conducció Responsable, which wishes to study the evolution of the number of motor-vehicle trips in Catalonia and to analyze possible correlations between means of transport, driver profiles and several road-safety variables.
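One conventional way to organise such a warehouse is a star schema with a fact table of trips surrounded by dimensions for transport mode, driver profile and road-safety conditions, sketched below. Table and column names are invented for illustration and are not taken from the project itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_mode   (mode_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE dim_driver (driver_id INTEGER PRIMARY KEY, age_band TEXT, experience_years INTEGER);
CREATE TABLE dim_safety (safety_id INTEGER PRIMARY KEY, weather TEXT, road_type TEXT);
CREATE TABLE fact_trips (trip_id INTEGER PRIMARY KEY,
                         mode_id   INTEGER REFERENCES dim_mode(mode_id),
                         driver_id INTEGER REFERENCES dim_driver(driver_id),
                         safety_id INTEGER REFERENCES dim_safety(safety_id),
                         year INTEGER, n_trips INTEGER);
""")

# A typical analytical query: number of trips per year and transport mode.
query = """
SELECT year, description, SUM(n_trips)
FROM fact_trips JOIN dim_mode USING (mode_id)
GROUP BY year, description
"""
print(conn.execute(query).fetchall())   # empty until the warehouse is loaded
```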
Abstract:
Software configuration management (SCM) is an important part of software projects. It consists of SCM planning, change management, version management, building, packaging, configuration status tracking and configuration auditing. The software configuration management database (SCM DB) is intended to store SCM-related data in one place where everyone can find it. The SCM DB is a relational database with a WWW user interface. The database stores the SCM infrastructure, SCM resources, SCM workspaces, integration planning data, packaging reports and instructions, change management data and tool management data. The database has many users. SCM managers store general information in the database, integration managers store release information for the integration plan, and those responsible for packaging store packaging reports. Software designers submit change requests to the database, which the change control board then processes; through the database they can also view error reports. Tool coordination is likewise handled through the data stored in the database. For reading, the database can be used by everyone from testing to design, for example to check schedules. Information stored by the packaging tools about software blocks in different package versions can also be read from the database, and the packaging tools, or those responsible for packaging, obtain the source information for the packaging tools directly from the database.
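A toy model of the change-request flow mentioned above (designers file requests, a change control board decides, and the outcome stays visible to all users) is sketched below. The states and fields are illustrative only and do not reflect the real SCM DB schema.

```python
from dataclasses import dataclass, field

@dataclass
class ChangeRequest:
    id: int
    title: str
    author: str
    status: str = "submitted"            # submitted -> approved / rejected -> implemented
    history: list = field(default_factory=list)

    def decide(self, board_member, approve):
        # The change control board records its decision in the request history.
        self.status = "approved" if approve else "rejected"
        self.history.append((board_member, self.status))

cr = ChangeRequest(1, "Fix build script for release 2.3", "designer_a")
cr.decide("ccb_chair", approve=True)
print(cr.status, cr.history)
```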