926 results for Genomic data integration
Abstract:
Porphyromonas gingivalis is a key periodontal pathogen that has been implicated in the etiology of chronic adult periodontitis. Our aim was to develop a protein-based vaccine for the prevention and/or treatment of this disease. We used a whole-genome sequencing approach to identify potential vaccine candidates. From the genomic sequence, we selected 120 genes using a series of bioinformatics methods. The selected genes were cloned for expression in Escherichia coli and screened with P. gingivalis antisera before purification and testing in an animal model. Two of these recombinant proteins (PG32 and PG33) demonstrated significant protection in the animal model, while a number were reactive with various antisera. This process allows the rapid identification of vaccine candidates from genomic data. (C) 2001 Elsevier Science Ltd. All rights reserved.
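The abstract does not spell out which bioinformatics filters were applied to arrive at the 120 candidates; the short Python sketch below only illustrates the general reverse-vaccinology idea of keeping ORFs predicted to encode surface-exposed, non-self proteins. All feature names and thresholds are hypothetical, not the study's actual criteria.

```python
# Hypothetical reverse-vaccinology style filter; the real selection criteria
# behind the 120 P. gingivalis candidate genes are not given in the abstract.
def select_vaccine_candidates(orfs):
    """orfs: iterable of dicts holding predicted features for each open reading frame."""
    candidates = []
    for orf in orfs:
        surface_exposed = orf.get("has_signal_peptide") or orf.get("outer_membrane_predicted")
        few_tm_helices = orf.get("tm_helices", 0) <= 1      # avoid deeply membrane-embedded proteins
        non_self = not orf.get("human_homolog", False)       # avoid candidates resembling host proteins
        if surface_exposed and few_tm_helices and non_self:
            candidates.append(orf["gene_id"])
    return candidates
```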
Abstract:
Dissertation submitted to obtain the Master's degree in Informatics Engineering
Abstract:
The MAP-i Doctoral Program of the Universities of Minho, Aveiro and Porto
Abstract:
DNA microarrays are one of the most widely used technologies for gene expression measurement. However, there are several distinct microarray platforms, from different manufacturers, each with its own measurement protocol, resulting in data that can hardly be compared or directly integrated. Integrating data from multiple sources aims to improve the reliability of statistical tests by mitigating the data dimensionality problem. The integration of heterogeneous DNA microarray platforms comprises a set of tasks ranging from the re-annotation of the features used for gene expression measurement to data normalization and batch effect elimination. In this work, a complete methodology for gene expression data integration and application is proposed, comprising a transcript-based re-annotation process and several methods for batch effect attenuation. The integrated data will be used to select the best feature set and learning algorithm for a brain tumor classification case study. The integration will consider data from heterogeneous Agilent and Affymetrix platforms, collected from public gene expression databases such as The Cancer Genome Atlas and Gene Expression Omnibus.
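As a rough illustration of the batch-effect attenuation step, the sketch below mean-centres each gene within each platform or batch and then restores the global gene means. It is a minimal baseline under the assumption of a samples-by-genes matrix with one batch label per sample, not the thesis's actual pipeline.

```python
import numpy as np

def attenuate_batch_effect(expr, batches):
    """expr: samples x genes expression matrix; batches: one batch/platform label per sample.
    Per-gene mean-centring within each batch, a crude batch-effect attenuation baseline."""
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    adjusted = expr.copy()
    for b in np.unique(batches):
        idx = batches == b
        adjusted[idx] -= expr[idx].mean(axis=0)    # remove batch-specific gene means
    return adjusted + expr.mean(axis=0)            # restore overall expression levels
```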
Abstract:
Somatic copy number aberrations (CNA) represent a mutation type encountered in the majority of cancer genomes. Here, we present the 2014 edition of arrayMap (http://www.arraymap.org), a publicly accessible collection of pre-processed oncogenomic array data sets and CNA profiles, representing a vast range of human malignancies. Since the initial release, we have enhanced this resource both in content and, especially, in data mining support. The 2014 release of arrayMap contains more than 64,000 genomic array data sets, representing about 250 tumor diagnoses. Data sets included in arrayMap have been assembled from public repositories as well as additional resources, and integrated by applying custom processing pipelines. Online tools have been upgraded for more flexible array data visualization, including options for processing user-provided, non-public data sets. Data integration has been improved by mapping to multiple editions of the human reference genome, with the majority of the data now available for both the UCSC hg18 and GRCh37 assemblies. The large amount of tumor CNA data in arrayMap can be freely downloaded to support data mining projects and to explore special events such as chromothripsis-like genome patterns.
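To give a concrete sense of how downloaded CNA segment profiles might be mined for chromothripsis-like patterns, here is a hedged sketch that simply counts copy-number state switches along each chromosome. The oscillation criterion and threshold are illustrative conventions, not arrayMap's actual scoring.

```python
from collections import defaultdict

def count_cn_switches(segments):
    """segments: list of (chromosome, start, end, copy_number) tuples, ordered by position."""
    switches = defaultdict(int)
    last_cn = {}
    for chrom, start, end, cn in segments:
        if chrom in last_cn and cn != last_cn[chrom]:
            switches[chrom] += 1          # a change of copy-number state along the chromosome
        last_cn[chrom] = cn
    return dict(switches)

def chromothripsis_like(segments, min_switches=10):
    """Flag chromosomes whose profiles oscillate at least min_switches times."""
    return [chrom for chrom, n in count_cn_switches(segments).items() if n >= min_switches]
```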
Abstract:
In this paper we review the impact that the availability of the Schistosoma mansoni genome sequence and annotation has had on schistosomiasis research. Easy access to genomic information is important, and several types of data are currently being integrated, such as proteomics, microarray and polymorphic locus data. Access to the genome annotation and powerful means of extracting information are major resources for the research community.
Abstract:
The recent revolution in genomic data generation techniques has led to exponential growth in the amount of data produced, making it more necessary than ever to work on optimizing the management and handling of this information. In this work, three facets of the problem have been addressed: the dissemination of the information, the integration of data from multiple sources and, finally, its visualization. Building on the Distributed Annotation System (DAS), we have created an application for the automated creation of new, standardized and programmatically accessible data sources from simple data files. This software, easyDAS, is in operation at the European Bioinformatics Institute. The system facilitates and encourages the sharing and dissemination of genomic data in usable formats. jsDAS is a DAS client library that makes it quick and simple to incorporate DAS data into any web application. Taking advantage of what DAS offers, it can integrate data from multiple sources in a coherent and robust way. GenExp is a prototype of a highly interactive web-based genome browser that facilitates the exploration of genomes in real time. It can integrate data from any DAS source and render it on the client side using the latest advances in web technologies.
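For readers unfamiliar with DAS, the sketch below shows roughly how a client retrieves annotation features from a DAS 1.x source over HTTP and parses the returned DASGFF XML. The server URL and source name are placeholders, not the easyDAS deployment mentioned above.

```python
import requests
import xml.etree.ElementTree as ET

# Placeholder DAS source; substitute a real server and source name.
DAS_SOURCE = "http://example.org/das/my_source"

def fetch_das_features(segment):
    """Request the DASGFF features document for a segment such as 'chr1:1000,2000'."""
    resp = requests.get(f"{DAS_SOURCE}/features", params={"segment": segment}, timeout=30)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    features = []
    for feat in root.iter("FEATURE"):          # DASGFF wraps each annotation in a FEATURE element
        features.append({
            "id": feat.get("id"),
            "start": feat.findtext("START"),
            "end": feat.findtext("END"),
            "type": feat.findtext("TYPE"),
        })
    return features
```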
Abstract:
Gene expression patterns are a key feature in understanding gene function, notably in development. Comparing gene expression patterns between animals is a major step in the study of gene function as well as of animal evolution, and it also provides a link between genes and phenotypes. We have therefore developed Bgee, a database designed to compare expression patterns between animals, by implementing ontologies describing the anatomies and developmental stages of species, and then defining homology relationships between anatomies and comparison criteria between developmental stages. To define homology relationships between anatomical features we have developed the software Homolonto, which uses a modified ontology-alignment approach to propose homology relationships between ontologies. Bgee then uses these aligned ontologies, onto which heterogeneous expression data types are mapped; these currently include microarray and EST data.
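A toy sketch of the comparison principle: expression calls from different species are grouped through homology relationships between anatomical terms. The term identifiers and the mapping below are invented for illustration and do not reflect Bgee's actual ontologies.

```python
# Hypothetical homology mapping from species-specific anatomical terms to a shared key.
homology = {("human", "UBERON:0000955"): "HOM:brain",
            ("zebrafish", "ZFA:0000008"): "HOM:brain"}

# Hypothetical expression calls: (species, anatomical term, gene, call).
calls = [("human", "UBERON:0000955", "GENE_A", "expressed"),
         ("zebrafish", "ZFA:0000008", "gene_a_ortholog", "expressed")]

by_homologous_organ = {}
for species, term, gene, call in calls:
    key = homology.get((species, term))
    if key:                                       # group calls that map to homologous structures
        by_homologous_organ.setdefault(key, []).append((species, gene, call))
print(by_homologous_organ)
```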
Abstract:
Relationships between porosity and hydraulic conductivity tend to be strongly scale- and site-dependent and are thus very difficult to establish. As a result, hydraulic conductivity distributions inferred from geophysically derived porosity models must be calibrated using some measurement of aquifer response. This type of calibration is potentially very valuable as it may allow for transport predictions within the considered hydrological unit at locations where only geophysical measurements are available, thus reducing the number of well tests required and thereby the costs of management and remediation. Here, we explore this concept through a series of numerical experiments. Considering the case of porosity characterization in saturated heterogeneous aquifers using crosshole ground-penetrating radar and borehole porosity log data, we use tracer test measurements to calibrate a relationship between porosity and hydraulic conductivity that allows the best prediction of the observed hydrological behavior. To examine the validity and effectiveness of the obtained relationship, we examine its performance at alternate locations not used in the calibration procedure. Our results indicate that this methodology allows us to obtain remarkably reliable hydrological predictions throughout the considered hydrological unit based on the geophysical data only. This was also found to be the case when significant uncertainty was considered in the underlying relationship between porosity and hydraulic conductivity.
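As a schematic of the calibration concept (not the paper's numerical setup), the sketch below assumes a power-law relationship K = a * phi**b between porosity and hydraulic conductivity, predicts advective tracer arrival times with a crude plug-flow proxy, and keeps the (a, b) pair that best reproduces the observed arrival times. The proxy, grids and the assumed hydraulic gradient are illustrative.

```python
import numpy as np

def calibrate_phi_K(phi, t_obs, L, i=0.01,
                    a_grid=np.logspace(-7, -2, 30), b_grid=np.linspace(1, 6, 30)):
    """phi, t_obs: porosity and observed tracer arrival time at each observation point;
    L: travel distance(s); i: assumed hydraulic gradient. Returns the best (a, b) and its RMSE."""
    best, best_rmse = None, np.inf
    for a in a_grid:
        for b in b_grid:
            K = a * phi ** b                       # candidate petrophysical relationship
            t_pred = L * phi / (K * i)             # crude plug-flow advective travel-time proxy
            rmse = np.sqrt(np.mean((t_pred - t_obs) ** 2))
            if rmse < best_rmse:
                best, best_rmse = (a, b), rmse
    return best, best_rmse
```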
Abstract:
Many eukaryote organisms are polyploid. However, despite their importance, evolutionary inference of polyploid origins and modes of inheritance has been limited by a need for analyses of allele segregation at multiple loci using crosses. The increasing availability of sequence data for nonmodel species now allows the application of established approaches for the analysis of genomic data in polyploids. Here, we ask whether approximate Bayesian computation (ABC), applied to realistic traditional and next-generation sequence data, allows correct inference of the evolutionary and demographic history of polyploids. Using simulations, we evaluate the robustness of evolutionary inference by ABC for tetraploid species as a function of the number of individuals and loci sampled, and the presence or absence of an outgroup. We find that ABC adequately retrieves the recent evolutionary history of polyploid species on the basis of both old and new sequencing technologies. The application of ABC to sequence data from diploid and polyploid species of the plant genus Capsella confirms its utility. Our analysis strongly supports an allopolyploid origin of C. bursa-pastoris about 80 000 years ago. This conclusion runs contrary to previous findings based on the same data set but using an alternative approach and is in agreement with recent findings based on whole-genome sequencing. Our results indicate that ABC is a promising and powerful method for revealing the evolution of polyploid species, without the need to attribute alleles to a homeologous chromosome pair. The approach can readily be extended to more complex scenarios involving higher ploidy levels.
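To make the ABC idea concrete, here is a minimal rejection-sampling sketch: parameters are drawn from the prior, data are simulated, and draws whose summary statistic falls close to the observed one are retained as an approximate posterior sample. The toy simulator and summary statistic stand in for the coalescent simulations of tetraploid sequence data used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def abc_rejection(observed_summary, simulate, prior_sampler,
                  n_draws=100_000, tolerance=0.05):
    """Basic ABC rejection sampler over a single parameter and summary statistic."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler()                       # draw a parameter value from the prior
        summary = simulate(theta)                     # summary statistic of the simulated data
        if abs(summary - observed_summary) < tolerance:
            accepted.append(theta)                    # keep draws close to the observation
    return np.array(accepted)                         # sample from the approximate posterior

# Toy usage: infer a population-scaled mutation rate from mean pairwise diversity.
obs_pi = 0.008
posterior = abc_rejection(
    observed_summary=obs_pi,
    simulate=lambda theta: rng.normal(theta, 0.002),  # stand-in for a coalescent simulator
    prior_sampler=lambda: rng.uniform(0.0, 0.05),
    tolerance=0.001,
)
```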
Abstract:
The present paper advocates for the creation of a federated, hybrid database in the cloud, integrating law data from all available public sources in one single open-access system, adding, in the process, relevant metadata to the indexed documents, including the identification of social and semantic entities and the relationships between them, using linked open data techniques and standards such as RDF. Examples of potential benefits and applications of this approach are also provided, including, among others, experiences from our previous research, in which data integration, graph databases and social and semantic network analysis were used to identify power relations, litigation dynamics and cross-reference patterns both intra- and inter-institutionally, covering most of the world's international economic courts.
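A small sketch of the linked-open-data representation discussed above, using the rdflib Python library: a decision is linked to its issuing court and to a cited decision as RDF triples and exported as Turtle. The namespace, predicates and document URIs are hypothetical, not an existing legal vocabulary.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

# Hypothetical vocabulary for illustration only.
LAW = Namespace("http://example.org/law/")

g = Graph()
g.bind("law", LAW)

# A decision, the court that issued it, and a cross-reference to another decision.
g.add((LAW.decision_2014_17, RDF.type, LAW.Decision))
g.add((LAW.decision_2014_17, LAW.issuedBy, LAW.court_AppellateBody))
g.add((LAW.decision_2014_17, LAW.cites, LAW.decision_2009_03))
g.add((LAW.decision_2014_17, LAW.hasTitle, Literal("Hypothetical dispute ruling")))

print(g.serialize(format="turtle"))   # export for loading into an open-access triple store
```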
Abstract:
Owing to recent advances in genomic technologies, personalized oncology is poised to fundamentally alter cancer therapy. In this paradigm, the mutational and transcriptional profiles of tumors are assessed, and personalized treatments are designed based on the specific molecular abnormalities relevant to each patient's cancer. To date, such approaches have yielded impressive clinical responses in some patients. However, a major limitation of this strategy has also been revealed: the vast majority of tumor mutations are not targetable by current pharmacological approaches. Immunotherapy offers a promising alternative to exploit tumor mutations as targets for clinical intervention. Mutated proteins can give rise to novel antigens (called neoantigens) that are recognized with high specificity by patient T cells. Indeed, neoantigen-specific T cells have been shown to underlie clinical responses to many standard treatments and immunotherapeutic interventions. Moreover, studies in mouse models targeting neoantigens, and early results from clinical trials, have established proof of concept for personalized immunotherapies targeting neoantigens identified by next-generation sequencing. Here, we review basic immunological principles related to T-cell recognition of neoantigens, and we examine recent studies that use genomic data to design personalized immunotherapies. We discuss the opportunities and challenges that lie ahead on the road to improving patient outcomes by incorporating immunotherapy into the paradigm of personalized oncology.
Abstract:
Because of the increasing availability of different kinds of business intelligence technologies and tools, it is easy to fall into the illusion that new technology will automatically solve a company's data management and reporting problems. Management, however, is not only about managing technology but also about managing processes and people. This thesis focuses on traditional data management and on the performance management of production processes, both of which can be seen as prerequisites for long-lasting development. Some operative BI solutions are also considered as part of the ideal state of the reporting system. The objectives of this study are to examine what requirements effective performance management of production processes places on a company's data management and reporting, and how these requirements affect its efficiency. The research was carried out as a theoretical literature review of the subject and as a qualitative case study of a reporting development project at Finnsugar Ltd. The case study is examined through theoretical frameworks and through active participant observation. To form a clearer picture of the ideal state of the reporting system, simple investment calculations were performed. According to the results of the research, the requirements for effective performance management of production processes are automated data collection, integration of operative databases, the use of efficient data management technologies such as ETL (Extract, Transform, Load) processes, a data warehouse (DW) and Online Analytical Processing (OLAP), and efficient management of processes, data and roles.
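As a minimal illustration of the ETL step named in the findings, the sketch below extracts operative data from a CSV export, transforms it into reporting measures, and loads it into a small SQLite warehouse table. File names, column names and the target schema are illustrative, not Finnsugar's actual systems.

```python
import csv
import sqlite3

def etl_production_batches(csv_path="production_batches.csv", db_path="reporting_dw.sqlite"):
    # Extract: read raw operative data exported from a production system.
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))

    # Transform: normalize units and derive a yield figure per batch.
    for r in rows:
        r["output_tonnes"] = float(r["output_kg"]) / 1000.0
        r["yield_pct"] = 100.0 * float(r["output_kg"]) / float(r["input_kg"])

    # Load: append into a simple data-warehouse fact table for OLAP-style reporting.
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS fact_production
                   (batch_id TEXT, line TEXT, output_tonnes REAL, yield_pct REAL)""")
    con.executemany("INSERT INTO fact_production VALUES (?, ?, ?, ?)",
                    [(r["batch_id"], r["line"], r["output_tonnes"], r["yield_pct"]) for r in rows])
    con.commit()
    con.close()
```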
Abstract:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
Human endogenous retroviruses (HERVs) are the result of ancient infections of human germ cells by exogenous retroviruses. HERVs belong to the long terminal repeat (LTR) group of retrotransposons that comprise ~8% of the human genome. The majority of documented HERVs have been truncated and/or have incurred lethal mutations and no longer encode functional genes; however, a very small number of HERVs seem to remain functional, making new copies by retrotransposition, as suggested by the identification of a handful of polymorphic HERV insertions in human populations. The objectives of this study were to identify novel HERV insertions via analysis of personal genomic data and to survey the polymorphism levels of new and known HERV insertions in the human genome. Specifically, this study involves the experimental validation of polymorphic HERV insertion candidates predicted by personal genome-based computational prediction, and a survey of their polymorphism levels within the human population based on a set of 30 diverse human DNA samples. Based on computational analysis of a limited number of personal genome sequences, PCR genotyping aided in the identification of 15 dimorphic, 2 trimorphic and 5 fixed full-length HERV-K insertions not previously investigated. These results suggest that the proliferation rate of HERV-Ks, and perhaps of other ERVs, in the human genome may be much higher than previously appreciated, and that recently inserted HERVs exhibit a high level of instability. Throughout this study we have observed the frequent presence of additional genotype forms for these HERV insertions, and we propose for the first time the establishment of a new genotype-reporting nomenclature to reflect all possible combinations of the pre-integration site, solo-LTR and full-length HERV alleles.
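The proposed reporting scheme can be made concrete with a short enumeration: if each allele at an insertion locus is the pre-integration site (P), a solo-LTR (S) or the full-length provirus (F), a diploid genotype is any unordered pair of these, giving six reportable combinations. The letter codes below are illustrative, not the study's final nomenclature.

```python
from itertools import combinations_with_replacement

# Possible allele states at a HERV insertion locus (codes are illustrative).
ALLELES = {"P": "pre-integration site", "S": "solo-LTR", "F": "full-length HERV"}

# Every unordered pair of allele states gives one diploid genotype.
genotypes = ["".join(pair) for pair in combinations_with_replacement(sorted(ALLELES), 2)]
print(genotypes)   # ['FF', 'FP', 'FS', 'PP', 'PS', 'SS'] - six possible diploid genotypes
```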