164 results for Databases - Duplicate tuples


Abstract:

Background: To enhance our understanding of complex biological systems such as diseases, we need to put all available data into context and use it to detect relations, patterns and rules that allow predictive hypotheses to be defined. Life science has become a data-rich science, with information about the behaviour of millions of entities such as genes, chemical compounds, diseases, cell types and organs, organised in many different databases and/or spread throughout the literature. Existing knowledge, such as genotype-phenotype relations or signal transduction pathways, must be semantically integrated and dynamically organised into structured networks that are connected with clinical and experimental data. Different approaches to this challenge exist, but so far none has proven entirely satisfactory. Results: To address this challenge we previously developed a generic knowledge management framework, BioXM™, which allows the dynamic, graphic generation of domain-specific knowledge representation models based on specific objects and their relations, supporting annotations and ontologies. Here we demonstrate the utility of BioXM for knowledge management in systems biology as part of the EU FP6 BioBridge project on translational approaches to chronic diseases. From clinical and experimental data, text-mining results and public databases we generate a chronic obstructive pulmonary disease (COPD) knowledge base and demonstrate its use by mining specific molecular networks together with integrated clinical and experimental data. Conclusions: We generate the first semantically integrated, COPD-specific public knowledge base, and find that, for the integration of clinical and experimental data with pre-existing knowledge, the configuration-based set-up enabled by BioXM reduced the implementation time and effort for the knowledge base compared with similar systems implemented as classical software development projects.
The knowledge base enables the retrieval of sub-networks, including protein-protein interaction, pathway, gene-disease and gene-compound data, which are used for subsequent data analysis, modelling and simulation. Pre-structured queries and reports enhance usability; establishing their use in everyday clinical settings requires further simplification through a browser-based interface, which is currently under development.
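The abstract does not say how BioXM retrieves sub-networks. Purely as an illustration of the idea, a k-hop neighbourhood query over typed relations can be sketched as a breadth-first search. The edge list, node names and the `subnetwork` function below are hypothetical and are not part of BioXM or the COPD knowledge base.

```python
from collections import defaultdict, deque

# Hypothetical typed relations (source, relation, target); not real COPD data.
EDGES = [
    ("GENE_A", "gene-disease", "COPD"),
    ("GENE_A", "protein-protein", "GENE_B"),
    ("GENE_B", "protein-protein", "GENE_C"),
    ("GENE_B", "gene-compound", "COMPOUND_1"),
]

def build_graph(edges, allowed_relations):
    """Undirected adjacency sets restricted to the allowed relation types."""
    graph = defaultdict(set)
    for src, relation, dst in edges:
        if relation in allowed_relations:
            graph[src].add(dst)
            graph[dst].add(src)
    return graph

def subnetwork(edges, seeds, max_hops, allowed_relations):
    """Nodes within max_hops of any seed, plus the allowed edges among them."""
    graph = build_graph(edges, allowed_relations)
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:  # breadth-first search from all seeds at once
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[node]:
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    nodes = set(depth)
    # Induced sub-network: every allowed edge whose endpoints are both in range.
    kept = [(s, r, d) for s, r, d in edges
            if r in allowed_relations and s in nodes and d in nodes]
    return nodes, kept

if __name__ == "__main__":
    nodes, kept = subnetwork(EDGES, ["COPD"], 2, {"gene-disease", "protein-protein"})
    print(sorted(nodes))
    print(kept)
```

Restricting the relation types before the search is a design choice: it lets the same edge store serve, for example, a protein-interaction-only view or a gene-compound view without duplicating data.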

Abstract:

Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, on coding potential, or on the comparison of a genomic sequence with a cDNA, EST or protein database. In many circumstances, their accuracy is limited by species-specific training and by the incompleteness of reference databases. Recently, comparative genome analysis has attracted increasing attention, and several analysis tools based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms among vertebrates as well as plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful for validating gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors.

Abstract:

Understanding the molecular mechanisms responsible for the regulation of the transcriptome in eukaryotic cells is one of the most challenging tasks in the postgenomic era. In this regard, alternative splicing (AS) is a key phenomenon contributing to the production of different mature transcripts from the same primary RNA sequence. As a plethora of different transcript forms is available in databases, a first step towards uncovering the biology that drives AS is to identify the different types of splicing variation they reflect. In this work, we present a general definition of the AS event along with a notation system based on the relative positions of the splice sites. This nomenclature unambiguously and dynamically assigns a specific "AS code" to every possible pattern of splicing variation. On the basis of this definition and the corresponding codes, we have developed a computational tool (AStalavista) that automatically characterizes the complete landscape of AS events in a given transcript annotation of a genome, thus providing a platform to investigate transcriptome diversity across genes, chromosomes, and species. Our analysis reveals that a substantial part (in human, more than a quarter) of the observed splicing variations is ignored in common classification pipelines. We have used AStalavista to investigate and compare the AS landscape of different reference annotation sets in human and in other metazoan species, and found that the proportions of AS events change substantially depending on the annotation protocol, species-specific attributes, and coding constraints acting on the transcripts. The AStalavista system therefore provides a general framework for conducting specific studies of the occurrence, impact, and regulation of AS.
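The AS code is described only in outline above: splice sites are numbered by their relative positions and each transcript is expressed in terms of those numbers. The function below is a simplified, hypothetical sketch of that idea, not AStalavista's implementation. It assumes each transcript is given as a set of `(position, kind)` splice sites for one variable region, and it uses `-` for acceptors and `^` for donors as an assumed notation.

```python
SYMBOL = {"acceptor": "-", "donor": "^"}  # assumed notation

def as_code(sites_a, sites_b):
    """Simplified AS-code sketch for a pair of transcripts.

    Each argument is a set of (position, kind) splice sites, kind being
    "acceptor" or "donor". Sites shared by both transcripts are dropped;
    the remaining (variable) sites are ranked 1..n by position, and each
    transcript is written as rank + symbol. A transcript without variable
    sites is written as "0".
    """
    variable = sorted(sites_a ^ sites_b)  # sites present in only one transcript
    rank = {site: i + 1 for i, site in enumerate(variable)}

    def encode(sites):
        own = sorted(site for site in sites if site in rank)
        code = "".join(f"{rank[site]}{SYMBOL[site[1]]}" for site in own) or "0"
        first = rank[own[0]] if own else float("inf")
        return first, code

    # The transcript with the most upstream variable site is listed first.
    entries = sorted([encode(sites_a), encode(sites_b)])
    return ",".join(code for _, code in entries)

if __name__ == "__main__":
    # Toy exon skipping: transcript a includes an internal exon, b skips it.
    a = {(100, "donor"), (200, "acceptor"), (300, "donor"), (400, "acceptor")}
    b = {(100, "donor"), (400, "acceptor")}
    print(as_code(a, b))
```

For this toy exon-skipping pair the function prints `1-2^,0`; two alternative acceptors of the same exon would give `1-,2-`. Because only relative ranks are used, the code is independent of the absolute genomic coordinates, which is the property that lets events be compared across genes and species.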

Abstract:

Background: We present the results of EGASP, a community experiment to assess the state of the art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein-coding genes, and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple-transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50%. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the 221 selected computationally predicted exons outside the existing annotation could be verified. Conclusions: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods as they are scaled up to the entire human genome sequence.
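Nucleotide-level sensitivity and specificity can be made concrete with a small sketch. It assumes the convention common in gene-prediction benchmarks, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP) (what other fields call precision); the abstract does not spell out the formulas, and the helper names below are hypothetical.

```python
def coding_positions(exons):
    """Expand (start, end) exon intervals, 1-based and inclusive, into positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions

def nucleotide_accuracy(predicted_exons, annotated_exons):
    """Coding-nucleotide sensitivity and specificity of a prediction.

    Assumed gene-prediction convention: Sn = TP / (TP + FN), Sp = TP / (TP + FP).
    """
    predicted = coding_positions(predicted_exons)
    annotated = coding_positions(annotated_exons)
    tp = len(predicted & annotated)                 # coding in both
    sn = tp / len(annotated) if annotated else 0.0  # TP + FN = all annotated
    sp = tp / len(predicted) if predicted else 0.0  # TP + FP = all predicted
    return sn, sp

if __name__ == "__main__":
    # The prediction truncates the second annotated exon by 50 nucleotides.
    print(nucleotide_accuracy([(1, 100), (251, 300)], [(1, 100), (201, 300)]))
```

Here the prediction recovers 150 of 200 annotated coding nucleotides (Sn = 0.75) and every predicted nucleotide is annotated as coding (Sp = 1.0). Expanding intervals into sets is only practical for toy inputs; real evaluations would intersect intervals directly.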

Abstract:

Selenoproteins are a diverse group of proteins usually misidentified and misannotated in sequence databases. The presence of an in-frame UGA (stop) codon in the coding sequence of selenoprotein genes precludes their identification and correct annotation. The in-frame UGA codons are recoded to cotranslationally incorporate selenocysteine, a rare selenium-containing amino acid. The development of ad hoc experimental and, more recently, computational approaches has allowed the efficient identification and characterization of the selenoproteomes of a growing number of species. Today, dozens of selenoprotein families have been described, and more are being discovered in recently sequenced species, but the correct genomic annotation is not available for the majority of these genes. SelenoDB is a long-term project that aims to provide, through the collaborative effort of experimental and computational researchers, automatic and manually curated annotations of selenoprotein genes, proteins and SECIS elements. Version 1.0 of the database includes an initial set of eukaryotic genomic annotations, with special emphasis on the human selenoproteome, for immediate inspection by selenium researchers or incorporation into more general databases. SelenoDB is freely available at http://www.selenodb.org.
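To see why an in-frame UGA breaks naive annotation, the sketch below translates a coding sequence (DNA alphabet, so UGA appears as TGA) with the standard genetic code. A standard translation stops at the first TGA; the selenoprotein-aware variant recodes chosen TGA codons as selenocysteine (one-letter code U). In cells, recoding depends on a SECIS element; here, as a simplifying assumption, the caller passes the indices of the codons to recode through the hypothetical `sec_codons` parameter.

```python
BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG x TCAG x TCAG order.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

def translate(cds, sec_codons=frozenset()):
    """Translate until the first stop, recoding TGA at the given codon indices as U."""
    protein = []
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        codon = cds[start:start + 3].upper()
        if codon == "TGA" and start // 3 in sec_codons:
            protein.append("U")  # selenocysteine
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # ordinary stop codon ends translation
        protein.append(residue)
    return "".join(protein)

if __name__ == "__main__":
    cds = "ATGTGTTGAGGTTAA"
    print(translate(cds))                  # truncated at the in-frame TGA
    print(translate(cds, sec_codons={2}))  # TGA recoded as selenocysteine
```

The standard translation of this toy sequence stops after two residues (`MC`), whereas recoding the third codon yields `MCUG`, which illustrates how a gene predictor that treats every UGA as a stop would truncate or miss selenoprotein genes.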

Abstract:

Background: Awareness of the negative effects of smoking on children's health has prompted a decrease in the self-reporting of parental tobacco use in periodic surveys from most industrialized countries. Our aim was to assess changes in environmental tobacco smoke (ETS) exposure between the end of pregnancy and 4 years of age, as determined by parental self-report and by measurement of cotinine in age-related biological matrices. Methods: The prospective birth cohort included 487 infants from the city of Barcelona (Spain). Mothers were asked about maternal and household smoking habits. Cord serum and children's urinary cotinine were analyzed in duplicate using a double-antibody radioimmunoassay. Results: At 4 years of age, the median urinary cotinine level in children increased 1.4-fold or 3.5-fold when the father or the mother smoked, respectively. Urinary cotinine levels significantly distinguished children of smoking mothers (geometric mean (GM) 19.7 ng/mL; 95% CI 16.83–23.01) and children from exposed homes (GM 7.1 ng/mL; 95% CI 5.61–8.99) from children in non-exposed homes (GM 4.5 ng/mL; 95% CI 3.71–5.48). Maternal self-reported ETS exposure in homes declined from 42.2% to 31.0% (p < 0.01) over the four years between the two time points. Nevertheless, most of the children considered non-exposed by their mothers had detectable cotinine levels above 1 ng/mL in their urine. Conclusion: Cotinine levels determined in cord blood and urine, respectively, were useful for categorizing children exposed to smoking, and showed that ETS exposure increased to some extent during the 4-year follow-up period.
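Geometric means with 95% confidence intervals, as reported for urinary cotinine above, are commonly obtained by averaging log-transformed values and back-transforming. The abstract does not state the exact method, so this sketch assumes a normal approximation (z = 1.96) on the log scale; a t-quantile would give slightly wider intervals for small groups, and the handling of values below the detection limit is not addressed. The function name is hypothetical.

```python
import math
import statistics

def geometric_mean_ci(values, z=1.96):
    """Geometric mean with an approximate 95% CI (normal approximation on the log scale).

    Assumes all values are strictly positive and that there are at least two.
    """
    logs = [math.log(v) for v in values]
    mean_log = statistics.fmean(logs)
    se_log = statistics.stdev(logs) / math.sqrt(len(logs))
    return (
        math.exp(mean_log),               # geometric mean
        math.exp(mean_log - z * se_log),  # lower bound
        math.exp(mean_log + z * se_log),  # upper bound
    )

if __name__ == "__main__":
    print(geometric_mean_ci([1.0, 10.0, 100.0]))
```

Because the interval is symmetric on the log scale, it is symmetric in ratio terms after back-transformation rather than in absolute terms; the reported interval 16.83–23.01 around 19.7, for instance, extends by a factor of about 1.17 on either side.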

Abstract:

Background: Single nucleotide polymorphisms, among other types of sequence variants, constitute key elements in genetic epidemiology and pharmacogenomics. While sequence data on genetic variation are found in databases such as dbSNP, clues about the functional and phenotypic consequences of the variations are generally found in the biomedical literature. The identification of the relevant documents and the extraction of information from them are hampered by the large size of literature databases and the lack of a widely accepted standard notation for biomedical entities. Thus, automatic systems for identifying citations of allelic variants of genes in biomedical texts are required. Results: Our group has previously reported the development of OSIRIS, a system aimed at the retrieval of literature about allelic variants of genes (http://ibi.imim.es/osirisform.html). Here we describe the development of a new version of OSIRIS (OSIRISv1.2, http://ibi.imim.es/OSIRISv1.2.html), which incorporates a new entity recognition module and is built on top of a local mirror of the MEDLINE collection and HgenetInfoDB, a database that collects data on human gene sequence variations. The new entity recognition module is based on a pattern-based search algorithm for identifying variation terms in texts and mapping them to dbSNP identifiers. The performance of OSIRISv1.2 was evaluated on a manually annotated corpus, resulting in 99% precision, 82% recall, and an F-score of 0.89. As an example, we present the application of the system to collecting literature citations for allelic variants of genes related to intracranial aneurysm and breast cancer. Conclusion: OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and is therefore suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases.
The application of OSIRISv1.2 in combination with controlled vocabularies such as MeSH provides a way to identify associations of biomedical interest, such as those that relate SNPs to diseases.
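The pattern-based recognition step and the reported evaluation metrics can be illustrated together. The regular expression, the toy lookup table and all function names below are hypothetical and are not OSIRIS's actual patterns or code; the single table entry uses the well-known pairing of MTHFR C677T with rs1801133, and a real system would resolve mentions against dbSNP itself, using gene context.

```python
import re

AMINO = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile"
         "|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")
VARIANT = re.compile(
    r"\b(?:rs\d+"                        # dbSNP identifier, e.g. rs1801133
    r"|[ACGT]\d+[ACGT]"                  # nucleotide change, e.g. C677T
    rf"|(?:{AMINO})\d+(?:{AMINO}))\b"    # protein change, e.g. Arg72Pro
)

# Toy lookup table (illustrative only); keyed by (gene, mention).
TOY_DBSNP = {("MTHFR", "C677T"): "rs1801133"}

def find_variants(text):
    """Return variant mentions in order of appearance."""
    return [match.group(0) for match in VARIANT.finditer(text)]

def to_dbsnp(gene, mention):
    """Map a mention to an rs identifier, or None if it is not in the toy table."""
    if mention.startswith("rs"):
        return mention
    return TOY_DBSNP.get((gene, mention))

def f_score(precision, recall):
    """Harmonic mean of precision and recall (balanced F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def precision_recall_f(predicted, gold):
    """Compare sets of predicted and manually annotated mentions."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)

if __name__ == "__main__":
    text = "The MTHFR C677T and TP53 Arg72Pro variants; see rs1801133."
    print(find_variants(text))
    print(round(f_score(0.99, 0.82), 3))
```

With the rounded figures from the abstract, `f_score(0.99, 0.82)` gives about 0.897; the reported 0.89 may reflect unrounded precision and recall or a different averaging scheme, which the abstract does not specify. Simple patterns like these also produce false positives (any uppercase base, digits, base), which is one reason mapping to dbSNP with gene context matters for precision.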

Abstract:

A new multimodal biometric database, designed and acquired within the framework of the European BioSecure Network of Excellence, is presented. It contains data from more than 600 individuals, acquired simultaneously in three scenarios: 1) over the Internet, 2) in an office environment with a desktop PC, and 3) in indoor/outdoor environments with mobile portable hardware. The three scenarios include a common part of audio/video data. Signature and fingerprint data were also acquired with both the desktop PC and the mobile portable hardware. Additionally, hand and iris data were acquired in the second scenario using the desktop PC. Acquisition was conducted by 11 European institutions. Additional features of the BioSecure Multimodal Database (BMDB) are: two acquisition sessions, several sensors in certain modalities, balanced gender and age distributions, multimodal realistic scenarios with simple and quick tasks per modality, cross-European diversity, availability of demographic data, and compatibility with other multimodal databases. The novel acquisition conditions of the BMDB enable new, challenging research on and evaluation of either monomodal or multimodal biometric systems, as in the recent BioSecure Multimodal Evaluation campaign. A description of this campaign, including baseline results for individual modalities from the new database, is also given. The database is expected to be available for research purposes through the BioSecure Association during 2008.

Abstract:

This study analyses the determinants of the rate of temporary employment in various OECD countries using both macro-level data drawn from the OECD and EUROSTAT databases and micro-level data drawn from the 8th wave of the European Household Panel. A comparative analysis is carried out to test different explanations originally formulated for the Spanish case. The evidence suggests that the overall distribution of temporary employment in advanced economies does not seem to be explained by the characteristics of national productive structures, which seems at odds with previous interpretations based on segmentation theories. As an alternative explanation, two types of supply-side factors are tested: crowding-out effects and educational gaps in the workforce. The former appears non-significant, whilst the effects of the latter disappear after controlling for the level of institutional protection of standard employment during the 1980s. Multivariate analysis shows that only this institutional variable, together with the degree of coordinated centralisation of the collective bargaining system, appears to have a significant impact on the distribution of temporary employment in the countries examined. On the basis of this observation, an explanation is proposed for the very high levels of temporary employment observed in Spain, one that is consistent with both country-specific and comparative evidence.

Abstract:

The main objective of the project is to enable real-time querying of the medical provider directory of a health insurance management company, instead of the deferred (offline) querying used until now.

Abstract:

In this context of economic change and shifting rates of arrival of migration flows, this work addresses the effects of the immigrant population on manufacturing firms located in Catalan cities with more than 1,000 inhabitants during the period 2000-2008. The empirical analysis places special emphasis on the location of firms and of the immigrant population, as well as on the impact this has on firm behaviour. To collect the data we worked with three different databases: data provided by Idescat, INE and the Sistema de Análisis de Balances Ibéricos. The latter in particular provided firm-level information, which required a complex and exhaustive cleaning process through statistical programming. The literature on the economic impacts of immigrants is dominated by studies of the impact on the labour market, with special emphasis on wage differentials and on the narrowing of the wage gap as a function of immigrants' length of residence. By contrast, the effect of immigration on firm behaviour has rarely been analysed. The aim of this work is to analyse the impact of environments with an immigrant population in Catalan cities, and more specifically its relationship with the growth of local firms and the evolution of their efficiency levels. The specific objectives are: i) to present the main conclusions of the literature; ii) to observe the location patterns of immigrants; iii) to show the location patterns of firms; and iv) to analyse, using econometric tools, whether Catalan cities with a larger immigrant population have a positive or negative impact on the growth of local firms and the evolution of their efficiency levels.

Abstract:

The project consists of developing a database of basketball players.

Abstract:

This project compares the performance and usability of different models of column-oriented databases by building and exploiting an OLAP cube with the Pentaho BI suite.

Abstract:

Design and implementation of a relational database for managing basketball player information. Implementation of the most common queries of the business model. Design and implementation of the statistics module to perform queries in constant time.
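The abstract does not say how constant-time statistical queries were achieved. One common approach is to maintain pre-computed aggregates that are updated on every write, so that a read never scans individual game rows; in a relational database this would typically be a summary table kept current by triggers or by the application. The Python sketch below illustrates that idea with a hypothetical schema and hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class PlayerTotals:
    """Running aggregates for one player (hypothetical schema)."""
    games: int = 0
    points: int = 0

class StatsModule:
    """Updates totals on every write so that each read is O(1)."""

    def __init__(self):
        self._totals = defaultdict(PlayerTotals)

    def record_game(self, player_id, points):
        # O(1) update of the pre-computed aggregates.
        totals = self._totals[player_id]
        totals.games += 1
        totals.points += points

    def points_per_game(self, player_id):
        # O(1) read: no scan over individual game rows.
        totals = self._totals.get(player_id)
        if totals is None or totals.games == 0:
            return 0.0
        return totals.points / totals.games

if __name__ == "__main__":
    stats = StatsModule()
    stats.record_game("p1", 20)
    stats.record_game("p1", 10)
    print(stats.points_per_game("p1"))
```

The trade-off is that every write does slightly more work and the aggregates must be kept consistent with the underlying game records, in exchange for reads whose cost does not grow with the number of games stored.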