902 results for Information storage and retrieval systems -- Geography
Abstract:
The current state of health and biomedicine includes an enormous number of heterogeneous data 'silos', collected for different purposes and represented differently, that are presently impossible to share or analyze in toto. The greatest challenge for large-scale and meaningful analyses of health-related data is to achieve a uniform data representation for data extracted from heterogeneous source representations. Based upon an analysis and categorization of heterogeneities, a process for achieving comparable data content by using a uniform terminological representation is developed. This process addresses the types of representational heterogeneities that commonly arise in healthcare data integration problems. Specifically, the process uses a reference terminology and associated 'maps' to transform heterogeneous data to a standard representation for comparability and secondary use. Capturing the quality and precision of the 'maps' between local terms and reference terminology concepts enhances the meaning of the aggregated data, empowering end users with better-informed queries for subsequent analyses. A data integration case study in the domain of pediatric asthma illustrates the development and use of a reference terminology for creating comparable data from heterogeneous source representations. The contribution of this research is a generalized process for the integration of data from heterogeneous source representations, one that can be applied and extended to other problems where heterogeneous data need to be merged.
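To make the 'maps' idea concrete, here is a minimal Python sketch of a reference-terminology map that carries quality and precision metadata alongside the concept translation. The local terms, the UMLS-style concept identifier, and the quality labels are hypothetical illustrations, not taken from the study:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TermMap:
    """One 'map' from a local source term to a reference terminology concept."""
    local_term: str
    concept_id: str    # reference terminology concept identifier (hypothetical)
    quality: str       # e.g. "exact", "broader", "narrower" (assumed labels)
    precision: float   # confidence in the mapping, 0..1

# Hypothetical maps from two source systems to one shared concept.
MAPS = {
    "siteA": [TermMap("wheezing episode", "C0043144", "exact", 1.0)],
    "siteB": [TermMap("wheeze", "C0043144", "narrower", 0.8)],
}

def to_reference(source: str, term: str) -> Optional[TermMap]:
    """Translate a local term to its reference concept, keeping the map's
    quality and precision so downstream queries can filter on them."""
    for m in MAPS.get(source, []):
        if m.local_term == term:
            return m
    return None

# Aggregated records retain the translation metadata, so an end user can
# restrict an analysis to, say, exact maps with precision 1.0.
hit = to_reference("siteB", "wheeze")
if hit:
    print(hit.concept_id, hit.quality, hit.precision)
```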
Abstract:
OBJECTIVE: To determine whether algorithms developed for the World Wide Web can be applied to the biomedical literature in order to identify articles that are important as well as relevant. DESIGN AND MEASUREMENTS: A direct comparison of eight algorithms: simple PubMed queries, clinical queries (sensitive and specific versions), vector cosine comparison, citation count, journal impact factor, PageRank, and machine learning based on polynomial support vector machines. The objective was to prioritize important articles, defined as being included in a pre-existing bibliography of important literature in surgical oncology. RESULTS: Citation-based algorithms were more effective than noncitation-based algorithms at identifying important articles. The most effective strategies were simple citation count and PageRank, which on average identified over six important articles in the first 100 results, compared to 0.85 for the best noncitation-based algorithm (p < 0.001). The authors saw similar differences between citation-based and noncitation-based algorithms at 10, 20, 50, 200, 500, and 1,000 results (p < 0.001). Citation lag affects the performance of PageRank more than simple citation count. However, in spite of citation lag, citation-based algorithms remain more effective than noncitation-based algorithms. CONCLUSION: Algorithms that have proved successful on the World Wide Web can be applied to biomedical information retrieval. Citation-based algorithms can help identify important articles within large sets of relevant results. Further studies are needed to determine whether citation-based algorithms can effectively meet actual user information needs.
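As a concrete reference for the citation-based strategies compared above, here is a minimal power-iteration PageRank over a toy citation graph. It is a generic textbook formulation, not the authors' implementation, and the damping factor of 0.85 is the conventional default rather than a value reported in the study:

```python
import numpy as np

def pagerank(adj, d=0.85, tol=1e-9):
    """Power-iteration PageRank on a citation adjacency matrix.
    adj[i, j] = 1 if article i cites article j."""
    n = adj.shape[0]
    # Column-stochastic transition matrix; dangling nodes spread uniformly.
    out = adj.sum(axis=1, keepdims=True)
    M = np.where(out > 0, adj / np.maximum(out, 1), 1.0 / n).T
    r = np.full(n, 1.0 / n)
    while True:
        r_next = (1 - d) / n + d * M @ r
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next

# Toy citation graph: article 2 is cited by 0 and 1, so it ranks highest.
A = np.array([[0, 0, 1],
              [0, 0, 1],
              [0, 0, 0]])
scores = pagerank(A)
print(scores.argsort()[::-1])   # ranking, most important article first
```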
Abstract:
Introduction: Internet users are increasingly searching the World Wide Web for information relating to their health. This situation makes it necessary to create specialized tools capable of supporting users in their searches. Objective: To apply and compare strategies developed to investigate the use of the Portuguese version of Medical Subject Headings (MeSH) for constructing an automated classifier of Brazilian Portuguese-language web-based content as within or outside the field of healthcare, focusing on the lay public. Methods: 3658 Brazilian web pages were used to train the classifier and 606 Brazilian web pages were used to validate it. The proposed strategies were constructed using content-based vector methods for text classification, with Naive Bayes used for the task of classifying vector patterns whose features were obtained through the proposed strategies. Results: A strategy named InDeCS was developed specifically to adapt MeSH to the problem at hand. This approach achieved the best accuracy for this pattern classification task (0.94 for sensitivity, specificity, and area under the ROC curve). Conclusions: Because of the significant results achieved by InDeCS, the tool has been successfully applied to the Brazilian healthcare search portal known as Busca Saúde. Furthermore, MeSH was shown to perform well when used for the task of classifying web-based content aimed at the lay public. The study also showed that MeSH was able to map mutable, non-deterministic characteristics of the web.
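A minimal sketch of the classifier family named in the Methods: bag-of-words features fed to a multinomial Naive Bayes model, here via scikit-learn. The toy Portuguese snippets and labels are invented for illustration, and the MeSH-based InDeCS feature strategy itself is not reproduced:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-ins for pages labeled health (1) / non-health (0); the actual
# study used 3658 training pages and MeSH-derived features.
train_texts = ["sintomas de asma tratamento",
               "receita de bolo de chocolate",
               "vacina contra gripe dosagem",
               "resultados do campeonato de futebol"]
train_labels = [1, 0, 1, 0]

# Bag-of-words + multinomial Naive Bayes, the classifier named in the
# abstract (the MeSH-based feature strategy itself is omitted here).
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)

print(clf.predict(["tratamento da gripe"]))   # expected: health (1)
```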
Abstract:
Among the largest resources for biological sequence data are the expressed sequence tags (ESTs) available in public and proprietary databases. ESTs provide information on transcripts, but for technical reasons they often contain sequencing errors; when analyzing EST sequences computationally, such errors must therefore be taken into account. Earlier attempts to model error-prone coding regions have shown good performance in detecting and predicting them while correcting sequencing errors using codon usage frequencies. In the research presented here, we improve the detection of translation start and stop sites by integrating a more complex mRNA model with codon-usage-bias-based error correction into one hidden Markov model (HMM), thus generalizing this error correction approach to more complex HMMs. We show that our method maintains the performance in detecting coding sequences.
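To illustrate the HMM machinery involved (though not the paper's richer model, which encodes codon structure, start/stop sites, and error states), here is a toy two-state coding/non-coding HMM with Viterbi decoding in log space; all probabilities are invented for the example:

```python
import numpy as np

states = ["noncoding", "coding"]
start = np.log([0.5, 0.5])
trans = np.log([[0.85, 0.15],    # noncoding -> {noncoding, coding}
                [0.15, 0.85]])   # coding    -> {noncoding, coding}
emit = {"noncoding": np.log([0.25, 0.25, 0.25, 0.25]),   # A C G T, uniform
        "coding":    np.log([0.10, 0.40, 0.40, 0.10])}   # GC-rich bias

def viterbi(seq):
    """Most likely state path; log space avoids underflow on long ESTs."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    v = start + np.array([emit[s][idx[seq[0]]] for s in states])
    back = []
    for ch in seq[1:]:
        cand = v[:, None] + trans            # cand[i, j]: prev i -> next j
        back.append(cand.argmax(axis=0))
        v = cand.max(axis=0) + np.array([emit[s][idx[ch]] for s in states])
    path = [int(v.argmax())]
    for b in reversed(back):
        path.append(int(b[path[-1]]))
    return [states[i] for i in reversed(path)]

# The GC-rich center of this toy read should decode as "coding".
print(viterbi("ATATATGCGCGCGCATATAT"))
```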
Abstract:
To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open 'data commoning' culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared 'Investigation-Study-Assay' framework to support that vision.
Abstract:
Type 2 diabetes mellitus (T2DM) is a major disease affecting nearly 280 million people worldwide. Whilst the pathophysiological mechanisms leading to disease are poorly understood, dysfunction of the insulin-producing pancreatic beta-cells is a key event in disease development. Monitoring the gene expression profiles of pancreatic beta-cells under several genetic or chemical perturbations has shed light on genes and pathways involved in T2DM. The EuroDia database has been established to build a unique collection of gene expression measurements performed on beta-cells of three organisms, namely human, mouse and rat. The Gene Expression Data Analysis Interface (GEDAI) has been developed to support this database. The quality of each dataset is assessed by a series of quality control procedures to detect putative hybridization outliers. The system integrates a web interface to several standard analysis functions from R/Bioconductor to identify differentially expressed genes and pathways. It also allows the combination of multiple experiments performed on different array platforms of the same technology. The design of this system enables each user to rapidly design a custom analysis pipeline and thus produce their own list of genes and pathways. Raw and normalized data can be downloaded for each experiment. The flexible engine of this database (GEDAI) is currently used to handle gene expression data from several laboratory-run projects dealing with different organisms and platforms. Database URL: http://eurodia.vital-it.ch.
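The abstract does not detail GEDAI's quality-control procedures, so as a generic illustration of the kind of outlier screening involved, here is a sketch that flags arrays whose correlation with the rest of an experiment is unusually low; the matrix layout, threshold, and synthetic data are all assumptions:

```python
import numpy as np

def flag_outlier_arrays(X, z_cut=2.0):
    """X: genes x arrays matrix of normalized log-expression values.
    Flags putative hybridization outliers: arrays whose mean correlation
    with the other arrays of the experiment is unusually low."""
    corr = np.corrcoef(X.T)             # array-by-array correlation matrix
    np.fill_diagonal(corr, np.nan)      # ignore self-correlation
    mean_corr = np.nanmean(corr, axis=1)
    z = (mean_corr - mean_corr.mean()) / mean_corr.std()
    return np.where(z < -z_cut)[0]      # indices of suspect arrays

rng = np.random.default_rng(0)
signal = rng.normal(size=(500, 1))             # shared expression profile
X = signal + 0.3 * rng.normal(size=(500, 8))   # 8 concordant arrays
X[:, 3] = rng.normal(size=500)                 # one corrupted hybridization
print(flag_outlier_arrays(X))                  # expected: [3]
```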
Abstract:
The primary mission of UniProt is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 3 weeks and can be accessed online for searches or download at http://www.uniprot.org.
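As a small usage sketch, entries can be fetched programmatically. The snippet below assumes the present-day REST endpoint (rest.uniprot.org), which postdates this abstract's www.uniprot.org address, and uses an arbitrary example accession:

```python
from urllib.request import urlopen

# Assumed current REST endpoint; the abstract itself only gives
# http://www.uniprot.org as the access point.
accession = "P05067"   # amyloid-beta precursor protein, an example entry
url = f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"

with urlopen(url) as resp:
    fasta = resp.read().decode()

header, *seq_lines = fasta.splitlines()
print(header)
print("sequence length:", len("".join(seq_lines)))
```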
Abstract:
The use of self-calibrating techniques in parallel magnetic resonance imaging eliminates the need for coil sensitivity calibration scans and avoids potential mismatches between calibration scans and subsequent accelerated acquisitions (e.g., as a result of patient motion). Most examples of self-calibrating Cartesian parallel imaging techniques have required the use of modified k-space trajectories that are densely sampled at the center and more sparsely sampled in the periphery. However, spiral and radial trajectories offer inherent self-calibrating characteristics because of their densely sampled center. At no additional cost in acquisition time and with no modification in scanning protocols, in vivo coil sensitivity maps may be extracted from the densely sampled central region of k-space. This work demonstrates the feasibility of self-calibrated spiral and radial parallel imaging using a previously described iterative non-Cartesian sensitivity encoding algorithm.
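A minimal sketch of the self-calibration idea: per-coil low-resolution images are reconstructed from the densely sampled k-space center and normalized by a root-sum-of-squares combination to estimate sensitivity maps. For simplicity this operates on a Cartesian grid with synthetic data; actual spiral/radial data would be gridded first, and the published work feeds such maps into an iterative non-Cartesian SENSE reconstruction:

```python
import numpy as np

def sensitivity_maps(kspace, calib_width=32):
    """Estimate coil sensitivities from the densely sampled k-space center.
    kspace: (coils, ny, nx) centered Cartesian grid, for illustration only.
    Returns (coils, ny, nx) complex sensitivity maps."""
    nc, ny, nx = kspace.shape
    # Keep only a low-frequency window around the k-space center.
    mask = np.zeros((ny, nx))
    cy, cx, h = ny // 2, nx // 2, calib_width // 2
    mask[cy - h:cy + h, cx - h:cx + h] = 1.0
    low = np.fft.ifft2(np.fft.ifftshift(kspace * mask, axes=(-2, -1)))
    # Normalize by the root-sum-of-squares image to remove the anatomy.
    rss = np.sqrt((np.abs(low) ** 2).sum(axis=0)) + 1e-12
    return low / rss

# Toy data: 4 coils, 128x128 random object.
k = np.fft.fftshift(np.fft.fft2(np.random.rand(4, 128, 128)), axes=(-2, -1))
maps = sensitivity_maps(k)
print(maps.shape)   # (4, 128, 128)
```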
Abstract:
We propose a method for brain atlas deformation in the presence of large space-occupying tumors, based on an a priori model of lesion growth that assumes radial expansion of the lesion from its starting point. Our approach involves three steps. First, an affine registration brings the atlas and the patient into global correspondence. Then, the seeding of a synthetic tumor into the brain atlas provides a template for the lesion. The last step is the deformation of the seeded atlas, combining a method derived from optical flow principles and a model of lesion growth. Results show that a good registration is performed and that the method can be applied to automatic segmentation of structures and substructures in brains with gross deformation, with important medical applications in neurosurgery, radiosurgery, and radiotherapy.
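To illustrate the a priori lesion-growth model's central assumption, radial expansion from a seed point, here is a toy 2D displacement field; the exponential falloff and the parameters are invented for the illustration and are not the paper's actual growth model:

```python
import numpy as np

def radial_growth_field(shape, seed, radius):
    """Synthetic displacement field for radial lesion growth from a seed
    point: voxels are pushed outward along the seed-to-voxel direction,
    with displacement decaying with distance (an assumed falloff)."""
    grid = np.stack(np.meshgrid(*[np.arange(n) for n in shape], indexing="ij"))
    offset = grid - np.array(seed).reshape(-1, *([1] * len(shape)))
    dist = np.sqrt((offset ** 2).sum(axis=0)) + 1e-9
    magnitude = radius * np.exp(-dist / radius)   # strongest near the seed
    return offset / dist * magnitude              # unit direction * magnitude

field = radial_growth_field((64, 64), seed=(32, 32), radius=10)
print(field.shape)   # (2, 64, 64): y and x displacement components
```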
Abstract:
This article presents the stages of setting up a thematic bibliographic monitoring (or scientific monitoring) effort carried out jointly since 2005 by four French-speaking institutions in the field of occupational health: INRS (France), IRSST (Québec), IST (Switzerland) and UCL (Belgium). The topic monitored is the biological monitoring of exposure to chemicals in the workplace. The data collected and formatted by the documentalists are used by researchers specializing in the subject not only to keep up with new developments in the field, but also to prepare courses and update biological monitoring guides. The different stages of the project's methodological approach are described: the choice of databases to query and the development of the search strategy; the establishment of a task-sharing procedure for all the stages of the monitoring process repeated at each update (querying, creation of databases with the Reference Manager software, formatting and indexing of references, creation and delivery to the partners of databases consolidated over time with all the articles analyzed); the administrative, human and technical means for exchanging files; and trials aimed at extending the monitoring to selected Web pages. A quantitative summary of the six years of monitoring is also given. The information collected and analyzed over the last two years by the project partners will be the subject of a second article focusing on the main trends in the chosen topic.
Abstract:
The Complete Arabidopsis Transcriptome Micro Array (CATMA) database contains gene sequence tag (GST) and gene model sequences for over 70% of the predicted genes in the Arabidopsis thaliana genome as well as primer sequences for GST amplification and a wide range of supplementary information. All CATMA GST sequences are specific to the gene for which they were designed, and all gene models were predicted from a complete reannotation of the genome using uniform parameters. The database is searchable by sequence name, sequence homology or direct SQL query, and is available through the CATMA website at http://www.catma.org/.
Abstract:
The first two parts of this article, published previously, presented the methodology as well as the first elements of the review, covering the period from 2009 to 2012, of the bibliographic monitoring of the biological monitoring of exposure to chemicals in the workplace (SBEPC MT) set up by a multidisciplinary French-speaking network.
Abstract:
PURPOSE: To improve the tag persistence throughout the whole cardiac cycle by providing a constant tag-contrast throughout all the cardiac phases when using balanced steady-state free precession (bSSFP) imaging. MATERIALS AND METHODS: The flip angles of the imaging radiofrequency pulses were optimized to compensate for the tagging contrast-to-noise ratio (Tag-CNR) fading at later cardiac phases in bSSFP imaging. Complementary spatial modulation of magnetization (CSPAMM) tagging was implemented to improve the Tag-CNR. Numerical simulations were performed to examine the behavior of the Tag-CNR with the proposed method, and to compare the resulting Tag-CNR with that obtained from the more commonly used spoiled gradient echo (SPGR) imaging. A gel phantom and five healthy human volunteers were scanned on a 1.5T scanner using bSSFP imaging with and without the proposed technique. The phantom was also scanned with SPGR imaging. RESULTS: With the proposed technique, the Tag-CNR remained almost constant during the whole cardiac cycle. Using bSSFP imaging, the Tag-CNR was about double that of SPGR. CONCLUSION: The tag persistence was significantly improved when the proposed method was applied, with better Tag-CNR during the diastolic cardiac phase. The improved Tag-CNR will support automated tagging analysis and quantification methods.
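A toy numerical illustration of the fading the method targets: after CSPAMM tagging, tagged and untagged longitudinal magnetizations converge under repeated imaging pulses and T1 relaxation, so a fixed flip angle loses tag contrast across the cardiac cycle. All constants are illustrative, the simulation ignores bSSFP steady-state transverse effects, and it is not the paper's flip-angle optimization:

```python
import numpy as np

T1, TR = 850.0, 3.0             # ms; illustrative tissue and sequence values
n_phases, alpha = 40, np.deg2rad(30)
E1 = np.exp(-TR / T1)

mz_tag, mz_ref = -1.0, 1.0      # idealized tagged / untagged Mz after CSPAMM
cnr = []
for _ in range(n_phases):
    # Transverse signal ~ Mz * sin(alpha); tag contrast is the difference.
    cnr.append(abs(mz_ref - mz_tag) * np.sin(alpha))
    # Each pulse scales Mz by cos(alpha); Mz then relaxes toward 1 over TR.
    mz_tag = mz_tag * np.cos(alpha) * E1 + (1 - E1)
    mz_ref = mz_ref * np.cos(alpha) * E1 + (1 - E1)

# With a fixed flip angle, the contrast decays toward zero by late phases,
# which is what the ramped flip-angle scheme is designed to compensate.
print(f"tag contrast: first phase {cnr[0]:.2f}, last phase {cnr[-1]:.4f}")
```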