16 results for semi-autonomous information retrieval
at Université de Lausanne, Switzerland
Abstract:
The manipulation of DNA is routine practice in botanical research and has made a huge impact on plant breeding, biotechnology and biodiversity evaluation. DNA is easy to extract from most plant tissues and can be stored for long periods in DNA banks. Curation methods are well developed for other botanical resources such as herbaria, seed banks and botanic gardens, but procedures for the establishment and maintenance of DNA banks have not been well documented. This paper reviews the curation of DNA banks for the characterisation and utilisation of biodiversity and provides guidelines for DNA bank management. It surveys existing DNA banks and outlines their operation. It includes a review of plant DNA collection, preservation, isolation, storage, database management and exchange procedures. We stress that DNA banks require full integration with existing collections such as botanic gardens, herbaria and seed banks, and information retrieval systems that link such facilities, bioinformatic resources and other DNA banks. They also require efficient and well-regulated sample exchange procedures. Only with appropriate curation will maximum utilisation of DNA collections be achieved.
Abstract:
In this paper we propose a novel unsupervised approach to learning domain-specific ontologies from large open-domain text collections. The method is based on the joint exploitation of Semantic Domains and Super Sense Tagging for Information Retrieval tasks. Our approach retrieves domain-specific terms and concepts while associating them with a set of high-level ontological types, named supersenses, providing flat ontologies characterized by very high accuracy and pertinence to the domain.
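By way of illustration only (not the authors' system), the core retrieval step can be sketched as ranking the terms of a domain corpus by their salience against an open-domain corpus and attaching a coarse supersense via WordNet lexicographer files; the two toy corpora and the smoothing choice below are assumptions:

from collections import Counter
from nltk.corpus import wordnet as wn  # requires a prior nltk.download("wordnet")

# toy stand-ins for a domain-specific and an open-domain corpus
domain = "the reservoir model predicts porosity permeability and saturation".split()
general = "the cat sat on the mat and the dog ran in the park near the model".split()

dom_f, gen_f = Counter(domain), Counter(general)
# domain salience: frequency ratio with add-one smoothing on the open-domain count
salience = {w: dom_f[w] / (gen_f[w] + 1) for w in dom_f}

for word, score in sorted(salience.items(), key=lambda kv: -kv[1])[:5]:
    synsets = wn.synsets(word, pos=wn.NOUN)
    supersense = synsets[0].lexname() if synsets else "unknown"  # e.g. "noun.attribute"
    print(f"{word:14s} salience={score:.2f} supersense={supersense}")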
Abstract:
BACKGROUND: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever-greater number of articles are being published on this topic. To help curators cope with this growing body of information we have developed a system which extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB. RESULTS: The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large-scale proteomics experiments. CONCLUSIONS: The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, provided that a thorough understanding of the working process and requirements is first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.
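By way of illustration only (not the authors' system), the pattern-matching idea can be sketched as a regular expression that extracts a modification type and site from a sentence; the pattern set and the example sentence below are assumptions:

import re

# hypothetical pattern covering a few frequent PTM types and residue-number sites
PTM_PATTERN = re.compile(
    r"(?P<type>phosphorylation|acetylation|methylation|ubiquitination)"
    r"\s+(?:at|of|on)\s+"
    r"(?P<site>(?:Ser|Thr|Tyr|Lys|Arg)-?\d+)",
    re.IGNORECASE,
)

sentence = ("We show that phosphorylation at Ser-15 of p53 "
            "and acetylation of Lys-382 regulate its activity.")

for m in PTM_PATTERN.finditer(sentence):
    print(f"modification={m.group('type').lower():16s} site={m.group('site')}")

A real curation tool would add rules to resolve which protein each site belongs to and to rank candidate proteins, as the abstract describes.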
Abstract:
Textual autocorrelation is a broad and pervasive concept, referring to the similarity between nearby textual units: lexical repetitions along consecutive sentences, semantic association between neighbouring lexemes, persistence of discourse types (narrative, descriptive, dialogal...) and so on. Textual autocorrelation can also be negative, as illustrated by alternating phonological or morpho-syntactic categories, or the succession of word lengths. This contribution proposes a general Markov formalism for textual navigation, inspired by spatial statistics. The formalism can express well-known constructs in textual data analysis, such as term-document matrices, reference and hyperlink navigation, (web) information retrieval, and in particular textual autocorrelation, as measured by Moran's I relative to the exchange matrix associated with neighbourhoods of various possible types. Four case studies (word-length alternation, lexical repulsion, part-of-speech autocorrelation, and semantic autocorrelation) illustrate the theory. In particular, one observes a short-range repulsion between nouns together with a short-range attraction between verbs, both at the lexical and semantic levels.
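As a point of reference, Moran's I relative to an exchange matrix can be written as follows (a standard spatial-statistics formulation; the notation is an assumption, not necessarily the paper's exact symbols):

I = \frac{\sum_{i,j} e_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i} f_i\,(x_i - \bar{x})^2}, \qquad f_i = \sum_{j} e_{ij}, \qquad \bar{x} = \sum_{i} f_i x_i

where x_i is the value attached to textual unit i (e.g. its word length), e_{ij} is the exchange weight between units i and j defined by the chosen neighbourhood, and f_i is the resulting weight of unit i; I > 0 indicates attraction (positive autocorrelation) between neighbouring units and I < 0 repulsion.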
Abstract:
BACKGROUND: DNA sequence integrity, mRNA concentrations and protein-DNA interactions have been subject to genome-wide analyses based on microarrays with ever-increasing efficiency and reliability over the past fifteen years. However, very recently novel technologies for Ultra High-Throughput DNA Sequencing (UHTS) have been harnessed to study these phenomena with unprecedented precision. As a consequence, the extensive bioinformatics environment available for array data management, analysis, interpretation and publication must be extended to include these novel sequencing data types. DESCRIPTION: MIMAS was originally conceived as a simple, convenient and local Microarray Information Management and Annotation System focused on GeneChips for expression profiling studies. MIMAS 3.0 enables users to manage data from high-density oligonucleotide SNP Chips, expression arrays (both 3'UTR and tiling) and promoter arrays, BeadArrays, as well as UHTS data, using MIAME-compliant standardized vocabulary. Importantly, researchers can export data in MAGE-TAB format and upload them to the EBI's ArrayExpress certified data repository using a one-step procedure. CONCLUSION: We have vastly extended the capability of the system such that it processes the data output of six types of GeneChips (Affymetrix), two different BeadArrays for mRNA and miRNA (Illumina) and the Genome Analyzer (a popular Ultra High-Throughput DNA Sequencer, Illumina), without compromising its flexibility and user-friendliness. MIMAS, appropriately renamed the Multiomics Information Management and Annotation System, is currently used by scientists working in approximately 50 academic laboratories and genomics platforms in Switzerland and France. MIMAS 3.0 is freely available via http://multiomics.sourceforge.net/.
Abstract:
The Internet is increasingly used as a source of information on health issues and is probably a major source of patient empowerment. This process is, however, limited by the frequently poor quality of web-based health information designed for consumers. Better diffusion of information about the criteria defining the quality of website content, and about useful methods for searching for such information, could be particularly useful to patients and their relatives. A brief six-item version of DISCERN, characterized by high specificity for detecting websites with good or very good content quality, was recently developed. This tool could facilitate the identification of high-quality information on the web by patients and may improve the empowerment process initiated by the development of the health-related web.
Abstract:
Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation and/or inference algorithms. Recent advances in machine learning offer a novel approach, alternative to geostatistics, to model the spatial distribution of petrophysical properties in complex reservoirs. The approach is based on semi-supervised learning, which handles both "labelled" observed data and "unlabelled" data, which have no measured value but describe prior knowledge and other relevant data in the form of manifolds in the input space where the modelled property is continuous. The proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic geological features and describe stochastic variability and non-uniqueness of spatial properties. At the same time, it is able to capture and preserve key spatial dependencies such as connectivity of high-permeability geo-bodies, which is often difficult in contemporary petroleum reservoir studies. Semi-supervised SVR, as a data-driven algorithm, is designed to integrate various kinds of conditioning information and learn dependencies from them. The semi-supervised SVR model is able to balance signal/noise levels and control the prior belief in available data. In this work, the stochastic semi-supervised SVR geomodel is integrated into a Bayesian framework to quantify uncertainty of reservoir production with multiple models fitted to past dynamic observations (production history). Multiple history-matched models are obtained using stochastic sampling and/or MCMC-based inference algorithms, which evaluate the posterior probability distribution. Uncertainty of the model is described by the posterior probability of the model parameters that represent key geological properties: spatial correlation size, continuity strength, and smoothness/variability of the spatial property distribution. The developed approach is illustrated with a fluvial reservoir case. The resulting probabilistic production forecasts are described by uncertainty envelopes. The paper compares the performance of models with different combinations of unknown parameters and discusses sensitivity issues.
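By way of illustration only, the overall workflow (a regression model of a spatial property wrapped in a Metropolis sampler over a geological hyperparameter) can be sketched as follows; this uses plain supervised SVR from scikit-learn rather than the authors' semi-supervised variant, and the wells, the production proxy and all constants are hypothetical stand-ins for a reservoir simulator:

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_wells = rng.uniform(0, 1, size=(20, 2))                            # well locations (labelled data)
y_perm = np.sin(4 * X_wells[:, 0]) + 0.1 * rng.standard_normal(20)   # log-permeability at the wells
grid = np.stack(np.meshgrid(np.linspace(0, 1, 15),
                            np.linspace(0, 1, 15)), -1).reshape(-1, 2)
obs_history = np.array([0.9, 0.0, -0.9])                             # observed "production" proxy

def forward_model(corr_len):
    """Stand-in for a reservoir simulator: fit an SVR whose RBF width plays the
    role of a spatial correlation length, then summarize the predicted field."""
    svr = SVR(kernel="rbf", gamma=1.0 / corr_len**2, C=10.0, epsilon=0.01)
    svr.fit(X_wells, y_perm)
    field = svr.predict(grid)
    return np.quantile(field, [0.9, 0.5, 0.1])                       # crude production proxy

def log_posterior(corr_len):
    if not 0.05 < corr_len < 2.0:                                    # uniform prior bounds
        return -np.inf
    misfit = forward_model(corr_len) - obs_history
    return -0.5 * np.sum(misfit**2) / 0.05**2                        # Gaussian likelihood

# random-walk Metropolis over the correlation length
samples, cur, cur_lp = [], 0.5, log_posterior(0.5)
for _ in range(2000):
    prop = cur + 0.05 * rng.standard_normal()
    prop_lp = log_posterior(prop)
    if np.log(rng.uniform()) < prop_lp - cur_lp:
        cur, cur_lp = prop, prop_lp
    samples.append(cur)

print("posterior mean correlation length:", np.mean(samples[500:]))

The retained samples approximate the posterior over the correlation length; pushing each sample through the forward model yields the kind of probabilistic production envelope the abstract mentions.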
Abstract:
Since its creation, the Internet has permeated our daily life. The web is omnipresent for communication, research and organization. This pervasive use has resulted in the rapid development of the Internet. Nowadays, the Internet is the biggest container of resources. Information databases such as Wikipedia, Dmoz and the open data available on the net represent a great informational potential for mankind. Easy and free web access is one of the major features characterizing the Internet culture. Ten years ago, the web was completely dominated by English. Today, the web community is no longer only English-speaking but is becoming a genuinely multilingual community. The availability of content is intertwined with the availability of logical organizations (ontologies), for which multilinguality plays a fundamental role. In this work we introduce a very high-level logical organization fully based on semiotic assumptions. We thus present the theoretical foundations as well as the ontology itself, named Linguistic Meta-Model. The most important feature of Linguistic Meta-Model is its ability to support the representation of different knowledge sources developed according to different underlying semiotic theories. This is possible because most knowledge representation schemata, either formal or informal, can be put into the context of the so-called semiotic triangle. In order to show the main characteristics of Linguistic Meta-Model from a practical point of view, we developed VIKI (Virtual Intelligence for Knowledge Induction). VIKI is a work-in-progress system aiming at exploiting the Linguistic Meta-Model structure for knowledge expansion. It is a modular system in which each module accomplishes a natural language processing task, from terminology extraction to knowledge retrieval. VIKI is a supporting system to Linguistic Meta-Model and its main task is to give some empirical evidence regarding the use of Linguistic Meta-Model, without claiming to be exhaustive.
Abstract:
Technological developments in the information society bring new challenges, both to the applicability and to the enforceability of the law. One major challenge is posed by new entities such as pseudonyms, avatars, and software agents that operate at an increasing distance from the physical persons "behind" them (the "principal"). In case of accidents or misbehavior, current laws require that the physical or legal principal behind the entity be found so that she can be held to account. This may be problematic if the linkability of the principal and the operating entity is questionable. In light of the ongoing developments in electronic agents, there is sufficient reason to conduct a review of the literature in order to more closely examine arguments for and against legal personhood for some nonhuman acting entities. This article also includes a discussion of alternative approaches to solving the "accountability gap."
Abstract:
This thesis examines how oversight bodies, as part of an ATI policy, contribute to the achievement of the policy's objectives. The aim of the thesis is to see, using a comparative case study approach, how oversight bodies and the work they do affect the implementation of their respective ATI policies and thereby contribute to the objectives of those policies. The thesis investigates how federal/central government level information commissioners in four jurisdictions - Germany, India, Scotland, and Switzerland - enforce their respective ATI policies, which tasks they carry out in addition to their enforcement duties, the challenges they face in their work and the ways they overcome these. Qualitative data were gathered from primary and secondary documents as well as in 37 semi-structured interviews with staff of the commissioners' offices, administrative officials whose job entails complying with ATI, people who have made ATI requests and appealed to their respective oversight body, and external experts who have studied ATI implementation in their particular jurisdiction. The thesis finds that the aspect of an oversight body's formal independence with the greatest impact on its work is resource control, and that although the powers granted by law set the framework for ensuring that the administration properly complies with the policy, the commissioner's leadership style - a component of informal independence - has more influence than formal attributes of independence in setting out how resources are obtained and used as well as how staff set priorities and utilize the powers they are granted by law. The conclusion, therefore, is that an ATI oversight body's ability to contribute to the achievement of the policy's objectives is a function of three main factors: a. the commissioner's leadership style; b. the adequacy of resources and the degree of control the organization has over them; c. the powers and the exercise of discretion in using them. In effect, the thesis argues that it is difficult to pinpoint the value of the formal powers set out for the oversight body in the ATI law, and that decisions on whether and how to use those powers matter more than their presumed strength. It also claims that the choices made by the commissioners and their staff regarding priorities and use of powers are determined to a large extent by the adequacy of resources and the degree of control the organization has over those resources. In turn, how the head of the organization leads and manages the oversight body is crucial to both the adequacy of the organization's resources and the decisions made about the use of powers. Together, these three factors have a significant impact on the body's effectiveness in contributing to ATI objectives.
Abstract:
Background: Conventional magnetic resonance imaging (MRI) techniques are highly sensitive to detect multiple sclerosis (MS) plaques, enabling a quantitative assessment of inflammatory activity and lesion load. In quantitative analyses of focal lesions, manual or semi-automated segmentations have been widely used to compute the total number of lesions and the total lesion volume. These techniques, however, are both challenging and time-consuming, and are also prone to intra-observer and inter-observer variability. Aim: To develop an automated approach to segment brain tissues and MS lesions from brain MRI images. The goal is to reduce user interaction and to provide an objective tool that eliminates the inter- and intra-observer variability. Methods: Based on the recent methods developed by Souplet et al. and de Boer et al., we propose a novel pipeline which includes the following steps: bias correction, skull stripping, atlas registration, tissue classification, and lesion segmentation. After the initial pre-processing steps, an MRI scan is automatically segmented into 4 classes: white matter (WM), grey matter (GM), cerebrospinal fluid (CSF) and partial volume. An expectation maximisation method which fits a multivariate Gaussian mixture model to T1-w, T2-w and PD-w images is used for this purpose. Based on the obtained tissue masks and using the estimated GM mean and variance, we apply an intensity threshold to the FLAIR image, which provides the lesion segmentation. With the aim of improving this initial result, spatial information coming from the neighbouring tissue labels is used to refine the final lesion segmentation. Results: The experimental evaluation was performed using real 1.5T data sets and the corresponding ground-truth annotations provided by expert radiologists. The following values were obtained: 64% true positive (TP) fraction, 80% false positive (FP) fraction, and an average surface distance of 7.89 mm. The results of our approach were quantitatively compared to our implementations of the works of Souplet et al. and de Boer et al., obtaining higher TP and lower FP values. Conclusion: Promising MS lesion segmentation results have been obtained in terms of TP. However, the high number of FPs, which is still a well-known problem of all automated MS lesion segmentation approaches, has to be reduced before such methods can be used in standard clinical practice. Our future work will focus on tackling this issue.
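By way of illustration only (not the paper's pipeline), the tissue-classification and thresholding steps can be sketched with scikit-learn: an EM-fitted multivariate Gaussian mixture over T1-w/T2-w/PD-w intensities, then a FLAIR threshold built from the estimated GM statistics; all voxel data and the constant kappa below are synthetic assumptions:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
n = 5000
# synthetic brain voxels: columns are T1-w, T2-w, PD-w intensities
t1t2pd = np.vstack([
    rng.normal([0.8, 0.3, 0.4], 0.05, (n, 3)),   # white matter
    rng.normal([0.6, 0.5, 0.6], 0.05, (n, 3)),   # grey matter
    rng.normal([0.2, 0.9, 0.8], 0.05, (n, 3)),   # CSF
    rng.normal([0.5, 0.6, 0.6], 0.15, (n, 3)),   # partial volume (broad class)
])
flair = rng.normal(0.5, 0.1, len(t1t2pd))
flair[:200] += 0.6                                # a few hyperintense "lesion" voxels

# EM fit of a 4-class multivariate Gaussian mixture: WM, GM, CSF, partial volume
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
labels = gmm.fit_predict(t1t2pd)

# pick the GM component (here by proximity to the synthetic GM centre; the real
# pipeline would identify it via atlas registration)
gm = np.argmin(np.linalg.norm(gmm.means_ - np.array([0.6, 0.5, 0.6]), axis=1))
gm_mean, gm_std = flair[labels == gm].mean(), flair[labels == gm].std()

# lesion candidates: FLAIR voxels brighter than the GM mean plus kappa GM standard deviations
kappa = 3.0
lesion_mask = flair > gm_mean + kappa * gm_std
print("candidate lesion voxels:", int(lesion_mask.sum()))

The refinement step the abstract describes, using neighbouring tissue labels, would then prune the false positives this simple threshold produces.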
Abstract:
Learning is predicted to affect manifold ecological and evolutionary processes, but the extent to which animals rely on learning in nature remains poorly known, especially for short-lived non-social invertebrates. This is particularly the case for Drosophila, a favourite laboratory system for studying the molecular mechanisms of learning. Here we tested whether Drosophila melanogaster use learned information to choose food while free-flying in a large greenhouse emulating the natural environment. In a series of experiments, flies were first given an opportunity to learn which of two food odours was associated with good versus unpalatable taste; subsequently, their preference for the two odours was assessed with olfactory traps set up in the greenhouse. Flies that had experienced palatable apple-flavoured food and unpalatable orange-flavoured food were more likely to be attracted to the odour of apple than flies with the opposite experience. This was true both when the flies first learned in the laboratory and were then released and recaptured in the greenhouse, and when the learning occurred under free-flying conditions in the greenhouse. Furthermore, flies retained the memory of their experience while exploring the greenhouse overnight in the absence of focal odours, pointing to the involvement of consolidated memory. These results support the notion that even small, short-lived insects that are not central-place foragers make use of learned cues in their natural environments.
Abstract:
This study explores biomonitoring communication with workers exposed to risks. Using a qualitative approach, semi-structured interviews were performed. Results show that occupational physicians and workers share some perceptions, but they also reveal communication gaps. Consequently, informed consent is not guaranteed. This article proposes some recommendations for occupational physicians' practices.