20 results for Data mining, Business intelligence, Market forecasting
Abstract:
BACKGROUND: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information, we have developed a system which extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB. RESULTS: The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large-scale proteomics experiments. CONCLUSIONS: The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, provided that a thorough understanding of the working process and requirements is first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.
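As a rough illustration of the pattern-matching idea described in this abstract, the Python sketch below pulls a modification type and residue sites out of a sentence with regular expressions, and shows the standard precision and recall definitions. The keyword list, the site pattern, the function names `extract_ptm` and `precision_recall`, and the example sentence are all assumptions made for illustration; they are not the rules used by the system itself.

```python
import re

# Hypothetical keyword list (assumption); the real system has its own rules
# for the most frequently annotated PTMs in UniProtKB.
PTM_TYPES = r"(phosphorylat|acetylat|methylat|ubiquitinat|glycosylat)(?:ion|ed)"

# Hypothetical site pattern (assumption): a three-letter residue code followed
# by a position, e.g. "Ser-15", "Thr 308" or "Lys9".
SITE = r"\b(Ser|Thr|Tyr|Lys|Arg|Asn)[- ]?(\d+)\b"


def extract_ptm(sentence):
    """Return (modification name, [(residue, position), ...]) or None."""
    ptm = re.search(PTM_TYPES, sentence, flags=re.IGNORECASE)
    if ptm is None:
        return None
    sites = [(res, int(pos)) for res, pos in re.findall(SITE, sentence)]
    # Normalise "phosphorylated" and "phosphorylation" to the noun form.
    return ptm.group(1).lower() + "ion", sites


def precision_recall(true_positives, false_positives, false_negatives):
    """Standard definitions of precision and recall."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


example = "Akt1 is phosphorylated at Thr-308 and Ser-473 upon stimulation."
print(extract_ptm(example))
```

A sentence with no recognised modification keyword yields `None`, which keeps the filtering step separate from any later ranking of protein candidates.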
Abstract:
The book presents the state of the art in machine learning algorithms (artificial neural networks of different architectures, support vector machines, etc.) as applied to the classification and mapping of spatially distributed environmental data. Basic geostatistical algorithms are presented as well. New trends in machine learning and their application to spatial data are covered, and real case studies based on environmental and pollution data are carried out. The book provides a CD-ROM with the Machine Learning Office software, including sample data sets, that will allow both students and researchers to put the concepts into practice quickly.
Abstract:
Forensic intelligence is a distinct dimension of forensic science. Forensic intelligence processes have mostly been developed to address either a specific type of trace or a specific problem. Even though these empirical developments have led to successes, they are trace-specific in nature and contribute to the generation of silos which hamper the establishment of a more general and transversal model. Forensic intelligence has shown some important perspectives, but more general developments are required to address persistent challenges. This will ensure the progress of the discipline as well as its widespread implementation in the future. This paper demonstrates that the description of forensic intelligence processes, their architectures, and the methods for building them can, at a certain level, be abstracted from the type of traces considered. A comparative analysis is made between two forensic intelligence approaches developed independently in Australia and in Europe regarding the monitoring of apparently very different kinds of problems: illicit drugs and false identity documents. An inductive effort is pursued to identify similarities and to outline a general model. Besides breaking barriers between apparently separate fields of study in forensic science and intelligence, this transversal model would assist in defining forensic intelligence, its role and place in policing, and in identifying its contributions and limitations. The model will facilitate the paradigm shift from the current case-by-case reactive attitude towards a proactive approach by serving as a guideline for the use of forensic case data in an intelligence-led perspective. A follow-up article will specifically address issues related to comparison processes, decision points and organisational issues regarding forensic intelligence (part II).
Abstract:
The use of forensic data in an intelligence perspective by police services and inquiring agencies is still fragmentary and to some extent ignored. In order to increase the efficiency of criminal investigations targeting illegal drug trafficking organisations and to provide valuable information about their methods, it is necessary to include and interpret objective drug analysis results as early as the investigation phase. The value of visual, physical and chemical data from seized ecstasy tablets as a support for criminal investigation at a strategic and tactical level has been investigated. In a first phase, different characteristics of ecstasy tablets were studied in order to define their relevance, variation, correlation and discriminating power in an intelligence perspective. Over 5 years, more than 1200 cases of ecstasy seizures (concerning about 150,000 seized tablets) from different regions of Switzerland (City and Canton of Zurich, Cantons Ticino, Neuchâtel and Geneva) were systematically recorded. This turned out to be a statistically representative database including large and small cases. During the second phase, various comparison and clustering methods were tested and evaluated on the type and relevance of tablet characteristics, thus increasing knowledge about synthetic drugs, their manufacturing and trafficking. Finally, analytical methodologies were investigated and formalised, applying traditional intelligence methods. In this context, classical tools used in criminal analysis (such as the I2 Analyst Notebook, I2 Ibase, ...) were tested and adapted to address the specific needs of forensic drug intelligence. The interpretation of these links provides valuable information about criminal organisations and their trafficking methods. In the final part of this thesis, practical examples illustrate the use and value of such information.
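To make the idea of comparing and grouping seizures by tablet characteristics more concrete, here is a minimal Python sketch. It links two seizures when their logo matches and their numeric characteristics agree within a tolerance, then groups linked seizures into connected components (in effect, single-linkage grouping at a fixed threshold). The records, the chosen characteristics, the tolerances and the function names are all hypothetical, and this technique is not necessarily among those evaluated in the thesis.

```python
from itertools import combinations

# Hypothetical records (assumption, not data from the thesis):
# (seizure id, logo, diameter in mm, weight in mg, MDMA content in mg).
SEIZURES = [
    ("A", "star", 8.1, 250.0, 80.0),
    ("B", "star", 8.0, 252.0, 82.0),
    ("C", "heart", 9.0, 300.0, 60.0),
    ("D", "star", 8.1, 249.0, 79.0),
]

# Hypothetical tolerances for diameter, weight and MDMA content.
TOLERANCES = (0.2, 5.0, 5.0)


def is_linked(a, b):
    """Link two seizures if the logo matches and every numeric
    characteristic differs by no more than its tolerance."""
    if a[1] != b[1]:
        return False
    return all(abs(x - y) <= t for x, y, t in zip(a[2:], b[2:], TOLERANCES))


def link_groups(seizures):
    """Group seizures into connected components of the link graph."""
    parent = {s[0]: s[0] for s in seizures}

    def find(key):
        # Follow parent pointers up to the representative of the group.
        while parent[key] != key:
            key = parent[key]
        return key

    for a, b in combinations(seizures, 2):
        if is_linked(a, b):
            parent[find(b[0])] = find(a[0])

    groups = {}
    for s in seizures:
        groups.setdefault(find(s[0]), []).append(s[0])
    return sorted(groups.values())


print(link_groups(SEIZURES))
```

Groups produced this way are only candidate links; in an intelligence setting they would still need to be interpreted alongside investigative information.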
Abstract:
The algorithmic approach to data modelling has developed rapidly in recent years; in particular, methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming at providing the best possible generalization and predictive abilities instead of concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows for the simultaneous modelling of environmental data at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is here learned automatically from data, providing the optimum mixture of short- and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide efficient means to model local anomalies that may typically arise at an early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge of the possible existence of such short-scale patterns. This is a possible limitation of the method for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to the mapping of Cs137 activity given the measurements taken in the region of Briansk following the Chernobyl accident.
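One common way to give a kernel method several spatial scales, assumed here rather than taken from the paper, is to use a weighted sum of Gaussian RBF kernels with different bandwidths; a sum of valid kernels with non-negative weights is itself a valid kernel. The Python sketch below only evaluates such a kernel. The scales, the weights and the function names are hypothetical, and the weights are fixed here whereas the abstract describes the mixture as learned from data.

```python
import math

# Hypothetical length scales (same units as the coordinates) and mixture
# weights (assumptions); fixed here for illustration only.
SCALES = (1.0, 20.0)
WEIGHTS = (0.5, 0.5)


def rbf(x, y, sigma):
    """Gaussian RBF kernel between two coordinate tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def multiscale_kernel(x, y, scales=SCALES, weights=WEIGHTS):
    """Weighted sum of RBF kernels, one per spatial scale."""
    return sum(w * rbf(x, y, s) for w, s in zip(weights, scales))


origin = (0.0, 0.0)
for distance in (0.0, 1.0, 10.0):
    point = (distance, 0.0)
    print(distance, round(multiscale_kernel(origin, point), 4))
```

At short distances both components contribute, so nearby measurements can express local anomalies; at distances well beyond the short scale only the long-scale component remains, which carries the regional trend.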