797 resultados para Data Mining
Resumo:
En este artículo se propone el análisis de las interacciones entre usuarios de Twitter, tanto lo que se genera alrededor de un usuario concreto como el análisis de un hashtag dado durante un periodo de tiempo establecido.
Resumo:
Past and current climate change has already induced drastic biological changes. We need projections of how future climate change will further impact biological systems. Modeling is one approach to forecast future ecological impacts, but requires data for model parameterization. As collecting new data is costly, an alternative is to use the increasingly available georeferenced species occurrence and natural history databases. Here, we illustrate the use of such databases to assess climate change impacts on mountain flora. We show that these data can be used effectively to derive dynamic impact scenarios, suggesting upward migration of many species and possible extinctions when no suitable habitat is available at higher elevations. Systematically georeferencing all existing natural history collections data in mountain regions could allow a larger assessment of climate change impact on mountain ecosystems in Europe and elsewhere.
Resumo:
PURPOSE: Pharmacovigilance methods have advanced greatly during the last decades, making post-market drug assessment an essential drug evaluation component. These methods mainly rely on the use of spontaneous reporting systems and health information databases to collect expertise from huge amounts of real-world reports. The EU-ADR Web Platform was built to further facilitate accessing, monitoring and exploring these data, enabling an in-depth analysis of adverse drug reactions risks.METHODS: The EU-ADR Web Platform exploits the wealth of data collected within a large-scale European initiative, the EU-ADR project. Millions of electronic health records, provided by national health agencies, are mined for specific drug events, which are correlated with literature, protein and pathway data, resulting in a rich drug-event dataset. Next, advanced distributed computing methods are tailored to coordinate the execution of data-mining and statistical analysis tasks. This permits obtaining a ranked drug-event list, removing spurious entries and highlighting relationships with high risk potential.RESULTS: The EU-ADR Web Platform is an open workspace for the integrated analysis of pharmacovigilance datasets. Using this software, researchers can access a variety of tools provided by distinct partners in a single centralized environment. Besides performing standalone drug-event assessments, they can also control the pipeline for an improved batch analysis of custom datasets. Drug-event pairs can be substantiated and statistically analysed within the platform's innovative working environment.CONCLUSIONS: A pioneering workspace that helps in explaining the biological path of adverse drug reactions was developed within the EU-ADR project consortium. This tool, targeted at the pharmacovigilance community, is available online at https://bioinformatics.ua.pt/euadr/. Copyright © 2012 John Wiley & Sons, Ltd.
Resumo:
El objetivo de este artículo es introducir al lector español en algunos debates recientes de la comunidad de humanistas digitales de habla inglesa. En lugar de intentar definir la disciplina en términos absolutos, se ha optado por una aproximación diacrónica aunque se ha puesto el acento en algunos principios como la interdisciplinariedad y la construcción de modelos, valores como el acceso y el código abierto, y prácticas como la minería de datos y la colaboración.
Resumo:
Aquesta exposició vol presentar breument el ventall d'eines disponibles, la terminologia utilitzada i, en general, el marc metodològic de l'estadística exploratoria i de l'analisi de dades, el paradigma de la disciplina.En el decurs dels darrers anys, la disciplina no ha estat pas capgirada, però de tota manera sí que cal una actualització permanent.S'han forjat i provat algunes eines gairebé només esbossades, han aparegut nous dominis d'aplicació. Cal precisar la relació amb els competidors i dinamics veïns (intel·ligencia artificial, xarxes neurals, Data Mining). La perspectiva que presento dels mètodes d'anàlisi de dades emana evidentment d'un punt de vista particular; altres punts de vista poden ser igualment vàlids
Resumo:
The European Space Agency's Gaia mission will create the largest and most precise three dimensional chart of our galaxy (the Milky Way), by providing unprecedented position, parallax, proper motion, and radial velocity measurements for about one billion stars. The resulting catalogue will be made available to the scientific community and will be analyzed in many different ways, including the production of a variety of statistics. The latter will often entail the generation of multidimensional histograms and hypercubes as part of the precomputed statistics for each data release, or for scientific analysis involving either the final data products or the raw data coming from the satellite instruments. In this paper we present and analyze a generic framework that allows the hypercube generation to be easily done within a MapReduce infrastructure, providing all the advantages of the new Big Data analysis paradigmbut without dealing with any specific interface to the lower level distributed system implementation (Hadoop). Furthermore, we show how executing the framework for different data storage model configurations (i.e. row or column oriented) and compression techniques can considerably improve the response time of this type of workload for the currently available simulated data of the mission. In addition, we put forward the advantages and shortcomings of the deployment of the framework on a public cloud provider, benchmark against other popular solutions available (that are not always the best for such ad-hoc applications), and describe some user experiences with the framework, which was employed for a number of dedicated astronomical data analysis techniques workshops.
Resumo:
The European Space Agency's Gaia mission will create the largest and most precise three dimensional chart of our galaxy (the Milky Way), by providing unprecedented position, parallax, proper motion, and radial velocity measurements for about one billion stars. The resulting catalogue will be made available to the scientific community and will be analyzed in many different ways, including the production of a variety of statistics. The latter will often entail the generation of multidimensional histograms and hypercubes as part of the precomputed statistics for each data release, or for scientific analysis involving either the final data products or the raw data coming from the satellite instruments. In this paper we present and analyze a generic framework that allows the hypercube generation to be easily done within a MapReduce infrastructure, providing all the advantages of the new Big Data analysis paradigmbut without dealing with any specific interface to the lower level distributed system implementation (Hadoop). Furthermore, we show how executing the framework for different data storage model configurations (i.e. row or column oriented) and compression techniques can considerably improve the response time of this type of workload for the currently available simulated data of the mission. In addition, we put forward the advantages and shortcomings of the deployment of the framework on a public cloud provider, benchmark against other popular solutions available (that are not always the best for such ad-hoc applications), and describe some user experiences with the framework, which was employed for a number of dedicated astronomical data analysis techniques workshops.
Resumo:
In recent years, studies into the reasons for dropping out of higher education (including online education) have been undertaken with greater regularity, parallel to the rise in the relative weight of this type of education, compared with brick-and-mortar education. However, the work invested in characterising the students who drop out of education, compared with those who do not, appears not to have had the same relevance as that invested in the analysis of the causes. The definition of dropping out is very sensitive to the context. In this article, we reach a purely empirical definition of student dropping out, based on the probability of not continuing a specific academic programme following several consecutive semesters of "theoretical break". Dropping out should be properly defined before analysing its causes, as well as comparing the drop-out rates between the different online programmes, or between online and on-campus ones. Our results show that there are significant differences among programmes, depending on their theoretical extension, but not their domain of knowledge.
Resumo:
This master's thesis coversthe concepts of knowledge discovery, data mining and technology forecasting methods in telecommunications. It covers the various aspects of knowledge discoveryin data bases and discusses in detail the methods of data mining and technologyforecasting methods that are used in telecommunications. Main concern in the overall process of this thesis is to emphasize the methods that are being used in technology forecasting for telecommunications and data mining. It tries to answer to some extent to the question of do forecasts create a future? It also describes few difficulties that arise in technology forecasting. This thesis was done as part of my master's studies in Lappeenranta University of Technology.
Resumo:
BACKGROUND: Selective publication of studies, which is commonly called publication bias, is widely recognized. Over the years a new nomenclature for other types of bias related to non-publication or distortion related to the dissemination of research findings has been developed. However, several of these different biases are often still summarized by the term 'publication bias'. METHODS/DESIGN: As part of the OPEN Project (To Overcome failure to Publish nEgative fiNdings) we will conduct a systematic review with the following objectives:- To systematically review highly cited articles that focus on non-publication of studies and to present the various definitions of biases related to the dissemination of research findings contained in the articles identified.- To develop and discuss a new framework on nomenclature of various aspects of distortion in the dissemination process that leads to public availability of research findings in an international group of experts in the context of the OPEN Project.We will systematically search Web of Knowledge for highly cited articles that provide a definition of biases related to the dissemination of research findings. A specifically designed data extraction form will be developed and pilot-tested. Working in teams of two, we will independently extract relevant information from each eligible article.For the development of a new framework we will construct an initial table listing different levels and different hazards en route to making research findings public. An international group of experts will iteratively review the table and reflect on its content until no new insights emerge and consensus has been reached. DISCUSSION: Results are expected to be publicly available in mid-2013. This systematic review together with the results of other systematic reviews of the OPEN project will serve as a basis for the development of future policies and guidelines regarding the assessment and prevention of publication bias.
Resumo:
DDM is a framework that combines intelligent agents and artificial intelligence traditional algorithms such as classifiers. The central idea of this project is to create a multi-agent system that allows to compare different views into a single one.
Resumo:
Background: Current advances in genomics, proteomics and other areas of molecular biology make the identification and reconstruction of novel pathways an emerging area of great interest. One such class of pathways is involved in the biogenesis of Iron-Sulfur Clusters (ISC). Results: Our goal is the development of a new approach based on the use and combination of mathematical, theoretical and computational methods to identify the topology of a target network. In this approach, mathematical models play a central role for the evaluation of the alternative network structures that arise from literature data-mining, phylogenetic profiling, structural methods, and human curation. As a test case, we reconstruct the topology of the reaction and regulatory network for the mitochondrial ISC biogenesis pathway in S. cerevisiae. Predictions regarding how proteins act in ISC biogenesis are validated by comparison with published experimental results. For example, the predicted role of Arh1 and Yah1 and some of the interactions we predict for Grx5 both matches experimental evidence. A putative role for frataxin in directly regulating mitochondrial iron import is discarded from our analysis, which agrees with also published experimental results. Additionally, we propose a number of experiments for testing other predictions and further improve the identification of the network structure. Conclusion: We propose and apply an iterative in silico procedure for predictive reconstruction of the network topology of metabolic pathways. The procedure combines structural bioinformatics tools and mathematical modeling techniques that allow the reconstruction of biochemical networks. Using the Iron Sulfur cluster biogenesis in S. cerevisiae as a test case we indicate how this procedure can be used to analyze and validate the network model against experimental results. Critical evaluation of the obtained results through this procedure allows devising new wet lab experiments to confirm its predictions or provide alternative explanations for further improving the models.
Resumo:
Background: Information about the composition of regulatory regions is of great value for designing experiments to functionally characterize gene expression. The multiplicity of available applications to predict transcription factor binding sites in a particular locus contrasts with the substantial computational expertise that is demanded to manipulate them, which may constitute a potential barrier for the experimental community. Results: CBS (Conserved regulatory Binding Sites, http://compfly.bio.ub.es/CBS) is a public platform of evolutionarily conserved binding sites and enhancers predicted in multiple Drosophila genomes that is furnished with published chromatin signatures associated to transcriptionally active regions and other experimental sources of information. The rapid access to this novel body of knowledge through a user-friendly web interface enables non-expert users to identify the binding sequences available for any particular gene, transcription factor, or genome region. Conclusions: The CBS platform is a powerful resource that provides tools for data mining individual sequences and groups of co-expressed genes with epigenomics information to conduct regulatory screenings in Drosophila.
Resumo:
The extension of traditional data mining methods to time series has been effectively applied to a wide range of domains such as finance, econometrics, biology, security, and medicine. Many existing mining methods deal with the task of change points detection, but very few provide a flexible approach. Querying specific change points with linguistic variables is particularly useful in crime analysis, where intuitive, understandable, and appropriate detection of changes can significantly improve the allocation of resources for timely and concise operations. In this paper, we propose an on-line method for detecting and querying change points in crime-related time series with the use of a meaningful representation and a fuzzy inference system. Change points detection is based on a shape space representation, and linguistic terms describing geometric properties of the change points are used to express queries, offering the advantage of intuitiveness and flexibility. An empirical evaluation is first conducted on a crime data set to confirm the validity of the proposed method and then on a financial data set to test its general applicability. A comparison to a similar change-point detection algorithm and a sensitivity analysis are also conducted. Results show that the method is able to accurately detect change points at very low computational costs. More broadly, the detection of specific change points within time series of virtually any domain is made more intuitive and more understandable, even for experts not related to data mining.
Resumo:
Peer-reviewed