771 resultados para Multi-relational data mining


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Past and current climate change has already induced drastic biological changes. We need projections of how future climate change will further impact biological systems. Modeling is one approach to forecast future ecological impacts, but requires data for model parameterization. As collecting new data is costly, an alternative is to use the increasingly available georeferenced species occurrence and natural history databases. Here, we illustrate the use of such databases to assess climate change impacts on mountain flora. We show that these data can be used effectively to derive dynamic impact scenarios, suggesting upward migration of many species and possible extinctions when no suitable habitat is available at higher elevations. Systematically georeferencing all existing natural history collections data in mountain regions could allow a larger assessment of climate change impact on mountain ecosystems in Europe and elsewhere.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

PURPOSE: Pharmacovigilance methods have advanced greatly during the last decades, making post-market drug assessment an essential drug evaluation component. These methods mainly rely on the use of spontaneous reporting systems and health information databases to collect expertise from huge amounts of real-world reports. The EU-ADR Web Platform was built to further facilitate accessing, monitoring and exploring these data, enabling an in-depth analysis of adverse drug reactions risks.METHODS: The EU-ADR Web Platform exploits the wealth of data collected within a large-scale European initiative, the EU-ADR project. Millions of electronic health records, provided by national health agencies, are mined for specific drug events, which are correlated with literature, protein and pathway data, resulting in a rich drug-event dataset. Next, advanced distributed computing methods are tailored to coordinate the execution of data-mining and statistical analysis tasks. This permits obtaining a ranked drug-event list, removing spurious entries and highlighting relationships with high risk potential.RESULTS: The EU-ADR Web Platform is an open workspace for the integrated analysis of pharmacovigilance datasets. Using this software, researchers can access a variety of tools provided by distinct partners in a single centralized environment. Besides performing standalone drug-event assessments, they can also control the pipeline for an improved batch analysis of custom datasets. Drug-event pairs can be substantiated and statistically analysed within the platform's innovative working environment.CONCLUSIONS: A pioneering workspace that helps in explaining the biological path of adverse drug reactions was developed within the EU-ADR project consortium. This tool, targeted at the pharmacovigilance community, is available online at https://bioinformatics.ua.pt/euadr/. Copyright © 2012 John Wiley & Sons, Ltd.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

El objetivo de este artículo es introducir al lector español en algunos debates recientes de la comunidad de humanistas digitales de habla inglesa. En lugar de intentar definir la disciplina en términos absolutos, se ha optado por una aproximación diacrónica aunque se ha puesto el acento en algunos principios como la interdisciplinariedad y la construcción de modelos, valores como el acceso y el código abierto, y prácticas como la minería de datos y la colaboración.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Aquesta exposició vol presentar breument el ventall d'eines disponibles, la terminologia utilitzada i, en general, el marc metodològic de l'estadística exploratoria i de l'analisi de dades, el paradigma de la disciplina.En el decurs dels darrers anys, la disciplina no ha estat pas capgirada, però de tota manera sí que cal una actualització permanent.S'han forjat i provat algunes eines gairebé només esbossades, han aparegut nous dominis d'aplicació. Cal precisar la relació amb els competidors i dinamics veïns (intel·ligencia artificial, xarxes neurals, Data Mining). La perspectiva que presento dels mètodes d'anàlisi de dades emana evidentment d'un punt de vista particular; altres punts de vista poden ser igualment vàlids

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The European Space Agency's Gaia mission will create the largest and most precise three dimensional chart of our galaxy (the Milky Way), by providing unprecedented position, parallax, proper motion, and radial velocity measurements for about one billion stars. The resulting catalogue will be made available to the scientific community and will be analyzed in many different ways, including the production of a variety of statistics. The latter will often entail the generation of multidimensional histograms and hypercubes as part of the precomputed statistics for each data release, or for scientific analysis involving either the final data products or the raw data coming from the satellite instruments. In this paper we present and analyze a generic framework that allows the hypercube generation to be easily done within a MapReduce infrastructure, providing all the advantages of the new Big Data analysis paradigmbut without dealing with any specific interface to the lower level distributed system implementation (Hadoop). Furthermore, we show how executing the framework for different data storage model configurations (i.e. row or column oriented) and compression techniques can considerably improve the response time of this type of workload for the currently available simulated data of the mission. In addition, we put forward the advantages and shortcomings of the deployment of the framework on a public cloud provider, benchmark against other popular solutions available (that are not always the best for such ad-hoc applications), and describe some user experiences with the framework, which was employed for a number of dedicated astronomical data analysis techniques workshops.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The European Space Agency's Gaia mission will create the largest and most precise three dimensional chart of our galaxy (the Milky Way), by providing unprecedented position, parallax, proper motion, and radial velocity measurements for about one billion stars. The resulting catalogue will be made available to the scientific community and will be analyzed in many different ways, including the production of a variety of statistics. The latter will often entail the generation of multidimensional histograms and hypercubes as part of the precomputed statistics for each data release, or for scientific analysis involving either the final data products or the raw data coming from the satellite instruments. In this paper we present and analyze a generic framework that allows the hypercube generation to be easily done within a MapReduce infrastructure, providing all the advantages of the new Big Data analysis paradigmbut without dealing with any specific interface to the lower level distributed system implementation (Hadoop). Furthermore, we show how executing the framework for different data storage model configurations (i.e. row or column oriented) and compression techniques can considerably improve the response time of this type of workload for the currently available simulated data of the mission. In addition, we put forward the advantages and shortcomings of the deployment of the framework on a public cloud provider, benchmark against other popular solutions available (that are not always the best for such ad-hoc applications), and describe some user experiences with the framework, which was employed for a number of dedicated astronomical data analysis techniques workshops.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The failure of current strategies to provide an explanation for controversial findings on the pattern of pathophysiological changes in Alzheimer's Disease (AD) motivates the necessity to develop new integrative approaches based on multi-modal neuroimaging data that captures various aspects of disease pathology. Previous studies using [18F]fluorodeoxyglucose positron emission tomography (FDG-PET) and structural magnetic resonance imaging (sMRI) report controversial results about time-line, spatial extent and magnitude of glucose hypometabolism and atrophy in AD that depend on clinical and demographic characteristics of the studied populations. Here, we provide and validate at a group level a generative anatomical model of glucose hypo-metabolism and atrophy progression in AD based on FDG-PET and sMRI data of 80 patients and 79 healthy controls to describe expected age and symptom severity related changes in AD relative to a baseline provided by healthy aging. We demonstrate a high level of anatomical accuracy for both modalities yielding strongly age- and symptom-severity- dependant glucose hypometabolism in temporal, parietal and precuneal regions and a more extensive network of atrophy in hippocampal, temporal, parietal, occipital and posterior caudate regions. The model suggests greater and more consistent changes in FDG-PET compared to sMRI at earlier and the inversion of this pattern at more advanced AD stages. Our model describes, integrates and predicts characteristic patterns of AD related pathology, uncontaminated by normal age effects, derived from multi-modal data. It further provides an integrative explanation for findings suggesting a dissociation between early- and late-onset AD. The generative model offers a basis for further development of individualized biomarkers allowing accurate early diagnosis and treatment evaluation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years, studies into the reasons for dropping out of higher education (including online education) have been undertaken with greater regularity, parallel to the rise in the relative weight of this type of education, compared with brick-and-mortar education. However, the work invested in characterising the students who drop out of education, compared with those who do not, appears not to have had the same relevance as that invested in the analysis of the causes. The definition of dropping out is very sensitive to the context. In this article, we reach a purely empirical definition of student dropping out, based on the probability of not continuing a specific academic programme following several consecutive semesters of "theoretical break". Dropping out should be properly defined before analysing its causes, as well as comparing the drop-out rates between the different online programmes, or between online and on-campus ones. Our results show that there are significant differences among programmes, depending on their theoretical extension, but not their domain of knowledge.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This master's thesis coversthe concepts of knowledge discovery, data mining and technology forecasting methods in telecommunications. It covers the various aspects of knowledge discoveryin data bases and discusses in detail the methods of data mining and technologyforecasting methods that are used in telecommunications. Main concern in the overall process of this thesis is to emphasize the methods that are being used in technology forecasting for telecommunications and data mining. It tries to answer to some extent to the question of do forecasts create a future? It also describes few difficulties that arise in technology forecasting. This thesis was done as part of my master's studies in Lappeenranta University of Technology.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

BACKGROUND: Selective publication of studies, which is commonly called publication bias, is widely recognized. Over the years a new nomenclature for other types of bias related to non-publication or distortion related to the dissemination of research findings has been developed. However, several of these different biases are often still summarized by the term 'publication bias'. METHODS/DESIGN: As part of the OPEN Project (To Overcome failure to Publish nEgative fiNdings) we will conduct a systematic review with the following objectives:- To systematically review highly cited articles that focus on non-publication of studies and to present the various definitions of biases related to the dissemination of research findings contained in the articles identified.- To develop and discuss a new framework on nomenclature of various aspects of distortion in the dissemination process that leads to public availability of research findings in an international group of experts in the context of the OPEN Project.We will systematically search Web of Knowledge for highly cited articles that provide a definition of biases related to the dissemination of research findings. A specifically designed data extraction form will be developed and pilot-tested. Working in teams of two, we will independently extract relevant information from each eligible article.For the development of a new framework we will construct an initial table listing different levels and different hazards en route to making research findings public. An international group of experts will iteratively review the table and reflect on its content until no new insights emerge and consensus has been reached. DISCUSSION: Results are expected to be publicly available in mid-2013. This systematic review together with the results of other systematic reviews of the OPEN project will serve as a basis for the development of future policies and guidelines regarding the assessment and prevention of publication bias.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Current advances in genomics, proteomics and other areas of molecular biology make the identification and reconstruction of novel pathways an emerging area of great interest. One such class of pathways is involved in the biogenesis of Iron-Sulfur Clusters (ISC). Results: Our goal is the development of a new approach based on the use and combination of mathematical, theoretical and computational methods to identify the topology of a target network. In this approach, mathematical models play a central role for the evaluation of the alternative network structures that arise from literature data-mining, phylogenetic profiling, structural methods, and human curation. As a test case, we reconstruct the topology of the reaction and regulatory network for the mitochondrial ISC biogenesis pathway in S. cerevisiae. Predictions regarding how proteins act in ISC biogenesis are validated by comparison with published experimental results. For example, the predicted role of Arh1 and Yah1 and some of the interactions we predict for Grx5 both matches experimental evidence. A putative role for frataxin in directly regulating mitochondrial iron import is discarded from our analysis, which agrees with also published experimental results. Additionally, we propose a number of experiments for testing other predictions and further improve the identification of the network structure. Conclusion: We propose and apply an iterative in silico procedure for predictive reconstruction of the network topology of metabolic pathways. The procedure combines structural bioinformatics tools and mathematical modeling techniques that allow the reconstruction of biochemical networks. Using the Iron Sulfur cluster biogenesis in S. cerevisiae as a test case we indicate how this procedure can be used to analyze and validate the network model against experimental results. Critical evaluation of the obtained results through this procedure allows devising new wet lab experiments to confirm its predictions or provide alternative explanations for further improving the models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Information about the composition of regulatory regions is of great value for designing experiments to functionally characterize gene expression. The multiplicity of available applications to predict transcription factor binding sites in a particular locus contrasts with the substantial computational expertise that is demanded to manipulate them, which may constitute a potential barrier for the experimental community. Results: CBS (Conserved regulatory Binding Sites, http://compfly.bio.ub.es/CBS) is a public platform of evolutionarily conserved binding sites and enhancers predicted in multiple Drosophila genomes that is furnished with published chromatin signatures associated to transcriptionally active regions and other experimental sources of information. The rapid access to this novel body of knowledge through a user-friendly web interface enables non-expert users to identify the binding sequences available for any particular gene, transcription factor, or genome region. Conclusions: The CBS platform is a powerful resource that provides tools for data mining individual sequences and groups of co-expressed genes with epigenomics information to conduct regulatory screenings in Drosophila.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The extension of traditional data mining methods to time series has been effectively applied to a wide range of domains such as finance, econometrics, biology, security, and medicine. Many existing mining methods deal with the task of change points detection, but very few provide a flexible approach. Querying specific change points with linguistic variables is particularly useful in crime analysis, where intuitive, understandable, and appropriate detection of changes can significantly improve the allocation of resources for timely and concise operations. In this paper, we propose an on-line method for detecting and querying change points in crime-related time series with the use of a meaningful representation and a fuzzy inference system. Change points detection is based on a shape space representation, and linguistic terms describing geometric properties of the change points are used to express queries, offering the advantage of intuitiveness and flexibility. An empirical evaluation is first conducted on a crime data set to confirm the validity of the proposed method and then on a financial data set to test its general applicability. A comparison to a similar change-point detection algorithm and a sensitivity analysis are also conducted. Results show that the method is able to accurately detect change points at very low computational costs. More broadly, the detection of specific change points within time series of virtually any domain is made more intuitive and more understandable, even for experts not related to data mining.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Peer-reviewed

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Un árbol de decisión es una forma gráfica y analítica de representar todos los eventos (sucesos) que pueden surgir a partir de una decisión asumida en cierto momento. Nos ayudan a tomar la decisión más"acertada", desde un punto de vista probabilístico, ante un abanico de posibles decisiones. Estos árboles permiten examinar los resultados y determinar visualmente cómo fluye el modelo. Los resultados visuales ayudan a buscar subgrupos específicos y relaciones que tal vez no encontraríamos con estadísticos más tradicionales. Los árboles de decisión son una técnica estadística para la segmentación, la estratificación, la predicción, la reducción de datos y el filtrado de variables, la identificación de interacciones, la fusión de categorías y la discretización de variables continuas. La función árboles de decisión (Tree) en SPSS crea árboles de clasificación y de decisión para identificar grupos, descubrir las relaciones entre grupos y predecir eventos futuros. Existen diferentes tipos de árbol: CHAID, CHAID exhaustivo, CRT y QUEST, según el que mejor se ajuste a nuestros datos.