990 resultados para predictive modeling


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Med prediktion avses att man skattar det framtida värdet på en observerbar storhet. Kännetecknande för det bayesianska paradigmet är att osäkerhet gällande okända storheter uttrycks i form av sannolikheter. En bayesiansk prediktiv modell är således en sannolikhetsfördelning över de möjliga värden som en observerbar, men ännu inte observerad storhet kan anta. I de artiklar som ingår i avhandlingen utvecklas metoder, vilka bl.a. tillämpas i analys av kromatografiska data i brottsutredningar. Med undantag för den första artikeln, bygger samtliga metoder på bayesiansk prediktiv modellering. I artiklarna betraktas i huvudsak tre olika typer av problem relaterade till kromatografiska data: kvantifiering, parvis matchning och klustring. I den första artikeln utvecklas en icke-parametrisk modell för mätfel av kromatografiska analyser av alkoholhalt i blodet. I den andra artikeln utvecklas en prediktiv inferensmetod för jämförelse av två stickprov. Metoden tillämpas i den tredje artik eln för jämförelse av oljeprover i syfte att kunna identifiera den förorenande källan i samband med oljeutsläpp. I den fjärde artikeln härleds en prediktiv modell för klustring av data av blandad diskret och kontinuerlig typ, vilken bl.a. tillämpas i klassificering av amfetaminprover med avseende på produktionsomgångar.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU" lays out the theoretical background for the project. There are several core concepts presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrests, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables. Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase. These include selecting the proper candidate features on which to base the model, and selecting the most appropriate tool to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is in defining the duration and resolution of time series elements required to sufficiently characterize the time series phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature. In synthesizing the comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design. The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data is represented by the standard one value per variable paradigm and is widely employed in a host of clinical models and tools. These are often represented by a number present in a given cell of a table. Clinical latent features derived, rather than directly measured, data elements that more accurately represent a particular clinical phenomenon than any of the directly measured data elements in isolation. The second two classes are unique to the time series data elements. The first of these is the raw data elements. These are represented by multiple values per variable, and constitute the measured observations that are typically available to end users when they review time series data. These are often represented as dots on a graph. The final class of data results from performing time series analysis. This class of data represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are up to the modeler to determine, but we generally recommend that a variety of analyses be performed in order to maximize the likelihood that a representation of the time series data elements is produced that is able to distinguish between two or more classes of outcomes. The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU" provides a detailed description, start to finish, of the methods required to prepare the data, build, and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time series based models are unfeasible due to the relatively large number of data elements and the complexity of preprocessing that must occur before data can be presented to the model. Each of the seventeen steps is analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies of each of the steps, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) are issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data in order to conform to a predefined structure that was specified during the design phase; and normalizing variable families rather than individual variable instances. The final manuscript, entitled: "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit" presents the results that were obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that utilizing the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the Receiver Operating Characteristic curve increased from a baseline of 87% to 98% by including the trend analysis. In addition to the performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy as compared to the baseline multivariate model, but diminished classification accuracy as compared to when just the trend analysis features were added (ie, without adding the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract the overfitting problem, they failed to improve the performance beyond that which was achieved by exclusion of the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background Plant-soil interaction is central to human food production and ecosystem function. Thus, it is essential to not only understand, but also to develop predictive mathematical models which can be used to assess how climate and soil management practices will affect these interactions. Scope In this paper we review the current developments in structural and chemical imaging of rhizosphere processes within the context of multiscale mathematical image based modeling. We outline areas that need more research and areas which would benefit from more detailed understanding. Conclusions We conclude that the combination of structural and chemical imaging with modeling is an incredibly powerful tool which is fundamental for understanding how plant roots interact with soil. We emphasize the need for more researchers to be attracted to this area that is so fertile for future discoveries. Finally, model building must go hand in hand with experiments. In particular, there is a real need to integrate rhizosphere structural and chemical imaging with modeling for better understanding of the rhizosphere processes leading to models which explicitly account for pore scale processes.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The growth parameters (growth rate, mu and lag time, lambda) of three different strains each of Salmonella enterica and Listeria monocytogenes in minimally processed lettuce (MPL) and their changes as a function of temperature were modeled. MPL were packed under modified atmosphere (5% O-2, 15% CO2 and 80% N-2), stored at 7-30 degrees C and samples collected at different time intervals were enumerated for S. enterica and L monocytogenes. Growth curves and equations describing the relationship between mu and lambda as a function of temperature were constructed using the DMFit Excel add-in and through linear regression, respectively. The predicted growth parameters for the pathogens observed in this study were compared to ComBase, Pathogen modeling program (PMP) and data from the literature. High R-2 values (0.97 and 0.93) were observed for average growth curves of different strains of pathogens grown on MPL Secondary models of mu and lambda for both pathogens followed a linear trend with high R2 values (>0.90). Root mean square error (RMSE) showed that the models obtained are accurate and suitable for modeling the growth of S. enterica and L monocytogenes in MP lettuce. The current study provides growth models for these foodborne pathogens that can be used in microbial risk assessment. (C) 2011 Elsevier Ltd. All rights reserved.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Spectral sensors are a wide class of devices that are extremely useful for detecting essential information of the environment and materials with high degree of selectivity. Recently, they have achieved high degrees of integration and low implementation cost to be suited for fast, small, and non-invasive monitoring systems. However, the useful information is hidden in spectra and it is difficult to decode. So, mathematical algorithms are needed to infer the value of the variables of interest from the acquired data. Between the different families of predictive modeling, Principal Component Analysis and the techniques stemmed from it can provide very good performances, as well as small computational and memory requirements. For these reasons, they allow the implementation of the prediction even in embedded and autonomous devices. In this thesis, I will present 4 practical applications of these algorithms to the prediction of different variables: moisture of soil, moisture of concrete, freshness of anchovies/sardines, and concentration of gasses. In all of these cases, the workflow will be the same. Initially, an acquisition campaign was performed to acquire both spectra and the variables of interest from samples. Then these data are used as input for the creation of the prediction models, to solve both classification and regression problems. From these models, an array of calibration coefficients is derived and used for the implementation of the prediction in an embedded system. The presented results will show that this workflow was successfully applied to very different scientific fields, obtaining autonomous and non-invasive devices able to predict the value of physical parameters of choice from new spectral acquisitions.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This study confirms that Australian isolates of Sclerotinia minor can produce fertile apothecia and further demonstrates that ascospores collected from these apothecia are pathogenic to sunflower (Helianthus annuus). Sunflower is a known host of the related fungus Sclerotinia sclerotiorum and is grown in some regions where S. minor is known to occur. Head rot symptoms were produced following inoculation with S. minor ascospores. Predictive modeling using CLIMEX software suggested that conditions suitable for carpogenic germination of S. minor probably occur in Australia particularly in southern regions. Carpogenic germination is probably a rare event in northern regions and, if it does occur, probably does not coincide with anthesis in sunflower crops, therefore allowing disease escape.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In the recent past, hardly anyone could predict this course of GIS development. GIS is moving from desktop to cloud. Web 2.0 enabled people to input data into web. These data are becoming increasingly geolocated. Big amounts of data formed something that is called "Big Data". Scientists still don't know how to deal with it completely. Different Data Mining tools are used for trying to extract some useful information from this Big Data. In our study, we also deal with one part of these data - User Generated Geographic Content (UGGC). The Panoramio initiative allows people to upload photos and describe them with tags. These photos are geolocated, which means that they have exact location on the Earth's surface according to a certain spatial reference system. By using Data Mining tools, we are trying to answer if it is possible to extract land use information from Panoramio photo tags. Also, we tried to answer to what extent this information could be accurate. At the end, we compared different Data Mining methods in order to distinguish which one has the most suited performances for this kind of data, which is text. Our answers are quite encouraging. With more than 70% of accuracy, we proved that extracting land use information is possible to some extent. Also, we found Memory Based Reasoning (MBR) method the most suitable method for this kind of data in all cases.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The MAP-i Doctoral Program of the Universities of Minho, Aveiro and Porto

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Considering genetic relatedness among species has long been argued as an important step toward measuring biological diversity more accurately, rather than relying solely on species richness. Some researchers have correlated measures of phylogenetic diversity and species richness across a series of sites and suggest that values of phylogenetic diversity do not differ enough from those of species richness to justify their inclusion in conservation planning. We compared predictions of species richness and 10 measures of phylogenetic diversity by creating distribution models for 168 individual species of a species-rich plant family, the Cape Proteaceae. When we used average amounts of land set aside for conservation to compare areas selected on the basis of species richness with areas selected on the basis of phylogenetic diversity, correlations between species richness and different measures of phylogenetic diversity varied considerably. Correlations between species richness and measures that were based on the length of phylogenetic tree branches and tree shape were weaker than those that were based on tree shape alone. Elevation explained up to 31% of the segregation of species rich versus phylogenetically rich areas. Given these results, the increased availability of molecular data, and the known ecological effect of phylogenetically rich communities, consideration of phylogenetic diversity in conservation decision making may be feasible and informative.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001.We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This thesis studies the predictability of market switching and delisting events from OMX First North Nordic multilateral stock exchange by using financial statement information and market information from 2007 to 2012. This study was conducted by using a three stage process. In first stage relevant theoretical framework and initial variable pool were constructed. Then, explanatory analysis of the initial variable pool was done in order to further limit and identify relevant variables. The explanatory analysis was conducted by using self-organizing map methodology. In the third stage, the predictive modeling was carried out with random forests and support vector machine methodologies. It was found that the explanatory analysis was able to identify relevant variables. The results indicate that the market switching and delisting events can be predicted in some extent. The empirical results also support the usability of financial statement and market information in the prediction of market switching and delisting events.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

High resolution descriptions of plant distribution have utility for many ecological applications but are especially useful for predictive modeling of gene flow from transgenic crops. Difficulty lies in the extrapolation errors that occur when limited ground survey data are scaled up to the landscape or national level. This problem is epitomized by the wide confidence limits generated in a previous attempt to describe the national abundance of riverside Brassica rapa (a wild relative of cultivated rapeseed) across the United Kingdom. Here, we assess the value of airborne remote sensing to locate B. rapa over large areas and so reduce the need for extrapolation. We describe results from flights over the river Nene in England acquired using Airborne Thematic Mapper (ATM) and Compact Airborne Spectrographic Imager (CASI) imagery, together with ground truth data. It proved possible to detect 97% of flowering B. rapa on the basis of spectral profiles. This included all stands of plants that occupied >2m square (>5 plants), which were detected using single-pixel classification. It also included very small populations (<5 flowering plants, 1-2m square) that generated mixed pixels, which were detected using spectral unmixing. The high detection accuracy for flowering B. rapa was coupled with a rather large false positive rate (43%). The latter could be reduced by using the image detections to target fieldwork to confirm species identity, or by acquiring additional remote sensing data such as laser altimetry or multitemporal imagery.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This work aims at combining the Chaos theory postulates and Artificial Neural Networks classification and predictive capability, in the field of financial time series prediction. Chaos theory, provides valuable qualitative and quantitative tools to decide on the predictability of a chaotic system. Quantitative measurements based on Chaos theory, are used, to decide a-priori whether a time series, or a portion of a time series is predictable, while Chaos theory based qualitative tools are used to provide further observations and analysis on the predictability, in cases where measurements provide negative answers. Phase space reconstruction is achieved by time delay embedding resulting in multiple embedded vectors. The cognitive approach suggested, is inspired by the capability of some chartists to predict the direction of an index by looking at the price time series. Thus, in this work, the calculation of the embedding dimension and the separation, in Takens‘ embedding theorem for phase space reconstruction, is not limited to False Nearest Neighbor, Differential Entropy or other specific method, rather, this work is interested in all embedding dimensions and separations that are regarded as different ways of looking at a time series by different chartists, based on their expectations. Prior to the prediction, the embedded vectors of the phase space are classified with Fuzzy-ART, then, for each class a back propagation Neural Network is trained to predict the last element of each vector, whereas all previous elements of a vector are used as features.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Pós-graduação em Ciências Cartográficas - FCT