983 results for Predictive modeling


Relevance:

100.00%

Publisher:

Abstract:

Statistical comparison of oil samples is an integral part of oil spill identification, the process of linking an oil spill to its source. In current practice, a frequentist hypothesis test is often used to evaluate evidence in support of a match between a spill sample and a source sample. Because frequentist tests can only evaluate evidence against a hypothesis, not in support of it, we argue that this practice leads to unsound statistical reasoning. Moreover, only verbal conclusions on a very coarse scale can currently be drawn about the match between two samples, whereas a finer quantitative assessment would often be preferred. To address these issues, we propose a Bayesian predictive approach for evaluating the similarity between the chemical compositions of two oil samples. We derive the underlying statistical model from basic assumptions about modeling assays in analytical chemistry, and, to facilitate and improve numerical evaluation, we develop analytical expressions for the key elements of Bayesian inference for this model. The approach is illustrated with both simulated and real data and is shown to have appealing properties in comparison with both standard frequentist and Bayesian approaches.
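
To make the idea of a graded, predictive comparison (as opposed to a reject/do-not-reject test) concrete, here is a minimal Python sketch. It is not the authors' model. It assumes each diagnostic ratio is measured with independent normal error of known standard deviation `sigma`, a flat prior on the source's true ratio, and a hypothetical background population of other sources described by `bg_mean` and `bg_sd`. Under those assumptions, the posterior predictive for a spill measurement from the same source is normal with the source replicate mean and variance sigma²(1 + 1/n), and the sketch reports the log ratio of that predictive density to the background one.

```python
import math
from statistics import fmean


def normal_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of N(mean, var) at x, computed directly to avoid underflow."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def predictive_log_lr(spill, source_reps, sigma, bg_mean, bg_sd):
    """Sum over ratios of log p(spill | same source) - log p(spill | other source).

    spill         : one spill measurement per diagnostic ratio
    source_reps   : replicate source measurements, one list per ratio
    sigma         : assumed known measurement SD (hypothetical, shared by all ratios)
    bg_mean, bg_sd: hypothetical population of other sources, one value per ratio
    """
    total = 0.0
    for x, reps, m0, s0 in zip(spill, source_reps, bg_mean, bg_sd):
        n = len(reps)
        # Flat prior + known sigma: predictive is N(replicate mean, sigma^2 (1 + 1/n)).
        same = normal_logpdf(x, fmean(reps), sigma ** 2 * (1.0 + 1.0 / n))
        # Other source: true ratio ~ N(m0, s0^2), plus measurement error.
        other = normal_logpdf(x, m0, s0 ** 2 + sigma ** 2)
        total += same - other
    return total


# Toy numbers for illustration only, not data from the study.
reps = [[1.0, 1.1, 0.9], [0.40, 0.42, 0.41]]
close = predictive_log_lr([1.02, 0.41], reps, 0.1, [1.5, 0.6], [0.5, 0.2])
far = predictive_log_lr([2.00, 0.70], reps, 0.1, [1.5, 0.6], [0.5, 0.2])
print(close, far)
```

A positive value favors a common origin and a negative value favors a different source; unlike a p-value, the magnitude gives a continuous measure of support in either direction.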

Relevance:

100.00%

Publisher:

Abstract:

During mitotic cell cycles, DNA is exposed to many endogenous and exogenous damaging agents that can cause double-strand breaks (DSBs). In S. cerevisiae, DSBs are primarily repaired by mitotic recombination and, as a result, can lead to loss of heterozygosity (LOH). Genetic recombination can occur in both meiosis and mitosis. Whereas the genome-wide distribution of meiotic recombination events has been studied intensively, mitotic recombination events had not been mapped in an unbiased way across the genome until recently. Our lab has recently developed methods for selecting mitotic crossovers and mapping their positions. Our current approach uses a diploid yeast strain that is heterozygous for about 55,000 SNPs and employs SNP microarrays to map LOH events throughout the genome. These methods allow us to examine selected crossovers and unselected mitotic recombination events (crossovers, noncrossovers and break-induced replication, BIR) at about 1 kb resolution across the genome. Using this method, we generated maps of spontaneous and UV-induced LOH events. In this study, we explore machine learning and variable selection techniques to build a predictive model of where LOH events occur in the genome.

We simulated control tracts, drawn at random from the yeast genome, that resemble the LOH tracts in tract length and in location relative to single-nucleotide polymorphism (SNP) positions. We then extracted roughly 1,100 features, such as base composition, histone modifications and the presence of tandem repeats, and trained classifiers to distinguish control tracts from LOH tracts. We identified several features with good predictive value. We also found that, with the current repertoire of features, prediction is generally better for spontaneous LOH events than for UV-induced LOH events.
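
The abstract does not spell out how control tracts were matched to the LOH tracts. One possible reading, sketched below in Python under stated assumptions, is to keep each observed tract length, draw a random start, and snap both tract boundaries to SNP positions, on the assumption that tract boundaries can only be resolved at heterozygous markers. The function name and the single-chromosome simplification are hypothetical.

```python
import bisect
import random


def simulate_control_tracts(snp_positions, tract_lengths, rng):
    """Return one random (start, end) control tract per observed tract length.

    Hypothetical matching rule: boundaries are snapped to SNP positions on a
    single chromosome, and each control tract spans at least the observed length.
    """
    snps = sorted(snp_positions)
    tracts = []
    for length in tract_lengths:
        # Only SNPs that leave room for the whole tract can serve as start points.
        n_starts = bisect.bisect_right(snps, snps[-1] - length)
        if n_starts == 0:
            raise ValueError(f"tract length {length} does not fit on the chromosome")
        start = snps[rng.randrange(n_starts)]
        # End at the first SNP at or beyond start + length.
        end = snps[bisect.bisect_left(snps, start + length)]
        tracts.append((start, end))
    return tracts


# Toy SNP map (one SNP every 37 bp) and toy LOH tract lengths.
snps = list(range(0, 20000, 37))
lengths = [500, 1200, 50]
controls = simulate_control_tracts(snps, lengths, random.Random(0))
print(controls)
```

Features such as base composition would then be computed over both the LOH tracts and these control tracts before a classifier is trained to separate them.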

Relevance:

100.00%

Publisher:

Abstract:

Architects use cycle-by-cycle simulation to evaluate design choices and understand tradeoffs and interactions among design parameters. Efficiently exploring exponential-size design spaces with many interacting parameters remains an open problem: the sheer number of experiments renders detailed simulation intractable. We attack this problem via an automated approach that builds accurate, confident predictive design-space models. We simulate sampled points, using the results to teach our models the function describing relationships among design parameters. The models produce highly accurate performance estimates for other points in the space, can be queried to predict the performance impact of architectural changes, and are very fast compared to simulation, enabling efficient discovery of tradeoffs among parameters in different regions. We validate our approach via sensitivity studies on memory hierarchy and CPU design spaces: our models generally predict IPC with only 1-2% error and reduce required simulation by two orders of magnitude. We also show the efficacy of our technique for exploring chip multiprocessor (CMP) design spaces: when trained on a 1% sample drawn from a CMP design space with 250K points and up to 55x performance swings among different system configurations, our models predict performance with only 4-5% error on average. Our approach combines with techniques that reduce time per simulation, achieving net time savings of three to four orders of magnitude. Copyright © 2006 ACM.
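
The sample-then-predict workflow can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `toy_ipc` is a made-up stand-in for a cycle-accurate simulator, the three-parameter space is tiny, and k-nearest-neighbour regression is swapped in for brevity; it is not the model family used in the paper, and its error says nothing about the 1-2% or 4-5% figures above.

```python
import itertools
import math
import random


def toy_ipc(cache_kb, width, rob):
    """Hypothetical stand-in for an expensive simulator run returning IPC."""
    return width * (1.0 - 0.5 * math.exp(-cache_kb / 256.0)) * rob / (rob + 64.0)


def encode(point):
    """Scale each design parameter to roughly [0, 1] so distances are comparable."""
    cache_kb, width, rob = point
    return (math.log2(cache_kb) / 10.0, width / 8.0, rob / 256.0)


def knn_predict(train, point, k=3):
    """Predict IPC as the mean of the k sampled configurations nearest to point."""
    x = encode(point)
    nearest = sorted(train, key=lambda p: math.dist(encode(p[0]), x))[:k]
    return sum(ipc for _, ipc in nearest) / len(nearest)


# Full design space: cache size (KB) x issue width x reorder-buffer entries.
space = list(itertools.product([32, 64, 128, 256, 512, 1024],
                               [2, 4, 6, 8],
                               range(32, 257, 32)))
rng = random.Random(42)
sampled = rng.sample(space, k=len(space) // 5)  # "simulate" only 20% of the points
sampled_set = set(sampled)
train = [(p, toy_ipc(*p)) for p in sampled]
held_out = [p for p in space if p not in sampled_set]

# Relative error on configurations that were never simulated.
errors = [abs(knn_predict(train, p) - toy_ipc(*p)) / toy_ipc(*p) for p in held_out]
print(f"mean error on unsampled points: {100 * sum(errors) / len(errors):.1f}%")
```

Only the sampled points pay the simulation cost; every other query is answered by the cheap model, which is where the savings reported above come from.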

Relevance:

100.00%

Publisher:

Abstract:

Efficiently exploring exponential-size architectural design spaces with many interacting parameters remains an open problem: the sheer number of experiments required renders detailed simulation intractable. We attack this problem via an automated approach that builds accurate predictive models. We simulate sampled points, using the results to teach our models the function describing relationships among design parameters. The models can be queried and are very fast, enabling efficient discovery of design tradeoffs. We validate our approach via two uniprocessor sensitivity studies, predicting IPC with only 1–2% error. In an experimental study using the approach, training on 1% of a 250K-point CMP design space allows our models to predict performance with only 4–5% error. Our predictive modeling combines well with techniques that reduce the time taken by each simulation experiment, achieving net time savings of three to four orders of magnitude.

Relevance:

100.00%

Publisher:

Abstract:

Explores how machine learning techniques can be used to build effective student modeling systems with constrained development and operational overheads, by integrating top-down and bottom-up initiatives. Emphasizes feature-based modelling.

Relevance:

100.00%

Publisher:

Abstract:

The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU", lays out the theoretical background for the project. Several core concepts are presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrest, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables. Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase. These include selecting the proper candidate features on which to base the model and selecting the most appropriate tool to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is defining the duration and resolution of the time series elements required to sufficiently characterize the phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature. In synthesizing a comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design.
The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data are represented by the standard one-value-per-variable paradigm and are widely employed in a host of clinical models and tools; they are often represented by a number in a given cell of a table. Clinical latent features are derived, rather than directly measured, data elements that represent a particular clinical phenomenon more accurately than any of the directly measured data elements in isolation. The other two classes are unique to time series data elements. The first of these comprises the raw data elements. These are represented by multiple values per variable and constitute the measured observations that are typically available to end users when they review time series data; they are often represented as dots on a graph. The final class of data results from performing time series analysis. This class of data represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are up to the modeler to determine, but we generally recommend performing a variety of analyses to maximize the likelihood of producing a representation of the time series data elements that can distinguish between two or more classes of outcomes. The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU", provides a detailed, start-to-finish description of the methods required to prepare the data and to build and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time-series-based models are infeasible because of the relatively large number of data elements and the complexity of the preprocessing that must occur before data can be presented to the model.
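
The abstract leaves the choice of time series operations to the modeler. As one hypothetical example of turning raw samples into a "time series analysis result", the Python sketch below summarizes a vital sign over a fixed window ending at a reference time by its mean and its ordinary least-squares slope. The window length plays the role of the duration discussed above; the function name and the choice of slope are illustrative assumptions, not the operations used in the study.

```python
from statistics import fmean


def trend_features(times, values, ref_time, window):
    """Summarize one vital sign over [ref_time - window, ref_time].

    Returns (mean, slope), where slope is the least-squares change per time
    unit: a simple trend feature usable as a latent candidate feature.
    """
    pts = [(t, v) for t, v in zip(times, values) if ref_time - window <= t <= ref_time]
    if len(pts) < 2:
        raise ValueError("need at least two samples in the window")
    t_bar = fmean(t for t, _ in pts)
    v_bar = fmean(v for _, v in pts)
    sxx = sum((t - t_bar) ** 2 for t, _ in pts)
    if sxx == 0:
        raise ValueError("all samples share one timestamp")
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in pts)
    return v_bar, sxy / sxx


# Hypothetical SpO2 readings (%), one per minute, ending at the reference time.
print(trend_features([0, 1, 2, 3, 4], [100, 98, 96, 94, 92], ref_time=4, window=4))
# -> (96.0, -2.0)
```

A point-in-time model sees only the last reading (92%), whereas the slope feature encodes the steady decline that precedes it.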
Each of the seventeen steps is analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies of each step, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after the complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) raise issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data to conform to a predefined structure specified during the design phase; and normalizing variable families rather than individual variable instances. The final manuscript, entitled "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit", presents the results obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that using the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the Receiver Operating Characteristic curve increased from a baseline of 87% to 98% when the trend analysis was included.
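
The three preprocessing issues listed above can be illustrated with a short Python sketch, under assumptions that are ours rather than the authors': irregular samples are reduced to a fixed grid ending at the reference time with last-observation-carried-forward imputation, and a "variable family" is read as the same variable at every grid offset, normalized with one pooled mean and standard deviation instead of one per offset. Both function names are hypothetical.

```python
from statistics import fmean, pstdev


def resample_locf(times, values, ref_time, duration, step):
    """Reduce irregular samples to a fixed grid ending at ref_time.

    Grid points run from ref_time - duration to ref_time in `step` increments;
    each takes the last observation at or before it (last observation carried
    forward), or None when nothing has been observed yet.
    """
    obs = sorted(zip(times, values))
    out = []
    for k in range(duration // step + 1):
        g = ref_time - duration + k * step
        earlier = [v for t, v in obs if t <= g]
        out.append(earlier[-1] if earlier else None)
    return out


def normalize_family(rows):
    """z-score every value in a variable family with ONE pooled mean and SD.

    `rows` holds one resampled series per patient. Pooling across all grid
    offsets preserves the relative scale between offsets (and hence trends),
    which per-offset normalization would erase.
    """
    pooled = [v for row in rows for v in row if v is not None]
    mu, sd = fmean(pooled), pstdev(pooled)
    if sd == 0:
        raise ValueError("a constant family cannot be z-scored")
    return [[None if v is None else (v - mu) / sd for v in row] for row in rows]


# Hypothetical heart-rate samples at minutes 0, 5 and 7; reference time = minute 10.
hr = resample_locf([0, 5, 7], [80, 85, 90], ref_time=10, duration=10, step=5)
print(hr)  # [80, 85, 90]
print(normalize_family([hr, [70, 72, None]]))
```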
In addition to the performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy relative to the baseline multivariate model, but reduced it relative to adding only the trend analysis features (i.e., without the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract overfitting, they failed to improve performance beyond that achieved by excluding the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.

Relevance:

100.00%

Publisher:

Abstract:

Background: Plant-soil interaction is central to human food production and ecosystem function. It is therefore essential not only to understand these interactions but also to develop predictive mathematical models that can be used to assess how climate and soil management practices will affect them. Scope: In this paper we review current developments in structural and chemical imaging of rhizosphere processes within the context of multiscale, image-based mathematical modeling. We outline areas that need more research and areas that would benefit from more detailed understanding. Conclusions: We conclude that combining structural and chemical imaging with modeling is an extremely powerful tool that is fundamental for understanding how plant roots interact with soil. We emphasize the need to attract more researchers to this area, which is so fertile for future discoveries. Finally, model building must go hand in hand with experiments. In particular, there is a real need to integrate rhizosphere structural and chemical imaging with modeling to better understand rhizosphere processes, leading to models that explicitly account for pore-scale processes.

Relevance:

70.00%

Publisher:

Abstract:

The growth parameters (growth rate, μ, and lag time, λ) of three strains each of Salmonella enterica and Listeria monocytogenes in minimally processed lettuce (MPL), and their changes as a function of temperature, were modeled. MPL samples were packed under a modified atmosphere (5% O2, 15% CO2 and 80% N2) and stored at 7 to 30 °C, and samples collected at different time intervals were enumerated for S. enterica and L. monocytogenes. Growth curves were constructed using the DMFit Excel add-in, and equations describing μ and λ as functions of temperature were obtained by linear regression. The predicted growth parameters for the pathogens observed in this study were compared with ComBase, the Pathogen Modeling Program (PMP) and data from the literature. High R² values (0.97 and 0.93) were observed for the average growth curves of the different pathogen strains grown on MPL. Secondary models of μ and λ for both pathogens followed a linear trend with high R² values (>0.90). The root mean square error (RMSE) showed that the resulting models are accurate and suitable for modeling the growth of S. enterica and L. monocytogenes in MPL. The current study provides growth models for these foodborne pathogens that can be used in microbial risk assessment. © 2011 Elsevier Ltd. All rights reserved.

Relevance:

60.00%

Publisher:

Abstract:

An array of substrates links the tryptic serine protease kallikrein-related peptidase 14 (KLK14) to physiological functions, including desquamation and the activation of signaling molecules associated with inflammation and cancer. Recognition of protease cleavage sequences is driven by complementarity between exposed substrate motifs and the physicochemical signature of an enzyme's active-site cleft. However, conventional substrate screening methods have generated conflicting subsite profiles for KLK14. This study uses a recently developed screening technique, the sparse matrix library, to identify five novel high-efficiency sequences for KLK14. The optimal sequence, YASR, was cleaved with higher efficiency (kcat/Km = (3.81 ± 0.4) × 10⁶ M⁻¹ s⁻¹) than the favored substrates from positional scanning and phage display, by 2- and 10-fold, respectively. Binding-site cooperativity was prominent among the preferred sequences, enabling optimal interaction at all subsites, as indicated by predictive modeling of KLK14/substrate complexes. These simulations constitute the first molecular dynamics analysis of KLK14 and offer a structural rationale for the divergent subsite preferences of KLK14 and the closely related KLK4 and KLK5. Collectively, these findings highlight the importance of binding-site cooperativity in protease substrate recognition, with implications for discovering optimal substrates and engineering highly effective protease inhibitors.

Relevance:

60.00%

Publisher:

Abstract:

Through a combinatorial approach involving experimental measurement and plasma modelling, it is shown that a high degree of control over the sp3/sp2 ratio of diamond-like nanocarbon films (and hence over film properties) may be exercised, starting at the level of electrons (through modification of the plasma electron energy distribution function). Hydrogenated amorphous carbon nanoparticle films with high percentages of diamond-like bonds are grown using a middle-frequency (2 MHz) inductively coupled Ar + CH4 plasma. The sp3 fractions measured in the thin films by X-ray photoelectron spectroscopy (XPS) and Raman spectroscopy are explained qualitatively using sp3/sp2 ratios that are 1) derived from calculated densities of sp3- and sp2-hybridized precursor species in a global plasma discharge model and 2) measured experimentally. It is shown that at high discharge power and lower CH4 concentrations, the sp3/sp2 fraction is higher. Our results suggest that combining predictive modeling with experimental studies is instrumental in achieving deterministically grown, made-to-order diamond-like nanocarbons suitable for a variety of applications, ranging from nano-magnetic resonance imaging to spin-flip quantum information devices. This deterministic approach can be extended to graphene, carbon nanotips, nanodiamond and other nanocarbon materials for a variety of applications.

Relevance:

60.00%

Publisher:

Abstract:

Using advanced visualization techniques, a comprehensive visualization of all stages of the self-organized growth of internetworked nanostructures on a plasma-exposed surface has been produced. Atomistic kinetic Monte Carlo simulation of the initial deposition stage, with 3-D visualization of the whole system and half-tone visualization of the density field of the adsorbed atoms, makes it possible to implement multiscale predictive modeling of the development of the nanoscale system.

Relevance:

60.00%

Publisher:

Abstract:

Reliable calculations of electron/ion energy losses in low-pressure, thermally nonequilibrium, low-temperature plasmas are indispensable for predictive modeling in the numerous applications of such discharges. The commonly used simplified approaches to calculating electron/ion energy losses to the chamber walls rely on assumptions that often ignore the details of the prevailing electron energy distribution function (EEDF) and overestimate the contribution of electron losses to the walls. Through direct measurements of the EEDF and careful calculation of the contributions of plasma electrons in low-pressure inductively coupled plasmas, it is shown that the actual kinetic energy losses of electrons and ions depend strongly on the EEDF. Improper assumptions about the prevailing EEDF and about the ability of electrons to pass through the repulsive potential of the wall are found to overestimate the total electron/ion energy losses to the walls, typically by 9 to 32%. These results are particularly important for developing power-saving strategies for operating low-temperature, low-pressure gas discharges in diverse applications that require reasonably low power densities. © 2008 American Institute of Physics.

Relevance:

60.00%

Publisher:

Abstract:

QTL mapping methods for complex traits are challenged by new developments in marker technology, phenotyping platforms and breeding methods. In meeting these challenges, QTL mapping approaches will also need to acknowledge the central roles of QTL-by-environment interactions (QEI) and QTL-by-trait interactions in the expression of complex traits such as yield. This paper presents an overview of mixed-model QTL methodology that is suitable for many types of populations and that allows predictive modeling of QEI along both environmental and developmental gradients. Attention is also given to multi-trait QTL models, which are essential for interpreting the genetic basis of trait correlations. Biophysical (crop growth) model simulations are proposed as a complement to statistical QTL mapping, both to interpret the nature of QEI and to investigate better methods for dissecting complex traits into component traits and their genetic controls.

Relevance:

60.00%

Publisher:

Abstract:

Endometriosis is a common gynecological disease associated with pelvic pain and subfertility. We conducted a genome-wide association study (GWAS) in 3,194 individuals with surgically confirmed endometriosis (cases) and 7,060 controls from Australia and the UK. Polygenic predictive modeling showed significantly increased genetic loading among 1,364 cases with moderate to severe endometriosis. The strongest association signal was on 7p15.2 (rs12700667) for 'all' endometriosis (P = 2.6 × 10⁻⁷, odds ratio (OR) = 1.22, 95% CI 1.13-1.32) and for moderate to severe disease (P = 1.5 × 10⁻⁹, OR = 1.38, 95% CI 1.24-1.53). We replicated rs12700667 in an independent US cohort of 2,392 self-reported, surgically confirmed endometriosis cases and 2,271 controls (P = 1.2 × 10⁻³, OR = 1.17, 95% CI 1.06-1.28), yielding a genome-wide significant P value of 1.4 × 10⁻⁹ (OR = 1.20, 95% CI 1.13-1.27) for 'all' endometriosis in the combined datasets of 5,586 cases and 9,331 controls. rs12700667 lies in an intergenic region upstream of the plausible candidate genes NFE2L3 and HOXA10.
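
The abstract does not state how the discovery and replication results were combined. Assuming a standard inverse-variance fixed-effect meta-analysis of the published odds ratios and 95% confidence intervals (log OR standard errors recovered from the CI widths), the Python sketch below reproduces the combined OR of 1.20 with CI 1.13 to 1.27. The P value from this summary-level approximation need not match the reported 1.4 × 10⁻⁹ exactly, since the inputs are rounded and the authors' actual procedure may differ.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96, for a two-sided 95% CI


def log_or_and_se(or_, lo, hi):
    """Recover log(OR) and its standard error from an OR and its 95% CI."""
    return math.log(or_), (math.log(hi) - math.log(lo)) / (2 * Z975)


def fixed_effect_meta(studies):
    """Inverse-variance fixed-effect pooling of (OR, lower, upper) triples."""
    est = [log_or_and_se(*s) for s in studies]
    weights = [1.0 / se ** 2 for _, se in est]
    pooled = sum(w * b for w, (b, _) in zip(weights, est)) / sum(weights)
    se = 1.0 / math.sqrt(sum(weights))
    p = 2.0 * NormalDist().cdf(-abs(pooled / se))  # two-sided Wald test
    return (math.exp(pooled), math.exp(pooled - Z975 * se),
            math.exp(pooled + Z975 * se), p)


discovery = (1.22, 1.13, 1.32)    # rs12700667, 'all' endometriosis
replication = (1.17, 1.06, 1.28)  # independent US cohort
or_, lo, hi, p = fixed_effect_meta([discovery, replication])
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), P ~ {p:.1e}")
```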