936 results for errors-in-variables model


Relevance: 100.00%

Publisher:

Abstract:

The development of effective methods for predicting the quality of three-dimensional (3D) models is fundamentally important for the success of tertiary structure (TS) prediction strategies. Since CASP7, the Quality Assessment (QA) category has existed to gauge the ability of various model quality assessment programs (MQAPs) to predict the relative quality of individual 3D models. For the CASP8 experiment, automated predictions were submitted in the QA category using two methods from the ModFOLD server: ModFOLD version 1.1 and ModFOLDclust. ModFOLD version 1.1 is a single-model, machine-learning-based method, which was used for automated predictions of global model quality (QMODE1). ModFOLDclust is a simple clustering-based method, which was used for automated predictions of both global and local quality (QMODE2). In addition, manual predictions of model quality were made using ModFOLD version 2.0, an experimental method that combines the scores from ModFOLDclust and ModFOLD v1.1. Predictions from the ModFOLDclust method were the most successful of the three in terms of global model quality, whilst the ModFOLD v1.1 method was comparable in performance to other single-model based methods. In addition, the ModFOLDclust method performed well at predicting the per-residue, or local, model quality scores. Predictions of the per-residue errors in our own 3D models, selected using the ModFOLD v2.0 method, were also the most accurate compared with those from other methods. All of the MQAPs described are publicly accessible via the ModFOLD server at http://www.reading.ac.uk/bioinf/ModFOLD/. The methods are also freely available to download from http://www.reading.ac.uk/bioinf/downloads/.
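To illustrate the clustering idea behind consensus quality assessment methods such as ModFOLDclust (a hedged sketch, not the published algorithm; the `pairwise_similarity` callable and score definition are assumptions), a model's global quality can be approximated by its mean structural similarity to all other models in the ensemble:

```python
# Hedged sketch of a consensus (clustering-based) quality score: each model
# is scored by its mean pairwise structural similarity (e.g. TM-score or
# GDT-TS) to every other model in the ensemble. `pairwise_similarity` is a
# placeholder for any such comparison function.
from typing import Callable, Dict, List

def consensus_quality(models: List[str],
                      pairwise_similarity: Callable[[str, str], float]
                      ) -> Dict[str, float]:
    scores = {}
    for m in models:
        others = [pairwise_similarity(m, n) for n in models if n != m]
        scores[m] = sum(others) / len(others) if others else 0.0
    return scores
```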

Relevance: 100.00%

Publisher:

Abstract:

The realistic representation of rainfall on the local scale in climate models remains a key challenge. Realism encompasses the full spatial and temporal structure of rainfall, and is a key indicator of model skill in representing the underlying processes. In particular, if rainfall is more realistic in a climate model, there is greater confidence in its projections of future change. In this study, the realism of rainfall in a very high-resolution (1.5 km) regional climate model (RCM) is compared to that in a coarser-resolution 12-km RCM. This is the first time a convection-permitting model has been run for an extended period (1989–2008) over a region of the United Kingdom, allowing the characteristics of rainfall to be evaluated in a climatological sense. In particular, the duration and spatial extent of hourly rainfall across the southern United Kingdom are examined, with a key focus on heavy rainfall. Rainfall in the 1.5-km RCM is found to be much more realistic than in the 12-km RCM. In the 12-km RCM, heavy rain events are not heavy enough, and tend to be too persistent and widespread. While the 1.5-km model does have a tendency for heavy rain to be too intense, it still gives a much better representation of its duration and spatial extent. Long-standing problems in climate models, such as the tendency for too much persistent light rain and errors in the diurnal cycle, are also considerably reduced in the 1.5-km RCM. Biases in the 12-km RCM appear to be linked to deficiencies in the representation of convection.
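As a rough illustration of the duration and spatial-extent metrics described above (a sketch under assumed conventions, not the study's evaluation code; the wet threshold and array layout are assumptions), one can count consecutive wet hours at a grid point and the wet fraction of the domain at each hour:

```python
# Hedged sketch: duration and spatial extent of hourly rainfall.
# `rain` has shape (time, y, x) in mm/h; `wet_threshold` defines a wet hour.
import numpy as np

def wet_spell_durations(series: np.ndarray, wet_threshold: float = 0.1):
    """Lengths (in hours) of consecutive wet spells at one grid point."""
    wet = series >= wet_threshold
    durations, run = [], 0
    for w in wet:
        if w:
            run += 1
        elif run:
            durations.append(run)
            run = 0
    if run:
        durations.append(run)
    return durations

def wet_area_fraction(rain: np.ndarray, wet_threshold: float = 0.1):
    """Fraction of the domain that is wet at each hour."""
    return (rain >= wet_threshold).mean(axis=(1, 2))
```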

Relevance: 100.00%

Publisher:

Abstract:

Diabatic processes can alter Rossby wave structure; consequently, errors arising from model processes propagate downstream. However, the chaotic spread of forecasts from initial condition uncertainty makes it difficult to trace root-mean-square forecast errors back to model errors. Here, diagnostics unaffected by phase errors are used, enabling investigation of systematic errors in Rossby waves in winter-season forecasts from three operational centers. Tropopause sharpness adjacent to ridges decreases with forecast lead time. It depends strongly on model resolution, even though the models are examined on a common grid. Rossby wave amplitude decreases with lead time up to about five days, consistent with under-representation of diabatic modification and transport of air from the lower troposphere into upper-tropospheric ridges, and with too-weak humidity gradients across the tropopause. However, amplitude also decreases when resolution is decreased. Further work is necessary to isolate the contribution from errors in the representation of diabatic processes.

Relevance: 100.00%

Publisher:

Abstract:

This article shows how one can formulate the representation problem starting from Bayes' theorem. The purpose of this article is to raise awareness of the formal solutions, so that approximations can be placed in a proper context. The representation errors appear in the likelihood, and the different possibilities for the representation of reality in model and observations are discussed, including nonlinear representation probability density functions. Specifically, the assumptions needed in the usual procedure of adding a representation error covariance to the error covariance of the observations are discussed, and it is shown that, when several sub-grid observations are present, their mean still has a representation error; so-called 'superobbing' does not resolve the issue. A connection is made to the off-line or on-line retrieval problem, providing a new simple proof of the equivalence of assimilating linear retrievals and original observations. Furthermore, it is shown how nonlinear retrievals can be assimilated without loss of information. Finally, we discuss how errors in the observation operator model can be treated consistently in the Bayesian framework, connecting to previous work in this area.
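For context, the additive treatment of representation error discussed above can be sketched as follows, assuming a linear observation operator H and mutually independent Gaussian instrument and representation errors (the symbols R_o and R_r are assumptions, not necessarily the article's notation):

```latex
% Sketch: Bayes' theorem with an additive representation error.
% H is a linear observation operator; \epsilon_o and \epsilon_r are
% independent Gaussian instrument and representation errors.
\begin{align*}
  p(x \mid y) &\propto p(y \mid x)\, p(x),\\
  y &= Hx + \epsilon_o + \epsilon_r,
     \qquad \epsilon_o \sim \mathcal{N}(0, R_o),\quad
            \epsilon_r \sim \mathcal{N}(0, R_r),\\
  p(y \mid x) &= \mathcal{N}\bigl(Hx,\; R_o + R_r\bigr).
\end{align*}
```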

Relevance: 100.00%

Publisher:

Abstract:

Arctic flaw polynyas are considered to be highly productive areas for the formation of sea ice throughout the winter season. Most estimates of sea-ice production are based on the surface energy balance equation and use global reanalyses as atmospheric forcing, which are too coarse to take into account the impact of polynyas on the atmosphere. Additional errors in the estimates of polynya ice production may result from the methods used to calculate atmospheric energy fluxes and from the assumption of a thin-ice distribution within polynyas. The present study uses simulations with the mesoscale weather prediction model of the Consortium for Small-scale Modelling (COSMO), in which the polynya area is prescribed from satellite data. The polynya area is assumed either to be ice-free or to be covered with thin ice of 10 cm. Simulations have been performed for two winter periods (2007/08 and 2008/09). When using a realistic thin-ice thickness of 10 cm, sea-ice production in Laptev polynyas amounts to 30 km³ and 73 km³ for the winters 2007/08 and 2008/09, respectively. The higher turbulent energy fluxes of open-water polynyas result in a 50-70% increase in sea-ice production (49 km³ in 2007/08 and 123 km³ in 2008/09). Our results suggest that previous studies have overestimated ice production in the Laptev Sea.
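For reference, the surface-energy-balance estimate that such studies rely on can be sketched as follows (symbols and constants are illustrative assumptions, not the study's notation): the ice volume produced over a time step is the net surface heat loss divided by the latent heat of fusion.

```latex
% Sketch: thermodynamic ice production from the net surface heat loss
% Q_net over polynya area A during time step \Delta t.
\Delta V_{\mathrm{ice}} = \frac{Q_{\mathrm{net}}\, A\, \Delta t}{\rho_i L_f},
\qquad
\rho_i \approx 910\ \mathrm{kg\,m^{-3}},\quad
L_f \approx 3.34 \times 10^{5}\ \mathrm{J\,kg^{-1}}.
```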

Relevance: 100.00%

Publisher:

Abstract:

We propose a new general Bayesian latent class model for evaluating the performance of multiple diagnostic tests in situations in which no gold standard test exists, based on a computationally intensive approach. The modeling represents an interesting and suitable alternative to models with complex structures that involve the general case of several conditionally independent diagnostic tests, covariates, and strata with different disease prevalences. The technique of stratifying the population according to different disease prevalence rates does not add further marked complexity to the modeling, but it makes the model more flexible and interpretable. To illustrate the proposed general model, we evaluate the performance of six diagnostic screening tests for Chagas disease considering some epidemiological variables. Serology at the time of donation (negative, positive, inconclusive) was considered as a stratification factor in the model. The general model with stratification of the population performed better than its counterparts without stratification. The group formed by the testing laboratory Biomanguinhos FIOCRUZ-kit (c-ELISA and rec-ELISA) is the best option in the confirmation process, presenting a false-negative rate of 0.0002% under the serial testing scheme. When these two tests both give negative results the donor can be considered healthy, and when both give positive results the donor can be considered chagasic.
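As a sketch of the latent class idea underlying such models (the paper's model additionally handles covariates and stratification within a Bayesian framework; the notation below is assumed), the probability of an observed pattern of J conditionally independent test results y_1, ..., y_J in stratum s with prevalence pi_s, sensitivities Se_j and specificities Sp_j is:

```latex
% Sketch: two-class latent class likelihood under conditional independence.
P(y_1,\dots,y_J \mid s)
  = \pi_s \prod_{j=1}^{J} Se_j^{\,y_j}\,(1-Se_j)^{1-y_j}
  + (1-\pi_s) \prod_{j=1}^{J} (1-Sp_j)^{\,y_j}\, Sp_j^{\,1-y_j}.
```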

Relevance: 100.00%

Publisher:

Abstract:

The objective of this work was to assess the degree of multicollinearity and to identify the variables involved in linear dependence relations in additive-dominant models. Data on birth weight (n=141,567), yearling weight (n=58,124), and scrotal circumference (n=20,371) of Montana Tropical composite cattle were used. Diagnosis of multicollinearity was based on the variance inflation factor (VIF) and on the evaluation of the condition indexes and eigenvalues of the correlation matrix among explanatory variables. The first model studied (RM) included the fixed effect of dam age class at calving and the covariates associated with the direct and maternal additive and non-additive effects. The second model (R) included all the effects of the RM model except the maternal additive effects. Multicollinearity was detected in both models for all traits considered, with VIF values of 1.03-70.20 for RM and 1.03-60.70 for R. Collinearity increased with the number of variables in the model and with the decrease in the number of observations, and it was classified as weak, with condition index values between 10.00 and 26.77. In general, the variables associated with additive and non-additive effects were involved in multicollinearity, partially due to the natural connection between these covariates as fractions of the biological types in breed composition.
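A minimal numpy sketch of the two diagnostics named above (illustrative only; function and variable names are assumptions, not the study's software):

```python
# Hedged sketch: variance inflation factors and condition indexes for a
# design matrix X of explanatory covariates (rows = records, cols = covariates).
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the remaining columns."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        Z = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(len(Z)), Z])        # add intercept
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

def condition_indexes(X):
    """Square roots of lambda_max / lambda_j for the correlation matrix."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(R))[::-1]
    return np.sqrt(eig[0] / eig)
```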

Relevance: 100.00%

Publisher:

Abstract:

BACKGROUND: Many HIV-infected patients on highly active antiretroviral therapy (HAART) experience metabolic complications including dyslipidaemia and insulin resistance, which may increase their coronary heart disease (CHD) risk. We developed a prognostic model for CHD tailored to the changes in risk factors observed in patients starting HAART. METHODS: Data from five cohort studies (British Regional Heart Study, Caerphilly and Speedwell Studies, Framingham Offspring Study, Whitehall II) on 13,100 men aged 40-70 years, with 114,443 years of follow-up, were used. CHD was defined as myocardial infarction or death from CHD. Model fit was assessed using the Akaike Information Criterion; generalizability across cohorts was examined using internal-external cross-validation. RESULTS: A parametric model based on the Gompertz distribution generalized best. Variables included in the model were systolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, triglyceride, glucose, diabetes mellitus, body mass index and smoking status. Compared with patients not on HAART, the estimated CHD hazard ratio (HR) for patients on HAART was 1.46 (95% CI 1.15-1.86) for moderate and 2.48 (95% CI 1.76-3.51) for severe metabolic complications. CONCLUSIONS: The change in the risk of CHD in HIV-infected men starting HAART can be estimated based on typical changes in risk factors, assuming that HRs estimated using data from non-infected men are applicable to HIV-infected men. Based on this model, the risk of CHD is likely to increase, but increases may often be modest, and could be offset by lifestyle changes.
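For context, a Gompertz-based parametric survival model with proportional covariate effects typically has the form sketched below; the parameterization and symbols are assumptions, not necessarily those used in the study.

```latex
% Sketch: Gompertz proportional-hazards model with covariate vector x.
h(t \mid x) = \lambda\, e^{\gamma t}\, e^{\beta^{\top} x},
\qquad
S(t \mid x) = \exp\!\Bigl[-\tfrac{\lambda}{\gamma}\,
  \bigl(e^{\gamma t} - 1\bigr)\, e^{\beta^{\top} x}\Bigr].
```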

Relevance: 100.00%

Publisher:

Abstract:

Interruption is a known human factor that contributes to errors and catastrophic events in healthcare as well as other high-risk industries. The landmark Institute of Medicine (IOM) report, To Err is Human, brought attention to the significance of preventable errors in medicine and suggested that interruptions could be a contributing factor. Previous studies of interruptions in healthcare did not offer a conceptual model by which to study interruptions. As a result of the serious consequences of interruptions investigated in other high-risk industries, there is a need to develop a model to describe, understand, explain, and predict interruptions and their consequences in healthcare. Therefore, the purpose of this study was to develop a model grounded in the literature and to use the model to describe and explain interruptions in healthcare. Specifically, this model would be used to describe and explain interruptions occurring in a Level One Trauma Center. A trauma center was chosen because this environment is characterized as intense, unpredictable, and interrupt-driven. The first step in developing the model began with a review of the literature, which revealed that the concept of interruption did not have a consistent definition in either the healthcare or non-healthcare literature. Walker and Avant's method of concept analysis was used to clarify and define the concept. The analysis led to the identification of five defining attributes: (1) a human experience, (2) an intrusion of a secondary, unplanned, and unexpected task, (3) discontinuity, (4) externally or internally initiated, and (5) situated within a context. However, before an interruption can commence, five conditions known as antecedents must occur: (1) an intent to interrupt is formed by the initiator, (2) a physical signal passes a threshold test of detection by the recipient, (3) the sensory system of the recipient is stimulated to respond to the initiator, (4) an interruption task is presented to the recipient, and (5) the interruption task is either accepted or rejected by the recipient. An interruption was determined to be quantifiable by (1) the frequency of occurrence of an interruption, (2) the number of times the primary task has been suspended to perform an interrupting task, (3) the length of time the primary task has been suspended, and (4) the frequency of returning, or not returning, to the primary task. As a result of the concept analysis, a definition of an interruption was derived from the literature. An interruption is defined as a break in the performance of a human activity, initiated internally or externally to the recipient and occurring within the context of a setting or location. This break results in the suspension of the initial task by initiating the performance of an unplanned task, with the assumption that the initial task will be resumed. The definition is inclusive of all the defining attributes of an interruption and provides a standard definition that can be used by the healthcare industry. From the definition, a visual model of an interruption was developed. The model was used to describe and explain the interruptions recorded for an instrumental case study of physicians and registered nurses (RNs) working in a Level One Trauma Center. Five physicians were observed for a total of 29 hours, 31 minutes. Eight registered nurses were observed for a total of 40 hours, 9 minutes.
Observations were made on either the 0700-1500 or the 1500-2300 shift using the shadowing technique, and were recorded in field-note format. The field notes were analyzed by a hybrid method of categorizing activities and interruptions, developed by combining a deductive a priori classification framework with the inductive process of line-by-line coding and constant comparison as used in Grounded Theory. The following categories were identified as relevant to this study:

Intended Recipient: the person to be interrupted
Unintended Recipient: not the intended recipient of an interruption, e.g., receiving a phone call that was incorrectly dialed
Indirect Recipient: the incidental recipient of an interruption, e.g., talking with another person and thereby suspending the original activity
Recipient Blocked: the intended recipient does not accept the interruption
Recipient Delayed: the intended recipient postpones an interruption
Self-interruption: a person, independent of another person, suspends one activity to perform another, e.g., while walking, stops abruptly and talks to another person
Distraction: briefly disengaging from a task
Organizational Design: the physical layout of the workspace causes a disruption in workflow
Artifacts Not Available: supplies and equipment are not available in the workspace, causing a disruption in workflow
Initiator: a person who initiates an interruption

Interruption by Organizational Design and Artifacts Not Available were identified as two new categories of interruption that had not previously been cited in the literature. Analysis of the observations indicated that physicians performed slightly fewer activities per hour than RNs; this variance may be attributed to differing roles and responsibilities. Physicians had more of their activities interrupted than RNs, but RNs experienced more interruptions per hour. Other people were the most common medium through which an interruption was delivered; additional media included the telephone, pager, and one's self. Both physicians and RNs were observed to resume an original interrupted activity more often than not, and in most cases performed only one or two interrupting activities before returning to the original interrupted activity. In conclusion, the model was found to explain all interruptions observed during the study; however, a more comprehensive study will be required to establish its predictive value.

Relevance: 100.00%

Publisher:

Abstract:

The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU", lays out the theoretical background for the project. There are several core concepts presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrests, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables. Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase. These include selecting the proper candidate features on which to base the model, and selecting the most appropriate tool to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is defining the duration and resolution of the time series elements required to sufficiently characterize the time series phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature. In synthesizing the comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design. The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data is represented by the standard one-value-per-variable paradigm and is widely employed in a host of clinical models and tools; these values are often represented by a number in a given cell of a table. Clinical latent features are derived, rather than directly measured, data elements that more accurately represent a particular clinical phenomenon than any of the directly measured data elements in isolation. The second two classes are unique to the time series data elements. The first of these is the raw data elements. These are represented by multiple values per variable, and constitute the measured observations that are typically available to end users when they review time series data; they are often represented as dots on a graph. The final class of data results from performing time series analysis. This class of data represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are up to the modeler to determine, but we generally recommend that a variety of analyses be performed in order to maximize the likelihood of producing a representation of the time series data elements that is able to distinguish between two or more classes of outcomes.
The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU", provides a detailed description, start to finish, of the methods required to prepare the data, build, and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time-series-based models are infeasible due to the relatively large number of data elements and the complexity of the preprocessing that must occur before data can be presented to the model. Each of the seventeen steps is analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies for each of the steps, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) are issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data in order to conform to a predefined structure that was specified during the design phase; and normalizing variable families rather than individual variable instances. The final manuscript, entitled "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit", presents the results that were obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that utilizing the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the Receiver Operating Characteristic curve increased from a baseline of 87% to 98% when the trend analysis was included. In addition to these performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy compared with the baseline multivariate model, but diminished classification accuracy compared with adding just the trend analysis features (i.e., without the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract the overfitting problem, they failed to improve performance beyond that achieved by excluding the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.
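To illustrate the kind of time-series-derived latent feature these manuscripts build on (a minimal sketch; the window length, variable names, and use of a least-squares slope are assumptions, not the papers' exact preprocessing), a windowed trend can be computed for each vital sign and passed to a classifier alongside the usual point-in-time values:

```python
# Hedged sketch: derive a trend (slope) feature from a vital-sign series
# over a fixed window ending at a reference time, so that the classifier
# sees deterioration rather than a single snapshot.
import numpy as np

def trend_feature(times_min: np.ndarray, values: np.ndarray,
                  reference_min: float, window_min: float = 60.0) -> float:
    """Least-squares slope (units per minute) over
    [reference - window, reference]; NaN if fewer than two samples."""
    mask = (times_min >= reference_min - window_min) & (times_min <= reference_min)
    t, v = times_min[mask], values[mask]
    if t.size < 2:
        return float("nan")
    slope, _intercept = np.polyfit(t, v, deg=1)
    return slope

# Example: systolic blood pressure drifting downward before the reference time.
t = np.array([0, 10, 20, 30, 40, 50, 60], dtype=float)
sbp = np.array([110, 108, 107, 104, 101, 97, 92], dtype=float)
print(trend_feature(t, sbp, reference_min=60.0))   # negative slope (mmHg/min)
```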

Relevance: 100.00%

Publisher:

Abstract:

This paper presents the results of applying the DRAG methodology to the identification of the main factors influencing the number of injury and fatal accidents occurring on Spain's interurban network. Nineteen independent variables were included in the model, grouped under ten categories: exposure, infrastructure, weather, drivers, economic variables, vehicle stock, surveillance, speed and legislative measures. Highly interesting conclusions can be drawn from the results on the basis of the different effects of a single variable on each of the accident types according to severity. The greatest influence revealed by the results is exposure, which, together with inexperienced drivers, speed and an ageing vehicle stock, has a negative effect, while increased surveillance on roads, the improvement in the technological features of vehicles and the proportion of high-capacity networks have a positive effect, with the results showing a significant drop in accidents.

Relevance: 100.00%

Publisher:

Abstract:

Probabilistic modeling is the defining characteristic of estimation of distribution algorithms (EDAs), which determines their behavior and performance in optimization. Regularization is a well-known statistical technique used for obtaining an improved model by reducing the generalization error of estimation, especially in high-dimensional problems. ℓ1-regularization is a type of this technique with the appealing variable selection property, which results in sparse model estimations. In this thesis, we study the use of regularization techniques for model learning in EDAs. Several methods for regularized model estimation in continuous domains based on a Gaussian distribution assumption are presented, and analyzed from different aspects when used for optimization in a high-dimensional setting, where the population size of the EDA has a logarithmic scale with respect to the number of variables. The optimization results obtained for a number of continuous problems with an increasing number of variables show that the proposed EDA based on regularized model estimation performs a more robust optimization, and is able to achieve significantly better results for larger dimensions than other Gaussian-based EDAs. We also propose a method for learning a marginally factorized Gaussian Markov random field model using regularization techniques and a clustering algorithm. The experimental results show notable optimization performance on continuous additively decomposable problems when using this model estimation method. Our study also covers multi-objective optimization, and we propose joint probabilistic modeling of variables and objectives in EDAs based on Bayesian networks, specifically models inspired by multi-dimensional Bayesian network classifiers. It is shown that with this approach to modeling, two new types of relationships are encoded in the estimated models in addition to the variable relationships captured in other EDAs: objective-variable and objective-objective relationships. An extensive experimental study shows the effectiveness of this approach for multi- and many-objective optimization. With the proposed joint variable-objective modeling, in addition to the Pareto set approximation, the algorithm is also able to obtain an estimation of the multi-objective problem structure. Finally, the study of multi-objective optimization based on joint probabilistic modeling is extended to noisy domains, where the noise in objective values is represented by intervals. A new version of the Pareto dominance relation for ordering the solutions in these problems, namely α-degree Pareto dominance, is introduced and its properties are analyzed. We show that the ranking methods based on this dominance relation can result in competitive performance of EDAs with respect to the quality of the approximated Pareto sets. This dominance relation is then used together with a method for joint probabilistic modeling based on ℓ1-regularization for multi-objective feature subset selection in classification, where six different measures of accuracy are considered as objectives with interval values. The individual assessment of the proposed joint probabilistic modeling and solution ranking methods on datasets with small to medium dimensionality, when using two different Bayesian classifiers, shows that comparable or better Pareto sets of feature subsets are approximated in comparison to standard methods.
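As a hedged sketch of regularized Gaussian model learning inside a continuous EDA (using scikit-learn's graphical lasso as one possible ℓ1-regularized estimator; the population sizes, selection rule, regularization strength, and objective below are illustrative assumptions, not the algorithms proposed in the thesis):

```python
# Hedged sketch: a continuous EDA whose Gaussian model is estimated with an
# l1-regularized (graphical lasso) covariance, yielding sparse precision
# matrices that behave better when the population is small relative to the
# number of variables.
import numpy as np
from sklearn.covariance import GraphicalLasso

def regularized_gaussian_eda(objective, dim, pop_size=100, n_iter=50,
                             sel_frac=0.5, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5.0, 5.0, size=(pop_size, dim))
    for _ in range(n_iter):
        fitness = np.array([objective(x) for x in pop])
        elite = pop[np.argsort(fitness)[: int(sel_frac * pop_size)]]  # minimize
        mean = elite.mean(axis=0)
        model = GraphicalLasso(alpha=alpha).fit(elite)   # sparse covariance
        pop = rng.multivariate_normal(mean, model.covariance_, size=pop_size)
    return pop[np.argmin([objective(x) for x in pop])]

# Example: sphere function in 20 dimensions.
best = regularized_gaussian_eda(lambda x: float(np.sum(x ** 2)), dim=20)
```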

Relevance: 100.00%

Publisher:

Abstract:

Data on the occurrence of species are widely used to inform the design of reserve networks. These data contain commission errors (when a species is mistakenly thought to be present) and omission errors (when a species is mistakenly thought to be absent), and the rates of the two types of error are inversely related. Point locality data can minimize commission errors, but those obtained from museum collections are generally sparse, suffer from substantial spatial bias and contain large omission errors. Geographic ranges generate large commission errors because they assume homogeneous species distributions. Predicted distribution data make explicit inferences on species occurrence, and their commission and omission errors depend on model structure, on the omission of variables that determine species distribution, and on data resolution. Omission errors lead to identifying networks of areas for conservation action that are smaller than required and centred on known species occurrences, thus affecting the comprehensiveness, representativeness and efficiency of selected areas. Commission errors lead to selecting areas not relevant to conservation, thus affecting the representativeness and adequacy of reserve networks. Conservation plans should include an estimation of commission and omission errors in underlying species data and explicitly use this information to influence conservation planning outcomes.
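For concreteness (a minimal sketch with assumed 0/1 site encodings and rate definitions, not taken from the paper), commission and omission error rates can be computed by comparing mapped occurrence against true occurrence over a set of sites:

```python
# Hedged sketch: commission error = species mapped as present where it is
# truly absent; omission error = species mapped as absent where it is truly
# present. Inputs are 0/1 (or boolean) arrays over the same sites.
import numpy as np

def commission_omission_rates(mapped, true):
    """Commission: share of mapped presences that are actually absences.
    Omission: share of true presences that the map misses."""
    mapped, true = np.asarray(mapped, bool), np.asarray(true, bool)
    tp = np.sum(mapped & true)
    fp = np.sum(mapped & ~true)
    fn = np.sum(~mapped & true)
    commission = fp / (tp + fp) if (tp + fp) else 0.0
    omission = fn / (tp + fn) if (tp + fn) else 0.0
    return commission, omission
```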

Relevance: 100.00%

Publisher:

Abstract:

Conventional methods in horizontal drilling processes incorporate magnetic surveying techniques for determining the position and orientation of the bottom-hole assembly (BHA). Such means result in an increased weight of the drilling assembly, higher cost due to the use of non-magnetic collars necessary for shielding the magnetometers, and significant errors in the position of the drilling bit. A fiber-optic gyroscope (FOG) based inertial navigation system (INS) has been proposed as an alternative to magnetometer-based downhole surveying. The use of a tactical-grade FOG-based surveying system in the harsh downhole environment has been shown to be theoretically feasible, yielding a significant reduction in BHA position error (less than 100 m over a 2-h experiment). To limit the growing errors of the INS, an in-drilling alignment (IDA) method has been proposed. This article aims at describing a simple, pneumatics-based design of the IDA apparatus and its implementation downhole. A mathematical model of the setup is developed and tested with Bloodshed Dev-C++. The simulations demonstrate a simple, low-cost and feasible IDA apparatus.

Relevance: 100.00%

Publisher:

Abstract:

2002 Mathematics Subject Classification: 62J05, 62G35.