956 resultados para Missing Covariates


Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper studies the missing covariate problem which is often encountered in survival analysis. Three covariate imputation methods are employed in the study, and the effectiveness of each method is evaluated within the hazard prediction framework. Data from a typical engineering asset is used in the case study. Covariate values in some time steps are deliberately discarded to generate an incomplete covariate set. It is found that although the mean imputation method is simpler than others for solving missing covariate problems, the results calculated by it can differ largely from the real values of the missing covariates. This study also shows that in general, results obtained from the regression method are more accurate than those of the mean imputation method but at the cost of a higher computational expensive. Gaussian Mixture Model (GMM) method is found to be the most effective method within these three in terms of both computation efficiency and predication accuracy.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents a novel method for remaining useful life prediction using the Elliptical Basis Function (EBF) network and a Markov chain. The EBF structure is trained by a modified Expectation-Maximization (EM) algorithm in order to take into account the missing covariate set. No explicit extrapolation is needed for internal covariates while a Markov chain is constructed to represent the evolution of external covariates in the study. The estimated external and the unknown internal covariates constitute an incomplete covariate set which are then used and analyzed by the EBF network to provide survival information of the asset. It is shown in the case study that the method slightly underestimates the remaining useful life of an asset which is a desirable result for early maintenance decision and resource planning.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Retrospective clinical datasets are often characterized by a relatively small sample size and many missing data. In this case, a common way for handling the missingness consists in discarding from the analysis patients with missing covariates, further reducing the sample size. Alternatively, if the mechanism that generated the missing allows, incomplete data can be imputed on the basis of the observed data, avoiding the reduction of the sample size and allowing methods to deal with complete data later on. Moreover, methodologies for data imputation might depend on the particular purpose and might achieve better results by considering specific characteristics of the domain. The problem of missing data treatment is studied in the context of survival tree analysis for the estimation of a prognostic patient stratification. Survival tree methods usually address this problem by using surrogate splits, that is, splitting rules that use other variables yielding similar results to the original ones. Instead, our methodology consists in modeling the dependencies among the clinical variables with a Bayesian network, which is then used to perform data imputation, thus allowing the survival tree to be applied on the completed dataset. The Bayesian network is directly learned from the incomplete data using a structural expectation–maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is due to the EM formulation itself. On both simulated and real data, our proposed methodology usually outperformed several existing methods for data imputation and the imputation so obtained improved the stratification estimated by the survival tree (especially with respect to using surrogate splits).

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this work we study the survival cure rate model proposed by Yakovlev (1993) that are considered in a competing risk setting. Covariates are introduced for modeling the cure rate and we allow some covariates to have missing values. We consider only the cases by which the missing covariates are categorical and implement the EM algorithm via the method of weights for maximum likelihood estimation. We present a Monte Carlo simulation experiment to compare the properties of the estimators based on this method with those estimators under the complete case scenario. We also evaluate, in this experiment, the impact in the parameter estimates when we increase the proportion of immune and censored individuals among the not immune one. We demonstrate the proposed methodology with a real data set involving the time until the graduation for the undergraduate course of Statistics of the Universidade Federal do Rio Grande do Norte

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background Spatial analysis is increasingly important for identifying modifiable geographic risk factors for disease. However, spatial health data from surveys are often incomplete, ranging from missing data for only a few variables, to missing data for many variables. For spatial analyses of health outcomes, selection of an appropriate imputation method is critical in order to produce the most accurate inferences. Methods We present a cross-validation approach to select between three imputation methods for health survey data with correlated lifestyle covariates, using as a case study, type II diabetes mellitus (DM II) risk across 71 Queensland Local Government Areas (LGAs). We compare the accuracy of mean imputation to imputation using multivariate normal and conditional autoregressive prior distributions. Results Choice of imputation method depends upon the application and is not necessarily the most complex method. Mean imputation was selected as the most accurate method in this application. Conclusions Selecting an appropriate imputation method for health survey data, after accounting for spatial correlation and correlation between covariates, allows more complete analysis of geographic risk factors for disease with more confidence in the results to inform public policy decision-making.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Whether a statistician wants to complement a probability model for observed data with a prior distribution and carry out fully probabilistic inference, or base the inference only on the likelihood function, may be a fundamental question in theory, but in practice it may well be of less importance if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode, using flat priors. However, in situations where parametric assumptions in standard statistical models would be too rigid, more flexible model formulation, combined with fully probabilistic inference, can be achieved using hierarchical Bayesian parametrization. This work includes five articles, all of which apply probability modeling under various problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two of them hierarchical Bayesian modeling. Because maximum likelihood may be presented as a special case of Bayesian inference, but not the other way round, in the introductory part of this work we present a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox equipped herein, to show that they are also justifiable under this more general framework. Here the assumption of exchangeability and de Finetti's representation theorem are applied repeatedly for justifying the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that this same reasoning can be applied also under sampling from a finite population. The main emphasis here is in probability-based inference under incomplete observation due to study design. This is illustrated using a generic two-phase cohort sampling design as an example. The alternative approaches presented for analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied for a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates and a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is feasible also in this case.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objective: In this secondary data analysis, three statistical methodologies were implemented to handle cases with missing data in a motivational interviewing and feedback study. The aim was to evaluate the impact that these methodologies have on the data analysis. ^ Methods: We first evaluated whether the assumption of missing completely at random held for this study. We then proceeded to conduct a secondary data analysis using a mixed linear model to handle missing data with three methodologies (a) complete case analysis, (b) multiple imputation with explicit model containing outcome variables, time, and the interaction of time and treatment, and (c) multiple imputation with explicit model containing outcome variables, time, the interaction of time and treatment, and additional covariates (e.g., age, gender, smoke, years in school, marital status, housing, race/ethnicity, and if participants play on athletic team). Several comparisons were conducted including the following ones: 1) the motivation interviewing with feedback group (MIF) vs. the assessment only group (AO), the motivation interviewing group (MIO) vs. AO, and the intervention of the feedback only group (FBO) vs. AO, 2) MIF vs. FBO, and 3) MIF vs. MIO.^ Results: We first evaluated the patterns of missingness in this study, which indicated that about 13% of participants showed monotone missing patterns, and about 3.5% showed non-monotone missing patterns. Then we evaluated the assumption of missing completely at random by Little's missing completely at random (MCAR) test, in which the Chi-Square test statistic was 167.8 with 125 degrees of freedom, and its associated p-value was p=0.006, which indicated that the data could not be assumed to be missing completely at random. After that, we compared if the three different strategies reached the same results. For the comparison between MIF and AO as well as the comparison between MIF and FBO, only the multiple imputation with additional covariates by uncongenial and congenial models reached different results. For the comparison between MIF and MIO, all the methodologies for handling missing values obtained different results. ^ Discussions: The study indicated that, first, missingness was crucial in this study. Second, to understand the assumptions of the model was important since we could not identify if the data were missing at random or missing not at random. Therefore, future researches should focus on exploring more sensitivity analyses under missing not at random assumption.^

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Modern Engineering Asset Management (EAM) requires the accurate assessment of current and the prediction of future asset health condition. Suitable mathematical models that are capable of predicting Time-to-Failure (TTF) and the probability of failure in future time are essential. In traditional reliability models, the lifetime of assets is estimated using failure time data. However, in most real-life situations and industry applications, the lifetime of assets is influenced by different risk factors, which are called covariates. The fundamental notion in reliability theory is the failure time of a system and its covariates. These covariates change stochastically and may influence and/or indicate the failure time. Research shows that many statistical models have been developed to estimate the hazard of assets or individuals with covariates. An extensive amount of literature on hazard models with covariates (also termed covariate models), including theory and practical applications, has emerged. This paper is a state-of-the-art review of the existing literature on these covariate models in both the reliability and biomedical fields. One of the major purposes of this expository paper is to synthesise these models from both industrial reliability and biomedical fields and then contextually group them into non-parametric and semi-parametric models. Comments on their merits and limitations are also presented. Another main purpose of this paper is to comprehensively review and summarise the current research on the development of the covariate models so as to facilitate the application of more covariate modelling techniques into prognostics and asset health management.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The art of storytelling is one of the oldest forms of creative discourse. Apart from finding stories, the most important job in television is the construction of stories to have a broad audience appeal. This first-hand review of Missing Persons Unit, hereafter referred to as MPU, a prime time program on the Nine Network in Australia with immense audience appeal, is an original work by the executive producer (development and series producer Series One, executive producer Series Two and Three) based on an overview of two-and-a-half years of production on three series. Through a case study approach, this Masters project explores how story is constructed into a television format. The thesis comprises two parts: the creative component (weighted 50%) is demonstrated through two programs of MPU (one program for evaluation) and the academic component through a written exegesis (50%). This case study aims to demonstrate how observational hybrid series such as MPU can be managed to quick turn-around schedules with precise skill sets that cut across a number of traditional genre styles. With the advent of radio and then television, storytelling found a home and a series of labels called genres to help place them in a schedule for listeners and viewers to choose. Over recent years, with the advent of digital technology and the rush to collect the masses of content required to feed the growing television slate, storytelling has often been replaced by story gathering. Today even in factual series where a clear story construct is important, third party ‘quick fix’ specialists are hired to shape raw content shot by a field team, who never put their own work together and may never come into the edit suite during a project. This thesis explores the art of storytelling in fast turn-around television. In particular it explores the layer cake approach used in the production process of MPU, that enables producers of fast turn-around television to shepherd their own stories from field through to post-production. While each new hybrid series will require its own particular sets of skills, the exploration of the genesis of MPU will demonstrate the building blocks required to successfully produce this type of factual series. This study is also intended as a ‘road map’ for producers who wish to develop similar series.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Manufacturing managers have a measurable mindset (or frame) that structures their response to the manufacturing environment. Most importantly, this frame represents a set of assumptions about the relative prominence of concepts in the manufacturing domains, about the nature of people, and about the sensemaking processes required to understand the nature of the manufacturing environment as seen through the eyes of manufacturing managers. This paper uses work in the area of text analysis and extends the scope of a methodology that has been approached from two different directions by Carley ( Journal of Organizational Behavior , 18 (51), 533-558, 1997) and Gephart ( Journal of Organizational Behavior , 18 (51), 583-622, 1997). This methodology is termed collocate analysis. Based on the analysis of transcripts of interviews of Australian manufacturing managers mind maps of the concepts used by these managers have been constructed. From an analysis of these mind maps it is argued that strategy plays a minor role in their thinking second only to the improvement domain, whereas design and related concepts play a dominant role in their day-to-day thinking.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Many studies focused on the development of crash prediction models have resulted in aggregate crash prediction models to quantify the safety effects of geometric, traffic, and environmental factors on the expected number of total, fatal, injury, and/or property damage crashes at specific locations. Crash prediction models focused on predicting different crash types, however, have rarely been developed. Crash type models are useful for at least three reasons. The first is motivated by the need to identify sites that are high risk with respect to specific crash types but that may not be revealed through crash totals. Second, countermeasures are likely to affect only a subset of all crashes—usually called target crashes—and so examination of crash types will lead to improved ability to identify effective countermeasures. Finally, there is a priori reason to believe that different crash types (e.g., rear-end, angle, etc.) are associated with road geometry, the environment, and traffic variables in different ways and as a result justify the estimation of individual predictive models. The objectives of this paper are to (1) demonstrate that different crash types are associated to predictor variables in different ways (as theorized) and (2) show that estimation of crash type models may lead to greater insights regarding crash occurrence and countermeasure effectiveness. This paper first describes the estimation results of crash prediction models for angle, head-on, rear-end, sideswipe (same direction and opposite direction), and pedestrian-involved crash types. Serving as a basis for comparison, a crash prediction model is estimated for total crashes. Based on 837 motor vehicle crashes collected on two-lane rural intersections in the state of Georgia, six prediction models are estimated resulting in two Poisson (P) models and four NB (NB) models. The analysis reveals that factors such as the annual average daily traffic, the presence of turning lanes, and the number of driveways have a positive association with each type of crash, whereas median widths and the presence of lighting are negatively associated. For the best fitting models covariates are related to crash types in different ways, suggesting that crash types are associated with different precrash conditions and that modeling total crash frequency may not be helpful for identifying specific countermeasures.