704 results for Covariates
Abstract:
Modern Engineering Asset Management (EAM) requires accurate assessment of current asset health and prediction of future asset health condition. Suitable mathematical models capable of predicting Time-to-Failure (TTF) and the probability of failure at future times are essential. In traditional reliability models, the lifetime of assets is estimated using failure time data. However, in most real-life situations and industrial applications, the lifetime of assets is influenced by different risk factors, which are called covariates. The fundamental notions in reliability theory are the failure time of a system and its covariates. These covariates change stochastically and may influence and/or indicate the failure time. Many statistical models have been developed to estimate the hazard of assets or individuals with covariates, and an extensive literature on hazard models with covariates (also termed covariate models), covering both theory and practical applications, has emerged. This paper is a state-of-the-art review of the existing literature on these covariate models in both the reliability and biomedical fields. One major purpose of this expository paper is to synthesise these models from both the industrial reliability and biomedical fields and then contextually group them into non-parametric and semi-parametric models. Comments on their merits and limitations are also presented. Another main purpose is to comprehensively review and summarise current research on the development of covariate models so as to facilitate the application of more covariate modelling techniques in prognostics and asset health management.
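As a concrete illustration of the semi-parametric class reviewed here, the sketch below fits a Cox proportional hazards model, the most widely used covariate (hazard) model, using the lifelines library; the asset data and covariate names are simulated and illustrative, not from the review.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 200
temp = rng.uniform(60, 100, n)                   # covariate: operating temperature
load = rng.uniform(0.2, 1.0, n)                  # covariate: relative load
hazard_scale = np.exp(0.03 * temp + 1.0 * load)  # assumed covariate effect on hazard
t = rng.exponential(50 / hazard_scale)           # failure times
c = rng.exponential(5, n)                        # independent censoring times
df = pd.DataFrame({
    "duration": np.minimum(t, c),
    "event": (t <= c).astype(int),               # 1 = failure observed, 0 = censored
    "temp": temp,
    "load": load,
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
cph.print_summary()  # hazard ratios quantify each covariate's effect on failure risk
```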
Abstract:
Many studies focused on the development of crash prediction models have resulted in aggregate crash prediction models that quantify the safety effects of geometric, traffic, and environmental factors on the expected number of total, fatal, injury, and/or property damage crashes at specific locations. Crash prediction models focused on predicting different crash types, however, have rarely been developed. Crash type models are useful for at least three reasons. The first is motivated by the need to identify sites that are high risk with respect to specific crash types but that may not be revealed through crash totals. Second, countermeasures are likely to affect only a subset of all crashes, usually called target crashes, so examination of crash types will improve the ability to identify effective countermeasures. Finally, there is a priori reason to believe that different crash types (e.g., rear-end, angle, etc.) are associated with road geometry, the environment, and traffic variables in different ways and as a result justify the estimation of individual predictive models. The objectives of this paper are to (1) demonstrate that different crash types are associated with predictor variables in different ways (as theorized) and (2) show that estimation of crash type models may lead to greater insights regarding crash occurrence and countermeasure effectiveness. The paper first describes the estimation results of crash prediction models for angle, head-on, rear-end, sideswipe (same direction and opposite direction), and pedestrian-involved crash types. Serving as a basis for comparison, a crash prediction model is also estimated for total crashes. Based on 837 motor vehicle crashes collected at two-lane rural intersections in the state of Georgia, six prediction models are estimated, resulting in two Poisson (P) models and four negative binomial (NB) models. The analysis reveals that factors such as annual average daily traffic, the presence of turning lanes, and the number of driveways have a positive association with each type of crash, whereas median width and the presence of lighting are negatively associated. For the best-fitting models, covariates are related to crash types in different ways, suggesting that crash types are associated with different precrash conditions and that modeling total crash frequency may not be helpful for identifying specific countermeasures.
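The sketch below illustrates the two model families estimated in the paper, Poisson and negative binomial regression for crash counts, fitted with statsmodels on simulated intersection data; the variable names stand in for the paper's predictors (AADT, turning lanes, driveways, median width, lighting) and are not its actual data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "log_aadt": rng.normal(8, 1, n),        # log annual average daily traffic
    "turn_lanes": rng.integers(0, 2, n),    # presence of turning lanes
    "driveways": rng.poisson(3, n),         # number of driveways
    "median_width": rng.uniform(0, 10, n),  # median width (m)
    "lighting": rng.integers(0, 2, n),      # presence of lighting
})
mu = np.exp(-6 + 0.8 * df.log_aadt + 0.3 * df.turn_lanes
            + 0.1 * df.driveways - 0.05 * df.median_width - 0.4 * df.lighting)
df["rear_end"] = rng.poisson(mu)            # simulated rear-end crash counts

formula = "rear_end ~ log_aadt + turn_lanes + driveways + median_width + lighting"
poisson_fit = smf.poisson(formula, data=df).fit(disp=False)
negbin_fit = smf.negativebinomial(formula, data=df).fit(disp=False)
print(poisson_fit.summary())
print(negbin_fit.summary())  # a significant alpha indicates overdispersion, favouring NB
```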
Abstract:
This paper studies the missing covariate problem, which is often encountered in survival analysis. Three covariate imputation methods are employed in the study, and the effectiveness of each method is evaluated within the hazard prediction framework. Data from a typical engineering asset are used in the case study. Covariate values at some time steps are deliberately discarded to generate an incomplete covariate set. It is found that although the mean imputation method is simpler than the others for solving missing covariate problems, its results can differ substantially from the real values of the missing covariates. The study also shows that, in general, results obtained from the regression method are more accurate than those of the mean imputation method, but at the cost of higher computational expense. The Gaussian Mixture Model (GMM) method is found to be the most effective of the three in terms of both computational efficiency and prediction accuracy.
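A minimal sketch of the three imputation strategies compared: mean imputation, regression (iterative) imputation, and GMM-based conditional-mean imputation. The data and mixture size are illustrative, not the paper's case-study asset.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.impute import SimpleImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0], [[1, .8, .5], [.8, 1, .3], [.5, .3, 1]], 500)
X[rng.random(X.shape) < 0.1] = np.nan          # knock out ~10% of covariate values

X_mean = SimpleImputer(strategy="mean").fit_transform(X)   # method 1: mean
X_reg = IterativeImputer(random_state=0).fit_transform(X)  # method 2: regression

# Method 3 (GMM): fit on complete rows, then fill each incomplete row with the
# responsibility-weighted conditional mean over the mixture components.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X[~np.isnan(X).any(axis=1)])
X_gmm = X.copy()
for i in np.where(np.isnan(X).any(axis=1))[0]:
    m, o = np.isnan(X[i]), ~np.isnan(X[i])     # missing / observed masks
    if not o.any():
        continue                               # nothing observed to condition on
    means, resp = [], []
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        S_oo_inv = np.linalg.inv(S[np.ix_(o, o)])
        means.append(mu[m] + S[np.ix_(m, o)] @ S_oo_inv @ (X[i, o] - mu[o]))
        resp.append(gmm.weights_[k] * multivariate_normal.pdf(X[i, o], mu[o], S[np.ix_(o, o)]))
    resp = np.array(resp) / np.sum(resp)
    X_gmm[i, m] = np.sum(resp[:, None] * np.array(means), axis=0)
```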
Abstract:
Reliability analysis is crucial to reducing unexpected downtime, severe failures, and ever-tightening maintenance budgets for engineering assets. Hazard-based reliability methods are of particular interest, as hazard reflects the current health status of engineering assets and their imminent failure risks. Most existing hazard models were constructed using statistical methods. However, these methods rest largely on two assumptions: that the baseline failure distribution accurately represents the population concerned, and that the effects of covariates on hazards take an assumed form. These two assumptions may be difficult to satisfy and therefore compromise the effectiveness of hazard models in application. To address this issue, a non-linear hazard modelling approach is developed in this research using neural networks (NNs), resulting in neural network hazard models (NNHMs), to deal with the limitations imposed by the two assumptions of statistical models. With the success of failure prevention efforts, less failure history becomes available for reliability analysis. Involving condition data, or covariates, is a natural solution to this challenge. A critical issue in involving covariates in reliability analysis is that complete and consistent covariate data are often unavailable in practice, due to inconsistent measuring frequencies of multiple covariates, sensor failure, and sparse intrusive measurements. This problem has not been studied adequately in current reliability applications. This research therefore investigates the incomplete covariate problem in reliability analysis. Typical approaches to handling incomplete covariates have been studied to investigate their performance and effects on reliability analysis results. Since these existing approaches can underestimate the variance in regressions and introduce extra uncertainty into reliability analysis, the developed NNHMs are extended to handle incomplete covariates as an integral part. The extended versions of the NNHMs have been validated using simulated bearing data and real data from a liquefied natural gas pump. The results demonstrate that the new approach outperforms the typical approaches to handling incomplete covariates. Another problem in reliability analysis is that future covariates of engineering assets are generally unavailable. In existing practice for multi-step reliability analysis, historical covariates are used to estimate future covariates. Covariates of engineering assets, however, are often subject to substantial fluctuation under the influence of both engineering degradation and changes in environmental settings. The commonly used covariate extrapolation methods are thus unsuitable because of error accumulation and uncertainty propagation. To overcome this difficulty, instead of directly extrapolating covariate values, this research projects covariate states. The estimated covariate states and the unknown covariate values at future running steps of assets constitute an incomplete covariate set, which is then analysed by the extended NNHMs. A new assessment function is also proposed to evaluate the risks of underestimated and overestimated reliability analysis results. A case study using field data from a paper and pulp mill demonstrates that this new multi-step reliability analysis procedure generates more accurate analysis results.
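The thesis's exact NNHM formulation is not spelled out in this abstract; as a loosely related sketch, the code below fits a discrete-time neural-network hazard model, in which a classifier's fitted probability of failure within an inspection interval, given the covariates, serves as the hazard. All data and names are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical panel: one row per (asset, inspection interval) with covariates
# [interval index, vibration RMS, bearing temperature]; the label marks whether
# the asset failed within that interval.
X = np.array([[1, 0.2, 60], [2, 0.3, 62], [3, 0.7, 71], [1, 0.1, 58], [2, 0.2, 59],
              [3, 0.3, 61], [4, 0.9, 80], [1, 0.2, 57], [2, 0.6, 69], [3, 1.1, 85]])
y = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0, 1])

model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0))
model.fit(X, y)

# The fitted class probability is the interval hazard; no baseline distribution
# or covariate-effect form is assumed, which is the motivation for NN models.
hazard = model.predict_proba([[3, 0.8, 78]])[0, 1]
print(f"estimated hazard for the next interval: {hazard:.2f}")
```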
Abstract:
The method of generalized estimating equations (GEE) is a popular tool for analysing longitudinal (panel) data. Often, the covariates collected are time-dependent in nature, for example age, relapse status, and monthly income. When using GEE to analyse longitudinal data with time-dependent covariates, crucial assumptions about the covariates are necessary for valid inferences to be drawn. When those assumptions do not hold or cannot be verified, Pepe and Anderson (1994, Communications in Statistics - Simulation and Computation 23, 939–951) advocated using an independence working correlation assumption in the GEE model as a robust approach. However, using GEE with the independence correlation assumption may lead to significant efficiency loss (Fitzmaurice, 1995, Biometrics 51, 309–317). In this article, we propose a method that extracts additional information from the estimating equations that are excluded under the independence assumption. The method always includes the estimating equations implied by the independence assumption, and the contribution from the remaining estimating equations is weighted according to the likelihood of each equation being a consistent estimating equation and the information it carries. We apply the method to a longitudinal study of the health of a group of Filipino children.
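The robust baseline discussed here, GEE with an independence working correlation, can be fitted directly in statsmodels; the sketch below uses simulated child-health panel data with illustrative variable names (the article's weighting extension is not implemented).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_children, n_visits = 50, 4
df = pd.DataFrame({
    "child_id": np.repeat(np.arange(n_children), n_visits),
    "age": np.tile(np.arange(n_visits), n_children)
           + rng.uniform(2, 4, n_children).repeat(n_visits),
    "relapse": rng.integers(0, 2, n_children * n_visits),  # time-dependent covariate
})
df["health_score"] = 50 + 2 * df.age - 5 * df.relapse + rng.normal(0, 3, len(df))

model = smf.gee(
    "health_score ~ age + relapse",
    groups="child_id",                        # repeated observations cluster by child
    data=df,
    family=sm.families.Gaussian(),
    cov_struct=sm.cov_struct.Independence(),  # independence working correlation
)
print(model.fit().summary())  # robust (sandwich) standard errors by default
```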
Abstract:
MOTIVATION: Technological advances that allow routine identification of high-dimensional risk factors have led to high demand for statistical techniques that enable full utilization of these rich sources of information in genetics studies. Variable selection for censored outcome data, as well as control of false discoveries (i.e. inclusion of irrelevant variables) in the presence of high-dimensional predictors, presents serious challenges. This article develops a computationally feasible method based on boosting and stability selection. Specifically, we modified component-wise gradient boosting to improve computational feasibility and introduced random permutation into stability selection to control false discoveries. RESULTS: We propose a high-dimensional variable selection method that incorporates stability selection to control false discoveries. Comparisons between the proposed method and the commonly used univariate and Lasso approaches for variable selection reveal that the proposed method yields fewer false discoveries. The proposed method is applied to study the associations of 2339 common single-nucleotide polymorphisms (SNPs) with overall survival among cutaneous melanoma (CM) patients. The results confirm that BRCA2 pathway SNPs are likely to be associated with overall survival, as reported in previous literature. Moreover, we identify several new Fanconi anemia (FA) pathway SNPs that are likely to modulate the survival of CM patients. AVAILABILITY AND IMPLEMENTATION: The related source code and documents are freely available at https://sites.google.com/site/bestumich/issues. CONTACT: yili@umich.edu.
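A minimal sketch of the stability-selection idea underlying the method: refit a sparse model on many random half-samples and retain variables whose selection frequency clears a threshold. Plain Lasso on a continuous outcome stands in for the paper's component-wise gradient boosting on censored data, and the permutation step is omitted.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(n)  # 2 true signals

n_runs, freq = 100, np.zeros(p)
for _ in range(n_runs):
    idx = rng.choice(n, n // 2, replace=False)  # random half-sample
    fit = Lasso(alpha=0.1).fit(X[idx], y[idx])
    freq += fit.coef_ != 0                      # count how often each variable is selected

stable = np.where(freq / n_runs >= 0.8)[0]      # keep variables above the threshold
print("stably selected variables:", stable)
```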
Abstract:
In this work, kriging with covariates is used to model and map the spatial distribution of salinity measurements gathered by an autonomous underwater vehicle in a sea outfall monitoring campaign, with the aim of distinguishing the effluent plume from the receiving waters and characterizing its spatial variability in the vicinity of the discharge. Four different geostatistical linear models for salinity were assumed, with the distance to the diffuser, the west-east position, and the south-north position used as covariates. Sample variograms were fitted with Matérn models using weighted least squares and maximum likelihood estimation methods, as a way to detect eventual discrepancies. Typically, the maximum likelihood method estimated very low ranges, which limited the kriging process. So, at least for these data sets, weighted least squares proved to be the more appropriate estimation method for variogram fitting. The kriged maps clearly show the spatial variation of salinity, and it is possible to identify the effluent plume in the area studied. The results provide some guidelines for sewage monitoring when a geostatistical analysis of the data is intended: it is important to treat anomalous values properly and to adopt a sampling strategy that includes transects parallel and perpendicular to the effluent dispersion.
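A hedged sketch of the general approach, kriging with covariates, recast here as regression kriging: a linear model absorbs the covariates (distance to diffuser, easting, northing) and a Matérn Gaussian process interpolates the residuals. Note that scikit-learn's GP fits the Matérn parameters by maximum likelihood, whereas the paper found weighted least squares preferable for its data; all data below are simulated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(0)
coords = rng.uniform(0, 1000, size=(120, 2))             # AUV sample positions (m)
dist = np.linalg.norm(coords - [500, 500], axis=1)       # distance to the diffuser
covariates = np.column_stack([dist, coords])             # drift terms
salinity = 35.0 - 2.0 * np.exp(-dist / 200) + rng.normal(0, 0.05, 120)

trend = LinearRegression().fit(covariates, salinity)     # covariate (drift) model
residuals = salinity - trend.predict(covariates)

gp = GaussianProcessRegressor(kernel=Matern(nu=1.5) + WhiteKernel(), normalize_y=True)
gp.fit(coords, residuals)                                # ML fit of Matérn range/nugget

# Kriged prediction at a new location = linear trend + GP residual estimate.
new = np.array([[450.0, 520.0]])
new_cov = np.column_stack([np.linalg.norm(new - [500, 500], axis=1), new])
pred = trend.predict(new_cov) + gp.predict(new)
print(f"predicted salinity: {pred[0]:.2f}")
```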
Abstract:
Genetic association analyses of family-based studies with ordered categorical phenotypes are often conducted using methods designed either for quantitative or for binary traits, which can lead to suboptimal analyses. Here we present an alternative likelihood-based method of analysis for single nucleotide polymorphism (SNP) genotypes and ordered categorical phenotypes in nuclear families of any size. Our approach, which extends our previous work for binary phenotypes, permits straightforward inclusion of covariate, gene-gene, and gene-covariate interaction terms in the likelihood, incorporates a simple model for ascertainment, and allows for family-specific effects in the hypothesis test. Additionally, our method produces interpretable parameter estimates and valid confidence intervals. We assess the proposed method using simulated data and apply it to a polymorphism in the C-reactive protein (CRP) gene typed in families collected to investigate human systemic lupus erythematosus. By including sex interactions in the analysis, we show that the polymorphism is associated with anti-nuclear autoantibody (ANA) production in females, while there appears to be no effect in males.
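As a simplified illustration of ordinal phenotype modelling with gene-covariate interactions, the sketch below fits a proportional-odds model with a genotype-by-sex interaction on simulated data; it ignores the family structure, ascertainment model, and family-specific effects that the paper's likelihood handles, and all names are illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 400
crp = rng.integers(0, 3, n)                        # allele count at the CRP SNP (0, 1, 2)
sex = rng.integers(0, 2, n)                        # 1 = female
latent = 0.6 * crp * sex + rng.logistic(size=n)    # genetic effect only in females
ana = pd.Categorical(np.digitize(latent, [0.5, 1.5]),   # 3 ordered ANA levels
                     categories=[0, 1, 2], ordered=True)
df = pd.DataFrame({"ana": ana, "crp": crp, "sex": sex})

# "0 +" drops the intercept, whose role the ordinal thresholds take over.
model = OrderedModel.from_formula("ana ~ 0 + crp + sex + crp:sex", df, distr="logit")
print(model.fit(method="bfgs", disp=False).summary())
```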
Abstract:
OBJECTIVES: This contribution provides a unifying concept for meta-analysis that integrates the handling of unobserved heterogeneity, study covariates, publication bias, and study quality. It is important to consider these issues simultaneously to avoid artifacts, and a method for doing so is suggested here. METHODS: The approach is based upon the meta-likelihood in combination with a general linear nonparametric mixed model, which lays the ground for all inferential conclusions suggested here. RESULTS: The concept is illustrated with a meta-analysis investigating the relationship of hormone replacement therapy (HRT) and breast cancer. The phenomenon of interest has been investigated in many studies over a considerable time, and different results were reported. In 1992 a meta-analysis by Sillero-Arenas et al. concluded there was a small but significant overall effect of 1.06 on the relative risk scale. Using the meta-likelihood approach, it is demonstrated here that this meta-analysis is subject to considerable unobserved heterogeneity. Furthermore, it is shown that new methods are available to model this heterogeneity successfully, and it is argued that available study covariates should be included to explain this heterogeneity in the meta-analysis at hand. CONCLUSIONS: The topic of HRT and breast cancer has very recently again become an issue of public debate, when results of a large trial investigating the health effects of hormone replacement therapy were published, indicating an increased risk of breast cancer (risk ratio of 1.26). Using an adequate regression model in the previously published meta-analysis, an adjusted effect estimate of 1.14 can be given, which is considerably higher than the one published in the meta-analysis of Sillero-Arenas et al. In summary, it is hoped that the method suggested here contributes further to good meta-analytic practice in public health and clinical disciplines.
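For context, the sketch below computes a standard DerSimonian-Laird random-effects pooled estimate on the log relative-risk scale, the textbook way of letting between-study heterogeneity widen a meta-analysis; the inputs are illustrative, not the HRT studies' data, and the paper's nonparametric mixed model is richer than this.

```python
import numpy as np

log_rr = np.array([0.10, -0.05, 0.25, 0.08, 0.30])  # per-study log relative risks
var = np.array([0.02, 0.05, 0.04, 0.01, 0.06])      # per-study sampling variances

w = 1 / var
fixed = np.sum(w * log_rr) / np.sum(w)               # fixed-effect pooled estimate
Q = np.sum(w * (log_rr - fixed) ** 2)                # Cochran's heterogeneity statistic
tau2 = max(0.0, (Q - (len(log_rr) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

w_star = 1 / (var + tau2)                            # heterogeneity-adjusted weights
pooled = np.sum(w_star * log_rr) / np.sum(w_star)
print(f"tau^2 = {tau2:.3f}, pooled RR = {np.exp(pooled):.2f}")
```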
Abstract:
We introduce a procedure for association-based analysis of nuclear families that allows for dichotomous and more general measurements of phenotype and the inclusion of covariate information. Standard generalized linear models are used to relate phenotype and its predictors. Our test procedure, based on the likelihood ratio, unifies the estimation of all parameters through the likelihood itself and yields maximum likelihood estimates of the genetic relative risk and interaction parameters. Our method has advantages over recently proposed conditional score tests, which include covariate information via a two-stage modelling approach, in modelling the covariate and gene-covariate interaction terms. We apply our method in a study of human systemic lupus erythematosus and the C-reactive protein (CRP) gene that includes sex as a covariate.
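A minimal sketch of the modelling idea: a generalized linear model with genotype, covariate, and gene-covariate interaction terms, with the genetic terms tested by likelihood ratio. It is fitted on simulated unrelated subjects with illustrative names; the paper's likelihood additionally accounts for the family structure.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"crp": rng.integers(0, 3, n),   # allele count at the CRP SNP
                   "sex": rng.integers(0, 2, n)})  # covariate: sex
logit_p = -1 + 0.5 * df.crp + 0.4 * df.crp * df.sex
df["affected"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

full = smf.logit("affected ~ crp * sex", data=df).fit(disp=False)
reduced = smf.logit("affected ~ sex", data=df).fit(disp=False)

lr = 2 * (full.llf - reduced.llf)                  # likelihood-ratio statistic
df_diff = full.df_model - reduced.df_model         # genetic terms: crp and crp:sex
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df_diff):.4f}")
```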