942 resultados para Logistic regression methodology
Resumo:
The tobacco-specific nitrosamine 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) is an obvious carcinogen for lung cancer. Since CBMN (Cytokinesis-blocked micronucleus) has been found to be extremely sensitive to NNK-induced genetic damage, it is a potential important factor to predict the lung cancer risk. However, the association between lung cancer and NNK-induced genetic damage measured by CBMN assay has not been rigorously examined. ^ This research develops a methodology to model the chromosomal changes under NNK-induced genetic damage in a logistic regression framework in order to predict the occurrence of lung cancer. Since these chromosomal changes were usually not observed very long due to laboratory cost and time, a resampling technique was applied to generate the Markov chain of the normal and the damaged cell for each individual. A joint likelihood between the resampled Markov chains and the logistic regression model including transition probabilities of this chain as covariates was established. The Maximum likelihood estimation was applied to carry on the statistical test for comparison. The ability of this approach to increase discriminating power to predict lung cancer was compared to a baseline "non-genetic" model. ^ Our method offered an option to understand the association between the dynamic cell information and lung cancer. Our study indicated the extent of DNA damage/non-damage using the CBMN assay provides critical information that impacts public health studies of lung cancer risk. This novel statistical method could simultaneously estimate the process of DNA damage/non-damage and its relationship with lung cancer for each individual.^
Resumo:
Numerous expert elicitation methods have been suggested for generalised linear models (GLMs). This paper compares three relatively new approaches to eliciting expert knowledge in a form suitable for Bayesian logistic regression. These methods were trialled on two experts in order to model the habitat suitability of the threatened Australian brush-tailed rock-wallaby (Petrogale penicillata). The first elicitation approach is a geographically assisted indirect predictive method with a geographic information system (GIS) interface. The second approach is a predictive indirect method which uses an interactive graphical tool. The third method uses a questionnaire to elicit expert knowledge directly about the impact of a habitat variable on the response. Two variables (slope and aspect) are used to examine prior and posterior distributions of the three methods. The results indicate that there are some similarities and dissimilarities between the expert informed priors of the two experts formulated from the different approaches. The choice of elicitation method depends on the statistical knowledge of the expert, their mapping skills, time constraints, accessibility to experts and funding available. This trial reveals that expert knowledge can be important when modelling rare event data, such as threatened species, because experts can provide additional information that may not be represented in the dataset. However care must be taken with the way in which this information is elicited and formulated.
Resumo:
The benefits of applying tree-based methods to the purpose of modelling financial assets as opposed to linear factor analysis are increasingly being understood by market practitioners. Tree-based models such as CART (classification and regression trees) are particularly well suited to analysing stock market data which is noisy and often contains non-linear relationships and high-order interactions. CART was originally developed in the 1980s by medical researchers disheartened by the stringent assumptions applied by traditional regression analysis (Brieman et al. [1984]). In the intervening years, CART has been successfully applied to many areas of finance such as the classification of financial distress of firms (see Frydman, Altman and Kao [1985]), asset allocation (see Sorensen, Mezrich and Miller [1996]), equity style timing (see Kao and Shumaker [1999]) and stock selection (see Sorensen, Miller and Ooi [2000])...
Resumo:
This paper gives a new iterative algorithm for kernel logistic regression. It is based on the solution of a dual problem using ideas similar to those of the Sequential Minimal Optimization algorithm for Support Vector Machines. Asymptotic convergence of the algorithm is proved. Computational experiments show that the algorithm is robust and fast. The algorithmic ideas can also be used to give a fast dual algorithm for solving the optimization problem arising in the inner loop of Gaussian Process classifiers.
Resumo:
Elastic Net Regularizers have shown much promise in designing sparse classifiers for linear classification. In this work, we propose an alternating optimization approach to solve the dual problems of elastic net regularized linear classification Support Vector Machines (SVMs) and logistic regression (LR). One of the sub-problems turns out to be a simple projection. The other sub-problem can be solved using dual coordinate descent methods developed for non-sparse L2-regularized linear SVMs and LR, without altering their iteration complexity and convergence properties. Experiments on very large datasets indicate that the proposed dual coordinate descent - projection (DCD-P) methods are fast and achieve comparable generalization performance after the first pass through the data, with extremely sparse models.
Resumo:
Discrete Conditional Phase-type (DC-Ph) models are a family of models which represent skewed survival data conditioned on specific inter-related discrete variables. The survival data is modeled using a Coxian phase-type distribution which is associated with the inter-related variables using a range of possible data mining approaches such as Bayesian networks (BNs), the Naïve Bayes Classification method and classification regression trees. This paper utilizes the Discrete Conditional Phase-type model (DC-Ph) to explore the modeling of patient waiting times in an Accident and Emergency Department of a UK hospital. The resulting DC-Ph model takes on the form of the Coxian phase-type distribution conditioned on the outcome of a logistic regression model.
Resumo:
Resumen tomado de la publicaci??n
Resumo:
Resumen tomado de la publicaci??n
Resumo:
A statistical technique for fault analysis in industrial printing is reported. The method specifically deals with binary data, for which the results of the production process fall into two categories, rejected or accepted. The method is referred to as logistic regression, and is capable of predicting future fault occurrences by the analysis of current measurements from machine parts sensors. Individual analysis of each type of fault can determine which parts of the plant have a significant influence on the occurrence of such faults; it is also possible to infer which measurable process parameters have no significant influence on the generation of these faults. Information derived from the analysis can be helpful in the operator's interpretation of the current state of the plant. Appropriate actions may then be taken to prevent potential faults from occurring. The algorithm is being implemented as part of an applied self-learning expert system.
Resumo:
The purpose of this article is to present a new method to predict the response variable of an observation in a new cluster for a multilevel logistic regression. The central idea is based on the empirical best estimator for the random effect. Two estimation methods for multilevel model are compared: penalized quasi-likelihood and Gauss-Hermite quadrature. The performance measures for the prediction of the probability for a new cluster observation of the multilevel logistic model in comparison with the usual logistic model are examined through simulations and an application.
Resumo:
Objective: To identify potential prognostic factors for pulmonary thromboembolism (PTE), establishing a mathematical model to predict the risk for fatal PTE and nonfatal PTE.Method: the reports on 4,813 consecutive autopsies performed from 1979 to 1998 in a Brazilian tertiary referral medical school were reviewed for a retrospective study. From the medical records and autopsy reports of the 512 patients found with macroscopically and/or microscopically,documented PTE, data on demographics, underlying diseases, and probable PTE site of origin were gathered and studied by multiple logistic regression. Thereafter, the jackknife method, a statistical cross-validation technique that uses the original study patients to validate a clinical prediction rule, was performed.Results: the autopsy rate was 50.2%, and PTE prevalence was 10.6%. In 212 cases, PTE was the main cause of death (fatal PTE). The independent variables selected by the regression significance criteria that were more likely to be associated with fatal PTE were age (odds ratio [OR], 1.02; 95% confidence interval [CI], 1.00 to 1.03), trauma (OR, 8.5; 95% CI, 2.20 to 32.81), right-sided cardiac thrombi (OR, 1.96; 95% CI, 1.02 to 3.77), pelvic vein thrombi (OR, 3.46; 95% CI, 1.19 to 10.05); those most likely to be associated with nonfatal PTE were systemic arterial hypertension (OR, 0.51; 95% CI, 0.33 to 0.80), pneumonia (OR, 0.46; 95% CI, 0.30 to 0.71), and sepsis (OR, 0.16; 95% CI, 0.06 to 0.40). The results obtained from the application of the equation in the 512 cases studied using logistic regression analysis suggest the range in which logit p > 0.336 favors the occurrence of fatal PTE, logit p < - 1.142 favors nonfatal PTE, and logit P with intermediate values is not conclusive. The cross-validation prediction misclassification rate was 25.6%, meaning that the prediction equation correctly classified the majority of the cases (74.4%).Conclusions: Although the usefulness of this method in everyday medical practice needs to be confirmed by a prospective study, for the time being our results suggest that concerning prevention, diagnosis, and treatment of PTE, strict attention should be given to those patients presenting the variables that are significant in the logistic regression model.