6 resultados para Bankruptcy prediction methods

em DigitalCommons@The Texas Medical Center


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of the patients surviving more than 5 years after disease diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. In order to evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short vs. long-term survival) and a small subset of genes whose expression values can be used to predict survival time is selected. Following, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Due to a large heterogeneity observed within prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death with gene expression profile directly. We propose a Bayesian ensemble model for survival prediction which is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model which is flexible to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy using a latent variable formulation to model the covariate effects which decreases computation time for model fitting. Also, our proposed models provides a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares results obtained by Random Forest and Bayesian ensemble methods under the biological/clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objectives. Cardiovascular disease (CVD) including CVD secondary to diabetes type II, a significant health problem among Mexican American populations, originates in early childhood. This study seeks to determine risk factors available to the health practitioner that can identify the child at potential risk of developing CVD, thereby enabling early intervention. ^ Design. This is a secondary analysis of cross-sectional data of matched Mexican American parents and children selected from the HHANES, 1982–1984. ^ Methods. Parents at high risk for CVD were identified based on medical history, and clinical and physical findings. Factor analysis was performed on children's skinfold thicknesses, height, weight, and systolic and diastolic blood pressures, in order to produce a limited number of uncorrelated child CVD risk factors. Multiple regression analyses were then performed to determine other CVD markers associated with these Factors, independently for mothers and fathers. ^ Results. Factor analysis of children's measurements revealed three uncorrelated latent variables summarizing the children's CVD risk: Factor1: ‘Fatness’, Factor2: ‘Size and Maturity’, and Factor3: ‘Blood Pressure’, together accounting for the bulk of variation in children's measurements (86–89%). Univariate analyses showed that children from high CVD risk families did not differ from children of low risk families in occurrence of high blood pressure, overweight, biological maturity, acculturation score, or social and economic indicators. However, multiple regression using the factor scores (from factor analysis) as dependent variables, revealed that higher CVD risk in parents, was significantly associated with increased fatness and increased blood pressure in the children. Father's CVD risk status was associated with higher levels of body fat in his children and higher levels of blood pressure in sons. Mother's CVD risk status was associated with higher blood pressure levels in children, and occurrence of obesity in the mother associated with higher fatness levels in her children. ^ Conclusion. Occurrence of cardiovascular disease and its risk factors in parents of Mexican American children, may be used to identify children at potentially higher risk for developing CV disease in the future. Obesity in mothers appears to be an important marker for the development of higher levels of body fatness in children. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background and purpose: Breast cancer continues to be a health problem for women, representing 28 percent of all female cancers and remaining one of the leading causes of death for women. Breast cancer incidence rates become substantial before the age of 50. After menopause, breast cancer incidence rates continue to increase with age creating a long-lasting source of concern (Harris et al., 1992). Mammography, a technique for the detection of breast tumors in their nonpalpable stage when they are most curable, has taken on considerable importance as a public health measure. The lifetime risk of breast cancer is approximately 1 in 9 and occurs over many decades. Recommendations are that screening be periodic in order to detect cancer at early stages. These recommendations, largely, are not followed. Not only are most women not getting regular mammograms, but this circumstance is particularly the case among older women where regular mammography has been proven to reduce mortality by approximately 30 percent. The purpose of this project was to increase our understanding of factors that are associated with stage of readiness to obtain subsequent mammograms. A secondary purpose of this research was to suggest further conceptual considerations toward the extension of the Transtheoretical Model (TTM) of behavior change to repeat screening mammography. ^ Methods. A sample (n = 1,222) of women 50 years and older in a large multi-specialty clinic in Houston, Texas was surveyed by mail questionnaire regarding their previous screening experience and stage of readiness to obtain repeat screening. A computerized database, maintained on all women who undergo mammography at the clinic, was used to identify women who are eligible for the project. The major statistical technique employed to select the significant variables and to examine the man and interaction effects of independent variables on dependent variables was polychotomous stepwise, logistic regression. A prediction model for each stage of readiness definition was estimated. The expected probabilities for stage of readiness were calculated to assess the magnitude and direction of significant predictors. ^ Results. Analysis showed that both ways of defining stage of readiness for obtaining a screening mammogram were associated with specific constructs, including decisional balance and processes of the change. ^ Conclusions. The results of the present study demonstrate that the TTM appears to translate to repeat mammography screening. Findings in the current study also support finding of previous studies that suggest that stage of readiness is associated with respondent decisional balance and the processes of change. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

It has been hypothesized that results from the short term bioassays will ultimately provide information that will be useful for human health hazard assessment. Although toxicologic test systems have become increasingly refined, to date, no investigator has been able to provide qualitative or quantitative methods which would support the use of short term tests in this capacity.^ Historically, the validity of the short term tests have been assessed using the framework of the epidemiologic/medical screens. In this context, the results of the carcinogen (long term) bioassay is generally used as the standard. However, this approach is widely recognized as being biased and, because it employs qualitative data, cannot be used in the setting of priorities. In contrast, the goal of this research was to address the problem of evaluating the utility of the short term tests for hazard assessment using an alternative method of investigation.^ Chemical carcinogens were selected from the list of carcinogens published by the International Agency for Research on Carcinogens (IARC). Tumorigenicity and mutagenicity data on fifty-two chemicals were obtained from the Registry of Toxic Effects of Chemical Substances (RTECS) and were analyzed using a relative potency approach. The relative potency framework allows for the standardization of data "relative" to a reference compound. To avoid any bias associated with the choice of the reference compound, fourteen different compounds were used.^ The data were evaluated in a format which allowed for a comparison of the ranking of the mutagenic relative potencies of the compounds (as estimated using short term data) vs. the ranking of the tumorigenic relative potencies (as estimated from the chronic bioassays). The results were statistically significant (p $<$.05) for data standardized to thirteen of the fourteen reference compounds. Although this was a preliminary investigation, it offers evidence that the short term test systems may be of utility in ranking the hazards represented by chemicals which may be human carcinogens. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Sepsis is a significant cause for multiple organ failure and death in the burn patient, yet identification in this population is confounded by chronic hypermetabolism and impaired immune function. The purpose of this study was twofold: 1) determine the ability of the systemic inflammatory response syndrome (SIRS) and American Burn Association (ABA) criteria to predict sepsis in the burn patient; and 2) develop a model representing the best combination of clinical predictors associated with sepsis in the same population. A retrospective, case-controlled, within-patient comparison of burn patients admitted to a single intensive care unit (ICU) was conducted for the period January 2005 to September 2010. Blood culture results were paired with clinical condition: "positive-sick"; "negative-sick", and "screening-not sick". Data were collected for the 72 hours prior to each blood culture. The most significant predictors were evaluated using logistic regression, Generalized Estimating Equations (GEE) and ROC area under the curve (AUC) analyses to assess model predictive ability. Bootstrapping methods were employed to evaluate potential model over-fitting. Fifty-nine subjects were included, representing 177 culture periods. SIRS criteria were not found to be associated with culture type, with an average of 98% of subjects meeting criteria in the 3 days prior. ABA sepsis criteria were significantly different among culture type only on the day prior (p = 0.004). The variables identified for the model included: heart rate>130 beats/min, mean blood pressure<60 mmHg, base deficit<-6 mEq/L, temperature>36°C, use of vasoactive medications, and glucose>150 mg/d1. The model was significant in predicting "positive culture-sick" and sepsis state, with AUC of 0.775 (p < 0.001) and 0.714 (p < .001), respectively; comparatively, the ABA criteria AUC was 0.619 (p = 0.028) and 0.597 (p = .035), respectively. SIRS criteria are not appropriate for identifying sepsis in the burn population. The ABA criteria perform better, but only for the day prior to positive blood culture results. The time period useful to diagnose sepsis using clinical criteria may be limited to 24 hours. A combination of predictors is superior to individual variable trends, yet algorithms or computer support will be necessary for the clinician to find such models useful. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Accurate quantitative estimation of exposure using retrospective data has been one of the most challenging tasks in the exposure assessment field. To improve these estimates, some models have been developed using published exposure databases with their corresponding exposure determinants. These models are designed to be applied to reported exposure determinants obtained from study subjects or exposure levels assigned by an industrial hygienist, so quantitative exposure estimates can be obtained. ^ In an effort to improve the prediction accuracy and generalizability of these models, and taking into account that the limitations encountered in previous studies might be due to limitations in the applicability of traditional statistical methods and concepts, the use of computer science- derived data analysis methods, predominantly machine learning approaches, were proposed and explored in this study. ^ The goal of this study was to develop a set of models using decision trees/ensemble and neural networks methods to predict occupational outcomes based on literature-derived databases, and compare, using cross-validation and data splitting techniques, the resulting prediction capacity to that of traditional regression models. Two cases were addressed: the categorical case, where the exposure level was measured as an exposure rating following the American Industrial Hygiene Association guidelines and the continuous case, where the result of the exposure is expressed as a concentration value. Previously developed literature-based exposure databases for 1,1,1 trichloroethane, methylene dichloride and, trichloroethylene were used. ^ When compared to regression estimations, results showed better accuracy of decision trees/ensemble techniques for the categorical case while neural networks were better for estimation of continuous exposure values. Overrepresentation of classes and overfitting were the main causes for poor neural network performance and accuracy. Estimations based on literature-based databases using machine learning techniques might provide an advantage when they are applied to other methodologies that combine `expert inputs' with current exposure measurements, like the Bayesian Decision Analysis tool. The use of machine learning techniques to more accurately estimate exposures from literature-based exposure databases might represent the starting point for the independence from the expert judgment.^