10 results for multivariate regression tree
at Universidade do Minho
Abstract:
Doctoral thesis in Medicine.
Abstract:
Hospitals nowadays collect vast amounts of data related to patient records. These data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes an implementation of a medical data mining project approach based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, related to inpatient hospitalization were collected from a Portuguese hospital. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available during the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, where six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which achieved a high coefficient of determination (0.81). This model was then opened using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. Such extracted knowledge confirmed that the obtained predictive model is credible and has potential value for supporting the decisions of hospital managers.
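The coefficient of determination reported above can be illustrated with a minimal sketch. The length-of-stay values and predictions below are invented toy numbers, not data from the study, and the "Average Prediction" baseline is assumed here to mean predicting the mean of the observed values. Under that assumption the baseline scores R2 = 0, which is why it serves as a reference point for the other methods.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical lengths of stay (days); not taken from the hospital data set.
observed = [2.0, 4.0, 6.0, 8.0]

# Assumed "Average Prediction" baseline: always predict the observed mean.
baseline = [sum(observed) / len(observed)] * len(observed)

# Hypothetical predictions from some fitted regression model.
model = [3.0, 4.0, 6.0, 7.0]

r2_baseline = r_squared(observed, baseline)
r2_model = r_squared(observed, model)
print(r2_baseline, r2_model)
```

A model is only useful to the extent that its R2 exceeds that of the mean baseline.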
Abstract:
Extreme value models are widely used in different areas. The Birnbaum–Saunders distribution is receiving considerable attention due to its physical arguments and its good properties. We propose a methodology based on extreme value Birnbaum–Saunders regression models, which includes model formulation, estimation, inference and checking. We further conduct a simulation study for evaluating its performance. A statistical analysis with real-world extreme value environmental data using the methodology is provided as illustration.
Abstract:
In longitudinal studies of disease, patients may experience several events during a follow-up period. In these studies, the sequentially ordered events are often of interest and lead to problems that have received much attention recently. Issues of interest include the estimation of bivariate survival, marginal distributions and the conditional distribution of gap times. In this work we consider the estimation of the survival function conditional on a previous event. Different nonparametric approaches are considered for estimating these quantities, all based on the Kaplan-Meier estimator of the survival function. We explore the finite sample behavior of the estimators through simulations. The different methods proposed in this article are applied to a data set from a German Breast Cancer Study. The methods are used to obtain predictors for the conditional survival probabilities as well as to study the influence of recurrence on overall survival.
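The Kaplan-Meier estimator that these approaches build on can be sketched as follows. This is only the standard product-limit estimator on invented toy data, not the conditional estimators proposed in the article; the function name and the data are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of S(t) at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same time; subjects censored at t
        # still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Hypothetical follow-up times and event indicators (1 = event, 0 = censored).
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(t, round(s, 3))
```

A conditional version could, for example, apply the same estimator to the second gap time within the subset of patients whose first event satisfies some condition, although that naive subsetting is an assumption here and not necessarily one of the article's methods.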
Abstract:
A high-resolution mtDNA phylogenetic tree allowed us to look backward in time to investigate purifying selection. Purifying selection was very strong in the last 2,500 years, continuously eliminating pathogenic mutations back until the end of the Younger Dryas (∼11,000 years ago), when a large population expansion likely relaxed selection pressure. This was preceded by a phase of stable selection until another relaxation occurred in the out-of-Africa migration. Demography and selection are closely related: expansions led to relaxation of selection, and mutations with higher pathogenicity significantly decreased the growth of descendants. The only detectable positive selection was the recurrence of highly pathogenic nonsynonymous mutations (m.3394T>C-m.3397A>G-m.3398T>C) at interior branches of the tree, preventing the formation of a dinucleotide STR (TATATA) in the MT-ND1 gene. At the most recent time scale, in 124 mother-children transmissions, purifying selection was detectable through the loss of mtDNA variants with high predicted pathogenicity. A few haplogroup-defining sites were also heteroplasmic, agreeing with a significant propensity in 349 positions in the phylogenetic tree to revert to the ancestral variant. This nonrandom mutation property explains the observation of heteroplasmic mutations at some haplogroup-defining sites in sequencing datasets, which may not indicate poor quality as has been claimed.
Abstract:
Doctoral thesis in Civil Engineering.
Abstract:
BACKGROUND To validate a new practical Sepsis Severity Score for patients with complicated intra-abdominal infections (cIAIs) including the clinical conditions at admission (severe sepsis/septic shock), the origin of the cIAIs, the delay in source control, the setting of acquisition and any risk factors such as age and immunosuppression. METHODS The WISS study (WSES cIAIs Score Study) is a multicenter observational study conducted in 132 medical institutions worldwide during a four-month study period (October 2014-February 2015). Four thousand five hundred thirty-three patients with a mean age of 51.2 years (range 18-99) were enrolled in the WISS study. RESULTS Univariate analysis showed that all factors previously included in the WSES Sepsis Severity Score differed with high statistical significance between those who died and those who survived (p < 0.0001). The multivariate logistic regression model was highly significant (p < 0.0001, R2 = 0.54) and showed that all these factors were independent predictors of sepsis mortality. The receiver operating characteristic curve showed that the WSES Sepsis Severity Score predicted mortality very well. A score above 5.5 was the best predictor of mortality, with a sensitivity of 89.2 %, a specificity of 83.5 % and a positive likelihood ratio of 5.4. CONCLUSIONS The WSES Sepsis Severity Score for patients with complicated intra-abdominal infections can be used on a global level. It has shown high sensitivity, specificity, and likelihood ratio that may help in making clinical decisions.
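The reported positive likelihood ratio follows from the reported sensitivity and specificity through the standard definition LR+ = sensitivity / (1 - specificity). The short sketch below checks that arithmetic; the function name is illustrative and not part of the study.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity), with both inputs as proportions."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in [0, 1)")
    return sensitivity / (1.0 - specificity)


# Values reported in the abstract for a score cut-off above 5.5.
lr_plus = positive_likelihood_ratio(0.892, 0.835)
print(round(lr_plus, 1))  # 5.4, matching the reported value
```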
Abstract:
Transforming growth factor beta (TGF-β) plays an important role in carcinogenesis. Two polymorphisms in the TGF-β1 gene (-509C/T and 869T/C) were described to influence susceptibility to gastric and breast cancers. The 869T/C polymorphism was also associated with overall survival in breast cancer patients. In the present study, we investigated the relevance of these TGF-β1 polymorphisms in glioma risk and prognosis. A case-control study that included 114 glioma patients and 138 cancer-free controls was performed. Single nucleotide polymorphisms (SNPs) were evaluated by polymerase chain reaction followed by restriction fragment length polymorphism (PCR-RFLP). Univariate and multivariate logistic regression analyses were used to calculate odds ratios (OR) and 95 % confidence intervals (95 % CI). The influence of TGF-β1 -509C/T and 869T/C polymorphisms on glioma patient survival was evaluated by a Cox regression model adjusted for patients' age and sex and represented in Kaplan-Meier curves. Our results demonstrated that TGF-β1 gene polymorphisms -509C/T and 869T/C are not significantly associated with glioma risk. Survival analyses showed that the homozygous -509TT genotype is associated with longer overall survival of glioblastoma (GBM) patients when compared with patients carrying CC + CT genotypes (OR, 2.41; 95 % CI, 1.06-5.50; p = 0.036). In addition, the homozygous 869CC genotype is associated with increased overall survival of GBM patients when compared with 869TT + TC genotypes (OR, 2.62; 95 % CI, 1.11-6.17; p = 0.027). In conclusion, this study suggests that TGF-β1 -509C/T and 869T/C polymorphisms are not significantly associated with risk for developing gliomas but may be relevant prognostic biomarkers in GBM patients.
Abstract:
The aim of this study was to determine if mycobacterial lineages affect infection risk, clustering, and disease progression among Mycobacterium tuberculosis cases in The Netherlands. Multivariate negative binomial regression models adjusted for patient-related factors and stratified by patient ethnicity were used to determine the association between phylogenetic lineages and infectivity (mean number of positive contacts around each patient) and clustering (as defined by number of secondary cases within 2 years after diagnosis of an index case sharing the same fingerprint) indices. An estimate of progression to disease by each risk factor was calculated as a bootstrapped risk ratio of the clustering index by the infectivity index. Compared to the Euro-American reference, Mycobacterium africanum showed significantly lower infectivity and clustering indices in the foreign-born population, while Mycobacterium bovis showed significantly lower infectivity and clustering indices in the native population. Significantly lower infectivity was also observed for the East African Indian lineage in the foreign-born population. Smear positivity was a significant risk factor for increased infectivity and increased clustering. Estimates of progression to disease were significantly associated with age, sputum-smear status, and behavioral risk factors, such as alcohol and intravenous drug abuse, but not with phylogenetic lineages. In conclusion, we found evidence of a bacteriological factor influencing indicators of a strain's transmissibility, namely, a decreased ability to infect and a lower clustering index in ancient phylogenetic lineages compared to their modern counterparts. Confirmation of these findings via follow-up studies using tuberculin skin test conversion data should have important implications for M. tuberculosis control efforts.
Abstract:
Natural mineral waters (still), effervescent natural mineral waters (sparkling) and aromatized waters with fruit flavors (still or sparkling) are an emerging market. In this work, the capability of a potentiometric electronic tongue, comprising lipid polymeric membranes, to quantitatively estimate routine quality physicochemical parameters (pH and conductivity) as well as to qualitatively classify water samples according to the type of water was evaluated. The study showed that a linear discriminant model, based on 21 sensors selected by the simulated annealing algorithm, could correctly classify 100 % of the water samples (leave-one-out cross-validation). This potential was further demonstrated by applying a repeated K-fold cross-validation (guaranteeing that at least 15 % of independent samples were only used for internal validation), for which 96 % of correct classifications were attained. The satisfactory recognition performance of the E-tongue could be attributed to the pH, conductivity, sugar and organic acid contents of the studied waters, which resulted in significant differences in sweetness perception indexes and total acid flavor. Moreover, the E-tongue combined with multivariate linear regression models, based on subsets of sensors selected by the simulated annealing algorithm, could accurately estimate water pH (25 sensors: R2 equal to 0.99 and 0.97 for leave-one-out and repeated K-fold cross-validation, respectively) and conductivity (23 sensors: R2 equal to 0.997 and 0.99 for leave-one-out and repeated K-fold cross-validation, respectively). The overall satisfactory results suggest a potential future application of electronic tongue devices for bottled water analysis and classification.
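Leave-one-out cross-validation, as used above, can be sketched in a few lines. This is a deliberately simplified illustration with a single hypothetical sensor signal and an ordinary least-squares line; the study itself used multivariate models over sensor subsets chosen by simulated annealing, which is not reproduced here. All values are invented.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_predictions(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(xs)):
        train_x = list(xs[:i]) + list(xs[i + 1:])
        train_y = list(ys[:i]) + list(ys[i + 1:])
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * xs[i] + intercept)
    return preds


# Hypothetical sensor signals and reference pH values (not from the study).
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
ph = [3.1, 4.9, 7.2, 9.0, 10.8]
held_out = loo_predictions(signal, ph)
for observed, predicted in zip(ph, held_out):
    print(observed, round(predicted, 2))
```

Because each prediction comes from a model that never saw the held-out sample, an R2 computed on these predictions is a more honest estimate of performance than one computed on the training fit.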