945 resultados para multivariate regression tree
Resumo:
Our objective was to assess extrinsic influences upon childbirth. In a cohort of 1,826 days containing 17,417 childbirths among them 13,252 spontaneous labor admissions, we studied the influence of environment upon the high incidence of labor (defined by 75th percentile or higher), analyzed by logistic regression. The predictors of high labor admission included increases in outdoor temperature (odds ratio: 1.742, P = 0.045, 95%CI: 1.011 to 3.001), and decreases in atmospheric pressure (odds ratio: 1.269, P = 0.029, 95%CI: 1.055 to 1.483). In contrast, increases in tidal range were associated with a lower probability of high admission (odds ratio: 0.762, P = 0.030, 95%CI: 0.515 to 0.999). Lunar phase was not a predictor of high labor admission (P = 0.339). Using multivariate analysis, increases in temperature and decreases in atmospheric pressure predicted high labor admission, and increases of tidal range, as a measurement of the lunar gravitational force, predicted a lower probability of high admission.
Resumo:
This study extends the current knowledge regarding the use of plants for the passive accumulation of anthropogenic PAHs that are present in the atmospheric total suspended particles (TSP) in the tropics and sub-tropics. It is of major relevance because the anthropic emissions of TSP containing PAHs are significant in these regions, but their monitoring is still scarce. We compared the biomonitor efficiency of Lolium multiflorum 'Lema' and tropical tree species (Tibouchina pukka and Psidium guajava 'Paluma') that were growing in an intensely TSP-polluted site in Cubatao (SE Brazil), and established the species with the highest potential for alternative monitoring of PAHs. PAHs present in the TSP indicated that the region is impacted by various emission sources. L. multiflorum showed a greater efficiency for the accumulation of PAH compounds on their leaves than the tropical trees. The linear regression between the logBCF and logKoa revealed that L. multiflorum is an efficient biomonitor of the profile of light and heavy PAHs present in the particulate phase of the atmosphere during dry weather and mild temperatures. The grass should be used only for indicating the PAHs with higher molecular weight in warmer and wetter periods. (C) 2012 Elsevier Inc. All rights reserved.
Resumo:
Background: In addition to the oncogenic human papillomavirus (HPV), several cofactors are needed in cervical carcinogenesis, but whether the HPV covariates associated with incident i) CIN1 are different from those of incident ii) CIN2 and iii) CIN3 needs further assessment. Objectives: To gain further insights into the true biological differences between CIN1, CIN2 and CIN3, we assessed HPV covariates associated with incident CIN1, CIN2, and CIN3. Study Design and Methods: HPV covariates associated with progression to CIN1, CIN2 and CIN3 were analysed in the combined cohort of the NIS (n = 3,187) and LAMS study (n = 12,114), using competing-risks regression models (in panel data) for baseline HR-HPV-positive women (n = 1,105), who represent a sub-cohort of all 1,865 women prospectively followed-up in these two studies. Results: Altogether, 90 (4.8%), 39 (2.1%) and 14 (1.4%) cases progressed to CIN1, CIN2, and CIN3, respectively. Among these baseline HR-HPV-positive women, the risk profiles of incident GIN I, CIN2 and CIN3 were unique in that completely different HPV covariates were associated with progression to CIN1, CIN2 and CIN3, irrespective which categories (non-progression, CIN1, CIN2, CIN3 or all) were used as competing-risks events in univariate and multivariate models. Conclusions: These data confirm our previous analysis based on multinomial regression models implicating that distinct covariates of HR-HPV are associated with progression to CIN1, CIN2 and CIN3. This emphasises true biological differences between the three grades of GIN, which revisits the concept of combining CIN2 with CIN3 or with CIN1 in histological classification or used as a common end-point, e.g., in HPV vaccine trials.
Resumo:
In this thesis some multivariate spectroscopic methods for the analysis of solutions are proposed. Spectroscopy and multivariate data analysis form a powerful combination for obtaining both quantitative and qualitative information and it is shown how spectroscopic techniques in combination with chemometric data evaluation can be used to obtain rapid, simple and efficient analytical methods. These spectroscopic methods consisting of spectroscopic analysis, a high level of automation and chemometric data evaluation can lead to analytical methods with a high analytical capacity, and for these methods, the term high-capacity analysis (HCA) is suggested. It is further shown how chemometric evaluation of the multivariate data in chromatographic analyses decreases the need for baseline separation. The thesis is based on six papers and the chemometric tools used are experimental design, principal component analysis (PCA), soft independent modelling of class analogy (SIMCA), partial least squares regression (PLS) and parallel factor analysis (PARAFAC). The analytical techniques utilised are scanning ultraviolet-visible (UV-Vis) spectroscopy, diode array detection (DAD) used in non-column chromatographic diode array UV spectroscopy, high-performance liquid chromatography with diode array detection (HPLC-DAD) and fluorescence spectroscopy. The methods proposed are exemplified in the analysis of pharmaceutical solutions and serum proteins. In Paper I a method is proposed for the determination of the content and identity of the active compound in pharmaceutical solutions by means of UV-Vis spectroscopy, orthogonal signal correction and multivariate calibration with PLS and SIMCA classification. Paper II proposes a new method for the rapid determination of pharmaceutical solutions by the use of non-column chromatographic diode array UV spectroscopy, i.e. a conventional HPLC-DAD system without any chromatographic column connected. In Paper III an investigation is made of the ability of a control sample, of known content and identity to diagnose and correct errors in multivariate predictions something that together with use of multivariate residuals can make it possible to use the same calibration model over time. In Paper IV a method is proposed for simultaneous determination of serum proteins with fluorescence spectroscopy and multivariate calibration. Paper V proposes a method for the determination of chromatographic peak purity by means of PCA of HPLC-DAD data. In Paper VI PARAFAC is applied for the decomposition of DAD data of some partially separated peaks into the pure chromatographic, spectral and concentration profiles.
Resumo:
Questa tesi descrive alcuni studi di messa a punto di metodi di analisi fisici accoppiati con tecniche statistiche multivariate per valutare la qualità e l’autenticità di oli vegetali e prodotti caseari. L’applicazione di strumenti fisici permette di abbattere i costi ed i tempi necessari per le analisi classiche ed allo stesso tempo può fornire un insieme diverso di informazioni che possono riguardare tanto la qualità come l’autenticità di prodotti. Per il buon funzionamento di tali metodi è necessaria la costruzione di modelli statistici robusti che utilizzino set di dati correttamente raccolti e rappresentativi del campo di applicazione. In questo lavoro di tesi sono stati analizzati oli vegetali e alcune tipologie di formaggi (in particolare pecorini per due lavori di ricerca e Parmigiano-Reggiano per un altro). Sono stati utilizzati diversi strumenti di analisi (metodi fisici), in particolare la spettroscopia, l’analisi termica differenziale, il naso elettronico, oltre a metodiche separative tradizionali. I dati ottenuti dalle analisi sono stati trattati mediante diverse tecniche statistiche, soprattutto: minimi quadrati parziali; regressione lineare multipla ed analisi discriminante lineare.
Resumo:
Smoke spikes occurring during transient engine operation have detrimental health effects and increase fuel consumption by requiring more frequent regeneration of the diesel particulate filter. This paper proposes a decision tree approach to real-time detection of smoke spikes for control and on-board diagnostics purposes. A contemporary, electronically controlled heavy-duty diesel engine was used to investigate the deficiencies of smoke control based on the fuel-to-oxygen-ratio limit. With the aid of transient and steady state data analysis and empirical as well as dimensional modeling, it was shown that the fuel-to-oxygen ratio was not estimated correctly during the turbocharger lag period. This inaccuracy was attributed to the large manifold pressure ratios and low exhaust gas recirculation flows recorded during the turbocharger lag period, which meant that engine control module correlations for the exhaust gas recirculation flow and the volumetric efficiency had to be extrapolated. The engine control module correlations were based on steady state data and it was shown that, unless the turbocharger efficiency is artificially reduced, the large manifold pressure ratios observed during the turbocharger lag period cannot be achieved at steady state. Additionally, the cylinder-to-cylinder variation during this period were shown to be sufficiently significant to make the average fuel-to-oxygen ratio a poor predictor of the transient smoke emissions. The steady state data also showed higher smoke emissions with higher exhaust gas recirculation fractions at constant fuel-to-oxygen-ratio levels. This suggests that, even if the fuel-to-oxygen ratios were to be estimated accurately for each cylinder, they would still be ineffective as smoke limiters. A decision tree trained on snap throttle data and pruned with engineering knowledge was able to use the inaccurate engine control module estimates of the fuel-to-oxygen ratio together with information on the engine control module estimate of the exhaust gas recirculation fraction, the engine speed, and the manifold pressure ratio to predict 94% of all spikes occurring over the Federal Test Procedure cycle. The advantages of this non-parametric approach over other commonly used parametric empirical methods such as regression were described. An application of accurate smoke spike detection in which the injection pressure is increased at points with a high opacity to reduce the cumulative particulate matter emissions substantially with a minimum increase in the cumulative nitrogrn oxide emissions was illustrated with dimensional and empirical modeling.
Resumo:
Traffic particle concentrations show considerable spatial variability within a metropolitan area. We consider latent variable semiparametric regression models for modeling the spatial and temporal variability of black carbon and elemental carbon concentrations in the greater Boston area. Measurements of these pollutants, which are markers of traffic particles, were obtained from several individual exposure studies conducted at specific household locations as well as 15 ambient monitoring sites in the city. The models allow for both flexible, nonlinear effects of covariates and for unexplained spatial and temporal variability in exposure. In addition, the different individual exposure studies recorded different surrogates of traffic particles, with some recording only outdoor concentrations of black or elemental carbon, some recording indoor concentrations of black carbon, and others recording both indoor and outdoor concentrations of black carbon. A joint model for outdoor and indoor exposure that specifies a spatially varying latent variable provides greater spatial coverage in the area of interest. We propose a penalised spline formation of the model that relates to generalised kringing of the latent traffic pollution variable and leads to a natural Bayesian Markov Chain Monte Carlo algorithm for model fitting. We propose methods that allow us to control the degress of freedom of the smoother in a Bayesian framework. Finally, we present results from an analysis that applies the model to data from summer and winter separately
Resumo:
The municipality of San Juan La Laguna, Guatemala is home to approximately 5,200 people and located on the western side of the Lake Atitlán caldera. Steep slopes surround all but the eastern side of San Juan. The Lake Atitlán watershed is susceptible to many natural hazards, but most predictable are the landslides that can occur annually with each rainy season, especially during high-intensity events. Hurricane Stan hit Guatemala in October 2005; the resulting flooding and landslides devastated the Atitlán region. Locations of landslide and non-landslide points were obtained from field observations and orthophotos taken following Hurricane Stan. This study used data from multiple attributes, at every landslide and non-landslide point, and applied different multivariate analyses to optimize a model for landslides prediction during high-intensity precipitation events like Hurricane Stan. The attributes considered in this study are: geology, geomorphology, distance to faults and streams, land use, slope, aspect, curvature, plan curvature, profile curvature and topographic wetness index. The attributes were pre-evaluated for their ability to predict landslides using four different attribute evaluators, all available in the open source data mining software Weka: filtered subset, information gain, gain ratio and chi-squared. Three multivariate algorithms (decision tree J48, logistic regression and BayesNet) were optimized for landslide prediction using different attributes. The following statistical parameters were used to evaluate model accuracy: precision, recall, F measure and area under the receiver operating characteristic (ROC) curve. The algorithm BayesNet yielded the most accurate model and was used to build a probability map of landslide initiation points. The probability map developed in this study was also compared to the results of a bivariate landslide susceptibility analysis conducted for the watershed, encompassing Lake Atitlán and San Juan. Landslides from Tropical Storm Agatha 2010 were used to independently validate this study’s multivariate model and the bivariate model. The ultimate aim of this study is to share the methodology and results with municipal contacts from the author's time as a U.S. Peace Corps volunteer, to facilitate more effective future landslide hazard planning and mitigation.
Resumo:
The development of coronary vasculopathy is the main determinant of long-term survival in cardiac transplantation. The identification of risk factors, therefore, seems necessary in order to identify possible treatment strategies. Ninety-five out of 397 patients, undergoing orthotopic cardiac transplantation from 10/1985 to 10/1992 were evaluated retrospectively on the basis of perioperative and postoperative variables including age, sex, diagnosis, previous operations, renal function, cholesterol levels, dosage of immunosuppressive drugs (cyclosporin A, azathioprine, steroids), incidence of rejection, treatment with calcium channel blockers at 3, 6, 12, and 18 months postoperatively. Coronary vasculopathy was assessed by annual angiography at 1 and 2 years postoperatively. After univariate analysis, data were evaluated by stepwise multiple logistic regression analysis. Coronary vasculopathy was assessed in 15 patients at 1 (16%), and in 23 patients (24%) at 2, years. On multivariate analysis, previous operations and the incidence of rejections were identified as significant risk factors (P < 0.05), whereas the underlying diagnosis had borderline significance (P = 0.058) for the development of graft coronary vasculopathy. In contrast, all other variables were not significant in our subset of patients investigated. We therefore conclude that the development of coronary vasculopathy in cardiac transplant patients mainly depends on the rejection process itself, aside from patient-dependent factors. Therapeutic measures, such as the administration of calcium channel blockers and regulation of lipid disorders, may therefore only reduce the progress of native atherosclerotic disease in the posttransplant setting.
Resumo:
Histopathologic tumor regression grades (TRGs) after neoadjuvant chemotherapy predict survival in different cancers. In bladder cancer, corresponding studies have not been conducted. Fifty-six patients with advanced invasive urothelial bladder cancer received neoadjuvant chemotherapy before cystectomy and lymphadenectomy. TRGs were defined as follows: TRG1: complete tumor regression; TRG2: >50% tumor regression; TRG3: 50% or less tumor regression. Separate TRGs were assigned for primary tumors and corresponding lymph nodes. The prognostic impact of these 2 TRGs, the highest (dominant) TRG per patient, and competing tumor features reflecting tumor regression (ypT/ypN stage, maximum diameter of the residual tumor) were determined. Tumor characteristics in initial transurethral resection of the bladder specimens were tested for response prediction. The frequency of TRGs 1, 2, and 3 in the primary tumors were n=16, n=19, and n=21; corresponding data from the lymph nodes were n=31, n=9, and n=16. Interobserver agreement in determination of the TRG was strong (κ=0.8). Univariately, all evaluated parameters were significantly (P≤0.001) related to overall survival; however, the segregation of the Kaplan-Meier curves was best for the dominant TRG. In multivariate analysis, only dominant TRG predicted overall survival independently (P=0.035). In transurethral resection specimens of the chemotherapy-naive bladder cancer, the only tumor feature with significant (P<0.03) predictive value for therapy response was a high proliferation rate. In conclusion, among all parameters reflecting tumor regression, the dominant TRG was the only independent risk factor. A favorable chemotherapy response is associated with a high proliferation rate in the initial chemotherapy-naive bladder cancer. This feature might help personalize neoadjuvant chemotherapy.
Resumo:
Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of the patients surviving more than 5 years after disease diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. In order to evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short vs. long-term survival) and a small subset of genes whose expression values can be used to predict survival time is selected. Following, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Due to a large heterogeneity observed within prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death with gene expression profile directly. We propose a Bayesian ensemble model for survival prediction which is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model which is flexible to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy using a latent variable formulation to model the covariate effects which decreases computation time for model fitting. Also, our proposed models provides a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares results obtained by Random Forest and Bayesian ensemble methods under the biological/clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.
Resumo:
Objective: Processes occurring in the course of psychotherapy are characterized by the simple fact that they unfold in time and that the multiple factors engaged in change processes vary highly between individuals (idiographic phenomena). Previous research, however, has neglected the temporal perspective by its traditional focus on static phenomena, which were mainly assessed at the group level (nomothetic phenomena). To support a temporal approach, the authors introduce time-series panel analysis (TSPA), a statistical methodology explicitly focusing on the quantification of temporal, session-to-session aspects of change in psychotherapy. TSPA-models are initially built at the level of individuals and are subsequently aggregated at the group level, thus allowing the exploration of prototypical models. Method: TSPA is based on vector auto-regression (VAR), an extension of univariate auto-regression models to multivariate time-series data. The application of TSPA is demonstrated in a sample of 87 outpatient psychotherapy patients who were monitored by postsession questionnaires. Prototypical mechanisms of change were derived from the aggregation of individual multivariate models of psychotherapy process. In a 2nd step, the associations between mechanisms of change (TSPA) and pre- to postsymptom change were explored. Results: TSPA allowed a prototypical process pattern to be identified, where patient's alliance and self-efficacy were linked by a temporal feedback-loop. Furthermore, therapist's stability over time in both mastery and clarification interventions was positively associated with better outcomes. Conclusions: TSPA is a statistical tool that sheds new light on temporal mechanisms of change. Through this approach, clinicians may gain insight into prototypical patterns of change in psychotherapy.
Resumo:
BACKGROUND Recently, histopathological tumour regression, prevalence of signet ring cells, and localisation were reported as prognostic factors in neoadjuvantly treated oesophagogastric (junctional and gastric) cancer. This exploratory retrospective study analyses independent prognostic factors within a large patient cohort after preoperative chemotherapy including clinical and histopathological factors. METHODS In all, 850 patients presenting with oesophagogastric cancer staged cT3/4 Nany cM0/x were treated with neoadjuvant chemotherapy followed by resection in two academic centres. Patient data were documented in a prospective database and retrospectively analysed. RESULTS Of all factors prognostic on univariate analysis, only clinical response, complications, ypTNM stage, and R category were independently prognostic (P<0.01) on multivariate analysis. Tumour localisation and signet ring cells were independently prognostic only when investigator-dependent clinical response evaluation was excluded from the multivariate model. Histopathological tumour regression correlates with tumour grading, Laurén classification, clinical response, ypT, ypN, and R categories but was not identified as an independent prognostic factor. Within R0-resected patients only surgical complications and ypTNM stage were independent prognostic factors. CONCLUSIONS Only established prognostic factors like ypTNM stage, R category, and complications were identified as independent prognostic factors in resected patients after neoadjuvant chemotherapy. In contrast, histopathological tumour regression was not found as an independent prognostic marker.
Resumo:
PURPOSE To identify the influence of fixed prosthesis type on biologic and technical complication rates in the context of screw versus cement retention. Furthermore, a multivariate analysis was conducted to determine which factors, when considered together, influence the complication and failure rates of fixed implant-supported prostheses. MATERIALS AND METHODS Electronic searches of MEDLINE (PubMed), EMBASE, and the Cochrane Library were conducted. Selected inclusion and exclusion criteria were used to limit the search. Data were analyzed statistically with simple and multivariate random-effects Poisson regressions. RESULTS Seventy-three articles qualified for inclusion in the study. Screw-retained prostheses showed a tendency toward and significantly more technical complications than cemented prostheses with single crowns and fixed partial prostheses, respectively. Resin chipping and ceramic veneer chipping had high mean event rates, at 10.04 and 8.95 per 100 years, respectively, for full-arch screwed prostheses. For "all fixed prostheses" (prosthesis type not reported or not known), significantly fewer biologic and technical complications were seen with screw retention. Multivariate analysis revealed a significantly greater incidence of technical complications with cemented prostheses. Full-arch prostheses, cantilevered prostheses, and "all fixed prostheses" had significantly higher complication rates than single crowns. A significantly greater incidence of technical and biologic complications was seen with cemented prostheses. CONCLUSION Screw-retained fixed partial prostheses demonstrated a significantly higher rate of technical complications and screw-retained full-arch prostheses demonstrated a notably high rate of veneer chipping. When "all fixed prostheses" were considered, significantly higher rates of technical and biologic complications were seen for cement-retained prostheses. Multivariate Poisson regression analysis failed to show a significant difference between screw- and cement-retained prostheses with respect to the incidence of failure but demonstrated a higher rate of technical and biologic complications for cement-retained prostheses. The incidence of technical complications was more dependent upon prosthesis and retention type than prosthesis or abutment material.