978 resultados para NIRS. Plum. Multivariate calibration. Variables selection
Resumo:
Many seemingly disparate approaches for marginal modeling have been developed in recent years. We demonstrate that many current approaches for marginal modeling of correlated binary outcomes produce likelihoods that are equivalent to the proposed copula-based models herein. These general copula models of underlying latent threshold random variables yield likelihood based models for marginal fixed effects estimation and interpretation in the analysis of correlated binary data. Moreover, we propose a nomenclature and set of model relationships that substantially elucidates the complex area of marginalized models for binary data. A diverse collection of didactic mathematical and numerical examples are given to illustrate concepts.
Resumo:
Tsuga canadensis (eastern hemlock) is a highly shade-tolerant, late-successional, and long-lived conifer species found throughout eastern North America. It is most often found in pure or nearly pure stands, because highly acidic and nutrient poor forest floor conditions are thought to favor T. canadensis regeneration while simultaneously limiting the establishment of some hardwood species with greater nutrient requirements. Once a common species, T. canadensis is currently experiencing widescale declines across its range. The hemlock woolly adelgid (Adelges tsugae) is decimating the population across its eastern distribution. Across the Upper Great Lakes region, where the adelgid is currently being held at bay by cold winter temperatures, T. canadensis has been experiencing failures in regeneration attributed, in part, to herbivory by white-tailed deer (Odocoileus virginianus). Deer utilize T. canadensis stands as winter habitat in areas of high snow depth. Tsuga canadensis, once a major component of these forests, currently exists at just a fraction of its pre-settlement abundance due to historic logging and contemporary forest management practices, and what remains is found in small remnant patches surrounded by second- and third-growth deciduous forests. The deer population across the region, however, is likely double that of pre-European settlement times. In this dissertation I explore the relationship between white-tailed deer use of T. canadensis as winter habitat and the effect this use is having on regeneration and forest succession. For this research I quantified stand composition and structure and abiotic variables of elevation and snow depth in 39 randomly selected T. canadensis stands from across the western Upper Peninsula of Michigan. I also quantified composition and the configuration of the landscapes surrounding these stands. I measured relative deer use of T. canadensis stands as pellet group piles deposited in each stand during each of three consecutive winters, 2005-06, 2006-07, and 2007-08. The results of this research suggest that deer use of T. canadensis stands as winter habitat is influenced primarily by snow depth, elevation, and the composition and configuration of the greater landscapes surrounding these stands. Specifically, stands with more heterogeneous landscapes surrounding them (i.e., a patchy mosaic of conifer, deciduous, and open cover) had higher relative deer use than stands surrounded by homogenous deciduous forest cover. Additionally, the intensity of use and the number of stands used was greater in years with higher average snow depth. Tsuga canadensis regeneration in these stands was negatively associated with deer use and Acer saccharum (sugar maple) basal area. Of the 39 stands, 17 and 22 stands had no T. canadensis regeneration in small and large sapling categories, respectively. Acer saccharum was the most common understory tree species, and the importance of A. saccharum in the understory (stems < 10 cm dbh) of the stands was positively associated with overstory A. saccharum dominance. Tsuga canadensis establishment was associated with high-decay coarse woody debris and moss, and deciduous leaf litter inputs in these stands may be limiting access to these important microsites. Furthermore, A. saccharum is more tolerant to the effects of deer herbivory than T. canadensis, giving A. saccharum a competitive advantage in stands being utilized as winter habitat by deer. My research suggests that limited microsite availability, in conjunction with deer herbivory, may be leading to an erosion in T. canadensis patch stability and an altered successional trajectory toward one of A. saccharum dominance, an alternately stable climax species.
Resumo:
Background mortality is an essential component of any forest growth and yield model. Forecasts of mortality contribute largely to the variability and accuracy of model predictions at the tree, stand and forest level. In the present study, I implement and evaluate state-of-the-art techniques to increase the accuracy of individual tree mortality models, similar to those used in many of the current variants of the Forest Vegetation Simulator, using data from North Idaho and Montana. The first technique addresses methods to correct for bias induced by measurement error typically present in competition variables. The second implements survival regression and evaluates its performance against the traditional logistic regression approach. I selected the regression calibration (RC) algorithm as a good candidate for addressing the measurement error problem. Two logistic regression models for each species were fitted, one ignoring the measurement error, which is the “naïve” approach, and the other applying RC. The models fitted with RC outperformed the naïve models in terms of discrimination when the competition variable was found to be statistically significant. The effect of RC was more obvious where measurement error variance was large and for more shade-intolerant species. The process of model fitting and variable selection revealed that past emphasis on DBH as a predictor variable for mortality, while producing models with strong metrics of fit, may make models less generalizable. The evaluation of the error variance estimator developed by Stage and Wykoff (1998), and core to the implementation of RC, in different spatial patterns and diameter distributions, revealed that the Stage and Wykoff estimate notably overestimated the true variance in all simulated stands, but those that are clustered. Results show a systematic bias even when all the assumptions made by the authors are guaranteed. I argue that this is the result of the Poisson-based estimate ignoring the overlapping area of potential plots around a tree. Effects, especially in the application phase, of the variance estimate justify suggested future efforts of improving the accuracy of the variance estimate. The second technique implemented and evaluated is a survival regression model that accounts for the time dependent nature of variables, such as diameter and competition variables, and the interval-censored nature of data collected from remeasured plots. The performance of the model is compared with the traditional logistic regression model as a tool to predict individual tree mortality. Validation of both approaches shows that the survival regression approach discriminates better between dead and alive trees for all species. In conclusion, I showed that the proposed techniques do increase the accuracy of individual tree mortality models, and are a promising first step towards the next generation of background mortality models. I have also identified the next steps to undertake in order to advance mortality models further.
Resumo:
The development of coronary vasculopathy is the main determinant of long-term survival in cardiac transplantation. The identification of risk factors, therefore, seems necessary in order to identify possible treatment strategies. Ninety-five out of 397 patients, undergoing orthotopic cardiac transplantation from 10/1985 to 10/1992 were evaluated retrospectively on the basis of perioperative and postoperative variables including age, sex, diagnosis, previous operations, renal function, cholesterol levels, dosage of immunosuppressive drugs (cyclosporin A, azathioprine, steroids), incidence of rejection, treatment with calcium channel blockers at 3, 6, 12, and 18 months postoperatively. Coronary vasculopathy was assessed by annual angiography at 1 and 2 years postoperatively. After univariate analysis, data were evaluated by stepwise multiple logistic regression analysis. Coronary vasculopathy was assessed in 15 patients at 1 (16%), and in 23 patients (24%) at 2, years. On multivariate analysis, previous operations and the incidence of rejections were identified as significant risk factors (P < 0.05), whereas the underlying diagnosis had borderline significance (P = 0.058) for the development of graft coronary vasculopathy. In contrast, all other variables were not significant in our subset of patients investigated. We therefore conclude that the development of coronary vasculopathy in cardiac transplant patients mainly depends on the rejection process itself, aside from patient-dependent factors. Therapeutic measures, such as the administration of calcium channel blockers and regulation of lipid disorders, may therefore only reduce the progress of native atherosclerotic disease in the posttransplant setting.
Resumo:
Most studies on selection in plants estimate female fitness components and neglect male mating success, although the latter might also be fundamental to understand adaptive evolution. Information from molecular genetic markers can be used to assess determinants of male mating success through parentage analyses. We estimated paternal selection gradients on floral traits in a large natural population of the herb Mimulus guttatus using a paternity probability model and maximum likelihood methods. This analysis revealed more significant selection gradients than a previous analysis based on regression of estimated male fertilities on floral traits. There were differences between results of univariate and multivariate analyses most likely due to the underlying covariance structure of the traits. Multivariate analysis, which corrects for the covariance structure of the traits, indicated that male mating success declined with distance from and depended on the direction to the mother plants. Moreover, there was directional selection for plants with fewer open flowers which have smaller corollas, a smaller anther-stigma separation, more red dots on the corolla and a larger fluctuating asymmetry therein. For most of these traits, however, there was also stabilizing selection indicating that there are intermediate optima for these traits. The large number of significant selection gradients in this study shows that even in relatively large natural populations where not all males can be sampled, it is possible to detect significant paternal selection gradients, and that such studies can give us valuable information required to better understand adaptive plant evolution.
Resumo:
A combinatorial protocol (CP) is introduced here to interface it with the multiple linear regression (MLR) for variable selection. The efficiency of CP-MLR is primarily based on the restriction of entry of correlated variables to the model development stage. It has been used for the analysis of Selwood et al data set [16], and the obtained models are compared with those reported from GFA [8] and MUSEUM [9] approaches. For this data set CP-MLR could identify three highly independent models (27, 28 and 31) with Q2 value in the range of 0.632-0.518. Also, these models are divergent and unique. Even though, the present study does not share any models with GFA [8], and MUSEUM [9] results, there are several descriptors common to all these studies, including the present one. Also a simulation is carried out on the same data set to explain the model formation in CP-MLR. The results demonstrate that the proposed method should be able to offer solutions to data sets with 50 to 60 descriptors in reasonable time frame. By carefully selecting the inter-parameter correlation cutoff values in CP-MLR one can identify divergent models and handle data sets larger than the present one without involving excessive computer time.
Resumo:
Background: Speciation reversal: the erosion of species differentiation via an increase in introgressive hybridization due to the weakening of previously divergent selection regimes, is thought to be an important, yet poorly understood, driver of biodiversity loss. Our study system, the Alpine whitefish (Coregonus spp.) species complex is a classic example of a recent postglacial adaptive radiation: forming an array of endemic lake flocks, with the independent origination of similar ecotypes among flocks. However, many of the lakes of the Alpine radiation have been seriously impacted by anthropogenic nutrient enrichment, resulting in a collapse in neutral genetic and phenotypic differentiation within the most polluted lakes. Here we investigate the effects of eutrophication on the selective forces that have shaped this radiation, using population genomics. We studied eight sympatric species assemblages belonging to five independent parallel adaptive radiations, and one species pair in secondary contact. We used AFLP markers, and applied FST outlier (BAYESCAN, DFDIST) and logistic regression analyses (MATSAM), to identify candidate regions for disruptive selection in the genome and their associations with adaptive traits within each lake flock. The number of outlier and adaptive trait associated loci identified per lake were then regressed against two variables (historical phosphorus concentration and contemporary oxygen concentration) representing the strength of eutrophication. Results: Whilst we identify disruptive selection candidate regions in all lake flocks, we find similar trends, across analysis methods, towards fewer disruptive selection candidate regions and fewer adaptive trait/candidate loci associations in the more polluted lakes. Conclusions: Weakened disruptive selection and a concomitant breakdown in reproductive isolating mechanisms in more polluted lakes has lead to increased gene flow between coexisting Alpine whitefish species. We hypothesize that the resulting higher rates of interspecific recombination reduce either the number or extent of genomic islands of divergence surrounding loci evolving under disruptive natural selection. This produces the negative trend seen in the number of selection candidate loci recovered during genome scans of whitefish species flocks, with increasing levels of anthropogenic eutrophication: as the likelihood decreases that AFLP restriction sites will fall within regions of heightened genomic divergence and therefore be classified as FST outlier loci. This study explores for the first time the potential effects of human-mediated relaxation of disruptive selection on heterogeneous genomic divergence between coexisting species.
Resumo:
A historical prospective study was designed to assess the man weight status of subjects who participated in a behavioral weight reduction program in 1983 and to determine whether there was an association between the dependent variable weight change and any of 31 independent variables after a 2 year follow-up period. Data was obtained by abstracting the subjects records and from a follow-up questionnaire administered 2 years following program participation. Five hundred nine subjects (386 females and 123 males) of 1460 subjects who participated in the program, completed and returned the questionnaire. Results showed that mean weight was significantly different (p < 0.001) between the measurement at baseline and after a 2 year follow-up period. The mean weight loss of the group was 5.8 pounds, 10.7 pounds for males and 4.2 pounds for females after a 2 year follow-up period. A total of 63.9% of the group, 69.9% of males and 61.9% of females were still below their initial weight after the 2 year follow-up period. Sixteen of the 31 variables assessed utilizing bivariate analyses were found to be significantly (p (LESSTHEQ) 0.05) associated with weight change after a 2 year follow-up period. These variables were then entered into a multivariate linear regression model. A total of 37.9% of the variance of the dependent variable, weight change, was accounted for by all 16 variables. Eight of these variables were found to be significantly (p (LESSTHEQ) 0.05) predictive of weight change in the stepwise multivariate process accounting for 37.1% of the variance. These variables included: Two baseline variables (percent over ideal body weight at enrollment and occupation) and six follow-up variables (feeling in control of eating habits, percent of body weight lost during treatment, frequency of weight measurement, physical activity, eating in response to emotions, and number of pounds of weight gain needed to resume a diet). It was concluded that a greater amount of emphasis should be placed on the six follow-up variables by clinicians involved in the treatment of obesity, and by the subjects themselves to enhance their chances of success at long-term weight loss. ^
Resumo:
We propose notions of calibration for probabilistic forecasts of general multivariate quantities. Probabilistic copula calibration is a natural analogue of probabilistic calibration in the univariate setting. It can be assessed empirically by checking for the uniformity of the copula probability integral transform (CopPIT), which is invariant under coordinate permutations and coordinatewise strictly monotone transformations of the predictive distribution and the outcome. The CopPIT histogram can be interpreted as a generalization and variant of the multivariate rank histogram, which has been used to check the calibration of ensemble forecasts. Climatological copula calibration is an analogue of marginal calibration in the univariate setting. Methods and tools are illustrated in a simulation study and applied to compare raw numerical model and statistically postprocessed ensemble forecasts of bivariate wind vectors.
Resumo:
This survey provides a self-contained account of M-estimation of multivariate scatter. In particular, we present new proofs for existence of the underlying M-functionals and discuss their weak continuity and differentiability. This is done in a rather general framework with matrix-valued random variables. By doing so we reveal a connection between Tyler's (1987) M-functional of scatter and the estimation of proportional covariance matrices. Moreover, this general framework allows us to treat a new class of scatter estimators, based on symmetrizations of arbitrary order. Finally these results are applied to M-estimation of multivariate location and scatter via multivariate t-distributions.
Resumo:
This paper presents the electron and photon energy calibration achieved with the ATLAS detector using about 25 fb−1 of LHC proton–proton collision data taken at centre-of-mass energies of √s = 7 and 8 TeV. The reconstruction of electron and photon energies is optimised using multivariate algorithms. The response of the calorimeter layers is equalised in data and simulation, and the longitudinal profile of the electromagnetic showers is exploited to estimate the passive material in front of the calorimeter and reoptimise the detector simulation. After all corrections, the Z resonance is used to set the absolute energy scale. For electrons from Z decays, the achieved calibration is typically accurate to 0.05% in most of the detector acceptance, rising to 0.2% in regions with large amounts of passive material. The remaining inaccuracy is less than 0.2–1% for electrons with a transverse energy of 10 GeV, and is on average 0.3% for photons. The detector resolution is determined with a relative inaccuracy of less than 10% for electrons and photons up to 60 GeV transverse energy, rising to 40% for transverse energies above 500 GeV.
Resumo:
Chrysophyte cysts are recognized as powerful proxies of cold-season temperatures. In this paper we use the relationship between chrysophyte assemblages and the number of days below 4 °C (DB4 °C) in the epilimnion of a lake in northern Poland to develop a transfer function and to reconstruct winter severity in Poland for the last millennium. DB4 °C is a climate variable related to the length of the winter. Multivariate ordination techniques were used to study the distribution of chrysophytes from sediment traps of 37 low-land lakes distributed along a variety of environmental and climatic gradients in northern Poland. Of all the environmental variables measured, stepwise variable selection and individual Redundancy analyses (RDA) identified DB4 °C as the most important variable for chrysophytes, explaining a portion of variance independent of variables related to water chemistry (conductivity, chlorides, K, sulfates), which were also important. A quantitative transfer function was created to estimate DB4 °C from sedimentary assemblages using partial least square regression (PLS). The two-component model (PLS-2) had a coefficient of determination of View the MathML sourceRcross2 = 0.58, with root mean squared error of prediction (RMSEP, based on leave-one-out) of 3.41 days. The resulting transfer function was applied to an annually-varved sediment core from Lake Żabińskie, providing a new sub-decadal quantitative reconstruction of DB4 °C with high chronological accuracy for the period AD 1000–2010. During Medieval Times (AD 1180–1440) winters were generally shorter (warmer) except for a decade with very long and severe winters around AD 1260–1270 (following the AD 1258 volcanic eruption). The 16th and 17th centuries and the beginning of the 19th century experienced very long severe winters. Comparison with other European cold-season reconstructions and atmospheric indices for this region indicates that large parts of the winter variability (reconstructed DB4 °C) is due to the interplay between the oscillations of the zonal flow controlled by the North Atlantic Oscillation (NAO) and the influence of continental anticyclonic systems (Siberian High, East Atlantic/Western Russia pattern). Differences with other European records are attributed to geographic climatological differences between Poland and Western Europe (Low Countries, Alps). Striking correspondence between the combined volcanic and solar forcing and the DB4 °C reconstruction prior to the 20th century suggests that winter climate in Poland responds mostly to natural forced variability (volcanic and solar) and the influence of unforced variability is low.
Resumo:
INTRODUCTION Patients admitted to intensive care following surgery for faecal peritonitis present particular challenges in terms of clinical management and risk assessment. Collaborating surgical and intensive care teams need shared perspectives on prognosis. We aimed to determine the relationship between dynamic assessment of trends in selected variables and outcomes. METHODS We analysed trends in physiological and laboratory variables during the first week of intensive care unit (ICU) stay in 977 patients at 102 centres across 16 European countries. The primary outcome was 6-month mortality. Secondary endpoints were ICU, hospital and 28-day mortality. For each trend, Cox proportional hazards (PH) regression analyses, adjusted for age and sex, were performed for each endpoint. RESULTS Trends over the first 7 days of the ICU stay independently associated with 6-month mortality were worsening thrombocytopaenia (mortality: hazard ratio (HR) = 1.02; 95% confidence interval (CI), 1.01 to 1.03; P <0.001) and renal function (total daily urine output: HR =1.02; 95% CI, 1.01 to 1.03; P <0.001; Sequential Organ Failure Assessment (SOFA) renal subscore: HR = 0.87; 95% CI, 0.75 to 0.99; P = 0.047), maximum bilirubin level (HR = 0.99; 95% CI, 0.99 to 0.99; P = 0.02) and Glasgow Coma Scale (GCS) SOFA subscore (HR = 0.81; 95% CI, 0.68 to 0.98; P = 0.028). Changes in renal function (total daily urine output and renal component of the SOFA score), GCS component of the SOFA score, total SOFA score and worsening thrombocytopaenia were also independently associated with secondary outcomes (ICU, hospital and 28-day mortality). We detected the same pattern when we analysed trends on days 2, 3 and 5. Dynamic trends in all other measured laboratory and physiological variables, and in radiological findings, changes inrespiratory support, renal replacement therapy and inotrope and/or vasopressor requirements failed to be retained as independently associated with outcome in multivariate analysis. CONCLUSIONS Only deterioration in renal function, thrombocytopaenia and SOFA score over the first 2, 3, 5 and 7 days of the ICU stay were consistently associated with mortality at all endpoints. These findings may help to inform clinical decision making in patients with this common cause of critical illness.
Resumo:
Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed. Also, synthetic positive control was used to validate Random Forests method. Throughout this study we showed that Random Forests can deal with large number of weak input variables without overfitting. It can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities. ^ Random Forests can deal with the large-scale data sets without rigorous data preprocessing. It has robust variable importance ranking measure. Proposed is a novel variable selection method in context of Random Forests that uses the data noise level as the cut-off value to determine the subset of the important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors for complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments. ^ When the data set had high variables to observations ratio, Random Forests complemented the established logistic regression. This study suggested that Random Forests is recommended for such high dimensionality data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations. ^ We also found that the mean decrease of accuracy is a more reliable variable ranking measurement than mean decrease of Gini. ^
Resumo:
A multivariate frailty hazard model is developed for joint-modeling of three correlated time-to-event outcomes: (1) local recurrence, (2) distant recurrence, and (3) overall survival. The term frailty is introduced to model population heterogeneity. The dependence is modeled by conditioning on a shared frailty that is included in the three hazard functions. Independent variables can be included in the model as covariates. The Markov chain Monte Carlo methods are used to estimate the posterior distributions of model parameters. The algorithm used in present application is the hybrid Metropolis-Hastings algorithm, which simultaneously updates all parameters with evaluations of gradient of log posterior density. The performance of this approach is examined based on simulation studies using Exponential and Weibull distributions. We apply the proposed methods to a study of patients with soft tissue sarcoma, which motivated this research. Our results indicate that patients with chemotherapy had better overall survival with hazard ratio of 0.242 (95% CI: 0.094 - 0.564) and lower risk of distant recurrence with hazard ratio of 0.636 (95% CI: 0.487 - 0.860), but not significantly better in local recurrence with hazard ratio of 0.799 (95% CI: 0.575 - 1.054). The advantages and limitations of the proposed models, and future research directions are discussed. ^