924 results for Limited dependent variable regression
Abstract:
Aim This study used data from temperate forest communities to assess: (1) five different stepwise selection methods with generalized additive models, (2) the effect of weighting absences to ensure a prevalence of 0.5, (3) the effect of limiting absences beyond the environmental envelope defined by presences, (4) four different methods for incorporating spatial autocorrelation, and (5) the effect of integrating an interaction factor defined by a regression tree on the residuals of an initial environmental model. Location State of Vaud, western Switzerland. Methods Generalized additive models (GAMs) were fitted using the grasp package (generalized regression analysis and spatial predictions, http://www.cscf.ch/grasp). Results Model selection based on cross-validation appeared to be the best compromise between model stability and performance (parsimony) among the five methods tested. Weighting absences returned models that performed better than models fitted with the original sample prevalence. This appeared to be mainly due to the impact of very low prevalence values on evaluation statistics. Removing zeroes beyond the range of presences on the main environmental gradients changed the set of selected predictors, and potentially the shape of their response curves. Moreover, removing zeroes slightly improved model performance and stability when compared with the baseline model on the same data set. Incorporating a spatial trend predictor improved model performance and stability significantly. Even better models were obtained when including local spatial autocorrelation. A novel approach to include interactions proved to be an efficient way to account for interactions between all predictors at once.
Main conclusions Models and spatial predictions of 18 forest communities were significantly improved by using either: (1) cross-validation as a model selection method, (2) weighted absences, (3) limited absences, (4) predictors accounting for spatial autocorrelation, or (5) a factor variable accounting for interactions between all predictors. The final choice of model strategy should depend on the nature of the available data and the specific study aims. Statistical evaluation is useful in searching for the best modelling practice. However, one should not neglect to consider the shapes and interpretability of response curves, as well as the resulting spatial predictions in the final assessment.
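As an illustration of point (2), weighting absences so that the weighted prevalence equals 0.5, the arithmetic can be sketched as follows. This is a minimal sketch under an assumption: presences keep a weight of 1 and every absence receives the same weight. The abstract does not say how the grasp package implements the weighting, and the function names and counts are hypothetical.

```python
def absence_weight(n_presences, n_absences):
    """Common weight for each absence so that the weighted absences
    sum to the number of presences (presences are assumed to keep weight 1)."""
    if n_presences <= 0 or n_absences <= 0:
        raise ValueError("need at least one presence and one absence")
    return n_presences / n_absences


def weighted_prevalence(n_presences, n_absences, w_abs):
    """Prevalence after giving every absence the weight w_abs."""
    return n_presences / (n_presences + w_abs * n_absences)


# Hypothetical sample: 40 presences and 360 absences (raw prevalence 0.1).
w = absence_weight(40, 360)
print(w, weighted_prevalence(40, 360, w))
```

The resulting weights would typically be passed to the model-fitting routine as case weights; which argument carries them depends on the software used.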
Abstract:
Background: The MLPA method is a potentially useful semi-quantitative method for detecting copy number alterations in targeted regions. In this paper, we propose a normalization procedure based on a non-linear mixed model, as well as a new approach for determining the statistical significance of altered probes based on a linear mixed model. This method establishes a threshold by using tolerance intervals that accommodate the specific random error variability observed in each test sample. Results: Through simulation studies we have shown that our proposed method outperforms two existing methods that are based on simple threshold rules or iterative regression. We have illustrated the method using a controlled MLPA assay in which targeted regions are variable in copy number in individuals suffering from different disorders such as Prader-Willi, DiGeorge or autism, where it showed the best performance. Conclusion: Using the proposed mixed model, we are able to determine thresholds to decide whether a region is altered. These thresholds are specific to each individual and incorporate experimental variability, resulting in improved sensitivity and specificity, as the examples with real data have revealed.
Abstract:
Most structure-building organisms in rocky benthic communities are surface-dependent because their energy inputs depend mainly on the surface they expose to water. Two photosynthetic strategies (calcareous and non-calcareous algae), strict suspension feeders and photosynthetic suspension feeders (e.g. hermatypic corals) are the four main strategies acquired through evolution by benthic organisms. Competition between these strategies occurs in relation to the productivity of the different species, in such a way that, for given environmental conditions, species with higher growth (P/B ratio) would dominate. At a worldwide scale, littoral marine benthos can be considered to fit into the four fields defined by two main axes: the first relates to productivity and runs from eutrophic to oligotrophic waters, and the second is defined by the degree of environmental variability or seasonality (from high to low). Coral reefs (marine ecosystems dominated by photosynthetic suspension feeders) develop in oligotrophic areas with low variability, while kelp beds (marine ecosystems dominated by large, non-calcareous algae) are found only in eutrophic places with high variability. Eutrophic waters with low variability do not have specially adapted, highly structured benthic marine ecosystems, and in these conditions opportunistic algae and animals predominate. Finally, photophilic Mediterranean benthos, devoid of kelps and without hermatypic corals, typifies the field of oligotrophic areas with high variability; in its most genuine form, Mediterranean benthos is represented by small algae with a high percentage of calcareous thalli. In all cases strict suspension feeders compete successfully with photosynthetic organisms only in situations of low irradiance or very high inputs of POM. In turn, Mediterranean rocky benthos, in spite of its relative uniformity, is geographically organized along the same axes.
The Gulf of Lions and the insular bottoms (the Balearic Islands, for example) would correspond to the extremes of eutrophic, high-variability areas and oligotrophic, low-variability areas, respectively. Irradiance, nutrient and POM concentration, and hydrodynamism are the three variables that mainly affect the distribution of the different surface-dependent strategies, and thus these parameters are of paramount interest for understanding the trophic structure of Mediterranean benthic communities. In environments not limited by light, nutrient availability, defined as the product of nutrient-POM concentration and hydrodynamism, determines the dominance of calcareous versus non-calcareous algae. Calcareous algae dominate in oligotrophic waters while non-calcareous algae dominate in moderately eutrophic waters. In light-limited environments, passive suspension feeders (octocorals, gorgonians) become the dominant species if POM availability is enhanced by high hydrodynamism (strong currents); in waters with a low POM load, organisms of other groups, mainly active suspension feeders (sponges, bryozoans, scleractinians), predominate. In any case, there always exists a very variable bathymetric zone, depending on light attenuation and nutrient-POM availability, where encrusting calcareous algae compete strongly with suspension feeders (coralligenous).
Abstract:
Cells couple their growth and division rate in response to nutrient availability to maintain a constant size. This co-ordination happens either at the G1-S or the G2-M transition of the cell cycle. In the rod-shaped fission yeast, size regulation happens at the G2-M transition prior to mitotic commitment. Recent studies have focused on the role of the DYRK-family protein kinase Pom1, which forms gradients emanating from cell poles and inhibits the mitotic activator kinase Cdr2, present at the cell middle. Pom1 was proposed to inhibit Cdr2 until cells reached a critical size before division. However, when and where Pom1 inhibits Cdr2 is not clear, as medial Pom1 levels do not change during cell elongation. Here I show that Pom1 gradients are susceptible to environmental changes in glucose. Specifically, upon glucose limitation, Pom1 re-localizes from the poles to the cell sides, where it delays mitosis through regulating Cdr2. This re-localization occurs due to microtubule de-stabilization and lateral catastrophes leading to transient deposition of the Pom1 gradient nucleator Tea4 along the cell cortex. As Tea4 localization to cell sides is sufficient to recruit Pom1, this explains the mechanism of Pom1 re-localization. Microtubule de-stabilization, and consequently the spread of Tea4 and Pom1, depends on the activity of the cAMP-dependent Protein Kinase A (PKA/Pka1), as pka1 mutant cells have stable microtubules and retain polar Tea4 and Pom1 under limited glucose. PKA signaling negatively regulates the microtubule rescue factor CLASP/Cls1, thus reducing its ability to stabilize microtubules. Thus PKA signaling tunes CLASP activity to promote microtubule de-stabilization and Pom1 re-localization upon glucose limitation. I show that the side-localized Pom1 delays mitosis and balances the role of the mitosis-promoting mitogen-activated protein kinase (MAPK) Sty1. Thus Pom1 re-localization may serve to buffer cell size upon glucose limitation.
-- To maintain a constant size, cells regulate their growth and their division rate according to the nutrients available in the medium. In fission yeast, this size regulation precedes mitotic commitment and takes place at the transition between the G2 and M phases of the cell cycle. Recent studies have focused on the role of the protein Pom1, a member of the DYRK kinase family. Pom1 forms a gradient emanating from the cell poles and inhibits the mitotic activator Cdr2, present at the cell middle. The model proposes that Pom1 inhibits Cdr2 until the cell reaches a critical size before division. However, when and where in the cell Pom1 inhibits Cdr2 was not clear, because medial Pom1 levels do not change during cell elongation. In this study, I show that Pom1 gradients are sensitive to environmental changes in glucose levels. More specifically, under glucose-limiting conditions, Pom1 re-localizes from the cell poles and spreads along the cell sides. As a consequence, a delay in mitotic entry is observed, due to the inhibition of Cdr2 by Pom1. This re-localization is due to microtubule de-stabilization, which leads to transient deposition of Tea4, the nucleator of the Pom1 gradient, all along the cell cortex. As Tea4 localization to the cell sides is sufficient to recruit Pom1, this explains the mechanism of its re-localization. Microtubule de-stabilization, and consequently the spread of Tea4 and Pom1, depend on the activity of the cyclic AMP-dependent protein kinase A (PKA/Pka1). In the absence of pka1, microtubule stability is not affected, which allows Tea4 and Pom1 to be retained at the cell poles even under glucose-limiting conditions.
PKA signaling negatively regulates the microtubule rescue factor CLASP/Cls1 and thereby reduces its ability to stabilize microtubules. PKA signaling thus tunes CLASP activity to promote microtubule de-stabilization and Pom1 re-localization under glucose-limiting conditions. I show that localization of Pom1 to the cell sides delays mitotic entry and counterbalances the action of the protein Sty1, a MAPK known to induce mitotic entry. Pom1 re-localization could thus serve to buffer cell size under glucose-limiting conditions. -- Various cell types in the environment, such as bacterial, plant or animal cells, have a distinct cellular size. Maintaining a constant cell size is important for fitness in unicellular organisms and for diverse functions in multicellular organisms. Cells regulate their size by coordinating their growth rate with their division rate. This coupling is important because otherwise cells would get progressively smaller or larger after each successive cell cycle. In their natural environment cells may face fluctuations in the available nutrient supply. Thus cells have to coordinate their division rate with the variable growth rates shown under different nutrient conditions. During my PhD, I worked with a single-celled, rod-shaped yeast called the fission yeast. These cells are longer when the nutrient supply is abundant and shorter when the nutrient supply is scarce. A protein that senses changes in the external carbon source (glucose) is called Protein Kinase A (PKA). The rod shape of fission yeast cells is maintained thanks to a structural backbone called the cytoskeleton. One of the components of this backbone is the microtubules, which are small tube-like structures spanning the length of the cell. They transport a protein called Tea4, which in turn is important for the proper localization of another protein, Pom1, to the cell ends.
Pom1 helps to maintain the proper shape and size of these rod-shaped yeast cells. My thesis work showed that upon reduction in the external nutrient (glucose) levels, microtubules become less stable and show an alteration in their organization. A significant percentage of the microtubules contact the side of the cell instead of touching only the cell tip. This leads to the spreading of the protein Pom1 away from the tips all around the cell periphery. This helps fission yeast cells to maintain the proper size required under these conditions of limited glucose supply. I further showed that the protein PKA regulates microtubule stability and organization and thus Pom1 spreading and maintenance of proper cell size. Thus my work led to the discovery of a novel pathway by which fission yeast cells maintain their size under limited supply of glucose. -- Various cell types in the environment, such as bacteria, plant cells or animal cells, have a precise size. Maintaining a constant cell size is important for the fitness of unicellular organisms as well as for multiple functions in multicellular organisms. Cells regulate their size by coordinating their growth rate with their division rate. This coupling is essential, because otherwise cells would become progressively smaller or larger after each cell cycle. In their natural habitat, cells may face fluctuations in the level of available nutrients. Cells must therefore coordinate their division rate with the variable growth rates experienced under different nutritional conditions. During my thesis, I worked on a unicellular, rod-shaped yeast called fission yeast. These cells are longer when the nutrient level is high and shorter when it is lower.
A protein that senses changes in the external level of the carbon source (glucose) is called PKA, for protein kinase A. The rod shape of the cell is due to the structural properties of the cytoskeleton. An important component of this cytoskeleton is the microtubules, whose structure resembles small tubes running from one end of the cell to the other. These microtubules transport an important protein called Tea4, which in turn is important for the proper localization of another protein, Pom1, to the cell ends. The protein Pom1 helps to maintain the appropriate size of fission yeast cells. My thesis work showed that when nutrient (glucose) levels are low, microtubules become less and less stable and show an overall disorganization. A significant percentage of the microtubules touch the cell sides instead of reaching only the cell ends. This results in Pom1 spreading all along the cell cortex. This helps fission yeast cells to maintain the appropriate size during this nutritional stress. In addition, I show that PKA regulates microtubule stability and organization, and consequently Pom1 spreading and the maintenance of a constant size. In conclusion, my work led to the discovery of a new mechanism by which fission yeast maintains its size under glucose-limiting conditions.
Abstract:
In this paper it is argued that rotational wind is not the best choice of leading control variable for variational data assimilation, and an alternative is suggested and tested. A rotational wind parameter is used in most global variational assimilation systems as a pragmatic way of approximately representing the balanced component of the assimilation increments. In effect, rotational wind is treated as a proxy for potential vorticity, but one that is potentially not a good choice in flow regimes characterised by small Burger number. This paper reports on an alternative set of control variables which are based around potential vorticity. This gives rise to a new formulation of the background error covariances for the Met Office's variational assimilation system, which leads to flow dependency. It uses similar balance relationships to traditional schemes, but recognises the existence of unbalanced rotational wind, which is used with a new anti-balance relationship. The new scheme is described and its performance is evaluated and compared to a traditional scheme using a sample of diagnostics.
Abstract:
An input variable selection procedure is introduced for the identification and construction of multi-input multi-output (MIMO) neurofuzzy operating point dependent models. The algorithm is an extension of a forward modified Gram-Schmidt orthogonal least squares procedure for a linear model structure which is modified to accommodate nonlinear system modeling by incorporating piecewise locally linear model fitting. The proposed input nodes selection procedure effectively tackles the problem of the curse of dimensionality associated with lattice-based modeling algorithms such as radial basis function neurofuzzy networks, enabling the resulting neurofuzzy operating point dependent model to be widely applied in control and estimation. Some numerical examples are given to demonstrate the effectiveness of the proposed construction algorithm.
Abstract:
The prevalence of enterohaemorrhagic Escherichia coli (EHEC) O157 in poultry is considered minimal compared with other species, especially ruminants. However, deliberate inoculation studies have shown that poultry are readily and persistently infected by this organism but that the mechanism of colonisation is independent of intimin, a recognised factor in host-EHEC interactions in mammalian species, and may be dependent upon flagella. Few strains of EHEC O157 have been tested in poultry, and here 1-day-old and 6-week-old chicks were inoculated with seven non-toxigenic E. coli O157 strains in separate experiments. Persistence was measured semi-quantitatively by bacteriological assessment of E. coli O157 cultured from cloacal swabs (shedding score). In the 1-day-old chick model, which was monitored for 43 days, all seven strains established well after inoculation. In the 6-week-old chicken model, one strain established and gave consistently high shedding for the duration of the experiment (156 days). Of the remaining six strains, two persisted for 113 days, two persisted for 43 days, one persisted for 22 days and one was never detected.
Abstract:
The redistribution of a finite amount of martian surface dust during global dust storms and in the intervening periods has been modelled in a dust lifting version of the UK Mars General Circulation Model. When using a constant, uniform threshold in the model’s wind stress lifting parameterisation and assuming an unlimited supply of surface dust, multiannual simulations displayed some variability in dust lifting activity from year to year, arising from internal variability manifested in surface wind stress, but dust storms were limited in size and formed within a relatively short seasonal window. Lifting thresholds were then allowed to vary at each model gridpoint, dependent on the rates of emission or deposition of dust. This enhanced interannual variability in dust storm magnitude and timing, such that model storms covered most of the observed ranges in size and initiation date within a single multiannual simulation. Peak storm magnitude in a given year was primarily determined by the availability of surface dust at a number of key sites in the southern hemisphere. The observed global dust storm (GDS) frequency of roughly one in every 3 years was approximately reproduced, but the model failed to generate these GDSs spontaneously in the southern hemisphere, where they have typically been observed to initiate. After several years of simulation, the surface threshold field—a proxy for net change in surface dust density—showed good qualitative agreement with the observed pattern of martian surface dust cover. The model produced a net northward cross-equatorial dust mass flux, which necessitated the addition of an artificial threshold decrease rate in order to allow the continued generation of dust storms over the course of a multiannual simulation. 
At standard model resolution, for the southward mass flux due to cross-equatorial flushing storms to offset the northward flux due to GDSs on a timescale of ∼3 years would require an increase in the former by a factor of 3–4. Results at higher model resolution and uncertainties in dust vertical profiles mean that quasi-periodic redistribution of dust on such a timescale nevertheless appears to be a plausible explanation for the observed GDS frequency.
Abstract:
Accurate estimates of how soil water stress affects plant transpiration are crucial for reliable land surface model (LSM) predictions. Current LSMs generally use a water stress factor, β, dependent on soil moisture content, θ, that ranges linearly between β = 1 for unstressed vegetation and β = 0 when wilting point is reached. This paper explores the feasibility of replacing the current approach with equations that use soil water potential as their independent variable, or with a set of equations that involve hydraulic and chemical signaling, thereby ensuring feedbacks between the entire soil–root–xylem–leaf system. A comparison with the original linear θ-based water stress parameterization, and with its improved curvi-linear version, was conducted. Assessment of model suitability focused on the models' ability to simulate the correct (as derived from experimental data) curve shape of relative transpiration versus fraction of transpirable soil water. We used model sensitivity analyses under progressive soil drying conditions, employing two commonly used approaches to calculate water retention and hydraulic conductivity curves. Furthermore, for each of these hydraulic parameterizations we used two different parameter sets, for 3 soil texture types; a total of 12 soil hydraulic permutations. Results showed that the resulting transpiration reduction functions (TRFs) varied considerably among the models. The fact that soil hydraulic conductivity played a major role in the model that involved hydraulic and chemical signaling led to unrealistic values of β, and hence TRF, for many soil hydraulic parameter sets. However, this model is much better equipped to simulate the behavior of different plant species. Based on these findings, we only recommend implementation of this approach into LSMs if great care is taken with the choice of soil hydraulic parameters.
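The linear θ-based stress factor described above can be sketched as a clipped linear ramp. This is a minimal sketch, assuming β rises from 0 at the wilting point to 1 at some critical moisture content above which vegetation is unstressed. The abstract does not name that upper threshold, so `theta_crit`, the function name and the numeric values are assumptions made for illustration only.

```python
def beta_linear(theta, theta_wilt, theta_crit):
    """Linear soil water stress factor.

    theta      -- volumetric soil moisture content
    theta_wilt -- moisture content at wilting point (beta = 0 at or below)
    theta_crit -- assumed moisture content at or above which vegetation
                  is unstressed (beta = 1); not named in the abstract
    """
    if theta_crit <= theta_wilt:
        raise ValueError("theta_crit must exceed theta_wilt")
    frac = (theta - theta_wilt) / (theta_crit - theta_wilt)
    # Clip to [0, 1] so beta stays within its physical range.
    return min(1.0, max(0.0, frac))


# Illustrative values only: wilting point 0.10, critical point 0.30.
for theta in (0.05, 0.10, 0.20, 0.30, 0.40):
    print(theta, beta_linear(theta, 0.10, 0.30))
```

A potential-based alternative of the kind the paper explores would replace θ with soil water potential as the independent variable; its functional form is not given in the abstract and is therefore not sketched here.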
Abstract:
Background: Tuberculosis (TB) remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a model for predicting pulmonary TB in hospitalized patients in a high prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Methods: Cross-sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, positive and negative predictive values were used to evaluate the performance of the model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. Results: We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear) and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. Conclusions: The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results.
Prospective validation is still necessary, but our model offers an alternative for deciding whether to isolate patients with clinical suspicion of TB in tertiary health facilities in countries with limited resources.
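The four evaluation statistics reported above are all derived from a 2x2 table of predicted versus confirmed diagnoses. The sketch below shows how they are computed. The counts are hypothetical and are not the study's data, and the function name is an assumption made for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts.

    tp -- diseased patients the model flags as positive
    fp -- non-diseased patients the model flags as positive
    fn -- diseased patients the model flags as negative
    tn -- non-diseased patients the model flags as negative
    """
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("every row and column of the 2x2 table must be non-empty")
    return {
        "sensitivity": tp / (tp + fn),  # share of diseased correctly flagged
        "specificity": tn / (tn + fp),  # share of non-diseased correctly cleared
        "ppv": tp / (tp + fp),          # probability of disease given a positive call
        "npv": tn / (tn + fn),          # probability of no disease given a negative call
    }


# Hypothetical counts for illustration only (not the study's data).
for name, value in diagnostic_metrics(tp=8, fp=5, fn=2, tn=35).items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the predictive values depend on how common the disease is in the sample, so they can shift between a derivation sample and a validation sample even when the classifier itself is unchanged.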
Abstract:
Cyclin-dependent kinases (CDKs) successively phosphorylate the retinoblastoma protein (RB) at the restriction point in G1 phase. Hyperphosphorylation results in functional inactivation of RB, activation of the E2F transcriptional program, and entry of cells into S phase. RB unphosphorylated at serine 608 has growth suppressive activity. Phosphorylation of serines 608/612 inhibits binding of E2F-1 to RB. In Nalm-6 acute lymphoblastic leukemia extracts, serine 608 is phosphorylated by CDK4/6 complexes but not by CDK2. We reasoned that phosphorylation of serines 608/612 by redundant CDKs could accelerate phospho group formation and determined which G1 CDK contributes to serine 612 phosphorylation. Here, we report that CDK4 complexes from Nalm-6 extracts phosphorylated the CDK2-preferred serine 612 in vitro, and that this phosphorylation was inhibited by p16INK4a and fascaplysin. In contrast, serine 780 and serine 795 were efficiently phosphorylated by CDK4 but not by CDK2. The data suggest that the redundancy in phosphorylation of RB by CDK2 and CDK4 in Nalm-6 extracts is limited. Serine 612 phosphorylation by CDK4 also occurred in extracts of childhood acute lymphoblastic leukemia cells but not in extracts of mobilized CD34+ hemopoietic progenitor cells. This phenomenon could contribute to the commitment of childhood acute lymphocytic leukemia cells to proliferate and explain their refractoriness to differentiation-inducing agents.
Abstract:
Outcome-dependent, two-phase sampling designs can dramatically reduce the costs of observational studies by judicious selection of the most informative subjects for purposes of detailed covariate measurement. Here we derive asymptotic information bounds and the form of the efficient score and influence functions for the semiparametric regression models studied by Lawless, Kalbfleisch, and Wild (1999) under two-phase sampling designs. We show that the maximum likelihood estimators for both the parametric and nonparametric parts of the model are asymptotically normal and efficient. The efficient influence function for the parametric part agrees with the more general information bound calculations of Robins, Hsieh, and Newey (1995). By verifying the conditions of Murphy and Van der Vaart (2000) for a least favorable parametric submodel, we provide asymptotic justification for statistical inference based on profile likelihood.
Abstract:
BACKGROUND: Comparisons between younger and older stroke patients including comorbidities are limited. METHODS: Prospective data of consecutive patients with first ever acute ischemic stroke were compared between younger (≤ 45 years) and older patients (> 45 years). RESULTS: Among 1004 patients, 137 (14 %) were ≤ 45 years. Younger patients were more commonly female (57 % versus 34 %; p < 0.0001), had a lower frequency of diabetes (1 % versus 15 %; p < 0.0001), hypercholesterolemia (26 % versus 56 %; p < 0.0001), hypertension (19 % versus 65 %; p < 0.0001), coronary heart disease (14 % versus 40 %; p < 0.0001), and a lower mean Charlson co-morbidity index (CCI) (0.18 versus 0.84; p < 0.0001). Tobacco use was more prevalent in the young (39 % versus 26 %; p < 0.0001). Large artery disease (2 % versus 21 %; p < 0.0001), small artery disease (3 % versus 12 %; p = 0.0019) and atrial fibrillation (1 % versus 17 %; p = 0.001) were less common in young patients, while other etiologies (31 % versus 9 %; p < 0.0001), patent foramen ovale or atrial septal defect (44 % versus 26 %; p < 0.0001), and cervical artery dissection (26 % versus 7 %; p < 0.0001) were more frequent. A favorable outcome (mRS 0 or 1) was more common (57.4 % versus 46.9 %; p = 0.023), and mortality (5.1 % versus 12 %; p = 0.009) was lower in the young. After regression analysis, there was no independent association between age and outcome (p = 0.206) or mortality (p = 0.073). Baseline NIHSS score (p < 0.0001), diabetes (p = 0.041), and CCI (p = 0.002) independently predicted an unfavorable outcome. CONCLUSIONS: Younger patients were more likely to be female, had different risk factors and etiologies and fewer co-morbidities. There was no independent association between age and clinical outcome or mortality.
Abstract:
Background mortality is an essential component of any forest growth and yield model. Forecasts of mortality contribute largely to the variability and accuracy of model predictions at the tree, stand and forest level. In the present study, I implement and evaluate state-of-the-art techniques to increase the accuracy of individual tree mortality models, similar to those used in many of the current variants of the Forest Vegetation Simulator, using data from North Idaho and Montana. The first technique addresses methods to correct for bias induced by measurement error typically present in competition variables. The second implements survival regression and evaluates its performance against the traditional logistic regression approach. I selected the regression calibration (RC) algorithm as a good candidate for addressing the measurement error problem. Two logistic regression models for each species were fitted, one ignoring the measurement error, which is the “naïve” approach, and the other applying RC. The models fitted with RC outperformed the naïve models in terms of discrimination when the competition variable was found to be statistically significant. The effect of RC was more obvious where measurement error variance was large and for more shade-intolerant species. The process of model fitting and variable selection revealed that past emphasis on DBH as a predictor variable for mortality, while producing models with strong metrics of fit, may make models less generalizable. The evaluation of the error variance estimator developed by Stage and Wykoff (1998), which is core to the implementation of RC, in different spatial patterns and diameter distributions revealed that the Stage and Wykoff estimate notably overestimated the true variance in all simulated stands except those that are clustered. Results show a systematic bias even when all the assumptions made by the authors are guaranteed.
I argue that this is the result of the Poisson-based estimate ignoring the overlapping area of potential plots around a tree. Effects, especially in the application phase, of the variance estimate justify suggested future efforts of improving the accuracy of the variance estimate. The second technique implemented and evaluated is a survival regression model that accounts for the time dependent nature of variables, such as diameter and competition variables, and the interval-censored nature of data collected from remeasured plots. The performance of the model is compared with the traditional logistic regression model as a tool to predict individual tree mortality. Validation of both approaches shows that the survival regression approach discriminates better between dead and alive trees for all species. In conclusion, I showed that the proposed techniques do increase the accuracy of individual tree mortality models, and are a promising first step towards the next generation of background mortality models. I have also identified the next steps to undertake in order to advance mortality models further.
Abstract:
A combinatorial protocol (CP) is introduced here and interfaced with multiple linear regression (MLR) for variable selection. The efficiency of CP-MLR is primarily based on restricting the entry of correlated variables at the model development stage. It has been used for the analysis of the Selwood et al. data set [16], and the obtained models are compared with those reported from the GFA [8] and MUSEUM [9] approaches. For this data set CP-MLR could identify three highly independent models (27, 28 and 31) with Q2 values in the range of 0.632-0.518. Also, these models are divergent and unique. Even though the present study does not share any models with the GFA [8] and MUSEUM [9] results, there are several descriptors common to all these studies, including the present one. A simulation is also carried out on the same data set to explain the model formation in CP-MLR. The results demonstrate that the proposed method should be able to offer solutions to data sets with 50 to 60 descriptors in a reasonable time frame. By carefully selecting the inter-parameter correlation cutoff values in CP-MLR one can identify divergent models and handle data sets larger than the present one without involving excessive computer time.
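The idea of keeping correlated descriptors out of the same candidate model can be sketched as a filter over descriptor combinations. This is a minimal sketch of that filtering step only, not the published CP-MLR procedure: the regression fit and the Q2 evaluation are omitted, and the function names, the toy descriptor values and the cutoff of 0.8 are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant descriptor has no defined correlation")
    return sxy / sqrt(sxx * syy)


def admissible_subsets(descriptors, size, cutoff):
    """Yield descriptor-name tuples of the given size in which every
    pairwise |r| is below the inter-parameter correlation cutoff."""
    for combo in combinations(sorted(descriptors), size):
        if all(abs(pearson(descriptors[a], descriptors[b])) < cutoff
               for a, b in combinations(combo, 2)):
            yield combo


# Toy descriptor table (hypothetical values): d2 is nearly collinear with d1.
toy = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(list(admissible_subsets(toy, size=2, cutoff=0.8)))
```

Only the subsets that pass this filter would go on to regression fitting, which is one way a stricter cutoff can reduce the number of models to be evaluated.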