46 resultados para Regression Trees
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Resumo:
Psidium guajava ""Paluma"", a tropical tree species, is known to be an efficient ozone indicator in tropical countries. When exposed to ozone, this species displays a characteristic leaf injury identified by inter-veinal red stippling on adaxial leaf surfaces. Following 30 days of three ozone treatments consisting of carbon filtered air (CF - AOT40 = 17 ppb h), ambient non-filtered air (NF - AOT40 = 542 ppb h) and ambient non-filtered air + 40 ppb ozone (NF + O(3) - AOT40 - 7802 ppb h), the amounts of residual anthocyanins and tannins present in 10 P. guajava (""Paluma"") saplings were quantified. Higher amounts of anthocyanins were found in the NF + O(3) treatment (1.6%) when compared to the CF (0.97%) and NF (1.30%) (p < 0.05), and of total tannins in the NF + O(3) treatment (0.16%) compared to the CIF (0.14%). Condensed tannins showed the same tendency as enhanced amounts. Regression analyses using amounts of tannins and anthocyanins, AOT40 and the leaf injury index (LII), showed a correlation between the leaf injury index and quantities of anthocyanins and total tannins. These results are in accordance with the association between the incidence of red-stippled leaves and ozone polluted environments. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output, helping the end-user to get more confidence in the prediction and providing the basis for the end-user to have new insight about the data, confirming or rejecting hypotheses previously formed. Moreover, model trees present an acceptable level of predictive performance in comparison to most techniques used for solving regression problems. Since generating the optimal model tree is an NP-Complete problem, traditional model tree induction algorithms make use of a greedy top-down divide-and-conquer strategy, which may not converge to the global optimal solution. In this paper, we propose a novel algorithm based on the use of the evolutionary algorithms paradigm as an alternate heuristic to generate model trees in order to improve the convergence to globally near-optimal solutions. We call our new approach evolutionary model tree induction (E-Motion). We test its predictive performance using public UCI data sets, and we compare the results to traditional greedy regression/model trees induction algorithms, as well as to other evolutionary approaches. Results show that our method presents a good trade-off between predictive performance and model comprehensibility, which may be crucial in many machine learning applications. (C) 2010 Elsevier Inc. All rights reserved.
Resumo:
Knowledge on juvenile tree growth is crucial to understand how trees reach the canopy in tropical forests. However, long-term data on juvenile tree growth are usually unavailable. Annual tree rings provide growth information for the entire life of trees and their analysis has become more popular in tropical forest regions over the past decades. Nonetheless, tree ring studies mainly deal with adult rings as the annual character of juvenile rings has been questioned. We evaluated whether juvenile tree rings can be used for three Bolivian rainforest species. First, we characterized the rings of juvenile and adult trees anatomically. We then evaluated the annual nature of tree rings by a combination of three indirect methods: evaluation of synchronous growth patterns in the tree- ring series, (14)C bomb peak dating and correlations with rainfall. Our results indicate that rings of juvenile and adult trees are defined by similar ring-boundary elements. We built juvenile tree-ring chronologies and verified the ring age of several samples using (14)C bomb peak dating. We found that ring width was correlated with rainfall in all species, but in different ways. In all, the chronology, rainfall correlations and (14)C dating suggest that rings in our study species are formed annually.
Resumo:
In this article, we present a generalization of the Bayesian methodology introduced by Cepeda and Gamerman (2001) for modeling variance heterogeneity in normal regression models where we have orthogonality between mean and variance parameters to the general case considering both linear and highly nonlinear regression models. Under the Bayesian paradigm, we use MCMC methods to simulate samples for the joint posterior distribution. We illustrate this algorithm considering a simulated data set and also considering a real data set related to school attendance rate for children in Colombia. Finally, we present some extensions of the proposed MCMC algorithm.
Resumo:
In this paper, we compare the performance of two statistical approaches for the analysis of data obtained from the social research area. In the first approach, we use normal models with joint regression modelling for the mean and for the variance heterogeneity. In the second approach, we use hierarchical models. In the first case, individual and social variables are included in the regression modelling for the mean and for the variance, as explanatory variables, while in the second case, the variance at level 1 of the hierarchical model depends on the individuals (age of the individuals), and in the level 2 of the hierarchical model, the variance is assumed to change according to socioeconomic stratum. Applying these methodologies, we analyze a Colombian tallness data set to find differences that can be explained by socioeconomic conditions. We also present some theoretical and empirical results concerning the two models. From this comparative study, we conclude that it is better to jointly modelling the mean and variance heterogeneity in all cases. We also observe that the convergence of the Gibbs sampling chain used in the Markov Chain Monte Carlo method for the jointly modeling the mean and variance heterogeneity is quickly achieved.
Resumo:
Nesse artigo, tem-se o interesse em avaliar diferentes estratégias de estimação de parâmetros para um modelo de regressão linear múltipla. Para a estimação dos parâmetros do modelo foram utilizados dados de um ensaio clínico em que o interesse foi verificar se o ensaio mecânico da propriedade de força máxima (EM-FM) está associada com a massa femoral, com o diâmetro femoral e com o grupo experimental de ratas ovariectomizadas da raça Rattus norvegicus albinus, variedade Wistar. Para a estimação dos parâmetros do modelo serão comparadas três metodologias: a metodologia clássica, baseada no método dos mínimos quadrados; a metodologia Bayesiana, baseada no teorema de Bayes; e o método Bootstrap, baseado em processos de reamostragem.
Resumo:
The purpose of this paper is to develop a Bayesian analysis for nonlinear regression models under scale mixtures of skew-normal distributions. This novel class of models provides a useful generalization of the symmetrical nonlinear regression models since the error distributions cover both skewness and heavy-tailed distributions such as the skew-t, skew-slash and the skew-contaminated normal distributions. The main advantage of these class of distributions is that they have a nice hierarchical representation that allows the implementation of Markov chain Monte Carlo (MCMC) methods to simulate samples from the joint posterior distribution. In order to examine the robust aspects of this flexible class, against outlying and influential observations, we present a Bayesian case deletion influence diagnostics based on the Kullback-Leibler divergence. Further, some discussions on the model selection criteria are given. The newly developed procedures are illustrated considering two simulations study, and a real data previously analyzed under normal and skew-normal nonlinear regression models. (C) 2010 Elsevier B.V. All rights reserved.
A bivariate regression model for matched paired survival data: local influence and residual analysis
Resumo:
The use of bivariate distributions plays a fundamental role in survival and reliability studies. In this paper, we consider a location scale model for bivariate survival times based on the proposal of a copula to model the dependence of bivariate survival data. For the proposed model, we consider inferential procedures based on maximum likelihood. Gains in efficiency from bivariate models are also examined in the censored data setting. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and compared to the performance of the bivariate regression model for matched paired survival data. Sensitivity analysis methods such as local and total influence are presented and derived under three perturbation schemes. The martingale marginal and the deviance marginal residual measures are used to check the adequacy of the model. Furthermore, we propose a new measure which we call modified deviance component residual. The methodology in the paper is illustrated on a lifetime data set for kidney patients.
Resumo:
In this paper we have discussed inference aspects of the skew-normal nonlinear regression models following both, a classical and Bayesian approach, extending the usual normal nonlinear regression models. The univariate skew-normal distribution that will be used in this work was introduced by Sahu et al. (Can J Stat 29:129-150, 2003), which is attractive because estimation of the skewness parameter does not present the same degree of difficulty as in the case with Azzalini (Scand J Stat 12:171-178, 1985) one and, moreover, it allows easy implementation of the EM-algorithm. As illustration of the proposed methodology, we consider a data set previously analyzed in the literature under normality.
Resumo:
The purpose of this paper is to develop a Bayesian approach for log-Birnbaum-Saunders Student-t regression models under right-censored survival data. Markov chain Monte Carlo (MCMC) methods are used to develop a Bayesian procedure for the considered model. In order to attenuate the influence of the outlying observations on the parameter estimates, we present in this paper Birnbaum-Saunders models in which a Student-t distribution is assumed to explain the cumulative damage. Also, some discussions on the model selection to compare the fitted models are given and case deletion influence diagnostics are developed for the joint posterior distribution based on the Kullback-Leibler divergence. The developed procedures are illustrated with a real data set. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
In this work we introduce a new hierarchical surface decomposition method for multiscale analysis of surface meshes. In contrast to other multiresolution methods, our approach relies on spectral properties of the surface to build a binary hierarchical decomposition. Namely, we utilize the first nontrivial eigenfunction of the Laplace-Beltrami operator to recursively decompose the surface. For this reason we coin our surface decomposition the Fiedler tree. Using the Fiedler tree ensures a number of attractive properties, including: mesh-independent decomposition, well-formed and nearly equi-areal surface patches, and noise robustness. We show how the evenly distributed patches can be exploited for generating multiresolution high quality uniform meshes. Additionally, our decomposition permits a natural means for carrying out wavelet methods, resulting in an intuitive method for producing feature-sensitive meshes at multiple scales. Published by Elsevier Ltd.
Resumo:
In interval-censored survival data, the event of interest is not observed exactly but is only known to occur within some time interval. Such data appear very frequently. In this paper, we are concerned only with parametric forms, and so a location-scale regression model based on the exponentiated Weibull distribution is proposed for modeling interval-censored data. We show that the proposed log-exponentiated Weibull regression model for interval-censored data represents a parametric family of models that include other regression models that are broadly used in lifetime data analysis. Assuming the use of interval-censored data, we employ a frequentist analysis, a jackknife estimator, a parametric bootstrap and a Bayesian analysis for the parameters of the proposed model. We derive the appropriate matrices for assessing local influences on the parameter estimates under different perturbation schemes and present some ways to assess global influences. Furthermore, for different parameter settings, sample sizes and censoring percentages, various simulations are performed; in addition, the empirical distribution of some modified residuals are displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to a modified deviance residual in log-exponentiated Weibull regression models for interval-censored data. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
In this paper, the generalized log-gamma regression model is modified to allow the possibility that long-term survivors may be present in the data. This modification leads to a generalized log-gamma regression model with a cure rate, encompassing, as special cases, the log-exponential, log-Weibull and log-normal regression models with a cure rate typically used to model such data. The models attempt to simultaneously estimate the effects of explanatory variables on the timing acceleration/deceleration of a given event and the surviving fraction, that is, the proportion of the population for which the event never occurs. The normal curvatures of local influence are derived under some usual perturbation schemes and two martingale-type residuals are proposed to assess departures from the generalized log-gamma error assumption as well as to detect outlying observations. Finally, a data set from the medical area is analyzed.
Resumo:
In survival analysis applications, the failure rate function may frequently present a unimodal shape. In such case, the log-normal or log-logistic distributions are used. In this paper, we shall be concerned only with parametric forms, so a location-scale regression model based on the Burr XII distribution is proposed for modeling data with a unimodal failure rate function as an alternative to the log-logistic regression model. Assuming censored data, we consider a classic analysis, a Bayesian analysis and a jackknife estimator for the parameters of the proposed model. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and compared to the performance of the log-logistic and log-Burr XII regression models. Besides, we use sensitivity analysis to detect influential or outlying observations, and residual analysis is used to check the assumptions in the model. Finally, we analyze a real data set under log-Buff XII regression models. (C) 2008 Published by Elsevier B.V.
Resumo:
The comprehensive characterization of the structure of complex networks is essential to understand the dynamical processes which guide their evolution. The discovery of the scale-free distribution and the small-world properties of real networks were fundamental to stimulate more realistic models and to understand important dynamical processes related to network growth. However, the properties of the network borders (nodes with degree equal to 1), one of its most fragile parts, remained little investigated and understood. The border nodes may be involved in the evolution of structures such as geographical networks. Here we analyze the border trees of complex networks, which are defined as the subgraphs without cycles connected to the remainder of the network (containing cycles) and terminating into border nodes. In addition to describing an algorithm for identification of such tree subgraphs, we also consider how their topological properties can be quantified in terms of their depth and number of leaves. We investigate the properties of border trees for several theoretical models as well as real-world networks. Among the obtained results, we found that more than half of the nodes of some real-world networks belong to the border trees. A power-law with cut-off was observed for the distribution of the depth and number of leaves of the border trees. An analysis of the local role of the nodes in the border trees was also performed.