855 resultados para REGRESSION TREE
Resumo:
This paper studies seemingly unrelated linear models with integrated regressors and stationary errors. By adding leads and lags of the first differences of the regressors and estimating this augmented dynamic regression model by feasible generalized least squares using the long-run covariance matrix, we obtain an efficient estimator of the cointegrating vector that has a limiting mixed normal distribution. Simulation results suggest that this new estimator compares favorably with others already proposed in the literature. We apply these new estimators to the testing of purchasing power parity (PPP) among the G-7 countries. The test based on the efficient estimates rejects the PPP hypothesis for most countries.
Resumo:
The focus of the paper is the nonparametric estimation of an instrumental regression function P defined by conditional moment restrictions stemming from a structural econometric model : E[Y-P(Z)|W]=0 and involving endogenous variables Y and Z and instruments W. The function P is the solution of an ill-posed inverse problem and we propose an estimation procedure based on Tikhonov regularization. The paper analyses identification and overidentification of this model and presents asymptotic properties of the estimated nonparametric instrumental regression function.
Resumo:
This Paper Studies Tests of Joint Hypotheses in Time Series Regression with a Unit Root in Which Weakly Dependent and Heterogeneously Distributed Innovations Are Allowed. We Consider Two Types of Regression: One with a Constant and Lagged Dependent Variable, and the Other with a Trend Added. the Statistics Studied Are the Regression \"F-Test\" Originally Analysed by Dickey and Fuller (1981) in a Less General Framework. the Limiting Distributions Are Found Using Functinal Central Limit Theory. New Test Statistics Are Proposed Which Require Only Already Tabulated Critical Values But Which Are Valid in a Quite General Framework (Including Finite Order Arma Models Generated by Gaussian Errors). This Study Extends the Results on Single Coefficients Derived in Phillips (1986A) and Phillips and Perron (1986).
Resumo:
Thèse réalisée en cotutelle entre l'Université de Montréal et l'Université de Technologie de Troyes
Resumo:
The main objective of this letter is to formulate a new approach of learning a Mahalanobis distance metric for nearest neighbor regression from a training sample set. We propose a modified version of the large margin nearest neighbor metric learning method to deal with regression problems. As an application, the prediction of post-operative trunk 3-D shapes in scoliosis surgery using nearest neighbor regression is described. Accuracy of the proposed method is quantitatively evaluated through experiments on real medical data.
Resumo:
The hazards associated with major accident hazard (MAH) industries are fire, explosion and toxic gas releases. Of these, toxic gas release is the worst as it has the potential to cause extensive fatalities. Qualitative and quantitative hazard analyses are essential for the identitication and quantification of the hazards associated with chemical industries. This research work presents the results of a consequence analysis carried out to assess the damage potential of the hazardous material storages in an industrial area of central Kerala, India. A survey carried out in the major accident hazard (MAH) units in the industrial belt revealed that the major hazardous chemicals stored by the various industrial units are ammonia, chlorine, benzene, naphtha, cyclohexane, cyclohexanone and LPG. The damage potential of the above chemicals is assessed using consequence modelling. Modelling of pool fires for naphtha, cyclohexane, cyclohexanone, benzene and ammonia are carried out using TNO model. Vapor cloud explosion (VCE) modelling of LPG, cyclohexane and benzene are carried out using TNT equivalent model. Boiling liquid expanding vapor explosion (BLEVE) modelling of LPG is also carried out. Dispersion modelling of toxic chemicals like chlorine, ammonia and benzene is carried out using the ALOHA air quality model. Threat zones for different hazardous storages are estimated based on the consequence modelling. The distance covered by the threat zone was found to be maximum for chlorine release from a chlor-alkali industry located in the area. The results of consequence modelling are useful for the estimation of individual risk and societal risk in the above industrial area.Vulnerability assessment is carried out using probit functions for toxic, thermal and pressure loads. Individual and societal risks are also estimated at different locations. Mapping of threat zones due to different incident outcome cases from different MAH industries is done with the help of Are GIS.Fault Tree Analysis (FTA) is an established technique for hazard evaluation. This technique has the advantage of being both qualitative and quantitative, if the probabilities and frequencies of the basic events are known. However it is often difficult to estimate precisely the failure probability of the components due to insufficient data or vague characteristics of the basic event. It has been reported that availability of the failure probability data pertaining to local conditions is surprisingly limited in India. This thesis outlines the generation of failure probability values of the basic events that lead to the release of chlorine from the storage and filling facility of a major chlor-alkali industry located in the area using expert elicitation and proven fuzzy logic. Sensitivity analysis has been done to evaluate the percentage contribution of each basic event that could lead to chlorine release. Two dimensional fuzzy fault tree analysis (TDFFTA) has been proposed for balancing the hesitation factor invo1ved in expert elicitation .
Resumo:
Multivariate lifetime data arise in various forms including recurrent event data when individuals are followed to observe the sequence of occurrences of a certain type of event; correlated lifetime when an individual is followed for the occurrence of two or more types of events, or when distinct individuals have dependent event times. In most studies there are covariates such as treatments, group indicators, individual characteristics, or environmental conditions, whose relationship to lifetime is of interest. This leads to a consideration of regression models.The well known Cox proportional hazards model and its variations, using the marginal hazard functions employed for the analysis of multivariate survival data in literature are not sufficient to explain the complete dependence structure of pair of lifetimes on the covariate vector. Motivated by this, in Chapter 2, we introduced a bivariate proportional hazards model using vector hazard function of Johnson and Kotz (1975), in which the covariates under study have different effect on two components of the vector hazard function. The proposed model is useful in real life situations to study the dependence structure of pair of lifetimes on the covariate vector . The well known partial likelihood approach is used for the estimation of parameter vectors. We then introduced a bivariate proportional hazards model for gap times of recurrent events in Chapter 3. The model incorporates both marginal and joint dependence of the distribution of gap times on the covariate vector . In many fields of application, mean residual life function is considered superior concept than the hazard function. Motivated by this, in Chapter 4, we considered a new semi-parametric model, bivariate proportional mean residual life time model, to assess the relationship between mean residual life and covariates for gap time of recurrent events. The counting process approach is used for the inference procedures of the gap time of recurrent events. In many survival studies, the distribution of lifetime may depend on the distribution of censoring time. In Chapter 5, we introduced a proportional hazards model for duration times and developed inference procedures under dependent (informative) censoring. In Chapter 6, we introduced a bivariate proportional hazards model for competing risks data under right censoring. The asymptotic properties of the estimators of the parameters of different models developed in previous chapters, were studied. The proposed models were applied to various real life situations.
Resumo:
In forestry, availability of healthy seeds is an important factor in raising planting stock. Initial seed health and storage conditions are the major factors governing the germinability of seeds. Like seeds of agricultural and horticultural crops, forest tree seeds are also liable to be affected by micro-organisms during storage, which affects the germination, and reduces the viability. Further introduction of seed-borne diseases into newly sown crops/areas on account of using unhealthy seeds is also not ruled out. Availability of healthy stock of seedlings is intrinsic for raising plantations and to meet this requirement elimination of nursery diseases by appropriate chemicals is of prime imortance. As exotic tree species may become susceptible to various native pathogens, it is generally considered better to select indigenous tree species for large scale plantations as they are well adapted to local environment. However, before taking up large scale afforestation progranme involving any indigenous tree species, it is essential to have knowledge about seed disorders and seedling diseases and their management. with a View to select appropriate tree species with fewer seed disorders and seedling disease problems for use in further plantation programme, four indigenous tree species such as Albizia odoratissima (L.f) Benth., Lagerstroemia microcazpa Wt., Pterocazpus marsupiwn Roxb. and Xylia xylocarpa (Roxb.) Taub. were evaluated to meet the above parameters
Resumo:
Decision trees are very powerful tools for classification in data mining tasks that involves different types of attributes. When coming to handling numeric data sets, usually they are converted first to categorical types and then classified using information gain concepts. Information gain is a very popular and useful concept which tells you, whether any benefit occurs after splitting with a given attribute as far as information content is concerned. But this process is computationally intensive for large data sets. Also popular decision tree algorithms like ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain as well as statistical mean to split attributes in completely numerical data sets. The new algorithm has been proved to be competent with respect to its information gain counterpart C4.5 and competent with many existing decision tree algorithms against the standard UCI benchmarking datasets using the ANOVA test in statistics. The specific advantages of this proposed new algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, as well as it avoids the conversion to categorical data from huge numeric data sets which also is a time consuming task. So as a summary, huge numeric datasets can be directly submitted to this algorithm without any attribute mappings or information gain computations. It also blends the two closely related fields statistics and data mining