6 results for Boosted regression trees
in Dalarna University College Electronic Archive
Abstract:
In this paper, we study the factors that influence the National Telecom Business Volume, using 2008 data published in the China Statistical Yearbook. We illustrate the procedure for modeling “National Telecom Business Volume” on the following eight variables: GDP, Consumption Levels, Retail Sales of Social Consumer Goods, Total Renovation Investment, Local Telephone Exchange Capacity, Mobile Telephone Exchange Capacity, Mobile Phone End Users, and Local Telephone End Users. Tests for heteroscedasticity and multicollinearity are included for model evaluation. We also use the AIC and BIC criteria to select independent variables, identify the factors that form the optimal regression model for telecommunications business volume, and describe the relationship between the independent variables and the dependent variable. Based on the final results, we propose several recommendations on how to improve telecommunication services and promote economic development.
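As a rough illustration of the AIC/BIC selection step mentioned in this abstract, the sketch below computes both criteria for a Gaussian linear regression from its residual sum of squares. The function name, the sample size, and the RSS values are hypothetical and are not taken from the paper; `k` is assumed to count the estimated regression coefficients including the intercept (conventions differ on whether the error variance is also counted).

```python
import math

def gaussian_aic_bic(rss, n, k):
    """AIC and BIC for a Gaussian linear model fitted by least squares.

    rss: residual sum of squares, n: number of observations,
    k: number of estimated coefficients (assumed to include the intercept).
    """
    # Maximized Gaussian log-likelihood with variance estimate rss / n
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aic = 2 * k - 2 * log_lik
    bic = k * math.log(n) - 2 * log_lik
    return aic, bic

# Hypothetical candidate models: name -> (RSS, number of coefficients)
n_obs = 30
candidates = {
    "3 predictors": (12.0, 4),
    "8 predictors": (11.5, 9),
}

scores = {name: gaussian_aic_bic(rss, n_obs, k) for name, (rss, k) in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

With these made-up numbers, the small reduction in RSS does not offset the penalty for five extra coefficients, so both criteria prefer the smaller model.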
Abstract:
Variation in wood properties for Picea abies trees and logs of different dimensions has been studied at two sites in southern Sweden of different site quality classes. Trees were classified as dominant or sub-dominant according to their height. Log and board grades were classified, and the strength grade of boards, basic density and annual ring width were measured. A similar study made on four northern sites was used as reference material. Sub-dominant trees were of superior quality in comparison to dominant trees when classified by log and board grades or strength grading. Differences were accentuated for the second log, where the sub-dominant trees had superior strength and a low proportion of boards with coarse branches. The results correspond well to those from the northern region, Jämtland. The classification of boards as well as bending strength indicated superior properties of timber from northern sites even though the basic density was similar.
Abstract:
This is a note about proxy variables and instruments for the identification of structural parameters in regression models. In our experience, econometric textbooks treat these two issues separately, although in practice the two concepts are very often combined. Usually, proxy variables are inserted in instrumental variable regressions with the motivation that they are exogenous. Implicitly, this means they are exogenous in a reduced-form model and not in a structural model. In fact, if these variables are exogenous they should be redundant in the structural model, e.g. IQ as a proxy for ability. Valid proxies reduce unexplained variation and increase the efficiency of the estimator of the structural parameter of interest. This is especially important in situations where the instrument is weak. With a simple example we demonstrate what is required of a proxy and an instrument when they are combined. It turns out that when a researcher has a valid instrument, the requirements on the proxy variable are weaker than if no such instrument exists.
Abstract:
FP7- MacSheep
Predictive models for chronic renal disease using decision trees, naïve bayes and case-based methods
Abstract:
Data mining can be used in the healthcare industry to “mine” clinical data and discover hidden information for intelligent and effective decision making. Hidden patterns and relationships often go undiscovered, and advanced data mining techniques can help remedy this. This thesis mainly deals with Intelligent Prediction of Chronic Renal Disease (IPCRD). The data cover blood tests, urine tests, and external symptoms used to predict chronic renal disease. Data from the database are first converted for use in Weka (3.6), and the Chi-Square method is used for feature selection. After normalizing the data, three classifiers were applied and the efficiency of their output was evaluated: Decision Tree, Naïve Bayes, and the K-Nearest Neighbour (KNN) algorithm. Results show that each technique has its own strength in realizing the objectives of the defined mining goals. The efficiency of Decision Tree and KNN was almost the same, but Naïve Bayes showed a comparative edge over the others. Sensitivity and specificity are further used as statistical measures to examine the performance of the binary classification. Sensitivity (also called recall rate in some fields) measures the proportion of actual positives that are correctly identified, while specificity measures the proportion of actual negatives that are correctly identified. The CRISP-DM methodology is applied to build the mining models. It consists of six major phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
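The sensitivity and specificity definitions in this abstract can be sketched as follows. This is only an illustration: the function name and the example labels are hypothetical and do not come from the thesis data, and the label `1` is assumed to mark the positive (disease) class.

```python
def sensitivity_specificity(actual, predicted, positive=1):
    """Return (sensitivity, specificity) for paired binary labels."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # actual positive, correctly identified
            else:
                fn += 1  # actual positive, missed
        else:
            if p == positive:
                fp += 1  # actual negative, wrongly flagged
            else:
                tn += 1  # actual negative, correctly identified
    # Guard against classes that are absent from the actual labels
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Hypothetical labels: 1 = disease present, 0 = disease absent
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(actual, predicted)
```

In this made-up example, 3 of the 4 actual positives and 4 of the 6 actual negatives are correctly identified, which gives a sensitivity of 0.75 and a specificity of about 0.67.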