914 resultados para Random regression


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Ensemble learning techniques generate multiple classifiers, so called base classifiers, whose combined classification results are used in order to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. However, an alternative approach for the induction of rule based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism classification rulesets achieve a comparable and sometimes higher classification accuracy compared with decision tree classifiers, if the data is noisy and large. Yet Prism still suffers from overfitting on noisy and large datasets. In practice ensemble techniques tend to reduce the overfitting, however there exists no ensemble learner for modular classification rule inducers such as the Prism family of algorithms. This article describes the first development of an ensemble learner based on the Prism family of algorithms in order to enhance Prism’s classification accuracy by reducing overfitting.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Generally classifiers tend to overfit if there is noise in the training data or there are missing values. Ensemble learning methods are often used to improve a classifier's classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, Random Prism suffers like any ensemble learner from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence even modest sized datasets may impose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper I analyze the general equilibrium in a random Walrasian economy. Dependence among agents is introduced in the form of dependency neighborhoods. Under the uncertainty, an agent may fail to survive due to a meager endowment in a particular state (direct effect), as well as due to unfavorable equilibrium price system at which the value of the endowment falls short of the minimum needed for survival (indirect terms-of-trade effect). To illustrate the main result I compute the stochastic limit of equilibrium price and probability of survival of an agent in a large Cobb-Douglas economy.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Data augmentation is a powerful technique for estimating models with latent or missing data, but applications in agricultural economics have thus far been few. This paper showcases the technique in an application to data on milk market participation in the Ethiopian highlands. There, a key impediment to economic development is an apparently low rate of market participation. Consequently, economic interest centers on the “locations” of nonparticipants in relation to the market and their “reservation values” across covariates. These quantities are of policy interest because they provide measures of the additional inputs necessary in order for nonparticipants to enter the market. One quantity of primary interest is the minimum amount of surplus milk (the “minimum efficient scale of operations”) that the household must acquire before market participation becomes feasible. We estimate this quantity through routine application of data augmentation and Gibbs sampling applied to a random-censored Tobit regression. Incorporating random censoring affects markedly the marketable-surplus requirements of the household, but only slightly the covariates requirements estimates and, generally, leads to more plausible policy estimates than the estimates obtained from the zero-censored formulation

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a model of market participation in which the presence of non-negligible fixed costs leads to random censoring of the traditional double-hurdle model. Fixed costs arise when household resources must be devoted a priori to the decision to participate in the market. These costs, usually of time, are manifested in non-negligible minimum-efficient supplies and supply correspondence that requires modification of the traditional Tobit regression. The costs also complicate econometric estimation of household behavior. These complications are overcome by application of the Gibbs sampler. The algorithm thus derived provides robust estimates of the fixed-costs, double-hurdle model. The model and procedures are demonstrated in an application to milk market participation in the Ethiopian highlands.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In order to validate the reported precision of space‐based atmospheric composition measurements, validation studies often focus on measurements in the tropical stratosphere, where natural variability is weak. The scatter in tropical measurements can then be used as an upper limit on single‐profile measurement precision. Here we introduce a method of quantifying the scatter of tropical measurements which aims to minimize the effects of short‐term atmospheric variability while maintaining large enough sample sizes that the results can be taken as representative of the full data set. We apply this technique to measurements of O3, HNO3, CO, H2O, NO, NO2, N2O, CH4, CCl2F2, and CCl3F produced by the Atmospheric Chemistry Experiment–Fourier Transform Spectrometer (ACE‐FTS). Tropical scatter in the ACE‐FTS retrievals is found to be consistent with the reported random errors (RREs) for H2O and CO at altitudes above 20 km, validating the RREs for these measurements. Tropical scatter in measurements of NO, NO2, CCl2F2, and CCl3F is roughly consistent with the RREs as long as the effect of outliers in the data set is reduced through the use of robust statistics. The scatter in measurements of O3, HNO3, CH4, and N2O in the stratosphere, while larger than the RREs, is shown to be consistent with the variability simulated in the Canadian Middle Atmosphere Model. This result implies that, for these species, stratospheric measurement scatter is dominated by natural variability, not random error, which provides added confidence in the scientific value of single‐profile measurements.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

(ABR) is of fundamental importance to the investiga- tion of the auditory system behavior, though its in- terpretation has a subjective nature because of the manual process employed in its study and the clinical experience required for its analysis. When analyzing the ABR, clinicians are often interested in the identi- fication of ABR signal components referred to as Jewett waves. In particular, the detection and study of the time when these waves occur (i.e., the wave la- tency) is a practical tool for the diagnosis of disorders affecting the auditory system. In this context, the aim of this research is to compare ABR manual/visual analysis provided by different examiners. Methods: The ABR data were collected from 10 normal-hearing subjects (5 men and 5 women, from 20 to 52 years). A total of 160 data samples were analyzed and a pair- wise comparison between four distinct examiners was executed. We carried out a statistical study aiming to identify significant differences between assessments provided by the examiners. For this, we used Linear Regression in conjunction with Bootstrap, as a me- thod for evaluating the relation between the responses given by the examiners. Results: The analysis sug- gests agreement among examiners however reveals differences between assessments of the variability of the waves. We quantified the magnitude of the ob- tained wave latency differences and 18% of the inves- tigated waves presented substantial differences (large and moderate) and of these 3.79% were considered not acceptable for the clinical practice. Conclusions: Our results characterize the variability of the manual analysis of ABR data and the necessity of establishing unified standards and protocols for the analysis of these data. These results may also contribute to the validation and development of automatic systems that are employed in the early diagnosis of hearing loss.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Objective To determine the prevalence and nature of prescribing and monitoring errors in general practices in England. Design Retrospective case note review of unique medication items prescribed over a 12 month period to a 2% random sample of patients. Mixed effects logistic regression was used to analyse the data. Setting Fifteen general practices across three primary care trusts in England. Data sources Examination of 6048 unique prescription items prescribed over the previous 12 months for 1777 patients. Main outcome measures Prevalence of prescribing and monitoring errors, and severity of errors, using validated definitions. Results Prescribing and/or monitoring errors were detected in 4.9% (296/6048) of all prescription items (95% confidence interval 4.4 - 5.5%). The vast majority of errors were of mild to moderate severity, with 0.2% (11/6048) of items having a severe error. After adjusting for covariates, patient-related factors associated with an increased risk of prescribing and/or monitoring errors were: age less than 15 (Odds Ratio (OR) 1.87, 1.19 to 2.94, p=0.006) or greater than 64 years (OR 1.68, 1.04 to 2.73, p=0.035), and higher numbers of unique medication items prescribed (OR 1.16, 1.12 to 1.19, p<0.001). Conclusion Prescribing and monitoring errors are common in English general practice, although severe errors are unusual. Many factors increase the risk of error. Having identified the most common and important errors, and the factors associated with these, strategies to prevent future errors should be developed based on the study findings.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Ensemble learning can be used to increase the overall classification accuracy of a classifier by generating multiple base classifiers and combining their classification results. A frequently used family of base classifiers for ensemble learning are decision trees. However, alternative approaches can potentially be used, such as the Prism family of algorithms that also induces classification rules. Compared with decision trees, Prism algorithms generate modular classification rules that cannot necessarily be represented in the form of a decision tree. Prism algorithms produce a similar classification accuracy compared with decision trees. However, in some cases, for example, if there is noise in the training and test data, Prism algorithms can outperform decision trees by achieving a higher classification accuracy. However, Prism still tends to overfit on noisy data; hence, ensemble learners have been adopted in this work to reduce the overfitting. This paper describes the development of an ensemble learner using a member of the Prism family as the base classifier to reduce the overfitting of Prism algorithms on noisy datasets. The developed ensemble classifier is compared with a stand-alone Prism classifier in terms of classification accuracy and resistance to noise.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Many applications, such as intermittent data assimilation, lead to a recursive application of Bayesian inference within a Monte Carlo context. Popular data assimilation algorithms include sequential Monte Carlo methods and ensemble Kalman filters (EnKFs). These methods differ in the way Bayesian inference is implemented. Sequential Monte Carlo methods rely on importance sampling combined with a resampling step, while EnKFs utilize a linear transformation of Monte Carlo samples based on the classic Kalman filter. While EnKFs have proven to be quite robust even for small ensemble sizes, they are not consistent since their derivation relies on a linear regression ansatz. In this paper, we propose another transform method, which does not rely on any a priori assumptions on the underlying prior and posterior distributions. The new method is based on solving an optimal transportation problem for discrete random variables. © 2013, Society for Industrial and Applied Mathematics

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the present paper we study the approximation of functions with bounded mixed derivatives by sparse tensor product polynomials in positive order tensor product Sobolev spaces. We introduce a new sparse polynomial approximation operator which exhibits optimal convergence properties in L2 and tensorized View the MathML source simultaneously on a standard k-dimensional cube. In the special case k=2 the suggested approximation operator is also optimal in L2 and tensorized H1 (without essential boundary conditions). This allows to construct an optimal sparse p-version FEM with sparse piecewise continuous polynomial splines, reducing the number of unknowns from O(p2), needed for the full tensor product computation, to View the MathML source, required for the suggested sparse technique, preserving the same optimal convergence rate in terms of p. We apply this result to an elliptic differential equation and an elliptic integral equation with random loading and compute the covariances of the solutions with View the MathML source unknowns. Several numerical examples support the theoretical estimates.