82 results for Heterogeneous regression
Abstract:
We develop a particle swarm optimisation (PSO) aided orthogonal forward regression (OFR) approach for constructing radial basis function (RBF) classifiers with tunable nodes. At each stage of the OFR construction process, the centre vector and diagonal covariance matrix of one RBF node are determined efficiently by minimising the leave-one-out (LOO) misclassification rate (MR) using a PSO algorithm. Compared with the state-of-the-art regularisation-assisted orthogonal least squares algorithm based on the LOO MR for selecting fixed-node RBF classifiers, the proposed PSO-aided OFR algorithm for constructing tunable-node RBF classifiers offers better generalisation performance and a smaller model size, and it imposes lower computational complexity in the classifier construction process. Moreover, the proposed algorithm has no hyperparameter that requires costly tuning based on cross-validation.
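Illustrative sketch (not the authors' implementation): the abstract above relies on a PSO search at each construction stage. The block below shows a basic global-best PSO in plain Python. The function name `pso_minimise`, the inertia and acceleration weights, and the quadratic stand-in objective are assumptions made for illustration; in the paper the objective would be the LOO misclassification rate of a candidate RBF node.

```python
import random


def pso_minimise(objective, dim, bounds, n_particles=20, n_iters=200, seed=0):
    """Basic global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Inertia and acceleration weights: common textbook values (assumed here).
    w, c1, c2 = 0.7, 1.5, 1.5
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep each coordinate inside the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Stand-in objective (hypothetical): a quadratic bowl with its minimum at (1, -2).
best, value = pso_minimise(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2,
                           dim=2, bounds=(-5.0, 5.0))
```

Because the search only needs objective values, the same loop can in principle be pointed at a non-differentiable criterion such as a misclassification count.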
Abstract:
This paper derives some exact power properties of tests for spatial autocorrelation in the context of a linear regression model. In particular, we characterize the circumstances in which the power vanishes as the autocorrelation increases, thus extending the work of Krämer (2005). More generally, the analysis in the paper sheds new light on how the power of tests for spatial autocorrelation is affected by the matrix of regressors and by the spatial structure. We mainly focus on the problem of residual spatial autocorrelation, in which case it is appropriate to restrict attention to the class of invariant tests, but we also consider the case when the autocorrelation is due to the presence of a spatially lagged dependent variable among the regressors. A numerical study aimed at assessing the practical relevance of the theoretical results is included.
Abstract:
A new parameter-estimation algorithm, which minimises the cross-validated prediction error for linear-in-the-parameter models, is proposed, based on stacked regression and an evolutionary algorithm. It is initially shown that cross-validation is very important for prediction in linear-in-the-parameter models using a criterion called the mean dispersion error (MDE). Stacked regression, which can be regarded as a sophisticated type of cross-validation, is then introduced based on an evolutionary algorithm, to produce a new parameter-estimation algorithm, which preserves the parsimony of a concise model structure that is determined using the forward orthogonal least-squares (OLS) algorithm. The PRESS prediction errors are used for cross-validation, and the sunspot and Canadian lynx time series are used to demonstrate the new algorithms.
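Illustrative sketch (not the authors' algorithm): for an ordinary least-squares fit, the PRESS residuals mentioned above can be obtained without refitting the model, using the leverage identity e_i / (1 - h_ii). The example assumes a straight-line model with a single regressor; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b


def press(xs, ys):
    """PRESS via the leverage shortcut: sum of (e_i / (1 - h_ii))^2."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        residual = y - (a + b * x)
        leverage = 1.0 / n + (x - x_mean) ** 2 / sxx
        total += (residual / (1.0 - leverage)) ** 2
    return total


def press_brute_force(xs, ys):
    """Same quantity, computed by refitting with each point left out in turn."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total
```

The two functions agree up to rounding error, which is why PRESS is a cheap cross-validation criterion for linear-in-the-parameters models.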
Abstract:
A statistical technique for fault analysis in industrial printing is reported. The method specifically deals with binary data, for which the results of the production process fall into two categories, rejected or accepted. The method is referred to as logistic regression, and is capable of predicting future fault occurrences by the analysis of current measurements from machine parts sensors. Individual analysis of each type of fault can determine which parts of the plant have a significant influence on the occurrence of such faults; it is also possible to infer which measurable process parameters have no significant influence on the generation of these faults. Information derived from the analysis can be helpful in the operator's interpretation of the current state of the plant. Appropriate actions may then be taken to prevent potential faults from occurring. The algorithm is being implemented as part of an applied self-learning expert system.
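Illustrative sketch (not the system described above): logistic regression models the probability of a binary outcome, here "rejected" (1) versus "accepted" (0), as a sigmoid of a linear function of the measurements. The example assumes a single hypothetical sensor reading per item and fits the two coefficients by plain gradient ascent; the data, learning rate and step count are made up for illustration.

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(reject) = sigmoid(b0 + b1*x) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical sensor readings and inspection outcomes (1 = rejected).
readings = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rejected = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(readings, rejected)
p_low = sigmoid(b0 + b1 * 1.0)   # predicted reject probability at a low reading
p_high = sigmoid(b0 + b1 * 8.0)  # predicted reject probability at a high reading
```

A coefficient close to zero for a given measurement would suggest that the measurement has little influence on the fault, which is the kind of inference the abstract describes.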
Abstract:
Scintillometry is an established technique for determining large areal average sensible heat fluxes. The scintillometer measurement is related to sensible heat flux via Monin–Obukhov similarity theory, which was developed for ideal homogeneous land surfaces. In this study it is shown that judicious application of scintillometry over heterogeneous mixed agriculture on undulating topography yields valid results when compared to eddy covariance (EC). A large aperture scintillometer (LAS) over a 2.4 km path was compared with four EC stations measuring sensible (H) and latent (LvE) heat fluxes over different vegetation (cereals and grass) which when aggregated were representative of the LAS source area. The partitioning of available energy into H and LvE varied strongly for different vegetation types, with H varying by a factor of three between senesced winter wheat and grass pasture. The LAS derived H agrees (one-to-one within the experimental uncertainty) with H aggregated from EC with a high coefficient of determination of 0.94. Chronological analysis shows individual fields may have a varying contribution to the areal average sensible heat flux on short (weekly) time scales due to phenological development and changing soil moisture conditions. Using spatially aggregated measurements of net radiation and soil heat flux with H from the LAS, the areal averaged latent heat flux (LvE_LAS) was calculated as the residual of the surface energy balance. The regression of LvE_LAS against aggregated LvE from the EC stations has a slope of 0.94, close to ideal, and demonstrates that this is an accurate method for the landscape-scale estimation of evaporation over heterogeneous complex topography.
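Illustrative sketch: the residual calculation described above follows from the surface energy balance, in which net radiation minus soil heat flux is partitioned into sensible and latent heat. The symbols Rn (net radiation) and G (soil heat flux), the function name, and the numbers are assumptions for illustration and are not values from the study.

```python
def latent_heat_residual(net_radiation, soil_heat_flux, sensible_heat):
    """Latent heat flux (W m^-2) as the energy-balance residual: LvE = Rn - G - H."""
    return net_radiation - soil_heat_flux - sensible_heat


# Hypothetical midday values in W m^-2 (not taken from the paper).
lve = latent_heat_residual(net_radiation=400.0, soil_heat_flux=50.0, sensible_heat=150.0)
```

Because LvE is obtained by subtraction, any error in Rn, G or H carries straight through to the estimate, which is why the comparison against the aggregated EC measurements matters.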
Abstract:
This paper studies the effects of increasing formality via tax reduction and simplification schemes on micro-firm performance. It uses the 1997 Brazilian SIMPLES program. We develop a simple theoretical model to show that SIMPLES has an impact only on a segment of the micro-firm population, for which the effect of formality on firm performance can be identified, and that can be analyzed along the single-dimensional quantiles of the conditional firm revenues. To estimate the effect of formality, we use an econometric approach that compares eligible and non-eligible firms, born before and after SIMPLES in a local interval around the introduction of SIMPLES. We use an estimator that combines both quantile regression and the regression discontinuity identification strategy. The empirical results corroborate the positive effect of formality on micro-firms' performance and produce a clear characterization of who benefits from these programs.
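Illustrative sketch (this is not the authors' estimator): quantile regression replaces the squared-error loss with the asymmetric "check" loss, whose minimiser is a conditional quantile rather than a conditional mean. The block below shows only that building block on an intercept-only model; the function names and the revenue figures are hypothetical.

```python
def check_loss(residual, tau):
    """Quantile-regression check (pinball) loss for a single residual."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(values, tau):
    """Sample value minimising the total check loss; this is a tau-quantile of the sample."""
    return min(values, key=lambda q: sum(check_loss(v - q, tau) for v in values))


# Hypothetical firm revenues with one large outlier.
revenues = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(revenues, 0.5)    # 3.0, unaffected by the outlier
lower_fit = best_constant(revenues, 0.25)    # 2.0
```

Fitting the same loss at several values of tau is what lets an analysis describe effects along the revenue distribution instead of only at its mean.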
Abstract:
In this paper we propose an efficient two-level model identification method for a large class of linear-in-the-parameters models from observational data. A new elastic net orthogonal forward regression (ENOFR) algorithm is employed at the lower level to carry out simultaneous model selection and elastic net parameter estimation. The two regularization parameters in the elastic net are optimized using a particle swarm optimization (PSO) algorithm at the upper level by minimizing the leave-one-out (LOO) mean square error (LOOMSE). Illustrative examples are included to demonstrate the effectiveness of the new approaches.
Abstract:
Small propagules like pollen or fungal spores may be dispersed by the wind over distances of hundreds or thousands of kilometres, even though the median dispersal may be only a few metres. Such long-distance dispersal is a stochastic event which may be exceptionally important in shaping a population. It has been found repeatedly in field studies that subpopulations of wind-dispersed fungal pathogens virulent on cultivars with newly introduced, effective resistance genes are dominated by one or very few genotypes. The role of propagule dispersal distributions with distinct behaviour at long distances in generating this characteristic population structure was studied by computer simulation of dispersal of clonal organisms in a heterogeneous environment with fields of unselective and selective hosts. Power-law distributions generated founder events in which new, virulent genotypes rapidly colonized fields of resistant crop varieties and subsequently dominated the pathogen population on both selective and unselective varieties, in agreement with data on rust and powdery mildew fungi. An exponential dispersal function, with extremely rare dispersal over long distances, resulted in slower colonization of resistant varieties by virulent pathogens or even no colonization if the distance between susceptible source and resistant target fields was sufficiently large. The founder events resulting from long-distance dispersal were highly stochastic and exact quantitative prediction of genotype frequencies will therefore always be difficult.
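Illustrative sketch (not the simulation used in the study): the contrast between the two kernel families can be seen by giving both the same median dispersal distance and comparing the probability of travelling much further. The Lomax (Pareto type II) form chosen for the power law, the shape value 1.5, and the 5 m median are assumptions for illustration only.

```python
import math


def exponential_survival(distance, median):
    """P(dispersal distance > distance) for an exponential kernel with the given median."""
    return math.exp(-math.log(2.0) * distance / median)


def power_law_survival(distance, median, shape):
    """P(dispersal distance > distance) for a Lomax (Pareto type II) kernel with the given median."""
    scale = median / (2.0 ** (1.0 / shape) - 1.0)
    return (1.0 + distance / scale) ** (-shape)


# Both kernels share a 5 m median; compare the chance of exceeding 1 km.
p_exp = exponential_survival(1000.0, median=5.0)
p_pow = power_law_survival(1000.0, median=5.0, shape=1.5)
```

Under these assumptions the exponential tail is many orders of magnitude thinner than the power-law tail at 1 km, which matches the qualitative picture above of rare but influential long-distance founder events.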
Abstract:
We present an approach for dealing with coarse-resolution Earth observations (EO) in terrestrial ecosystem data assimilation schemes. The use of coarse-scale observations in ecological data assimilation schemes is complicated by spatial heterogeneity and nonlinear processes in natural ecosystems. If these complications are not appropriately dealt with, then the data assimilation will produce biased results. The “disaggregation” approach that we describe in this paper combines frequent coarse-resolution observations with temporally sparse fine-resolution measurements. We demonstrate the approach using a demonstration data set based on measurements of an Arctic ecosystem. In this example, normalized difference vegetation index observations are assimilated into a “zero-order” model of leaf area index and carbon uptake. The disaggregation approach conserves key ecosystem characteristics regardless of the observation resolution and estimates the carbon uptake to within 1% of the demonstration data set “truth.” Assimilating the same data in the normal manner, but without the disaggregation approach, results in carbon uptake being underestimated by 58% at an observation resolution of 250 m. The disaggregation method allows the combination of multiresolution EO and improves in spatial resolution if observations are located on a grid that shifts from one observation time to the next. Additionally, the approach is not tied to a particular data assimilation scheme, model, or EO product and can cope with complex observation distributions, as it makes no implicit assumptions of normality.
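Illustrative sketch (not the disaggregation method itself): the bias mentioned above arises because a nonlinear model applied to a spatially averaged observation does not equal the average of the model applied to each fine-scale pixel. The exponential NDVI-to-LAI relationship, its coefficients, and the two-pixel scene below are hypothetical placeholders, not the paper's "zero-order" model.

```python
import math


def lai_from_ndvi(ndvi, a=0.1, b=4.0):
    """Hypothetical convex NDVI-to-LAI relationship (coefficients are placeholders)."""
    return a * math.exp(b * ndvi)


def fine_scale_mean(ndvi_pixels):
    """Apply the model to each fine pixel, then average the results."""
    return sum(lai_from_ndvi(v) for v in ndvi_pixels) / len(ndvi_pixels)


def coarse_scale_value(ndvi_pixels):
    """Average NDVI first (one coarse pixel), then apply the model once."""
    return lai_from_ndvi(sum(ndvi_pixels) / len(ndvi_pixels))


# A heterogeneous scene: one sparse and one densely vegetated fine pixel.
pixels = [0.2, 0.8]
fine = fine_scale_mean(pixels)
coarse = coarse_scale_value(pixels)
```

With a convex relationship such as this one, the coarse-scale value falls below the fine-scale mean, and the gap grows with the heterogeneity of the scene.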