Biblioteca Digital

This paper presents a feature selection method for data classification that combines a model-based variable selection technique with a fast two-stage subset selection algorithm. The relationship between a specified (and complete) set of candidate features and the class label is modelled using a non-linear full regression model that is linear in the parameters. The performance of a sub-model, measured by the sum of squared errors (SSE), is used to score the informativeness of the subset of features involved in that sub-model. The two-stage subset selection algorithm approaches a solution sub-model whose SSE is locally minimized. The features involved in the solution sub-model are selected as inputs to support vector machines (SVMs) for classification. The memory requirement of this algorithm is independent of the number of training patterns, which makes the method suitable for applications running on mobile devices where physical RAM is very limited. An application was developed for activity recognition that implements the proposed feature selection algorithm and an SVM training procedure. Experiments are carried out with the application running on a PDA for human activity recognition using accelerometer data. A comparison with an information-gain-based feature selection method demonstrates the effectiveness and efficiency of the proposed algorithm.
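
The idea of scoring a feature subset by the SSE of a linear-in-the-parameters sub-model can be illustrated with a plain greedy forward selection. The sketch below is not the paper's two-stage algorithm: it shows only a forward stage, implemented with Gram-Schmidt orthogonalisation, and the function names `dot` and `forward_select_sse` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def forward_select_sse(columns, y, k):
    """Greedily pick up to k column indices that most reduce the SSE of a
    least-squares fit of y. Illustrative forward stage only (assumption:
    no intercept term and no second refinement stage)."""
    residual = list(y)
    # Working copies of the candidates, orthogonalised against chosen columns.
    work = {j: list(col) for j, col in enumerate(columns)}
    selected = []
    for _ in range(k):
        best_j, best_gain = None, 0.0
        for j, q in work.items():
            qq = dot(q, q)
            if qq < 1e-12:  # candidate is (numerically) redundant
                continue
            gain = dot(residual, q) ** 2 / qq  # SSE reduction if j is added
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break
        q = work.pop(best_j)
        qq = dot(q, q)
        coef = dot(residual, q) / qq
        residual = [r - coef * a for r, a in zip(residual, q)]
        # Remove the chosen direction from every remaining candidate.
        for j in list(work):
            proj = dot(work[j], q) / qq
            work[j] = [a - proj * b for a, b in zip(work[j], q)]
        selected.append(best_j)
    return selected, dot(residual, residual)
```

Calling `forward_select_sse(columns, y, k)` with one list per candidate feature returns the chosen indices, in order of selection, together with the final SSE.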

Improved Nonlinear PCA for Process Monitoring Using Support Vector Data Description

Abstract:

Nonlinear principal component analysis (PCA) based on neural networks has drawn significant attention as a monitoring tool for complex nonlinear processes, but there remains a difficulty with determining the optimal network topology. This paper exploits the advantages of the Fast Recursive Algorithm, where the number of nodes, the location of centres, and the weights between the hidden layer and the output layer can be identified simultaneously for the radial basis function (RBF) networks. The topology problem for the nonlinear PCA based on neural networks can thus be solved. Another problem with nonlinear PCA is that the derived nonlinear scores may not be statistically independent or follow a simple parametric distribution. This hinders its applications in process monitoring since the simplicity of applying predetermined probability distribution functions is lost. This paper proposes the use of a support vector data description and shows that transforming the nonlinear principal components into a feature space allows a simple statistical inference. Results from both simulated and industrial data confirm the efficacy of the proposed method for solving nonlinear principal component problems, compared with linear PCA and kernel PCA.

Two-stage gene selection for support vector machine classification of microarray data

Fuzzy Chance Constrained Support Vector Machine

Improved training of an optimal sparse least squares support vector machine

A Fast Training Algorithm for Least-Squares Support Vector Machines

How Porosity and Permeability Vary Spatially With Grain Size, Sorting, Cement Volume, and Mineral Dissolution In Fluvial Triassic Sandstones: The Value of Geostatistics and Local Regression

Abstract:

Although it is well known that sandstone porosity and permeability are controlled by a range of parameters such as grain size and sorting, amount, type, and location of diagenetic cements, extent and type of compaction, and the generation of intergranular and intragranular secondary porosity, it is less well constrained how these controlling parameters link up in rock volumes (within and between beds) and how they spatially interact to determine porosity and permeability. To address these unknowns, this study examined Triassic fluvial sandstone outcrops from the UK using field logging, probe permeametry of 200 points, and sampling at 100 points on a gridded rock surface. These field observations were supplemented by laser particle-size analysis, thin-section point-count analysis of primary and diagenetic mineralogy, quantitative XRD mineral analysis, and SEM/EDAX analysis of all 100 samples. These data were analyzed using global regression, variography, kriging, conditional simulation, and geographically weighted regression to examine the spatial relationships between porosity and permeability and their potential controls. The results of bivariate analysis (global regression) of the entire outcrop dataset indicate only a weak correlation between both permeability and porosity and their diagenetic and depositional controls, and provide very limited information on the role of primary textural structures such as grain size and sorting. Subdividing the dataset further by bedding unit revealed details of more local controls on porosity and permeability. An alternative geostatistical approach combined with a local modelling technique (geographically weighted regression; GWR) was subsequently used to examine the spatial variability of porosity and permeability and their controls. The use of GWR does not require prior knowledge of divisions between bedding units, but the results from GWR broadly concur with the results of regression analysis by bedding unit and provide much greater clarity on how porosity and permeability and their controls vary laterally and vertically. The close relationship between depositional lithofacies in each bed, diagenesis, permeability, and porosity demonstrates that each influences the others and, in turn, how understanding of reservoir properties is enhanced by integration of paleoenvironmental reconstruction, stratigraphy, mineralogy, and geostatistics.

Increasing the accuracy of Nitrogen Dioxide (NO2) pollution mapping using geographically weighted regression (GWR) and geostatistics.

Abstract:

Nitrogen Dioxide (NO2) is known to act as an environmental trigger for many respiratory illnesses. As a pollutant it is difficult to map accurately, as concentrations can vary greatly over small distances. In this study three geostatistical techniques were compared for producing maps of NO2 concentrations in the United Kingdom (UK). The primary data source for each technique was NO2 point data, generated from background automatic monitoring and background diffusion tubes, which are analysed by different laboratories on behalf of local councils and authorities in the UK. The techniques used were simple kriging (SK), ordinary kriging (OK) and simple kriging with a locally varying mean (SKlm). SK and OK make use of the primary variable only. SKlm differs in that it utilises additional data to inform prediction, and hence potentially reduces uncertainty. The secondary data source was Oxides of Nitrogen (NOx) derived from dispersion modelling outputs, at 1 km x 1 km resolution for the UK. These data were used to define the locally varying mean in SKlm, using two regression approaches: (i) global regression (GR) and (ii) geographically weighted regression (GWR). Based upon summary statistics and cross-validation prediction errors, SKlm using GWR-derived local means produced the most accurate predictions. Therefore, using GWR to inform SKlm was beneficial in this study.
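
As a rough illustration of how a GWR-derived local mean might be computed, the sketch below fits a distance-weighted simple linear regression of NO2 on NOx around a target location. It is only a sketch under stated assumptions: the Gaussian kernel, the fixed `bandwidth`, and the function name `gwr_local_mean` are not taken from the study, and the subsequent step of kriging the residuals (the SK part of SKlm) is omitted.

```python
import math


def gwr_local_mean(samples, target, bandwidth):
    """Estimate the local mean of the response at a target location.

    samples: list of (x, y, nox, no2) tuples at monitored sites (hypothetical layout).
    target:  (x, y, nox) at the prediction location.
    bandwidth: Gaussian kernel bandwidth in the same units as x and y (assumed fixed).
    """
    tx, ty, t_nox = target
    sw = swp = swr = swpp = swpr = 0.0
    for x, y, nox, no2 in samples:
        d2 = (x - tx) ** 2 + (y - ty) ** 2
        w = math.exp(-0.5 * d2 / bandwidth ** 2)  # nearer sites weigh more
        sw += w
        swp += w * nox
        swr += w * no2
        swpp += w * nox * nox
        swpr += w * nox * no2
    denom = sw * swpp - swp ** 2
    if abs(denom) < 1e-12:
        # Predictor has no local spread: fall back to the weighted mean.
        return swr / sw
    # Closed-form weighted least squares for intercept and slope.
    slope = (sw * swpr - swp * swr) / denom
    intercept = (swr - slope * swp) / sw
    return intercept + slope * t_nox
```

A global regression corresponds to the limit of a very large bandwidth, where every site receives nearly the same weight.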

126 results for Vector Auto Regression