137 results for feature vector
Abstract:
Classification methods with embedded feature selection capability are very appealing for the analysis of complex processes, since they support root cause analysis even when the number of input variables is high. In this work, we investigate the performance of three classification techniques within a Monte Carlo strategy, with the aim of root cause analysis. We consider the naive Bayes classifier and the logistic regression model with two different implementations for controlling model complexity: a LASSO-like implementation with an L1-norm regularization, and a fully Bayesian implementation of the logistic model, the so-called relevance vector machine. Several challenges can arise when estimating such models, mainly linked to the characteristics of the data: a large number of input variables, high correlation among subsets of variables, more variables than available data points, and unbalanced datasets. Using an ecological and a semiconductor manufacturing dataset, we show the advantages and drawbacks of each method, highlighting the superior classification accuracy of the relevance vector machine with respect to the other classifiers. Moreover, we show how the combination of the proposed techniques and the Monte Carlo approach can yield more robust insights into the problem under analysis under challenging modelling conditions.
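As a minimal sketch of the embedded-selection idea, the LASSO-like classifier mentioned above can be illustrated with scikit-learn's L1-penalized logistic regression. The synthetic dataset below is purely illustrative (the abstract's ecological and semiconductor data are not available here), and the penalty strength `C=0.1` is an arbitrary assumption:

```python
# Embedded feature selection via L1-regularized logistic regression:
# the penalty drives uninformative coefficients to exactly zero, so
# the surviving features are the candidate root causes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Many inputs, few of them informative -- mimics the "large number of
# input variables" setting described in the abstract.
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=5, n_redundant=10,
                           random_state=0)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)

# Nonzero coefficients identify the selected variables.
selected = np.flatnonzero(clf.coef_[0])
print(len(selected), "features retained out of", X.shape[1])
```

Wrapping this fit in repeated resampling of the training data would give the Monte Carlo-style selection-frequency analysis the abstract alludes to.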
Abstract:
An orthogonal vector approach is proposed for the synthesis of multi-beam directional modulation (DM) transmitters. These systems have the capability of concurrently projecting independent data streams into different specified spatial directions while simultaneously distorting signal constellations in all other directions. Simulated bit error rate (BER) spatial distributions are presented for various multi-beam system configurations in order to illustrate representative examples of physical layer security performance enhancement that can be achieved.
Abstract:
Multivariate classification techniques have proven to be powerful tools for distinguishing experimental conditions in single sessions of functional magnetic resonance imaging (fMRI) data, but they suffer a considerable penalty in classification accuracy when applied across sessions or participants, calling into question the degree to which fine-grained encodings are shared across subjects. Here, we introduce joint learning techniques, where feature selection is carried out using a held-out subset of a target dataset before training a linear classifier on a source dataset. Single trials of functional MRI data from a covert property generation task are classified with regularized regression techniques to predict the semantic class of stimuli. With our selection techniques (joint ranking feature selection (JRFS) and disjoint feature selection (DJFS)), classification performance during cross-session prediction improved greatly relative to feature selection on the source session data only. Compared with JRFS, DJFS showed significant improvements for cross-participant classification, and when using groupwise training, DJFS approached the accuracies seen for prediction across different sessions from the same participant. Comparing several feature selection strategies, we found that a simple univariate ANOVA selection technique or a minimal searchlight (one voxel in size) is appropriate, compared with larger searchlights.
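The core pipeline the abstract describes — univariate ANOVA feature selection followed by a regularized linear classifier — can be sketched as follows. Synthetic data stands in for the fMRI voxel patterns, and the number of retained features (`k=50`) is an arbitrary assumption, not a value from the study:

```python
# Univariate ANOVA feature selection + regularized linear classifier,
# the basic selection-then-classify pipeline from the abstract.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split

# Many "voxels", few informative -- a stand-in for single-trial fMRI data.
X, y = make_classification(n_samples=300, n_features=1000,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The ANOVA F-test scores each feature independently; keep the top k.
# (Fitting the selector on held-out target data rather than the source
# set is the key move behind the JRFS/DJFS variants.)
selector = SelectKBest(f_classif, k=50).fit(X_tr, y_tr)

clf = RidgeClassifier().fit(selector.transform(X_tr), y_tr)
acc = clf.score(selector.transform(X_te), y_te)
print(f"held-out accuracy: {acc:.2f}")
```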
Abstract:
This paper presents the results of an investigation into the utility of remote sensing (RS) using meteorological satellite sensors and spatial interpolation (SI) of data from meteorological stations for the prediction of spatial variation in monthly climate across continental Africa in 1990. Information from the Advanced Very High Resolution Radiometer (AVHRR) of the National Oceanic and Atmospheric Administration's (NOAA) polar-orbiting meteorological satellites was used to estimate land surface temperature (LST) and atmospheric moisture. Cold cloud duration (CCD) data derived from the High Resolution Radiometer (HRR) onboard the European Meteorological Satellite programme's (EUMETSAT) Meteosat satellite series were also used as a RS proxy measurement of rainfall. Temperature, atmospheric moisture and rainfall surfaces were independently derived from SI of measurements from the World Meteorological Organization (WMO) member stations of Africa. These meteorological station data were then used to test the accuracy of each methodology, so that the appropriateness of the two techniques for epidemiological research could be compared. SI was a more accurate predictor of temperature, whereas RS provided a better surrogate for rainfall; both were equally accurate at predicting atmospheric moisture. The implications of these results for mapping short- and long-term climate change, and hence their potential for the study and control of disease vectors, are considered. Taking into account logistic and analytical problems, there were no clear conclusions regarding the optimality of either technique, but there was considerable potential for synergy.
Abstract:
Efficient identification and follow-up of astronomical transients is hindered by the need for humans to manually select promising candidates from data streams that contain many false positives. These artefacts arise in the difference images that are produced by most major ground-based time-domain surveys with large-format CCD cameras. This dependence on humans to reject bogus detections is unsustainable for next-generation all-sky surveys, and significant effort is now being invested to solve the problem computationally. In this paper, we explore a simple machine learning approach to real-bogus classification by constructing a training set from the image data of approximately 32 000 real astrophysical transients and bogus detections from the Pan-STARRS1 Medium Deep Survey. We derive our feature representation from the pixel intensity values of a 20 x 20 pixel stamp around the centre of the candidates. This differs from previous work in that it works directly on the pixels rather than relying on catalogued domain knowledge for feature design or selection. Three machine learning algorithms are trained (artificial neural networks, support vector machines and random forests) and their performances are tested on a held-out subset of 25 per cent of the training data. We find the best results from the random forest classifier and demonstrate that by accepting a false positive rate of 1 per cent, the classifier initially suggests a missed detection rate of around 10 per cent. However, we also find that a combination of bright star variability, nuclear transients and uncertainty in human labelling means that our best estimate of the missed detection rate is approximately 6 per cent.
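The evaluation protocol of the best-performing classifier — a random forest on raw pixel intensities, operated at a fixed 1 per cent false positive rate — can be sketched as below. Random arrays stand in for the Pan-STARRS1 stamps, and the class separation and forest size are arbitrary assumptions:

```python
# Real-bogus classification sketch: a random forest on flattened
# 20x20 pixel stamps (400 features), then the missed detection rate
# measured at a score threshold fixing a 1 per cent false positive rate.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 400))          # stand-in for flattened stamps
y = rng.integers(0, 2, size=n)         # 1 = real transient, 0 = bogus
X[y == 1] += 0.3                       # make the "real" class learnable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)

# Choose the score threshold so that only 1 per cent of bogus
# detections pass, then count the real transients that fall below it.
scores = clf.predict_proba(X_te)[:, 1]
thresh = np.quantile(scores[y_te == 0], 0.99)
mdr = np.mean(scores[y_te == 1] < thresh)
print(f"missed detection rate at 1% FPR: {mdr:.2f}")
```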