159 resultados para multivariate classification
Resumo:
Multivariate classification techniques have proven to be powerful tools for distinguishing experimental conditions in single sessions of functional magnetic resonance imaging (fMRI) data. But they are vulnerable to a considerable penalty in classification accuracy when applied across sessions or participants, calling into question the degree to which fine-grained encodings are shared across subjects. Here, we introduce joint learning techniques, where feature selection is carried out using a held-out subset of a target dataset, before training a linear classifier on a source dataset. Single trials of functional MRI data from a covert property generation task are classified with regularized regression techniques to predict the semantic class of stimuli. With our selection techniques (joint ranking feature selection (JRFS) and disjoint feature selection (DJFS)), classification performance during cross-session prediction improved greatly, relative to feature selection on the source session data only. Compared with JRFS, DJFS showed significant improvements for cross-participant classification. And when using a groupwise training, DJFS approached the accuracies seen for prediction across different sessions from the same participant. Comparing several feature selection strategies, we found that a simple univariate ANOVA selection technique or a minimal searchlight (one voxel in size) is appropriate, compared with larger searchlights.
Resumo:
Blind steganalysis of JPEG images is addressed by modeling the correlations among the DCT coefficients using K -variate (K = 2) p.d.f. estimates (p.d.f.s) constructed by means of Markov random field (MRF) cliques. The reasoning of using high variate p.d.f.s together with MRF cliques for image steganalysis is explained via a classical detection problem. Although our approach has many improvements over the current state-of-the-art, it suffers from the high dimensionality and the sparseness of the high variate p.d.f.s. The dimensionality problem as well as the sparseness problem are solved heuristically by means of dimensionality reduction and feature selection algorithms. The detection accuracy of the proposed method(s) is evaluated over Memon's (30.000 images) and Goljan's (1912 images) image sets. It is shown that practically applicable steganalysis systems are possible with a suitable dimensionality reduction technique and these systems can provide, in general, improved detection accuracy over the current state-of-the-art. Experimental results also justify this assertion.
Resumo:
Sediment particle size analysis (PSA) is routinely used to support benthic macrofaunal community distribution data in habitat mapping and Ecological Status (ES) assessment. No optimal PSA Method to explain variability in multivariate macrofaunal distribution has been identified nor have the effects of changing sampling strategy been examined. Here, we use benthic macrofaunal and PSA grabs from two embayments in the south of Ireland. Four frequently used PSA Methods and two common sampling strategies are applied. A combination of laser particle sizing and wet/dry sieving without peroxide pre-treatment to remove organics was identified as the optimal Method for explaining macrofaunal distributions. ES classifications and EUNIS sediment classification were robust to changes in PSA Method. Fauna and PSA samples returned from the same grab sample significantly decreased macrofaunal variance explained by PSA and caused ES to be classified as lower. Employing the optimal PSA Method and sampling strategy will improve benthic monitoring. © 2012 Elsevier Ltd.
Resumo:
A compositional multivariate approach is used to analyse regional scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey Northern Ireland (GSNI). The multi-element total concentration data presented comprise XRF analyses of 6862 rural soil samples collected at 20cm depths on a non-aligned grid at one site per 2 km2. Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic, up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or superficial deposits. To address this, the geochemical data were transformed using centered log ratios (clr) to observe the requirements of compositional data analysis and avoid closure issues. Following this, compositional multivariate techniques including compositional Principal Component Analysis (PCA) and minimum/maximum autocorrelation factor (MAF) analysis method were used to determine the influence of underlying geology on the soil geochemistry signature. PCA showed that 72% of the variation was determined by the first four principal components (PC’s) implying “significant” structure in the data. Analysis of variance showed that only 10 PC’s were necessary to classify the soil geochemical data. To consider an improvement over PCA that uses the spatial relationships of the data, a classification based on MAF analysis was undertaken using the first 6 dominant factors. Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons and linear discriminant analysis (LDA) undertaken. Prediction accuracy for LDA classification improved from 60.98% based on PCA using 10 principal components to 64.73% using MAF based on the 6 most dominant factors. The misclassification of peat may reflect degradation of peat covered areas since the creation of superficial deposit classification. Further work will examine the influence of underlying lithologies on elemental concentrations in peat composition and the effect of this in classification analysis.
Resumo:
This paper presents a statistical-based fault diagnosis scheme for application to internal combustion engines. The scheme relies on an identified model that describes the relationships between a set of recorded engine variables using principal component analysis (PCA). Since combustion cycles are complex in nature and produce nonlinear relationships between the recorded engine variables, the paper proposes the use of nonlinear PCA (NLPCA). The paper further justifies the use of NLPCA by comparing the model accuracy of the NLPCA model with that of a linear PCA model. A new nonlinear variable reconstruction algorithm and bivariate scatter plots are proposed for fault isolation, following the application of NLPCA. The proposed technique allows the diagnosis of different fault types under steady-state operating conditions. More precisely, nonlinear variable reconstruction can remove the fault signature from the recorded engine data, which allows the identification and isolation of the root cause of abnormal engine behaviour. The paper shows that this can lead to (i) an enhanced identification of potential root causes of abnormal events and (ii) the masking of faulty sensor readings. The effectiveness of the enhanced NLPCA based monitoring scheme is illustrated by its application to a sensor fault and a process fault. The sensor fault relates to a drift in the fuel flow reading, whilst the process fault relates to a partial blockage of the intercooler. These faults are introduced to a Volkswagen TDI 1.9 Litre diesel engine mounted on an experimental engine test bench facility.
Resumo:
This paper points out a serious flaw in dynamic multivariate statistical process control (MSPC). The principal component analysis of a linear time series model that is employed to capture auto- and cross-correlation in recorded data may produce a considerable number of variables to be analysed. To give a dynamic representation of the data (based on variable correlation) and circumvent the production of a large time-series structure, a linear state space model is used here instead. The paper demonstrates that incorporating a state space model, the number of variables to be analysed dynamically can be considerably reduced, compared to conventional dynamic MSPC techniques.
Resumo:
The work in this paper is of particular significance since it considers the problem of modelling cross- and auto-correlation in statistical process monitoring. The presence of both types of correlation can lead to fault insensitivity or false alarms, although in published literature to date, only autocorrelation has been broadly considered. The proposed method, which uses a Kalman innovation model, effectively removes both correlations. The paper (and Part 2 [2]) has emerged from work supported by EPSRC grant GR/S84354/01 and is of direct relevance to problems in several application areas including chemical, electrical, and mechanical process monitoring.
Resumo:
This paper builds on work presented in the first paper, Part 1 [1] and is of equal significance. The paper proposes a novel compensation method to preserve the integrity of step-fault signatures prevalent in various processes that can be masked during the removal of both auto- and cross correlation. Using industrial data, the paper demonstrates the benefit of the proposed method, which is applicable to chemical, electrical, and mechanical process monitoring. This paper, (and Part 1 [1]), has led to further work supported by EPSRC grant GR/S84354/01 involving kernel PCA methods.
Resumo:
Previous studies have revealed considerable interobserver and intraobserver variation in the histological classification of preinvasive cervical squamous lesions. The aim of the present study was to develop a decision support system (DSS) for the histological interpretation of these lesions. Knowledge and uncertainty were represented in the form of a Bayesian belief network that permitted the storage of diagnostic knowledge and, for a given case, the collection of evidence in a cumulative manner that provided a final probability for the possible diagnostic outcomes. The network comprised 8 diagnostic histological features (evidence nodes) that were each independently linked to the diagnosis (decision node) by a conditional probability matrix. Diagnostic outcomes comprised normal; koilocytosis; and cervical intraepithelial neoplasia (CIN) 1, CIN II, and CIN M. For each evidence feature, a set of images was recorded that represented the full spectrum of change for that feature. The system was designed to be interactive in that the histopathologist was prompted to enter evidence into the network via a specifically designed graphical user interface (i-Path Diagnostics, Belfast, Northern Ireland). Membership functions were used to derive the relative likelihoods for the alternative feature outcomes, the likelihood vector was entered into the network, and the updated diagnostic belief was computed for the diagnostic outcomes and displayed. A cumulative probability graph was generated throughout the diagnostic process and presented on screen. The network was tested on 50 cervical colposcopic biopsy specimens, comprising 10 cases each of normal, koilocytosis, CIN 1, CIN H, and CIN III. These had been preselected by a consultant gynecological pathologist. Using conventional morphological assessment, the cases were classified on 2 separate occasions by 2 consultant and 2 junior pathologists. The cases were also then classified using the DSS on 2 occasions by the 4 pathologists and by 2 medical students with no experience in cervical histology. Interobserver and intraobserver agreement using morphology and using the DSS was calculated with K statistics. Intraobserver reproducibility using conventional unaided diagnosis was reasonably good (kappa range, 0.688 to 0.861), but interobserver agreement was poor (kappa range, 0.347 to 0.747). Using the DSS improved overall reproducibility between individuals. Using the DSS, however, did not enhance the diagnostic performance of junior pathologists when comparing their DSS-based diagnosis against an experienced consultant. However, the generation of a cumulative probability graph also allowed a comparison of individual performance, how individual features were assessed in the same case, and how this contributed to diagnostic disagreement between individuals. Diagnostic features such as nuclear pleomorphism were shown to be particularly problematic and poorly reproducible. DSSs such as this therefore not only have a role to play in enhancing decision making but also in the study of diagnostic protocol, education, self-assessment, and quality control. (C) 2003 Elsevier Inc. All rights reserved.