30 resultados para multivariate data

em QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Unlabelled single- and double-stranded DNA (ssDNA and dsDNA, respectively) has been detected at concentrations =10-9?M by surface-enhanced Raman spectroscopy. Under appropriate conditions the sequences spontaneously adsorbed to the surface of both Ag and Au colloids through their nucleobases; this allowed highly reproducible spectra with good signal-to-noise ratios to be recorded on completely unmodified samples. This eliminated the need to promote absorption by introducing external linkers, such as thiols. The spectra of model ssDNA sequences contained bands of all the bases present and showed systematic changes when the overall base composition was altered. Initial tests also showed that small but reproducible changes could be detected between oligonucleotides with the same bases arranged in a different order. The spectra of five ssDNA sequences that correspond to different strains of the Escherichia coli bacterium were found to be sufficiently composition-dependent so that they could be differentiated without the need for any advanced multivariate data analysis techniques.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In the context of products from certain regions or countries being banned because of an identified or non-identified hazard, proof of geographical origin is essential with regard to feed and food safety issues. Usually, the product labeling of an affected feed lot shows origin, and the paper documentation shows traceability. Incorrect product labeling is common in embargo situations, however, and alternative analytical strategies for controlling feed authenticity are therefore needed. In this study, distillers' dried grains and solubles (DDGS) were chosen as the product on which to base a comparison of analytical strategies aimed at identifying the most appropriate one. Various analytical techniques were investigated for their ability to authenticate DDGS, including spectroscopic and spectrometric techniques combined with multivariate data analysis, as well as proven techniques for authenticating food, such as DNA analysis and stable isotope ratio analysis. An external validation procedure (called the system challenge) was used to analyze sample sets blind and to compare analytical techniques. All the techniques were adapted so as to be applicable to the DDGS matrix. They produced positive results in determining the botanical origin of DDGS (corn vs. wheat), and several of them were able to determine the geographical origin of the DDGS in the sample set. The maintenance and extension of the databanks generated in this study through the analysis of new authentic samples from a single location are essential in order to monitor developments and processing that could affect authentication.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Abstract Honey is a high value food commodity with recognized nutraceutical properties. A primary driver of the value of honey is its floral origin. The feasibility of applying multivariate data analysis to various chemical parameters for the discrimination of honeys was explored. This approach was applied to four authentic honeys with different floral origins (rata, kamahi, clover and manuka) obtained from producers in New Zealand. Results from elemental profiling, stable isotope analysis, metabolomics (UPLC-QToF MS), and NIR, FT-IR, and Raman spectroscopic fingerprinting were analyzed. Orthogonal partial least square discriminant analysis (OPLS-DA) was used to determine which technique or combination of techniques provided the best classification and prediction abilities. Good prediction values were achieved using metabolite data (for all four honeys, Q2 = 0.52; for manuka and clover, Q2 = 0.76) and the trace element/isotopic data (for manuka and clover, Q2 = 0.65), while the other chemical parameters showed promise when combined (for manuka and clover, Q2 = 0.43).

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Spectral signal intensities, especially in 'real-world' applications with nonstandardized sample presentation due to uncontrolled variables/factors, commonly require additional spectral processing to normalize signal intensity in an effective way. In this study, we have demonstrated the complexity of choosing a normalization routine in the presence of multiple spectrally distinct constituents by probing a dataset of Raman spectra. Variation in absolute signal intensity (90.1% of total variance) of the Raman spectra of these complex biological samples swamps the variation in useful signals (9.4% of total variance), degrading its diagnostic and evaluative potential.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This research aims to use the multivariate geochemical dataset, generated by the Tellus project, to investigate the appropriate use of transformation methods to maintain the integrity of geochemical data and inherent constrained behaviour in multivariate relationships. The widely used normal score transform is compared with the use of a stepwise conditional transform technique. The Tellus Project, managed by GSNI and funded by the Department of Enterprise Trade and Development and the EU’s Building Sustainable Prosperity Fund, involves the most comprehensive geological mapping project ever undertaken in Northern Ireland. Previous study has demonstrated spatial variability in the Tellus data but geostatistical analysis and interpretation of the datasets requires use of an appropriate methodology that reproduces the inherently complex multivariate relations. Previous investigation of the Tellus geochemical data has included use of Gaussian-based techniques. However, earth science variables are rarely Gaussian, hence transformation of data is integral to the approach. The multivariate geochemical dataset generated by the Tellus project provides an opportunity to investigate the appropriate use of transformation methods, as required for Gaussian-based geostatistical analysis. In particular, the stepwise conditional transform is investigated and developed for the geochemical datasets obtained as part of the Tellus project. The transform is applied to four variables in a bivariate nested fashion due to the limited availability of data. Simulation of these transformed variables is then carried out, along with a corresponding back transformation to original units. Results show that the stepwise transform is successful in reproducing both univariate statistics and the complex bivariate relations exhibited by the data. Greater fidelity to multivariate relationships will improve uncertainty models, which are required for consequent geological, environmental and economic inferences.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A compositional multivariate approach is used to analyse regional scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey Northern Ireland (GSNI). The multi-element total concentration data presented comprise XRF analyses of 6862 rural soil samples collected at 20cm depths on a non-aligned grid at one site per 2 km2. Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic, up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or superficial deposits. To address this, the geochemical data were transformed using centered log ratios (clr) to observe the requirements of compositional data analysis and avoid closure issues. Following this, compositional multivariate techniques including compositional Principal Component Analysis (PCA) and minimum/maximum autocorrelation factor (MAF) analysis method were used to determine the influence of underlying geology on the soil geochemistry signature. PCA showed that 72% of the variation was determined by the first four principal components (PC’s) implying “significant” structure in the data. Analysis of variance showed that only 10 PC’s were necessary to classify the soil geochemical data. To consider an improvement over PCA that uses the spatial relationships of the data, a classification based on MAF analysis was undertaken using the first 6 dominant factors. Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons and linear discriminant analysis (LDA) undertaken. Prediction accuracy for LDA classification improved from 60.98% based on PCA using 10 principal components to 64.73% using MAF based on the 6 most dominant factors. The misclassification of peat may reflect degradation of peat covered areas since the creation of superficial deposit classification. Further work will examine the influence of underlying lithologies on elemental concentrations in peat composition and the effect of this in classification analysis.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a statistical-based fault diagnosis scheme for application to internal combustion engines. The scheme relies on an identified model that describes the relationships between a set of recorded engine variables using principal component analysis (PCA). Since combustion cycles are complex in nature and produce nonlinear relationships between the recorded engine variables, the paper proposes the use of nonlinear PCA (NLPCA). The paper further justifies the use of NLPCA by comparing the model accuracy of the NLPCA model with that of a linear PCA model. A new nonlinear variable reconstruction algorithm and bivariate scatter plots are proposed for fault isolation, following the application of NLPCA. The proposed technique allows the diagnosis of different fault types under steady-state operating conditions. More precisely, nonlinear variable reconstruction can remove the fault signature from the recorded engine data, which allows the identification and isolation of the root cause of abnormal engine behaviour. The paper shows that this can lead to (i) an enhanced identification of potential root causes of abnormal events and (ii) the masking of faulty sensor readings. The effectiveness of the enhanced NLPCA based monitoring scheme is illustrated by its application to a sensor fault and a process fault. The sensor fault relates to a drift in the fuel flow reading, whilst the process fault relates to a partial blockage of the intercooler. These faults are introduced to a Volkswagen TDI 1.9 Litre diesel engine mounted on an experimental engine test bench facility.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper points out a serious flaw in dynamic multivariate statistical process control (MSPC). The principal component analysis of a linear time series model that is employed to capture auto- and cross-correlation in recorded data may produce a considerable number of variables to be analysed. To give a dynamic representation of the data (based on variable correlation) and circumvent the production of a large time-series structure, a linear state space model is used here instead. The paper demonstrates that incorporating a state space model, the number of variables to be analysed dynamically can be considerably reduced, compared to conventional dynamic MSPC techniques.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper builds on work presented in the first paper, Part 1 [1] and is of equal significance. The paper proposes a novel compensation method to preserve the integrity of step-fault signatures prevalent in various processes that can be masked during the removal of both auto- and cross correlation. Using industrial data, the paper demonstrates the benefit of the proposed method, which is applicable to chemical, electrical, and mechanical process monitoring. This paper, (and Part 1 [1]), has led to further work supported by EPSRC grant GR/S84354/01 involving kernel PCA methods.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Raman spectroscopy has been used to predict the abundance of the FA in clarified butterfat that was obtained from dairy cows fed a range of levels of rapeseed oil in their diet. Partial least squares regression of the Raman spectra against FA compositions obtained by GC showed good prediction for the five major (abundance >5%) FA with R-2=0.74-0.92 and a root mean SE of prediction (RMSEP) that was 5-7% of the mean. In general, the prediction accuracy fell with decreasing abundance in the sample, but the RMSEP was 1.25%. The Raman method has the best prediction ability for unsaturated FA (R-2=0.85-0.92), and in particular trans unsaturated FA (best-predicted FA was 18:1 tDelta9). This enhancement was attributed to the isolation of the unsaturated modes from the saturated modes and the significantly higher spectral response of unsaturated bonds compared with saturated bonds. Raman spectra of the melted butter samples could also be used to predict bulk parameters calculated from standard analyzes, such as iodine value (R-2=0.80) and solid fat content at low temperature (R-2=0.87). For solid fat contents determined at higher temperatures, the prediction ability was significantly reduced (R-2=0.42), and this decrease in performance was attributed to the smaller range of values in solid fat content at the higher temperatures. Finally, although the prediction errors for the abundances of each of the FA in a given sample are much larger with Raman than with full GC analysis, the accuracy is acceptably high for quality control applications. This, combined with the fact that Raman spectra can be obtained with no sample preparation and with 60-s data collection times, means that high-throughput, on-line Raman analysis of butter samples should be possible.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Treasure et al. (2004) recently proposed a new sub space-monitoring technique, based on the N4SID algorithm, within the multivariate statistical process control framework. This dynamic-monitoring method requires considerably fewer variables to be analysed when compared with dynamic principal component analysis (PCA). The contribution charts and variable reconstruction, traditionally employed for static PCA, are analysed in a dynamic context. The contribution charts and variable reconstruction may be affected by the ratio of the number of retained components to the total number of analysed variables. Particular problems arise if this ratio is large and a new reconstruction chart is introduced to overcome these. The utility of such a dynamic contribution chart and variable reconstruction is shown in a simulation and by application to industrial data from a distillation unit.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Subspace monitoring has recently been proposed as a condition monitoring tool that requires considerably fewer variables to be analysed compared to dynamic principal component analysis (PCA). This paper analyses subspace monitoring in identifying and isolating fault conditions, which reveals that the existing work suffers from inherent limitations if complex fault senarios arise. Based on the assumption that the fault signature is deterministic while the monitored variables are stochastic, the paper introduces a regression-based reconstruction technique to overcome these limitations. The utility of the proposed fault identification and isolation method is shown using a simulation example and the analysis of experimental data from an industrial reactive distillation unit.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Motivation: Recently, many univariate and several multivariate approaches have been suggested for testing differential expression of gene sets between different phenotypes. However, despite a wealth of literature studying their performance on simulated and real biological data, still there is a need to quantify their relative performance when they are testing different null hypotheses.

Results: In this article, we compare the performance of univariate and multivariate tests on both simulated and biological data. In the simulation study we demonstrate that high correlations equally affect the power of both, univariate as well as multivariate tests. In addition, for most of them the power is similarly affected by the dimensionality of the gene set and by the percentage of genes in the set, for which expression is changing between two phenotypes. The application of different test statistics to biological data reveals that three statistics (sum of squared t-tests, Hotelling's T2, N-statistic), testing different null hypotheses, find some common but also some complementing differentially expressed gene sets under specific settings. This demonstrates that due to complementing null hypotheses each test projects on different aspects of the data and for the analysis of biological data it is beneficial to use all three tests simultaneously instead of focusing exclusively on just one.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Although it is a known predictor of mortality, there is a relative lack of recent information about anaemia in kidney transplant recipients. Thus, we now report data about the prevalence and management of post-transplant anaemia (PTA) in Europe 5 years after the TRansplant European Survey on Anemia Management (TRESAM) study. Methods: In a cross-sectional study enrolling the largest number of patients to date, data were obtained from 5,834 patients followed at 10 outpatient transplant clinics in four European countries using the American Society of Transplantation anaemia guideline. Results: More than one third (42%) of the patients were anaemic. The haemoglobin (Hb) concentration was significantly correlated with the estimated glomerular filtration rate (eGFR) (r = 0.4, p < 0.001). In multivariate analysis, eGFR, serum ferritin, age, gender, time since transplantation and centres were independently and significantly associated with Hb. Only 24% of the patients who had a Hb concentration <110 g/l were treated with an erythropoiesis-stimulating agent. The prevalence of anaemia and also the use of erythropoiesis-stimulating agents were significantly different across the different centres, suggesting substantial practice variations. Conclusions: PTA is still common and under-treated. The prevalence and management of PTA have not changed substantially since the TRESAM survey.