45 results for multivariate
Abstract:
Animal communities are sensitive to environmental disturbance, and several multivariate methods have recently been developed to detect changes in community structure. The complex taxonomy of soil invertebrates limits the use of community-level data for monitoring environmental change, since species identification requires expertise and time. However, recent literature on marine communities indicates that little multivariate information is lost when species data are aggregated to higher-rank taxa. In the present paper, this hypothesis was tested on two oribatid mite (Oribatida, Acari) assemblages under two different kinds of disturbance: metal pollution and fire. Results indicate that data sets built at genus and family rank can detect the effects of disturbance with little loss of information. This is an encouraging result for the use of the community level as a preliminary tool for describing patterns in human-disturbed soil ecosystems. © 2006 Elsevier SAS. All rights reserved.
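A minimal sketch of how such an aggregation test could be run, assuming a sites × species abundance matrix, Bray-Curtis dissimilarity, and a Spearman rank correlation between the species-level and genus-level dissimilarity matrices as the measure of retained information. The abstract does not say which statistics the authors used, and the genus assignments and counts below are hypothetical.

```python
import numpy as np

def aggregate(abund, taxon_of):
    """Sum the species columns of `abund` that share a higher-rank taxon label."""
    abund = np.asarray(abund, dtype=float)
    taxa = sorted(set(taxon_of))
    columns = [
        abund[:, [i for i, t in enumerate(taxon_of) if t == taxon]].sum(axis=1)
        for taxon in taxa
    ]
    return np.column_stack(columns)

def bray_curtis(abund):
    """Pairwise Bray-Curtis dissimilarity between sites (rows)."""
    n = abund.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            total = (abund[i] + abund[j]).sum()
            d[i, j] = d[j, i] = np.abs(abund[i] - abund[j]).sum() / total if total > 0 else 0.0
    return d

def average_ranks(v):
    """Ranks in which tied values share their average rank (as in Spearman's rho)."""
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v))
    sorted_v = v[order]
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and sorted_v[j + 1] == sorted_v[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0
        i = j + 1
    return ranks

def matrix_rank_correlation(d1, d2):
    """Spearman correlation between the upper triangles of two dissimilarity matrices."""
    iu = np.triu_indices_from(d1, k=1)
    return np.corrcoef(average_ranks(d1[iu]), average_ranks(d2[iu]))[0, 1]

# Hypothetical example: 6 sites, 5 species grouped into 3 genera.
rng = np.random.default_rng(42)
species = rng.poisson(lam=5.0, size=(6, 5))
genus_of = ["Oppiella", "Oppiella", "Tectocepheus", "Tectocepheus", "Suctobelbella"]
genera = aggregate(species, genus_of)
rho = matrix_rank_correlation(bray_curtis(species), bray_curtis(genera))
print(f"Spearman rho, species vs genus level: {rho:.2f}")
```

A correlation close to 1 means the genus-level matrix orders site pairs much as the species-level matrix does, i.e. little multivariate information is lost by aggregation.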
Abstract:
This research uses the multivariate geochemical dataset generated by the Tellus Project to investigate transformation methods that maintain the integrity of geochemical data and the constrained behaviour inherent in their multivariate relationships. The widely used normal score transform is compared with a stepwise conditional transform. The Tellus Project, managed by the Geological Survey of Northern Ireland (GSNI) and funded by the Department of Enterprise Trade and Development and the EU’s Building Sustainable Prosperity Fund, is the most comprehensive geological mapping project ever undertaken in Northern Ireland. A previous study demonstrated spatial variability in the Tellus data, but geostatistical analysis and interpretation of the datasets require a methodology that reproduces their inherently complex multivariate relations. Previous investigations of the Tellus geochemical data have included Gaussian-based techniques. However, earth science variables are rarely Gaussian, so data transformation is integral to any Gaussian-based geostatistical analysis. In particular, the stepwise conditional transform is investigated and developed for the Tellus geochemical datasets. Because of limited data availability, the transform is applied to four variables in a bivariate nested fashion. The transformed variables are then simulated and back-transformed to original units. Results show that the stepwise transform successfully reproduces both the univariate statistics and the complex bivariate relations exhibited by the data.
Greater fidelity to multivariate relationships will improve uncertainty models, which are required for subsequent geological, environmental and economic inferences.
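The two transforms compared above can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's implementation: the normal score uses plotting positions (rank + 0.5)/n and assumes no ties, the stepwise conditional transform is shown for two variables only with conditioning classes taken as equal-frequency bins of the first normal score, and the data are synthetic.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def normal_score(x):
    """Rank-based normal score transform (assumes continuous data without ties)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x))
    p = (ranks + 0.5) / x.size
    return np.array([_STD_NORMAL.inv_cdf(pi) for pi in p])

def back_transform(y, x_ref):
    """Map Gaussian values back to original units by interpolating the reference table."""
    x_ref = np.asarray(x_ref, dtype=float)
    y_ref = normal_score(x_ref)
    order = np.argsort(y_ref)
    return np.interp(y, y_ref[order], x_ref[order])

def stepwise_conditional(x1, x2, n_classes=10):
    """Bivariate stepwise conditional transform: x2 is normal-scored within classes of y1."""
    y1 = normal_score(x1)
    edges = np.quantile(y1, np.linspace(0.0, 1.0, n_classes + 1)[1:-1])
    classes = np.digitize(y1, edges)
    x2 = np.asarray(x2, dtype=float)
    y2 = np.empty_like(y1)
    for c in np.unique(classes):
        mask = classes == c
        y2[mask] = normal_score(x2[mask])
    return y1, y2, classes

# Synthetic, strongly dependent, skewed pair of variables.
rng = np.random.default_rng(0)
x1 = rng.lognormal(size=2000)
x2 = x1 + rng.normal(scale=0.5, size=2000)
y1, y2, classes = stepwise_conditional(x1, x2)
print("corr before:", np.corrcoef(x1, x2)[0, 1].round(3),
      "corr after:", np.corrcoef(y1, y2)[0, 1].round(3))
```

Back-transforming the second variable would use the class-specific tables from the same bins. Note that `np.interp` clamps values beyond the reference range, so a tail extrapolation model would be needed in practice.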
Abstract:
Blind steganalysis of JPEG images is addressed by modeling the correlations among the DCT coefficients using K-variate (K ≥ 2) probability density function estimates (p.d.f.s) constructed by means of Markov random field (MRF) cliques. The rationale for using high-variate p.d.f.s together with MRF cliques for image steganalysis is explained via a classical detection problem. Although our approach offers many improvements over the current state of the art, it suffers from the high dimensionality and the sparseness of high-variate p.d.f.s. Both the dimensionality and the sparseness problems are solved heuristically by means of dimensionality reduction and feature selection algorithms. The detection accuracy of the proposed method(s) is evaluated on Memon's (30,000 images) and Goljan's (1,912 images) image sets. It is shown that practically applicable steganalysis systems are possible with a suitable dimensionality reduction technique, and that these systems can, in general, provide improved detection accuracy over the current state of the art. Experimental results support this assertion.
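To make the dimensionality and sparseness problems concrete, the sketch below builds a normalised K-variate histogram from horizontal cliques of K adjacent coefficients clipped to [-T, T]. Treating horizontal neighbours as the MRF cliques, the clipping threshold T, and the Laplace-distributed stand-in for quantized DCT coefficients are all assumptions; the paper's actual clique definitions and estimator may differ.

```python
import numpy as np

def clique_pdf(coeffs, K=2, T=3):
    """Normalised K-variate histogram of horizontal cliques of K adjacent coefficients.

    Values are clipped to [-T, T], so the histogram has (2T + 1)**K bins.
    """
    c = np.clip(np.asarray(coeffs, dtype=int), -T, T) + T  # shift values to 0..2T
    cols = c.shape[1]
    # k-th member of every clique: columns k, k+1, ..., cols-K+k of each row.
    members = [c[:, k:cols - K + 1 + k].ravel() for k in range(K)]
    hist = np.zeros((2 * T + 1,) * K)
    np.add.at(hist, tuple(members), 1.0)   # count each clique once
    return hist / hist.sum()

# Hypothetical stand-in for a plane of quantized DCT coefficients.
rng = np.random.default_rng(0)
dct_plane = np.round(rng.laplace(scale=1.5, size=(64, 64))).astype(int)
for K in (2, 3):
    pdf = clique_pdf(dct_plane, K=K, T=3)
    print(f"K={K}: {pdf.size} bins, {np.count_nonzero(pdf)} occupied")
```

The number of bins grows as (2T + 1)^K while the number of cliques is fixed by the image size, which is why higher-variate histograms become sparse. Vertical cliques can be obtained by passing the transposed coefficient plane.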
Abstract:
Principal component analysis (PCA) and partial least squares (PLS) are introduced as multivariate statistical methods for modelling process plants. The advantages and limitations of PCA and PLS are discussed with respect to the types of data and problems likely to be encountered in this application area. These concepts are illustrated by two case studies: the first uses data from a continuous stirred tank reactor (CSTR) simulation, and the second a low-density polyethylene (LDPE) reactor simulation described in the literature.
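A compact NumPy sketch of the two methods: PCA via the singular value decomposition of autoscaled data, and single-response PLS (PLS1) via the NIPALS algorithm. The process data are synthetic and the function names are illustrative. With as many PLS components as process variables and a noise-free response, the PLS coefficients coincide with ordinary least squares, which gives a convenient sanity check.

```python
import numpy as np

def pca(X, n_comp):
    """PCA of autoscaled data via SVD; returns scores, loadings and explained-variance ratios."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return U[:, :n_comp] * s[:n_comp], Vt[:n_comp].T, explained

def pls1_nipals(X, y, n_comp):
    """PLS1 (single response) by NIPALS; returns coefficients for mean-centred X."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # score vector
        tt = t @ t
        p = Xk.T @ t / tt               # X loading
        q = yk @ t / tt                 # y loading
        Xk = Xk - np.outer(t, p)        # deflate X and y before the next component
        yk = yk - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return W @ np.linalg.solve(P.T @ W, Q)

# Hypothetical process data: 50 samples of 3 correlated process variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) @ np.array([[1.0, 0.8, 0.2], [0.0, 0.6, 0.5], [0.0, 0.0, 0.3]])
beta = np.array([1.5, -0.7, 2.0])
y = X @ beta + 4.0
scores, loadings, explained = pca(X, n_comp=2)
B = pls1_nipals(X, y, n_comp=3)
print(explained.round(3), B.round(3))
```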
Abstract:
Motivation: To date, Gene Set Analysis (GSA) approaches have focused primarily on identifying differentially expressed gene sets (pathways). Methods for identifying differentially coexpressed pathways also exist, but most are based on aggregated pairwise correlations or other pairwise measures of coexpression. Instead, we propose Gene Sets Net Correlations Analysis (GSNCA), a multivariate differential coexpression test that accounts for the complete correlation structure between genes.
Results: In GSNCA, weight factors are assigned to genes in proportion to the genes' cross-correlations (intergene correlations). The problem of finding the weight vectors is formulated as an eigenvector problem with a unique solution. GSNCA tests the null hypothesis that, for a gene set, the weight vectors of the genes do not differ between two conditions. In simulation studies and analyses of experimental data, we demonstrate that GSNCA indeed captures changes in the structure of genes' cross-correlations rather than differences in the averaged pairwise correlations. Thus, GSNCA infers differences in coexpression networks while bypassing the method-dependent steps of network inference. As an additional result of GSNCA, we define hub genes as the genes with the largest weights and show that these genes frequently correspond to major and specific pathway regulators, as well as to the genes most affected by the biological difference between the two conditions. In summary, GSNCA is a new approach for the analysis of differentially coexpressed pathways that also evaluates the importance of genes in the pathways, thus providing unique information that may lead to novel biological hypotheses.
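A minimal sketch of the GSNCA idea, under assumptions that go beyond the abstract: the weights are taken as the leading eigenvector of the absolute inter-gene correlation matrix with its diagonal set to zero, rescaled to sum to the number of genes; the test statistic is the L1 distance between the two conditions' weight vectors; and significance comes from permuting sample labels. The function names are hypothetical, and the published method should be consulted for its exact definitions.

```python
import numpy as np

def gsnca_weights(X):
    """Gene weights from the leading eigenvector of |correlation| (X: samples x genes).

    Weights are rescaled to sum to the number of genes.
    """
    R = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(R, 0.0)
    _, vecs = np.linalg.eigh(R)              # eigenvalues in ascending order
    w = np.abs(vecs[:, -1])                  # leading eigenvector; entries share one sign
    return w * R.shape[0] / w.sum()

def gsnca_test(X1, X2, n_perm=200, seed=0):
    """L1 distance between weight vectors, with a sample-label permutation p-value."""
    stat = np.abs(gsnca_weights(X1) - gsnca_weights(X2)).sum()
    pooled = np.vstack([X1, X2])
    n1 = len(X1)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        a, b = pooled[idx[:n1]], pooled[idx[n1:]]
        exceed += int(np.abs(gsnca_weights(a) - gsnca_weights(b)).sum() >= stat)
    return stat, (exceed + 1) / (n_perm + 1)

# Simulated gene set: all 5 genes coexpressed in condition A, only genes 0-2 in B.
rng = np.random.default_rng(1)
n = 60
cond_a = rng.normal(size=(n, 1)) + 0.5 * rng.normal(size=(n, 5))
cond_b = np.hstack([rng.normal(size=(n, 1)) + 0.5 * rng.normal(size=(n, 3)),
                    rng.normal(size=(n, 2))])
stat, p_value = gsnca_test(cond_a, cond_b)
hub_b = int(np.argmax(gsnca_weights(cond_b)))   # hub gene = largest weight
print(f"statistic={stat:.2f}, p={p_value:.3f}, hub gene in B: {hub_b}")
```

For a non-negative, irreducible matrix the Perron-Frobenius theorem guarantees a unique positive leading eigenvector (up to scale), which is one way an eigenvector formulation can have a unique solution.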
Abstract:
We examined variability in hierarchical beta diversity across ecosystems, geographical gradients and organism groups using multivariate spatial mixed modeling analysis of two independent data sets. The larger data set comprised reported ratios of regional species richness (RSR) to local species richness (LSR), and the second consisted of RSR:LSR ratios derived from nested species-area relationships. There was a negative, albeit relatively weak, relationship between beta diversity and latitude. We found only relatively subtle differences in beta diversity among realms, although beta diversity was lower in marine systems than in terrestrial or freshwater realms. In the larger data set, beta diversity varied significantly with major organism characteristics such as body mass, trophic position and dispersal type. Organisms that disperse via seeds had the highest beta diversity, and passively dispersed organisms the lowest. Furthermore, autotrophs had lower beta diversity than organisms higher up the food web; omnivores and carnivores had consistently higher beta diversity. This is evidence that beta diversity is simultaneously controlled by extrinsic factors related to geography and environment, and by intrinsic factors related to organism characteristics.
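The RSR:LSR ratio is Whittaker's multiplicative beta diversity: regional richness divided by mean local richness. A small Python illustration on a hypothetical presence/absence matrix follows; the spatial mixed model used in the study is not reproduced here.

```python
import numpy as np

def whittaker_beta(presence):
    """Multiplicative beta diversity: regional richness / mean local richness.

    `presence` is a sites x species presence/absence matrix.
    """
    presence = np.asarray(presence, dtype=bool)
    regional = presence.any(axis=0).sum()     # RSR: species found at any site
    local = presence.sum(axis=1).mean()       # LSR: mean number of species per site
    return regional / local

sites = [[1, 1, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 1, 1]]
print(whittaker_beta(sites))  # 4 species regionally / 2 per site on average
```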
Abstract:
The statistical properties of the multivariate Gamma-Gamma (ΓΓ) distribution with arbitrary correlation have so far remained unknown. In this paper, we provide analytical expressions for the joint probability density function (PDF), cumulative distribution function (CDF) and moment generating function of the multivariate ΓΓ distribution with arbitrary correlation. Furthermore, we present novel approximate expressions for the PDF and CDF of the sum of ΓΓ random variables with arbitrary correlation. Based on this statistical analysis, we investigate the performance of radio frequency and optical wireless communication systems. Notably, the presented expressions include several previous results in the literature as special cases.
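As a numerical cross-check for analytical results of this kind, a unit-mean ΓΓ variate can be simulated as the product of two independent unit-mean gamma variates with shape parameters α and β. The sketch below covers only the independent special case (it does not generate the arbitrary correlation treated in the paper) and compares the sample variance with the known value 1/α + 1/β + 1/(αβ). The function names are illustrative.

```python
import numpy as np

def gamma_gamma(alpha, beta, size, rng):
    """Unit-mean Gamma-Gamma variates: product of two independent unit-mean gammas."""
    x = rng.gamma(shape=alpha, scale=1.0 / alpha, size=size)
    y = rng.gamma(shape=beta, scale=1.0 / beta, size=size)
    return x * y

def sum_cdf_mc(alpha, beta, n_terms, thresholds, n_samples=200_000, seed=0):
    """Monte Carlo CDF of a sum of n_terms *independent* Gamma-Gamma variates."""
    rng = np.random.default_rng(seed)
    s = gamma_gamma(alpha, beta, (n_samples, n_terms), rng).sum(axis=1)
    return np.array([(s <= t).mean() for t in thresholds])

alpha, beta = 4.0, 2.0
rng = np.random.default_rng(0)
samples = gamma_gamma(alpha, beta, 200_000, rng)
print("mean:", samples.mean(), "variance:", samples.var(),
      "theory:", 1 / alpha + 1 / beta + 1 / (alpha * beta))
print("CDF of 3-term sum:", sum_cdf_mc(alpha, beta, n_terms=3, thresholds=[1.0, 3.0, 6.0]))
```

An empirical CDF like this is what a closed-form approximation for the sum would be compared against.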
Abstract:
Slow-release drugs must be manufactured to meet target specifications for their dissolution curve profiles. In this paper we consider the problem of identifying the drivers of dissolution curve variability of a drug from historical manufacturing data. Several data sources are considered: raw material parameters, coating data, loss on drying and pellet size statistics. The methodology is to develop predictive models using LASSO, an L1-penalised regression method well suited to high-dimensional datasets. Because LASSO produces sparse solutions, it facilitates identification of the most important causes of variability in the drug fabrication process. The proposed methodology is illustrated using manufacturing data for a slow-release drug.
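LASSO minimises a squared-error loss plus an L1 penalty, which drives the coefficients of unimportant inputs exactly to zero. The sketch below solves it by cyclic coordinate descent on synthetic, standardised data; the penalty value and the data are assumptions, and in practice one would typically use a tested library implementation and choose the penalty by cross-validation.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam, returning exactly zero inside [-lam, lam]."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent.

    Minimises (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1; X and y should be centred.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()                 # residual y - X b
    col_sq = (X**2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                # remove feature j from the current fit
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]                # add it back with the updated coefficient
    return b

# Hypothetical manufacturing data: 10 candidate drivers, only the first two matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.normal(size=200)
y = y - y.mean()
b = lasso_cd(X, y, lam=0.2)
print("selected drivers:", np.flatnonzero(b), b.round(2))
```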
Abstract:
Biodegradable polymers such as PLA (polylactide) are derived from renewable resources such as corn starch and, if disposed of correctly, degrade and become harmless to the ecosystem, making them attractive alternatives to petroleum-based polymers. PLA in particular is used in a variety of applications, including medical devices, food packaging and waste disposal packaging. However, the industry faces challenges in melt processing PLA because of its poor thermal stability, which is influenced by processing temperature and shear.
Identifying and controlling suitable processing conditions is extremely challenging: it usually relies on trial and error and is often sensitive to batch-to-batch variation. Off-line assessment in a lab environment can result in high scrap rates, long lead times and lengthy, expensive process development. Scrap rates are typically in the region of 25-30% for medical-grade PLA, which costs between €2000 and €5000/kg.
Additives are used to enhance material properties such as mechanical performance, and may also have a therapeutic role in bioresorbable medical devices; for example, the release of calcium from orthopaedic implants such as fixation screws promotes healing. Additives can also reduce costs, since less polymer resin is required.
This study investigates the scope for monitoring, modelling and optimising processing conditions for twin-screw extrusion of PLA and PLA with calcium carbonate to achieve desired material properties. A data acquisition (DAQ) system has been constructed to gather data from a bespoke measurement die, comprising melt temperature, pressure drop along the length of the die, and UV-Vis spectral data, which are shown to correlate with filler dispersion. Trials were carried out under a range of processing conditions using a Design of Experiments approach, and samples were tested for mechanical properties, degradation rate and calcium release rate. Relationships between the recorded process data and the material characterisation results are explored.
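The Design of Experiments step can be illustrated with a two-level full factorial in coded units and its main-effect estimates. The factor names, the number of factors and the synthetic response are hypothetical; the abstract does not state which design or factors were used.

```python
import itertools
import numpy as np

# Hypothetical factor names; the abstract does not list the factors actually varied.
FACTORS = ("screw_speed", "barrel_temperature", "feed_rate")

def coded_full_factorial(n_factors):
    """All 2**n_factors runs of a two-level full factorial design, coded -1/+1."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=n_factors)))

def randomised_run_order(design, seed=0):
    """Shuffle the run order, as is usual practice to guard against drift."""
    return design[np.random.default_rng(seed).permutation(len(design))]

def main_effects(design, response):
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    response = np.asarray(response, dtype=float)
    return np.array([
        response[design[:, j] > 0].mean() - response[design[:, j] < 0].mean()
        for j in range(design.shape[1])
    ])

design = randomised_run_order(coded_full_factorial(len(FACTORS)))
# Synthetic response standing in for a measured property (e.g. a mechanical test result).
response = 10.0 + 2.0 * design[:, 0] - 1.0 * design[:, 1]
for name, effect in zip(FACTORS, main_effects(design, response)):
    print(f"{name}: {effect:+.2f}")
```

In coded units a main effect equals twice the corresponding linear-model coefficient, so the synthetic response above yields effects of about 4, -2 and 0.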
Abstract:
A compositional multivariate approach is used to analyse regional-scale soil geochemical data obtained as part of the Tellus Project, generated by the Geological Survey of Northern Ireland (GSNI). The multi-element total concentration data comprise XRF analyses of 6862 rural soil samples collected at 20 cm depth on a non-aligned grid at one site per 2 km². Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was initially assigned to the regional geology map provided by GSNI using the dominant lithology of the map polygon. Northern Ireland's diverse geology represents a stratigraphic record from the Mesoproterozoic up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till, post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or the superficial deposits. To address this, the geochemical data were transformed using centred log-ratios (clr) to satisfy the requirements of compositional data analysis and avoid closure issues. Compositional multivariate techniques, including compositional principal component analysis (PCA) and minimum/maximum autocorrelation factor (MAF) analysis, were then used to determine the influence of the underlying geology on the soil geochemical signature. PCA showed that 72% of the variation was explained by the first four principal components (PCs), implying “significant” structure in the data. Analysis of variance showed that only 10 PCs were necessary to classify the soil geochemical data. As an improvement over PCA that exploits the spatial relationships in the data, a classification based on MAF analysis was undertaken using the first 6 dominant factors.
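The clr transform and compositional PCA can be sketched as below, on hypothetical Dirichlet-generated compositions. The sketch assumes censored values have already been replaced by positive imputed values, since the log-ratio is undefined at zero; MAF analysis, which additionally needs the sample coordinates, is not shown.

```python
import numpy as np

def clr(X):
    """Centred log-ratio transform of strictly positive compositions (one per row)."""
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("clr requires strictly positive parts; impute censored values first")
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

def clr_pca(X):
    """PCA of clr-transformed data; returns scores and explained-variance ratios."""
    Z = clr(X)
    Z = Z - Z.mean(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U * s, s**2 / np.sum(s**2)

# Hypothetical compositions: 100 samples of 5 parts, expressed in mg/kg.
rng = np.random.default_rng(0)
comp = rng.dirichlet(np.ones(5), size=100) * 1e6
scores, explained = clr_pca(comp)
print("cumulative explained variance:", np.cumsum(explained).round(3))
```

Each clr row sums to zero, so clr data with D parts have rank at most D - 1 and the last component carries essentially no variance. The transform is also unchanged by rescaling a sample, which is how it sidesteps closure.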
Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons, and linear discriminant analysis (LDA) was undertaken. LDA classification accuracy improved from 60.98%, based on PCA using 10 principal components, to 64.73% using MAF based on the 6 most dominant factors. The misclassification of peat may reflect degradation of peat-covered areas since the superficial deposit classification was created. Further work will examine the influence of underlying lithologies on the elemental composition of peat and the effect of this on classification analysis.
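The LDA classification step can be sketched with a pooled within-class covariance, as below. The inputs stand in for PC or MAF factor scores and are synthetic. The accuracy printed is a resubstitution estimate, which is optimistic, and the abstract does not say how the reported accuracies were computed.

```python
import numpy as np

def lda_fit(X, labels):
    """Linear discriminant analysis with a pooled within-class covariance matrix."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    classes = np.unique(labels)
    means = np.array([X[labels == c].mean(axis=0) for c in classes])
    resid = X - means[np.searchsorted(classes, labels)]
    pooled_cov = resid.T @ resid / (len(X) - len(classes))
    priors = np.array([np.mean(labels == c) for c in classes])
    return classes, means, np.linalg.inv(pooled_cov), priors

def lda_predict(model, X):
    """Assign each row to the class with the largest linear discriminant score."""
    classes, means, inv_cov, priors = model
    # Score for class k: x' S^-1 m_k - m_k' S^-1 m_k / 2 + log(prior_k)
    scores = (np.asarray(X, dtype=float) @ inv_cov @ means.T
              - 0.5 * np.sum(means @ inv_cov * means, axis=1)
              + np.log(priors))
    return classes[np.argmax(scores, axis=1)]

# Hypothetical factor scores for two classes ("no peat" vs "peat").
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)), rng.normal(3.0, 1.0, size=(100, 2))])
labels = np.array(["no peat"] * 100 + ["peat"] * 100)
model = lda_fit(X, labels)
accuracy = np.mean(lda_predict(model, X) == labels)
print(f"resubstitution accuracy: {accuracy:.1%}")
```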