949 results for Multivariate Lifetime Data


Relevance: 40.00%

Abstract:

Data visualization techniques are powerful tools for handling and analyzing multivariate systems. One such technique, known as parallel coordinates, was used to support the diagnosis of an event, detected by a neural network-based monitoring system, in a boiler at a Brazilian Kraft pulp mill. Its appeal lies in the ability to visualize several variables simultaneously. The diagnostic procedure was carried out step by step, moving through exploratory, explanatory, confirmatory, and communicative goals. The tool made the boiler dynamics easier to visualize than the commonly used univariate trend plots. In addition, it facilitated the analysis of other aspects, namely relationships among process variables, distinct modes of operation, and discrepant data. The analysis revealed, first, that the period involving the detected event was associated with a transition between two distinct normal modes of operation and, second, that unusual changes in process variables were present at this time.
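To illustrate the idea behind parallel coordinates, the following minimal sketch rescales each variable to a common [0, 1] range and turns every observation into a polyline across one vertical axis per variable. The variable names and readings are hypothetical and are not taken from the study; the function name `to_parallel_coordinates` is likewise an assumption made for this example.

```python
def to_parallel_coordinates(rows):
    """Scale each variable to [0, 1] and return one polyline per observation.

    Each polyline is a list of (axis_index, scaled_value) points, which is
    what a parallel-coordinates plot draws across its vertical axes.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    polylines = []
    for row in rows:
        line = []
        for axis, value in enumerate(row):
            span = highs[axis] - lows[axis]
            # A constant variable is drawn at mid-height of its axis.
            scaled = 0.5 if span == 0 else (value - lows[axis]) / span
            line.append((axis, scaled))
        polylines.append(line)
    return polylines


# Hypothetical boiler readings (illustrative only):
# steam flow, drum pressure, flue-gas oxygen.
rows = [
    (100.0, 60.0, 3.0),
    (120.0, 62.0, 2.5),
    (140.0, 64.0, 2.0),
]
polylines = to_parallel_coordinates(rows)
for line in polylines:
    print(line)
```

Observations that belong to the same mode of operation tend to trace similar polylines, which is why distinct modes and discrepant points can stand out more clearly than in separate univariate trend plots.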

Relevance: 40.00%

Abstract:

Multivariate analyses of UV-Vis spectral data from cachaca wood extracts provide a simple and robust model to classify aged Brazilian cachacas according to the wood species used in the maturation barrels. The model is based on inspection of 93 extracts of oak and different Brazilian wood species, prepared with a non-aged cachaca as the extraction solvent. Application of PCA (Principal Components Analysis) and HCA (Hierarchical Cluster Analysis) leads to the identification of 6 clusters of cachaca wood extracts (amburana, amendoim, balsamo, castanheira, jatoba, and oak). LDA (Linear Discriminant Analysis) affords classification of 10 different wood species used in the cachaca extracts (amburana, amendoim, balsamo, cabreuva-parda, canela-sassafras, castanheira, jatoba, jequitiba-rosa, louro-canela, and oak) with an accuracy ranging from 80% (amendoim and castanheira) to 100% (balsamo and jequitiba-rosa). The methodology provides a low-cost alternative to methods based on liquid chromatography and mass spectrometry for classifying cachacas aged in barrels made of different wood species.

Relevance: 40.00%

Abstract:

Visualization and exploratory analysis are an important part of any data analysis and become more challenging when the data are voluminous and high-dimensional. One such example is environmental monitoring data, which are often collected over time and at multiple locations, resulting in a geographically indexed multivariate time series. Financial data, although not necessarily containing a geographic component, present another source of high-volume multivariate time series data. We present the mvtsplot function, which provides a method for visualizing multivariate time series data. We outline the basic design concepts and provide some examples of its usage by applying it to a database of ambient air pollution measurements in the United States and to a hypothetical portfolio of stocks.

Relevance: 40.00%

Abstract:

An important problem in unsupervised data clustering is how to determine the number of clusters. Here we investigate how this can be achieved in an automated way by using interrelation matrices of multivariate time series. Two nonparametric and purely data-driven algorithms are described and compared. The first exploits the eigenvalue spectra of surrogate data, while the second employs the eigenvector components of the interrelation matrix. Compared to the first algorithm, the second approach is computationally faster and not limited to linear interrelation measures.

Relevance: 40.00%

Abstract:

Problems due to the lack of data standardization and data management have led to work inefficiencies for the staff working with the vision data for the Lifetime Surveillance of Astronaut Health. Data have been collected over 50 years in a variety of manners and then entered into a software system. The lack of communication between the electronic health record (EHR) form designer, epidemiologists, and optometrists has led to some level of confusion about the capability of the EHR system and how its forms can be designed to fit all the needs of the relevant parties. EHR form customizations or form redesigns were found to be critical for using NASA's EHR system in the most beneficial way for its patients, optometrists, and epidemiologists. In order to implement a protocol, the data being collected were examined to find the differences in data collection methods. Changes were implemented through the establishment of a process improvement team (PIT). Based on the findings of the PIT, suggestions have been made to improve the current EHR system. If the suggestions are implemented correctly, this will not only improve the efficiency of the staff at NASA and its contractors, but also set guidelines for changes in other forms such as the vision exam forms. Because NASA is at the forefront of such research and health surveillance, this management change could drastically improve data collection and the adaptability of the EHR. Accurate data collection from this 50+ year study is ongoing and will help current and future generations understand the implications of space flight on human health. It is imperative that the vast amount of information is documented correctly.

Relevance: 40.00%

Abstract:

Researchers in ecology commonly use multivariate analyses (e.g. redundancy analysis, canonical correspondence analysis, Mantel correlation, multivariate analysis of variance) to interpret patterns in biological data and relate these patterns to environmental predictors. There has been, however, little recognition of the errors associated with biological data and the influence that these may have on predictions derived from ecological hypotheses. We present a permutational method that assesses the effects of taxonomic uncertainty on the multivariate analyses typically used in the analysis of ecological data. The procedure is based on iterative randomizations that randomly re-assign unidentified species in each site to any of the other species found in the remaining sites. After each re-assignment of species identities, the multivariate method in question is run and a parameter of interest is calculated. Consequently, one can estimate a range of plausible values for the parameter of interest under different scenarios of re-assigned species identities. We demonstrate the use of our approach in the calculation of two parameters with an example involving tropical tree species from western Amazonia: 1) the Mantel correlation between compositional similarity and environmental distances between pairs of sites; and 2) the variance explained by environmental predictors in redundancy analysis (RDA). We also investigated the effects of increasing taxonomic uncertainty (i.e. the number of unidentified species) and of the taxonomic resolution at which morphospecies are determined (genus-resolution, family-resolution, or fully undetermined species) on the uncertainty range of these parameters. To achieve this, we performed simulations on a tree dataset from southern Mexico by randomly selecting a portion of the species contained in the dataset and classifying them as unidentified at each level of decreasing taxonomic resolution.
An analysis of covariance showed that both taxonomic uncertainty and resolution significantly influence the uncertainty range of the resulting parameters. Increasing taxonomic uncertainty widens the uncertainty range of the parameters estimated in both the Mantel test and RDA. The effects of increasing taxonomic resolution, however, are not as evident. The method presented in this study improves on traditional approaches to studying compositional change in ecological communities by accounting for some of the uncertainty inherent in biological data. We hope that this approach can be routinely used to estimate any parameter of interest obtained from compositional data tables when faced with taxonomic uncertainty.
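The randomization loop described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: unidentified records are marked with `None`, each one is re-assigned independently, and a simple Jaccard similarity between two hypothetical plots stands in for the Mantel correlation or RDA statistic used in the study. The function names and the site data are invented for the example.

```python
import random


def reassign(sites, rng):
    """Return a copy of `sites` in which every unidentified record (None) is
    replaced by a species drawn at random from the identified species found
    in the other sites."""
    resolved = {}
    for site, records in sites.items():
        pool = sorted({sp for other, recs in sites.items() if other != site
                       for sp in recs if sp is not None})
        resolved[site] = {sp if sp is not None else rng.choice(pool)
                          for sp in records}
    return resolved


def jaccard(a, b):
    """Stand-in statistic: shared species over total species of two sites."""
    return len(a & b) / len(a | b)


def parameter_range(sites, statistic, n_iter=500, seed=0):
    """Recompute `statistic` after each random re-assignment and report the
    range of values observed across all iterations."""
    rng = random.Random(seed)
    values = [statistic(reassign(sites, rng)) for _ in range(n_iter)]
    return min(values), max(values)


# Hypothetical data: plot_A contains one unidentified record.
sites = {
    "plot_A": ["sp1", "sp2", None],
    "plot_B": ["sp2", "sp3", "sp4"],
}
low, high = parameter_range(
    sites, lambda s: jaccard(s["plot_A"], s["plot_B"])
)
print(low, high)
```

The width of the interval between `low` and `high` plays the role of the uncertainty range discussed in the abstract: the more records are unidentified, the more the statistic can vary between re-assignments.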

Relevance: 40.00%

Abstract:

This paper gives three related results: (i) a new, simple, fast, monotonically converging algorithm for deriving the L1-median of a data cloud in ℝ^d, a problem that can be traced to Fermat and has fascinated applied mathematicians for over three centuries; (ii) a new general definition for depth functions, as functions of multivariate medians, so that different definitions of medians will, correspondingly, give rise to different depth functions; and (iii) a simple closed-form formula of the L1-depth function for a given data cloud in ℝ^d.
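For readers unfamiliar with the L1-median (the point minimizing the sum of Euclidean distances to the data), the sketch below uses the classical Weiszfeld iteration. It is not the new algorithm announced in the abstract, whose details are not given here; it only shows the quantity being computed. The handling of an iterate that lands exactly on a data point is a crude guard chosen for this example.

```python
import math


def l1_median(points, tol=1e-9, max_iter=1000):
    """Classical Weiszfeld iteration for the point minimizing the sum of
    Euclidean distances to `points` (tuples of equal length)."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    y = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(max_iter):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            d = math.dist(p, y)
            if d < 1e-12:
                # Crude guard (an assumption of this sketch): the plain
                # iteration is undefined when the iterate coincides with a
                # data point, so that point's term is skipped.
                continue
            w = 1.0 / d
            den += w
            for k in range(dim):
                num[k] += w * p[k]
        if den == 0.0:
            return tuple(y)
        new_y = [v / den for v in num]
        if math.dist(new_y, y) < tol:
            return tuple(new_y)
        y = new_y
    return tuple(y)


# Three clustered points and one distant outlier (illustrative data).
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
median = l1_median(cloud)
mean = tuple(sum(c) / len(cloud) for c in zip(*cloud))
print(median, mean)
```

For four points in convex position the L1-median is the intersection of the diagonals, here (0.5, 0.5), whereas the mean is pulled toward the outlier at (2.75, 2.75).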

Relevance: 40.00%

Abstract:

2000 Mathematics Subject Classification: 62H12, 62P99

Relevance: 40.00%

Abstract:

A compositional multivariate approach is used to analyse regional-scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey of Northern Ireland (GSNI). The multi-element total concentration data presented comprise XRF analyses of 6862 rural soil samples collected at 20 cm depths on a non-aligned grid at one site per 2 km². Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI, initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or the superficial deposits. To address this, the geochemical data were transformed using centered log ratios (clr) to observe the requirements of compositional data analysis and avoid closure issues. Following this, compositional multivariate techniques, including compositional Principal Component Analysis (PCA) and minimum/maximum autocorrelation factor (MAF) analysis, were used to determine the influence of underlying geology on the soil geochemistry signature. PCA showed that 72% of the variation was determined by the first four principal components (PCs), implying “significant” structure in the data. Analysis of variance showed that only 10 PCs were necessary to classify the soil geochemical data. To consider an improvement over PCA that uses the spatial relationships of the data, a classification based on MAF analysis was undertaken using the first 6 dominant factors.
Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons, and linear discriminant analysis (LDA) was undertaken. Prediction accuracy for LDA classification improved from 60.98% based on PCA using 10 principal components to 64.73% using MAF based on the 6 most dominant factors. The misclassification of peat may reflect degradation of peat-covered areas since the creation of the superficial deposit classification. Further work will examine the influence of underlying lithologies on elemental concentrations in peat composition and the effect of this on classification analysis.
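The centered log-ratio (clr) transform mentioned above takes the logarithm of each part of a composition divided by the geometric mean of all parts. The following minimal sketch shows the transform on a made-up three-part composition; the values are illustrative and are not Tellus data.

```python
import math


def clr(composition):
    """Centered log-ratio transform: log of each part over the geometric
    mean of all parts. Every part must be strictly positive."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical three-part composition expressed as proportions.
sample = [0.70, 0.20, 0.10]
transformed = clr(sample)

# The same sample expressed in different units (e.g. percent).
rescaled = clr([70.0, 20.0, 10.0])
print(transformed)
```

Two properties make the transform suitable for compositional data: the transformed values sum to zero, and the result does not depend on the units in which the parts are expressed. The requirement of strictly positive parts is also why censored values have to be imputed before the transform is applied.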

Relevance: 40.00%

Abstract:

Forecasting abrupt variations in wind power generation (the so-called ramps) helps achieve large-scale wind power integration. One of the main issues to be confronted when addressing wind power ramp forecasting is how to identify relevant information in large datasets in order to optimally feed forecasting models. To this end, an innovative methodology for systematically relating multivariate datasets to ramp events is presented. The methodology comprises two stages: the identification of relevant features in the data and the assessment of the dependence between these features and ramp occurrence. As a test case, the proposed methodology was employed to explore the relationships between atmospheric dynamics at the global/synoptic scales and ramp events experienced in two wind farms located in Spain. The results suggested different degrees of connection between these atmospheric scales and ramp occurrence. For one of the wind farms, it was found that ramp events could be partly explained by regional circulations and zonal pressure gradients. To perform a comprehensive analysis of the underlying causes of ramps, the proposed methodology could be applied to datasets related to other stages of the wind-to-power conversion chain.

Relevance: 40.00%

Abstract:

Min/max autocorrelation factor analysis (MAFA) and dynamic factor analysis (DFA) are complementary techniques for analysing short (> 15-25 y), non-stationary, multivariate data sets. We illustrate the two techniques using catch rate (cpue) time-series (1982-2001) for 17 species caught during trawl surveys off Mauritania, with the NAO index, an upwelling index, sea surface temperature, and an index of fishing effort as explanatory variables. Both techniques gave coherent results, the most important common trend being a decrease in cpue during the latter half of the time-series, and the next important being an increase during the first half. A DFA model with SST and UPW as explanatory variables and two common trends gave good fits to most of the cpue time-series. (c) 2004 International Council for the Exploration of the Sea. Published by Elsevier Ltd. All rights reserved.

Relevance: 40.00%

Abstract:

Taxonomic distinction to species level of deep-water sharks is complex and often impossible to achieve during fisheries-related studies. The species of the genus Etmopterus are particularly difficult to identify, so they often appear without species assignation as Etmopterus sp. or spp. in studies, even those focusing on elasmobranchs. During this work, the morphometric traits of two species of Etmopterus, E. spinax and E. pusillus, were studied using 27 different morphological measurements that are relatively easy to obtain even in the field. These measurements were processed with multivariate analysis in order to identify those most likely to separate the two species. Sexual dimorphism was also assessed using the same techniques, and it was found not to occur in these species. The two Etmopterus species presented in this study share the same habitats in the overlapping ranges of distribution and are caught together on the outer shelves and slopes of the north-eastern Atlantic.

Relevance: 40.00%

Abstract:

Clustering data streams is an important task in data mining research. Recently, some algorithms have been proposed to cluster data streams as a whole, but only a few of them deal with multivariate data streams. Even so, these algorithms merely aggregate the attributes without considering the correlation among them. To overcome this issue, we propose a new framework to cluster multivariate data streams based on their evolving behavior over time, exploring the correlations among their attributes by computing the fractal dimension. Experimental results with climate data streams show that the clusters' quality and compactness can be improved compared to the competing method, which suggests that correlations among attributes cannot be ignored. In fact, the clusters' compactness is 7 to 25 times better using our method. Our framework also proves to be a useful tool to assist meteorologists in understanding climate behavior over a period of time.