Biblioteca Digital

958 results for multivariate binary data

A method to incorporate the effect of taxonomic uncertainty on multivariate analyses of ecological data

Relevance:

40.00%

Publisher:

Abstract:

Researchers in ecology commonly use multivariate analyses (e.g. redundancy analysis, canonical correspondence analysis, Mantel correlation, multivariate analysis of variance) to interpret patterns in biological data and relate these patterns to environmental predictors. There has been, however, little recognition of the errors associated with biological data and the influence that these may have on predictions derived from ecological hypotheses. We present a permutational method that assesses the effects of taxonomic uncertainty on the multivariate analyses typically used in the analysis of ecological data. The procedure is based on iterative randomizations that randomly re-assign unidentified species in each site to any of the other species found in the remaining sites. After each re-assignment of species identities, the multivariate method in question is run and a parameter of interest is calculated. Consequently, one can estimate a range of plausible values for the parameter of interest under different scenarios of re-assigned species identities. We demonstrate the use of our approach in the calculation of two parameters with an example involving tropical tree species from western Amazonia: 1) the Mantel correlation between compositional similarity and environmental distances between pairs of sites, and 2) the variance explained by environmental predictors in redundancy analysis (RDA). We also investigated the effects of increasing taxonomic uncertainty (i.e. number of unidentified species), and the taxonomic resolution at which morphospecies are determined (genus-resolution, family-resolution, or fully undetermined species) on the uncertainty range of these parameters. To achieve this, we performed simulations on a tree dataset from southern Mexico by randomly selecting a portion of the species contained in the dataset and classifying them as unidentified at each level of decreasing taxonomic resolution.
An analysis of covariance showed that both taxonomic uncertainty and resolution significantly influence the uncertainty range of the resulting parameters. Increasing taxonomic uncertainty widens the uncertainty range of the parameters estimated both in the Mantel test and RDA. The effects of increasing taxonomic resolution, however, are not as evident. The method presented in this study improves on traditional approaches to studying compositional change in ecological communities by accounting for some of the uncertainty inherent to biological data. We hope that this approach can be routinely used to estimate any parameter of interest obtained from compositional data tables when faced with taxonomic uncertainty.
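
The randomization loop described in this abstract could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the `morpho_` prefix used to flag unidentified taxa, the function names, the toy site data, and the use of mean pairwise Jaccard similarity as the "parameter of interest" (standing in for the Mantel correlation or RDA explained variance) are all assumptions made for the example.

```python
import random
from itertools import combinations


def reassign_unidentified(sites, is_unidentified, rng):
    """Return a copy of `sites` in which every unidentified taxon of a site is
    re-labelled as a species drawn at random from the identified species
    recorded in the remaining sites."""
    new_sites = []
    for i, site in enumerate(sites):
        # Candidate identities: identified species seen in the other sites.
        pool = sorted({sp for j, other in enumerate(sites) if j != i
                       for sp in other if not is_unidentified(sp)})
        new_site = {}
        for sp, count in site.items():
            label = rng.choice(pool) if is_unidentified(sp) and pool else sp
            # Abundances are merged if the drawn identity is already present.
            new_site[label] = new_site.get(label, 0) + count
        new_sites.append(new_site)
    return new_sites


def mean_jaccard(sites):
    """Hypothetical stand-in statistic: mean pairwise Jaccard similarity."""
    sims = [len(set(a) & set(b)) / len(set(a) | set(b))
            for a, b in combinations(sites, 2)]
    return sum(sims) / len(sims)


def parameter_range(sites, statistic, n_iter=999, seed=0,
                    is_unidentified=lambda sp: sp.startswith("morpho_")):
    """Re-run `statistic` on many randomized datasets and report the
    minimum and maximum values obtained."""
    rng = random.Random(seed)
    values = [statistic(reassign_unidentified(sites, is_unidentified, rng))
              for _ in range(n_iter)]
    return min(values), max(values)


# Toy data: each site maps a taxon label to an abundance.
sites = [
    {"Inga alba": 4, "morpho_1": 2},
    {"Inga alba": 1, "Virola sebifera": 3},
    {"Virola sebifera": 2, "Iriartea deltoidea": 5},
]
low, high = parameter_range(sites, mean_jaccard, n_iter=200)
print(f"plausible range of the statistic: {low:.3f} to {high:.3f}")
```

Any function that maps a list of site tables to a number can be passed as `statistic`, which mirrors the abstract's point that the approach applies to any parameter derived from a compositional data table.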

The multivariate L1-median and associated data depth

Relevance:

40.00%

Publisher:

Abstract:

This paper gives three related results: (i) a new, simple, fast, monotonically converging algorithm for deriving the L1-median of a data cloud in ℝ^d, a problem that can be traced to Fermat and has fascinated applied mathematicians for over three centuries; (ii) a new general definition for depth functions, as functions of multivariate medians, so that different definitions of medians will, correspondingly, give rise to different depth functions; and (iii) a simple closed-form formula of the L1-depth function for a given data cloud in ℝ^d.
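
For orientation, the L1-median is the point that minimizes the sum of Euclidean distances to the data points. The sketch below uses the classical Weiszfeld fixed-point iteration, which is not the algorithm proposed in the paper; the function name, the tolerances, and the handling of an iterate that lands on a data point (its term is simply skipped) are simplifying assumptions.

```python
import math


def l1_median(points, tol=1e-9, max_iter=1000):
    """Approximate the L1-median (spatial median) of `points` using the
    classical Weiszfeld iteration, starting from the centroid."""
    dim = len(points[0])
    y = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(max_iter):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            d = math.dist(p, y)
            if d < 1e-12:
                # The plain Weiszfeld update is undefined when the iterate
                # coincides with a data point; skipping the term is a
                # simplification made for this sketch.
                continue
            w = 1.0 / d
            den += w
            for k in range(dim):
                num[k] += w * p[k]
        if den == 0.0:
            return y
        new_y = [v / den for v in num]
        if math.dist(new_y, y) < tol:
            return new_y
        y = new_y
    return y


# For a convex quadrilateral the L1-median is the intersection of the diagonals.
corners = [(0.0, 0.0), (4.0, 0.0), (5.0, 5.0), (0.0, 3.0)]
median = l1_median(corners)
print(median)
```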

Correlation of the liquid–liquid equilibrium data for specific ternary systems with one or two partially miscible binary subsystems

Relevance:

40.00%

Publisher:

Abstract:

This paper presents the results of a liquid–liquid equilibrium data correlation for 11 ternary systems which have not been previously fitted using the NRTL model or, when they have, the results presented in the literature are inconsistent with the experimental behavior of the system. These ternary systems include mixtures with one or two partially miscible pairs. During the correlation process, new restrictions were imposed on the values for the NRTL binary parameters to ensure correct prediction of the total or partial miscibility for the binary pairs involved. In addition, topological concepts related to the Gibbs stability test have been applied in order to validate the results in the whole range of compositions.

Comparison of Multivariate and Univariate Models for Genetic Evaluation of Milk Yield based on Test Day Data

Relevance:

40.00%

Publisher:

Abstract:

2000 Mathematics Subject Classification: 62H12, 62P99

Clustering in non-parametric multivariate analyses.

Relevance:

40.00%

Publisher:

Abstract:

Non-parametric multivariate analyses of complex ecological datasets are widely used. Following appropriate pre-treatment of the data, inter-sample resemblances are calculated using appropriate measures. Ordination and clustering derived from these resemblances are used to visualise relationships among samples (or variables). Hierarchical agglomerative clustering with group-average (UPGMA) linkage is often the clustering method chosen. Using an example dataset of zooplankton densities from the Bristol Channel and Severn Estuary, UK, a range of existing and new clustering methods are applied and the results compared. Although the examples focus on analysis of samples, the methods may also be applied to species analysis. Dendrograms derived by hierarchical clustering are compared using cophenetic correlations, which are also used to determine the optimum β in flexible beta clustering. A plot of cophenetic correlation against original dissimilarities reveals that a tree may be a poor representation of the full multivariate information. UNCTREE is an unconstrained binary divisive clustering algorithm in which values of the ANOSIM R statistic are used to determine (binary) splits in the data, to form a dendrogram. A form of flat clustering, k-R clustering, uses a combination of ANOSIM R and Similarity Profiles (SIMPROF) analyses to determine the optimum value of k, the number of groups into which samples should be clustered, and the sample membership of the groups. Robust outcomes from the application of such a range of differing techniques to the same resemblance matrix, as here, result in greater confidence in the validity of a clustering approach.

Multivariate analysis of regional-scale geochemical data for environmental monitoring

Relevance:

40.00%

Publisher:

Abstract:

A compositional multivariate approach is used to analyse regional scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey Northern Ireland (GSNI). The multi-element total concentration data presented comprise XRF analyses of 6862 rural soil samples collected at 20 cm depths on a non-aligned grid at one site per 2 km². Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic, up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or superficial deposits. To address this, the geochemical data were transformed using centered log ratios (clr) to observe the requirements of compositional data analysis and avoid closure issues. Following this, compositional multivariate techniques including compositional Principal Component Analysis (PCA) and minimum/maximum autocorrelation factor (MAF) analysis were used to determine the influence of underlying geology on the soil geochemistry signature. PCA showed that 72% of the variation was determined by the first four principal components (PCs), implying “significant” structure in the data. Analysis of variance showed that only 10 PCs were necessary to classify the soil geochemical data. To consider an improvement over PCA that uses the spatial relationships of the data, a classification based on MAF analysis was undertaken using the first 6 dominant factors.
Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons, and linear discriminant analysis (LDA) was undertaken. Prediction accuracy for LDA classification improved from 60.98% based on PCA using 10 principal components to 64.73% using MAF based on the 6 most dominant factors. The misclassification of peat may reflect degradation of peat covered areas since the creation of the superficial deposit classification. Further work will examine the influence of underlying lithologies on elemental concentrations in peat composition and the effect of this in classification analysis.
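
The centered log-ratio (clr) transform mentioned in this abstract takes the logarithm of each part of a composition and subtracts the mean of those logarithms (equivalently, it divides each part by the geometric mean before taking logs). A minimal sketch follows; the function name and the sample values are illustrative assumptions, not data from the study.

```python
import math


def clr(composition):
    """Centered log-ratio transform of a strictly positive composition."""
    if any(x <= 0 for x in composition):
        # Zeros or censored values must be imputed first, since log(0) is undefined.
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical element concentrations for one soil sample (arbitrary units).
sample = [120.0, 35.0, 4.5, 0.8]
transformed = clr(sample)
print(transformed)
```

The transformed values sum to zero and do not change when the whole sample is rescaled, which is why the transform sidesteps the closure problem of raw concentration data. The positivity requirement is also why censored values have to be imputed before the transform is applied.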

Sociodemographic factors influencing adherence to antenatal iron supplementation recommendations among pregnant women in Malawi: Analysis of data from the 2010 Malawi Demographic and Health Survey

Relevance:

40.00%

Publisher:

Abstract:

Background and Aim: Maternal morbidity and mortality statistics remain unacceptably high in Malawi. Prominent among the risk factors in the country is anaemia in pregnancy, which generally results from nutritional inadequacy (particularly iron deficiency) and malaria, among other factors. This warrants concerted efforts to increase iron intake among reproductive-age women. This study, among women in Malawi, examined factors determining intake of supplemental iron for at least 90 days during pregnancy. Methods: A weighted sample of 10,750 women (46.7%), from the 23,020 respondents of the 2010 Malawi Demographic and Health Survey (MDHS), was used for the study. Univariate, bivariate, and regression techniques were employed. While univariate analysis revealed the percent distributions of all variables, bivariate analysis was used to examine the relationships between individual independent variables and adherence to iron supplementation. Chi-square tests of independence were conducted for categorical variables, with the significance level set at P < 0.05. Two binary logistic regression models were used to evaluate the net effect of independent variables on iron supplementation adherence. Results: Thirty-seven percent of the women adhered to the iron supplementation recommendations during pregnancy. Multivariate analysis indicated that younger age, urban residence, higher education, higher wealth status, and attending antenatal care during the first trimester were significantly associated with increased odds of taking iron supplementation for 90 days or more during pregnancy (P < 0.01). Conclusions: The results indicate low adherence to the World Health Organization’s iron supplementation recommendations among pregnant women in Malawi, and this contributes to negative health outcomes for both mothers and children.
Focusing on education interventions that target populations with low rates of iron supplement intake, including campaigns to increase the number of women who attend antenatal care clinics in the first trimester, is recommended to increase adherence to iron supplementation recommendations.

Identifying wind power ramp causes from multivariate datasets: a methodological proposal and its application to reanalysis data.

Relevance:

40.00%

Publisher:

Abstract:

Forecasting abrupt variations in wind power generation (the so-called ramps) helps achieve large scale wind power integration. One of the main issues to be confronted when addressing wind power ramp forecasting is the way in which relevant information is identified from large datasets to optimally feed forecasting models. To this end, an innovative methodology oriented to systematically relate multivariate datasets to ramp events is presented. The methodology comprises two stages: the identification of relevant features in the data and the assessment of the dependence between these features and ramp occurrence. As a test case, the proposed methodology was employed to explore the relationships between atmospheric dynamics at the global/synoptic scales and ramp events experienced in two wind farms located in Spain. The results suggested different degrees of connection between these atmospheric scales and ramp occurrence. For one of the wind farms, it was found that ramp events could be partly explained by regional circulations and zonal pressure gradients. To perform a comprehensive analysis of the underlying causes of ramps, the proposed methodology could be applied to datasets related to other stages of the wind-to-power conversion chain.

An application of two techniques for the analysis of short, multivariate non-stationary time-series of Mauritanian trawl survey data

Relevance:

40.00%

Publisher:

Abstract:

Min/max autocorrelation factor analysis (MAFA) and dynamic factor analysis (DFA) are complementary techniques for analysing short (> 15-25 y), non-stationary, multivariate data sets. We illustrate the two techniques using catch rate (cpue) time-series (1982-2001) for 17 species caught during trawl surveys off Mauritania, with the NAO index, an upwelling index, sea surface temperature, and an index of fishing effort as explanatory variables. Both techniques gave coherent results, the most important common trend being a decrease in cpue during the latter half of the time-series, and the next most important being an increase during the first half. A DFA model with SST and UPW as explanatory variables and two common trends gave good fits to most of the cpue time-series. © 2004 International Council for the Exploration of the Sea. Published by Elsevier Ltd. All rights reserved.

Identification of deep water lantern sharks (Chondrichthyes : Etmopteridae) using morphometric data and multivariate analysis

Relevance:

40.00%

Publisher:

Abstract:

Taxonomic distinction to species level of deep water sharks is complex and often impossible to achieve during fisheries-related studies. The species of the genus Etmopterus are particularly difficult to identify, so they often appear without species assignation as Etmopterus sp. or spp. in studies, even those focusing on elasmobranchs. During this work, the morphometric traits of two species of Etmopterus, E. spinax and E. pusillus, were studied using 27 different morphological measurements that are relatively easy to obtain even in the field. These measurements were processed with multivariate analysis to identify the ones most likely to separate the two species. Sexual dimorphism was also assessed using the same techniques, and it was found that it does not occur in these species. The two Etmopterus species presented in this study share the same habitats in the overlapping ranges of distribution and are caught together on the outer shelves and slopes of the north-eastern Atlantic.

Improving multivariate data streams clustering.

Relevance:

40.00%

Publisher:

Abstract:

Clustering data streams is an important task in data mining research. Recently, some algorithms have been proposed to cluster data streams as a whole, but only a few of them deal with multivariate data streams. Even so, these algorithms merely aggregate the attributes without considering the correlation among them. To overcome this issue, we propose a new framework to cluster multivariate data streams based on their evolving behavior over time, exploring the correlations among their attributes by computing the fractal dimension. Experimental results with climate data streams show that the clusters' quality and compactness can be improved compared to the competing method, suggesting that correlations among attributes cannot be ignored. In fact, the clusters' compactness is 7 to 25 times better using our method. Our framework also proves to be a useful tool to assist meteorologists in understanding climate behavior over a period of time.

Comparing Near-infrared Conventional Diffuse Reflectance Spectroscopy And Hyperspectral Imaging For Determination Of The Bulk Properties Of Solid Samples By Multivariate Regression: Determination Of Mooney Viscosity And Plasticity Indices Of Natural Rubber.

Relevance:

30.00%

Publisher:

Abstract:

Conventional reflectance spectroscopy (NIRS) and hyperspectral imaging (HI) in the near-infrared region (1000-2500 nm) are evaluated and compared, using, as the case study, the determination of relevant properties related to the quality of natural rubber. Mooney viscosity (MV) and plasticity indices (PI) (PI0 - original plasticity, PI30 - plasticity after accelerated aging, and PRI - the plasticity retention index after accelerated aging) of rubber were determined using multivariate regression models. Two hundred and eighty-six samples of rubber were measured using conventional and hyperspectral near-infrared imaging reflectance instruments in the range of 1000-2500 nm. The sample set was split into regression (n = 191) and external validation (n = 95) sub-sets. Three instruments were employed for data acquisition: a line scanning hyperspectral camera and two conventional FT-NIR spectrometers. Sample heterogeneity was evaluated using hyperspectral images obtained with a resolution of 150 × 150 μm and principal component analysis. The probed sample area (5 cm²; 24,000 pixels) to achieve representativeness was found to be equivalent to the average of 6 spectra for a 1 cm diameter probing circular window of one FT-NIR instrument. The other spectrophotometer can probe the whole sample in only one measurement. The results show that the rubber properties can be determined with very similar accuracy and precision by Partial Least Squares (PLS) regression models regardless of whether HI-NIR or conventional FT-NIR produce the spectral datasets. The best Root Mean Square Errors of Prediction (RMSEPs) of external validation for MV, PI0, PI30, and PRI were 4.3, 1.8, 3.4, and 5.3%, respectively.
Though the quantitative results provided by the three instruments can be considered equivalent, the hyperspectral imaging instrument presents a number of advantages, being about 6 times faster than conventional bulk spectrometers, producing robust spectral data by ensuring sample representativeness, and minimizing the effect of the presence of contaminants.
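
The RMSEP figures quoted in this abstract are computed on the external validation set: the root of the mean squared difference between reference values and model predictions. A minimal sketch of that calculation is shown below; the function name and the numbers are made up for illustration and are not taken from the study.

```python
import math


def rmsep(reference, predicted):
    """Root mean square error of prediction over a validation set."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(reference, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical reference measurements and model predictions for four samples.
reference_values = [78.0, 82.5, 90.1, 85.3]
predicted_values = [80.2, 81.0, 88.7, 86.9]
print(round(rmsep(reference_values, predicted_values), 2))
```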

Multivariate analysis of the effects of soil parameters and environmental factors on the flavonoid content of leaves of Passiflora incarnata L., Passifloraceae

Relevance:

30.00%

Publisher:

Abstract:

The aim of the present study was to evaluate the effect of soil characteristics (pH, macro- and micro-nutrients), environmental factors (temperature, humidity, period of the year and time of day of collection) and meteorological conditions (rain, sun, cloud and cloud/rain) on the flavonoid content of leaves of Passiflora incarnata L., Passifloraceae. The total flavonoid contents of leaf samples harvested from plants cultivated or collected under different conditions were quantified by high-performance liquid chromatography with ultraviolet detection (HPLC-UV/PAD). Chemometric treatment of the data by principal component (PCA) and hierarchic cluster analyses (HCA) showed that the samples did not present a specific classification in relation to the environmental and soil variables studied, and that the environmental variables were not significant in describing the data set. However, the levels of the elements Fe, B and Cu present in the soil showed an inverse correlation with the total flavonoid contents of the leaves of P. incarnata.

A binary engine fuelling HD 87643's complex circumstellar environment: determined using AMBER/VLTI imaging

Relevance:

30.00%

Publisher:

Abstract:

Context. The star HD 87643, exhibiting the "B[e] phenomenon", has one of the most extreme infrared excesses for this object class. It harbours a large amount of both hot and cold dust, and is surrounded by an extended reflection nebula. Aims. One of our major goals was to investigate the presence of a companion in HD 87643. In addition, the presence of close dusty material was tested through a combination of multi-wavelength high spatial resolution observations. Methods. We observed HD 87643 with high spatial resolution techniques, using the near-IR AMBER/VLTI interferometer with baselines ranging from 60 m to 130 m and the mid-IR MIDI/VLTI interferometer with baselines ranging from 25 m to 65 m. These observations are complemented by NACO/VLT adaptive-optics-corrected images in the K and L-bands, and ESO-2.2m optical Wide-Field Imager large-scale images in the B, V and R-bands. Results. We report the direct detection of a companion to HD 87643 by means of image synthesis using the AMBER/VLTI instrument. The presence of the companion is confirmed by the MIDI and NACO data, although with a lower confidence. The companion is separated by approximately 34 mas with a roughly north-south orientation. The period must be large (several tens of years) and hence the orbital parameters are not determined yet. Binarity with high eccentricity might be the key to interpreting the extreme characteristics of this system, namely a dusty circumstellar envelope around the primary, a compact dust nebulosity around the binary system and a complex extended nebula suggesting past violent ejections.
