952 resultados para Core data set


Relevância:

90.00% 90.00%

Publicador:

Resumo:

As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent –essential zeros– or because it is below detection limit –rounded zeros. Because the second kind of zeros is usually understood as “a trace too small to measure”, it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g. by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts –and thus the metric properties– should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003) where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is “natural” in the sense that it recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in the same paper a substitution method for missing values on compositional data sets is introduced

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Developments in the statistical analysis of compositional data over the last two decades have made possible a much deeper exploration of the nature of variability, and the possible processes associated with compositional data sets from many disciplines. In this paper we concentrate on geochemical data sets. First we explain how hypotheses of compositional variability may be formulated within the natural sample space, the unit simplex, including useful hypotheses of subcompositional discrimination and specific perturbational change. Then we develop through standard methodology, such as generalised likelihood ratio tests, statistical tools to allow the systematic investigation of a complete lattice of such hypotheses. Some of these tests are simple adaptations of existing multivariate tests but others require special construction. We comment on the use of graphical methods in compositional data analysis and on the ordination of specimens. The recent development of the concept of compositional processes is then explained together with the necessary tools for a staying- in-the-simplex approach, namely compositional singular value decompositions. All these statistical techniques are illustrated for a substantial compositional data set, consisting of 209 major-oxide and rare-element compositions of metamorphosed limestones from the Northeast and Central Highlands of Scotland. Finally we point out a number of unresolved problems in the statistical analysis of compositional processes

Relevância:

90.00% 90.00%

Publicador:

Resumo:

R from http://www.r-project.org/ is ‘GNU S’ – a language and environment for statistical computing and graphics. The environment in which many classical and modern statistical techniques have been implemented, but many are supplied as packages. There are 8 standard packages and many more are available through the cran family of Internet sites http://cran.r-project.org . We started to develop a library of functions in R to support the analysis of mixtures and our goal is a MixeR package for compositional data analysis that provides support for operations on compositions: perturbation and power multiplication, subcomposition with or without residuals, centering of the data, computing Aitchison’s, Euclidean, Bhattacharyya distances, compositional Kullback-Leibler divergence etc. graphical presentation of compositions in ternary diagrams and tetrahedrons with additional features: barycenter, geometric mean of the data set, the percentiles lines, marking and coloring of subsets of the data set, theirs geometric means, notation of individual data in the set . . . dealing with zeros and missing values in compositional data sets with R procedures for simple and multiplicative replacement strategy, the time series analysis of compositional data. We’ll present the current status of MixeR development and illustrate its use on selected data sets

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In order to obtain a high-resolution Pleistocene stratigraphy, eleven continuously cored boreholes, 100 to 220m deep were drilled in the northern part of the Po Plain by Regione Lombardia in the last five years. Quantitative provenance analysis (QPA, Weltje and von Eynatten, 2004) of Pleistocene sands was carried out by using multivariate statistical analysis (principal component analysis, PCA, and similarity analysis) on an integrated data set, including high-resolution bulk petrography and heavy-mineral analyses on Pleistocene sands and of 250 major and minor modern rivers draining the southern flank of the Alps from West to East (Garzanti et al, 2004; 2006). Prior to the onset of major Alpine glaciations, metamorphic and quartzofeldspathic detritus from the Western and Central Alps was carried from the axial belt to the Po basin longitudinally parallel to the SouthAlpine belt by a trunk river (Vezzoli and Garzanti, 2008). This scenario rapidly changed during the marine isotope stage 22 (0.87 Ma), with the onset of the first major Pleistocene glaciation in the Alps (Muttoni et al, 2003). PCA and similarity analysis from core samples show that the longitudinal trunk river at this time was shifted southward by the rapid southward and westward progradation of transverse alluvial river systems fed from the Central and Southern Alps. Sediments were transported southward by braided river systems as well as glacial sediments transported by Alpine valley glaciers invaded the alluvial plain. Kew words: Detrital modes; Modern sands; Provenance; Principal Components Analysis; Similarity, Canberra Distance; palaeodrainage

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Seafloor imagery is a rich source of data for the study of biological and geological processes. Among several applications, still images of the ocean floor can be used to build image composites referred to as photo-mosaics. Photo-mosaics provide a wide-area visual representation of the benthos, and enable applications as diverse as geological surveys, mapping and detection of temporal changes in the morphology of biodiversity. We present an approach for creating globally aligned photo-mosaics using 3D position estimates provided by navigation sensors available in deep water surveys. Without image registration, such navigation data does not provide enough accuracy to produce useful composite images. Results from a challenging data set of the Lucky Strike vent field at the Mid Atlantic Ridge are reported

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Data assimilation provides techniques for combining observations and prior model forecasts to create initial conditions for numerical weather prediction (NWP). The relative weighting assigned to each observation in the analysis is determined by its associated error. Remote sensing data usually has correlated errors, but the correlations are typically ignored in NWP. Here, we describe three approaches to the treatment of observation error correlations. For an idealized data set, the information content under each simplified assumption is compared with that under correct correlation specification. Treating the errors as uncorrelated results in a significant loss of information. However, retention of an approximated correlation gives clear benefits.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Matheron's usual variogram estimator can result in unreliable variograms when data are strongly asymmetric or skewed. Asymmetry in a distribution can arise from a long tail of values in the underlying process or from outliers that belong to another population that contaminate the primary process. This paper examines the effects of underlying asymmetry on the variogram and on the accuracy of prediction, and the second one examines the effects arising from outliers. Standard geostatistical texts suggest ways of dealing with underlying asymmetry; however, this is based on informed intuition rather than detailed investigation. To determine whether the methods generally used to deal with underlying asymmetry are appropriate, the effects of different coefficients of skewness on the shape of the experimental variogram and on the model parameters were investigated. Simulated annealing was used to create normally distributed random fields of different size from variograms with different nugget:sill ratios. These data were then modified to give different degrees of asymmetry and the experimental variogram was computed in each case. The effects of standard data transformations on the form of the variogram were also investigated. Cross-validation was used to assess quantitatively the performance of the different variogram models for kriging. The results showed that the shape of the variogram was affected by the degree of asymmetry, and that the effect increased as the size of data set decreased. Transformations of the data were more effective in reducing the skewness coefficient in the larger sets of data. Cross-validation confirmed that variogram models from transformed data were more suitable for kriging than were those from the raw asymmetric data. The results of this study have implications for the 'standard best practice' in dealing with asymmetry in data for geostatistical analyses. (C) 2007 Elsevier Ltd. All rights reserved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Asymmetry in a distribution can arise from a long tail of values in the underlying process or from outliers that belong to another population that contaminate the primary process. The first paper of this series examined the effects of the former on the variogram and this paper examines the effects of asymmetry arising from outliers. Simulated annealing was used to create normally distributed random fields of different size that are realizations of known processes described by variograms with different nugget:sill ratios. These primary data sets were then contaminated with randomly located and spatially aggregated outliers from a secondary process to produce different degrees of asymmetry. Experimental variograms were computed from these data by Matheron's estimator and by three robust estimators. The effects of standard data transformations on the coefficient of skewness and on the variogram were also investigated. Cross-validation was used to assess the performance of models fitted to experimental variograms computed from a range of data contaminated by outliers for kriging. The results showed that where skewness was caused by outliers the variograms retained their general shape, but showed an increase in the nugget and sill variances and nugget:sill ratios. This effect was only slightly more for the smallest data set than for the two larger data sets and there was little difference between the results for the latter. Overall, the effect of size of data set was small for all analyses. The nugget:sill ratio showed a consistent decrease after transformation to both square roots and logarithms; the decrease was generally larger for the latter, however. Aggregated outliers had different effects on the variogram shape from those that were randomly located, and this also depended on whether they were aggregated near to the edge or the centre of the field. The results of cross-validation showed that the robust estimators and the removal of outliers were the most effective ways of dealing with outliers for variogram estimation and kriging. (C) 2007 Elsevier Ltd. All rights reserved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A methodology for using remotely sensed data to both generate and evaluate a hydraulic model of floodplain inundation is presented for a rural case study in the United Kingdom: Upton-upon-Severn. Remotely sensed data have been processed and assembled to provide an excellent test data set for both model construction and validation. In order to assess the usefulness of the data and the issues encountered in their use, two models for floodplain inundation were constructed: one based on an industry standard one-dimensional approach and the other based on a simple two-dimensional approach. The results and their implications for the future use of remotely sensed data for predicting flood inundation are discussed. Key conclusions for the use of remotely sensed data are that care must be taken to integrate different data sources for both model construction and validation and that improvements in ground height data shift the focus in terms of model uncertainties to other sources such as boundary conditions. The differences between the two models are found to be of minor significance.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We investigate the “flux excess” effect, whereby open solar flux estimates from spacecraft increase with increasing heliocentric distance. We analyze the kinematic effect on these open solar flux estimates of large-scale longitudinal structure in the solar wind flow, with particular emphasis on correcting estimates made using data from near-Earth satellites. We show that scatter, but no net bias, is introduced by the kinematic “bunching effect” on sampling and that this is true for both compression and rarefaction regions. The observed flux excesses, as a function of heliocentric distance, are shown to be consistent with open solar flux estimates from solar magnetograms made using the potential field source surface method and are well explained by the kinematic effect of solar wind speed variations on the frozen-in heliospheric field. Applying this kinematic correction to the Omni-2 interplanetary data set shows that the open solar flux at solar minimum fell from an annual mean of 3.82 × 1016 Wb in 1987 to close to half that value (1.98 × 1016 Wb) in 2007, making the fall in the minimum value over the last two solar cycles considerably faster than the rise inferred from geomagnetic activity observations over four solar cycles in the first half of the 20th century.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

1. Habitat fragmentation can affect pollinator and plant population structure in terms of species composition, abundance, area covered and density of flowering plants. This, in turn, may affect pollinator visitation frequency, pollen deposition, seed set and plant fitness. 2. A reduction in the quantity of flower visits can be coupled with a reduction in the quality of pollination service and hence the plants’ overall reproductive success and long-term survival. Understanding the relationship between plant population size and⁄ or isolation and pollination limitation is of fundamental importance for plant conservation. 3. Weexamined flower visitation and seed set of 10 different plant species fromfive European countries to investigate the general effects of plant populations size and density, both within (patch level) and between populations (population level), on seed set and pollination limitation. 4. Wefound evidence that the effects of area and density of flowering plant assemblages were generally more pronounced at the patch level than at the population level. We also found that patch and population level together influenced flower visitation and seed set, and the latter increased with increasing patch area and density, but this effect was only apparent in small populations. 5. Synthesis. By using an extensive pan-European data set on flower visitation and seed set we have identified a general pattern in the interplay between the attractiveness of flowering plant patches for pollinators and density dependence of flower visitation, and also a strong plant species-specific response to habitat fragmentation effects. This can guide efforts to conserve plant–pollinator interactions, ecosystem functioning and plant fitness in fragmented habitats.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

During the past 15 years, a number of initiatives have been undertaken at national level to develop ocean forecasting systems operating at regional and/or global scales. The co-ordination between these efforts has been organized internationally through the Global Ocean Data Assimilation Experiment (GODAE). The French MERCATOR project is one of the leading participants in GODAE. The MERCATOR systems routinely assimilate a variety of observations such as multi-satellite altimeter data, sea-surface temperature and in situ temperature and salinity profiles, focusing on high-resolution scales of the ocean dynamics. The assimilation strategy in MERCATOR is based on a hierarchy of methods of increasing sophistication including optimal interpolation, Kalman filtering and variational methods, which are progressively deployed through the Syst`eme d’Assimilation MERCATOR (SAM) series. SAM-1 is based on a reduced-order optimal interpolation which can be operated using ‘altimetry-only’ or ‘multi-dataset-ups; it relies on the concept of separability, assuming that the correlations can be separated into a product of horizontal and vertical contributions. The second release, SAM-2, is being developed to include new features from the singular evolutive extended Kalman (SEEK) filter, such as three-dimensional, multivariate error modes and adaptivity schemes. The third one, SAM-3, considers variational methods such as the incremental four-dimensional variational algorithm. Most operational forecasting systems evaluated during GODAE are based on least-squares statistical estimation assuming Gaussian errors. In the framework of the EU MERSEA (Marine EnviRonment and Security for the European Area) project, research is being conducted to prepare the next-generation operational ocean monitoring and forecasting systems. The research effort will explore nonlinear assimilation formulations to overcome limitations of the current systems. This paper provides an overview of the developments conducted in MERSEA with the SEEK filter, the Ensemble Kalman filter and the sequential importance re-sampling filter.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We describe a general likelihood-based 'mixture model' for inferring phylogenetic trees from gene-sequence or other character-state data. The model accommodates cases in which different sites in the alignment evolve in qualitatively distinct ways, but does not require prior knowledge of these patterns or partitioning of the data. We call this qualitative variability in the pattern of evolution across sites "pattern-heterogeneity" to distinguish it from both a homogenous process of evolution and from one characterized principally by differences in rates of evolution. We present studies to show that the model correctly retrieves the signals of pattern-heterogeneity from simulated gene-sequence data, and we apply the method to protein-coding genes and to a ribosomal 12S data set. The mixture model outperforms conventional partitioning in both these data sets. We implement the mixture model such that it can simultaneously detect rate- and pattern-heterogeneity. The model simplifies to a homogeneous model or a rate- variability model as special cases, and therefore always performs at least as well as these two approaches, and often considerably improves upon them. We make the model available within a Bayesian Markov-chain Monte Carlo framework for phylogenetic inference, as an easy-to-use computer program.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Resolving the relationships between Metazoa and other eukaryotic groups as well as between metazoan phyla is central to the understanding of the origin and evolution of animals. The current view is based on limited data sets, either a single gene with many species (e.g., ribosomal RNA) or many genes but with only a few species. Because a reliable phylogenetic inference simultaneously requires numerous genes and numerous species, we assembled a very large data set containing 129 orthologous proteins (similar to30,000 aligned amino acid positions) for 36 eukaryotic species. Included in the alignments are data from the choanoflagellate Monosiga ovata, obtained through the sequencing of about 1,000 cDNAs. We provide conclusive support for choanoflagellates as the closest relative of animals and for fungi as the second closest. The monophyly of Plantae and chromalveolates was recovered but without strong statistical support. Within animals, in contrast to the monophyly of Coelomata observed in several recent large-scale analyses, we recovered a paraphyletic Coelamata, with nematodes and platyhelminths nested within. To include a diverse sample of organisms, data from EST projects were used for several species, resulting in a large amount of missing data in our alignment (about 25%). By using different approaches, we verify that the inferred phylogeny is not sensitive to these missing data. Therefore, this large data set provides a reliable phylogenetic framework for studying eukaryotic and animal evolution and will be easily extendable when large amounts of sequence information become available from a broader taxonomic range.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Population subdivision complicates analysis of molecular variation. Even if neutrality is assumed, three evolutionary forces need to be considered: migration, mutation, and drift. Simplification can be achieved by assuming that the process of migration among and drift within subpopulations is occurring fast compared to Mutation and drift in the entire population. This allows a two-step approach in the analysis: (i) analysis of population subdivision and (ii) analysis of molecular variation in the migrant pool. We model population subdivision using an infinite island model, where we allow the migration/drift parameter Theta to vary among populations. Thus, central and peripheral populations can be differentiated. For inference of Theta, we use a coalescence approach, implemented via a Markov chain Monte Carlo (MCMC) integration method that allows estimation of allele frequencies in the migrant pool. The second step of this approach (analysis of molecular variation in the migrant pool) uses the estimated allele frequencies in the migrant pool for the study of molecular variation. We apply this method to a Drosophila ananassae sequence data set. We find little indication of isolation by distance, but large differences in the migration parameter among populations. The population as a whole seems to be expanding. A population from Bogor (Java, Indonesia) shows the highest variation and seems closest to the species center.