956 resultados para Data Sets


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The common GIS-based approach to regional analyses of soil organic carbon (SOC) stocks and changes is to define geographic layers for which unique sets of driving variables are derived, which include land use, climate, and soils. These GIS layers, with their associated attribute data, can then be fed into a range of empirical and dynamic models. Common methodologies for collating and formatting regional data sets on land use, climate, and soils were adopted for the project Assessment of Soil Organic Carbon Stocks and Changes at National Scale (GEFSOC). This permitted the development of a uniform protocol for handling the various input for the dynamic GEFSOC Modelling System. Consistent soil data sets for Amazon-Brazil, the Indo-Gangetic Plains (IGP) of India, Jordan and Kenya, the case study areas considered in the GEFSOC project, were prepared using methodologies developed for the World Soils and Terrain Database (SOTER). The approach involved three main stages: (1) compiling new soil geographic and attribute data in SOTER format; (2) using expert estimates and common sense to fill selected gaps in the measured or primary data; (3) using a scheme of taxonomy-based pedotransfer rules and expert-rules to derive soil parameter estimates for similar soil units with missing soil analytical data. The most appropriate approach varied from country to country, depending largely on the overall accessibility and quality of the primary soil data available in the case study areas. The secondary SOTER data sets discussed here are appropriate for a wide range of environmental applications at national scale. These include agro-ecological zoning, land evaluation, modelling of soil C stocks and changes, and studies of soil vulnerability to pollution. Estimates of national-scale stocks of SOC, calculated using SOTER methods, are presented as a first example of database application. Independent estimates of SOC stocks are needed to evaluate the outcome of the GEFSOC Modelling System for current conditions of land use and climate. (C) 2007 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many kernel classifier construction algorithms adopt classification accuracy as performance metrics in model evaluation. Moreover, equal weighting is often applied to each data sample in parameter estimation. These modeling practices often become problematic if the data sets are imbalanced. We present a kernel classifier construction algorithm using orthogonal forward selection (OFS) in order to optimize the model generalization for imbalanced two-class data sets. This kernel classifier identification algorithm is based on a new regularized orthogonal weighted least squares (ROWLS) estimator and the model selection criterion of maximal leave-one-out area under curve (LOO-AUC) of the receiver operating characteristics (ROCs). It is shown that, owing to the orthogonalization procedure, the LOO-AUC can be calculated via an analytic formula based on the new regularized orthogonal weighted least squares parameter estimator, without actually splitting the estimation data set. The proposed algorithm can achieve minimal computational expense via a set of forward recursive updating formula in searching model terms with maximal incremental LOO-AUC value. Numerical examples are used to demonstrate the efficacy of the algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sub-seasonal variability including equatorial waves significantly influence the dehydration and transport processes in the tropical tropopause layer (TTL). This study investigates the wave activity in the TTL in 7 reanalysis data sets (RAs; NCEP1, NCEP2, ERA40, ERA-Interim, JRA25, MERRA, and CFSR) and 4 chemistry climate models (CCMs; CCSRNIES, CMAM, MRI, and WACCM) using the zonal wave number-frequency spectral analysis method with equatorially symmetric-antisymmetric decomposition. Analyses are made for temperature and horizontal winds at 100 hPa in the RAs and CCMs and for outgoing longwave radiation (OLR), which is a proxy for convective activity that generates tropopause-level disturbances, in satellite data and the CCMs. Particular focus is placed on equatorial Kelvin waves, mixed Rossby-gravity (MRG) waves, and the Madden-Julian Oscillation (MJO). The wave activity is defined as the variance, i.e., the power spectral density integrated in a particular zonal wave number-frequency region. It is found that the TTL wave activities show significant difference among the RAs, ranging from ∼0.7 (for NCEP1 and NCEP2) to ∼1.4 (for ERA-Interim, MERRA, and CFSR) with respect to the averages from the RAs. The TTL activities in the CCMs lie generally within the range of those in the RAs, with a few exceptions. However, the spectral features in OLR for all the CCMs are very different from those in the observations, and the OLR wave activities are too low for CCSRNIES, CMAM, and MRI. It is concluded that the broad range of wave activity found in the different RAs decreases our confidence in their validity and in particular their value for validation of CCM performance in the TTL, thereby limiting our quantitative understanding of the dehydration and transport processes in the TTL.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Variability in the strength of the stratospheric Lagrangian mean meridional or Brewer-Dobson circulation and horizontal mixing into the tropics over the past three decades are examined using observations of stratospheric mean age of air and ozone. We use a simple representation of the stratosphere, the tropical leaky pipe (TLP) model, guided by mean meridional circulation and horizontal mixing changes in several reanalyses data sets and chemistry climate model (CCM) simulations, to help elucidate reasons for the observed changes in stratospheric mean age and ozone. We find that the TLP model is able to accurately simulate multiyear variability in ozone following recent major volcanic eruptions and the early 2000s sea surface temperature changes, as well as the lasting impact on mean age of relatively short-term circulation perturbations. We also find that the best quantitative agreement with the observed mean age and ozone trends over the past three decades is found assuming a small strengthening of the mean circulation in the lower stratosphere, a moderate weakening of the mean circulation in the middle and upper stratosphere, and a moderate increase in the horizontal mixing into the tropics. The mean age trends are strongly sensitive to trends in the horizontal mixing into the tropics, and the uncertainty in the mixing trends causes uncertainty in the mean circulation trends. Comparisons of the mean circulation and mixing changes suggested by the measurements with those from a recent suite of CCM runs reveal significant differences that may have important implications on the accurate simulation of future stratospheric climate.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Advanced Along-Track Scanning Radiometer (AATSR) was launched on Envisat in March 2002. The AATSR instrument is designed to retrieve precise and accurate global sea surface temperature (SST) that, combined with the large data set collected from its predecessors, ATSR and ATSR-2, will provide a long term record of SST data that is greater than 15 years. This record can be used for independent monitoring and detection of climate change. The AATSR validation programme has successfully completed its initial phase. The programme involves validation of the AATSR derived SST values using in situ radiometers, in situ buoys and global SST fields from other data sets. The results of the initial programme presented here will demonstrate that the AATSR instrument is currently close to meeting its scientific objectives of determining global SST to an accuracy of 0.3 K (one sigma). For night time data, the analysis gives a warm bias of between +0.04 K (0.28 K) for buoys to +0.06 K (0.20 K) for radiometers, with slightly higher errors observed for day time data, showing warm biases of between +0.02 (0.39 K) for buoys to +0.11 K (0.33 K) for radiometers. They show that the ATSR series of instruments continues to be the world leader in delivering accurate space-based observations of SST, which is a key climate parameter.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A method is proposed for merging different nadir-sounding climate data records using measurements from high-resolution limb sounders to provide a transfer function between the different nadir measurements. The two nadir-sounding records need not be overlapping so long as the limb-sounding record bridges between them. The method is applied to global-mean stratospheric temperatures from the NOAA Climate Data Records based on the Stratospheric Sounding Unit (SSU) and the Advanced Microwave Sounding Unit-A (AMSU), extending the SSU record forward in time to yield a continuous data set from 1979 to present, and providing a simple framework for extending the SSU record into the future using AMSU. SSU and AMSU are bridged using temperature measurements from the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS), which is of high enough vertical resolution to accurately represent the weighting functions of both SSU and AMSU. For this application, a purely statistical approach is not viable since the different nadir channels are not sufficiently linearly independent, statistically speaking. The near-global-mean linear temperature trends for extended SSU for 1980–2012 are −0.63 ± 0.13, −0.71 ± 0.15 and −0.80 ± 0.17 K decade−1 (95 % confidence) for channels 1, 2 and 3, respectively. The extended SSU temperature changes are in good agreement with those from the Microwave Limb Sounder (MLS) on the Aura satellite, with both exhibiting a cooling trend of ~ 0.6 ± 0.3 K decade−1 in the upper stratosphere from 2004 to 2012. The extended SSU record is found to be in agreement with high-top coupled atmosphere–ocean models over the 1980–2012 period, including the continued cooling over the first decade of the 21st century.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The stratospheric mean-meridional circulation (MMC) and eddy mixing are compared among six meteorological reanalysis data sets: NCEP-NCAR, NCEP-CFSR, ERA-40, ERA-Interim, JRA-25, and JRA-55 for the period 1979–2012. The reanalysis data sets produced using advanced systems (i.e., NCEP-CFSR, ERA-Interim, and JRA-55) generally reveal a weaker MMC in the Northern Hemisphere (NH) compared with those produced using older systems (i.e., NCEP/NCAR, ERA-40, and JRA-25). The mean mixing strength differs largely among the data products. In the NH lower stratosphere, the contribution of planetary-scale mixing is larger in the new data sets than in the old data sets, whereas that of small-scale mixing is weaker in the new data sets. Conventional data assimilation techniques introduce analysis increments without maintaining physical balance, which may have caused an overly strong MMC and spurious small-scale eddies in the old data sets. At the NH mid-latitudes, only ERA-Interim reveals a weakening MMC trend in the deep branch of the Brewer–Dobson circulation (BDC). The relative importance of the eddy mixing compared with the mean-meridional transport in the subtropical lower stratosphere shows increasing trends in ERA-Interim and JRA-55; this together with the weakened MMC in the deep branch may imply an increasing age-of-air (AoA) in the NH middle stratosphere in ERA-Interim. Overall, discrepancies between the different variables and trends therein as derived from the different reanalyses are still relatively large, suggesting that more investments in these products are needed in order to obtain a consolidated picture of observed changes in the BDC and the mechanisms that drive them.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In Information Visualization, adding and removing data elements can strongly impact the underlying visual space. We have developed an inherently incremental technique (incBoard) that maintains a coherent disposition of elements from a dynamic multidimensional data set on a 2D grid as the set changes. Here, we introduce a novel layout that uses pairwise similarity from grid neighbors, as defined in incBoard, to reposition elements on the visual space, free from constraints imposed by the grid. The board continues to be updated and can be displayed alongside the new space. As similar items are placed together, while dissimilar neighbors are moved apart, it supports users in the identification of clusters and subsets of related elements. Densely populated areas identified in the incSpace can be efficiently explored with the corresponding incBoard visualization, which is not susceptible to occlusion. The solution remains inherently incremental and maintains a coherent disposition of elements, even for fully renewed sets. The algorithm considers relative positions for the initial placement of elements, and raw dissimilarity to fine tune the visualization. It has low computational cost, with complexity depending only on the size of the currently viewed subset, V. Thus, a data set of size N can be sequentially displayed in O(N) time, reaching O(N (2)) only if the complete set is simultaneously displayed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Most multidimensional projection techniques rely on distance (dissimilarity) information between data instances to embed high-dimensional data into a visual space. When data are endowed with Cartesian coordinates, an extra computational effort is necessary to compute the needed distances, making multidimensional projection prohibitive in applications dealing with interactivity and massive data. The novel multidimensional projection technique proposed in this work, called Part-Linear Multidimensional Projection (PLMP), has been tailored to handle multivariate data represented in Cartesian high-dimensional spaces, requiring only distance information between pairs of representative samples. This characteristic renders PLMP faster than previous methods when processing large data sets while still being competitive in terms of precision. Moreover, knowing the range of variation for data instances in the high-dimensional space, we can make PLMP a truly streaming data projection technique, a trait absent in previous methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Empirical phylogeographic studies have progressively sampled greater numbers of loci over time, in part motivated by theoretical papers showing that estimates of key demographic parameters improve as the number of loci increases. Recently, next-generation sequencing has been applied to questions about organismal history, with the promise of revolutionizing the field. However, no systematic assessment of how phylogeographic data sets have changed over time with respect to overall size and information content has been performed. Here, we quantify the changing nature of these genetic data sets over the past 20years, focusing on papers published in Molecular Ecology. We found that the number of independent loci, the total number of alleles sampled and the total number of single nucleotide polymorphisms (SNPs) per data set has improved over time, with particularly dramatic increases within the past 5years. Interestingly, uniparentally inherited organellar markers (e.g. animal mitochondrial and plant chloroplast DNA) continue to represent an important component of phylogeographic data. Single-species studies (cf. comparative studies) that focus on vertebrates (particularly fish and to some extent, birds) represent the gold standard of phylogeographic data collection. Based on the current trajectory seen in our survey data, forecast modelling indicates that the median number of SNPs per data set for studies published by the end of the year 2016 may approach similar to 20000. This survey provides baseline information for understanding the evolution of phylogeographic data sets and underscores the fact that development of analytical methods for handling very large genetic data sets will be critical for facilitating growth of the field.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hundreds of Terabytes of CMS (Compact Muon Solenoid) data are being accumulated for storage day by day at the University of Nebraska-Lincoln, which is one of the eight US CMS Tier-2 sites. Managing this data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This is an important task that is currently being done manually and it requires a large amount of time. The overall objective of this study was to develop a methodology to help identify the data sets to be deleted when there is a requirement for storage space. CMS data is stored using HDFS (Hadoop Distributed File System). HDFS logs give information regarding file access operations. Hadoop MapReduce was used to feed information in these logs to Support Vector Machines (SVMs), a machine learning algorithm applicable to classification and regression which is used in this Thesis to develop a classifier. Time elapsed in data set classification by this method is dependent on the size of the input HDFS log file since the algorithmic complexities of Hadoop MapReduce algorithms here are O(n). The SVM methodology produces a list of data sets for deletion along with their respective sizes. This methodology was also compared with a heuristic called Retention Cost which was calculated using size of the data set and the time since its last access to help decide how useful a data set is. Accuracies of both were compared by calculating the percentage of data sets predicted for deletion which were accessed at a later instance of time. Our methodology using SVMs proved to be more accurate than using the Retention Cost heuristic. This methodology could be used to solve similar problems involving other large data sets.