958 resultados para Data sets storage


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many kernel classifier construction algorithms adopt classification accuracy as performance metrics in model evaluation. Moreover, equal weighting is often applied to each data sample in parameter estimation. These modeling practices often become problematic if the data sets are imbalanced. We present a kernel classifier construction algorithm using orthogonal forward selection (OFS) in order to optimize the model generalization for imbalanced two-class data sets. This kernel classifier identification algorithm is based on a new regularized orthogonal weighted least squares (ROWLS) estimator and the model selection criterion of maximal leave-one-out area under curve (LOO-AUC) of the receiver operating characteristics (ROCs). It is shown that, owing to the orthogonalization procedure, the LOO-AUC can be calculated via an analytic formula based on the new regularized orthogonal weighted least squares parameter estimator, without actually splitting the estimation data set. The proposed algorithm can achieve minimal computational expense via a set of forward recursive updating formula in searching model terms with maximal incremental LOO-AUC value. Numerical examples are used to demonstrate the efficacy of the algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sub-seasonal variability including equatorial waves significantly influence the dehydration and transport processes in the tropical tropopause layer (TTL). This study investigates the wave activity in the TTL in 7 reanalysis data sets (RAs; NCEP1, NCEP2, ERA40, ERA-Interim, JRA25, MERRA, and CFSR) and 4 chemistry climate models (CCMs; CCSRNIES, CMAM, MRI, and WACCM) using the zonal wave number-frequency spectral analysis method with equatorially symmetric-antisymmetric decomposition. Analyses are made for temperature and horizontal winds at 100 hPa in the RAs and CCMs and for outgoing longwave radiation (OLR), which is a proxy for convective activity that generates tropopause-level disturbances, in satellite data and the CCMs. Particular focus is placed on equatorial Kelvin waves, mixed Rossby-gravity (MRG) waves, and the Madden-Julian Oscillation (MJO). The wave activity is defined as the variance, i.e., the power spectral density integrated in a particular zonal wave number-frequency region. It is found that the TTL wave activities show significant difference among the RAs, ranging from ∼0.7 (for NCEP1 and NCEP2) to ∼1.4 (for ERA-Interim, MERRA, and CFSR) with respect to the averages from the RAs. The TTL activities in the CCMs lie generally within the range of those in the RAs, with a few exceptions. However, the spectral features in OLR for all the CCMs are very different from those in the observations, and the OLR wave activities are too low for CCSRNIES, CMAM, and MRI. It is concluded that the broad range of wave activity found in the different RAs decreases our confidence in their validity and in particular their value for validation of CCM performance in the TTL, thereby limiting our quantitative understanding of the dehydration and transport processes in the TTL.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Variability in the strength of the stratospheric Lagrangian mean meridional or Brewer-Dobson circulation and horizontal mixing into the tropics over the past three decades are examined using observations of stratospheric mean age of air and ozone. We use a simple representation of the stratosphere, the tropical leaky pipe (TLP) model, guided by mean meridional circulation and horizontal mixing changes in several reanalyses data sets and chemistry climate model (CCM) simulations, to help elucidate reasons for the observed changes in stratospheric mean age and ozone. We find that the TLP model is able to accurately simulate multiyear variability in ozone following recent major volcanic eruptions and the early 2000s sea surface temperature changes, as well as the lasting impact on mean age of relatively short-term circulation perturbations. We also find that the best quantitative agreement with the observed mean age and ozone trends over the past three decades is found assuming a small strengthening of the mean circulation in the lower stratosphere, a moderate weakening of the mean circulation in the middle and upper stratosphere, and a moderate increase in the horizontal mixing into the tropics. The mean age trends are strongly sensitive to trends in the horizontal mixing into the tropics, and the uncertainty in the mixing trends causes uncertainty in the mean circulation trends. Comparisons of the mean circulation and mixing changes suggested by the measurements with those from a recent suite of CCM runs reveal significant differences that may have important implications on the accurate simulation of future stratospheric climate.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Advanced Along-Track Scanning Radiometer (AATSR) was launched on Envisat in March 2002. The AATSR instrument is designed to retrieve precise and accurate global sea surface temperature (SST) that, combined with the large data set collected from its predecessors, ATSR and ATSR-2, will provide a long term record of SST data that is greater than 15 years. This record can be used for independent monitoring and detection of climate change. The AATSR validation programme has successfully completed its initial phase. The programme involves validation of the AATSR derived SST values using in situ radiometers, in situ buoys and global SST fields from other data sets. The results of the initial programme presented here will demonstrate that the AATSR instrument is currently close to meeting its scientific objectives of determining global SST to an accuracy of 0.3 K (one sigma). For night time data, the analysis gives a warm bias of between +0.04 K (0.28 K) for buoys to +0.06 K (0.20 K) for radiometers, with slightly higher errors observed for day time data, showing warm biases of between +0.02 (0.39 K) for buoys to +0.11 K (0.33 K) for radiometers. They show that the ATSR series of instruments continues to be the world leader in delivering accurate space-based observations of SST, which is a key climate parameter.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A method is proposed for merging different nadir-sounding climate data records using measurements from high-resolution limb sounders to provide a transfer function between the different nadir measurements. The two nadir-sounding records need not be overlapping so long as the limb-sounding record bridges between them. The method is applied to global-mean stratospheric temperatures from the NOAA Climate Data Records based on the Stratospheric Sounding Unit (SSU) and the Advanced Microwave Sounding Unit-A (AMSU), extending the SSU record forward in time to yield a continuous data set from 1979 to present, and providing a simple framework for extending the SSU record into the future using AMSU. SSU and AMSU are bridged using temperature measurements from the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS), which is of high enough vertical resolution to accurately represent the weighting functions of both SSU and AMSU. For this application, a purely statistical approach is not viable since the different nadir channels are not sufficiently linearly independent, statistically speaking. The near-global-mean linear temperature trends for extended SSU for 1980–2012 are −0.63 ± 0.13, −0.71 ± 0.15 and −0.80 ± 0.17 K decade−1 (95 % confidence) for channels 1, 2 and 3, respectively. The extended SSU temperature changes are in good agreement with those from the Microwave Limb Sounder (MLS) on the Aura satellite, with both exhibiting a cooling trend of ~ 0.6 ± 0.3 K decade−1 in the upper stratosphere from 2004 to 2012. The extended SSU record is found to be in agreement with high-top coupled atmosphere–ocean models over the 1980–2012 period, including the continued cooling over the first decade of the 21st century.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The stratospheric mean-meridional circulation (MMC) and eddy mixing are compared among six meteorological reanalysis data sets: NCEP-NCAR, NCEP-CFSR, ERA-40, ERA-Interim, JRA-25, and JRA-55 for the period 1979–2012. The reanalysis data sets produced using advanced systems (i.e., NCEP-CFSR, ERA-Interim, and JRA-55) generally reveal a weaker MMC in the Northern Hemisphere (NH) compared with those produced using older systems (i.e., NCEP/NCAR, ERA-40, and JRA-25). The mean mixing strength differs largely among the data products. In the NH lower stratosphere, the contribution of planetary-scale mixing is larger in the new data sets than in the old data sets, whereas that of small-scale mixing is weaker in the new data sets. Conventional data assimilation techniques introduce analysis increments without maintaining physical balance, which may have caused an overly strong MMC and spurious small-scale eddies in the old data sets. At the NH mid-latitudes, only ERA-Interim reveals a weakening MMC trend in the deep branch of the Brewer–Dobson circulation (BDC). The relative importance of the eddy mixing compared with the mean-meridional transport in the subtropical lower stratosphere shows increasing trends in ERA-Interim and JRA-55; this together with the weakened MMC in the deep branch may imply an increasing age-of-air (AoA) in the NH middle stratosphere in ERA-Interim. Overall, discrepancies between the different variables and trends therein as derived from the different reanalyses are still relatively large, suggesting that more investments in these products are needed in order to obtain a consolidated picture of observed changes in the BDC and the mechanisms that drive them.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In Information Visualization, adding and removing data elements can strongly impact the underlying visual space. We have developed an inherently incremental technique (incBoard) that maintains a coherent disposition of elements from a dynamic multidimensional data set on a 2D grid as the set changes. Here, we introduce a novel layout that uses pairwise similarity from grid neighbors, as defined in incBoard, to reposition elements on the visual space, free from constraints imposed by the grid. The board continues to be updated and can be displayed alongside the new space. As similar items are placed together, while dissimilar neighbors are moved apart, it supports users in the identification of clusters and subsets of related elements. Densely populated areas identified in the incSpace can be efficiently explored with the corresponding incBoard visualization, which is not susceptible to occlusion. The solution remains inherently incremental and maintains a coherent disposition of elements, even for fully renewed sets. The algorithm considers relative positions for the initial placement of elements, and raw dissimilarity to fine tune the visualization. It has low computational cost, with complexity depending only on the size of the currently viewed subset, V. Thus, a data set of size N can be sequentially displayed in O(N) time, reaching O(N (2)) only if the complete set is simultaneously displayed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Most multidimensional projection techniques rely on distance (dissimilarity) information between data instances to embed high-dimensional data into a visual space. When data are endowed with Cartesian coordinates, an extra computational effort is necessary to compute the needed distances, making multidimensional projection prohibitive in applications dealing with interactivity and massive data. The novel multidimensional projection technique proposed in this work, called Part-Linear Multidimensional Projection (PLMP), has been tailored to handle multivariate data represented in Cartesian high-dimensional spaces, requiring only distance information between pairs of representative samples. This characteristic renders PLMP faster than previous methods when processing large data sets while still being competitive in terms of precision. Moreover, knowing the range of variation for data instances in the high-dimensional space, we can make PLMP a truly streaming data projection technique, a trait absent in previous methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper reviews the appropriateness for application to large data sets of standard machine learning algorithms, which were mainly developed in the context of small data sets. Sampling and parallelisation have proved useful means for reducing computation time when learning from large data sets. However, such methods assume that algorithms that were designed for use with what are now considered small data sets are also fundamentally suitable for large data sets. It is plausible that optimal learning from large data sets requires a different type of algorithm to optimal learning from small data sets. This paper investigates one respect in which data set size may affect the requirements of a learning algorithm — the bias plus variance decomposition of classification error. Experiments show that learning from large data sets may be more effective when using an algorithm that places greater emphasis on bias management, rather than variance management.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Missing data imputation is a key issue in learning from incomplete data. Various techniques have been developed with great successes on dealing with missing values in data sets with homogeneous attributes (their independent attributes are all either continuous or discrete). This paper studies a new setting of missing data imputation, i.e., imputing missing data in data sets with heterogeneous attributes (their independent attributes are of different types), referred to as imputing mixed-attribute data sets. Although many real applications are in this setting, there is no estimator designed for imputing mixed-attribute data sets. This paper first proposes two consistent estimators for discrete and continuous missing target values, respectively. And then, a mixture-kernel-based iterative estimator is advocated to impute mixed-attribute data sets. The proposed method is evaluated with extensive experiments compared with some typical algorithms, and the result demonstrates that the proposed approach is better than these existing imputation methods in terms of classification accuracy and root mean square error (RMSE) at different missing ratios.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Energy consumption data are required to perform analysis, modelling, evaluation, and optimisation of energy usage in buildings. While a variety of energy consumption data sets have been examined and reported in the literature, there is a lack of a comprehensive categorisation and analysis of the available data sets. In this study, an overview of energy consumption data of buildings is provided. Three common strategies for generating energy consumption data, i.e., measurement, survey, and simulation, are described. A number of important characteristics pertaining to each strategy and the resulting data sets are discussed. In addition, a directory of energy consumption data sets of buildings is developed. The data sets are collected from either published papers or energy related organisations. The main contributions of this study include establishing a resource pertaining to energy consumption data sets and providing information related to the characteristics and availability of the respective data sets; therefore facilitating and promoting research activities in energy consumption data analysis.