65 results for data sets


Relevance:

60.00%

Publisher:

Abstract:

The variability of the sea surface salinity (SSS) in the Indian Ocean is studied using a 100-year control simulation of the Community Climate System Model (CCSM 2.0). The monsoon-driven seasonal SSS pattern in the Indian Ocean, marked by low salinity in the east and high salinity in the west, is captured by the model. The model overestimates runoff into the Bay of Bengal due to higher rainfall over the Himalayan-Tibetan regions, which drain into the Bay of Bengal through the Ganga-Brahmaputra rivers. The outflow of low-salinity water from the Bay of Bengal is too strong in the model. Consequently, the model Indian Ocean SSS is about 1 less than that seen in the climatology. The seasonal Indian Ocean salt balance obtained from the model is consistent with the analysis from climatological data sets. During summer, the large freshwater input into the Bay of Bengal and its redistribution decide the spatial pattern of salinity tendency. During winter, horizontal advection is the dominant contributor to the tendency term. The interannual variability of the SSS in the Indian Ocean is about five times larger than that in coupled model simulations of the North Atlantic Ocean. Regions of large interannual standard deviations are located near river mouths in the Bay of Bengal and in the eastern equatorial Indian Ocean. Both freshwater input into the ocean and advection of this anomalous flux are responsible for the generation of these anomalies. The model simulates 20 significant Indian Ocean Dipole (IOD) events, and during IOD years large salinity anomalies appear in the equatorial Indian Ocean. The anomalies exist as two zonal bands: negative salinity anomalies to the north of the equator and positive to the south. The SSS anomalies for the years in which IOD is not present and for ENSO years are much weaker than during IOD years. Significant interannual SSS anomalies appear in the Indian Ocean only during IOD years.

Relevance:

60.00%

Publisher:

Abstract:

The K-means algorithm is a well-known nonhierarchical method for clustering data. The most important limitations of this algorithm are that (1) it gives final clusters on the basis of the cluster centroids or the seed points chosen initially, and (2) it is appropriate only for data sets having fairly isotropic clusters. However, it has the advantage of low computation and storage requirements. On the other hand, the hierarchical agglomerative clustering algorithm, which can cluster nonisotropic (chain-like and concentric) clusters, has high storage and computation requirements. This paper suggests a new method for selecting the initial seed points, so that the K-means algorithm gives the same results for any input data order. This paper also describes a hybrid clustering algorithm, based on the concepts of multilevel theory, which is nonhierarchical at the first level and hierarchical from the second level onwards, to cluster data sets having (i) chain-like clusters and (ii) concentric clusters. It is observed that this hybrid clustering algorithm gives the same results as the hierarchical clustering algorithm, with lower computation and storage requirements.
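The abstract does not say how the seed points are selected, so the following is only a minimal sketch of the general idea of order-independent seeding, not the paper's method. It assumes a farthest-point rule applied to a sorted copy of the data; the function names `deterministic_seeds` and `kmeans` are hypothetical.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two points given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    # Coordinate-wise mean of a non-empty list of tuples.
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def deterministic_seeds(points, k):
    # Assumed rule (not from the paper): sort the data so input order is
    # irrelevant, start from the point farthest from the global mean, then
    # repeatedly add the point farthest from all seeds chosen so far.
    pts = sorted(points)
    center = mean(pts)
    seeds = [max(pts, key=lambda p: dist2(p, center))]
    while len(seeds) < k:
        seeds.append(max(pts, key=lambda p: min(dist2(p, s) for s in seeds)))
    return seeds

def kmeans(points, k, iters=100):
    # Standard Lloyd iterations, started from the deterministic seeds.
    pts = sorted(points)
    centroids = deterministic_seeds(pts, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            j = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[j].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids

data = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4), (9.0, 9.0), (9.5, 9.2), (9.1, 8.8)]
shuffled = data[:]
random.Random(1).shuffle(shuffled)
# Both calls see the same sorted data, so they return identical centroids.
print(kmeans(data, 2) == kmeans(shuffled, 2))
```

Sorting the input is the simplest way to remove order dependence in a sketch like this; it costs an extra O(n log n) step and is an illustrative choice only.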

Relevance:

60.00%

Publisher:

Abstract:

Crystal structure determination at room temperature [292(2) K] of racemic 1,1'-binaphthalene-2,2'-diyl diethyl bis(carbonate), C26H22O6, showed that one of the terminal carbon-carbon bond lengths is very short [C(sp3)-C(sp3) = 1.327(6) Å]. The reason for such a short bond length has been analysed by collecting data sets on the same crystal at 393, 150 and 90 K. The values of the corrected bond lengths clearly suggest that the shortening is mainly due to positional disorder at two sites, with minor perturbations arising as a result of thermal vibrations. The positional disorder has been resolved in the analysis of the 90 K data following the changes in the unit-cell parameters for the data sets at 150 and 90 K, which appear to be an artifact of a near centre of symmetry relationship between the two independent molecules in the space group $P\bar{1}$ at these temperatures. Indeed, the unit cell at low temperature (150 and 90 K) is a supercell of the room-temperature unit cell.

Relevance:

60.00%

Publisher:

Abstract:

Support Vector Machines (SVMs) are hyperplane classifiers defined in a kernel-induced feature space. The training time complexity of SVMs, which depends on the data size, usually prohibits their use in applications involving more than a few thousand data points. In this paper we propose a novel kernel-based incremental data clustering approach and its use for scaling nonlinear Support Vector Machines to handle large data sets. The clustering method introduced can find cluster abstractions of the training data in a kernel-induced feature space. These cluster abstractions are then used for selective-sampling-based training of Support Vector Machines to reduce the training time without compromising the generalization performance. Experiments done with real-world data sets show that this approach gives good generalization performance at reasonable computational expense.

Relevance:

60.00%

Publisher:

Abstract:

A popular dynamic imaging technique, k-t BLAST (ktB), is studied here for BAR imaging. ktB utilizes correlations in k-space and time to reconstruct the image time series with only a fraction of the data. The algorithm works by unwrapping the aliased Fourier conjugate space of k-t (y-f-space). The unwrapping process utilizes an estimate of the true y-f-space, obtained by acquiring densely sampled low k-space data. The drawbacks of this method include a separate training scan, blurred training estimates and aliased phase maps. The proposed changes are the incorporation of phase information from the training map and the use of a generalized-series-extrapolated training map. The proposed technique is compared with ktB on real fMRI data. The proposed changes allow ktB to operate at an acceleration factor of 6. Performance is evaluated by comparing activation maps obtained using reconstructed images. An improvement of up to 10 dB is observed in the PSNR of activation maps. In addition, a 10% reduction in RMSE is obtained over the entire time series of fMRI images. Peak improvement of the proposed method over ktB is 35%, averaged over five data sets. (C) 2010 Elsevier Inc. All rights reserved.

Relevance:

60.00%

Publisher:

Abstract:

A plethora of indices have been proposed and used to construct dominance hierarchies in a variety of vertebrate and invertebrate societies, although the rationale for choosing a particular index for a particular species is seldom explained. In this study, we analysed and compared three such indices, viz Clutton-Brock et al.'s index (CBI), originally developed for red deer, Cervus elaphus, David's score (DS) originally proposed by the statistician H. A. David and the frequency-based index of dominance (FDI) developed and routinely used by our group for the primitively eusocial wasps Ropalidia marginata and Ropalidia cyathiformis. Dominance ranks attributed by all three indices were strongly and positively correlated for both natural data sets from the wasp colonies and for artificial data sets generated for the purpose. However, the indices differed in their ability to yield unique (untied) ranks in the natural data sets. This appears to be caused by the presence of noninteracting individuals and reversals in the direction of dominance in some of the pairs in the natural data sets. This was confirmed by creating additional artificial data sets with noninteracting individuals and with reversals. Based on the criterion of yielding the largest proportion of unique ranks, we found that FDI is best suited for societies such as the wasps belonging to Ropalidia, DS is best suited for societies with reversals and CBI remains a suitable index for societies such as red deer in which multiple interactions are uncommon. (C) 2009 The Association for the Study of Animal Behaviour. Published by Elsevier Ltd. All rights reserved.

Relevance:

60.00%

Publisher:

Abstract:

This paper evaluates multiclass support vector machine (SVM) methods for effective use in distance relay coordination. It also describes a strategy of supportive systems to aid the conventional protection philosophy in combating situations where protection systems have maloperated and/or information is missing, and to provide selective and secure coordination. SVMs have considerable potential as zone classifiers for distance relay coordination. This typically requires a multiclass SVM classifier to effectively analyze/build the underlying concept between the reach of different zones and the apparent impedance trajectory during a fault. Several methods have been proposed for multiclass classification in which several binary SVM classifiers are typically combined. Some authors have extended binary SVM classification to a one-step single optimization operation considering all classes at once. In this paper, one-step multiclass classification, one-against-all, and one-against-one multiclass methods are compared for their performance with respect to accuracy, number of iterations, number of support vectors, and training and testing time. The performance analysis of these three methods is presented on three data sets belonging to training and testing patterns of three supportive systems for a region and part of a network, which is an equivalent 526-bus system of the practical Indian Western grid.
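To make the difference between the one-against-all and one-against-one decompositions concrete, here is a small sketch that only enumerates the binary subproblems and combines pairwise decisions by voting. It trains no SVM, and the zone labels and function names are hypothetical, chosen for illustration rather than taken from the paper.

```python
from collections import Counter
from itertools import combinations

def one_against_all_problems(classes):
    # One binary problem per class: that class versus all the others.
    return [(c, [o for o in classes if o != c]) for c in classes]

def one_against_one_problems(classes):
    # One binary problem per unordered pair of classes: k * (k - 1) / 2 in total.
    return list(combinations(classes, 2))

def majority_vote(pairwise_winners):
    # Combine the winners of the pairwise classifiers; when counts tie,
    # the label encountered first is returned.
    return Counter(pairwise_winners).most_common(1)[0][0]

zones = ["zone1", "zone2", "zone3", "zone4"]  # hypothetical class labels
print(len(one_against_all_problems(zones)))   # number of one-against-all classifiers
print(len(one_against_one_problems(zones)))   # number of one-against-one classifiers
print(majority_vote(["zone2", "zone2", "zone3"]))
```

With k classes, one-against-all needs k binary classifiers trained on all the data, while one-against-one needs k(k-1)/2 classifiers, each trained on only two classes.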

Relevance:

60.00%

Publisher:

Abstract:

The Core Vector Machine (CVM) is suitable for efficient large-scale pattern classification. In this paper, a method is proposed for improving the performance of a CVM with a Gaussian kernel function irrespective of the ordering of patterns belonging to different classes within the data set. This method employs selective-sampling-based training of the CVM using a novel kernel-based scalable hierarchical clustering algorithm. Empirical studies made on synthetic and real-world data sets show that the proposed strategy performs well on large data sets.

Relevance:

60.00%

Publisher:

Abstract:

We present two new support vector approaches for ordinal regression. These approaches find the concentric spheres with minimum volume that contain most of the training samples. Both approaches guarantee that the radii of the spheres are properly ordered at the optimal solution. The size of the optimization problem is linear in the number of training samples. The popular SMO algorithm is adapted to solve the resulting optimization problem. Numerical experiments on some real-world data sets verify the usefulness of our approaches for data mining.

Relevance:

60.00%

Publisher:

Abstract:

This paper discusses a method for scaling an SVM with a Gaussian kernel function to handle large data sets by using a selective sampling strategy for the training set. It employs a scalable hierarchical clustering algorithm to construct cluster indexing structures of the training data in the kernel-induced feature space. These are then used for selective sampling of the training data for the SVM to impart scalability to the training process. Empirical studies made on real-world data sets show that the proposed strategy performs well on large data sets.

Relevance:

60.00%

Publisher:

Abstract:

Deterministic models have been widely used to predict water quality in distribution systems, but their calibration requires extensive and accurate data sets for numerous parameters. In this study, alternative data-driven modeling approaches based on artificial neural networks (ANNs) were used to predict temporal variations of two important characteristics of water quality: chlorine residual and biomass concentrations. The authors considered three types of ANN algorithms. Of these, the Levenberg-Marquardt algorithm provided the best results in predicting residual chlorine and biomass with error-free and "noisy" data. The ANN models developed here can generate water quality scenarios of piped systems in real time to help utilities determine weak points of low chlorine residual and high biomass concentration and select optimum remedial strategies.

Relevance:

60.00%

Publisher:

Abstract:

It is important to identify the "correct" number of topics in mechanisms like Latent Dirichlet Allocation (LDA), as it determines the quality of the features that are presented to classifiers like SVM. In this work we propose a measure to identify the correct number of topics and offer empirical evidence in its favor in terms of classification accuracy and the number of topics that are naturally present in the corpus. We show the merit of the measure by applying it to real-world as well as synthetic data sets (both text and images). In proposing this measure, we view LDA as a matrix factorization mechanism, wherein a given corpus C is split into two matrix factors $M_1$ and $M_2$ as given by $C_{d \times w} = M1_{d \times t} \times Q_{t \times w}$, where d is the number of documents present in the corpus and w is the size of the vocabulary. The quality of the split depends on "t", the number of topics chosen. The measure is computed in terms of the symmetric KL-divergence of salient distributions that are derived from these matrix factors. We observe that the divergence values are higher for non-optimal numbers of topics; this is shown by a "dip" at the right value for "t".
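The abstract does not specify which "salient distributions" are compared, so the sketch below only illustrates the symmetric KL-divergence itself for two discrete distributions. The function names are hypothetical, and the code assumes both inputs are valid probability vectors of equal length with no zero entry in one where the other is positive.

```python
import math

def kl(p, q):
    # Kullback-Leibler divergence KL(p || q) for discrete distributions.
    # Terms with p_i == 0 contribute nothing; q_i must be positive wherever p_i is.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    # Symmetric form: KL(p || q) + KL(q || p).
    return kl(p, q) + kl(q, p)

p = [0.7, 0.2, 0.1]  # illustrative distributions, not data from the paper
q = [0.4, 0.4, 0.2]
print(symmetric_kl(p, q))
print(symmetric_kl(p, p))
```

Under the measure described above, one would evaluate such a divergence for each candidate number of topics t and look for the value of t at which it dips.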

Relevance:

60.00%

Publisher:

Abstract:

Gaussian Processes (GPs) are promising Bayesian methods for classification and regression problems. They have also been used for semi-supervised learning tasks. In this paper, we propose a new algorithm for solving the semi-supervised binary classification problem using sparse GP regression (GPR) models. It is closely related to semi-supervised learning based on support vector regression (SVR) and maximum margin clustering. The proposed algorithm is simple and easy to implement. Unlike the SVR-based algorithm, it gives a sparse solution directly. Also, the hyperparameters are estimated easily without resorting to expensive cross-validation. The use of a sparse GPR model helps make the proposed algorithm scalable. Preliminary results on synthetic and real-world data sets demonstrate the efficacy of the new algorithm.

Relevance:

60.00%

Publisher:

Abstract:

An algorithm is presented for generating a minimal spanning tree when the nodes, with their coordinates in some m-dimensional Euclidean space, and the corresponding metric are given. This algorithm is tested on manually generated data sets. The worst case time complexity of this algorithm is O(n log2n) for a collection of n data samples.
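The abstract gives no details of the algorithm, so the following sketch uses the textbook Prim's algorithm on the complete Euclidean graph purely to illustrate the problem setting. It runs in O(n^2) time and does not reproduce the paper's method or its stated complexity; the function name `prim_mst` is hypothetical.

```python
import math

def prim_mst(points):
    # Prim's algorithm on the complete graph whose edge weights are
    # Euclidean distances between points (tuples of equal dimension).
    # Returns a list of (parent_index, child_index, weight) edges.
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # Relax connection costs of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tree = prim_mst(square)
print(tree)
print(sum(w for _, _, w in tree))
```

Because `math.dist` accepts points of any dimension, the same function works for the m-dimensional case mentioned above.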

Relevance:

60.00%

Publisher:

Abstract:

The K-means algorithm for clustering depends strongly on the initial seed values. We use a genetic algorithm to find a near-optimal partitioning of the given data set by selecting proper initial seed values for the K-means algorithm. The results obtained are very encouraging: in most cases, on data sets having well-separated clusters, the proposed scheme reached a global minimum.
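The abstract does not describe the encoding or the genetic operators, so the sketch below is one plausible arrangement rather than the authors' scheme. It assumes a chromosome is a list of k data-point indices used as seeds, fitness is the sum of squared distances from each point to its nearest seed, and selection is elitist truncation with one-point crossover and random mutation. All names and parameter values are hypothetical.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two points given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, seeds):
    # Sum of squared distances from each point to its nearest seed.
    return sum(min(dist2(p, s) for s in seeds) for p in points)

def ga_seeds(points, k, pop_size=20, generations=30, mutation_rate=0.2, rng=None):
    # Assumed GA design (not from the paper): index-list chromosomes,
    # truncation selection that keeps the better half, one-point crossover,
    # and mutation that replaces one gene with a random point index.
    rng = rng or random.Random(0)
    n = len(points)

    def fitness(chromosome):
        return sse(points, [points[i] for i in chromosome])

    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:
                child[rng.randrange(k)] = rng.randrange(n)
            children.append(child)
        population = survivors + children
    best = min(population, key=fitness)
    return [points[i] for i in best]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
seeds = ga_seeds(points, 2)
print(seeds, sse(points, seeds))
```

The returned seeds would then be passed to an ordinary K-means run as its initial centroids; because the better half of each generation is kept unchanged, the best fitness never gets worse from one generation to the next.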