89 results for datasets

at Indian Institute of Science - Bangalore - India


Relevance: 20.00%

Abstract:

Daily rainfall datasets covering 10 years (1998-2007) of the Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) version 6 and the India Meteorological Department (IMD) gridded rain gauge product have been compared over the Indian landmass, at both large and small spatial scales. On the larger spatial scale, the pattern correlation between the two datasets on daily scales during individual years of the study period ranges from 0.4 to 0.7. The correlation improves significantly (~0.9) when the study is confined to specific wet and dry spells, each of about 5-8 days. Wavelet analysis of intraseasonal oscillations (ISO) of the southwest monsoon rainfall shows the percentage contributions of the two major modes (30-50 days and 10-20 days) to range between ~30-40% and ~5-10%, respectively, across the various years. Analysis of interannual variability shows the satellite data underestimating seasonal rainfall by ~110 mm during the southwest monsoon and overestimating it by ~150 mm during the northeast monsoon season. At high spatio-temporal scales, viz., a 1° × 1° grid, TMPA data do not correspond to the ground truth. We propose a new analysis procedure to assess the minimum spatial scale at which the two datasets are compatible with each other, by studying the contribution to total seasonal rainfall from different rainfall rate windows (at 1 mm intervals) on different spatial scales (at the daily time scale). The compatibility scale is found to be beyond a 5° × 5° average spatial scale over the Indian landmass. This will help decide the usability of TMPA products, if averaged at appropriate spatial scales, for specific process studies, e.g., at cloud, meso, or synoptic scales.
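
As a rough illustration of the proposed compatibility analysis, the sketch below (synthetic fields standing in for the TMPA and IMD grids; all array sizes and the `block_average`/`rate_window_contribution` helpers are assumptions, not the authors' code) block-averages daily rainfall to coarser grids, bins it into 1 mm day⁻¹ rate windows, and compares each window's contribution to the seasonal total between the two datasets.

```python
import numpy as np

def block_average(field, k):
    """Average a (time, ny, nx) field over k x k blocks of grid cells."""
    t, ny, nx = field.shape
    ny2, nx2 = ny // k, nx // k
    f = field[:, :ny2 * k, :nx2 * k]
    return f.reshape(t, ny2, k, nx2, k).mean(axis=(2, 4))

def rate_window_contribution(field, max_rate=100):
    """Fraction of total seasonal rainfall from each 1 mm/day rate window."""
    total = field.sum()
    contrib = np.array([field[(field >= lo) & (field < lo + 1)].sum()
                        for lo in np.arange(0, max_rate, 1.0)])
    return contrib / total

# Hypothetical daily fields on a 0.25-degree grid for one season (days, ny, nx)
rng = np.random.default_rng(0)
tmpa = rng.gamma(0.3, 10.0, size=(120, 80, 80))
imd = rng.gamma(0.3, 10.0, size=(120, 80, 80))

for k in (1, 4, 20):  # roughly 0.25-, 1- and 5-degree averaging scales
    c1 = rate_window_contribution(block_average(tmpa, k))
    c2 = rate_window_contribution(block_average(imd, k))
    # Similarity of the two contribution curves as a compatibility measure
    print(k, np.corrcoef(c1, c2)[0, 1])
```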

Relevance: 20.00%

Abstract:

This paper describes a spatio-temporal registration approach for speech articulation data obtained from electromagnetic articulography (EMA) and real-time magnetic resonance imaging (rtMRI). This is motivated by the potential for combining the complementary advantages of both types of data. The registration method is validated on EMA and rtMRI datasets obtained at different times, but using the same stimuli. The aligned corpus offers the advantages of high temporal resolution (from EMA) and a complete mid-sagittal view (from rtMRI). The co-registration also yields optimum placement of EMA sensors as articulatory landmarks on the magnetic resonance images, thus providing richer spatio-temporal information about articulatory dynamics. © 2014 Acoustical Society of America.
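
The paper's registration algorithm is not reproduced here, but a common building block for landmark-based co-registration of this kind is least-squares rigid (Procrustes/Kabsch) alignment; the sketch below illustrates only that step, with hypothetical landmark coordinates standing in for EMA sensor positions and points picked on MRI frames.

```python
import numpy as np

def rigid_align(src, dst):
    """Return rotation R and translation t minimizing ||R @ src_i + t - dst_i||."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1, d]) @ U.T
    return R, mu_d - R @ mu_s

# Hypothetical corresponding 2-D landmarks (tongue tip, tongue body, lips, ...)
ema_pts = np.array([[0.0, 0.0], [1.0, 0.2], [2.0, -0.1], [3.0, 0.4]])
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta), np.cos(theta)]])
mri_pts = ema_pts @ R_true.T + np.array([5.0, -2.0])

R, t = rigid_align(ema_pts, mri_pts)
print(np.allclose(ema_pts @ R.T + t, mri_pts))  # True: alignment recovered
```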

Relevance: 10.00%

Abstract:

Neural data are inevitably contaminated by noise. When such noisy data are subjected to statistical analysis, misleading conclusions can be reached. Here we attempt to address this problem by applying a state-space smoothing method, based on the combined use of Kalman filter theory and the Expectation-Maximization algorithm, to denoise two datasets of local field potentials recorded from monkeys performing a visuomotor task. For the first dataset, analysis of high gamma band (60-90 Hz) neural activity in the prefrontal cortex proved highly susceptible to the effect of noise, and denoising led to markedly improved, physiologically interpretable results. For the second dataset, Granger causality between the primary motor and primary somatosensory cortices was not consistent across the two monkeys, and the effect of noise was suspected. After denoising, the discrepancy between the two subjects was significantly reduced.
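
A minimal sketch of the state-space smoothing idea, assuming a scalar local-level model (x_t = x_{t-1} + w_t, y_t = x_t + v_t) rather than the authors' exact model, and with the noise variances q and r taken as given instead of estimated by EM:

```python
import numpy as np

def kalman_smooth(y, q, r):
    """Rauch-Tung-Striebel smoothed states for a scalar local-level model."""
    n = len(y)
    xf, pf = np.zeros(n), np.zeros(n)     # filtered mean / variance
    xp, pp = np.zeros(n), np.zeros(n)     # one-step predictions
    x, p = y[0], r
    for t in range(n):
        xp[t], pp[t] = x, p + q           # predict
        k = pp[t] / (pp[t] + r)           # Kalman gain
        x = xp[t] + k * (y[t] - xp[t])    # update
        p = (1 - k) * pp[t]
        xf[t], pf[t] = x, p
    xs = xf.copy()
    for t in range(n - 2, -1, -1):        # backward smoothing pass
        g = pf[t] / pp[t + 1]
        xs[t] = xf[t] + g * (xs[t + 1] - xp[t + 1])
    return xs

# Hypothetical noisy slow signal standing in for an LFP trace
rng = np.random.default_rng(1)
truth = np.cumsum(rng.normal(0, 0.1, 500))
y = truth + rng.normal(0, 1.0, 500)
denoised = kalman_smooth(y, q=0.01, r=1.0)
print(np.mean((y - truth) ** 2), np.mean((denoised - truth) ** 2))
```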

Relevance: 10.00%

Abstract:

The development of techniques for scaling up classifiers so that they can be applied to problems with large datasets of training examples is one of the objectives of data mining. Recently, AdaBoost has become popular in the machine learning community thanks to its promising results across a variety of applications. However, training AdaBoost on large datasets is a major problem, especially when the dimensionality of the data is very high. This paper discusses the effect of high dimensionality on the training process of AdaBoost. Two preprocessing options for reducing dimensionality, namely principal component analysis and random projection, are briefly examined. Random projection subject to a probabilistic length-preserving transformation is explored further as a computationally light preprocessing step. The experimental results demonstrate the effectiveness of the proposed training process for handling high-dimensional large datasets.
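
A hedged sketch of this preprocessing pipeline using scikit-learn (the dataset, projection dimension, and ensemble size are illustrative choices, not the paper's settings): a length-preserving Johnson-Lindenstrauss-type random projection maps the data to a much lower dimension before AdaBoost is trained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.random_projection import GaussianRandomProjection

# Synthetic high-dimensional data standing in for a large real dataset
X, y = make_classification(n_samples=5000, n_features=2000,
                           n_informative=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = make_pipeline(
    GaussianRandomProjection(n_components=100, random_state=0),  # reduce dim
    AdaBoostClassifier(n_estimators=100, random_state=0),        # then boost
)
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```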

Relevance: 10.00%

Abstract:

A global climate model experiment is performed to evaluate the effect of irrigation on temperatures in several major irrigated regions of the world. The Community Atmosphere Model, version 3.3, was modified to represent irrigation for the fraction of each grid cell equipped for irrigation according to datasets from the Food and Agriculture Organization. Results indicate substantial regional differences in the magnitude of irrigation-induced cooling, which are attributed to three primary factors: differences in the extent of the irrigated area, differences in the simulated soil moisture for the control simulation (without irrigation), and the nature of the cloud response to irrigation. The last factor appeared especially important for the dry season in India, although further analysis with other models and observations is needed to verify this feedback. Comparison with observed temperatures revealed substantially lower biases in several regions for the simulation with irrigation than for the control, suggesting that the lack of irrigation may be an important component of temperature bias in this model, or that irrigation compensates for other biases. The results of this study should help to translate the results from past regional efforts, which have largely focused on the United States, to regions in the developing world that in many cases continue to experience significant expansion of irrigated land.
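
As a hedged illustration of the kind of diagnostic involved (not the study's code; the fields, grid, and regional mask are synthetic placeholders), one can compute an irrigated-area-weighted regional temperature change between the two simulations:

```python
import numpy as np

def regional_cooling(t_ctrl, t_irr, irr_frac, cell_area, mask):
    """Mean temperature change over a region, weighted by irrigated area."""
    w = irr_frac * cell_area * mask
    return ((t_irr - t_ctrl) * w).sum() / w.sum()

# Hypothetical 2-degree global grid fields
rng = np.random.default_rng(2)
t_ctrl = 288 + rng.normal(0, 5, (90, 180))        # control run
t_irr = t_ctrl - rng.uniform(0, 1.5, (90, 180))   # irrigation run (cooler)
irr_frac = rng.uniform(0, 0.4, (90, 180))         # FAO-style irrigated fraction
lat = np.deg2rad(np.linspace(-89, 89, 90))
cell_area = np.cos(lat)[:, None] * np.ones((90, 180))
region = np.zeros((90, 180))
region[48:62, 120:135] = 1                        # crude box mask for a region

print(regional_cooling(t_ctrl, t_irr, irr_frac, cell_area, region))
```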

Relevance: 10.00%

Abstract:

A nucleosome forms the basic unit of chromosome structure. A biologically relevant question is how much of the nucleosomal conformational space is accessible to protein-free DNA, and what proportion of nucleosomal conformations are induced by the bound histones. To investigate this, we have analysed high-resolution X-ray crystal structure datasets of DNA in protein-free as well as protein-bound forms, and compared the dinucleotide step parameters for the two datasets with those for high-resolution nucleosome structures. Our analysis shows that most of the dinucleotide step parameter values for the nucleosome structures lie within the range accessible to protein-free DNA, indirectly indicating that the histone core plays more of a stabilizing role. The nucleosome structures are observed to assume a smooth and nearly planar curvature, implying that 'normal' B-DNA-like parameters can give rise to a curved geometry at the gross structural level. Different nucleosome
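
A toy sketch of the range comparison described above, with random numbers standing in for the curated crystallographic tables (the six step-parameter names are standard; the data and the 1st-99th percentile range definition are assumptions):

```python
import numpy as np

params = ["shift", "slide", "rise", "tilt", "roll", "twist"]
rng = np.random.default_rng(3)
free_dna = {p: rng.normal(0, 1.0, 400) for p in params}    # protein-free set
nucleosome = {p: rng.normal(0, 1.1, 300) for p in params}  # nucleosome set

for p in params:
    lo, hi = np.percentile(free_dna[p], [1, 99])  # "accessible" range
    inside = np.mean((nucleosome[p] >= lo) & (nucleosome[p] <= hi))
    print(f"{p}: {100 * inside:.1f}% of nucleosome steps within free-DNA range")
```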

Relevance: 10.00%

Abstract:

This paper presents a chance-constrained programming approach for constructing maximum-margin classifiers that are robust to interval-valued uncertainty in the training examples. The methodology ensures that uncertain examples are classified correctly with high probability by employing chance constraints. The main contribution of the paper is to pose the resulting optimization problem as a Second Order Cone Program by using large-deviation inequalities due to Bernstein. Apart from the support and mean of the uncertain examples, these Bernstein-based relaxations make no further assumptions about the underlying uncertainty. Classifiers built using the proposed approach are less conservative, yield higher margins, and hence are expected to generalize better than existing methods. Experimental results on synthetic and real-world datasets show that the proposed classifiers are better equipped to handle interval-valued uncertainty than the state of the art.
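
A simplified sketch of the resulting Second Order Cone Program in cvxpy: each example is known only through its mean mu_i and interval half-widths delta_i (its support), and the margin is tightened per example by a term proportional to the norm of diag(delta_i) w. The constant kappa is a placeholder for the paper's Bernstein-derived coefficient, and the data are synthetic.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
n, d, kappa, C = 60, 5, 1.5, 1.0
mu = rng.normal(0, 1, (n, d))                 # means of uncertain examples
delta = rng.uniform(0.05, 0.2, (n, d))        # interval half-widths (support)
y = np.where(mu[:, 0] + 0.3 * rng.normal(0, 1, n) >= 0, 1.0, -1.0)

w, b = cp.Variable(d), cp.Variable()
xi = cp.Variable(n, nonneg=True)              # slack variables
constraints = [
    y[i] * (mu[i] @ w + b)
    >= 1 - xi[i] + kappa * cp.norm(cp.multiply(delta[i], w))
    for i in range(n)                         # one SOC constraint per example
]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)),
                  constraints)
prob.solve()
print(prob.status, np.round(w.value, 3))
```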

Relevance: 10.00%

Abstract:

This paper suggests a scheme for classifying online handwritten characters, based on dynamic space warping of the strokes within the characters. A method for segmenting components into strokes using velocity profiles is proposed. Each stroke is a simple arbitrary shape and is encoded using three attributes. Correspondence between strokes is established using dynamic space warping. A distance measure that reliably differentiates between two corresponding simple shapes (strokes) has been formulated, thus yielding a perceptual distance measure between any two characters. Tests indicate an accuracy of over 85% on two different datasets of characters.
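
A minimal sketch of the warping step, using classical dynamic-programming alignment over per-stroke feature vectors; the paper's three-attribute stroke encoding and its specific perceptual distance are replaced here by illustrative vectors and a Euclidean local cost.

```python
import numpy as np

def stroke_dtw(a, b):
    """Dynamic-warping distance between two sequences of stroke features."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local stroke distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two characters as sequences of 3-attribute strokes (illustrative values)
char_a = np.array([[0.1, 0.9, 0.5], [0.8, 0.2, 0.3], [0.4, 0.4, 0.9]])
char_b = np.array([[0.1, 0.8, 0.5], [0.7, 0.2, 0.35]])
print(stroke_dtw(char_a, char_b))
```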

Relevance: 10.00%

Abstract:

The application of computer-aided inspection, integrated with coordinate measuring machines and laser scanners, to inspect manufactured aircraft parts using robust registration of two point datasets is a subject of active research in computational metrology. This paper presents a novel approach to automated inspection by matching shapes based on a modified iterative closest point (ICP) method to define a criterion for the acceptance or rejection of a part. The procedure improves upon existing methods by eliminating the need to construct either a tessellated or smooth representation of the inspected part, and by dropping the requirement of a priori knowledge of approximate registration and correspondence between the points representing the computer-aided design dataset and the part to be inspected. In addition, the procedure establishes a better measure of error between the two matched datasets, with localized region-based triangulation proposed for tracking the error. The approach improves the convergence of the ICP technique with a dramatic decrease in computational effort. Experimental results obtained by implementing the approach on both synthetic and practical data show that the method is efficient and robust, validating the algorithm, and the examples demonstrate its potential for use in engineering applications.
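
For orientation, a plain ICP baseline is sketched below (nearest-neighbour correspondences from a k-d tree plus SVD-based rigid alignment); the paper's modifications, such as region-based triangulation for error tracking, are not reproduced.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid reflections
    R = Vt.T @ np.diag([1, 1, d]) @ U.T
    return R, mu_d - R @ mu_s

def icp(src, dst, iters=30):
    tree = cKDTree(dst)
    cur = src.copy()
    for _ in range(iters):
        _, idx = tree.query(cur)            # closest CAD point per scan point
        R, t = best_rigid(cur, dst[idx])
        cur = cur @ R.T + t
    return cur, np.mean(np.linalg.norm(cur - dst[idx], axis=1))

# Hypothetical CAD points and a rotated, translated, noisy scan of the part
rng = np.random.default_rng(5)
cad = rng.uniform(-1, 1, (500, 3))
ang = 0.2
R0 = np.array([[np.cos(ang), -np.sin(ang), 0],
               [np.sin(ang), np.cos(ang), 0],
               [0, 0, 1]])
scan = cad @ R0.T + 0.1 + rng.normal(0, 0.002, (500, 3))
aligned, err = icp(scan, cad)
print(err)  # a small residual suggests the part matches the CAD model
```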

Relevance: 10.00%

Abstract:

Support Vector Machines (SVMs) are hyperplane classifiers defined in a kernel-induced feature space. The data-size-dependent training time complexity of SVMs usually prohibits their use in applications involving more than a few thousand data points. In this paper we propose a novel kernel-based incremental data clustering approach and its use for scaling nonlinear Support Vector Machines to handle large datasets. The clustering method introduced can find cluster abstractions of the training data in a kernel-induced feature space. These cluster abstractions are then used for selective-sampling-based training of Support Vector Machines, reducing the training time without compromising generalization performance. Experiments with real-world datasets show that this approach gives good generalization performance at reasonable computational expense.
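
A hedged sketch of the selective-sampling idea with scikit-learn, substituting a Nystroem kernel approximation plus k-means for the paper's own incremental kernel clustering: cluster in an approximate kernel feature space, keep a few points per cluster, and train a nonlinear SVM on that reduced set.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.svm import SVC

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)

phi = Nystroem(gamma=0.1, n_components=200, random_state=0)
Z = phi.fit_transform(X)                       # approximate kernel space
km = MiniBatchKMeans(n_clusters=200, n_init=3, random_state=0).fit(Z)

# keep the few points closest to each cluster centre as the training sample
keep = []
for c in range(km.n_clusters):
    members = np.where(km.labels_ == c)[0]
    d = np.linalg.norm(Z[members] - km.cluster_centers_[c], axis=1)
    keep.extend(members[np.argsort(d)[:10]])

svm = SVC(kernel="rbf", gamma=0.1).fit(X[keep], y[keep])
print(len(keep), svm.score(X, y))
```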

Relevance: 10.00%

Abstract:

The k-means algorithm is an extremely popular technique for clustering data. One of its major limitations is that the time to cluster a given dataset D is linear in the number of clusters, k. In this paper, we employ height-balanced trees to address this issue. Specifically, we make two major contributions: (a) we propose an algorithm, RACK (an acronym for RApid Clustering using k-means), whose running time compares favorably with the fastest known existing techniques, and (b) we prove an expected bound on the quality of clustering achieved using RACK. Our experimental results on large datasets strongly suggest that RACK is competitive with the k-means algorithm in terms of clustering quality, while taking significantly less time.
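
RACK itself is not specified above, but the general trick of accelerating the assignment step with a spatial tree can be sketched as follows; a k-d tree over the centroids serves as a stand-in for the height-balanced trees the paper uses.

```python
import numpy as np
from scipy.spatial import cKDTree

def kmeans_tree(X, k, iters=20, seed=0):
    """Lloyd's k-means with tree-accelerated nearest-centroid assignment."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        _, labels = cKDTree(centers).query(X)   # nearest centroid per point
        for c in range(k):
            pts = X[labels == c]
            if len(pts):
                centers[c] = pts.mean(axis=0)   # recompute centroids
    return centers, labels

X = np.random.default_rng(6).normal(0, 1, (100000, 8))
centers, labels = kmeans_tree(X, k=256)
print(centers.shape, np.bincount(labels).min())
```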

Relevance: 10.00%

Abstract:

Classification of large datasets is a challenging task in Data Mining. In the current work, we propose a novel method that compresses the data and classifies the test data directly in its compressed form. The work forms a hybrid learning approach integrating the activities of data abstraction, frequent item generation, compression, classification and use of rough sets.


Relevance: 10.00%

Abstract:

This study uses the European Centre for Medium-Range Weather Forecasts (ECMWF) model-generated high-resolution 10-day predictions for the Year of Tropical Convection (YOTC) 2008. Precipitation forecast skill of the model over the tropics is evaluated against Tropical Rainfall Measuring Mission (TRMM) estimates. The model was able to capture the monthly to seasonal mean features of tropical convection reasonably well. Northward propagation of convective bands over the Bay of Bengal was also forecast realistically up to 5 days in advance, including the onset phase of the monsoon during the first half of June 2008. However, large errors exist in the daily datasets, especially at longer lead times over smaller domains. At shorter lead times (less than 4-5 days), forecast errors are much smaller over the oceans than over land. Moreover, the rate of increase of errors with lead time is rapid over the oceans and is confined to regions where observed precipitation shows large day-to-day variability. This rapid growth of errors over the oceans is related to the spatial pattern of near-surface air temperature, probably because of the one-way air-sea interaction in the atmosphere-only model used for forecasting. While the prescribed surface temperature over the oceans remains realistic at shorter lead times, the pattern, and hence the gradient, of the surface temperature is not altered by changes in atmospheric parameters at longer lead times. The ECMWF model also had considerable difficulty forecasting very low and very heavy precipitation intensities over South Asia: it has too few grid points with zero precipitation or with heavy (>40 mm day⁻¹) precipitation, while drizzle-like precipitation is too frequent compared with the TRMM datasets. Further analysis shows that a major source of error in the ECMWF precipitation forecasts is the diurnal cycle over the South Asian monsoon region. The peak intensity of precipitation in the model forecasts over land (ocean) appears about 6 (9) h earlier than in the observations, and the amplitude of the diurnal cycle is much higher in the model forecasts than in the TRMM estimates. The phase error of the diurnal cycle increases with forecast lead time, and the error in monthly mean 3-hourly precipitation forecasts is about 2-4 times the error in the daily mean datasets. Effort should therefore be devoted to improving the phase and amplitude of the forecast diurnal cycle of precipitation in the model.
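
A minimal sketch of the diurnal-cycle diagnostic (illustrative values, not ECMWF or TRMM data): fit the first diurnal harmonic to a mean 3-hourly precipitation cycle and compare amplitude and peak hour between forecast and observation.

```python
import numpy as np

def diurnal_harmonic(cycle):
    """Amplitude and peak hour of the first harmonic of a 3-hourly cycle."""
    c = np.fft.rfft(cycle)[1]                   # first diurnal harmonic
    amp = 2 * np.abs(c) / len(cycle)
    peak_hour = (-np.angle(c)) * 24 / (2 * np.pi) % 24
    return amp, peak_hour

hours = np.arange(0, 24, 3)
obs = 5 + 3 * np.cos(2 * np.pi * (hours - 15) / 24)   # peak at 15 local time
fcst = 5 + 5 * np.cos(2 * np.pi * (hours - 9) / 24)   # too strong, too early
amp_o, ph_o = diurnal_harmonic(obs)
amp_f, ph_f = diurnal_harmonic(fcst)
print(f"phase error {ph_f - ph_o:+.1f} h, amplitude ratio {amp_f / amp_o:.2f}")
```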

Relevance: 10.00%

Abstract:

We have analyzed the set of inter- and intra-base pair parameters for each dinucleotide step in single-crystal structures of dodecamers, solved at high and medium resolution and all crystallized in the P2₁2₁2₁ space group. The objective was to identify whether all the structures, which have either the Drew-Dickerson (DD) sequence d[CGCGAATTCGCG] with some base modification or a related (non-DD) sequence, display the same sequence-dependent structural variability about the palindromic sequence, despite the molecule being bent at one end by a similar crystal lattice packing effect. Most of the local doublet parameters for the base pair steps at the G2-C3 and G10-C11 positions, symmetrically situated about the lateral two-fold, were significantly correlated with each other. In the non-DD sequences, significant correlations between these positional parameters were absent. The different ranges of local step parameter values at each sequence position contribute to the gross feature of smooth helix-axis bending in all structures. The base pair parameters at some positions in the medium-resolution DD-sequence set are quite unlike those in the high-resolution set and span a wider range of values. Twist and slide are the two main parameters showing a wider conformational range in the middle region of the non-DD-sequence structures compared with the DD-sequence structures. By contrast, the minor and major groove features bear good resemblance between the DD and non-DD sequence crystal structure datasets. The sugar-phosphate backbone torsion angles are similar in all structures, in sharp contrast to the base pair parameter variation between the high- and low-resolution DD and non-DD sequence structures, with an unusual (ε = g⁻, ζ = t) B-II conformation at the 10th position of the dodecamer sequence. Thus, examining DD and non-DD sequence structures packed in the same crystal lattice arrangement, we infer that the inter- and intra-base pair parameters at symmetry-related steps of the palindromic DD sequence are equivalent in value about the lateral two-fold axis. This leads us to agree with the conclusion that DNA conformation is not substantially affected by end-to-end or lateral intermolecular interactions due to crystal lattice packing. Non-DD sequence structures acquire step parameter values that reflect the altered sequence at each dodecamer position in the orthorhombic lattice, while showing gross features similar to those of the DD-sequence structures.
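
A toy sketch of the symmetry test (random values stand in for the crystallographic step-parameter tables): for a palindromic dodecamer, the parameter value at step j should correlate across structures with the value at the symmetry-related step 12 - j.

```python
import numpy as np

rng = np.random.default_rng(7)
n_struct, n_steps = 20, 11                  # 11 dinucleotide steps per dodecamer
base = rng.normal(34, 3, (n_struct, n_steps))
# build a roughly palindrome-symmetric twist table plus noise
twist = (base + base[:, ::-1]) / 2 + rng.normal(0, 0.5, (n_struct, n_steps))

for j in range(n_steps // 2):
    r = np.corrcoef(twist[:, j], twist[:, n_steps - 1 - j])[0, 1]
    print(f"step {j + 1} vs step {n_steps - j}: r = {r:.2f}")
```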