65 resultados para data sets

em Indian Institute of Science - Bangalore - Índia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider an inverse elasticity problem in which forces and displacements are known on the boundary and the material property distribution inside the body is to be found. In other words, we need to estimate the distribution of constitutive properties using the finite boundary data sets. Uniqueness of the solution to this problem is proved in the literature only under certain assumptions for a given complete Dirichlet-to-Neumann map. Another complication in the numerical solution of this problem is that the number of boundary data sets needed to establish uniqueness is not known even under the restricted cases where uniqueness is proved theoretically. In this paper, we present a numerical technique that can assess the sufficiency of given boundary data sets by computing the rank of a sensitivity matrix that arises in the Gauss-Newton method used to solve the problem. Numerical experiments are presented to illustrate the method.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We have benchmarked the maximum obtainable recognition accuracy on five publicly available standard word image data sets using semi-automated segmentation and a commercial OCR. These images have been cropped from camera captured scene images, born digital images (BDI) and street view images. Using the Matlab based tool developed by us, we have annotated at the pixel level more than 3600 word images from the five data sets. The word images binarized by the tool, as well as by our own midline analysis and propagation of segmentation (MAPS) algorithm are recognized using the trial version of Nuance Omnipage OCR and these two results are compared with the best reported in the literature. The benchmark word recognition rates obtained on ICDAR 2003, Sign evaluation, Street view, Born-digital and ICDAR 2011 data sets are 83.9%, 89.3%, 79.6%, 88.5% and 86.7%, respectively. The results obtained from MAPS binarized word images without the use of any lexicon are 64.5% and 71.7% for ICDAR 2003 and 2011 respectively, and these values are higher than the best reported values in the literature of 61.1% and 41.2%, respectively. MAPS results of 82.8% for BDI 2011 dataset matches the performance of the state of the art method based on power law transform.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this study, we applied the integration methodology developed in the companion paper by Aires (2014) by using real satellite observations over the Mississippi Basin. The methodology provides basin-scale estimates of the four water budget components (precipitation P, evapotranspiration E, water storage change Delta S, and runoff R) in a two-step process: the Simple Weighting (SW) integration and a Postprocessing Filtering (PF) that imposes the water budget closure. A comparison with in situ observations of P and E demonstrated that PF improved the estimation of both components. A Closure Correction Model (CCM) has been derived from the integrated product (SW+PF) that allows to correct each observation data set independently, unlike the SW+PF method which requires simultaneous estimates of the four components. The CCM allows to standardize the various data sets for each component and highly decrease the budget residual (P - E - Delta S - R). As a direct application, the CCM was combined with the water budget equation to reconstruct missing values in any component. Results of a Monte Carlo experiment with synthetic gaps demonstrated the good performances of the method, except for the runoff data that has a variability of the same order of magnitude as the budget residual. Similarly, we proposed a reconstruction of Delta S between 1990 and 2002 where no Gravity Recovery and Climate Experiment data are available. Unlike most of the studies dealing with the water budget closure at the basin scale, only satellite observations and in situ runoff measurements are used. Consequently, the integrated data sets are model independent and can be used for model calibration or validation.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Background: A genetic network can be represented as a directed graph in which a node corresponds to a gene and a directed edge specifies the direction of influence of one gene on another. The reconstruction of such networks from transcript profiling data remains an important yet challenging endeavor. A transcript profile specifies the abundances of many genes in a biological sample of interest. Prevailing strategies for learning the structure of a genetic network from high-dimensional transcript profiling data assume sparsity and linearity. Many methods consider relatively small directed graphs, inferring graphs with up to a few hundred nodes. This work examines large undirected graphs representations of genetic networks, graphs with many thousands of nodes where an undirected edge between two nodes does not indicate the direction of influence, and the problem of estimating the structure of such a sparse linear genetic network (SLGN) from transcript profiling data. Results: The structure learning task is cast as a sparse linear regression problem which is then posed as a LASSO (l1-constrained fitting) problem and solved finally by formulating a Linear Program (LP). A bound on the Generalization Error of this approach is given in terms of the Leave-One-Out Error. The accuracy and utility of LP-SLGNs is assessed quantitatively and qualitatively using simulated and real data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) initiative provides gold standard data sets and evaluation metrics that enable and facilitate the comparison of algorithms for deducing the structure of networks. The structures of LP-SLGNs estimated from the INSILICO1, INSILICO2 and INSILICO3 simulated DREAM2 data sets are comparable to those proposed by the first and/or second ranked teams in the DREAM2 competition. The structures of LP-SLGNs estimated from two published Saccharomyces cerevisae cell cycle transcript profiling data sets capture known regulatory associations. In each S. cerevisiae LP-SLGN, the number of nodes with a particular degree follows an approximate power law suggesting that its degree distributions is similar to that observed in real-world networks. Inspection of these LP-SLGNs suggests biological hypotheses amenable to experimental verification. Conclusion: A statistically robust and computationally efficient LP-based method for estimating the topology of a large sparse undirected graph from high-dimensional data yields representations of genetic networks that are biologically plausible and useful abstractions of the structures of real genetic networks. Analysis of the statistical and topological properties of learned LP-SLGNs may have practical value; for example, genes with high random walk betweenness, a measure of the centrality of a node in a graph, are good candidates for intervention studies and hence integrated computational – experimental investigations designed to infer more realistic and sophisticated probabilistic directed graphical model representations of genetic networks. The LP-based solutions of the sparse linear regression problem described here may provide a method for learning the structure of transcription factor networks from transcript profiling and transcription factor binding motif data.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

It has long been thought that tropical rainfall retrievals from satellites have large errors. Here we show, using a new daily 1 degree gridded rainfall data set based on about 1800 gauges from the India Meteorology Department (IMD), that modern satellite estimates are reasonably close to observed rainfall over the Indian monsoon region. Daily satellite rainfalls from the Global Precipitation Climatology Project (GPCP 1DD) and the Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) are available since 1998. The high summer monsoon (June-September) rain over the Western Ghats and Himalayan foothills is captured in TMPA data. Away from hilly regions, the seasonal mean and intraseasonal variability of rainfall (averaged over regions of a few hundred kilometers linear dimension) from both satellite products are about 15% of observations. Satellite data generally underestimate both the mean and variability of rain, but the phase of intraseasonal variations is accurate. On synoptic timescales, TMPA gives reasonable depiction of the pattern and intensity of torrential rain from individual monsoon low-pressure systems and depressions. A pronounced biennial oscillation of seasonal total central India rain is seen in all three data sets, with GPCP 1DD being closest to IMD observations. The new satellite data are a promising resource for the study of tropical rainfall variability.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We revise and extend the extreme value statistic, introduced in Gupta et al., to study direction dependence in the high-redshift supernova data, arising either from departures, from the cosmological principle or due to direction-dependent statistical systematics in the data. We introduce a likelihood function that analytically marginalizes over the,Hubble constant and use it to extend our previous statistic. We also introduce a new statistic that is sensitive to direction dependence arising from living off-centre inside a large void as well as from previously mentioned reasons for anisotropy. We show that for large data sets, this statistic has a limiting form that can be computed analytically. We apply our statistics to the gold data sets from Riess et al., as in our previous work. Our revision and extension of the previous statistic show that the effect of marginalizing over the Hubble constant instead of using its best-fitting value on our results is only marginal. However, correction of errors in our previous work reduces the level of non-Gaussianity in the 2004 gold data that were found in our earlier work. The revised results for the 2007 gold data show that the data are consistent with isotropy and Gaussianity. Our second statistic confirms these results.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Tower platforms, with instrumentation at six levels above the surface to a height of 30 m, were used to record various atmospheric parameters in the surface layer. Sensors for measuring both mean and fluctuating quantities were used, with the majority of them indigenously built. Soil temperature sensors up to a depth of 30 cm from the surface were among the variables connected to the mean data logger. A PC-based data acquisition system built at the Centre for Atmospheric Sciences, IISc, was used to acquire the data from fast response sensors. This paper reports the various components of a typical MONTBLEX tower observatory and describes the actual experiments carried out in the surface layer at four sites over the monsoon trough region as a part of the MONTBLEX programme. It also describes and discusses several checks made on randomly selected tower data-sets acquired during the experiment. Checks made include visual inspection of time traces from various sensors, comparative plots of sensors measuring the same variable, wind and temperature profile plots calculation of roughness lengths, statistical and stability parameters, diurnal variation of stability parameters, and plots of probability density and energy spectrum for the different sensors. Results from these checks are found to be very encouraging and reveal the potential for further detailed analysis to understand more about surface layer characteristics.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We introduce a multifield comparison measure for scalar fields that helps in studying relations between them. The comparison measure is insensitive to noise in the scalar fields and to noise in their gradients. Further, it can be computed robustly and efficiently. Results from the visual analysis of various data sets from climate science and combustion applications demonstrate the effective use of the measure.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In most taxa, species boundaries are inferred based on differences in morphology or DNA sequences revealed by taxonomic or phylogenetic analyses. In crickets, acoustic mating signals or calling songs have species-specific structures and provide a third data set to infer species boundaries. We examined the concordance in species boundaries obtained using acoustic, morphological, and molecular data sets in the field cricket genus Itaropsis. This genus is currently described by only one valid species, Itaropsis tenella, with a broad distribution in western peninsular India and Sri Lanka. Calling songs of males sampled from four sites in peninsular India exhibited significant differences in a number of call features, suggesting the existence of multiple species. Cluster analysis of the acoustic data, molecular phylogenetic analyses, and phylogenetic analyses combining all data sets suggested the existence of three clades. Whatever the differences in calling signals, no full congruence was obtained between all the data sets, even though the resultant lineages were largely concordant with the acoustic clusters. The genus Itaropsis could thus be represented by three morphologically cryptic incipient species in peninsular India; their distributions are congruent with usual patterns of endemism in the Western Ghats, India. Song evolution is analysed through the divergence in syllable period, syllable and call duration, and dominant frequency.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The rapid growth in the field of data mining has lead to the development of various methods for outlier detection. Though detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, this algorithm explores a clustering of the given data, followed by the ranking phase for determining the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public domain categorical data sets.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Outlier detection in high dimensional categorical data has been a problem of much interest due to the extensive use of qualitative features for describing the data across various application areas. Though there exist various established methods for dealing with the dimensionality aspect through feature selection on numerical data, the categorical domain is actively being explored. As outlier detection is generally considered as an unsupervised learning problem due to lack of knowledge about the nature of various types of outliers, the related feature selection task also needs to be handled in a similar manner. This motivates the need to develop an unsupervised feature selection algorithm for efficient detection of outliers in categorical data. Addressing this aspect, we propose a novel feature selection algorithm based on the mutual information measure and the entropy computation. The redundancy among the features is characterized using the mutual information measure for identifying a suitable feature subset with less redundancy. The performance of the proposed algorithm in comparison with the information gain based feature selection shows its effectiveness for outlier detection. The efficacy of the proposed algorithm is demonstrated on various high-dimensional benchmark data sets employing two existing outlier detection methods.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

[1] Evaporative fraction (EF) is a measure of the amount of available energy at the earth surface that is partitioned into latent heat flux. The currently operational thermal sensors like the Moderate Resolution Imaging Spectroradiometer (MODIS) on satellite platforms provide data only at 1000 m, which constraints the spatial resolution of EF estimates. A simple model (disaggregation of evaporative fraction (DEFrac)) based on the observed relationship between EF and the normalized difference vegetation index is proposed to spatially disaggregate EF. The DEFrac model was tested with EF estimated from the triangle method using 113 clear sky data sets from the MODIS sensor aboard Terra and Aqua satellites. Validation was done using the data at four micrometeorological tower sites across varied agro-climatic zones possessing different land cover conditions in India using Bowen ratio energy balance method. The root-mean-square error (RMSE) of EF estimated at 1000 m resolution using the triangle method was 0.09 for all the four sites put together. The RMSE of DEFrac disaggregated EF was 0.09 for 250 m resolution. Two models of input disaggregation were also tried with thermal data sharpened using two thermal sharpening models DisTrad and TsHARP. The RMSE of disaggregated EF was 0.14 for both the input disaggregation models for 250 m resolution. Moreover, spatial analysis of disaggregation was performed using Landsat-7 (Enhanced Thematic Mapper) ETM+ data over four grids in India for contrasted seasons. It was observed that the DEFrac model performed better than the input disaggregation models under cropped conditions while they were marginally similar under non-cropped conditions.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The paper describes an algorithm for multi-label classification. Since a pattern can belong to more than one class, the task of classifying a test pattern is a challenging one. We propose a new algorithm to carry out multi-label classification which works for discrete data. We have implemented the algorithm and presented the results for different multi-label data sets. The results have been compared with the algorithm multi-label KNN or ML-KNN and found to give good results.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Simulated boundary potential data for Electrical Impedance Tomography (EIT) are generated by a MATLAB based EIT data generator and the resistivity reconstruction is evaluated with Electrical Impedance Tomography and Diffuse Optical Tomography Reconstruction Software (EIDORS). Circular domains containing subdomains as inhomogeneity are defined in MATLAB-based EIT data generator and the boundary data are calculated by a constant current simulation with opposite current injection (OCI) method. The resistivity images reconstructed for different boundary data sets and images are analyzed with image parameters to evaluate the reconstruction.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Gaussian processes (GPs) are promising Bayesian methods for classification and regression problems. Design of a GP classifier and making predictions using it is, however, computationally demanding, especially when the training set size is large. Sparse GP classifiers are known to overcome this limitation. In this letter, we propose and study a validation-based method for sparse GP classifier design. The proposed method uses a negative log predictive (NLP) loss measure, which is easy to compute for GP models. We use this measure for both basis vector selection and hyperparameter adaptation. The experimental results on several real-world benchmark data sets show better orcomparable generalization performance over existing methods.