7 resultados para data integration
em Indian Institute of Science - Bangalore - Índia
Resumo:
The problem of scaling up data integration, such that new sources can be quickly utilized as they are discovered, remains elusive: Global schemas for integrated data are difficult to develop and expand, and schema and record matching techniques are limited by the fact that data and metadata are often under-specified and must be disambiguated by data experts. One promising approach is to avoid using a global schema, and instead to develop keyword search-based data integration-where the system lazily discovers associations enabling it to join together matches to keywords, and return ranked results. The user is expected to understand the data domain and provide feedback about answers' quality. The system generalizes such feedback to learn how to correctly integrate data. A major open challenge is that under this model, the user only sees and offers feedback on a few ``top-'' results: This result set must be carefully selected to include answers of high relevance and answers that are highly informative when feedback is given on them. Existing systems merely focus on predicting relevance, by composing the scores of various schema and record matching algorithms. In this paper, we show how to predict the uncertainty associated with a query result's score, as well as how informative feedback is on a given result. We build upon these foundations to develop an active learning approach to keyword search-based data integration, and we validate the effectiveness of our solution over real data from several very different domains.
Resumo:
The spatial error structure of daily precipitation derived from the latest version 7 (v7) tropical rainfall measuring mission (TRMM) level 2 data products are studied through comparison with the Asian precipitation highly resolved observational data integration toward evaluation of the water resources (APHRODITE) data over a subtropical region of the Indian subcontinent for the seasonal rainfall over 6 years from June 2002 to September 2007. The data products examined include v7 data from the TRMM radiometer Microwave Imager (TMI) and radar precipitation radar (PR), namely, 2A12, 2A25, and 2B31 (combined data from PR and TMI). The spatial distribution of uncertainty from these data products were quantified based on performance metrics derived from the contingency table. For the seasonal daily precipitation over a subtropical basin in India, the data product of 2A12 showed greater skill in detecting and quantifying the volume of rainfall when compared with the 2A25 and 2B31 data products. Error characterization using various error models revealed that random errors from multiplicative error models were homoscedastic and that they better represented rainfall estimates from 2A12 algorithm. Error decomposition techniques performed to disentangle systematic and random errors verify that the multiplicative error model representing rainfall from 2A12 algorithm successfully estimated a greater percentage of systematic error than 2A25 or 2B31 algorithms. Results verify that although the radiometer derived 2A12 rainfall data is known to suffer from many sources of uncertainties, spatial analysis over the case study region of India testifies that the 2A12 rainfall estimates are in a very good agreement with the reference estimates for the data period considered.
Resumo:
This paper is concerned with the integration of voice and data on an experimental local area network used by the School of Automation, of the Indian Institute of Science. SALAN (School of Automation Local Area Network) consists of a number of microprocessor-based communication nodes linked to a shared coaxial cable transmission medium. The communication nodes handle the various low-level functions associated with computer communication, and interface user data equipment to the network. SALAN at present provides a file transfer facility between an Intel Series III microcomputer development system and a Texas Instruments Model 990/4 microcomputer system. Further, a packet voice communication system has also been implemented on SALAN. The various aspects of the design and implementation of the above two utilities are discussed.
Resumo:
This paper considers the problem of weak signal detection in the presence of navigation data bits for Global Navigation Satellite System (GNSS) receivers. Typically, a set of partial coherent integration outputs are non-coherently accumulated to combat the effects of model uncertainties such as the presence of navigation data-bits and/or frequency uncertainty, resulting in a sub-optimal test statistic. In this work, the test-statistic for weak signal detection is derived in the presence of navigation data-bits from the likelihood ratio. It is highlighted that averaging the likelihood ratio based test-statistic over the prior distributions of the unknown data bits and the carrier phase uncertainty leads to the conventional Post Detection Integration (PDI) technique for detection. To improve the performance in the presence of model uncertainties, a novel cyclostationarity based sub-optimal PDI technique is proposed. The test statistic is analytically characterized, and shown to be robust to the presence of navigation data-bits, frequency, phase and noise uncertainties. Monte Carlo simulation results illustrate the validity of the theoretical results and the superior performance offered by the proposed detector in the presence of model uncertainties.
Resumo:
Low power consumption per channel and data rate minimization are two key challenges which need to be addressed in future generations of neural recording systems (NRS). Power consumption can be reduced by avoiding unnecessary processing whereas data rate is greatly decreased by sending spike time-stamps along with spike features as opposed to raw digitized data. Dynamic range in NRS can vary with time due to change in electrode-neuron distance or background noise, which demands adaptability. An analog-to-digital converter (ADC) is one of the most important blocks in a NRS. This paper presents an 8-bit SAR ADC in 0.13-mu m CMOS technology along with input and reference buffer. A novel energy efficient digital-to-analog converter switching scheme is proposed, which consumes 37% less energy than the present state-of-the-art. The use of a ping-pong input sampling scheme is emphasized for multichannel input to alleviate the bandwidth requirement of the input buffer. To reduce the data rate, the A/D process is only enabled through the in-built background noise rejection logic to ensure that the noise is not processed. The ADC resolution can be adjusted from 8 to 1 bit in 1-bit step based on the input dynamic range. The ADC consumes 8.8 mu W from 1 V supply at 1 MS/s speed. It achieves effective number of bits of 7.7 bits and FoM of 42.3 fJ/conversion-step.
Resumo:
The performance of postdetection integration (PDI) techniques for the detection of Global Navigation Satellite Systems (GNSS) signals in the presence of uncertainties in frequency offsets, noise variance, and unknown data-bits is studied. It is shown that the conventional PDI techniques are generally not robust to uncertainty in the data-bits and/or the noise variance. Two new modified PDI techniques are proposed, and they are shown to be robust to these uncertainties. The receiver operating characteristics (ROC) and sample complexity performance of the PDI techniques in the presence of model uncertainties are analytically derived. It is shown that the proposed methods significantly outperform existing methods, and hence they could become increasingly important as the GNSS receivers attempt to push the envelope on the minimum signal-to-noise ratio (SNR) for reliable detection.
Resumo:
In this study, we applied the integration methodology developed in the companion paper by Aires (2014) by using real satellite observations over the Mississippi Basin. The methodology provides basin-scale estimates of the four water budget components (precipitation P, evapotranspiration E, water storage change Delta S, and runoff R) in a two-step process: the Simple Weighting (SW) integration and a Postprocessing Filtering (PF) that imposes the water budget closure. A comparison with in situ observations of P and E demonstrated that PF improved the estimation of both components. A Closure Correction Model (CCM) has been derived from the integrated product (SW+PF) that allows to correct each observation data set independently, unlike the SW+PF method which requires simultaneous estimates of the four components. The CCM allows to standardize the various data sets for each component and highly decrease the budget residual (P - E - Delta S - R). As a direct application, the CCM was combined with the water budget equation to reconstruct missing values in any component. Results of a Monte Carlo experiment with synthetic gaps demonstrated the good performances of the method, except for the runoff data that has a variability of the same order of magnitude as the budget residual. Similarly, we proposed a reconstruction of Delta S between 1990 and 2002 where no Gravity Recovery and Climate Experiment data are available. Unlike most of the studies dealing with the water budget closure at the basin scale, only satellite observations and in situ runoff measurements are used. Consequently, the integrated data sets are model independent and can be used for model calibration or validation.