20 results for Data uncertainty
at Indian Institute of Science - Bangalore - India
Abstract:
This paper studies the problem of constructing robust classifiers when the training data are plagued with uncertainty. The problem is posed as a Chance-Constrained Program (CCP) which ensures that the uncertain data points are classified correctly with high probability. Unfortunately, such a CCP turns out to be intractable. The key novelty is in employing Bernstein bounding schemes to relax the CCP as a convex second-order cone program whose solution is guaranteed to satisfy the probabilistic constraint. Prior to this work, only Chebyshev-based relaxations had been exploited in learning algorithms. Bernstein bounds employ richer partial information and hence can be far less conservative than Chebyshev bounds. Due to this efficient modeling of uncertainty, the resulting classifiers achieve higher classification margins and hence better generalization. Methodologies for classifying uncertain test data points and error measures for evaluating classifiers robust to uncertain data are discussed. Experimental results on synthetic and real-world datasets show that the proposed classifiers are better equipped to handle data uncertainty and outperform the state-of-the-art in many cases.
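For readers new to this class of formulations, the sketch below shows the generic shape of such a robust maximum-margin classifier as a second-order cone program in cvxpy, where each uncertain point must satisfy its margin constraint with a norm penalty on the weight vector. It uses a simple Chebyshev-style constraint on synthetic data with an assumed common uncertainty shape and confidence parameter kappa; the paper's actual Bernstein-based relaxation uses richer partial information and is not reproduced here.

```python
# Minimal sketch (not the paper's Bernstein relaxation): a robust
# maximum-margin classifier where each uncertain point x_i with uncertainty
# "shape" S_half must satisfy the second-order cone constraint
#   y_i (w . x_i + b) >= 1 + kappa * || S_half w ||_2
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d = 40, 2
X = np.vstack([rng.normal(+1, 0.3, (n // 2, d)),
               rng.normal(-1, 0.3, (n // 2, d))])
y = np.hstack([np.ones(n // 2), -np.ones(n // 2)])
S_half = 0.2 * np.eye(d)          # assumed common uncertainty shape per point
kappa = 1.5                        # confidence parameter from the chance constraint

w = cp.Variable(d)
b = cp.Variable()
constraints = [cp.multiply(y, X @ w + b) >= 1 + kappa * cp.norm(S_half @ w, 2)]
prob = cp.Problem(cp.Minimize(cp.norm(w, 2)), constraints)
prob.solve()
print("margin ~", 1 / np.linalg.norm(w.value))
```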
Abstract:
This paper presents a Chance-Constrained Programming approach for constructing maximum-margin classifiers which are robust to interval-valued uncertainty in training examples. The methodology ensures that uncertain examples are classified correctly with high probability by employing chance constraints. The main contribution of the paper is to pose the resultant optimization problem as a Second Order Cone Program by using large-deviation inequalities due to Bernstein. Apart from the support and mean of the uncertain examples, these Bernstein-based relaxations make no further assumptions on the underlying uncertainty. Classifiers built using the proposed approach are less conservative, yield higher margins and hence are expected to generalize better than existing methods. Experimental results on synthetic and real-world datasets show that the proposed classifiers are better equipped to handle interval-valued uncertainty than the state-of-the-art.
Abstract:
Uncertainties associated with the structural model and measured vibration data may lead to unreliable damage detection. In this paper, we show that geometric and measurement uncertainty cause considerable problems in damage assessment, which can be alleviated by using a fuzzy logic-based approach for damage detection. Curvature damage factors (CDF) of a tapered cantilever beam are used as damage indicators. Monte Carlo simulation (MCS) is used to study the changes in the damage indicator due to uncertainty in the geometric properties of the beam. Variations in these CDF measures due to randomness in the structural parameters, further contaminated with measurement noise, are used for developing and testing a fuzzy logic system (FLS). Results show that the method correctly identifies both single and multiple damages in the structure. For example, the FLS detects damage with an average accuracy of about 95 percent in a beam having geometric uncertainty of 1 percent COV and measurement noise of 10 percent in the single damage scenario. For the multiple damage case, the FLS identifies damages in the beam with an average accuracy of about 94 percent in the presence of the above-mentioned uncertainties. The paper brings together the disparate areas of probabilistic analysis and fuzzy logic to address uncertainty in structural damage detection.
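The toy sketch below illustrates how the two ingredients combine: Monte Carlo sampling of a curvature-based damage indicator under geometric and measurement uncertainty, followed by a triangular-membership fuzzy classification of the resulting indicator. The beam model, membership ranges and damage levels are placeholders rather than the paper's actual FLS.

```python
# Toy sketch: Monte Carlo sampling of a curvature-based damage indicator under
# geometric uncertainty and measurement noise, then a simple triangular-
# membership fuzzy classification. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(1)

def damage_indicator(thickness, damage=0.0, noise=0.10):
    # stand-in for the curvature damage factor: grows with damage severity,
    # perturbed by measurement noise
    base = damage / thickness
    return base * (1 + noise * rng.standard_normal())

def tri(x, a, b, c):
    # triangular fuzzy membership on [a, c] with peak at b
    return np.clip(np.minimum((x - a) / (b - a + 1e-12),
                              (c - x) / (c - b + 1e-12)), 0.0, 1.0)

# geometric uncertainty: thickness sampled with ~1 percent COV
samples = [damage_indicator(rng.normal(1.0, 0.01), damage=0.2) for _ in range(1000)]
cdf_mean = float(np.mean(samples))

memberships = {
    "undamaged": tri(cdf_mean, -0.1, 0.0, 0.1),
    "moderate":  tri(cdf_mean,  0.05, 0.2, 0.35),
    "severe":    tri(cdf_mean,  0.3, 0.6, 0.9),
}
print(max(memberships, key=memberships.get), memberships)
```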
Abstract:
The behaviour of laterally loaded piles is considerably influenced by the uncertainties in soil properties. Hence, probabilistic models for the assessment of allowable lateral load are necessary. Cone penetration test (CPT) data are often used to determine soil strength parameters, whereby the allowable lateral load of the pile is computed. In the present study, the maximum lateral displacement and moment of the pile are obtained based on the coefficient of subgrade reaction approach, considering the nonlinear soil behaviour in undrained clay. The coefficient of subgrade reaction is related to the undrained shear strength of soil, which can be obtained from CPT data. The soil medium is modelled as a one-dimensional random field along the depth, and it is described by the standard deviation and scale of fluctuation of the undrained shear strength of soil. Inherent soil variability, measurement uncertainty and transformation uncertainty are taken into consideration. The statistics of maximum lateral deflection and moment are obtained using the first-order second-moment (FOSM) technique. Hasofer-Lind reliability indices for component and system failure criteria, based on the allowable lateral displacement and moment capacity of the pile section, are evaluated. The geotechnical database from the Konaseema site in India is used as a case example. It is shown that the reliability-based design approach for pile foundations, considering the spatial variability of soil, permits a rational choice of allowable lateral loads.
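For orientation, the sketch below shows the bare-bones FOSM calculation of a reliability index for a single serviceability limit state (allowable deflection minus computed deflection), with the undrained shear strength as the random variable. The response function and all numbers are placeholders; the paper's Hasofer-Lind indices and component/system criteria involve considerably more than this one-variable estimate.

```python
# Minimal FOSM sketch: reliability index for a serviceability limit state
# g = delta_allow - delta_max(s_u), with the undrained shear strength s_u as
# the random variable. The response model and numbers are placeholders, not
# the subgrade-reaction model used in the paper.
import numpy as np

delta_allow = 25.0            # allowable lateral deflection, mm (assumed)
mu_su, cov_su = 40.0, 0.3     # undrained shear strength: mean (kPa) and COV
sigma_su = cov_su * mu_su

def deflection(su):
    # placeholder response: deflection decreases with soil strength
    return 900.0 / su

# first-order Taylor expansion of g about the mean of s_u
eps = 1e-3
dg_dsu = -(deflection(mu_su + eps) - deflection(mu_su - eps)) / (2 * eps)
mu_g = delta_allow - deflection(mu_su)
sigma_g = abs(dg_dsu) * sigma_su

beta = mu_g / sigma_g          # FOSM reliability index
print(f"beta = {beta:.2f}")
```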
Abstract:
Hydrologic impacts of climate change are usually assessed by downscaling the General Circulation Model (GCM) output of large-scale climate variables to local-scale hydrologic variables. Such an assessment is characterized by uncertainty resulting from the ensembles of projections generated with multiple GCMs, which is known as intermodel or GCM uncertainty. Ensemble averaging with the assignment of weights to GCMs based on model evaluation is one of the methods to address such uncertainty and is used in the present study for regional-scale impact assessment. GCM outputs of large-scale climate variables are downscaled to subdivisional-scale monsoon rainfall. Weights are assigned to the GCMs on the basis of model performance and model convergence, which are evaluated with the Cumulative Distribution Functions (CDFs) generated from the downscaled GCM output (for both 20th Century [20C3M] and future scenarios) and observed data. The ensemble averaging approach, with the assignment of weights to GCMs, is characterized by the uncertainty caused by partial ignorance, which stems from the nonavailability of the outputs of some of the GCMs for a few scenarios (in the Intergovernmental Panel on Climate Change [IPCC] data distribution center for Assessment Report 4 [AR4]). This uncertainty is modeled with imprecise probability, i.e., the probability being represented as an interval gray number. Furthermore, the CDF generated with one GCM is entirely different from that with another, and therefore the use of multiple GCMs results in a band of CDFs. Representing this band of CDFs with a single-valued weighted mean CDF may be misleading. Such a band of CDFs can only be represented with an envelope that contains all the CDFs generated with a number of GCMs. The imprecise CDF represents such an envelope, which not only contains the CDFs generated with all the available GCMs but also, to an extent, accounts for the uncertainty resulting from the missing GCM output. This concept of imprecise probability is also validated in the present study. The imprecise CDFs of monsoon rainfall are derived for three 30-year time slices, the 2020s, 2050s and 2080s, with the A1B, A2 and B1 scenarios. The model is demonstrated with the prediction of monsoon rainfall in the Orissa meteorological subdivision, which shows a possible decreasing trend in the future.
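As a rough illustration of the "band of CDFs" idea, the sketch below builds empirical CDFs of downscaled rainfall from a few hypothetical GCM ensembles on a common grid, takes their pointwise lower and upper envelope, and contrasts that band with a single weighted-mean CDF. The data, weights and grid are invented for illustration only.

```python
# Sketch of the "band of CDFs" idea: empirical CDFs of downscaled monsoon
# rainfall from several GCMs evaluated on a common grid, the pointwise
# lower/upper envelope that an imprecise CDF would bound, and a single
# weighted-mean CDF for comparison. Data are synthetic.
import numpy as np

rng = np.random.default_rng(2)
gcm_rainfall = {                      # hypothetical downscaled ensembles (mm)
    "gcm_a": rng.normal(900, 80, 200),
    "gcm_b": rng.normal(870, 95, 200),
    "gcm_c": rng.normal(930, 70, 200),
}

grid = np.linspace(500, 1300, 401)
cdfs = np.array([[np.mean(sample <= x) for x in grid]
                 for sample in gcm_rainfall.values()])

lower_env = cdfs.min(axis=0)          # envelope bounding all model CDFs
upper_env = cdfs.max(axis=0)
weights = np.array([0.4, 0.3, 0.3])   # e.g. from performance/convergence scores
weighted_cdf = weights @ cdfs

print(grid[np.searchsorted(weighted_cdf, 0.5)])   # weighted median rainfall
```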
Abstract:
We propose a novel second order cone programming formulation for designing robust classifiers which can handle uncertainty in observations. Similar formulations are also derived for designing regression functions which are robust to uncertainties in the regression setting. The proposed formulations are independent of the underlying distribution, requiring only the existence of second order moments. These formulations are then specialized to the case of missing values in observations for both classification and regression problems. Experiments show that the proposed formulations outperform imputation.
Abstract:
Often the soil hydraulic parameters are obtained by the inversion of measured data (e.g. soil moisture, pressure head, and cumulative infiltration). However, the inverse problem in the unsaturated zone is ill-posed for various reasons, and hence the parameters become non-unique. The presence of multiple soil layers brings additional complexities into the inverse modelling. Generalized likelihood uncertainty estimation (GLUE) is a useful approach to estimate the parameters and their uncertainty when dealing with soil moisture dynamics, which is a highly non-linear problem. Because the estimated parameters depend on the modelling scale, inverse modelling carried out on laboratory data and field data may provide independent estimates. The objective of this paper is to compare the parameters and their uncertainty estimated through experiments in the laboratory and in the field and to assess which of the soil hydraulic parameters are independent of the experiment. The first two layers at the field site are characterized as loamy sand and loam. The mean soil moisture and pressure head at three depths are measured at half-hour intervals for a period of 1 week using the evaporation method for the laboratory experiment, whereas soil moisture at three different depths (60, 110, and 200 cm) is measured at 1-h intervals for 2 years for the field experiment. A one-dimensional soil moisture model based on the finite difference method was used. The calibration and validation are approximately for 1 year each. The model performance was found to be good, with root mean square error (RMSE) varying from 2 to 4 cm³ cm⁻³. It is found from the two experiments that the mean and uncertainty in the saturated soil moisture (θs) and the shape parameter (n) of the van Genuchten equation are similar for both soil types.
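A bare-bones GLUE loop is sketched below to make the procedure concrete: sample candidate van Genuchten parameters from prior ranges, run a forward soil-moisture model, score each set with an RMSE-based likelihood, and keep the "behavioral" sets as the uncertainty estimate. The forward model here is a simple placeholder drying curve, not the finite-difference model used in the paper, and the prior ranges and behavioral threshold are assumed.

```python
# Minimal GLUE sketch: Monte Carlo sampling of two van Genuchten parameters
# (theta_s and n), a placeholder soil-moisture model, an RMSE-based likelihood,
# and retention of "behavioral" parameter sets.
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(0, 24 * 7)                        # hourly steps over one week

def soil_moisture_model(theta_s, n, t):
    # placeholder drying curve, not a Richards-equation solution
    return theta_s * (1 + (0.01 * t) ** n) ** (-(1 - 1 / n))

observed = soil_moisture_model(0.42, 1.6, t) + rng.normal(0, 0.01, t.size)

behavioral = []
for _ in range(5000):
    theta_s = rng.uniform(0.30, 0.50)           # prior ranges (assumed)
    n = rng.uniform(1.1, 2.5)
    rmse = np.sqrt(np.mean((soil_moisture_model(theta_s, n, t) - observed) ** 2))
    if rmse < 0.02:                             # behavioral threshold (assumed)
        behavioral.append((theta_s, n, 1.0 / rmse ** 2))

theta_s_vals, n_vals, w = np.array(behavioral).T
w /= w.sum()
print("theta_s:", (theta_s_vals * w).sum(), "n:", (n_vals * w).sum())
```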
Abstract:
A comprehensive scheme for analysing uniaxial deformation data, taking into account the finite stiffness of the testing machine, is presented. Equations relevant to tension and stress relaxation tests carried out under crosshead speed control, and to creep testing under constant load, are described. For the first two cases, the implications of not using gauge-length extensometry but relying upon crosshead displacement for inferring specimen extension, and the role of uncertainty in machine stiffness, are also examined. The final section touches upon the extension of the present scheme to account for specimen anelasticity.
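The essential correction implied here is simple to state: without an extensometer, the specimen extension must be recovered from the crosshead displacement by subtracting the machine's elastic deflection, i.e. load divided by machine stiffness. The sketch below applies that correction to toy data and shows how a 10 percent uncertainty in the assumed machine stiffness propagates into the inferred extension; all values are illustrative.

```python
# Simple sketch of the crosshead-compliance correction: specimen extension is
# recovered from crosshead displacement by subtracting the machine deflection
# P / K_m. Stiffness and data below are illustrative only.
import numpy as np

K_m = 5.0e7                               # machine stiffness, N/m (assumed)
load = np.linspace(0, 2.0e4, 50)          # measured load, N
crosshead = load / 2.0e7 + load / K_m     # crosshead displacement, m (toy data)

specimen_ext = crosshead - load / K_m     # machine deflection removed

# sensitivity to uncertainty in K_m: +/- 10 percent on the stiffness
for km in (0.9 * K_m, 1.1 * K_m):
    err = np.max(np.abs((crosshead - load / km) - specimen_ext))
    print(f"K_m = {km:.1e} N/m -> max extension error = {err:.2e} m")
```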
Abstract:
Perfect or even mediocre weather predictions over a long period are almost impossible because of the ultimate growth of a small initial error into a significant one. Even though sensitivity to initial conditions limits the predictability in chaotic systems, an ensemble of predictions from different possible initial conditions, together with a prediction algorithm capable of resolving the fine structure of the chaotic attractor, can reduce the prediction uncertainty to some extent. All of the traditional chaotic prediction methods in hydrology are based on single-optimum-initial-condition local models, which can model the sudden divergence of the trajectories with different local functions. Conceptually, global models are ineffective in modeling the highly unstable structure of the chaotic attractor. This paper focuses on an ensemble prediction approach that reconstructs the phase space using different combinations of chaotic parameters, i.e., embedding dimension and delay time, to quantify the uncertainty in initial conditions. The ensemble approach is implemented through a local learning wavelet network model with a global feed-forward neural network structure for the phase space prediction of chaotic streamflow series. Quantification of uncertainties in future predictions is done by creating an ensemble of predictions with the wavelet network using a range of plausible embedding dimensions and delay times. The ensemble approach proves to be 50% more efficient than single prediction for both the local approximation and wavelet network approaches. The wavelet network approach proves to be 30%-50% superior to the local approximation approach. Compared to the traditional local approximation approach with a single initial condition, the total predictive uncertainty in the streamflow is reduced when modeled with ensemble wavelet networks for different lead times. The localization property of wavelets, utilizing different dilation and translation parameters, helps in capturing most of the statistical properties of the observed data. The need for taking into account all plausible initial conditions, and for bringing together the characteristics of both local and global approaches to model the unstable yet ordered chaotic attractor of a hydrologic series, is clearly demonstrated.
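The first step of the approach, reconstructing the phase space by delay embedding for several plausible (embedding dimension, delay time) pairs, is sketched below; each reconstruction would then drive one member of the prediction ensemble. The streamflow series is synthetic and the wavelet network predictor itself is not reproduced.

```python
# Sketch of phase space reconstruction by delay embedding for several
# (embedding dimension, delay time) pairs; each reconstruction would feed one
# member of the prediction ensemble.
import numpy as np

def delay_embed(series, dim, tau):
    """Return the delay-embedded matrix with rows [x(t), x(t+tau), ..., x(t+(dim-1)*tau)]."""
    n = len(series) - (dim - 1) * tau
    return np.column_stack([series[i * tau: i * tau + n] for i in range(dim)])

rng = np.random.default_rng(4)
streamflow = np.cumsum(rng.standard_normal(1000))     # placeholder series

ensemble_inputs = {
    (dim, tau): delay_embed(streamflow, dim, tau)
    for dim in (3, 4, 5)                              # plausible embedding dims
    for tau in (1, 2, 3)                              # plausible delay times
}
for (dim, tau), emb in ensemble_inputs.items():
    print(dim, tau, emb.shape)
```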
Abstract:
High sensitivity detection techniques are required for indoor navigation using Global Navigation Satellite System (GNSS) receivers, and typically a combination of coherent and non-coherent integration is used as the test statistic for detection. The coherent integration exploits the deterministic part of the signal and is limited by the residual frequency error, navigation data bits and user dynamics, which are not known a priori. So non-coherent integration, which involves squaring of the coherent integration output, is used to improve the detection sensitivity. Due to this squaring, it is robust against the artifacts introduced by data bits and/or frequency error. However, it is susceptible to uncertainty in the noise variance, and this can lead to fundamental sensitivity limits in detecting weak signals. In this work, the performance of conventional non-coherent integration-based GNSS signal detection is studied in the presence of noise uncertainty. It is shown that the performance of current state-of-the-art GNSS receivers is close to the theoretical SNR limit for reliable detection at moderate levels of noise uncertainty. Alternative robust post-coherent detectors are also analyzed and are shown to alleviate the noise uncertainty problem. Monte Carlo simulations are used to confirm the theoretical predictions.
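The toy sketch below reproduces the conventional test statistic described here: an N-sample coherent sum, squared and accumulated non-coherently over K blocks, with the detection threshold set from the empirical null distribution under an assumed (known) noise power. Signal and noise parameters are illustrative, and the point of the paper is precisely that a mismatch in that assumed noise power shifts the threshold and limits sensitivity.

```python
# Toy sketch of the conventional detection statistic: N-sample coherent
# integration followed by non-coherent accumulation of K squared magnitudes.
# The threshold depends on the assumed noise variance, which is exactly what
# noise uncertainty perturbs. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(5)
N, K = 1000, 20                           # coherent length, non-coherent blocks
snr_db = -30.0
amp = np.sqrt(2 * 10 ** (snr_db / 10))    # noise variance per component = 1

def test_statistic(signal_present):
    stat = 0.0
    for _ in range(K):
        noise = rng.standard_normal(N) + 1j * rng.standard_normal(N)
        x = noise + (amp if signal_present else 0.0)
        coherent = x.sum() / np.sqrt(N)   # coherent integration (normalized)
        stat += np.abs(coherent) ** 2     # squaring removes phase/data-bit sign
    return stat

h0 = [test_statistic(False) for _ in range(500)]
h1 = [test_statistic(True) for _ in range(500)]
threshold = np.quantile(h0, 0.99)         # Pfa ~ 1e-2 under assumed noise power
pd = np.mean(np.array(h1) > threshold)
print(f"detection probability ~ {pd:.2f}")
```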
Abstract:
In this paper, we explore a novel idea of using high dynamic range (HDR) technology for uncertainty visualization. We focus on scalar volumetric data sets where every data point is associated with scalar uncertainty. We design a transfer function that maps each data point to a color in HDR space. The luminance component of the color is exploited to capture uncertainty. We modify existing tone mapping techniques and suitably integrate them with volume ray casting to obtain a low dynamic range (LDR) image. The resulting image is displayed on a conventional 8-bits-per-channel display device. The usage of HDR mapping reveals fine details in uncertainty distribution and enables the users to interactively study the data in the context of corresponding uncertainty information. We demonstrate the utility of our method and evaluate the results using data sets from ocean modeling.
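A conceptual stand-in for the mapping described above is sketched below: the scalar value picks the chromatic component, the uncertainty drives a high-dynamic-range luminance, and a Reinhard-style global tone-mapping operator compresses the result back to an 8-bit LDR image. The transfer function and tone mapper are deliberately minimal and are not the paper's ray-casting pipeline.

```python
# Conceptual sketch: data value -> colour ramp, uncertainty -> HDR luminance,
# followed by a Reinhard-style global tone mapping to an 8-bit LDR image.
# This is a stand-in, not the paper's volume ray-casting pipeline.
import numpy as np

rng = np.random.default_rng(6)
values = rng.uniform(0, 1, (64, 64))        # scalar field (placeholder)
uncertainty = rng.uniform(0, 1, (64, 64))   # per-point scalar uncertainty

# simple transfer function: value picks a red-to-blue ramp, uncertainty scales
# an HDR luminance that may exceed the displayable range
rgb = np.stack([values, 0.2 * np.ones_like(values), 1 - values], axis=-1)
hdr_luminance = np.exp(4.0 * uncertainty)   # deliberately high dynamic range
hdr = rgb * hdr_luminance[..., None]

# Reinhard-style global tone mapping: L / (1 + L), then quantize to 8 bits
ldr = hdr / (1.0 + hdr)
image8 = np.clip(255 * ldr, 0, 255).astype(np.uint8)
print(image8.shape, image8.min(), image8.max())
```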
Abstract:
In this paper we study the problem of designing SVM classifiers when the kernel matrix, K, is affected by uncertainty. Specifically, K is modeled as a positive affine combination of given positive semidefinite kernels, with the coefficients ranging in a norm-bounded uncertainty set. We treat the problem using the Robust Optimization methodology. This reduces the uncertain SVM problem to a deterministic conic quadratic problem which can be solved in principle by a polynomial-time Interior Point (IP) algorithm. However, for large-scale classification problems, IP methods become intractable and one has to resort to first-order gradient-type methods. The strategy we use here is to reformulate the robust counterpart of the uncertain SVM problem as a saddle point problem and employ a special gradient scheme which works directly on the convex-concave saddle function. The algorithm is a simplified version of a general scheme due to Juditski and Nemirovski (2011). It achieves an O(1/T²) reduction of the initial error after T iterations. A comprehensive empirical study on both synthetic data and real-world protein structure data sets shows that the proposed formulations achieve the desired robustness, and the saddle point based algorithm outperforms the IP method significantly.
Abstract:
Precise specification of the vertical distribution of cloud optical properties is important to reduce the uncertainty in quantifying the radiative impacts of clouds. The new global observations of vertical profiles of clouds from the CloudSat mission provide opportunities to describe cloud structures and to improve the parameterization of clouds in weather and climate prediction models. In this study, four years (2007-2010) of observations of the vertical structure of clouds from the CloudSat cloud profiling radar have been used to document the mean vertical structure of clouds associated with the Indian summer monsoon (ISM) and its intra-seasonal variability. Active and break monsoon spells associated with the intra-seasonal variability of the ISM have been identified by an objective criterion. For the present analysis, we considered CloudSat-derived column-integrated cloud liquid and ice water, and vertical profiles of cloud liquid water content (CLWC) and cloud ice water content (CIWC). Over the South Asian monsoon region, deep convective clouds with large vertical extent (up to 14 km) and large values of cloud water and ice content are observed over the north Bay of Bengal. Deep clouds with large ice water content are also observed over the north Arabian Sea and adjoining northwest India, along the west coast of India and over the south equatorial Indian Ocean. The active monsoon spells are characterized by enhanced deep convection over the Bay of Bengal, the west coast of India and the northeast Arabian Sea, and suppressed convection over the equatorial Indian Ocean. Over the Bay of Bengal, cloud liquid water content and ice water content are enhanced by about 90% and about 200%, respectively, during the active spells. An interesting feature associated with the active spells is the vertically tilted structure of positive CLWC and CIWC anomalies over the Arabian Sea and the Bay of Bengal, which suggests a pre-conditioning process for the northward propagation of the boreal summer intra-seasonal variability. It is also observed that during the break spells, clouds are not completely suppressed over central India. Instead, clouds with smaller vertical extent (3-5 km) are observed due to the presence of a heat-low type of circulation. The present results will be useful for validating the vertical structure of clouds in weather and climate prediction models.
Abstract:
This paper considers the problem of weak signal detection in the presence of navigation data bits for Global Navigation Satellite System (GNSS) receivers. Typically, a set of partial coherent integration outputs is non-coherently accumulated to combat the effects of model uncertainties such as the presence of navigation data bits and/or frequency uncertainty, resulting in a sub-optimal test statistic. In this work, the test statistic for weak signal detection is derived from the likelihood ratio in the presence of navigation data bits. It is highlighted that averaging the likelihood ratio based test statistic over the prior distributions of the unknown data bits and the carrier phase uncertainty leads to the conventional Post Detection Integration (PDI) technique for detection. To improve the performance in the presence of model uncertainties, a novel cyclostationarity-based sub-optimal PDI technique is proposed. The test statistic is analytically characterized and shown to be robust to the presence of navigation data bits and to frequency, phase and noise uncertainties. Monte Carlo simulation results illustrate the validity of the theoretical results and the superior performance offered by the proposed detector in the presence of model uncertainties.
Abstract:
This article describes a new performance-based approach for evaluating the return period of seismic soil liquefaction based on standard penetration test (SPT) and cone penetration test (CPT) data. Conventional liquefaction evaluation methods consider a single acceleration level and magnitude, and these approaches fail to take into account the uncertainty in earthquake loading. Seismic hazard analysis based on the probabilistic method clearly shows that a particular acceleration value is contributed by different magnitudes with varying probability. In the new method presented in this article, the entire range of ground shaking and the entire range of earthquake magnitude are considered, and the liquefaction return period is evaluated based on the SPT and CPT data. This article explains the performance-based methodology for liquefaction analysis, starting from probabilistic seismic hazard analysis (PSHA) for the evaluation of seismic hazard and proceeding to the performance-based method to evaluate the liquefaction return period. A case study has been done for Bangalore, India, based on SPT data and converted CPT values, and a comparison of the results obtained from both methods is presented. In an area of 220 km² in Bangalore city, the site class was assessed based on a large number of borehole data and 58 multichannel analysis of surface waves (MASW) surveys. Using the site class and the peak acceleration at rock depth from PSHA, the peak ground acceleration at the ground surface was estimated using a probabilistic approach. The liquefaction analysis was done based on 450 borehole data sets obtained in the study area. The results from the CPT data match well with those obtained from a similar analysis with the SPT data.
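To make the core calculation concrete, the sketch below sums, over hypothetical (peak ground acceleration, magnitude) bins of a hazard deaggregation, the conditional probability of liquefaction times the annual rate of that bin, and inverts the total annual rate to obtain a return period. The hazard rates and the logistic fragility stand-in are invented for illustration and are not the Bangalore case-study inputs or the SPT/CPT-based resistance models.

```python
# Sketch of the performance-based calculation: the annual rate of liquefaction
# is the sum, over ground-motion and magnitude bins of the hazard, of the
# conditional probability of liquefaction times the hazard increment. The
# deaggregation and fragility model below are placeholders.
import numpy as np

pga_bins = np.linspace(0.05, 0.6, 12)          # peak ground acceleration (g)
mag_bins = np.linspace(5.0, 7.5, 6)            # moment magnitude

# hypothetical annual rate of (pga, magnitude) pairs from PSHA deaggregation
rate = np.outer(np.exp(-8 * pga_bins), np.exp(-1.2 * (mag_bins - 5.0))) * 1e-2

def p_liquefaction(pga, mag, crr=0.25):
    # placeholder fragility: cyclic stress ratio vs. an assumed resistance CRR
    csr = 0.65 * pga * (mag / 7.5)
    return 1.0 / (1.0 + np.exp(-20 * (csr - crr)))

annual_rate = sum(rate[i, j] * p_liquefaction(a, m)
                  for i, a in enumerate(pga_bins)
                  for j, m in enumerate(mag_bins))
return_period = 1.0 / annual_rate
print(f"liquefaction return period ~ {return_period:.0f} years")
```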