954 resultados para Noisy data


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Machine learning comprises a series of techniques for automatic extraction of meaningful information from large collections of noisy data. In many real world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among all the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree structured data two issues become relevant: kernel for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too few information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function behaving as the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernel, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required both in learning and classification phases. Such a complexity can sometimes prevents the kernel application in scenarios involving large amount of data. This thesis proposes three contributions for resolving the above issues of kernel for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. Convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted at reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Direct Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Exploratory analysis of petroleum geochemical data seeks to find common patterns to help distinguish between different source rocks, oils and gases, and to explain their source, maturity and any intra-reservoir alteration. However, at the outset, one is typically faced with (a) a large matrix of samples, each with a range of molecular and isotopic properties, (b) a spatially and temporally unrepresentative sampling pattern, (c) noisy data and (d) often, a large number of missing values. This inhibits analysis using conventional statistical methods. Typically, visualisation methods like principal components analysis are used, but these methods are not easily able to deal with missing data nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this paper we introduce a complementary approach based on a non-linear probabilistic model. Generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, while also dealing with missing data. We show how using generative topographic mapping also provides an optimal method with which to replace missing values in two geochemical datasets, particularly where a large proportion of the data is missing.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We consider a Cauchy problem for the heat equation, where the temperature field is to be reconstructed from the temperature and heat flux given on a part of the boundary of the solution domain. We employ a Landweber type method proposed in [2], where a sequence of mixed well-posed problems are solved at each iteration step to obtain a stable approximation to the original Cauchy problem. We develop an efficient boundary integral equation method for the numerical solution of these mixed problems, based on the method of Rothe. Numerical examples are presented both with exact and noisy data, showing the efficiency and stability of the proposed procedure and approximations.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

An iterative procedure for determining temperature fields from Cauchy data given on a part of the boundary is presented. At each iteration step, a series of mixed well-posed boundary value problems are solved for the heat operator and its adjoint. A convergence proof of this method in a weighted L2-space is included, as well as a stopping criteria for the case of noisy data. Moreover, a solvability result in a weighted Sobolev space for a parabolic initial boundary value problem of second order with mixed boundary conditions is presented. Regularity of the solution is proved. (© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Thesis (Master's)--University of Washington, 2016-08

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Inverse problems based on using experimental data to estimate unknown parameters of a system often arise in biological and chaotic systems. In this paper, we consider parameter estimation in systems biology involving linear and non-linear complex dynamical models, including the Michaelis–Menten enzyme kinetic system, a dynamical model of competence induction in Bacillus subtilis bacteria and a model of feedback bypass in B. subtilis bacteria. We propose some novel techniques for inverse problems. Firstly, we establish an approximation of a non-linear differential algebraic equation that corresponds to the given biological systems. Secondly, we use the Picard contraction mapping, collage methods and numerical integration techniques to convert the parameter estimation into a minimization problem of the parameters. We propose two optimization techniques: a grid approximation method and a modified hybrid Nelder–Mead simplex search and particle swarm optimization (MH-NMSS-PSO) for non-linear parameter estimation. The two techniques are used for parameter estimation in a model of competence induction in B. subtilis bacteria with noisy data. The MH-NMSS-PSO scheme is applied to a dynamical model of competence induction in B. subtilis bacteria based on experimental data and the model for feedback bypass. Numerical results demonstrate the effectiveness of our approach.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A big challenge for classification on text is the noisy of text data. It makes classification quality low. Many classification process can be divided into two sequential steps scoring and threshold setting (thresholding). Therefore to deal with noisy data problem, it is important to describe positive feature effectively scoring and to set a suitable threshold. Most existing text classifiers do not concentrate on these two jobs. In this paper, we propose a novel text classifier with pattern-based scoring that describe positive feature effectively, followed by threshold setting. The thresholding is based on score of training set, make it is simple to implement in other scoring methods. Experiment shows that our pattern-based classifier is promising.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Traditional nearest points methods use all the samples in an image set to construct a single convex or affine hull model for classification. However, strong artificial features and noisy data may be generated from combinations of training samples when significant intra-class variations and/or noise occur in the image set. Existing multi-model approaches extract local models by clustering each image set individually only once, with fixed clusters used for matching with various image sets. This may not be optimal for discrimination, as undesirable environmental conditions (eg. illumination and pose variations) may result in the two closest clusters representing different characteristics of an object (eg. frontal face being compared to non-frontal face). To address the above problem, we propose a novel approach to enhance nearest points based methods by integrating affine/convex hull classification with an adapted multi-model approach. We first extract multiple local convex hulls from a query image set via maximum margin clustering to diminish the artificial variations and constrain the noise in local convex hulls. We then propose adaptive reference clustering (ARC) to constrain the clustering of each gallery image set by forcing the clusters to have resemblance to the clusters in the query image set. By applying ARC, noisy clusters in the query set can be discarded. Experiments on Honda, MoBo and ETH-80 datasets show that the proposed method outperforms single model approaches and other recent techniques, such as Sparse Approximated Nearest Points, Mutual Subspace Method and Manifold Discriminant Analysis.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A health-monitoring and life-estimation strategy for composite rotor blades is developed in this work. The cross-sectional stiffness reduction obtained by physics-based models is expressed as a function of the life of the structure using a recent phenomenological damage model. This stiffness reduction is further used to study the behavior of measurable system parameters such as blade deflections, loads, and strains of a composite rotor blade in static analysis and forward flight. The simulated measurements are obtained using an aeroelastic analysis of the composite rotor blade based on the finite element in space and time with physics-based damage modes that are then linked to the life consumption of the blade. The model-based measurements are contaminated with noise to simulate real data. Genetic fuzzy systems are developed for global online prediction of physical damage and life consumption using displacement- and force-based measurement deviations between damaged and undamaged conditions. Furthermore, local online prediction of physical damage and life consumption is done using strains measured along the blade length. It is observed that the life consumption in the matrix-cracking zone is about 12-15% and life consumption in debonding/delamination zone is about 45-55% of the total life of the blade. It is also observed that the success rate of the genetic fuzzy systems depends upon the number of measurements, type of measurements and training, and the testing noise level. The genetic fuzzy systems work quite well with noisy data and are recommended for online structural health monitoring of composite helicopter rotor blades.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Increased emphasis on rotorcraft performance and perational capabilities has resulted in accurate computation of aerodynamic stability and control parameters. System identification is one such tool in which the model structure and parameters such as aerodynamic stability and control derivatives are derived. In the present work, the rotorcraft aerodynamic parameters are computed using radial basis function neural networks (RBFN) in the presence of both state and measurement noise. The effect of presence of outliers in the data is also considered. RBFN is found to give superior results compared to finite difference derivatives for noisy data. (C) 2010 Elsevier Inc. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Deterministic models have been widely used to predict water quality in distribution systems, but their calibration requires extensive and accurate data sets for numerous parameters. In this study, alternative data-driven modeling approaches based on artificial neural networks (ANNs) were used to predict temporal variations of two important characteristics of water quality chlorine residual and biomass concentrations. The authors considered three types of ANN algorithms. Of these, the Levenberg-Marquardt algorithm provided the best results in predicting residual chlorine and biomass with error-free and ``noisy'' data. The ANN models developed here can generate water quality scenarios of piped systems in real time to help utilities determine weak points of low chlorine residual and high biomass concentration and select optimum remedial strategies.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The weighted-least-squares method based on the Gauss-Newton minimization technique is used for parameter estimation in water distribution networks. The parameters considered are: element resistances (single and/or group resistances, Hazen-Williams coefficients, pump specifications) and consumptions (for single or multiple loading conditions). The measurements considered are: nodal pressure heads, pipe flows, head loss in pipes, and consumptions/inflows. An important feature of the study is a detailed consideration of the influence of different choice of weights on parameter estimation, for error-free data, noisy data, and noisy data which include bad data. The method is applied to three different networks including a real-life problem.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We explore a pseudodynamic form of the quadratic parameter update equation for diffuse optical tomographic reconstruction from noisy data. A few explicit and implicit strategies for obtaining the parameter updates via a semianalytical integration of the pseudodynamic equations are proposed. Despite the ill-posedness of the inverse problem associated with diffuse optical tomography, adoption of the quadratic update scheme combined with the pseudotime integration appears not only to yield higher convergence, but also a muted sensitivity to the regularization parameters, which include the pseudotime step size for integration. These observations are validated through reconstructions with both numerically generated and experimentally acquired data. (C) 2011 Optical Society of America

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, we explore noise-tolerant learning of classifiers. We formulate the problem as follows. We assume that there is an unobservable training set that is noise free. The actual training set given to the learning algorithm is obtained from this ideal data set by corrupting the class label of each example. The probability that the class label of an example is corrupted is a function of the feature vector of the example. This would account for most kinds of noisy data one encounters in practice. We say that a learning method is noise tolerant if the classifiers learnt with noise-free data and with noisy data, both have the same classification accuracy on the noise-free data. In this paper, we analyze the noise-tolerance properties of risk minimization (under different loss functions). We show that risk minimization under 0-1 loss function has impressive noise-tolerance properties and that under squared error loss is tolerant only to uniform noise; risk minimization under other loss functions is not noise tolerant. We conclude this paper with some discussion on the implications of these theoretical results.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Typical image-guided diffuse optical tomographic image reconstruction procedures involve reduction of the number of optical parameters to be reconstructed equal to the number of distinct regions identified in the structural information provided by the traditional imaging modality. This makes the image reconstruction problem less ill-posed compared to traditional underdetermined cases. Still, the methods that are deployed in this case are same as those used for traditional diffuse optical image reconstruction, which involves a regularization term as well as computation of the Jacobian. A gradient-free Nelder-Mead simplex method is proposed here to perform the image reconstruction procedure and is shown to provide solutions that closely match ones obtained using established methods, even in highly noisy data. The proposed method also has the distinct advantage of being more efficient owing to being regularization free, involving only repeated forward calculations. (C) 2013 Society of Photo-Optical Instrumentation Engineers (SPIE)