954 resultados para Noisy data
Resumo:
In this study, we investigate the problem of reconstruction of a stationary temperature field from given temperature and heat flux on a part of the boundary of a semi-infinite region containing an inclusion. This situation can be modelled as a Cauchy problem for the Laplace operator and it is an ill-posed problem in the sense of Hadamard. We propose and investigate a Landweber-Fridman type iterative method, which preserve the (stationary) heat operator, for the stable reconstruction of the temperature field on the boundary of the inclusion. In each iteration step, mixed boundary value problems for the Laplace operator are solved in the semi-infinite region. Well-posedness of these problems is investigated and convergence of the procedures is discussed. For the numerical implementation of these mixed problems an efficient boundary integral method is proposed which is based on the indirect variant of the boundary integral approach. Using this approach the mixed problems are reduced to integral equations over the (bounded) boundary of the inclusion. Numerical examples are included showing that stable and accurate reconstructions of the temperature field on the boundary of the inclusion can be obtained also in the case of noisy data. These results are compared with those obtained with the alternating iterative method.
Resumo:
The determination of the displacement and the space-dependent force acting on a vibrating structure from measured final or time-average displacement observation is thoroughly investigated. Several aspects related to the existence and uniqueness of a solution of the linear but ill-posed inverse force problems are highlighted. After that, in order to capture the solution a variational formulation is proposed and the gradient of the least-squares functional that is minimized is rigorously and explicitly derived. Numerical results obtained using the Landweber method and the conjugate gradient method are presented and discussed illustrating the convergence of the iterative procedures for exact input data. Furthermore, for noisy data the semi-convergence phenomenon appears, as expected, and stability is restored by stopping the iterations according to the discrepancy principle criterion once the residual becomes close to the amount of noise. The present investigation will be significant to researchers concerned with wave propagation and control of vibrating structures.
Resumo:
In the oil prospection research seismic data are usually irregular and sparsely sampled along the spatial coordinates due to obstacles in placement of geophones. Fourier methods provide a way to make the regularization of seismic data which are efficient if the input data is sampled on a regular grid. However, when these methods are applied to a set of irregularly sampled data, the orthogonality among the Fourier components is broken and the energy of a Fourier component may "leak" to other components, a phenomenon called "spectral leakage". The objective of this research is to study the spectral representation of irregularly sampled data method. In particular, it will be presented the basic structure of representation of the NDFT (nonuniform discrete Fourier transform), study their properties and demonstrate its potential in the processing of the seismic signal. In this way we study the FFT (fast Fourier transform) and the NFFT (nonuniform fast Fourier transform) which rapidly calculate the DFT (discrete Fourier transform) and NDFT. We compare the recovery of the signal using the FFT, DFT and NFFT. We approach the interpolation of seismic trace using the ALFT (antileakage Fourier transform) to overcome the problem of spectral leakage caused by uneven sampling. Applications to synthetic and real data showed that ALFT method works well on complex geology seismic data and suffers little with irregular spatial sampling of the data and edge effects, in addition it is robust and stable with noisy data. However, it is not as efficient as the FFT and its reconstruction is not as good in the case of irregular filling with large holes in the acquisition.
Resumo:
In the oil prospection research seismic data are usually irregular and sparsely sampled along the spatial coordinates due to obstacles in placement of geophones. Fourier methods provide a way to make the regularization of seismic data which are efficient if the input data is sampled on a regular grid. However, when these methods are applied to a set of irregularly sampled data, the orthogonality among the Fourier components is broken and the energy of a Fourier component may "leak" to other components, a phenomenon called "spectral leakage". The objective of this research is to study the spectral representation of irregularly sampled data method. In particular, it will be presented the basic structure of representation of the NDFT (nonuniform discrete Fourier transform), study their properties and demonstrate its potential in the processing of the seismic signal. In this way we study the FFT (fast Fourier transform) and the NFFT (nonuniform fast Fourier transform) which rapidly calculate the DFT (discrete Fourier transform) and NDFT. We compare the recovery of the signal using the FFT, DFT and NFFT. We approach the interpolation of seismic trace using the ALFT (antileakage Fourier transform) to overcome the problem of spectral leakage caused by uneven sampling. Applications to synthetic and real data showed that ALFT method works well on complex geology seismic data and suffers little with irregular spatial sampling of the data and edge effects, in addition it is robust and stable with noisy data. However, it is not as efficient as the FFT and its reconstruction is not as good in the case of irregular filling with large holes in the acquisition.
Resumo:
Subspaces and manifolds are two powerful models for high dimensional signals. Subspaces model linear correlation and are a good fit to signals generated by physical systems, such as frontal images of human faces and multiple sources impinging at an antenna array. Manifolds model sources that are not linearly correlated, but where signals are determined by a small number of parameters. Examples are images of human faces under different poses or expressions, and handwritten digits with varying styles. However, there will always be some degree of model mismatch between the subspace or manifold model and the true statistics of the source. This dissertation exploits subspace and manifold models as prior information in various signal processing and machine learning tasks.
A near-low-rank Gaussian mixture model measures proximity to a union of linear or affine subspaces. This simple model can effectively capture the signal distribution when each class is near a subspace. This dissertation studies how the pairwise geometry between these subspaces affects classification performance. When model mismatch is vanishingly small, the probability of misclassification is determined by the product of the sines of the principal angles between subspaces. When the model mismatch is more significant, the probability of misclassification is determined by the sum of the squares of the sines of the principal angles. Reliability of classification is derived in terms of the distribution of signal energy across principal vectors. Larger principal angles lead to smaller classification error, motivating a linear transform that optimizes principal angles. This linear transformation, termed TRAIT, also preserves some specific features in each class, being complementary to a recently developed Low Rank Transform (LRT). Moreover, when the model mismatch is more significant, TRAIT shows superior performance compared to LRT.
The manifold model enforces a constraint on the freedom of data variation. Learning features that are robust to data variation is very important, especially when the size of the training set is small. A learning machine with large numbers of parameters, e.g., deep neural network, can well describe a very complicated data distribution. However, it is also more likely to be sensitive to small perturbations of the data, and to suffer from suffer from degraded performance when generalizing to unseen (test) data.
From the perspective of complexity of function classes, such a learning machine has a huge capacity (complexity), which tends to overfit. The manifold model provides us with a way of regularizing the learning machine, so as to reduce the generalization error, therefore mitigate overfiting. Two different overfiting-preventing approaches are proposed, one from the perspective of data variation, the other from capacity/complexity control. In the first approach, the learning machine is encouraged to make decisions that vary smoothly for data points in local neighborhoods on the manifold. In the second approach, a graph adjacency matrix is derived for the manifold, and the learned features are encouraged to be aligned with the principal components of this adjacency matrix. Experimental results on benchmark datasets are demonstrated, showing an obvious advantage of the proposed approaches when the training set is small.
Stochastic optimization makes it possible to track a slowly varying subspace underlying streaming data. By approximating local neighborhoods using affine subspaces, a slowly varying manifold can be efficiently tracked as well, even with corrupted and noisy data. The more the local neighborhoods, the better the approximation, but the higher the computational complexity. A multiscale approximation scheme is proposed, where the local approximating subspaces are organized in a tree structure. Splitting and merging of the tree nodes then allows efficient control of the number of neighbourhoods. Deviation (of each datum) from the learned model is estimated, yielding a series of statistics for anomaly detection. This framework extends the classical {\em changepoint detection} technique, which only works for one dimensional signals. Simulations and experiments highlight the robustness and efficacy of the proposed approach in detecting an abrupt change in an otherwise slowly varying low-dimensional manifold.
Resumo:
Highlights of Data Expedition: • Students explored daily observations of local climate data spanning the past 35 years. • Topological Data Analysis, or TDA for short, provides cutting-edge tools for studying the geometry of data in arbitrarily high dimensions. • Using TDA tools, students discovered intrinsic dynamical features of the data and learned how to quantify periodic phenomenon in a time-series. • Since nature invariably produces noisy data which rarely has exact periodicity, students also considered the theoretical basis of almost-periodicity and even invented and tested new mathematical definitions of almost-periodic functions. Summary The dataset we used for this data expedition comes from the Global Historical Climatology Network. “GHCN (Global Historical Climatology Network)-Daily is an integrated database of daily climate summaries from land surface stations across the globe.” Source: https://www.ncdc.noaa.gov/oa/climate/ghcn-daily/ We focused on the daily maximum and minimum temperatures from January 1, 1980 to April 1, 2015 collected from RDU International Airport. Through a guided series of exercises designed to be performed in Matlab, students explore these time-series, initially by direct visualization and basic statistical techniques. Then students are guided through a special sliding-window construction which transforms a time-series into a high-dimensional geometric curve. These high-dimensional curves can be visualized by projecting down to lower dimensions as in the figure below (Figure 1), however, our focus here was to use persistent homology to directly study the high-dimensional embedding. The shape of these curves has meaningful information but how one describes the “shape” of data depends on which scale the data is being considered. However, choosing the appropriate scale is rarely an obvious choice. Persistent homology overcomes this obstacle by allowing us to quantitatively study geometric features of the data across multiple-scales. Through this data expedition, students are introduced to numerically computing persistent homology using the rips collapse algorithm and interpreting the results. In the specific context of sliding-window constructions, 1-dimensional persistent homology can reveal the nature of periodic structure in the original data. I created a special technique to study how these high-dimensional sliding-window curves form loops in order to quantify the periodicity. Students are guided through this construction and learn how to visualize and interpret this information. Climate data is extremely complex (as anyone who has suffered from a bad weather prediction can attest) and numerous variables play a role in determining our daily weather and temperatures. This complexity coupled with imperfections of measuring devices results in very noisy data. This causes the annual seasonal periodicity to be far from exact. To this end, I have students explore existing theoretical notions of almost-periodicity and test it on the data. They find that some existing definitions are also inadequate in this context. Hence I challenged them to invent new mathematics by proposing and testing their own definition. These students rose to the challenge and suggested a number of creative definitions. While autocorrelation and spectral methods based on Fourier analysis are often used to explore periodicity, the construction here provides an alternative paradigm to quantify periodic structure in almost-periodic signals using tools from topological data analysis.
Resumo:
Current practice for analysing functional neuroimaging data is to average the brain signals recorded at multiple sensors or channels on the scalp over time across hundreds of trials or replicates to eliminate noise and enhance the underlying signal of interest. These studies recording brain signals non-invasively using functional neuroimaging techniques such as electroencephalography (EEG) and magnetoencephalography (MEG) generate complex, high dimensional and noisy data for many subjects at a number of replicates. Single replicate (or single trial) analysis of neuroimaging data have gained focus as they are advantageous to study the features of the signals at each replicate without averaging out important features in the data that the current methods employ. The research here is conducted to systematically develop flexible regression mixed models for single trial analysis of specific brain activities using examples from EEG and MEG to illustrate the models. This thesis follows three specific themes: i) artefact correction to estimate the `brain' signal which is of interest, ii) characterisation of the signals to reduce their dimensions, and iii) model fitting for single trials after accounting for variations between subjects and within subjects (between replicates). The models are developed to establish evidence of two specific neurological phenomena - entrainment of brain signals to an $\alpha$ band of frequencies (8-12Hz) and dipolar brain activation in the same $\alpha$ frequency band in an EEG experiment and a MEG study, respectively.
Resumo:
The Web 2.0 has resulted in a shift as to how users consume and interact with the information, and has introduced a wide range of new textual genres, such as reviews or microblogs, through which users communicate, exchange, and share opinions. The exploitation of all this user-generated content is of great value both for users and companies, in order to assist them in their decision-making processes. Given this context, the analysis and development of automatic methods that can help manage online information in a quicker manner are needed. Therefore, this article proposes and evaluates a novel concept-level approach for ultra-concise opinion abstractive summarization. Our approach is characterized by the integration of syntactic sentence simplification, sentence regeneration and internal concept representation into the summarization process, thus being able to generate abstractive summaries, which is one the most challenging issues for this task. In order to be able to analyze different settings for our approach, the use of the sentence regeneration module was made optional, leading to two different versions of the system (one with sentence regeneration and one without). For testing them, a corpus of 400 English texts, gathered from reviews and tweets belonging to two different domains, was used. Although both versions were shown to be reliable methods for generating this type of summaries, the results obtained indicate that the version without sentence regeneration yielded to better results, improving the results of a number of state-of-the-art systems by 9%, whereas the version with sentence regeneration proved to be more robust to noisy data.
Resumo:
Fully articulated hand tracking promises to enable fundamentally new interactions with virtual and augmented worlds, but the limited accuracy and efficiency of current systems has prevented widespread adoption. Today's dominant paradigm uses machine learning for initialization and recovery followed by iterative model-fitting optimization to achieve a detailed pose fit. We follow this paradigm, but make several changes to the model-fitting, namely using: (1) a more discriminative objective function; (2) a smooth-surface model that provides gradients for non-linear optimization; and (3) joint optimization over both the model pose and the correspondences between observed data points and the model surface. While each of these changes may actually increase the cost per fitting iteration, we find a compensating decrease in the number of iterations. Further, the wide basin of convergence means that fewer starting points are needed for successful model fitting. Our system runs in real-time on CPU only, which frees up the commonly over-burdened GPU for experience designers. The hand tracker is efficient enough to run on low-power devices such as tablets. We can track up to several meters from the camera to provide a large working volume for interaction, even using the noisy data from current-generation depth cameras. Quantitative assessments on standard datasets show that the new approach exceeds the state of the art in accuracy. Qualitative results take the form of live recordings of a range of interactive experiences enabled by this new approach.
Resumo:
Traffic state estimation in an urban road network remains a challenge for traffic models and the question of how such a network performs remains a difficult one to answer for traffic operators. Lack of detailed traffic information has long restricted research in this area. The introduction of Bluetooth into the automotive world presented an alternative that has now developed to a stage where large-scale test-beds are becoming available, for traffic monitoring and model validation purposes. But how much confidence should we have in such data? This paper aims to give an overview of the usage of Bluetooth, primarily for the city-scale management of urban transport networks, and to encourage researchers and practitioners to take a more cautious look at what is currently understood as a mature technology for monitoring travellers in urban environments. We argue that the full value of this technology is yet to be realised, for the analytical accuracies peculiar to the data have still to be adequately resolved.
Resumo:
Most of the manual labor needed to create the geometric building information model (BIM) of an existing facility is spent converting raw point cloud data (PCD) to a BIM description. Automating this process would drastically reduce the modeling cost. Surface extraction from PCD is a fundamental step in this process. Compact modeling of redundant points in PCD as a set of planes leads to smaller file size and fast interactive visualization on cheap hardware. Traditional approaches for smooth surface reconstruction do not explicitly model the sparse scene structure or significantly exploit the redundancy. This paper proposes a method based on sparsity-inducing optimization to address the planar surface extraction problem. Through sparse optimization, points in PCD are segmented according to their embedded linear subspaces. Within each segmented part, plane models can be estimated. Experimental results on a typical noisy PCD demonstrate the effectiveness of the algorithm.
Resumo:
This paper presents an approach for detecting local damage in large scale frame structures by utilizing regularization methods for ill-posed problems. A direct relationship between the change in stiffness caused by local damage and the measured modal data for the damaged structure is developed, based on the perturbation method for structural dynamic systems. Thus, the measured incomplete modal data can be directly adopted in damage identification without requiring model reduction techniques, and common regularization methods could be effectively employed to solve the developed equations. Damage indicators are appropriately chosen to reflect both the location and severity of local damage in individual components of frame structures such as in brace members and at beam-column joints. The Truncated Singular Value Decomposition solution incorporating the Generalized Cross Validation method is introduced to evaluate the damage indicators for the cases when realistic errors exist in modal data measurements. Results for a 16-story building model structure show that structural damage can be correctly identified at detailed level using only limited information on the measured noisy modal data for the damaged structure.
Resumo:
A method for measuring the phase of oscillations from noisy time series is proposed. To obtain the phase, the signal is filtered in such a way that the filter output has minimal relative variation in the amplitude over all filters with complex-valued impulse response. The argument of the filter output yields the phase. Implementation of the algorithm and interpretation of the result are discussed. We argue that the phase obtained by the proposed method has a low susceptibility to measurement noise and a low rate of artificial phase slips. The method is applied for the detection and classification of mode locking in vortex flow meters. A measure for the strength of mode locking is proposed.