991 resultados para Noise models
Resumo:
Optimal design for parameter estimation in Gaussian process regression models with input-dependent noise is examined. The motivation stems from the area of computer experiments, where computationally demanding simulators are approximated using Gaussian process emulators to act as statistical surrogates. In the case of stochastic simulators, which produce a random output for a given set of model inputs, repeated evaluations are useful, supporting the use of replicate observations in the experimental design. The findings are also applicable to the wider context of experimental design for Gaussian process regression and kriging. Designs are proposed with the aim of minimising the variance of the Gaussian process parameter estimates. A heteroscedastic Gaussian process model is presented which allows for an experimental design technique based on an extension of Fisher information to heteroscedastic models. It is empirically shown that the error of the approximation of the parameter variance by the inverse of the Fisher information is reduced as the number of replicated points is increased. Through a series of simulation experiments on both synthetic data and a systems biology stochastic simulator, optimal designs with replicate observations are shown to outperform space-filling designs both with and without replicate observations. Guidance is provided on best practice for optimal experimental design for stochastic response models. © 2013 Elsevier Inc. All rights reserved.
Resumo:
Robust controllers for nonlinear stochastic systems with functional uncertainties can be consistently designed using probabilistic control methods. In this paper a generalised probabilistic controller design for the minimisation of the Kullback-Leibler divergence between the actual joint probability density function (pdf) of the closed loop control system, and an ideal joint pdf is presented emphasising how the uncertainty can be systematically incorporated in the absence of reliable systems models. To achieve this objective all probabilistic models of the system are estimated from process data using mixture density networks (MDNs) where all the parameters of the estimated pdfs are taken to be state and control input dependent. Based on this dependency of the density parameters on the input values, explicit formulations to the construction of optimal generalised probabilistic controllers are obtained through the techniques of dynamic programming and adaptive critic methods. Using the proposed generalised probabilistic controller, the conditional joint pdfs can be made to follow the ideal ones. A simulation example is used to demonstrate the implementation of the algorithm and encouraging results are obtained.
Resumo:
Heterogeneous datasets arise naturally in most applications due to the use of a variety of sensors and measuring platforms. Such datasets can be heterogeneous in terms of the error characteristics and sensor models. Treating such data is most naturally accomplished using a Bayesian or model-based geostatistical approach; however, such methods generally scale rather badly with the size of dataset, and require computationally expensive Monte Carlo based inference. Recently within the machine learning and spatial statistics communities many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation which is chosen such that there is minimal information loss. In this paper a sequential Bayesian framework for inference in such projected processes is presented. The observations are considered one at a time which avoids the need for high dimensional integrals typically required in a Bayesian approach. A C++ library, gptk, which is part of the INTAMAP web service, is introduced which implements projected, sequential estimation and adds several novel features. In particular the library includes the ability to use a generic observation operator, or sensor model, to permit data fusion. It is also possible to cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the covariance parameters is explored, including the impact of the projected process approximation on likelihood profiles. We illustrate the projected sequential method in application to synthetic and real datasets. Limitations and extensions are discussed. © 2010 Elsevier Ltd.
Resumo:
Recent discussion regarding whether the noise that limits 2AFC discrimination performance is fixed or variable has focused either on describing experimental methods that presumably dissociate the effects of response mean and variance or on reanalyzing a published data set with the aim of determining how to solve the question through goodness-of-fit statistics. This paper illustrates that the question cannot be solved by fitting models to data and assessing goodness-of-fit because data on detection and discrimination performance can be indistinguishably fitted by models that assume either type of noise when each is coupled with a convenient form for the transducer function. Thus, success or failure at fitting a transducer model merely illustrates the capability (or lack thereof) of some particular combination of transducer function and variance function to account for the data, but it cannot disclose the nature of the noise. We also comment on some of the issues that have been raised in recent exchange on the topic, namely, the existence of additional constraints for the models, the presence of asymmetric asymptotes, the likelihood of history-dependent noise, and the potential of certain experimental methods to dissociate the effects of response mean and variance.
Resumo:
We propose a mathematically well-founded approach for locating the source (initial state) of density functions evolved within a nonlinear reaction-diffusion model. The reconstruction of the initial source is an ill-posed inverse problem since the solution is highly unstable with respect to measurement noise. To address this instability problem, we introduce a regularization procedure based on the nonlinear Landweber method for the stable determination of the source location. This amounts to solving a sequence of well-posed forward reaction-diffusion problems. The developed framework is general, and as a special instance we consider the problem of source localization of brain tumors. We show numerically that the source of the initial densities of tumor cells are reconstructed well on both imaging data consisting of simple and complex geometric structures.
Resumo:
Human use of the oceans is increasingly in conflict with conservation of endangered species. Methods for managing the spatial and temporal placement of industries such as military, fishing, transportation and offshore energy, have historically been post hoc; i.e. the time and place of human activity is often already determined before assessment of environmental impacts. In this dissertation, I build robust species distribution models in two case study areas, US Atlantic (Best et al. 2012) and British Columbia (Best et al. 2015), predicting presence and abundance respectively, from scientific surveys. These models are then applied to novel decision frameworks for preemptively suggesting optimal placement of human activities in space and time to minimize ecological impacts: siting for offshore wind energy development, and routing ships to minimize risk of striking whales. Both decision frameworks relate the tradeoff between conservation risk and industry profit with synchronized variable and map views as online spatial decision support systems.
For siting offshore wind energy development (OWED) in the U.S. Atlantic (chapter 4), bird density maps are combined across species with weights of OWED sensitivity to collision and displacement and 10 km2 sites are compared against OWED profitability based on average annual wind speed at 90m hub heights and distance to transmission grid. A spatial decision support system enables toggling between the map and tradeoff plot views by site. A selected site can be inspected for sensitivity to a cetaceans throughout the year, so as to capture months of the year which minimize episodic impacts of pre-operational activities such as seismic airgun surveying and pile driving.
Routing ships to avoid whale strikes (chapter 5) can be similarly viewed as a tradeoff, but is a different problem spatially. A cumulative cost surface is generated from density surface maps and conservation status of cetaceans, before applying as a resistance surface to calculate least-cost routes between start and end locations, i.e. ports and entrance locations to study areas. Varying a multiplier to the cost surface enables calculation of multiple routes with different costs to conservation of cetaceans versus cost to transportation industry, measured as distance. Similar to the siting chapter, a spatial decisions support system enables toggling between the map and tradeoff plot view of proposed routes. The user can also input arbitrary start and end locations to calculate the tradeoff on the fly.
Essential to the input of these decision frameworks are distributions of the species. The two preceding chapters comprise species distribution models from two case study areas, U.S. Atlantic (chapter 2) and British Columbia (chapter 3), predicting presence and density, respectively. Although density is preferred to estimate potential biological removal, per Marine Mammal Protection Act requirements in the U.S., all the necessary parameters, especially distance and angle of observation, are less readily available across publicly mined datasets.
In the case of predicting cetacean presence in the U.S. Atlantic (chapter 2), I extracted datasets from the online OBIS-SEAMAP geo-database, and integrated scientific surveys conducted by ship (n=36) and aircraft (n=16), weighting a Generalized Additive Model by minutes surveyed within space-time grid cells to harmonize effort between the two survey platforms. For each of 16 cetacean species guilds, I predicted the probability of occurrence from static environmental variables (water depth, distance to shore, distance to continental shelf break) and time-varying conditions (monthly sea-surface temperature). To generate maps of presence vs. absence, Receiver Operator Characteristic (ROC) curves were used to define the optimal threshold that minimizes false positive and false negative error rates. I integrated model outputs, including tables (species in guilds, input surveys) and plots (fit of environmental variables, ROC curve), into an online spatial decision support system, allowing for easy navigation of models by taxon, region, season, and data provider.
For predicting cetacean density within the inner waters of British Columbia (chapter 3), I calculated density from systematic, line-transect marine mammal surveys over multiple years and seasons (summer 2004, 2005, 2008, and spring/autumn 2007) conducted by Raincoast Conservation Foundation. Abundance estimates were calculated using two different methods: Conventional Distance Sampling (CDS) and Density Surface Modelling (DSM). CDS generates a single density estimate for each stratum, whereas DSM explicitly models spatial variation and offers potential for greater precision by incorporating environmental predictors. Although DSM yields a more relevant product for the purposes of marine spatial planning, CDS has proven to be useful in cases where there are fewer observations available for seasonal and inter-annual comparison, particularly for the scarcely observed elephant seal. Abundance estimates are provided on a stratum-specific basis. Steller sea lions and harbour seals are further differentiated by ‘hauled out’ and ‘in water’. This analysis updates previous estimates (Williams & Thomas 2007) by including additional years of effort, providing greater spatial precision with the DSM method over CDS, novel reporting for spring and autumn seasons (rather than summer alone), and providing new abundance estimates for Steller sea lion and northern elephant seal. In addition to providing a baseline of marine mammal abundance and distribution, against which future changes can be compared, this information offers the opportunity to assess the risks posed to marine mammals by existing and emerging threats, such as fisheries bycatch, ship strikes, and increased oil spill and ocean noise issues associated with increases of container ship and oil tanker traffic in British Columbia’s continental shelf waters.
Starting with marine animal observations at specific coordinates and times, I combine these data with environmental data, often satellite derived, to produce seascape predictions generalizable in space and time. These habitat-based models enable prediction of encounter rates and, in the case of density surface models, abundance that can then be applied to management scenarios. Specific human activities, OWED and shipping, are then compared within a tradeoff decision support framework, enabling interchangeable map and tradeoff plot views. These products make complex processes transparent for gaming conservation, industry and stakeholders towards optimal marine spatial management, fundamental to the tenets of marine spatial planning, ecosystem-based management and dynamic ocean management.
Resumo:
Shipping noise is a threat to marine wildlife. Grey seals are benthic foragers, and thus experience acoustic noise throughout the water column, which makes them a good model species for a case study of the potential impacts of shipping noise. We used ship track data from the Celtic Sea, seal track data and a coupled ocean-acoustic modelling system to assess the noise exposure of grey seals along their tracks. It was found that the animals experience step changes in sound levels up to ~20dB at a frequency of 125Hz, and ~10dB on average over 10-1000Hz when they dive through the thermocline, particularly during summer. Our results showed large seasonal differences in the noise level experienced by the seals. These results reveal the actual noise exposure by the animals and could help in marine spatial planning.
Resumo:
In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs using 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied on a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
Resumo:
Current practice for analysing functional neuroimaging data is to average the brain signals recorded at multiple sensors or channels on the scalp over time across hundreds of trials or replicates to eliminate noise and enhance the underlying signal of interest. These studies recording brain signals non-invasively using functional neuroimaging techniques such as electroencephalography (EEG) and magnetoencephalography (MEG) generate complex, high dimensional and noisy data for many subjects at a number of replicates. Single replicate (or single trial) analysis of neuroimaging data have gained focus as they are advantageous to study the features of the signals at each replicate without averaging out important features in the data that the current methods employ. The research here is conducted to systematically develop flexible regression mixed models for single trial analysis of specific brain activities using examples from EEG and MEG to illustrate the models. This thesis follows three specific themes: i) artefact correction to estimate the `brain' signal which is of interest, ii) characterisation of the signals to reduce their dimensions, and iii) model fitting for single trials after accounting for variations between subjects and within subjects (between replicates). The models are developed to establish evidence of two specific neurological phenomena - entrainment of brain signals to an $\alpha$ band of frequencies (8-12Hz) and dipolar brain activation in the same $\alpha$ frequency band in an EEG experiment and a MEG study, respectively.
Resumo:
In this contribution, a system identification procedure of a two-input Wiener model suitable for the analysis of the disturbance behavior of integrated nonlinear circuits is presented. The identified block model is comprised of two linear dynamic and one static nonlinear block, which are determined using an parameterized approach. In order to characterize the linear blocks, an correlation analysis using a white noise input in combination with a model reduction scheme is adopted. After having characterized the linear blocks, from the output spectrum under single tone excitation at each input a linear set of equations will be set up, whose solution gives the coefficients of the nonlinear block. By this data based black box approach, the distortion behavior of a nonlinear circuit under the influence of an interfering signal at an arbitrary input port can be determined. Such an interfering signal can be, for example, an electromagnetic interference signal which conductively couples into the port of consideration. © 2011 Author(s).
Resumo:
Water regimes in the Brazilian Cerrados are sensitive to climatological disturbances and human intervention. The risk that critical water-table levels are exceeded over long periods of time can be estimated by applying stochastic methods in modeling the dynamic relationship between water levels and driving forces such as precipitation and evapotranspiration. In this study, a transfer function-noise model, the so called PIRFICT-model, is applied to estimate the dynamic relationship between water-table depth and precipitation surplus/deficit in a watershed with a groundwater monitoring scheme in the Brazilian Cerrados. Critical limits were defined for a period in the Cerrados agricultural calendar, the end of the rainy season, when extremely shallow levels (< 0.5-m depth) can pose a risk to plant health and machinery before harvesting. By simulating time-series models, the risk of exceeding critical thresholds during a continuous period of time (e.g. 10 days) is described by probability levels. These simulated probabilities were interpolated spatially using universal kriging, incorporating information related to the drainage basin from a digital elevation model. The resulting map reduced model uncertainty. Three areas were defined as presenting potential risk at the end of the rainy season. These areas deserve attention with respect to water-management and land-use planning.
Resumo:
Doutoramento em Gestão
Resumo:
Doctor of Philosophy in Mathematics
Calculation of mutual information for nonlinear communication channel at large signal-to-noise ratio
Resumo:
Using the path-integral technique we examine the mutual information for the communication channel modeled by the nonlinear Schrödinger equation with additive Gaussian noise. The nonlinear Schrödinger equation is one of the fundamental models in nonlinear physics, and it has a broad range of applications, including fiber optical communications - the backbone of the internet. At large signal-to-noise ratio we present the mutual information through the path-integral, which is convenient for the perturbative expansion in nonlinearity. In the limit of small noise and small nonlinearity we derive analytically the first nonzero nonlinear correction to the mutual information for the channel.
Resumo:
Long-term monitoring of acoustical environments is gaining popularity thanks to the relevant amount of scientific and engineering insights that it provides. The increasing interest is due to the constant growth of storage capacity and computational power to process large amounts of data. In this perspective, machine learning (ML) provides a broad family of data-driven statistical techniques to deal with large databases. Nowadays, the conventional praxis of sound level meter measurements limits the global description of a sound scene to an energetic point of view. The equivalent continuous level Leq represents the main metric to define an acoustic environment, indeed. Finer analyses involve the use of statistical levels. However, acoustic percentiles are based on temporal assumptions, which are not always reliable. A statistical approach, based on the study of the occurrences of sound pressure levels, would bring a different perspective to the analysis of long-term monitoring. Depicting a sound scene through the most probable sound pressure level, rather than portions of energy, brought more specific information about the activity carried out during the measurements. The statistical mode of the occurrences can capture typical behaviors of specific kinds of sound sources. The present work aims to propose an ML-based method to identify, separate and measure coexisting sound sources in real-world scenarios. It is based on long-term monitoring and is addressed to acousticians focused on the analysis of environmental noise in manifold contexts. The presented method is based on clustering analysis. Two algorithms, Gaussian Mixture Model and K-means clustering, represent the main core of a process to investigate different active spaces monitored through sound level meters. The procedure has been applied in two different contexts: university lecture halls and offices. The proposed method shows robust and reliable results in describing the acoustic scenario and it could represent an important analytical tool for acousticians.