969 resultados para Sampling method
Resumo:
L’apprentissage supervisé de réseaux hiérarchiques à grande échelle connaît présentement un succès fulgurant. Malgré cette effervescence, l’apprentissage non-supervisé représente toujours, selon plusieurs chercheurs, un élément clé de l’Intelligence Artificielle, où les agents doivent apprendre à partir d’un nombre potentiellement limité de données. Cette thèse s’inscrit dans cette pensée et aborde divers sujets de recherche liés au problème d’estimation de densité par l’entremise des machines de Boltzmann (BM), modèles graphiques probabilistes au coeur de l’apprentissage profond. Nos contributions touchent les domaines de l’échantillonnage, l’estimation de fonctions de partition, l’optimisation ainsi que l’apprentissage de représentations invariantes. Cette thèse débute par l’exposition d’un nouvel algorithme d'échantillonnage adaptatif, qui ajuste (de fa ̧con automatique) la température des chaînes de Markov sous simulation, afin de maintenir une vitesse de convergence élevée tout au long de l’apprentissage. Lorsqu’utilisé dans le contexte de l’apprentissage par maximum de vraisemblance stochastique (SML), notre algorithme engendre une robustesse accrue face à la sélection du taux d’apprentissage, ainsi qu’une meilleure vitesse de convergence. Nos résultats sont présent ́es dans le domaine des BMs, mais la méthode est générale et applicable à l’apprentissage de tout modèle probabiliste exploitant l’échantillonnage par chaînes de Markov. Tandis que le gradient du maximum de vraisemblance peut-être approximé par échantillonnage, l’évaluation de la log-vraisemblance nécessite un estimé de la fonction de partition. Contrairement aux approches traditionnelles qui considèrent un modèle donné comme une boîte noire, nous proposons plutôt d’exploiter la dynamique de l’apprentissage en estimant les changements successifs de log-partition encourus à chaque mise à jour des paramètres. Le problème d’estimation est reformulé comme un problème d’inférence similaire au filtre de Kalman, mais sur un graphe bi-dimensionnel, où les dimensions correspondent aux axes du temps et au paramètre de température. Sur le thème de l’optimisation, nous présentons également un algorithme permettant d’appliquer, de manière efficace, le gradient naturel à des machines de Boltzmann comportant des milliers d’unités. Jusqu’à présent, son adoption était limitée par son haut coût computationel ainsi que sa demande en mémoire. Notre algorithme, Metric-Free Natural Gradient (MFNG), permet d’éviter le calcul explicite de la matrice d’information de Fisher (et son inverse) en exploitant un solveur linéaire combiné à un produit matrice-vecteur efficace. L’algorithme est prometteur: en terme du nombre d’évaluations de fonctions, MFNG converge plus rapidement que SML. Son implémentation demeure malheureusement inefficace en temps de calcul. Ces travaux explorent également les mécanismes sous-jacents à l’apprentissage de représentations invariantes. À cette fin, nous utilisons la famille de machines de Boltzmann restreintes “spike & slab” (ssRBM), que nous modifions afin de pouvoir modéliser des distributions binaires et parcimonieuses. Les variables latentes binaires de la ssRBM peuvent être rendues invariantes à un sous-espace vectoriel, en associant à chacune d’elles, un vecteur de variables latentes continues (dénommées “slabs”). Ceci se traduit par une invariance accrue au niveau de la représentation et un meilleur taux de classification lorsque peu de données étiquetées sont disponibles. Nous terminons cette thèse sur un sujet ambitieux: l’apprentissage de représentations pouvant séparer les facteurs de variations présents dans le signal d’entrée. Nous proposons une solution à base de ssRBM bilinéaire (avec deux groupes de facteurs latents) et formulons le problème comme l’un de “pooling” dans des sous-espaces vectoriels complémentaires.
Resumo:
We present a new method for estimating the expected return of a POMDP from experience. The estimator does not assume any knowle ge of the POMDP and allows the experience to be gathered with an arbitrary set of policies. The return is estimated for any new policy of the POMDP. We motivate the estimator from function-approximation and importance sampling points-of-view and derive its theoretical properties. Although the estimator is biased, it has low variance and the bias is often irrelevant when the estimator is used for pair-wise comparisons.We conclude by extending the estimator to policies with memory and compare its performance in a greedy search algorithm to the REINFORCE algorithm showing an order of magnitude reduction in the number of trials required.
Resumo:
One of the key aspects in 3D-image registration is the computation of the joint intensity histogram. We propose a new approach to compute this histogram using uniformly distributed random lines to sample stochastically the overlapping volume between two 3D-images. The intensity values are captured from the lines at evenly spaced positions, taking an initial random offset different for each line. This method provides us with an accurate, robust and fast mutual information-based registration. The interpolation effects are drastically reduced, due to the stochastic nature of the line generation, and the alignment process is also accelerated. The results obtained show a better performance of the introduced method than the classic computation of the joint histogram
Resumo:
The time-of-detection method for aural avian point counts is a new method of estimating abundance, allowing for uncertain probability of detection. The method has been specifically designed to allow for variation in singing rates of birds. It involves dividing the time interval of the point count into several subintervals and recording the detection history of the subintervals when each bird sings. The method can be viewed as generating data equivalent to closed capture–recapture information. The method is different from the distance and multiple-observer methods in that it is not required that all the birds sing during the point count. As this method is new and there is some concern as to how well individual birds can be followed, we carried out a field test of the method using simulated known populations of singing birds, using a laptop computer to send signals to audio stations distributed around a point. The system mimics actual aural avian point counts, but also allows us to know the size and spatial distribution of the populations we are sampling. Fifty 8-min point counts (broken into four 2-min intervals) using eight species of birds were simulated. Singing rate of an individual bird of a species was simulated following a Markovian process (singing bouts followed by periods of silence), which we felt was more realistic than a truly random process. The main emphasis of our paper is to compare results from species singing at (high and low) homogenous rates per interval with those singing at (high and low) heterogeneous rates. Population size was estimated accurately for the species simulated, with a high homogeneous probability of singing. Populations of simulated species with lower but homogeneous singing probabilities were somewhat underestimated. Populations of species simulated with heterogeneous singing probabilities were substantially underestimated. Underestimation was caused by both the very low detection probabilities of all distant individuals and by individuals with low singing rates also having very low detection probabilities.
Resumo:
[1] Cloud cover is conventionally estimated from satellite images as the observed fraction of cloudy pixels. Active instruments such as radar and Lidar observe in narrow transects that sample only a small percentage of the area over which the cloud fraction is estimated. As a consequence, the fraction estimate has an associated sampling uncertainty, which usually remains unspecified. This paper extends a Bayesian method of cloud fraction estimation, which also provides an analytical estimate of the sampling error. This method is applied to test the sensitivity of this error to sampling characteristics, such as the number of observed transects and the variability of the underlying cloud field. The dependence of the uncertainty on these characteristics is investigated using synthetic data simulated to have properties closely resembling observations of the spaceborne Lidar NASA-LITE mission. Results suggest that the variance of the cloud fraction is greatest for medium cloud cover and least when conditions are mostly cloudy or clear. However, there is a bias in the estimation, which is greatest around 25% and 75% cloud cover. The sampling uncertainty is also affected by the mean lengths of clouds and of clear intervals; shorter lengths decrease uncertainty, primarily because there are more cloud observations in a transect of a given length. Uncertainty also falls with increasing number of transects. Therefore a sampling strategy aimed at minimizing the uncertainty in transect derived cloud fraction will have to take into account both the cloud and clear sky length distributions as well as the cloud fraction of the observed field. These conclusions have implications for the design of future satellite missions. This paper describes the first integrated methodology for the analytical assessment of sampling uncertainty in cloud fraction observations from forthcoming spaceborne radar and Lidar missions such as NASA's Calipso and CloudSat.
Resumo:
This paper presents a first attempt to estimate mixing parameters from sea level observations using a particle method based on importance sampling. The method is applied to an ensemble of 128 members of model simulations with a global ocean general circulation model of high complexity. Idealized twin experiments demonstrate that the method is able to accurately reconstruct mixing parameters from an observed mean sea level field when mixing is assumed to be spatially homogeneous. An experiment with inhomogeneous eddy coefficients fails because of the limited ensemble size. This is overcome by the introduction of local weighting, which is able to capture spatial variations in mixing qualitatively. As the sensitivity of sea level for variations in mixing is higher for low values of mixing coefficients, the method works relatively well in regions of low eddy activity.
Resumo:
The jackknife method is often used for variance estimation in sample surveys but has only been developed for a limited class of sampling designs.We propose a jackknife variance estimator which is defined for any without-replacement unequal probability sampling design. We demonstrate design consistency of this estimator for a broad class of point estimators. A Monte Carlo study shows how the proposed estimator may improve on existing estimators.
Resumo:
It is common practice to design a survey with a large number of strata. However, in this case the usual techniques for variance estimation can be inaccurate. This paper proposes a variance estimator for estimators of totals. The method proposed can be implemented with standard statistical packages without any specific programming, as it involves simple techniques of estimation, such as regression fitting.
Resumo:
We show that the Hájek (Ann. Math Statist. (1964) 1491) variance estimator can be used to estimate the variance of the Horvitz–Thompson estimator when the Chao sampling scheme (Chao, Biometrika 69 (1982) 653) is implemented. This estimator is simple and can be implemented with any statistical packages. We consider a numerical and an analytic method to show that this estimator can be used. A series of simulations supports our findings.
Resumo:
Imputation is commonly used to compensate for item non-response in sample surveys. If we treat the imputed values as if they are true values, and then compute the variance estimates by using standard methods, such as the jackknife, we can seriously underestimate the true variances. We propose a modified jackknife variance estimator which is defined for any without-replacement unequal probability sampling design in the presence of imputation and non-negligible sampling fraction. Mean, ratio and random-imputation methods will be considered. The practical advantage of the method proposed is its breadth of applicability.
Resumo:
Until recently, there has been little investigation concerning the poor indoor air quality (IAQ) in classrooms. Despite the evidence that the educational building systems in many of the UK institutions have significant defects that may degrade IAQ, systematic assessments of IAQ measurements has been rarely undertaken. When undertaking IAQ measurement, there is a difficult task of representing and characterizing the environment parameters. Although technologies exist to measure these parameters, direct measurements especially in a naturally ventilated spaces are often difficult. This paper presents a methodology for developing a method to characterize indoor environment flow parameters as well as the Carbon Dioxide (CO2) concentrations. Thus, CO2 concentration level can be influenced by the differences in the selection of sampling points and heights. However, because this research focuses on natural ventilation in classrooms, air exchange is provided mainly by air infiltration. It is hoped that the methodology developed and evaluated in this research can effectively simplify the process of estimating the parameters for a systematic assessment of IAQ measurements in a naturally ventilated classrooms.
Resumo:
High spatial resolution vertical profiles of pore-water chemistry have been obtained for a peatland using diffusive equilibrium in thin films (DET) gel probes. Comparison of DET pore-water data with more traditional depth-specific sampling shows good agreement and the DET profiling method is less invasive and less likely to induce mixing of pore-waters. Chloride mass balances as water tables fell in the early summer indicate that evaporative concentration dominates and there is negligible lateral flow in the peat. Lack of lateral flow allows element budgets for the same site at different times to be compared. The high spatial resolution of sampling also enables gradients to be observed that permit calculations of vertical fluxes. Sulfate concentrations fall at two sites with net rates of 1.5 and 5.0nmol cm− 3 day− 1, likely due to a dominance of bacterial sulfate reduction, while a third site showed a net gain in sulfate due to oxidation of sulfur over the study period at an average rate of 3.4nmol cm− 3 day− 1. Behaviour of iron is closely coupled to that of sulfur; there is net removal of iron at the two sites where sulfate reduction dominates and addition of iron where oxidation dominates. The profiles demonstrate that, in addition to strong vertical redox related chemical changes, there is significant spatial heterogeneity. Whilst overall there is evidence for net reduction of sulfate within the peatland pore-waters, this can be reversed, at least temporarily, during periods of drought when sulfide oxidation with resulting acid production predominates.
Resumo:
A revised Bayesian algorithm for estimating surface rain rate, convective rain proportion, and latent heating profiles from satellite-borne passive microwave radiometer observations over ocean backgrounds is described. The algorithm searches a large database of cloud-radiative model simulations to find cloud profiles that are radiatively consistent with a given set of microwave radiance measurements. The properties of these radiatively consistent profiles are then composited to obtain best estimates of the observed properties. The revised algorithm is supported by an expanded and more physically consistent database of cloud-radiative model simulations. The algorithm also features a better quantification of the convective and nonconvective contributions to total rainfall, a new geographic database, and an improved representation of background radiances in rain-free regions. Bias and random error estimates are derived from applications of the algorithm to synthetic radiance data, based upon a subset of cloud-resolving model simulations, and from the Bayesian formulation itself. Synthetic rain-rate and latent heating estimates exhibit a trend of high (low) bias for low (high) retrieved values. The Bayesian estimates of random error are propagated to represent errors at coarser time and space resolutions, based upon applications of the algorithm to TRMM Microwave Imager (TMI) data. Errors in TMI instantaneous rain-rate estimates at 0.5°-resolution range from approximately 50% at 1 mm h−1 to 20% at 14 mm h−1. Errors in collocated spaceborne radar rain-rate estimates are roughly 50%–80% of the TMI errors at this resolution. The estimated algorithm random error in TMI rain rates at monthly, 2.5° resolution is relatively small (less than 6% at 5 mm day−1) in comparison with the random error resulting from infrequent satellite temporal sampling (8%–35% at the same rain rate). Percentage errors resulting from sampling decrease with increasing rain rate, and sampling errors in latent heating rates follow the same trend. Averaging over 3 months reduces sampling errors in rain rates to 6%–15% at 5 mm day−1, with proportionate reductions in latent heating sampling errors.
Resumo:
A sampling oscilloscope is one of the main units in automatic pulse measurement system (APMS). The time jitter in waveform samplers is an important error source that affect the precision of data acquisition. In this paper, this kind of error is greatly reduced by using the deconvolution method. First, the probability density function (PDF) of time jitter distribution is determined by the statistical approach, then, this PDF is used as convolution kern to deconvolve with the acquired waveform data with additional averaging, and the result is the waveform data in which the effect of time jitter has been removed, and the measurement precision of APMS is greatly improved. In addition, some computer simulations are given which prove the success of the method given in this paper.
Resumo:
Following a malicious or accidental atmospheric release in an outdoor environment it is essential for first responders to ensure safety by identifying areas where human life may be in danger. For this to happen quickly, reliable information is needed on the source strength and location, and the type of chemical agent released. We present here an inverse modelling technique that estimates the source strength and location of such a release, together with the uncertainty in those estimates, using a limited number of measurements of concentration from a network of chemical sensors considering a single, steady, ground-level source. The technique is evaluated using data from a set of dispersion experiments conducted in a meteorological wind tunnel, where simultaneous measurements of concentration time series were obtained in the plume from a ground-level point-source emission of a passive tracer. In particular, we analyze the response to the number of sensors deployed and their arrangement, and to sampling and model errors. We find that the inverse algorithm can generate acceptable estimates of the source characteristics with as few as four sensors, providing these are well-placed and that the sampling error is controlled. Configurations with at least three sensors in a profile across the plume were found to be superior to other arrangements examined. Analysis of the influence of sampling error due to the use of short averaging times showed that the uncertainty in the source estimates grew as the sampling time decreased. This demonstrated that averaging times greater than about 5min (full scale time) lead to acceptable accuracy.