886 results for Discrete Gaussian Sampling
Abstract:
This thesis presents methods for handling count data in particular and discrete data in general. It is part of an NSERC (CRSNG) strategic project named CC-Bio, whose objective is to assess the impact of climate change on the distribution of plant and animal species. After a brief introduction to biogeography and to generalized linear mixed models in Chapters 1 and 2 respectively, the thesis is organized around three main ideas. First, Chapter 3 introduces a new family of distributions whose components have Poisson or Skellam marginal distributions. This new specification makes it possible to incorporate relevant information about the nature of the correlations among all the components. We also present several properties of this distribution. Unlike the multivariate Poisson distribution that it generalizes, it can handle variables with positive and/or negative correlations. A simulation study illustrates the estimation methods in the bivariate case. The results obtained with Bayesian methods based on Markov chain Monte Carlo (MCMC) show a fairly small relative bias, below 5%, for the regression coefficients of the means, whereas those of the covariance term appear somewhat more volatile. Second, Chapter 4 presents an extension of multivariate Poisson regression with gamma-distributed random effects. Since species abundance data exhibit strong overdispersion, which would make the resulting estimators and standard errors misleading, we favour an approach based on Monte Carlo integration with importance sampling. The approach remains the same as in the previous chapter: the idea is to simulate independent latent variables so as to recover a conventional generalized linear mixed model (GLMM) with gamma random effects. Although the assumption of a priori knowledge of the dispersion parameters may seem strong, a sensitivity analysis based on goodness of fit demonstrates the robustness of our method. Third, in the last chapter we address the definition and construction of a concordance, and hence correlation, measure for zero-inflated data through Gaussian copula modelling. Unlike Kendall's tau, whose values lie in an interval whose bounds depend on the frequency of ties between pairs, this measure has the advantage of taking its values on (-1, 1). Initially introduced to model correlations between continuous variables, its extension to the discrete case entails certain restrictions. Indeed, the new measure can be interpreted as the correlation between the continuous random variables whose discretization yields our non-negative discrete observations. Two estimation methods for zero-inflated models are presented, in the frequentist and Bayesian settings, based respectively on maximum likelihood and Gauss-Hermite integration. Finally, a simulation study shows the robustness and the limits of our approach.
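As a rough illustration of the latent-variable idea evoked in this abstract (independent latent counts producing Poisson or Skellam margins with positive or negative correlation), here is a minimal simulation sketch; the rates and the common-shock construction are illustrative assumptions, not the thesis's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Independent latent Poisson components (illustrative rates, not from the thesis).
x0 = rng.poisson(2.0, n)   # shared component driving the correlation
x1 = rng.poisson(5.0, n)
x2 = rng.poisson(5.0, n)

# Positive correlation: both margins remain Poisson (classical common-shock form).
y1_pos, y2_pos = x1 + x0, x2 + x0

# Negative correlation: the second margin becomes a Skellam variable.
y1_neg, y2_neg = x1 + x0, x2 - x0

print(np.corrcoef(y1_pos, y2_pos)[0, 1])   # > 0
print(np.corrcoef(y1_neg, y2_neg)[0, 1])   # < 0
```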
Abstract:
There is a recent trend to describe physical phenomena without the use of infinitesimals or infinities. This has been accomplished by replacing differential calculus with finite difference theory. Discrete function theory was first introduced in 1941. This theory is concerned with the study of functions defined on a discrete set of points in the complex plane. The theory was extensively developed for functions defined on a Gaussian lattice. In 1972 a very suitable lattice H = {(±q^m x_0, ±i q^n y_0) : x_0 > 0, y_0 > 0, 0 < q < 1, m, n ∈ Z} was found and discrete analytic function theory was developed on it. Very recently some work has been done in discrete monodiffric function theory for functions defined on H. The theory of pseudoanalytic functions is a generalisation of the theory of analytic functions. When the generator becomes the identity, i.e., (1, i), the theory of pseudoanalytic functions reduces to the theory of analytic functions. Though the theory of pseudoanalytic functions plays an important role in analysis, no discrete theory is available in the literature. This thesis is an attempt in that direction. A discrete pseudoanalytic theory is derived for functions defined on H.
Abstract:
Study of variable stars is an important topic of modern astrophysics. After the invention of powerful telescopes and high-resolution CCDs, variable star data are accumulating on the order of petabytes. This huge amount of data requires many automated methods as well as human experts. This thesis is devoted to data analysis of variable stars' astronomical time series data and hence belongs to the interdisciplinary topic of Astrostatistics. For an observer on earth, stars whose apparent brightness changes over time are called variable stars. The variation in brightness may be regular (periodic), quasi-periodic (semi-periodic) or irregular (aperiodic), and is caused by various reasons. In some cases the variation is due to internal thermo-nuclear processes; such stars are generally known as intrinsic variables. In other cases it is due to external processes, such as eclipses or rotation; these are known as extrinsic variables. Intrinsic variables can be further grouped into pulsating variables, eruptive variables and flare stars. Extrinsic variables are grouped into eclipsing binary stars and chromospheric stars. Pulsating variables can again be classified into Cepheid, RR Lyrae, RV Tauri, Delta Scuti, Mira, etc. The eruptive or cataclysmic variables are novae, supernovae, etc., which occur rarely and are not periodic phenomena. Most of the other variations are periodic in nature. Variable stars can be observed in many ways, such as photometry, spectrophotometry and spectroscopy. A sequence of photometric observations of a variable star produces time series data containing time, magnitude and error. The plot of a variable star's apparent magnitude against time is known as a light curve. If the time series data are folded on a period, the plot of apparent magnitude against phase is known as a phased light curve. The unique shape of the phased light curve is a characteristic of each type of variable star. One way to identify the type of a variable star and to classify it is visual inspection of the phased light curve by an expert. For the last several years, automated algorithms have been used to classify groups of variable stars with the help of computers. Research on variable stars can be divided into different stages such as observation, data reduction, data analysis, modeling and classification. Modeling of variable stars helps to determine their short-term and long-term behaviour, to construct theoretical models (e.g. the Wilson-Devinney model for eclipsing binaries) and to derive stellar properties such as mass, radius, luminosity, temperature, internal and external structure, chemical composition and evolution. Classification requires the determination of basic parameters such as period, amplitude and phase, together with some other derived parameters. Of these, the period is the most important, since wrong periods can lead to sparse light curves and misleading information. Time series analysis is a method of applying mathematical and statistical tests to data, to quantify the variation, understand the nature of time-varying phenomena, gain physical understanding of the system and predict its future behaviour. Astronomical time series usually suffer from unevenly spaced time instants, varying error conditions and the possibility of big gaps. For ground-based observations this is due to the daily variation of daylight and to weather conditions, while observations from space may suffer from the impact of cosmic-ray particles.
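A minimal sketch of the phase-folding step described above (generic code, not tied to any particular survey pipeline; the epoch t0 is an assumed free parameter):

```python
import numpy as np

def phase_fold(t, mag, period, t0=0.0):
    """Fold an unevenly sampled light curve on a trial period.

    Returns phases in [0, 1) and the magnitudes sorted by phase,
    i.e. the data needed to draw a phased light curve.
    """
    phase = ((t - t0) / period) % 1.0
    order = np.argsort(phase)
    return phase[order], mag[order]
```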
Many large-scale astronomical surveys such as MACHO, OGLE, EROS, ROTSE, PLANET, Hipparcos, MISAO, NSVS, ASAS, Pan-STARRS, Kepler, ESA, Gaia, LSST and CRTS provide variable star time series data, even though their primary intention is not variable star observation. The Center for Astrostatistics at Pennsylvania State University was established to help the astronomical community with statistical tools for harvesting and analysing archival data. Most of these surveys release their data to the public for further analysis. There exist many period search algorithms for astronomical time series analysis, which can be classified into parametric (assuming some underlying distribution for the data) and non-parametric (not assuming any statistical model, such as a Gaussian) methods. Many of the parametric methods are based on variations of discrete Fourier transforms, such as the Generalised Lomb-Scargle periodogram (GLSP) by Zechmeister (2009) and the Significant Spectrum (SigSpec) method by Reegen (2007). Non-parametric methods include Phase Dispersion Minimisation (PDM) by Stellingwerf (1978) and the cubic spline method by Akerlof (1994). Even though most of the methods can be brought under automation, none of the methods stated above can fully recover the true periods. Wrong period detection can have several causes, such as power leakage to other frequencies, which is due to the finite total interval, the finite sampling interval and the finite amount of data. Another problem is aliasing, which is due to the influence of regular sampling. Spurious periods also appear because of long gaps, and power flow to harmonic frequencies is an inherent problem of Fourier methods. Hence obtaining the exact period of a variable star from its time series data is still a difficult problem for huge databases subjected to automation. As Matthew Templeton, AAVSO, states, “Variable star data analysis is not always straightforward; large-scale, automated analysis design is non-trivial”. Derekas et al. (2007) and Deb et al. (2010) state: “The processing of huge amount of data in these databases is quite challenging, even when looking at seemingly small issues such as period determination and classification”. It will be beneficial for the variable star astronomical community if basic parameters such as period, amplitude and phase are obtained more accurately when huge time series databases are subjected to automation. In the present thesis work, the theories of four popular period search methods are studied, their strengths and weaknesses are evaluated by applying them to two survey databases, and finally a modified form of the cubic spline method is introduced to confirm the exact period of a variable star. For the classification of newly discovered variable stars and their entry in the “General Catalogue of Variable Stars” or other databases such as the “Variable Star Index”, the characteristics of the variability have to be quantified in terms of variable star parameters.
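As an illustration of a non-parametric period search of the kind cited above, here is a simplified Phase Dispersion Minimisation sketch (a single binning of the folded curve, without the overlapping-cover refinement of Stellingwerf's full method); the bin count and the trial-period grid are assumptions:

```python
import numpy as np

def pdm_theta(t, mag, period, n_bins=10):
    """Simplified PDM statistic: ratio of the pooled within-bin variance of the
    phase-folded light curve to its total variance. Good trial periods give a
    small theta because the folded curve is coherent within each phase bin."""
    phase = (t / period) % 1.0
    total_var = np.var(mag, ddof=1)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    num, den = 0.0, 0
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = mag[(phase >= lo) & (phase < hi)]
        if m.size > 1:
            num += (m.size - 1) * np.var(m, ddof=1)
            den += m.size - 1
    return (num / den) / total_var

def pdm_scan(t, mag, trial_periods):
    """Evaluate theta on a grid of trial periods and return the best one."""
    thetas = np.array([pdm_theta(t, mag, p) for p in trial_periods])
    return trial_periods[np.argmin(thetas)], thetas
```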
Abstract:
Consideration is given to a standard CDMA system and the determination of the density function of the interference, with and without Gaussian noise, using sampling theory concepts. The formula derived provides fast and accurate results and is a simple, useful alternative to other methods.
Abstract:
Weeds tend to aggregate in patches within fields and there is evidence that this is partly owing to variation in soil properties. Because the processes driving soil heterogeneity operate at different scales, the strength of the relationships between soil properties and weed density would also be expected to be scale-dependent. Quantifying these effects of scale on weed patch dynamics is essential to guide the design of discrete sampling protocols for mapping weed distribution. We have developed a general method that uses novel within-field nested sampling and residual maximum likelihood (REML) estimation to explore scale-dependent relationships between weeds and soil properties. We have validated the method using a case study of Alopecurus myosuroides in winter wheat. Using REML, we partitioned the variance and covariance into scale-specific components and estimated the correlations between the weed counts and soil properties at each scale. We used variograms to quantify the spatial structure in the data and to map variables by kriging. Our methodology successfully captured the effect of scale on a number of edaphic drivers of weed patchiness. The overall Pearson correlations between A. myosuroides and soil organic matter and clay content were weak and masked the stronger correlations at >50 m. Knowing how the variance was partitioned across the spatial scales we optimized the sampling design to focus sampling effort at those scales that contributed most to the total variance. The methods have the potential to guide patch spraying of weeds by identifying areas of the field that are vulnerable to weed establishment.
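For context, the spatial structure referred to in this abstract is typically quantified with a semivariogram before kriging; below is a minimal empirical (Matheron) estimator sketch, with the lag grid and tolerance as illustrative parameters. This is a generic illustration, not the authors' REML-based, scale-partitioned procedure.

```python
import numpy as np

def empirical_semivariogram(coords, values, lags, tol):
    """Classical (Matheron) semivariance estimator on a set of lag distances:
    gamma(h) = (1 / (2 N(h))) * sum over pairs at distance ~h of (z_i - z_j)^2.
    coords: (n, 2) array of sample locations; values: (n,) array of measurements."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    iu = np.triu_indices(len(values), k=1)            # all unordered pairs
    dist = d[iu]
    sqdiff = (values[:, None] - values[None, :])[iu] ** 2
    gamma = []
    for h in lags:
        sel = np.abs(dist - h) < tol                  # pairs falling in this lag bin
        gamma.append(0.5 * sqdiff[sel].mean() if sel.any() else np.nan)
    return np.array(gamma)
```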
Abstract:
We consider a class of sampling-based decomposition methods to solve risk-averse multistage stochastic convex programs. We prove a formula for the computation of the cuts necessary to build the outer linearizations of the recourse functions. This formula can be used to obtain an efficient implementation of Stochastic Dual Dynamic Programming applied to convex nonlinear problems. We prove the almost sure convergence of these decomposition methods when the relatively complete recourse assumption holds. We also prove the almost sure convergence of these algorithms when applied to risk-averse multistage stochastic linear programs that do not satisfy the relatively complete recourse assumption. The analysis is first done assuming the underlying stochastic process is interstage independent and discrete, with a finite set of possible realizations at each stage. We then indicate two ways of extending the methods and convergence analysis to the case when the process is interstage dependent.
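For readers unfamiliar with the cuts mentioned above, a generic Benders-type cut of the kind used in SDDP-style decompositions has the form sketched below (illustrative only; the paper's risk-averse setting changes how the cut coefficients are computed, not this overall shape):

```latex
% At iteration k, with trial point x_t^k, sampled recourse value \theta_k and a
% subgradient \beta_k obtained from the dual solutions of the stage-(t+1) problems,
\mathcal{Q}_{t+1}(x_t) \;\ge\; \theta_k + \langle \beta_k,\, x_t - x_t^k \rangle ,
% so that after K iterations the outer linearization of the recourse function is
\check{\mathcal{Q}}_{t+1}(x_t) \;=\; \max_{k=1,\dots,K}\ \Big( \theta_k + \langle \beta_k,\, x_t - x_t^k \rangle \Big).
```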
Abstract:
A computer-based sliding mode control (SMC) is analysed. The control law is implemented using a computer and A/D and D/A converters. Two SMC designs are presented. The first is a continuous-time conventional SMC design, with a variable structure law, which does not take the sampling period into consideration. The second is a discrete-time SMC design, with a smooth sliding law, which does not have a variable structure and does take the sampling period into consideration. Both techniques are applied to control an inverted pendulum system. The performances of the continuous-time and discrete-time controllers are compared. Simulations and experimental results are shown and the effectiveness of the proposed techniques is analysed.
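A minimal sketch of a discrete-time sliding-mode-style law with a smooth (boundary-layer) switching term is given below; the sampled double-integrator plant, the gains and the sampling period are illustrative assumptions, not the paper's controllers or its pendulum model.

```python
import numpy as np

T = 0.01                                  # sampling period [s] (assumed)
A = np.array([[1.0, T], [0.0, 1.0]])      # discretized double integrator
B = np.array([0.5 * T**2, T])

c = np.array([2.0, 1.0])                  # sliding surface s = c @ x
k_s, phi = 5.0, 0.05                      # reaching gain and boundary-layer width

def smc_step(x):
    s = c @ x
    u_eq = -(c @ (A @ x) - s) / (c @ B)          # would keep s unchanged over one sample
    u_sw = -k_s * np.clip(s / phi, -1.0, 1.0)    # smooth reaching term instead of sign()
    return u_eq + u_sw

x = np.array([1.0, 0.0])
for _ in range(2000):
    x = A @ x + B * smc_step(x)
print(x)                                   # state driven close to the origin
```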
Abstract:
The spectral principle of Connes and Chamseddine is used as a starting point to define a discrete model for Euclidean quantum gravity. Instead of summing over ordinary geometries, we consider the sum over generalized geometries where topology, metric, and dimension can fluctuate. The model describes the geometry of spaces with a countable number n of points, and is related to the Gaussian unitary ensemble of Hermitian matrices. We show that this simple model has two phases. The expectation value
Abstract:
Persistent Topology is an innovative way of matching topology and geometry, and it proves to be an effective mathematical tool in shape analysis. In order to express its full potential for applications, it has to interface with the typical environment of Computer Science: it must be possible to deal with a finite sampling of the object of interest, and with combinatorial representations of it. Following that idea, the main result claims that it is possible to construct a relation between the persistent Betti numbers (PBNs; also called the rank invariant) of a compact Riemannian submanifold X of R^m and those of an approximation U of X itself, where U is generated by a ball covering centered at the points of the sampling. Moreover, we state a further result in which we relate X with a finite simplicial complex S generated, through a particular construction, by the sampling points. To be more precise, strict inequalities hold only in "blind strips", i.e. narrow areas around the discontinuity sets of the PBNs of U (or S). Outside the blind strips, the values of the PBNs of the original object, of its ball covering, and of the simplicial complex coincide, respectively.
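For context, the persistent Betti numbers (rank invariant) mentioned above are standardly defined as follows, stated here for a generic measuring function f and homology degree k (generic definition, not the particular setup of the thesis):

```latex
% Sublevel sets  X_u = f^{-1}((-\infty, u]),  for u < v:
\beta^{f}_{u,v}(X) \;=\; \operatorname{rank}\Big( H_k(X_u) \longrightarrow H_k(X_v) \Big),
% i.e. the rank of the map induced in homology by the inclusion  X_u \hookrightarrow X_v .
```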
Abstract:
Geostatistics involves the fitting of spatially continuous models to spatially discrete data (Chilès and Delfiner, 1999). Preferential sampling arises when the process that determines the data-locations and the process being modelled are stochastically dependent. Conventional geostatistical methods assume, if only implicitly, that sampling is non-preferential. However, these methods are often used in situations where sampling is likely to be preferential. For example, in mineral exploration samples may be concentrated in areas thought likely to yield high-grade ore. We give a general expression for the likelihood function of preferentially sampled geostatistical data and describe how this can be evaluated approximately using Monte Carlo methods. We present a model for preferential sampling, and demonstrate through simulated examples that ignoring preferential sampling can lead to seriously misleading inferences. We describe an application of the model to a set of bio-monitoring data from Galicia, northern Spain, in which making allowance for preferential sampling materially changes the inferences.
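Schematically, the likelihood structure described above can be written as follows, in generic notation (S the latent spatial process, X the sampling locations, Y the measurements, and square brackets denoting densities); the Monte Carlo step averages over simulations of S. This is only a sketch of the general structure, not the paper's exact expression:

```latex
L(\theta) \;=\; \mathbb{E}_{S}\!\left[\, [X \mid S]\,[Y \mid X, S] \,\right]
 \;\approx\; \frac{1}{m}\sum_{j=1}^{m} [X \mid S_j]\,[Y \mid X, S_j],
% where S_1, \dots, S_m are Monte Carlo simulations of S (in practice drawn
% conditionally on the observed data to reduce the variance of the approximation).
```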
Abstract:
Purpose: Development of an interpolation algorithm for re‐sampling spatially distributed CT‐data with the following features: global and local integral conservation, avoidance of negative interpolation values for positively defined datasets and the ability to control re‐sampling artifacts. Method and Materials: The interpolation can be separated into two steps: first, the discrete CT‐data has to be continuously distributed by an analytic function considering the boundary conditions. Generally, this function is determined by piecewise interpolation. Instead of using linear or high-order polynomial interpolations, which do not fulfill all the above mentioned features, a special form of Hermitian curve interpolation is used to solve the interpolation problem with respect to the required boundary conditions. A single parameter is determined, by which the behavior of the interpolation function is controlled. Second, the interpolated data have to be re‐distributed with respect to the requested grid. Results: The new algorithm was compared with commonly used interpolation functions based on linear and second-order polynomials. It is demonstrated that these interpolation functions may over‐ or underestimate the source data by about 10%–20%, while the parameter of the new algorithm can be adjusted in order to significantly reduce these interpolation errors. Finally, the performance and accuracy of the algorithm was tested by re‐gridding a series of X‐ray CT‐images. Conclusion: Inaccurate sampling values may occur due to the lack of integral conservation. Re‐sampling algorithms using high-order polynomial interpolation functions may result in significant artifacts of the re‐sampled data. Such artifacts can be avoided by using the new algorithm based on Hermitian curve interpolation.
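As a small illustration of why a Hermitian-type interpolant is attractive for this kind of re-sampling, the following generic example contrasts an ordinary cubic spline with a shape-preserving (PCHIP) Hermite interpolant near a sharp edge. Note that this is not the paper's algorithm, and it does not enforce the integral-conservation property described above; the step-like test profile is an assumption.

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator

x = np.arange(10, dtype=float)
y = np.array([0, 0, 0, 0, 100, 100, 100, 100, 100, 100], dtype=float)  # step-like profile

xf = np.linspace(0, 9, 181)
spline = CubicSpline(x, y)(xf)          # oscillates around the step, dips below zero
hermite = PchipInterpolator(x, y)(xf)   # monotone across the step, stays within the data range

print(spline.min(), spline.max())       # < 0 and > 100: under-/overshoot artifacts
print(hermite.min(), hermite.max())     # bounded by the data
```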
Abstract:
Multi-objective optimization algorithms aim at finding Pareto-optimal solutions. Recovering Pareto fronts or Pareto sets from a limited number of function evaluations are challenging problems. A popular approach in the case of expensive-to-evaluate functions is to appeal to metamodels. Kriging has been shown efficient as a base for sequential multi-objective optimization, notably through infill sampling criteria balancing exploitation and exploration such as the Expected Hypervolume Improvement. Here we consider Kriging metamodels not only for selecting new points, but as a tool for estimating the whole Pareto front and quantifying how much uncertainty remains on it at any stage of Kriging-based multi-objective optimization algorithms. Our approach relies on the Gaussian process interpretation of Kriging and builds on conditional simulations. Using concepts from random set theory, we propose to adapt the Vorob’ev expectation and deviation to capture the variability of the set of non-dominated points. Numerical experiments illustrate the potential of the proposed workflow, and it is shown on examples how Gaussian process simulations and the estimated Vorob’ev deviation can be used to monitor the ability of Kriging-based multi-objective optimization algorithms to accurately learn the Pareto front.
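For reference, the Vorob'ev notions used above are standardly defined for a random closed set Y with coverage function p_Y as follows (generic random-set definitions, stated only for context; μ is a reference measure, e.g. Lebesgue):

```latex
p_Y(x) = \mathbb{P}\!\left( x \in Y \right), \qquad
Q_\alpha = \{\, x : p_Y(x) \ge \alpha \,\},
% The Vorob'ev expectation is the quantile set Q_{\alpha^*} whose measure matches the
% expected measure of Y, i.e. \mu(Q_{\alpha^*}) = \mathbb{E}[\mu(Y)] (when attainable),
% and the Vorob'ev deviation is the expected measure of the symmetric difference:
\mathrm{Dev}(Y) \;=\; \mathbb{E}\big[\, \mu\big( Q_{\alpha^*} \,\triangle\, Y \big) \,\big].
```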
Abstract:
We present a novel surrogate model-based global optimization framework allowing a large number of function evaluations. The method, called SpLEGO, is based on a multi-scale expected improvement (EI) framework relying on both sparse and local Gaussian process (GP) models. First, a bi-objective approach relying on a global sparse GP model is used to determine potential next sampling regions. Local GP models are then constructed within each selected region. The method subsequently employs the standard expected improvement criterion to deal with the exploration-exploitation trade-off within the selected local models, leading to a decision on where to perform the next function evaluation(s). The potential of our approach is demonstrated using the so-called Sparse Pseudo-input GP as a global model. The algorithm is tested on four benchmark problems, whose number of starting points ranges from 10^2 to 10^4. Our results show that SpLEGO is effective and capable of solving problems with a large number of starting points, and it even provides significant advantages when compared with state-of-the-art EI algorithms.
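The standard expected improvement criterion referred to above has the closed form sketched below (generic minimization version under a Gaussian predictive distribution; this is not SpLEGO's multi-scale variant):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for minimization: predictive mean mu and standard deviation
    sigma at a candidate point, f_best the best objective value observed so far."""
    sigma = np.maximum(sigma, 1e-12)           # guard against zero predictive variance
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
```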
Abstract:
The surroundings of the Cortiou sewage outfall are among the most polluted environments of the French Mediterranean Sea (Marseilles, France). So far, no studies have precisely quantified the impact of pollution on the development of organisms in this area. Methods: We used a fluctuating asymmetry (FA) measure of developmental instability (DI) to assess environmental stress in two species of radially symmetric sea urchins (Arbacia lixula and Paracentrotus lividus). For six sampling sites (Cortiou, Riou, Maire, East Maire, Mejean, and Niolon), levels of FA were calculated from continuous and discrete skeletal measures of ambulacral length, number of pore pairs and primary tubercles. Results: For both species, the most polluted sampling site, Cortiou, displayed the highest level of FA, while the Maire and East Maire sampling sites displayed the lowest levels. A. lixula revealed systematic differences in FA among sampling sites for all characters, and P. lividus showed differences in FA for the number of primary tubercles. Conclusions: Statistical analyses of FA show a concordance between the spatial patterns of FA among sampling sites and the spatial distribution of sewage discharge pollutants in the Cortiou area. High developmental stress at these sampling sites is associated with exposure to high concentrations of heavy metals and many harmful organic substances contained in wastewater. FA estimated from structures with complex symmetry appears to be a fast and reliable tool to detect subtle differences in FA. Its use in biomonitoring programs for inferring anthropogenic and natural environmental stress is suggested.
Abstract:
The Tara Oceans Expedition (2009-2013) sampled the world oceans on board a 36 m long schooner, collecting environmental data and organisms from viruses to planktonic metazoans for later analyses using modern sequencing and state-of-the-art imaging technologies. Tara Oceans data are particularly suited to study the genetic, morphological and functional diversity of plankton. The present data set provides continuous pH measurements made during the 2013 expedition with a Satlantic SeaFET instrument that was connected to the flow-through system. Data calibration was performed according to Bresnahan et al. (2014), using spectrophotometric pH measurements on discrete samples (Clayton and Byrne, 1993). pH_internal values were used to calibrate the data (rather than pH_external) because of the better calibration coefficient (there was no trend associated with it). The equations of Clayton and Byrne (1993) were used to compute pH from the measured absorbance values at the temperature of measurement. The data were converted to in situ temperature using the "CO2-sys" program, which can be downloaded from http://cdiac.ornl.gov/ftp/co2sys/.
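For reference, the spectrophotometric pH calculation cited above has the general form below; the wavelengths shown assume the m-cresol purple indicator, and the indicator constants e1, e2, e3 and the temperature/salinity dependence of pK2 are as tabulated by Clayton and Byrne (1993):

```latex
% R is the ratio of the indicator absorbances at the two peak wavelengths:
\mathrm{pH}_T \;=\; \mathrm{p}K_2^{\mathrm{ind}} \;+\; \log_{10}\!\left( \frac{R - e_1}{\,e_2 - R\,e_3\,} \right),
\qquad R = \frac{A_{578}}{A_{434}} .
```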