966 resultados para density estimation


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.

Relevância:

60.00% 60.00%

Publicador:

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In recent years there has been an increased interest in applying non-parametric methods to real-world problems. Significant research has been devoted to Gaussian processes (GPs) due to their increased flexibility when compared with parametric models. These methods use Bayesian learning, which generally leads to analytically intractable posteriors. This thesis proposes a two-step solution to construct a probabilistic approximation to the posterior. In the first step we adapt the Bayesian online learning to GPs: the final approximation to the posterior is the result of propagating the first and second moments of intermediate posteriors obtained by combining a new example with the previous approximation. The propagation of em functional forms is solved by showing the existence of a parametrisation to posterior moments that uses combinations of the kernel function at the training points, transforming the Bayesian online learning of functions into a parametric formulation. The drawback is the prohibitive quadratic scaling of the number of parameters with the size of the data, making the method inapplicable to large datasets. The second step solves the problem of the exploding parameter size and makes GPs applicable to arbitrarily large datasets. The approximation is based on a measure of distance between two GPs, the KL-divergence between GPs. This second approximation is with a constrained GP in which only a small subset of the whole training dataset is used to represent the GP. This subset is called the em Basis Vector, or BV set and the resulting GP is a sparse approximation to the true posterior. As this sparsity is based on the KL-minimisation, it is probabilistic and independent of the way the posterior approximation from the first step is obtained. We combine the sparse approximation with an extension to the Bayesian online algorithm that allows multiple iterations for each input and thus approximating a batch solution. The resulting sparse learning algorithm is a generic one: for different problems we only change the likelihood. The algorithm is applied to a variety of problems and we examine its performance both on more classical regression and classification tasks and to the data-assimilation and a simple density estimation problems.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

For analysing financial time series two main opposing viewpoints exist, either capital markets are completely stochastic and therefore prices follow a random walk, or they are deterministic and consequently predictable. For each of these views a great variety of tools exist with which it can be tried to confirm the hypotheses. Unfortunately, these methods are not well suited for dealing with data characterised in part by both paradigms. This thesis investigates these two approaches in order to model the behaviour of financial time series. In the deterministic framework methods are used to characterise the dimensionality of embedded financial data. The stochastic approach includes here an estimation of the unconditioned and conditional return distributions using parametric, non- and semi-parametric density estimation techniques. Finally, it will be shown how elements from these two approaches could be combined to achieve a more realistic model for financial time series.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Cardiotocographic data provide physicians information about foetal development and permit to assess conditions such as foetal distress. An incorrect evaluation of the foetal status can be of course very dangerous. To improve interpretation of cardiotocographic recordings, great interest has been dedicated to foetal heart rate variability spectral analysis. It is worth reminding, however, that foetal heart rate is intrinsically an uneven series, so in order to produce an evenly sampled series a zero-order, linear or cubic spline interpolation can be employed. This is not suitable for frequency analyses because interpolation introduces alterations in the foetal heart rate power spectrum. In particular, interpolation process can produce alterations of the power spectral density that, for example, affects the estimation of the sympatho-vagal balance (computed as low-frequency/high-frequency ratio), which represents an important clinical parameter. In order to estimate the frequency spectrum alterations of the foetal heart rate variability signal due to interpolation and cardiotocographic storage rates, in this work, we simulated uneven foetal heart rate series with set characteristics, their evenly spaced versions (with different orders of interpolation and storage rates) and computed the sympatho-vagal balance values by power spectral density. For power spectral density estimation, we chose the Lomb method, as suggested by other authors to study the uneven heart rate series in adults. Summarising, the obtained results show that the evaluation of SVB values on the evenly spaced FHR series provides its overestimation due to the interpolation process and to the storage rate. However, cubic spline interpolation produces more robust and accurate results. © 2010 Elsevier Ltd. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

2000 Mathematics Subject Classification: 65C05

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Typical Double Auction (DA) models assume that trading agents are one-way traders. With this limitation, they cannot directly reflect the fact individual traders in financial markets (the most popular application of double auction) choose their trading directions dynamically. To address this issue, we introduce the Bi-directional Double Auction (BDA) market which is populated by two-way traders. Based on experiments under both static and dynamic settings, we find that the allocative efficiency of a static continuous BDA market comes from rational selection of trading directions and is negatively related to the intelligence of trading strategies. Moreover, we introduce Kernel trading strategy designed based on probability density estimation for general DA market. Our experiments show it outperforms some intelligent DA market trading strategies. Copyright © 2013, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This thesis stems from the project with real-time environmental monitoring company EMSAT Corporation. They were looking for methods to automatically ag spikes and other anomalies in their environmental sensor data streams. The problem presents several challenges: near real-time anomaly detection, absence of labeled data and time-changing data streams. Here, we address this problem using both a statistical parametric approach as well as a non-parametric approach like Kernel Density Estimation (KDE). The main contribution of this thesis is extending the KDE to work more effectively for evolving data streams, particularly in presence of concept drift. To address that, we have developed a framework for integrating Adaptive Windowing (ADWIN) change detection algorithm with KDE. We have tested this approach on several real world data sets and received positive feedback from our industry collaborator. Some results appearing in this thesis have been presented at ECML PKDD 2015 Doctoral Consortium.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Marine spatial planning and ecological research call for high-resolution species distribution data. However, those data are still not available for most marine large vertebrates. The dynamic nature of oceanographic processes and the wide-ranging behavior of many marine vertebrates create further difficulties, as distribution data must incorporate both the spatial and temporal dimensions. Cetaceans play an essential role in structuring and maintaining marine ecosystems and face increasing threats from human activities. The Azores holds a high diversity of cetaceans but the information about spatial and temporal patterns of distribution for this marine megafauna group in the region is still very limited. To tackle this issue, we created monthly predictive cetacean distribution maps for spring and summer months, using data collected by the Azores Fisheries Observer Programme between 2004 and 2009. We then combined the individual predictive maps to obtain species richness maps for the same period. Our results reflect a great heterogeneity in distribution among species and within species among different months. This heterogeneity reflects a contrasting influence of oceanographic processes on the distribution of cetacean species. However, some persistent areas of increased species richness could also be identified from our results. We argue that policies aimed at effectively protecting cetaceans and their habitats must include the principle of dynamic ocean management coupled with other area-based management such as marine spatial planning.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

L’un des problèmes importants en apprentissage automatique est de déterminer la complexité du modèle à apprendre. Une trop grande complexité mène au surapprentissage, ce qui correspond à trouver des structures qui n’existent pas réellement dans les données, tandis qu’une trop faible complexité mène au sous-apprentissage, c’est-à-dire que l’expressivité du modèle est insuffisante pour capturer l’ensemble des structures présentes dans les données. Pour certains modèles probabilistes, la complexité du modèle se traduit par l’introduction d’une ou plusieurs variables cachées dont le rôle est d’expliquer le processus génératif des données. Il existe diverses approches permettant d’identifier le nombre approprié de variables cachées d’un modèle. Cette thèse s’intéresse aux méthodes Bayésiennes nonparamétriques permettant de déterminer le nombre de variables cachées à utiliser ainsi que leur dimensionnalité. La popularisation des statistiques Bayésiennes nonparamétriques au sein de la communauté de l’apprentissage automatique est assez récente. Leur principal attrait vient du fait qu’elles offrent des modèles hautement flexibles et dont la complexité s’ajuste proportionnellement à la quantité de données disponibles. Au cours des dernières années, la recherche sur les méthodes d’apprentissage Bayésiennes nonparamétriques a porté sur trois aspects principaux : la construction de nouveaux modèles, le développement d’algorithmes d’inférence et les applications. Cette thèse présente nos contributions à ces trois sujets de recherches dans le contexte d’apprentissage de modèles à variables cachées. Dans un premier temps, nous introduisons le Pitman-Yor process mixture of Gaussians, un modèle permettant l’apprentissage de mélanges infinis de Gaussiennes. Nous présentons aussi un algorithme d’inférence permettant de découvrir les composantes cachées du modèle que nous évaluons sur deux applications concrètes de robotique. Nos résultats démontrent que l’approche proposée surpasse en performance et en flexibilité les approches classiques d’apprentissage. Dans un deuxième temps, nous proposons l’extended cascading Indian buffet process, un modèle servant de distribution de probabilité a priori sur l’espace des graphes dirigés acycliques. Dans le contexte de réseaux Bayésien, ce prior permet d’identifier à la fois la présence de variables cachées et la structure du réseau parmi celles-ci. Un algorithme d’inférence Monte Carlo par chaîne de Markov est utilisé pour l’évaluation sur des problèmes d’identification de structures et d’estimation de densités. Dans un dernier temps, nous proposons le Indian chefs process, un modèle plus général que l’extended cascading Indian buffet process servant à l’apprentissage de graphes et d’ordres. L’avantage du nouveau modèle est qu’il admet les connections entres les variables observables et qu’il prend en compte l’ordre des variables. Nous présentons un algorithme d’inférence Monte Carlo par chaîne de Markov avec saut réversible permettant l’apprentissage conjoint de graphes et d’ordres. L’évaluation est faite sur des problèmes d’estimations de densité et de test d’indépendance. Ce modèle est le premier modèle Bayésien nonparamétrique permettant d’apprendre des réseaux Bayésiens disposant d’une structure complètement arbitraire.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The bubble crab Dotilla fenestrata forms very dense populations on the sand flats of the eastern coast of Inhaca Island, Mozambique, making it an interesting biological model to examine spatial distribution patterns and test the relative efficiency of common sampling methods. Due to its apparent ecological importance within the sandy intertidal community, understanding the factors ruling the dynamics of Dotilla populations is also a key issue. In this study, different techniques of estimating crab density are described, and the trends of spatial distribution of the different population categories are shown. The studied populations are arranged in discrete patches located at the well-drained crests of nearly parallel mega sand ripples. For a given sample size, there was an obvious gain in precision by using a stratified random sampling technique, considering discrete patches as strata, compared to the simple random design. Density average and variance differed considerably among patches since juveniles and ovigerous females were found clumped, with higher densities at the lower and upper shore levels, respectively. Burrow counting was found to be an adequate method for large-scale sampling, although consistently underestimating actual crab density by nearly half. Regression analyses suggested that crabs smaller than 2.9 mm carapace width tend to be undetected in visual burrow counts. A visual survey of sampling plots over several patches of a large Dotilla population showed that crab density varied in an interesting oscillating pattern, apparently following the topography of the sand flat. Patches extending to the lower shore contained higher densities than those mostly covering the higher shore. Within-patch density variability also pointed to the same trend, but the density increment towards the lowest shore level varied greatly among the patches compared.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

O objetivo deste estudo é caracterizar pela primeira vez alguns aspectos da reprodução do caranguejo-uçá em manguezais da Baía da Babitonga (Santa Catarina). Além disso, a densidade e o tamanho do estoque deste recurso pesqueiro foram também estimados. Os exemplares foram coletados mensalmente, de maio de 2002 a abril de 2003, em duas áreas distintas: Iperoba e Palmital; um total de 2265 espécimes (1623 machos e 642 fêmeas) foi analisado. Os machos com gônadas maturas foram registrados durante todo o ano, enquanto as fêmeas com gônadas maturas ocorreram em apenas cinco meses. As fêmeas ovígeras foram registradas apenas em dezembro e janeiro. O etograma do fenômeno de migração reprodutiva (andada) esteve em concordância com a maior atividade de caranguejos associada às luas cheias e novas, com maior intensidade em dezembro e janeiro, relacionados ao verão austral. A densidade total no Manguezal de Iperoba foi de 2,05 ± 0,97 ind./m², não diferindo significativamente daquela registrada para o Manguezal do Palmital (2,06 ± 1,08 ind./m²) (p < 0,05). A média global para a estimativa de densidade na Baia da Babitonga foi de 2,05 ± 1,00 ind./m², correspondendo a 1,42 ± 0,89 ind./m² com base nas galerias abertas e 0,64 ± 0,63 ind./m² para as galerias fechadas.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Apresenta·se um breve resumo histórico da evolução da amostragem por transectos lineares e desenvolve·se a sua teoria. Descrevemos a teoria de amostragem por transectos lineares, proposta por Buckland (1992), sendo apresentados os pontos mais relevantes, no que diz respeito à modelação da função de detecção. Apresentamos uma descrição do princípio CDM (Rissanen, 1978) e a sua aplicação à estimação de uma função densidade por um histograma (Kontkanen e Myllymãki, 2006), procedendo à aplicação de um exemplo prático, recorrendo a uma mistura de densidades. Procedemos à sua aplicação ao cálculo do estimador da probabilidade de detecção, no caso dos transectos lineares e desta forma estimar a densidade populacional de animais. Analisamos dois casos práticos, clássicos na amostragem por distâncias, comparando os resultados obtidos. De forma a avaliar a metodologia, simulámos vários conjuntos de observações, tendo como base o exemplo das estacas, recorrendo às funções de detecção semi-normal, taxa de risco, exponencial e uniforme com um cosseno. Os resultados foram obtidos com o programa DISTANCE (Thomas et al., in press) e um algoritmo escrito em linguagem C, cedido pelo Professor Doutor Petri Kontkanen (Departamento de Ciências da Computação, Universidade de Helsínquia). Foram desenvolvidos programas de forma a calcular intervalos de confiança recorrendo à técnica bootstrap (Efron, 1978). São discutidos os resultados finais e apresentadas sugestões de desenvolvimentos futuros. ABSTRACT; We present a brief historical note on the evolution of line transect sampling and its theoretical developments. We describe line transect sampling theory as proposed by Buckland (1992), and present the most relevant issues about modeling the detection function. We present a description of the CDM principle (Rissanen, 1978) and its application to histogram density estimation (Kontkanen and Myllymãki, 2006), with a practical example, using a mixture of densities. We proceed with the application and estimate probability of detection and animal population density in the context of line transect sampling. Two classical examples from the literature are analyzed and compared. ln order to evaluate the proposed methodology, we carry out a simulation study based on a wooden stakes example, and using as detection functions half normal, hazard rate, exponential and uniform with a cosine term. The results were obtained using program DISTANCE (Thomas et al., in press), and an algorithm written in C language, kindly offered by Professor Petri Kontkanen (Department of Computer Science, University of Helsinki). We develop some programs in order to estimate confidence intervals using the bootstrap technique (Efron, 1978). Finally, the results are presented and discussed with suggestions for future developments.