941 resultados para Kernel density estimation
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Resumo:
A set of techniques referred to as circular statistics has been developed for the analysis of directional and orientational data. The unit of measure for such data is angular (usually in either degrees or radians), and the statistical distributions underlying the techniques are characterised by their cyclic nature-for example, angles of 359.9 degrees are considered close to angles of 0 degrees. In this paper, we assert that such approaches can be easily adapted to analyse time-of-day and time-of-week data, and in particular daily cycles in the numbers of incidents reported to the police. We begin the paper by describing circular statistics. We then discuss how these may be modified, and demonstrate the approach with some examples for reported incidents in the Cardiff area of Wales. (c) 2005 Elsevier Ltd. All rights reserved.
Resumo:
An emerging issue in the field of astronomy is the integration, management and utilization of databases from around the world to facilitate scientific discovery. In this paper, we investigate application of the machine learning techniques of support vector machines and neural networks to the problem of amalgamating catalogues of galaxies as objects from two disparate data sources: radio and optical. Formulating this as a classification problem presents several challenges, including dealing with a highly unbalanced data set. Unlike the conventional approach to the problem (which is based on a likelihood ratio) machine learning does not require density estimation and is shown here to provide a significant improvement in performance. We also report some experiments that explore the importance of the radio and optical data features for the matching problem.
Resumo:
Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.
Resumo:
For analysing financial time series two main opposing viewpoints exist, either capital markets are completely stochastic and therefore prices follow a random walk, or they are deterministic and consequently predictable. For each of these views a great variety of tools exist with which it can be tried to confirm the hypotheses. Unfortunately, these methods are not well suited for dealing with data characterised in part by both paradigms. This thesis investigates these two approaches in order to model the behaviour of financial time series. In the deterministic framework methods are used to characterise the dimensionality of embedded financial data. The stochastic approach includes here an estimation of the unconditioned and conditional return distributions using parametric, non- and semi-parametric density estimation techniques. Finally, it will be shown how elements from these two approaches could be combined to achieve a more realistic model for financial time series.
Resumo:
Cardiotocographic data provide physicians information about foetal development and permit to assess conditions such as foetal distress. An incorrect evaluation of the foetal status can be of course very dangerous. To improve interpretation of cardiotocographic recordings, great interest has been dedicated to foetal heart rate variability spectral analysis. It is worth reminding, however, that foetal heart rate is intrinsically an uneven series, so in order to produce an evenly sampled series a zero-order, linear or cubic spline interpolation can be employed. This is not suitable for frequency analyses because interpolation introduces alterations in the foetal heart rate power spectrum. In particular, interpolation process can produce alterations of the power spectral density that, for example, affects the estimation of the sympatho-vagal balance (computed as low-frequency/high-frequency ratio), which represents an important clinical parameter. In order to estimate the frequency spectrum alterations of the foetal heart rate variability signal due to interpolation and cardiotocographic storage rates, in this work, we simulated uneven foetal heart rate series with set characteristics, their evenly spaced versions (with different orders of interpolation and storage rates) and computed the sympatho-vagal balance values by power spectral density. For power spectral density estimation, we chose the Lomb method, as suggested by other authors to study the uneven heart rate series in adults. Summarising, the obtained results show that the evaluation of SVB values on the evenly spaced FHR series provides its overestimation due to the interpolation process and to the storage rate. However, cubic spline interpolation produces more robust and accurate results. © 2010 Elsevier Ltd. All rights reserved.
Resumo:
Marine spatial planning and ecological research call for high-resolution species distribution data. However, those data are still not available for most marine large vertebrates. The dynamic nature of oceanographic processes and the wide-ranging behavior of many marine vertebrates create further difficulties, as distribution data must incorporate both the spatial and temporal dimensions. Cetaceans play an essential role in structuring and maintaining marine ecosystems and face increasing threats from human activities. The Azores holds a high diversity of cetaceans but the information about spatial and temporal patterns of distribution for this marine megafauna group in the region is still very limited. To tackle this issue, we created monthly predictive cetacean distribution maps for spring and summer months, using data collected by the Azores Fisheries Observer Programme between 2004 and 2009. We then combined the individual predictive maps to obtain species richness maps for the same period. Our results reflect a great heterogeneity in distribution among species and within species among different months. This heterogeneity reflects a contrasting influence of oceanographic processes on the distribution of cetacean species. However, some persistent areas of increased species richness could also be identified from our results. We argue that policies aimed at effectively protecting cetaceans and their habitats must include the principle of dynamic ocean management coupled with other area-based management such as marine spatial planning.
Resumo:
Peer reviewed
Resumo:
Peer reviewed
Resumo:
Archaeozoological mortality profiles have been used to infer site-specific subsistence strategies. There is however no common agreement on the best way to present these profiles and confidence intervals around age class proportions. In order to deal with these issues, we propose the use of the Dirichlet distribution and present a new approach to perform age-at-death multivariate graphical comparisons. We demonstrate the efficiency of this approach using domestic sheep/goat dental remains from 10 Cardial sites (Early Neolithic) located in South France and the Iberian Peninsula. We show that the Dirichlet distribution in age-at-death analysis can be used: (i) to generate Bayesian credible intervals around each age class of a mortality profile, even when not all age classes are observed; and (ii) to create 95% kernel density contours around each age-at-death frequency distribution when multiple sites are compared using correspondence analysis. The statistical procedure we present is applicable to the analysis of any categorical count data and particularly well-suited to archaeological data (e.g. potsherds, arrow heads) where sample sizes are typically small.
Resumo:
L’un des problèmes importants en apprentissage automatique est de déterminer la complexité du modèle à apprendre. Une trop grande complexité mène au surapprentissage, ce qui correspond à trouver des structures qui n’existent pas réellement dans les données, tandis qu’une trop faible complexité mène au sous-apprentissage, c’est-à-dire que l’expressivité du modèle est insuffisante pour capturer l’ensemble des structures présentes dans les données. Pour certains modèles probabilistes, la complexité du modèle se traduit par l’introduction d’une ou plusieurs variables cachées dont le rôle est d’expliquer le processus génératif des données. Il existe diverses approches permettant d’identifier le nombre approprié de variables cachées d’un modèle. Cette thèse s’intéresse aux méthodes Bayésiennes nonparamétriques permettant de déterminer le nombre de variables cachées à utiliser ainsi que leur dimensionnalité. La popularisation des statistiques Bayésiennes nonparamétriques au sein de la communauté de l’apprentissage automatique est assez récente. Leur principal attrait vient du fait qu’elles offrent des modèles hautement flexibles et dont la complexité s’ajuste proportionnellement à la quantité de données disponibles. Au cours des dernières années, la recherche sur les méthodes d’apprentissage Bayésiennes nonparamétriques a porté sur trois aspects principaux : la construction de nouveaux modèles, le développement d’algorithmes d’inférence et les applications. Cette thèse présente nos contributions à ces trois sujets de recherches dans le contexte d’apprentissage de modèles à variables cachées. Dans un premier temps, nous introduisons le Pitman-Yor process mixture of Gaussians, un modèle permettant l’apprentissage de mélanges infinis de Gaussiennes. Nous présentons aussi un algorithme d’inférence permettant de découvrir les composantes cachées du modèle que nous évaluons sur deux applications concrètes de robotique. Nos résultats démontrent que l’approche proposée surpasse en performance et en flexibilité les approches classiques d’apprentissage. Dans un deuxième temps, nous proposons l’extended cascading Indian buffet process, un modèle servant de distribution de probabilité a priori sur l’espace des graphes dirigés acycliques. Dans le contexte de réseaux Bayésien, ce prior permet d’identifier à la fois la présence de variables cachées et la structure du réseau parmi celles-ci. Un algorithme d’inférence Monte Carlo par chaîne de Markov est utilisé pour l’évaluation sur des problèmes d’identification de structures et d’estimation de densités. Dans un dernier temps, nous proposons le Indian chefs process, un modèle plus général que l’extended cascading Indian buffet process servant à l’apprentissage de graphes et d’ordres. L’avantage du nouveau modèle est qu’il admet les connections entres les variables observables et qu’il prend en compte l’ordre des variables. Nous présentons un algorithme d’inférence Monte Carlo par chaîne de Markov avec saut réversible permettant l’apprentissage conjoint de graphes et d’ordres. L’évaluation est faite sur des problèmes d’estimations de densité et de test d’indépendance. Ce modèle est le premier modèle Bayésien nonparamétrique permettant d’apprendre des réseaux Bayésiens disposant d’une structure complètement arbitraire.
Resumo:
The bubble crab Dotilla fenestrata forms very dense populations on the sand flats of the eastern coast of Inhaca Island, Mozambique, making it an interesting biological model to examine spatial distribution patterns and test the relative efficiency of common sampling methods. Due to its apparent ecological importance within the sandy intertidal community, understanding the factors ruling the dynamics of Dotilla populations is also a key issue. In this study, different techniques of estimating crab density are described, and the trends of spatial distribution of the different population categories are shown. The studied populations are arranged in discrete patches located at the well-drained crests of nearly parallel mega sand ripples. For a given sample size, there was an obvious gain in precision by using a stratified random sampling technique, considering discrete patches as strata, compared to the simple random design. Density average and variance differed considerably among patches since juveniles and ovigerous females were found clumped, with higher densities at the lower and upper shore levels, respectively. Burrow counting was found to be an adequate method for large-scale sampling, although consistently underestimating actual crab density by nearly half. Regression analyses suggested that crabs smaller than 2.9 mm carapace width tend to be undetected in visual burrow counts. A visual survey of sampling plots over several patches of a large Dotilla population showed that crab density varied in an interesting oscillating pattern, apparently following the topography of the sand flat. Patches extending to the lower shore contained higher densities than those mostly covering the higher shore. Within-patch density variability also pointed to the same trend, but the density increment towards the lowest shore level varied greatly among the patches compared.
Resumo:
O objetivo deste estudo é caracterizar pela primeira vez alguns aspectos da reprodução do caranguejo-uçá em manguezais da Baía da Babitonga (Santa Catarina). Além disso, a densidade e o tamanho do estoque deste recurso pesqueiro foram também estimados. Os exemplares foram coletados mensalmente, de maio de 2002 a abril de 2003, em duas áreas distintas: Iperoba e Palmital; um total de 2265 espécimes (1623 machos e 642 fêmeas) foi analisado. Os machos com gônadas maturas foram registrados durante todo o ano, enquanto as fêmeas com gônadas maturas ocorreram em apenas cinco meses. As fêmeas ovígeras foram registradas apenas em dezembro e janeiro. O etograma do fenômeno de migração reprodutiva (andada) esteve em concordância com a maior atividade de caranguejos associada às luas cheias e novas, com maior intensidade em dezembro e janeiro, relacionados ao verão austral. A densidade total no Manguezal de Iperoba foi de 2,05 ± 0,97 ind./m², não diferindo significativamente daquela registrada para o Manguezal do Palmital (2,06 ± 1,08 ind./m²) (p < 0,05). A média global para a estimativa de densidade na Baia da Babitonga foi de 2,05 ± 1,00 ind./m², correspondendo a 1,42 ± 0,89 ind./m² com base nas galerias abertas e 0,64 ± 0,63 ind./m² para as galerias fechadas.
Resumo:
Reports of triatomine infestation in urban areas have increased. We analysed the spatial distribution of infestation by triatomines in the urban area of Diamantina, in the state of Minas Gerais, Brazil. Triatomines were obtained by community-based entomological surveillance. Spatial patterns of infestation were analysed by Ripley’s K function and Kernel density estimator. Normalised difference vegetation index (NDVI) and land cover derived from satellite imagery were compared between infested and uninfested areas. A total of 140 adults of four species were captured (100 Triatoma vitticeps, 25 Panstrongylus geniculatus, 8 Panstrongylus megistus, and 7 Triatoma arthurneivai specimens). In total, 87.9% were captured within domiciles. Infection by trypanosomes was observed in 19.6% of 107 examined insects. The spatial distributions of T. vitticeps, P. geniculatus, T. arthurneivai, and trypanosome-positive triatomines were clustered, occurring mainly in peripheral areas. NDVI values were statistically higher in areas infested by T. vitticeps and P. geniculatus. Buildings infested by these species were located closer to open fields, whereas infestations of P. megistus and T. arthurneivai were closer to bare soil. Human occupation and modification of natural areas may be involved in triatomine invasion, exposing the population to these vectors.