971 results for kernel density estimation
Abstract:
Peer reviewed
Abstract:
Archaeozoological mortality profiles have been used to infer site-specific subsistence strategies. There is however no common agreement on the best way to present these profiles and confidence intervals around age class proportions. In order to deal with these issues, we propose the use of the Dirichlet distribution and present a new approach to perform age-at-death multivariate graphical comparisons. We demonstrate the efficiency of this approach using domestic sheep/goat dental remains from 10 Cardial sites (Early Neolithic) located in South France and the Iberian Peninsula. We show that the Dirichlet distribution in age-at-death analysis can be used: (i) to generate Bayesian credible intervals around each age class of a mortality profile, even when not all age classes are observed; and (ii) to create 95% kernel density contours around each age-at-death frequency distribution when multiple sites are compared using correspondence analysis. The statistical procedure we present is applicable to the analysis of any categorical count data and particularly well-suited to archaeological data (e.g. potsherds, arrow heads) where sample sizes are typically small.
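As a rough illustration of point (i), the sketch below draws from a Dirichlet posterior to obtain equal-tailed credible intervals for each age class, including classes with zero counts. It is not the authors' code: the symmetric prior value, the function name `dirichlet_credible_intervals` and the example counts are all assumptions made for illustration.

```python
import numpy as np

def dirichlet_credible_intervals(counts, prior=1.0, level=0.95,
                                 n_draws=20000, seed=0):
    """Equal-tailed credible intervals for each class proportion.

    counts : observed number of specimens per age class (zeros allowed)
    prior  : symmetric Dirichlet prior parameter (assumed value)
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    # With multinomial counts and a Dirichlet prior, the posterior of the
    # class proportions is Dirichlet(counts + prior).
    alpha = counts + prior
    draws = rng.dirichlet(alpha, size=n_draws)
    tail = (1.0 - level) / 2.0
    lower = np.quantile(draws, tail, axis=0)
    upper = np.quantile(draws, 1.0 - tail, axis=0)
    mean = alpha / alpha.sum()  # exact posterior mean
    return mean, lower, upper

# Hypothetical counts for seven age classes; two classes are unobserved.
counts = [3, 0, 7, 5, 2, 1, 0]
mean, lower, upper = dirichlet_credible_intervals(counts)
for k, (m, lo, hi) in enumerate(zip(mean, lower, upper)):
    print(f"class {k}: mean={m:.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

Because the prior adds mass to every class, unobserved classes still receive a non-degenerate interval, which is the property the abstract highlights for small samples.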
Abstract:
One of the important problems in machine learning is determining the complexity of the model to be learned. Too much complexity leads to overfitting, which amounts to finding structures that do not really exist in the data, whereas too little complexity leads to underfitting, meaning that the expressiveness of the model is insufficient to capture all the structures present in the data. For some probabilistic models, model complexity takes the form of one or more hidden variables whose role is to explain the generative process of the data. Various approaches exist for identifying the appropriate number of hidden variables in a model. This thesis focuses on Bayesian nonparametric methods for determining the number of hidden variables to use as well as their dimensionality. The popularization of Bayesian nonparametric statistics within the machine learning community is fairly recent. Their main appeal is that they offer highly flexible models whose complexity adjusts in proportion to the amount of available data. In recent years, research on Bayesian nonparametric learning methods has focused on three main aspects: the construction of new models, the development of inference algorithms, and applications. This thesis presents our contributions to these three research topics in the context of learning hidden-variable models. First, we introduce the Pitman-Yor process mixture of Gaussians, a model for learning infinite mixtures of Gaussians. We also present an inference algorithm for discovering the hidden components of the model, which we evaluate on two concrete robotics applications. Our results show that the proposed approach outperforms classical learning approaches in both performance and flexibility. Second, we propose the extended cascading Indian buffet process, a model that serves as a prior probability distribution over the space of directed acyclic graphs. In the context of Bayesian networks, this prior makes it possible to identify both the presence of hidden variables and the network structure among them. A Markov chain Monte Carlo inference algorithm is used for evaluation on structure identification and density estimation problems. Finally, we propose the Indian chefs process, a model more general than the extended cascading Indian buffet process, for learning graphs and orders. The advantage of the new model is that it allows connections between observable variables and takes the order of the variables into account. We present a reversible jump Markov chain Monte Carlo inference algorithm for jointly learning graphs and orders. Evaluation is carried out on density estimation and independence testing problems. This model is the first Bayesian nonparametric model able to learn Bayesian networks with a completely arbitrary structure.
Abstract:
The bubble crab Dotilla fenestrata forms very dense populations on the sand flats of the eastern coast of Inhaca Island, Mozambique, making it an interesting biological model to examine spatial distribution patterns and test the relative efficiency of common sampling methods. Due to its apparent ecological importance within the sandy intertidal community, understanding the factors ruling the dynamics of Dotilla populations is also a key issue. In this study, different techniques of estimating crab density are described, and the trends of spatial distribution of the different population categories are shown. The studied populations are arranged in discrete patches located at the well-drained crests of nearly parallel mega sand ripples. For a given sample size, there was an obvious gain in precision by using a stratified random sampling technique, considering discrete patches as strata, compared to the simple random design. Density average and variance differed considerably among patches since juveniles and ovigerous females were found clumped, with higher densities at the lower and upper shore levels, respectively. Burrow counting was found to be an adequate method for large-scale sampling, although consistently underestimating actual crab density by nearly half. Regression analyses suggested that crabs smaller than 2.9 mm carapace width tend to be undetected in visual burrow counts. A visual survey of sampling plots over several patches of a large Dotilla population showed that crab density varied in an interesting oscillating pattern, apparently following the topography of the sand flat. Patches extending to the lower shore contained higher densities than those mostly covering the higher shore. Within-patch density variability also pointed to the same trend, but the density increment towards the lowest shore level varied greatly among the patches compared.
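The precision gain from stratifying by patch can be sketched with the textbook variance formulas. This is only an illustration under stated assumptions: the quadrat counts are invented, each array is treated as the full set of quadrats in a patch, allocation is proportional to patch size, and the finite population correction is ignored. It is not the sampling procedure used in the study.

```python
import numpy as np

def srs_variance(strata, n):
    """Variance of the sample mean under simple random sampling of n quadrats."""
    pooled = np.concatenate(strata)
    return float(pooled.var() / n)

def stratified_variance(strata, n):
    """Variance of the stratified mean with proportional allocation of n quadrats."""
    sizes = np.array([len(s) for s in strata], dtype=float)
    weights = sizes / sizes.sum()
    within = np.array([np.var(s) for s in strata])
    # With n_h = n * W_h, sum_h W_h^2 * var_h / n_h reduces to sum_h W_h * var_h / n.
    return float((weights * within).sum() / n)

# Hypothetical crab counts per quadrat in three patches with different mean density.
patches = [
    np.array([10.0, 12.0, 11.0, 13.0]),
    np.array([30.0, 28.0, 33.0, 29.0]),
    np.array([55.0, 60.0, 58.0, 57.0]),
]

n = 6
print("simple random:", srs_variance(patches, n))
print("stratified:   ", stratified_variance(patches, n))
```

The stratified variance depends only on the within-patch variances, so the more the patches differ in mean density, the larger the gain over the simple random design.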
Abstract:
The aim of this study is to characterize, for the first time, some aspects of the reproduction of the uçá crab in mangroves of Babitonga Bay (Santa Catarina). In addition, the density and stock size of this fishery resource were estimated. Specimens were collected monthly, from May 2002 to April 2003, in two distinct areas, Iperoba and Palmital; a total of 2265 specimens (1623 males and 642 females) was analysed. Males with mature gonads were recorded throughout the year, whereas females with mature gonads occurred in only five months. Ovigerous females were recorded only in December and January. The ethogram of the reproductive migration phenomenon (andada) agreed with the greater crab activity associated with full and new moons, with the highest intensity in December and January, related to the austral summer. Total density in the Iperoba mangrove was 2.05 ± 0.97 ind./m², not differing significantly from that recorded for the Palmital mangrove (2.06 ± 1.08 ind./m²) (p < 0.05). The overall mean density estimate for Babitonga Bay was 2.05 ± 1.00 ind./m², corresponding to 1.42 ± 0.89 ind./m² based on open burrows and 0.64 ± 0.63 ind./m² for closed burrows.
Abstract:
Reports of triatomine infestation in urban areas have increased. We analysed the spatial distribution of infestation by triatomines in the urban area of Diamantina, in the state of Minas Gerais, Brazil. Triatomines were obtained by community-based entomological surveillance. Spatial patterns of infestation were analysed by Ripley’s K function and Kernel density estimator. Normalised difference vegetation index (NDVI) and land cover derived from satellite imagery were compared between infested and uninfested areas. A total of 140 adults of four species were captured (100 Triatoma vitticeps, 25 Panstrongylus geniculatus, 8 Panstrongylus megistus, and 7 Triatoma arthurneivai specimens). In total, 87.9% were captured within domiciles. Infection by trypanosomes was observed in 19.6% of 107 examined insects. The spatial distributions of T. vitticeps, P. geniculatus, T. arthurneivai, and trypanosome-positive triatomines were clustered, occurring mainly in peripheral areas. NDVI values were statistically higher in areas infested by T. vitticeps and P. geniculatus. Buildings infested by these species were located closer to open fields, whereas infestations of P. megistus and T. arthurneivai were closer to bare soil. Human occupation and modification of natural areas may be involved in triatomine invasion, exposing the population to these vectors.
Abstract:
We present a brief historical note on the evolution of line transect sampling and its theoretical developments. We describe line transect sampling theory as proposed by Buckland (1992), and present the most relevant issues in modelling the detection function. We describe the MDL (minimum description length) principle (Rissanen, 1978) and its application to histogram density estimation (Kontkanen and Myllymäki, 2006), with a practical example using a mixture of densities. We then apply it to estimate the probability of detection and animal population density in the context of line transect sampling. Two classical examples from the distance sampling literature are analysed and compared. In order to evaluate the proposed methodology, we carry out a simulation study based on a wooden stakes example, using half-normal, hazard-rate, exponential and uniform-with-cosine-term detection functions. The results were obtained using the program DISTANCE (Thomas et al., in press) and an algorithm written in C, kindly provided by Professor Petri Kontkanen (Department of Computer Science, University of Helsinki). We developed programs to estimate confidence intervals using the bootstrap technique (Efron, 1978). Finally, the results are discussed and suggestions for future developments are presented.
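For orientation, the sketch below shows the conventional parametric route to a line transect density estimate with a half-normal detection function, g(x) = exp(-x² / (2σ²)), and no truncation of distances. It is not the MDL histogram estimator studied in the thesis, and the function name and example distances are hypothetical.

```python
import math

def halfnormal_density(distances, total_line_length):
    """Animal density from perpendicular distances, half-normal detection.

    distances         : perpendicular distances of detected animals from the line
    total_line_length : summed transect length, in the same units as distances
    """
    n = len(distances)
    # Maximum likelihood estimate of sigma^2 for an untruncated half-normal.
    sigma2 = sum(x * x for x in distances) / n
    # Effective strip half-width: integral of g(x) from 0 to infinity.
    esw = math.sqrt(math.pi * sigma2 / 2.0)
    # n animals detected in an effective area of 2 * L * esw.
    return n / (2.0 * total_line_length * esw)

# Hypothetical perpendicular distances (metres) along 1000 m of transect.
distances = [2.1, 0.4, 5.3, 1.7, 3.2, 0.9, 7.5, 2.8]
print(halfnormal_density(distances, 1000.0), "animals per square metre")
```

The probability of detection within a strip of half-width w follows as esw / w, provided w is large enough that almost nothing is detected beyond it.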
Abstract:
The study of random probability measures is a lively research topic that has attracted interest from different fields in recent years. In this thesis, we consider random probability measures in the context of Bayesian nonparametrics, where the law of a random probability measure is used as a prior distribution, and in the context of distributional data analysis, where the goal is to perform inference given a sample from the law of a random probability measure. The contributions contained in this thesis can be subdivided according to three different topics: (i) the use of almost surely discrete repulsive random measures (i.e., whose support points are well separated) for Bayesian model-based clustering, (ii) the proposal of new laws for collections of random probability measures for Bayesian density estimation of partially exchangeable data subdivided into different groups, and (iii) the study of principal component analysis and regression models for probability distributions seen as elements of the 2-Wasserstein space. Specifically, for point (i) above we propose an efficient Markov chain Monte Carlo algorithm for posterior inference, which sidesteps the need for split-merge reversible jump moves typically associated with poor performance; we propose a model for clustering high-dimensional data by introducing a novel class of anisotropic determinantal point processes; and we study the distributional properties of the repulsive measures, shedding light on important theoretical results which enable more principled prior elicitation and more efficient posterior simulation algorithms. For point (ii) above, we consider several models suitable for clustering homogeneous populations, inducing spatial dependence across groups of data, and extracting the characteristic traits common to all the data groups, and propose a novel vector autoregressive model to study the growth curves of Singaporean children. Finally, for point (iii), we propose a novel class of projected statistical methods for distributional data analysis for measures on the real line and on the unit circle.
Abstract:
Technological advances have pushed industry and research towards the automation of various processes. Automation reduces costs and improves product quality, and for this reason companies are pushing research into new technologies. The agriculture industry has always looked to automate its processes, from product processing to storage. In recent years, automating the harvest and cultivation phases has also become attractive, driven by advances in autonomous driving. Nevertheless, advanced driver-assistance systems (ADAS) alone are not enough. Merging different technologies will be the way to achieve full automation of agricultural processes. For example, sensors that estimate the physical and chemical properties of products can be used to evaluate the maturity level of fruit. The fusion of these technologies therefore plays a key role in industrial process automation. This dissertation addresses both ADAS and sensors for precision agriculture. Several measurement procedures for characterizing commercial 3D LiDARs are proposed and tested to meet the growing need for comparison tools. Axial and transversal errors are investigated. Moreover, a measurement method and setup for evaluating the effect of fog on 3D LiDARs is proposed. Each measurement procedure presented has been tested, and the results highlight the versatility and soundness of the proposed approaches. Regarding the precision agriculture sensors, a measurement approach for estimating the moisture content and density of crops directly in the field is presented. The approach uses a near-infrared spectrometer together with partial least squares statistical analysis. The approach and the model are described together with a first laboratory prototype used to evaluate the NIRS approach. Finally, a prototype for in-field analysis is built and tested. The test results are promising, indicating that the proposed approach is suitable for moisture content and density estimation.
Abstract:
A new sparse kernel probability density function (pdf) estimator based on a zero-norm constraint is constructed using the classical Parzen window (PW) estimate as the target function. The so-called zero-norm of the parameters is used in order to achieve enhanced model sparsity, and it is suggested to minimize an approximate function of the zero-norm. It is shown that, under certain conditions, the kernel weights of the proposed pdf estimator based on the zero-norm approximation can be updated using the multiplicative nonnegative quadratic programming algorithm. Numerical examples are employed to demonstrate the efficacy of the proposed approach.
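For reference, the classical Parzen window estimate that serves as the target function can be sketched as follows. The Gaussian kernel, the bandwidth value and the sample data are assumptions for illustration; the sparse zero-norm estimator itself is not reproduced here.

```python
import numpy as np

def parzen_window_pdf(x, samples, bandwidth):
    """Classical Parzen window density estimate with a Gaussian kernel.

    Every training sample contributes one kernel with equal weight 1/N.
    A sparse estimator would instead use unequal weights, most of them zero.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    # Standardized distance between each query point and each sample.
    z = (x[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (np.sqrt(2.0 * np.pi) * bandwidth)
    return kernels.mean(axis=1)

# Hypothetical one-dimensional data and bandwidth.
samples = np.array([-1.0, -0.5, 0.0, 0.4, 1.0])
grid = np.linspace(-10.0, 10.0, 2001)
density = parzen_window_pdf(grid, samples, bandwidth=0.5)
print("approximate integral:", density.sum() * (grid[1] - grid[0]))
```

Evaluating this estimate costs one kernel evaluation per training sample, which is the motivation for seeking a sparse set of kernel weights.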
Abstract:
We investigate the interplay of smoothness and monotonicity assumptions when estimating a density from a sample of observations. The nonparametric maximum likelihood estimator of a decreasing density on the positive half line attains a rate of convergence of n^{1/3} at a fixed point if the density has a negative derivative. The same rate is obtained by a kernel estimator, but the limit distributions are different. If the density is both differentiable and known to be monotone, then a third estimator is obtained by isotonization of a kernel estimator. We show that this again attains the n^{1/3} rate of convergence and compare the limit distributions of the three types of estimators. It is shown that both isotonization and smoothing lead to a more concentrated limit distribution, and we study the dependence on the proportionality constant in the bandwidth. We also show that isotonization does not change the limit behavior of a kernel estimator with a larger bandwidth, in the case that the density is known to have more than one derivative.
Abstract:
2000 Mathematics Subject Classification: 62G07, 60F10.
Abstract:
Asymmetric discrete triangular distributions are introduced in order to extend the symmetric ones serving for discrete associated kernels in the nonparametric estimation for discrete functions. The extension from one to two orders around the mode provides a large family of discrete distributions having a finite support. Establishing a bridge between Dirac and discrete uniform distributions, some different shapes are also obtained and their properties are investigated. In particular, the mean and variance are pointed out. Applications to discrete kernel estimators are given with a solution to a boundary bias problem. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
Crop rotation in center-pivot for phytonematode control: density variation, pathogenicity and crop loss estimation. A field study conducted over three consecutive years on a farm using a crop rotation system under center-pivot and infested with the nematodes Pratylenchus brachyurus, P. zeae, Meloidogyne incognita, Paratrichodorus minor, Helicotylenchus dihystera, Mesocriconema ornata and M. onoense demonstrated that intensive crop systems provide conditions for the maintenance of high densities of polyphagous phytonematodes. Of the crops established on the farm (cotton, maize, soybean and cowpea), cotton and soybean suffered the most severe crop losses, caused respectively by M. incognita and P. brachyurus. Since maize is a good host for both nematodes, but tolerant of M. incognita, its exclusion from the cropping system would favor the performance of cotton, soybean and cowpea. Results from experiments carried out under controlled conditions confirmed the pathogenicity of P. brachyurus on cotton. Additional management with genetic resistance was useful in fields infested with M. incognita, although soybean performance was affected by the low resistance to P. brachyurus of the cultivars used. In conclusion, crop rotation must be carefully planned in areas infested with polyphagous nematodes, specifically where two or more major pathogenic nematodes occur.
Abstract:
Dendritic cells (DC) are considered to be the major cell type responsible for induction of primary immune responses. While they have been shown to play a critical role in eliciting allosensitization via the direct pathway, there is evidence that maturational and/or activational heterogeneity between DC in different donor organs may be crucial to allograft outcome. Despite such an important perceived role for DC, no accurate estimates of their number in commonly transplanted organs have been reported. Therefore, leukocytes and DC were visualized and enumerated in cryostat sections of normal mouse (C57BL/10, B10.BR, C3H) liver, heart, kidney and pancreas by immunohistochemistry (CD45 and MHC class II staining, respectively). Total immunopositive cell number and MHC class II+ cell density (C57BL/10 mice only) were estimated using established morphometric techniques, the fractionator and disector principles, respectively. Liver contained considerably more leukocytes (~5-20 × 10^6) and DC (~1-3 × 10^6) than the other organs examined (pancreas: ~0.6 × 10^6 and ~0.35 × 10^6; heart: ~0.8 × 10^6 and ~0.4 × 10^6; kidney: ~1.2 × 10^6 and 0.65 × 10^6, respectively). In liver, DC comprised a lower proportion of all leukocytes (~15-25%) than in the other parenchymal organs examined (~40-60%). Comparatively, DC density in C57BL/10 mice was heart > kidney > pancreas >> liver (~6.6 × 10^6, 5 × 10^6, 4.5 × 10^6 and 1.1 × 10^6 cells/cm^3, respectively). When compared to previously published data on allograft survival, the results indicate that the absolute number of MHC class II+ DC present in a donor organ is a poor predictor of graft outcome. Survival of solid organ allografts is more closely related to the density of the donor DC network within the graft. (C) 2000 Elsevier Science B.V. All rights reserved.