106 results for kernal density estimation
in University of Queensland eSpace - Australia
Abstract:
We present a novel nonparametric density estimator and a new data-driven bandwidth selection method with excellent properties. The approach is inspired by the principles of the generalized cross entropy method. The proposed density estimation procedure has numerous advantages over the traditional kernel density estimator methods. Firstly, for the first time in the nonparametric literature, the proposed estimator allows for a genuine incorporation of prior information in the density estimation procedure. Secondly, the approach provides the first data-driven bandwidth selection method that is guaranteed to provide a unique bandwidth for any data. Lastly, simulation examples suggest the proposed approach outperforms the current state of the art in nonparametric density estimation in terms of accuracy and reliability.
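For context, the traditional kernel density estimator that this abstract compares against can be sketched as follows. This is a minimal illustration of the classical Gaussian-kernel estimator with a normal-reference rule-of-thumb bandwidth, not the proposed generalized cross entropy estimator; the function names and the sample data are hypothetical.

```python
import math
import statistics


def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth h = 1.06 * s * n^(-1/5) (an assumed, classical choice)."""
    n = len(data)
    return 1.06 * statistics.stdev(data) * n ** (-1 / 5)


def gaussian_kde(data, h):
    """Return a function x -> (1 / (n h)) * sum_i phi((x - x_i) / h), phi the standard normal pdf."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


# Hypothetical sample, used only to exercise the sketch.
sample = [1.2, 1.9, 2.1, 2.4, 3.3, 4.0, 4.1, 5.6]
h = rule_of_thumb_bandwidth(sample)
density = gaussian_kde(sample, h)

# Riemann-sum check that the estimate integrates to roughly 1 over a wide grid.
step = 0.01
total = sum(density(-20 + step * i) for i in range(5001)) * step
```

A rule-of-thumb bandwidth like this one assumes roughly normal data, which is one reason data-driven selection methods such as the one described in the abstract are of interest.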
Abstract:
Dispersal, or the amount of dispersion between an individual's birthplace and that of its offspring, is of great importance in population biology, behavioural ecology and conservation; however, obtaining direct estimates from field data on natural populations can be problematic. The prickly forest skink, Gnypetoscincus queenslandiae, is a rainforest endemic skink from the wet tropics of Australia. Because of its log-dwelling habits and lack of definite nesting sites, a demographic estimate of dispersal distance is difficult to obtain. Neighbourhood size, defined as 4πDσ² (where D is the population density and σ² the mean axial squared parent-offspring dispersal rate), dispersal and density were estimated directly and indirectly for this species using mark-recapture and microsatellite data, respectively, on lizards captured at a local geographical scale of 3 ha. Mark-recapture data gave a dispersal rate of 843 m²/generation (assuming a generation time of 6.5 years), a time-scaled density of 13 635 individuals × generation/km² and, hence, a neighbourhood size of 144 individuals. A genetic method based on the multilocus (10 loci) microsatellite genotypes of individuals and their geographical location indicated that there is a significant isolation by distance pattern, and gave a neighbourhood size of 69 individuals, with a 95% confidence interval between 48 and 184. This translates into a dispersal rate of 404 m²/generation when using the mark-recapture density estimate, or an estimate of time-scaled population density of 6520 individuals × generation/km² when using the mark-recapture dispersal rate estimate. The relationship between the two categories of neighbourhood size, dispersal and density estimates and reasons for any disparities are discussed.
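The figures quoted in this abstract can be checked against the stated definition of neighbourhood size, 4πDσ². The sketch below is only an arithmetic check; the function and variable names are hypothetical, and the one assumption made is the unit conversion from km² to m² so that D and σ² are expressed on the same area scale.

```python
import math


def neighbourhood_size(density_per_km2, sigma2_m2):
    """4 * pi * D * sigma^2, with D converted from per km^2 to per m^2."""
    density_per_m2 = density_per_km2 / 1e6
    return 4 * math.pi * density_per_m2 * sigma2_m2


# Mark-recapture estimates quoted in the abstract.
ns_mark_recapture = neighbourhood_size(13635, 843)

# Genetic neighbourhood size of 69, solved for each unknown in turn.
sigma2_from_genetic = 69 / (4 * math.pi * 13635 / 1e6)   # m^2 / generation
density_from_genetic = 69 / (4 * math.pi * 843) * 1e6    # individuals * generation / km^2

print(round(ns_mark_recapture), round(sigma2_from_genetic), round(density_from_genetic))
```

The recomputed values agree with the abstract's 144 individuals, and fall within about 1% of its 404 m²/generation and 6520 individuals × generation/km²; the small differences are consistent with rounding of the published inputs.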
Abstract:
Izenman and Sommer (1988) used a non-parametric kernel density estimation technique to fit a seven-component model to the paper thickness of the 1872 Hidalgo stamp issue of Mexico. They observed an apparent conflict when fitting a normal mixture model with three components with unequal variances. This conflict is examined further by investigating the most appropriate number of components when fitting a normal mixture of components with equal variances.
Abstract:
We focus on mixtures of factor analyzers from the perspective of a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of n data points of dimension p, where p is large relative to n. The number of free parameters is controlled through the dimension of the latent factor space. By working in this reduced space, it allows a model for each component-covariance matrix with complexity lying between that of the isotropic and full covariance structure models. We shall illustrate the use of mixtures of factor analyzers in a practical example that considers the clustering of cell lines on the basis of gene expressions from microarray experiments. (C) 2002 Elsevier Science B.V. All rights reserved.
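To make the statement about model complexity concrete, the sketch below counts the free parameters of a single p-dimensional component-covariance matrix under three structures. It assumes the standard factor-analytic form Σ = BBᵀ + D with a p × q loading matrix B and a diagonal D, which the abstract does not spell out; the function names and the values p = 50, q = 3 are hypothetical.

```python
def isotropic_params(p):
    """sigma^2 * I: a single variance parameter, whatever the dimension p."""
    return 1


def factor_analytic_params(p, q):
    """B B^T + D: p*q loadings plus p uniquenesses, minus q(q-1)/2 for rotational freedom."""
    return p * q + p - q * (q - 1) // 2


def full_params(p):
    """Unrestricted symmetric covariance matrix: p(p+1)/2 parameters."""
    return p * (p + 1) // 2


p, q = 50, 3  # hypothetical dimension and number of latent factors
counts = (isotropic_params(p), factor_analytic_params(p, q), full_params(p))
print(counts)
```

With these values the factor-analytic count sits between the isotropic and full counts, and increasing q moves it towards the full model.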
Abstract:
This paper investigates the performance of the EASI algorithm and the proposed EKENS algorithm for linear and nonlinear mixtures. The proposed EKENS algorithm is based on the modified equivariant algorithm and kernel density estimation. The theory and characteristics of both algorithms are discussed for the blind source separation model. The separation structure of nonlinear mixtures is based on a nonlinear stage followed by a linear stage. Simulations with artificial and natural data demonstrate the feasibility and good performance of the proposed EKENS algorithm.
Abstract:
An emerging issue in the field of astronomy is the integration, management and utilization of databases from around the world to facilitate scientific discovery. In this paper, we investigate the application of the machine learning techniques of support vector machines and neural networks to the problem of amalgamating catalogues of galaxies as objects from two disparate data sources: radio and optical. Formulating this as a classification problem presents several challenges, including dealing with a highly unbalanced data set. Unlike the conventional approach to the problem (which is based on a likelihood ratio), machine learning does not require density estimation and is shown here to provide a significant improvement in performance. We also report some experiments that explore the importance of the radio and optical data features for the matching problem.
Abstract:
Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is not very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.
Abstract:
Dendritic cells (DC) are considered to be the major cell type responsible for induction of primary immune responses. While they have been shown to play a critical role in eliciting allosensitization via the direct pathway, there is evidence that maturational and/or activational heterogeneity between DC in different donor organs may be crucial to allograft outcome. Despite such an important perceived role for DC, no accurate estimates of their number in commonly transplanted organs have been reported. Therefore, leukocytes and DC were visualized and enumerated in cryostat sections of normal mouse (C57BL/10, B10.BR, C3H) liver, heart, kidney and pancreas by immunohistochemistry (CD45 and MHC class II staining, respectively). Total immunopositive cell number and MHC class II+ cell density (C57BL/10 mice only) were estimated using established morphometric techniques - the fractionator and disector principles, respectively. Liver contained considerably more leukocytes (~5-20 × 10⁶) and DC (~1-3 × 10⁶) than the other organs examined (pancreas: ~0.6 × 10⁶ and ~0.35 × 10⁶; heart: ~0.8 × 10⁶ and ~0.4 × 10⁶; kidney: ~1.2 × 10⁶ and 0.65 × 10⁶, respectively). In liver, DC comprised a lower proportion of all leukocytes (~15-25%) than in the other parenchymal organs examined (~40-60%). Comparatively, DC density in C57BL/10 mice was heart > kidney > pancreas ≫ liver (~6.6 × 10⁶, 5 × 10⁶, 4.5 × 10⁶ and 1.1 × 10⁶ cells/cm³, respectively). When compared to previously published data on allograft survival, the results indicate that the absolute number of MHC class II+ DC present in a donor organ is a poor predictor of graft outcome. Survival of solid organ allografts is more closely related to the density of the donor DC network within the graft. (C) 2000 Elsevier Science B.V. All rights reserved.
Abstract:
Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
Abstract:
The extent to which density-dependent processes regulate natural populations is the subject of an ongoing debate. We contribute evidence to this debate showing that density-dependent processes influence the population dynamics of the ectoparasite Aponomma hydrosauri (Acari: Ixodidae), a tick species that infests reptiles in Australia. The first piece of evidence comes from an unusually long-term dataset on the distribution of ticks among individual hosts. If density-dependent processes are influencing either host mortality or vital rates of the parasite population, and those distributions can be approximated with negative binomial distributions, then general host-parasite models predict that the aggregation coefficient of the parasite distribution will increase with the average intensity of infections. We fit negative binomial distributions to the frequency distributions of ticks on hosts, and find that the estimated aggregation coefficient k increases with increasing average tick density. This pattern indirectly implies that one or more vital rates of the tick population must be changing with increasing tick density, because mortality rates of the tick's main host, the sleepy lizard, Tiliqua rugosa, are unaffected by changes in tick burdens. Our second piece of evidence is a re-analysis of experimental data on the attachment success of individual ticks to lizard hosts using generalized linear modelling. The probability of successful engorgement decreases with increasing numbers of ticks attached to a host. This is direct evidence of a density-dependent process that could lead to an increase in the aggregation coefficient of tick distributions described earlier. The population-scale increase in the aggregation coefficient is indirect evidence of a density-dependent process or processes sufficiently strong to produce a population-wide pattern, and thus also likely to influence population regulation. The direct observation of a density-dependent process is evidence of at least part of the responsible mechanism.
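As a rough illustration of the aggregation coefficient k of a negative binomial distribution, the sketch below uses the method-of-moments estimator k = m² / (s² − m), where m and s² are the sample mean and variance of the counts. The abstract does not say how the distributions were fitted, so this estimator is an assumption chosen for simplicity, and the tick counts are invented for the example.

```python
import statistics


def nb_aggregation_k(counts):
    """Method-of-moments estimate of the negative binomial parameter k.

    Smaller k means stronger aggregation (a few hosts carry most parasites).
    """
    m = statistics.mean(counts)
    s2 = statistics.variance(counts)  # sample variance (n - 1 denominator)
    if s2 <= m:
        raise ValueError("variance does not exceed the mean; no overdispersion to model")
    return m * m / (s2 - m)


# Hypothetical tick counts per host, not data from the study.
ticks_per_host = [0, 0, 0, 1, 1, 2, 3, 5, 8, 20]
k_hat = nb_aggregation_k(ticks_per_host)
print(k_hat)
```

Repeating such an estimate across samples with different mean tick burdens is one simple way to look for the trend in k that the abstract describes.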
Abstract:
There has been a resurgence of interest in the mean trace length estimator of Pahl for window sampling of traces. The estimator has been dealt with by Mauldon and Zhang and Einstein in recent publications. The estimator is a very useful one in that it is non-parametric. However, despite some discussion regarding the statistical distribution of the estimator, none of the recent works or the original work by Pahl provide a rigorous basis for the determination of a confidence interval for the estimator or a confidence region for the estimator and the corresponding estimator of trace spatial intensity in the sampling window. This paper shows, by consideration of a simplified version of the problem but without loss of generality, that the estimator is in fact the maximum likelihood estimator (MLE) and that it can be considered essentially unbiased. As the MLE, it possesses the least variance of all estimators and confidence intervals or regions should therefore be available through application of classical ML theory. It is shown that valid confidence intervals can in fact be determined. The results of the work and the calculations of the confidence intervals are illustrated by example. (C) 2003 Elsevier Science Ltd. All rights reserved.
Abstract:
We describe methods for estimating the parameters of Markovian population processes in continuous time, thus increasing their utility in modelling real biological systems. A general approach, applicable to any finite-state continuous-time Markovian model, is presented, and this is specialised to a computationally more efficient method applicable to a class of models called density-dependent Markov population processes. We illustrate the versatility of both approaches by estimating the parameters of the stochastic SIS logistic model from simulated data. This model is also fitted to data from a population of Bay checkerspot butterfly (Euphydryas editha bayensis), allowing us to assess the viability of this population. (c) 2006 Elsevier Inc. All rights reserved.
Abstract:
This paper investigates the performance analysis of separation of mutually independent sources in nonlinear models. Nonlinear mappings constituted by an unsupervised linear mixture followed by an unknown and invertible nonlinear distortion are found in many signal processing cases. Generally, blind separation of sources from their nonlinear mixtures is rather difficult. We propose using a kernel density estimator incorporated with equivariant gradient analysis to separate the sources with nonlinear distortion. The parameters of the kernel density estimator are iteratively updated to minimize the output dependence, expressed as a mutual information criterion. The equivariant gradient algorithm has the form of nonlinear decorrelation to perform the convergence analysis. Experiments are proposed to illustrate these results.