893 resultados para Kernel Density
Resumo:
In this paper, we study several tests for the equality of two unknown distributions. Two are based on empirical distribution functions, three others on nonparametric probability density estimates, and the last ones on differences between sample moments. We suggest controlling the size of such tests (under nonparametric assumptions) by using permutational versions of the tests jointly with the method of Monte Carlo tests properly adjusted to deal with discrete distributions. We also propose a combined test procedure, whose level is again perfectly controlled through the Monte Carlo test technique and has better power properties than the individual tests that are combined. Finally, in a simulation experiment, we show that the technique suggested provides perfect control of test size and that the new tests proposed can yield sizeable power improvements.
Resumo:
A problem in the archaeometric classification of Catalan Renaissance pottery is the fact, that the clay supply of the pottery workshops was centrally organized by guilds, and therefore usually all potters of a single production centre produced chemically similar ceramics. However, analysing the glazes of the ware usually a large number of inclusions in the glaze is found, which reveal technological differences between single workshops. These inclusions have been used by the potters in order to opacify the transparent glaze and to achieve a white background for further decoration. In order to distinguish different technological preparation procedures of the single workshops, at a Scanning Electron Microscope the chemical composition of those inclusions as well as their size in the two-dimensional cut is recorded. Based on the latter, a frequency distribution of the apparent diameters is estimated for each sample and type of inclusion. Following an approach by S.D. Wicksell (1925), it is principally possible to transform the distributions of the apparent 2D-diameters back to those of the true three-dimensional bodies. The applicability of this approach and its practical problems are examined using different ways of kernel density estimation and Monte-Carlo tests of the methodology. Finally, it is tested in how far the obtained frequency distributions can be used to classify the pottery
Resumo:
In this paper a colour texture segmentation method, which unifies region and boundary information, is proposed. The algorithm uses a coarse detection of the perceptual (colour and texture) edges of the image to adequately place and initialise a set of active regions. Colour texture of regions is modelled by the conjunction of non-parametric techniques of kernel density estimation (which allow to estimate the colour behaviour) and classical co-occurrence matrix based texture features. Therefore, region information is defined and accurate boundary information can be extracted to guide the segmentation process. Regions concurrently compete for the image pixels in order to segment the whole image taking both information sources into account. Furthermore, experimental results are shown which prove the performance of the proposed method
Resumo:
Background: Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically to understand the mosaic nature of bacterial genomes. Consequently a bottle neck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results: The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described and we have compared a number of methods available for characterising bacterial genomic diversity, for calculating the cut-off between gene presence and absence or divergence, and shown that a simple dynamic approach using a kernel density estimator performed better than both established, as well as a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion: After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.
Resumo:
We generalize the popular ensemble Kalman filter to an ensemble transform filter, in which the prior distribution can take the form of a Gaussian mixture or a Gaussian kernel density estimator. The design of the filter is based on a continuous formulation of the Bayesian filter analysis step. We call the new filter algorithm the ensemble Gaussian-mixture filter (EGMF). The EGMF is implemented for three simple test problems (Brownian dynamics in one dimension, Langevin dynamics in two dimensions and the three-dimensional Lorenz-63 model). It is demonstrated that the EGMF is capable of tracking systems with non-Gaussian uni- and multimodal ensemble distributions. Copyright © 2011 Royal Meteorological Society
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
We report a morphology-based approach for the automatic identification of outlier neurons, as well as its application to the NeuroMorpho.org database, with more than 5,000 neurons. Each neuron in a given analysis is represented by a feature vector composed of 20 measurements, which are then projected into a two-dimensional space by applying principal component analysis. Bivariate kernel density estimation is then used to obtain the probability distribution for the group of cells, so that the cells with highest probabilities are understood as archetypes while those with the smallest probabilities are classified as outliers. The potential of the methodology is illustrated in several cases involving uniform cell types as well as cell types for specific animal species. The results provide insights regarding the distribution of cells, yielding single and multi-variate clusters, and they suggest that outlier cells tend to be more planar and tortuous. The proposed methodology can be used in several situations involving one or more categories of cells, as well as for detection of new categories and possible artifacts.
Resumo:
China is a large country characterized by remarkable growth and distinct regional diversity. Spatial disparity has always been a hot issue since China has been struggling to follow a balanced growth path but still confronting with unprecedented pressures and challenges. To better understand the inequality level benchmarking spatial distributions of Chinese provinces and municipalities and estimate dynamic trajectory of sustainable development in China, I constructed the Composite Index of Regional Development (CIRD) with five sub pillars/dimensions involving Macroeconomic Index (MEI), Science and Innovation Index (SCI), Environmental Sustainability Index (ESI), Human Capital Index (HCI) and Public Facilities Index (PFI), endeavoring to cover various fields of regional socioeconomic development. Ranking reports on the five sub dimensions and aggregated CIRD were provided in order to better measure the developmental degrees of 31 or 30 Chinese provinces and municipalities over 13 years from 1998 to 2010 as the time interval of three “Five-year Plans”. Further empirical applications of this CIRD focused on clustering and convergence estimation, attempting to fill up the gap in quantifying the developmental levels of regional comprehensive socioeconomics and estimating the dynamic convergence trajectory of regional sustainable development in a long run. Four clusters were benchmarked geographically-oriented in the map on the basis of cluster analysis, and club-convergence was observed in the Chinese provinces and municipalities based on stochastic kernel density estimation.
Resumo:
Northern hardwood management was assessed throughout the state of Michigan using data collected on recently harvested stands in 2010 and 2011. Methods of forensic estimation of diameter at breast height were compared and an ideal, localized equation form was selected for use in reconstructing pre-harvest stand structures. Comparisons showed differences in predictive ability among available equation forms which led to substantial financial differences when used to estimate the value of removed timber. Management on all stands was then compared among state, private, and corporate landowners. Comparisons of harvest intensities against a liberal interpretation of a well-established management guideline showed that approximately one third of harvests were conducted in a manner which may imply that the guideline was followed. One third showed higher levels of removals than recommended, and one third of harvests were less intensive than recommended. Multiple management guidelines and postulated objectives were then synthesized into a novel system of harvest taxonomy, against which all harvests were compared. This further comparison showed approximately the same proportions of harvests, while distinguishing sanitation cuts and the future productive potential of harvests cut more intensely than suggested by guidelines. Stand structures are commonly represented using diameter distributions. Parametric and nonparametric techniques for describing diameter distributions were employed on pre-harvest and post-harvest data. A common polynomial regression procedure was found to be highly sensitive to the method of histogram construction which provides the data points for the regression. The discriminative ability of kernel density estimation was substantially different from that of the polynomial regression technique.
Resumo:
Background: Clear examples of ecological speciation exist, often involving divergence in trophic morphology. However, substantial variation also exists in how far the ecological speciation process proceeds, potentially linked to the number of ecological axes, traits, or genes subject to divergent selection. In addition, recent studies highlight how differentiation might occur between the sexes, rather than between populations. We examine variation in trophic morphology in two host-plant ecotypes of walking-stick insects (Timema cristinae), known to have diverged in morphological traits related to crypsis and predator avoidance, and to have reached an intermediate point in the ecological speciation process. Here we test how host plant use, sex, and rearing environment affect variation in trophic morphology in this species using traditional multivariate, novel kernel density based and Bayesian morphometric analyses. Results: Contrary to expectations, we find limited host-associated divergence in mandible shape. Instead, the main predictor of shape variation is sex, with secondary roles of population of origin and rearing environment. Conclusion: Our results show that trophic morphology does not strongly contribute to host-adapted ecotype divergence in T. cristinae and that traits can respond to complex selection regimes by diverging along different intraspecific lines, thereby impeding progress toward speciation.
Resumo:
Nuclear morphometry (NM) uses image analysis to measure features of the cell nucleus which are classified as: bulk properties, shape or form, and DNA distribution. Studies have used these measurements as diagnostic and prognostic indicators of disease with inconclusive results. The distributional properties of these variables have not been systematically investigated although much of the medical data exhibit nonnormal distributions. Measurements are done on several hundred cells per patient so summary measurements reflecting the underlying distribution are needed.^ Distributional characteristics of 34 NM variables from prostate cancer cells were investigated using graphical and analytical techniques. Cells per sample ranged from 52 to 458. A small sample of patients with benign prostatic hyperplasia (BPH), representing non-cancer cells, was used for general comparison with the cancer cells.^ Data transformations such as log, square root and 1/x did not yield normality as measured by the Shapiro-Wilks test for normality. A modulus transformation, used for distributions having abnormal kurtosis values, also did not produce normality.^ Kernel density histograms of the 34 variables exhibited non-normality and 18 variables also exhibited bimodality. A bimodality coefficient was calculated and 3 variables: DNA concentration, shape and elongation, showed the strongest evidence of bimodality and were studied further.^ Two analytical approaches were used to obtain a summary measure for each variable for each patient: cluster analysis to determine significant clusters and a mixture model analysis using a two component model having a Gaussian distribution with equal variances. The mixture component parameters were used to bootstrap the log likelihood ratio to determine the significant number of components, 1 or 2. These summary measures were used as predictors of disease severity in several proportional odds logistic regression models. The disease severity scale had 5 levels and was constructed of 3 components: extracapsulary penetration (ECP), lymph node involvement (LN+) and seminal vesicle involvement (SV+) which represent surrogate measures of prognosis. The summary measures were not strong predictors of disease severity. There was some indication from the mixture model results that there were changes in mean levels and proportions of the components in the lower severity levels. ^
Resumo:
Using data from March Current Population Surveys we find gains from economic growth over the 1990s business cycle (1989-2000) were more equitably distributed than over the 1980s business cycle (1979-1989) using summary inequality measures as well as kernel density estimations. The entire distribution of household size-adjusted income moved upwards in the 1990s with profound improvements for African Americans, single mothers and those living in households receiving welfare. Most gains occurred over the growth period 1993-2000. Improvements in average income and income inequity over the latter period are reminiscent of gains seen in the first three decades after World War II.
Resumo:
Working with subsistence whale hunters, we tagged 19 mostly immature bowhead whales (Balaena mysticetus) with satellite-linked transmitters between May 2006 and September 2008 and documented their movements in the Chukchi Sea from late August through December. From Point Barrow, Alaska, most whales moved west through the Chukchi Sea between 71° and 74° N latitude; nine whales crossed in six to nine days. Three whales returned to Point Barrow for 13 to 33 days, two after traveling 300 km west and one after traveling ~725 km west to Wrangel Island, Russia; two then crossed the Chukchi Sea again while the other was the only whale to travel south along the Alaskan side of the Chukchi Sea. Seven whales spent from one to 21 days near Wrangel Island before moving south to northern Chukotka. Whales spent an average of 59 days following the Chukotka coast southeastward. Kernel density analysis identified Point Barrow, Wrangel Island, and the northern coast of Chukotka as areas of greater use by bowhead whales that might be important for feeding. All whales traveled through a potential petroleum development area at least once. Most whales crossed the development area in less than a week; however, one whale remained there for 30 days.
Resumo:
La evolución de los teléfonos móviles inteligentes, dotados de cámaras digitales, está provocando una creciente demanda de aplicaciones cada vez más complejas que necesitan algoritmos de visión artificial en tiempo real; puesto que el tamaño de las señales de vídeo no hace sino aumentar y en cambio el rendimiento de los procesadores de un solo núcleo se ha estancado, los nuevos algoritmos que se diseñen para visión artificial han de ser paralelos para poder ejecutarse en múltiples procesadores y ser computacionalmente escalables. Una de las clases de procesadores más interesantes en la actualidad se encuentra en las tarjetas gráficas (GPU), que son dispositivos que ofrecen un alto grado de paralelismo, un excelente rendimiento numérico y una creciente versatilidad, lo que los hace interesantes para llevar a cabo computación científica. En esta tesis se exploran dos aplicaciones de visión artificial que revisten una gran complejidad computacional y no pueden ser ejecutadas en tiempo real empleando procesadores tradicionales. En cambio, como se demuestra en esta tesis, la paralelización de las distintas subtareas y su implementación sobre una GPU arrojan los resultados deseados de ejecución con tasas de refresco interactivas. Asimismo, se propone una técnica para la evaluación rápida de funciones de complejidad arbitraria especialmente indicada para su uso en una GPU. En primer lugar se estudia la aplicación de técnicas de síntesis de imágenes virtuales a partir de únicamente dos cámaras lejanas y no paralelas—en contraste con la configuración habitual en TV 3D de cámaras cercanas y paralelas—con información de color y profundidad. Empleando filtros de mediana modificados para la elaboración de un mapa de profundidad virtual y proyecciones inversas, se comprueba que estas técnicas son adecuadas para una libre elección del punto de vista. Además, se demuestra que la codificación de la información de profundidad con respecto a un sistema de referencia global es sumamente perjudicial y debería ser evitada. Por otro lado se propone un sistema de detección de objetos móviles basado en técnicas de estimación de densidad con funciones locales. Este tipo de técnicas es muy adecuada para el modelado de escenas complejas con fondos multimodales, pero ha recibido poco uso debido a su gran complejidad computacional. El sistema propuesto, implementado en tiempo real sobre una GPU, incluye propuestas para la estimación dinámica de los anchos de banda de las funciones locales, actualización selectiva del modelo de fondo, actualización de la posición de las muestras de referencia del modelo de primer plano empleando un filtro de partículas multirregión y selección automática de regiones de interés para reducir el coste computacional. Los resultados, evaluados sobre diversas bases de datos y comparados con otros algoritmos del estado del arte, demuestran la gran versatilidad y calidad de la propuesta. Finalmente se propone un método para la aproximación de funciones arbitrarias empleando funciones continuas lineales a tramos, especialmente indicada para su implementación en una GPU mediante el uso de las unidades de filtraje de texturas, normalmente no utilizadas para cómputo numérico. La propuesta incluye un riguroso análisis matemático del error cometido en la aproximación en función del número de muestras empleadas, así como un método para la obtención de una partición cuasióptima del dominio de la función para minimizar el error. ABSTRACT The evolution of smartphones, all equipped with digital cameras, is driving a growing demand for ever more complex applications that need to rely on real-time computer vision algorithms. However, video signals are only increasing in size, whereas the performance of single-core processors has somewhat stagnated in the past few years. Consequently, new computer vision algorithms will need to be parallel to run on multiple processors and be computationally scalable. One of the most promising classes of processors nowadays can be found in graphics processing units (GPU). These are devices offering a high parallelism degree, excellent numerical performance and increasing versatility, which makes them interesting to run scientific computations. In this thesis, we explore two computer vision applications with a high computational complexity that precludes them from running in real time on traditional uniprocessors. However, we show that by parallelizing subtasks and implementing them on a GPU, both applications attain their goals of running at interactive frame rates. In addition, we propose a technique for fast evaluation of arbitrarily complex functions, specially designed for GPU implementation. First, we explore the application of depth-image–based rendering techniques to the unusual configuration of two convergent, wide baseline cameras, in contrast to the usual configuration used in 3D TV, which are narrow baseline, parallel cameras. By using a backward mapping approach with a depth inpainting scheme based on median filters, we show that these techniques are adequate for free viewpoint video applications. In addition, we show that referring depth information to a global reference system is ill-advised and should be avoided. Then, we propose a background subtraction system based on kernel density estimation techniques. These techniques are very adequate for modelling complex scenes featuring multimodal backgrounds, but have not been so popular due to their huge computational and memory complexity. The proposed system, implemented in real time on a GPU, features novel proposals for dynamic kernel bandwidth estimation for the background model, selective update of the background model, update of the position of reference samples of the foreground model using a multi-region particle filter, and automatic selection of regions of interest to reduce computational cost. The results, evaluated on several databases and compared to other state-of-the-art algorithms, demonstrate the high quality and versatility of our proposal. Finally, we propose a general method for the approximation of arbitrarily complex functions using continuous piecewise linear functions, specially formulated for GPU implementation by leveraging their texture filtering units, normally unused for numerical computation. Our proposal features a rigorous mathematical analysis of the approximation error in function of the number of samples, as well as a method to obtain a suboptimal partition of the domain of the function to minimize approximation error.