984 results for kernel density estimation


Relevance:

80.00%

Publisher:

Abstract:

We introduce simple nonparametric density estimators that generalize the classical histogram and frequency polygon. The new estimators are expressed as a linear combination of density functions that are piecewise polynomials, where the coefficients are optimally chosen to minimize the integrated square error of the estimator. We establish the asymptotic behaviour of the proposed estimators and study their performance in a simulation study.
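
The optimal-coefficient construction above requires the paper's derivations, but the baseline it generalizes is easy to illustrate. Below is a minimal Python sketch (with made-up bin and grid sizes) of the classical frequency polygon, i.e. linear interpolation of histogram heights; the proposed estimators replace these piecewise-linear pieces with higher-order piecewise polynomials.

    import numpy as np

    def frequency_polygon(sample, n_bins=20, grid_size=200):
        """Classical frequency polygon: linear interpolation of histogram
        bin heights anchored at bin centres (the baseline that the abstract
        generalizes with higher-order piecewise polynomials)."""
        heights, edges = np.histogram(sample, bins=n_bins, density=True)
        centres = 0.5 * (edges[:-1] + edges[1:])
        grid = np.linspace(edges[0], edges[-1], grid_size)
        # np.interp performs piecewise-linear interpolation, 0 outside the range
        return grid, np.interp(grid, centres, heights, left=0.0, right=0.0)

    # toy usage
    rng = np.random.default_rng(0)
    x, fhat = frequency_polygon(rng.normal(size=500))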

Relevance:

80.00%

Publisher:

Abstract:

For the standard kernel density estimate, it is known that one can tune the bandwidth such that the expected L1 error is within a constant factor of the optimal L1 error (obtained when one is allowed to choose the bandwidth with knowledge of the density). In this paper, we pose the same problem for variable bandwidth kernel estimates, where the bandwidths are allowed to depend upon the location. We show in particular that for positive kernels on the real line, for any data-based bandwidth, there exists a density for which the ratio of expected L1 error over optimal L1 error tends to infinity. Thus, the problem of tuning the variable bandwidth in an optimal manner is "too hard". Moreover, from the class of counterexamples exhibited in the paper, it appears that placing conditions on the densities (monotonicity, convexity, smoothness) does not help.
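
As a sketch of the object studied here, the following Python snippet evaluates a sample-point variable-bandwidth kernel estimate on the real line. The k-nearest-neighbour bandwidth rule is only an illustrative data-based choice, not one endorsed by the paper (whose point is precisely that no such rule can be uniformly good).

    import numpy as np

    def variable_bandwidth_kde(sample, grid, k=20):
        """Sample-point KDE: each observation X_i gets its own bandwidth h_i,
        here the distance to its k-th nearest neighbour (an arbitrary choice
        used only to illustrate location-dependent bandwidths)."""
        sample = np.asarray(sample, dtype=float)
        dists = np.abs(sample[:, None] - sample[None, :])
        h = np.sort(dists, axis=1)[:, k]            # k-th NN distance per point
        h = np.maximum(h, 1e-12)
        u = (grid[None, :] - sample[:, None]) / h[:, None]
        kernels = np.exp(-0.5 * u**2) / (np.sqrt(2 * np.pi) * h[:, None])
        return kernels.mean(axis=0)                 # average of n rescaled kernels

    rng = np.random.default_rng(1)
    x = rng.standard_t(df=3, size=400)
    grid = np.linspace(-10, 10, 512)
    fhat = variable_bandwidth_kde(x, grid)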

Relevance:

80.00%

Publisher:

Abstract:

A tool for user choice of the local bandwidth function for a kernel density estimate is developed using KDE, a graphical object-oriented package for interactive kernel density estimation written in LISP-STAT. The bandwidth function is a cubic spline, whose knots are manipulated by the user in one window, while the resulting estimate appears in another window. A real data illustration of this method raises concerns, because an extremely large family of estimates is available.
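
The package itself is interactive and written in LISP-STAT; as a non-interactive illustration of the same idea, the Python sketch below (with placeholder knot values standing in for the user's manipulation) evaluates a kernel estimate whose local bandwidth is a cubic spline through a few knots.

    import numpy as np
    from scipy.interpolate import CubicSpline

    def spline_bandwidth_kde(sample, grid, knots_x, knots_h):
        """Balloon-type estimate: the bandwidth used at evaluation point x is
        h(x), a cubic spline through user-chosen knots (placeholder values)."""
        h = np.maximum(CubicSpline(knots_x, knots_h)(grid), 1e-3)
        u = (grid[None, :] - np.asarray(sample)[:, None]) / h[None, :]
        kernels = np.exp(-0.5 * u**2) / (np.sqrt(2 * np.pi) * h[None, :])
        return kernels.mean(axis=0)

    rng = np.random.default_rng(2)
    x = np.concatenate([rng.normal(-2, 0.3, 300), rng.normal(3, 1.5, 300)])
    grid = np.linspace(-5, 8, 400)
    fhat = spline_bandwidth_kde(x, grid, knots_x=[-5, -2, 0, 3, 8],
                                knots_h=[0.6, 0.15, 0.5, 0.8, 1.0])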

Relevance:

80.00%

Publisher:

Abstract:

We develop a general error analysis framework for the Monte Carlo simulation of densities for functionals in Wiener space. We also study variance reduction methods with the help of Malliavin derivatives. For this, we give some general heuristic principles which are applied to diffusion processes. A comparison with kernel density estimates is made.
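
The Malliavin-calculus estimators require the paper's machinery, but the kernel-based alternative mentioned at the end is straightforward to sketch: simulate the diffusion with an Euler scheme and apply a kernel density estimate to the terminal values. The coefficients, bandwidth and sample sizes below are arbitrary illustrative choices.

    import numpy as np

    def euler_paths(x0, drift, sigma, T=1.0, n_steps=200, n_paths=20000, seed=3):
        """Simulate X_T for dX = drift(X) dt + sigma(X) dW by Euler-Maruyama."""
        rng = np.random.default_rng(seed)
        dt = T / n_steps
        x = np.full(n_paths, float(x0))
        for _ in range(n_steps):
            dw = rng.normal(scale=np.sqrt(dt), size=n_paths)
            x = x + drift(x) * dt + sigma(x) * dw
        return x

    def gaussian_kde(sample, grid, h):
        """Plain fixed-bandwidth Gaussian kernel density estimate."""
        u = (grid[None, :] - sample[:, None]) / h
        return np.exp(-0.5 * u**2).mean(axis=0) / (np.sqrt(2 * np.pi) * h)

    # density of X_T for a toy mean-reverting diffusion, estimated by KDE
    xT = euler_paths(x0=0.0, drift=lambda x: -x, sigma=lambda x: 0.4 + 0.1 * np.abs(x))
    grid = np.linspace(-2, 2, 300)
    fhat = gaussian_kde(xT, grid, h=0.05)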

Relevance:

80.00%

Publisher:

Abstract:

Let a class $\mathcal{F}$ of densities be given. We draw an i.i.d. sample from a density $f$ which may or may not be in $\mathcal{F}$. After every $n$, one must make a guess whether $f \in \mathcal{F}$ or not. A class is almost surely testable if there exists a testing sequence such that, for any $f$, we make finitely many errors almost surely. In this paper, several results are given that allow one to decide whether a class is almost surely testable. For example, continuity and square integrability are not testable, but unimodality, log-concavity, and boundedness by a given constant are.

Relevance:

80.00%

Publisher:

Abstract:

We propose a new family of density functions that possess both flexibility and closed form expressions for moments and anti-derivatives, making them particularly appealing for applications. We illustrate its usefulness by applying our new family to obtain density forecasts of U.S. inflation. Our methods generate forecasts that improve on standard methods based on AR-ARCH models relying on normal or Student's t-distributional assumptions.

Relevance:

80.00%

Publisher:

Abstract:

BACKGROUND: Cytoskeletal changes after long-term exposure to ethanol have been described in a number of cell types in adult rats and humans. These changes can play a key part in the impairment of nutrient assimilation and postnatal growth retardation after prenatal damage to the intestinal epithelium produced by ethanol intake. AIMS: To determine, in the newborn rat, which cytoskeletal proteins are affected by long-term ethanol exposure in utero, and to what extent. ANIMALS: The offspring of two experimental groups of female Wistar rats: an ethanol-treated group receiving up to 25% (w/v) ethanol in the drinking fluid, and a control group receiving water as drinking fluid. METHODS: Single and double electron microscopy immunolocalisation and label density estimation of cytoskeletal proteins on sections of proximal small intestine incubated with monoclonal antibodies against actin, alpha-tubulin and cytokeratin (polypeptides 1, 5, 6, 7, 8, 10, 11, and 18), and with a polyclonal antibody against beta-1,4-galactosyl transferase as a trans-Golgi (TG) or trans-Golgi network (TGN) marker, or both. SDS-PAGE was also performed on cytoskeleton-enriched fractions from small intestine, and western blotting analysis was carried out by incubation with the same antibodies used for immunolocalisation. RESULTS: The intestinal epithelium of newborn rats from the ethanol-treated group showed overexpression of cytoskeletal polypeptides ranging from 39 to 54 kDa, affecting actin and some cytokeratins but not tubulin. Furthermore, a cytokeratin-related polypeptide of 28-29 kDa was identified, together with an increase in free ubiquitin, in the same group. Notably, actin and cytokeratin were abnormally located in the TG or the TGN, or both. CONCLUSIONS: Long-term exposure to ethanol in utero causes severe dysfunction in the cytoskeleton of the developing intestinal epithelium. Actin and cytokeratins, which are involved in cytoskeleton anchoring to the plasma membrane and in cell adhesion, are particularly affected, showing overexpression, impaired proteolysis and mislocalisation.

Relevance:

80.00%

Publisher:

Abstract:

In this paper we propose an innovative methodology for automated profiling of illicit tablets by their surface granularity, a feature previously unexamined for this purpose. We make use of the tiny inconsistencies at the tablet surface, referred to as speckles, to generate a quantitative granularity profile of tablets. Euclidean distance is used as a measurement of (dis)similarity between granularity profiles. The frequency of observed distances is then modelled by kernel density estimation in order to generalize the observations and to calculate likelihood ratios (LRs). The resulting LRs are used to evaluate the potential of granularity profiles to differentiate between same-batch and different-batch tablets. Furthermore, we use the LRs as a similarity metric to refine database queries. We are able to derive reliable LRs within a scope that represents the true evidential value of the granularity feature. These metrics are used to refine candidate hit-lists from a database containing physical features of illicit tablets. We observe improved or identical ranking of candidate tablets in 87.5% of cases when granularity is considered.
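
The evidential computation described above boils down to evaluating the density of an observed profile distance under the same-batch and different-batch hypotheses and taking their ratio. A minimal Python sketch follows; the reference distances and bandwidth are simulated placeholders, not the paper's data.

    import numpy as np

    def kde_pdf(sample, x, h):
        """One-dimensional Gaussian kernel density estimate evaluated at x."""
        u = (np.atleast_1d(x)[:, None] - np.asarray(sample)[None, :]) / h
        return np.exp(-0.5 * u**2).mean(axis=1) / (np.sqrt(2 * np.pi) * h)

    def likelihood_ratio(distance, same_batch_dists, diff_batch_dists, h=0.05):
        """LR = f(distance | same batch) / f(distance | different batches),
        both densities estimated by KDE from reference distances."""
        num = kde_pdf(same_batch_dists, distance, h)
        den = kde_pdf(diff_batch_dists, distance, h)
        return num / np.maximum(den, 1e-300)

    # placeholder reference distances between granularity profiles
    rng = np.random.default_rng(4)
    same = np.abs(rng.normal(0.2, 0.05, 500))     # hypothetical same-batch distances
    diff = np.abs(rng.normal(0.6, 0.15, 500))     # hypothetical different-batch distances
    print(likelihood_ratio(0.25, same, diff))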

Relevance:

80.00%

Publisher:

Abstract:

Radioactive soil-contamination mapping and risk assessment is a vital issue for decision makers. Traditional approaches for mapping the spatial concentration of radionuclides employ various regression-based models, which usually provide a single-value prediction realization accompanied (in some cases) by estimation error. Such approaches do not provide the capability for rigorous uncertainty quantification or probabilistic mapping. Machine learning is a recent and fast-developing approach based on learning patterns and information from data. Artificial neural networks for prediction mapping have been especially powerful in combination with spatial statistics. A data-driven approach provides the opportunity to integrate additional relevant information about spatial phenomena into a prediction model for more accurate spatial estimates and associated uncertainty. Machine-learning algorithms can also be used for a wider spectrum of problems than before: classification, probability density estimation, and so forth. Stochastic simulations are used to model spatial variability and uncertainty. Unlike regression models, they provide multiple realizations of a particular spatial pattern that allow uncertainty and risk quantification. This paper reviews the most recent methods of spatial data analysis, prediction, and risk mapping, based on machine learning and stochastic simulations in comparison with more traditional regression models. The radioactive fallout from the Chernobyl Nuclear Power Plant accident is used to illustrate the application of the models for prediction and classification problems. This fallout is a unique case study that provides the challenging task of analyzing huge amounts of data ('hard' direct measurements, as well as supplementary information and expert estimates) and solving particular decision-oriented problems.

Relevance:

80.00%

Publisher:

Abstract:

Machine Learning for geospatial data: algorithms, software tools and case studies

The thesis is devoted to the analysis, modelling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense, machine learning can be considered a subfield of artificial intelligence; it is mainly concerned with the development of techniques and algorithms that allow computers to learn from data. In this thesis, machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning? In short, most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modelling tools. They can find solutions for classification, regression and probability density modelling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well suited to be implemented as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modelling and prediction as well as automatic data mapping. Their efficiency is competitive with geostatistical models in low-dimensional geographical spaces, but they are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models of interest for geo- and environmental sciences are presented in detail, from a theoretical description of the concepts to the software implementation. The main algorithms and models considered are the multilayer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis function networks and mixture density networks. This set of models covers machine learning tasks such as classification, regression and density estimation.

Exploratory data analysis (EDA) is an initial and very important part of data analysis. In this thesis the concepts of exploratory spatial data analysis (ESDA) are considered using both the traditional geostatistical approach, experimental variography, and machine learning. Experimental variography is a basic tool for geostatistical analysis of anisotropic spatial correlations which helps to detect the presence of spatial patterns, at least those describable by two-point statistics. A machine learning approach to ESDA is presented by applying the k-nearest neighbours (k-NN) method, which is simple and has very good interpretation and visualisation properties. An important part of the thesis deals with a topical problem, the automatic mapping of geospatial data. General regression neural networks (GRNN) are proposed as an efficient model for this task. The performance of the GRNN model is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, where it significantly outperformed all other approaches, especially under emergency conditions. The thesis consists of four chapters: theory, applications, software tools and how-to-do-it examples. An important part of the work is a collection of software tools, Machine Learning Office, developed over the last 15 years and used both in many teaching courses, including international workshops in China, France, Italy, Ireland and Switzerland, and in fundamental and applied research projects. The case studies considered cover a wide spectrum of real-life low- and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, classification of soil types and hydrogeological units, decision-oriented mapping with uncertainties, and natural hazard (landslide, avalanche) assessment and susceptibility mapping. Complementary tools for exploratory data analysis and visualisation were developed as well; the software is user friendly and easy to use.
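
The GRNN highlighted above is, in essence, Nadaraya-Watson kernel regression with a single smoothing parameter. A minimal Python sketch of such a predictor for two-dimensional coordinates follows; the sigma value and the toy data are arbitrary assumptions, and in practice sigma would be tuned, e.g. by cross-validation.

    import numpy as np

    def grnn_predict(train_xy, train_z, query_xy, sigma=0.1):
        """General Regression Neural Network / Nadaraya-Watson estimate:
        the prediction at a query point is the kernel-weighted average of
        the training values, with a single smoothing parameter sigma."""
        d2 = ((query_xy[:, None, :] - train_xy[None, :, :]) ** 2).sum(axis=-1)
        w = np.exp(-d2 / (2.0 * sigma**2))
        return (w * train_z[None, :]).sum(axis=1) / np.maximum(w.sum(axis=1), 1e-300)

    # toy spatial data: coordinates in [0, 1]^2 with a smooth signal plus noise
    rng = np.random.default_rng(5)
    xy = rng.uniform(size=(300, 2))
    z = np.sin(3 * xy[:, 0]) + np.cos(2 * xy[:, 1]) + rng.normal(0, 0.1, 300)
    query = rng.uniform(size=(50, 2))
    zhat = grnn_predict(xy, z, query, sigma=0.15)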

Relevance:

80.00%

Publisher:

Abstract:

The estimation of mean aphid density in alfalfa based on field counts is compared with presence-absence sampling. Twenty-one random samples of 75 stems each were taken from commercial alfalfa fields in Lleida (Ebro valley) with the aim of predicting the estimate of mean aphid density (û) from the estimate of the proportion of infested stems (p). The empirical relationship between û and its sampling variance, modelled with Taylor's power law, is satisfactory (r² = 0.98). The empirical relationship between p and its sampling variance is practically binomial. Finally, the empirical relationship between û and p, obtained from the linear regression between ln(û) and ln(−ln p), was satisfactory (r² = 0.94). Presence-absence sampling makes it possible to estimate mean aphid densities of up to about 20 aphids per stem.
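
The regression between ln(û) and ln(−ln p) implies a back-transformation of the form û = exp(a)·(−ln p)^b. The Python sketch below fits and applies such a relationship to simulated sample pairs; the coefficients obtained from the study's real field counts are not reproduced here.

    import numpy as np

    def fit_presence_absence(mean_density, prop_infested):
        """Fit ln(mean) = a + b * ln(-ln p) by least squares, as in the abstract."""
        x = np.log(-np.log(prop_infested))
        y = np.log(mean_density)
        b, a = np.polyfit(x, y, 1)   # polyfit returns slope first, then intercept
        return a, b

    def predict_density(p, a, b):
        """Back-transform: mean density estimated from the proportion of infested stems."""
        return np.exp(a) * (-np.log(p)) ** b

    # simulated sample-level pairs (û, p); real values would come from field counts
    rng = np.random.default_rng(6)
    u_hat = rng.uniform(0.5, 20, 21)                     # mean aphids per stem
    p = 1 - np.exp(-u_hat * rng.uniform(0.6, 0.9, 21))   # proportion of infested stems
    a, b = fit_presence_absence(u_hat, p)
    print(predict_density(0.8, a, b))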

Relevance:

80.00%

Publisher:

Abstract:

In this paper a colour texture segmentation method which unifies region and boundary information is proposed. The algorithm uses a coarse detection of the perceptual (colour and texture) edges of the image to adequately place and initialise a set of active regions. The colour texture of regions is modelled by combining non-parametric kernel density estimation (which allows the colour behaviour to be estimated) with classical co-occurrence matrix based texture features. Region information is thus defined, and accurate boundary information can be extracted to guide the segmentation process. Regions concurrently compete for the image pixels in order to segment the whole image, taking both information sources into account. Experimental results are presented which demonstrate the performance of the proposed method.
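
The texture half of the region model rests on co-occurrence matrix features. As a rough Python illustration (the offset, number of grey levels and the two Haralick-style statistics are arbitrary choices, not the paper's exact descriptors), a grey-level co-occurrence matrix and two classical features can be computed as follows.

    import numpy as np

    def cooccurrence_features(gray, levels=16, dx=1, dy=0):
        """Grey-level co-occurrence matrix for one pixel offset, plus two classical
        texture features (contrast and energy)."""
        q = np.clip((gray * levels).astype(int), 0, levels - 1)  # quantise a [0,1] image
        a = q[:, :-dx] if dx else q
        b = q[:, dx:] if dx else q
        if dy:
            a, b = a[:-dy, :], b[dy:, :]
        glcm = np.zeros((levels, levels))
        np.add.at(glcm, (a.ravel(), b.ravel()), 1)                # count co-occurring pairs
        glcm /= glcm.sum()
        i, j = np.indices(glcm.shape)
        contrast = ((i - j) ** 2 * glcm).sum()
        energy = (glcm ** 2).sum()
        return glcm, contrast, energy

    rng = np.random.default_rng(7)
    img = rng.uniform(size=(64, 64))          # stand-in for one image region
    _, contrast, energy = cooccurrence_features(img)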

Relevance:

80.00%

Publisher:

Abstract:

Tropical forests are sources of many ecosystem services, but these forests are vanishing rapidly. The situation is severe in Sub-Saharan Africa and especially in Tanzania. The causes of change are multidimensional and strongly interdependent, and only understanding them comprehensively helps to change the ongoing unsustainable trends of forest decline. Ongoing forest changes, their spatiality and their connection to humans and the environment can be studied with the methods of Land Change Science. The knowledge produced with these methods helps to make arguments about the actors, actions and causes behind the forest decline. In this study of Unguja Island in Zanzibar the focus is on the current forest cover and its changes between 1996 and 2009. The cover and changes are measured with frequently used remote sensing methods: automated land cover classification and post-classification comparison of medium-resolution satellite images. Kernel Density Estimation is used to determine the clusters of change, sub-area analysis provides information about the differences between regions, and distance and regression analyses connect changes to environmental factors. These analyses not only explain the changes that have happened but also allow quantitative and spatial future scenarios to be built. No similar study has been made for Unguja, so it provides new information that is beneficial for the whole society. The results show that 572 km² of Unguja is still forested, but 0.82–1.19% of these forests are disappearing annually. Besides deforestation, vertical degradation and spatial changes are also significant problems. Deforestation is most severe in the communal indigenous forests, but agroforests are also decreasing. Spatially, deforestation concentrates in areas close to the coastline, population and Zanzibar Town. Biophysical factors, on the other hand, do not seem to influence the ongoing deforestation process. If the current trend continues, there should be approximately 485 km² of forest remaining in 2025. Solutions to these deforestation problems should be sought in sustainable land use management, surveying and protection of the forests in risk areas, and spatially targeted self-sustaining tree planting schemes.
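
The 2025 projection follows from compounding an annual loss rate over the 16 years after 2009; a quick arithmetic check against the reported bounds (Python):

    # Projecting the remaining forest area on Unguja from the reported figures:
    # 572 km² in 2009, annual loss between 0.82% and 1.19%, 16 years to 2025.
    area_2009 = 572.0
    for rate in (0.0082, 0.0119):
        print(rate, round(area_2009 * (1 - rate) ** 16, 1))
    # prints roughly 501 km² and 472 km²; the abstract's ~485 km² sits between them.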

Relevance:

80.00%

Publisher:

Abstract:

Identification of low-dimensional structures and main sources of variation from multivariate data are fundamental tasks in data analysis. Many methods aimed at these tasks involve solution of an optimization problem. Thus, the objective of this thesis is to develop computationally efficient and theoretically justified methods for solving such problems. Most of the thesis is based on a statistical model, where ridges of the density estimated from the data are considered as relevant features. Finding ridges, which are generalized maxima, necessitates development of advanced optimization methods. An efficient and convergent trust region Newton method for projecting a point onto a ridge of the underlying density is developed for this purpose. The method is utilized in a differential equation-based approach for tracing ridges and computing projection coordinates along them. The density estimation is done nonparametrically by using Gaussian kernels. This allows application of ridge-based methods with only mild assumptions on the underlying structure of the data. The statistical model and the ridge finding methods are adapted to two different applications. The first one is extraction of curvilinear structures from noisy data mixed with background clutter. The second one is a novel nonlinear generalization of principal component analysis (PCA) and its extension to time series data. The methods have a wide range of potential applications, where most of the earlier approaches are inadequate. Examples include identification of faults from seismic data and identification of filaments from cosmological data. Applicability of the nonlinear PCA to climate analysis and reconstruction of periodic patterns from noisy time series data are also demonstrated. Other contributions of the thesis include development of an efficient semidefinite optimization method for embedding graphs into the Euclidean space. The method produces structure-preserving embeddings that maximize interpoint distances. It is primarily developed for dimensionality reduction, but also has potential applications in graph theory and various areas of physics, chemistry and engineering. Asymptotic behaviour of ridges and maxima of Gaussian kernel densities is also investigated when the kernel bandwidth approaches infinity. The results are applied to the nonlinear PCA and to finding significant maxima of such densities, which is a typical problem in visual object tracking.
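
The thesis projects points onto density ridges with a trust-region Newton method; a simpler, related iteration that is easy to sketch is the subspace-constrained mean shift, which also converges to ridges of a Gaussian kernel density estimate. The bandwidth and the toy data in the Python sketch below are arbitrary assumptions.

    import numpy as np

    def scms_project(point, data, h=0.3, ridge_dim=1, n_iter=200, tol=1e-7):
        """Project a point onto a density ridge of a Gaussian KDE using the
        subspace-constrained mean-shift iteration (a simpler relative of the
        trust-region Newton projection developed in the thesis)."""
        x = np.array(point, dtype=float)
        D = data.shape[1]
        for _ in range(n_iter):
            diff = data - x                                   # (n, D)
            w = np.exp(-0.5 * (diff ** 2).sum(axis=1) / h**2)
            mean_shift = (w[:, None] * data).sum(axis=0) / w.sum() - x
            # Hessian of the KDE, up to a positive factor that does not affect eigenvectors
            H = (w[:, None, None] * (diff[:, :, None] * diff[:, None, :] / h**2
                 - np.eye(D)[None])).sum(axis=0)
            eigval, eigvec = np.linalg.eigh(H)
            V = eigvec[:, :D - ridge_dim]                     # directions of smallest eigenvalues
            step = V @ (V.T @ mean_shift)                     # mean-shift step constrained to span(V)
            x = x + step
            if np.linalg.norm(step) < tol:
                break
        return x

    # toy example: noisy points around a circle; ridge points trace the circle
    rng = np.random.default_rng(8)
    theta = rng.uniform(0, 2 * np.pi, 800)
    pts = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(0, 0.1, (800, 2))
    print(scms_project([1.2, 0.1], pts, h=0.25))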