Biblioteca Digital

938 resultados para Spatial Mixture Models

On modifications to the long-term survival mixture model in the presence of competing risks

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A mixture model for long-term survivors has been adopted in various fields such as biostatistics and criminology where some individuals may never experience the type of failure under study. It is directly applicable in situations where the only information available from follow-up on individuals who will never experience this type of failure is in the form of censored observations. In this paper, we consider a modification to the model so that it still applies in the case where during the follow-up period it becomes known that an individual will never experience failure from the cause of interest. Unless a model allows for this additional information, a consistent survival analysis will not be obtained. A partial maximum likelihood (ML) approach is proposed that preserves the simplicity of the long-term survival mixture model and provides consistent estimators of the quantities of interest. Some simulation experiments are performed to assess the efficiency of the partial ML approach relative to the full ML approach for survival in the presence of competing risks.

Robust Mixture Modelling Using the t Distribution

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.

Fitting a mixture model to three-mode three-way data with missing information

Relevância:

90.00% 90.00%

Publicador:

Resumo:

When the data consist of certain attributes measured on the same set of items in different situations, they would be described as a three-mode three-way array. A mixture likelihood approach can be implemented to cluster the items (i.e., one of the modes) on the basis of both of the other modes simultaneously (i.e,, the attributes measured in different situations). In this paper, it is shown that this approach can be extended to handle three-mode three-way arrays where some of the data values are missing at random in the sense of Little and Rubin (1987). The methodology is illustrated by clustering the genotypes in a three-way soybean data set where various attributes were measured on genotypes grown in several environments.

An EM-based Semi-Parametric Mixture Model Approach to the Regression Analysis of Competing-Risks Data

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We consider a mixture model approach to the regression analysis of competing-risks data. Attention is focused on inference concerning the effects of factors on both the probability of occurrence and the hazard rate conditional on each of the failure types. These two quantities are specified in the mixture model using the logistic model and the proportional hazards model, respectively. We propose a semi-parametric mixture method to estimate the logistic and regression coefficients jointly, whereby the component-baseline hazard functions are completely unspecified. Estimation is based on maximum likelihood on the basis of the full likelihood, implemented via an expectation-conditional maximization (ECM) algorithm. Simulation studies are performed to compare the performance of the proposed semi-parametric method with a fully parametric mixture approach. The results show that when the component-baseline hazard is monotonic increasing, the semi-parametric and fully parametric mixture approaches are comparable for mildly and moderately censored samples. When the component-baseline hazard is not monotonic increasing, the semi-parametric method consistently provides less biased estimates than a fully parametric approach and is comparable in efficiency in the estimation of the parameters for all levels of censoring. The methods are illustrated using a real data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol. Copyright (C) 2003 John Wiley Sons, Ltd.

Role of malnutrition and parasite infections in the spatial variation in children’s anaemia risk in Northern Angola

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Anaemia has a significant impact on child development and mortality and is a severe public health problem in most countries in sub-Saharan Africa. Nutritional and infectious causes of anaemia are geographically variable and anaemia maps based on information on the major aetiologies of anaemia are important for identifying communities most in need and the relative contribution of major causes. We investigated the consistency between ecological and individual-level approaches to anaemia mapping, by building spatial anaemia models for children aged ≤15 years using different modeling approaches. We aimed to a) quantify the role of malnutrition, malaria, Schistosoma haematobium and soil-transmitted helminths (STH) for anaemia endemicity in children aged ≤15 years and b) develop a high resolution predictive risk map of anaemia for the municipality of Dande in Northern Angola. We used parasitological survey data on children aged ≤15 years to build Bayesian geostatistical models of malaria (PfPR≤15), S. haematobium, Ascaris lumbricoides and Trichuris trichiura and predict small-scale spatial variation in these infections. The predictions and their associated uncertainty were used as inputs for a model of anemia prevalence to predict small-scale spatial variation of anaemia. Stunting, PfPR≤15, and S. haematobium infections were significantly associated with anaemia risk. An estimated 12.5%, 15.6%, and 9.8%, of anaemia cases could be averted by treating malnutrition, malaria, S. haematobium, respectively. Spatial clusters of high risk of anaemia (>86%) were identified. Using an individual-level approach to anaemia mapping at a small spatial scale, we found that anaemia in children aged ≤15 years is highly heterogeneous and that malnutrition and parasitic infections are important contributors to the spatial variation in anemia risk. The results presented in this study can help inform the integration of the current provincial malaria control program with ancillary micronutrient supplementation and control of neglected tropical diseases, such as urogenital schistosomiasis and STH infection.

Role of malnutrition and parasite infections in the spatial variation in children’s anaemia risk in northern Angola

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Anaemia is known to have an impact on child development and mortality and is a severe public health problem in most countries in sub-Saharan Africa. We investigated the consistency between ecological and individual-level approaches to anaemia mapping by building spatial anaemia models for children aged ≤15 years using different modelling approaches. We aimed to (i) quantify the role of malnutrition, malaria, Schistosoma haematobium and soil-transmitted helminths (STHs) in anaemia endemicity; and (ii) develop a high resolution predictive risk map of anaemia for the municipality of Dande in northern Angola. We used parasitological survey data for children aged ≤15 years to build Bayesian geostatistical models of malaria (PfPR≤15), S. haematobium, Ascaris lumbricoides and Trichuris trichiura and predict small-scale spatial variations in these infections. Malnutrition, PfPR≤15, and S. haematobium infections were significantly associated with anaemia risk. An estimated 12.5%, 15.6% and 9.8% of anaemia cases could be averted by treating malnutrition, malaria and S. haematobium, respectively. Spatial clusters of high risk of anaemia (>86%) were identified. Using an individual-level approach to anaemia mapping at a small spatial scale, we found that anaemia in children aged ≤15 years is highly heterogeneous and that malnutrition and parasitic infections are important contributors to the spatial variation in anaemia risk. The results presented in this study can help inform the integration of the current provincial malaria control programme with ancillary micronutrient supplementation and control of neglected tropical diseases such as urogenital schistosomiasis and STH infections.

Time Varying Dimension Models

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Time varying parameter (TVP) models have enjoyed an increasing popularity in empirical macroeconomics. However, TVP models are parameter-rich and risk over-fitting unless the dimension of the model is small. Motivated by this worry, this paper proposes several Time Varying dimension (TVD) models where the dimension of the model can change over time, allowing for the model to automatically choose a more parsimonious TVP representation, or to switch between different parsimonious representations. Our TVD models all fall in the category of dynamic mixture models. We discuss the properties of these models and present methods for Bayesian inference. An application involving US inflation forecasting illustrates and compares the different TVD models. We find our TVD approaches exhibit better forecasting performance than several standard benchmarks and shrink towards parsimonious specifications.

Reliability of methods used for the risk assessment of mixture of micropollutants in aquatic enviromments : from theoretical concepts to field observations

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Ces dernières années, de nombreuses recherches ont mis en évidence les effets toxiques des micropolluants organiques pour les espèces de nos lacs et rivières. Cependant, la plupart de ces études se sont focalisées sur la toxicité des substances individuelles, alors que les organismes sont exposés tous les jours à des milliers de substances en mélange. Or les effets de ces cocktails ne sont pas négligeables. Cette thèse de doctorat s'est ainsi intéressée aux modèles permettant de prédire le risque environnemental de ces cocktails pour le milieu aquatique. Le principal objectif a été d'évaluer le risque écologique des mélanges de substances chimiques mesurées dans le Léman, mais aussi d'apporter un regard critique sur les méthodologies utilisées afin de proposer certaines adaptations pour une meilleure estimation du risque. Dans la première partie de ce travail, le risque des mélanges de pesticides et médicaments pour le Rhône et pour le Léman a été établi en utilisant des approches envisagées notamment dans la législation européenne. Il s'agit d'approches de « screening », c'est-à-dire permettant une évaluation générale du risque des mélanges. Une telle approche permet de mettre en évidence les substances les plus problématiques, c'est-à-dire contribuant le plus à la toxicité du mélange. Dans notre cas, il s'agit essentiellement de 4 pesticides. L'étude met également en évidence que toutes les substances, même en trace infime, contribuent à l'effet du mélange. Cette constatation a des implications en terme de gestion de l'environnement. En effet, ceci implique qu'il faut réduire toutes les sources de polluants, et pas seulement les plus problématiques. Mais l'approche proposée présente également un biais important au niveau conceptuel, ce qui rend son utilisation discutable, en dehors d'un screening, et nécessiterait une adaptation au niveau des facteurs de sécurité employés. Dans une deuxième partie, l'étude s'est portée sur l'utilisation des modèles de mélanges dans le calcul de risque environnemental. En effet, les modèles de mélanges ont été développés et validés espèce par espèce, et non pour une évaluation sur l'écosystème en entier. Leur utilisation devrait donc passer par un calcul par espèce, ce qui est rarement fait dû au manque de données écotoxicologiques à disposition. Le but a été donc de comparer, avec des valeurs générées aléatoirement, le calcul de risque effectué selon une méthode rigoureuse, espèce par espèce, avec celui effectué classiquement où les modèles sont appliqués sur l'ensemble de la communauté sans tenir compte des variations inter-espèces. Les résultats sont dans la majorité des cas similaires, ce qui valide l'approche utilisée traditionnellement. En revanche, ce travail a permis de déterminer certains cas où l'application classique peut conduire à une sous- ou sur-estimation du risque. Enfin, une dernière partie de cette thèse s'est intéressée à l'influence que les cocktails de micropolluants ont pu avoir sur les communautés in situ. Pour ce faire, une approche en deux temps a été adoptée. Tout d'abord la toxicité de quatorze herbicides détectés dans le Léman a été déterminée. Sur la période étudiée, de 2004 à 2009, cette toxicité due aux herbicides a diminué, passant de 4% d'espèces affectées à moins de 1%. Ensuite, la question était de savoir si cette diminution de toxicité avait un impact sur le développement de certaines espèces au sein de la communauté des algues. Pour ce faire, l'utilisation statistique a permis d'isoler d'autres facteurs pouvant avoir une influence sur la flore, comme la température de l'eau ou la présence de phosphates, et ainsi de constater quelles espèces se sont révélées avoir été influencées, positivement ou négativement, par la diminution de la toxicité dans le lac au fil du temps. Fait intéressant, une partie d'entre-elles avait déjà montré des comportements similaires dans des études en mésocosmes. En conclusion, ce travail montre qu'il existe des modèles robustes pour prédire le risque des mélanges de micropolluants sur les espèces aquatiques, et qu'ils peuvent être utilisés pour expliquer le rôle des substances dans le fonctionnement des écosystèmes. Toutefois, ces modèles ont bien sûr des limites et des hypothèses sous-jacentes qu'il est important de considérer lors de leur application. - Depuis plusieurs années, les risques que posent les micropolluants organiques pour le milieu aquatique préoccupent grandement les scientifiques ainsi que notre société. En effet, de nombreuses recherches ont mis en évidence les effets toxiques que peuvent avoir ces substances chimiques sur les espèces de nos lacs et rivières, quand elles se retrouvent exposées à des concentrations aiguës ou chroniques. Cependant, la plupart de ces études se sont focalisées sur la toxicité des substances individuelles, c'est à dire considérées séparément. Actuellement, il en est de même dans les procédures de régulation européennes, concernant la partie évaluation du risque pour l'environnement d'une substance. Or, les organismes sont exposés tous les jours à des milliers de substances en mélange, et les effets de ces "cocktails" ne sont pas négligeables. L'évaluation du risque écologique que pose ces mélanges de substances doit donc être abordé par de la manière la plus appropriée et la plus fiable possible. Dans la première partie de cette thèse, nous nous sommes intéressés aux méthodes actuellement envisagées à être intégrées dans les législations européennes pour l'évaluation du risque des mélanges pour le milieu aquatique. Ces méthodes sont basées sur le modèle d'addition des concentrations, avec l'utilisation des valeurs de concentrations des substances estimées sans effet dans le milieu (PNEC), ou à partir des valeurs des concentrations d'effet (CE50) sur certaines espèces d'un niveau trophique avec la prise en compte de facteurs de sécurité. Nous avons appliqué ces méthodes à deux cas spécifiques, le lac Léman et le Rhône situés en Suisse, et discuté les résultats de ces applications. Ces premières étapes d'évaluation ont montré que le risque des mélanges pour ces cas d'étude atteint rapidement une valeur au dessus d'un seuil critique. Cette valeur atteinte est généralement due à deux ou trois substances principales. Les procédures proposées permettent donc d'identifier les substances les plus problématiques pour lesquelles des mesures de gestion, telles que la réduction de leur entrée dans le milieu aquatique, devraient être envisagées. Cependant, nous avons également constaté que le niveau de risque associé à ces mélanges de substances n'est pas négligeable, même sans tenir compte de ces substances principales. En effet, l'accumulation des substances, même en traces infimes, atteint un seuil critique, ce qui devient plus difficile en terme de gestion du risque. En outre, nous avons souligné un manque de fiabilité dans ces procédures, qui peuvent conduire à des résultats contradictoires en terme de risque. Ceci est lié à l'incompatibilité des facteurs de sécurité utilisés dans les différentes méthodes. Dans la deuxième partie de la thèse, nous avons étudié la fiabilité de méthodes plus avancées dans la prédiction de l'effet des mélanges pour les communautés évoluant dans le système aquatique. Ces méthodes reposent sur le modèle d'addition des concentrations (CA) ou d'addition des réponses (RA) appliqués sur les courbes de distribution de la sensibilité des espèces (SSD) aux substances. En effet, les modèles de mélanges ont été développés et validés pour être appliqués espèce par espèce, et non pas sur plusieurs espèces agrégées simultanément dans les courbes SSD. Nous avons ainsi proposé une procédure plus rigoureuse, pour l'évaluation du risque d'un mélange, qui serait d'appliquer d'abord les modèles CA ou RA à chaque espèce séparément, et, dans une deuxième étape, combiner les résultats afin d'établir une courbe SSD du mélange. Malheureusement, cette méthode n'est pas applicable dans la plupart des cas, car elle nécessite trop de données généralement indisponibles. Par conséquent, nous avons comparé, avec des valeurs générées aléatoirement, le calcul de risque effectué selon cette méthode plus rigoureuse, avec celle effectuée traditionnellement, afin de caractériser la robustesse de cette approche qui consiste à appliquer les modèles de mélange sur les courbes SSD. Nos résultats ont montré que l'utilisation de CA directement sur les SSDs peut conduire à une sous-estimation de la concentration du mélange affectant 5 % ou 50% des espèces, en particulier lorsque les substances présentent un grand écart- type dans leur distribution de la sensibilité des espèces. L'application du modèle RA peut quant à lui conduire à une sur- ou sous-estimations, principalement en fonction de la pente des courbes dose- réponse de chaque espèce composant les SSDs. La sous-estimation avec RA devient potentiellement importante lorsque le rapport entre la EC50 et la EC10 de la courbe dose-réponse des espèces est plus petit que 100. Toutefois, la plupart des substances, selon des cas réels, présentent des données d' écotoxicité qui font que le risque du mélange calculé par la méthode des modèles appliqués directement sur les SSDs reste cohérent et surestimerait plutôt légèrement le risque. Ces résultats valident ainsi l'approche utilisée traditionnellement. Néanmoins, il faut garder à l'esprit cette source d'erreur lorsqu'on procède à une évaluation du risque d'un mélange avec cette méthode traditionnelle, en particulier quand les SSD présentent une distribution des données en dehors des limites déterminées dans cette étude. Enfin, dans la dernière partie de cette thèse, nous avons confronté des prédictions de l'effet de mélange avec des changements biologiques observés dans l'environnement. Dans cette étude, nous avons utilisé des données venant d'un suivi à long terme d'un grand lac européen, le lac Léman, ce qui offrait la possibilité d'évaluer dans quelle mesure la prédiction de la toxicité des mélanges d'herbicide expliquait les changements dans la composition de la communauté phytoplanctonique. Ceci à côté d'autres paramètres classiques de limnologie tels que les nutriments. Pour atteindre cet objectif, nous avons déterminé la toxicité des mélanges sur plusieurs années de 14 herbicides régulièrement détectés dans le lac, en utilisant les modèles CA et RA avec les courbes de distribution de la sensibilité des espèces. Un gradient temporel de toxicité décroissant a pu être constaté de 2004 à 2009. Une analyse de redondance et de redondance partielle, a montré que ce gradient explique une partie significative de la variation de la composition de la communauté phytoplanctonique, même après avoir enlevé l'effet de toutes les autres co-variables. De plus, certaines espèces révélées pour avoir été influencées, positivement ou négativement, par la diminution de la toxicité dans le lac au fil du temps, ont montré des comportements similaires dans des études en mésocosmes. On peut en conclure que la toxicité du mélange herbicide est l'un des paramètres clés pour expliquer les changements de phytoplancton dans le lac Léman. En conclusion, il existe diverses méthodes pour prédire le risque des mélanges de micropolluants sur les espèces aquatiques et celui-ci peut jouer un rôle dans le fonctionnement des écosystèmes. Toutefois, ces modèles ont bien sûr des limites et des hypothèses sous-jacentes qu'il est important de considérer lors de leur application, avant d'utiliser leurs résultats pour la gestion des risques environnementaux. - For several years now, the scientists as well as the society is concerned by the aquatic risk organic micropollutants may pose. Indeed, several researches have shown the toxic effects these substances may induce on organisms living in our lakes or rivers, especially when they are exposed to acute or chronic concentrations. However, most of the studies focused on the toxicity of single compounds, i.e. considered individually. The same also goes in the current European regulations concerning the risk assessment procedures for the environment of these substances. But aquatic organisms are typically exposed every day simultaneously to thousands of organic compounds. The toxic effects resulting of these "cocktails" cannot be neglected. The ecological risk assessment of mixtures of such compounds has therefore to be addressed by scientists in the most reliable and appropriate way. In the first part of this thesis, the procedures currently envisioned for the aquatic mixture risk assessment in European legislations are described. These methodologies are based on the mixture model of concentration addition and the use of the predicted no effect concentrations (PNEC) or effect concentrations (EC50) with assessment factors. These principal approaches were applied to two specific case studies, Lake Geneva and the River Rhône in Switzerland, including a discussion of the outcomes of such applications. These first level assessments showed that the mixture risks for these studied cases exceeded rapidly the critical value. This exceeding is generally due to two or three main substances. The proposed procedures allow therefore the identification of the most problematic substances for which management measures, such as a reduction of the entrance to the aquatic environment, should be envisioned. However, it was also showed that the risk levels associated with mixtures of compounds are not negligible, even without considering these main substances. Indeed, it is the sum of the substances that is problematic, which is more challenging in term of risk management. Moreover, a lack of reliability in the procedures was highlighted, which can lead to contradictory results in terms of risk. This result is linked to the inconsistency in the assessment factors applied in the different methods. In the second part of the thesis, the reliability of the more advanced procedures to predict the mixture effect to communities in the aquatic system were investigated. These established methodologies combine the model of concentration addition (CA) or response addition (RA) with species sensitivity distribution curves (SSD). Indeed, the mixture effect predictions were shown to be consistent only when the mixture models are applied on a single species, and not on several species simultaneously aggregated to SSDs. Hence, A more stringent procedure for mixture risk assessment is proposed, that would be to apply first the CA or RA models to each species separately and, in a second step, to combine the results to build an SSD for a mixture. Unfortunately, this methodology is not applicable in most cases, because it requires large data sets usually not available. Therefore, the differences between the two methodologies were studied with datasets created artificially to characterize the robustness of the traditional approach applying models on species sensitivity distribution. The results showed that the use of CA on SSD directly might lead to underestimations of the mixture concentration affecting 5% or 50% of species, especially when substances present a large standard deviation of the distribution from the sensitivity of the species. The application of RA can lead to over- or underestimates, depending mainly on the slope of the dose-response curves of the individual species. The potential underestimation with RA becomes important when the ratio between the EC50 and the EC10 for the dose-response curve of the species composing the SSD are smaller than 100. However, considering common real cases of ecotoxicity data for substances, the mixture risk calculated by the methodology applying mixture models directly on SSDs remains consistent and would rather slightly overestimate the risk. These results can be used as a theoretical validation of the currently applied methodology. Nevertheless, when assessing the risk of mixtures, one has to keep in mind this source of error with this classical methodology, especially when SSDs present a distribution of the data outside the range determined in this study Finally, in the last part of this thesis, we confronted the mixture effect predictions with biological changes observed in the environment. In this study, long-term monitoring of a European great lake, Lake Geneva, provides the opportunity to assess to what extent the predicted toxicity of herbicide mixtures explains the changes in the composition of the phytoplankton community next to other classical limnology parameters such as nutrients. To reach this goal, the gradient of the mixture toxicity of 14 herbicides regularly detected in the lake was calculated, using concentration addition and response addition models. A decreasing temporal gradient of toxicity was observed from 2004 to 2009. Redundancy analysis and partial redundancy analysis showed that this gradient explains a significant portion of the variation in phytoplankton community composition, even when having removed the effect of all other co-variables. Moreover, some species that were revealed to be influenced positively or negatively, by the decrease of toxicity in the lake over time, showed similar behaviors in mesocosms studies. It could be concluded that the herbicide mixture toxicity is one of the key parameters to explain phytoplankton changes in Lake Geneva. To conclude, different methods exist to predict the risk of mixture in the ecosystems. But their reliability varies depending on the underlying hypotheses. One should therefore carefully consider these hypotheses, as well as the limits of the approaches, before using the results for environmental risk management

Machine learning for geospatial data : algorithms, software tools and case studies

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Résumé Cette thèse est consacrée à l'analyse, la modélisation et la visualisation de données environnementales à référence spatiale à l'aide d'algorithmes d'apprentissage automatique (Machine Learning). L'apprentissage automatique peut être considéré au sens large comme une sous-catégorie de l'intelligence artificielle qui concerne particulièrement le développement de techniques et d'algorithmes permettant à une machine d'apprendre à partir de données. Dans cette thèse, les algorithmes d'apprentissage automatique sont adaptés pour être appliqués à des données environnementales et à la prédiction spatiale. Pourquoi l'apprentissage automatique ? Parce que la majorité des algorithmes d'apprentissage automatiques sont universels, adaptatifs, non-linéaires, robustes et efficaces pour la modélisation. Ils peuvent résoudre des problèmes de classification, de régression et de modélisation de densité de probabilités dans des espaces à haute dimension, composés de variables informatives spatialisées (« géo-features ») en plus des coordonnées géographiques. De plus, ils sont idéaux pour être implémentés en tant qu'outils d'aide à la décision pour des questions environnementales allant de la reconnaissance de pattern à la modélisation et la prédiction en passant par la cartographie automatique. Leur efficacité est comparable au modèles géostatistiques dans l'espace des coordonnées géographiques, mais ils sont indispensables pour des données à hautes dimensions incluant des géo-features. Les algorithmes d'apprentissage automatique les plus importants et les plus populaires sont présentés théoriquement et implémentés sous forme de logiciels pour les sciences environnementales. Les principaux algorithmes décrits sont le Perceptron multicouches (MultiLayer Perceptron, MLP) - l'algorithme le plus connu dans l'intelligence artificielle, le réseau de neurones de régression généralisée (General Regression Neural Networks, GRNN), le réseau de neurones probabiliste (Probabilistic Neural Networks, PNN), les cartes auto-organisées (SelfOrganized Maps, SOM), les modèles à mixture Gaussiennes (Gaussian Mixture Models, GMM), les réseaux à fonctions de base radiales (Radial Basis Functions Networks, RBF) et les réseaux à mixture de densité (Mixture Density Networks, MDN). Cette gamme d'algorithmes permet de couvrir des tâches variées telle que la classification, la régression ou l'estimation de densité de probabilité. L'analyse exploratoire des données (Exploratory Data Analysis, EDA) est le premier pas de toute analyse de données. Dans cette thèse les concepts d'analyse exploratoire de données spatiales (Exploratory Spatial Data Analysis, ESDA) sont traités selon l'approche traditionnelle de la géostatistique avec la variographie expérimentale et selon les principes de l'apprentissage automatique. La variographie expérimentale, qui étudie les relations entre pairs de points, est un outil de base pour l'analyse géostatistique de corrélations spatiales anisotropiques qui permet de détecter la présence de patterns spatiaux descriptible par une statistique. L'approche de l'apprentissage automatique pour l'ESDA est présentée à travers l'application de la méthode des k plus proches voisins qui est très simple et possède d'excellentes qualités d'interprétation et de visualisation. Une part importante de la thèse traite de sujets d'actualité comme la cartographie automatique de données spatiales. Le réseau de neurones de régression généralisée est proposé pour résoudre cette tâche efficacement. Les performances du GRNN sont démontrées par des données de Comparaison d'Interpolation Spatiale (SIC) de 2004 pour lesquelles le GRNN bat significativement toutes les autres méthodes, particulièrement lors de situations d'urgence. La thèse est composée de quatre chapitres : théorie, applications, outils logiciels et des exemples guidés. Une partie importante du travail consiste en une collection de logiciels : Machine Learning Office. Cette collection de logiciels a été développée durant les 15 dernières années et a été utilisée pour l'enseignement de nombreux cours, dont des workshops internationaux en Chine, France, Italie, Irlande et Suisse ainsi que dans des projets de recherche fondamentaux et appliqués. Les cas d'études considérés couvrent un vaste spectre de problèmes géoenvironnementaux réels à basse et haute dimensionnalité, tels que la pollution de l'air, du sol et de l'eau par des produits radioactifs et des métaux lourds, la classification de types de sols et d'unités hydrogéologiques, la cartographie des incertitudes pour l'aide à la décision et l'estimation de risques naturels (glissements de terrain, avalanches). Des outils complémentaires pour l'analyse exploratoire des données et la visualisation ont également été développés en prenant soin de créer une interface conviviale et facile à l'utilisation. Machine Learning for geospatial data: algorithms, software tools and case studies Abstract The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense machine learning can be considered as a subfield of artificial intelligence. It mainly concerns with the development of techniques and algorithms that allow computers to learn from data. In this thesis machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning? In few words most of machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions for the classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well-suited to be implemented as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modeling and predictions as well as automatic data mapping. They have competitive efficiency to the geostatistical models in low dimensional geographical spaces but are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models interesting for geo- and environmental sciences are presented in details: from theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis functions networks, mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is initial and very important part of data analysis. In this thesis the concepts of exploratory spatial data analysis (ESDA) is considered using both traditional geostatistical approach such as_experimental variography and machine learning. Experimental variography is a basic tool for geostatistical analysis of anisotropic spatial correlations which helps to understand the presence of spatial patterns, at least described by two-point statistics. A machine learning approach for ESDA is presented by applying the k-nearest neighbors (k-NN) method which is simple and has very good interpretation and visualization properties. Important part of the thesis deals with a hot topic of nowadays, namely, an automatic mapping of geospatial data. General regression neural networks (GRNN) is proposed as efficient model to solve this task. Performance of the GRNN model is demonstrated on Spatial Interpolation Comparison (SIC) 2004 data where GRNN model significantly outperformed all other approaches, especially in case of emergency conditions. The thesis consists of four chapters and has the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools - Machine Learning Office. Machine Learning Office tools were developed during last 15 years and was used both for many teaching courses, including international workshops in China, France, Italy, Ireland, Switzerland and for realizing fundamental and applied research projects. Case studies considered cover wide spectrum of the real-life low and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, soil types and hydro-geological units classification, decision-oriented mapping with uncertainties, natural hazards (landslides, avalanches) assessments and susceptibility mapping. Complementary tools useful for the exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.

Finite mixture analysis of beauty-contest data using generalised beta distributions

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper introduces a mixture model based on the beta distribution, without preestablishedmeans and variances, to analyze a large set of Beauty-Contest data obtainedfrom diverse groups of experiments (Bosch-Domenech et al. 2002). This model gives a bettert of the experimental data, and more precision to the hypothesis that a large proportionof individuals follow a common pattern of reasoning, described as iterated best reply (degenerate),than mixture models based on the normal distribution. The analysis shows thatthe means of the distributions across the groups of experiments are pretty stable, while theproportions of choices at dierent levels of reasoning vary across groups.

Measuring school demand in the presence of spatial dependence. A conditional approach

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Improving educational quality is an important public policy goal. However, its success requires identifying factors associated with student achievement. At the core of these proposals lies the principle that increased public school quality can make school system more efficient, resulting in correspondingly stronger performance by students. Nevertheless, the public educational system is not devoid of competition which arises, among other factors, through the efficiency of management and the geographical location of schools. Moreover, families in Spain appear to choose a school on the grounds of location. In this environment, the objective of this paper is to analyze whether geographical space has an impact on the relationship between the level of technical quality of public schools (measured by the efficiency score) and the school demand index. To do this, an empirical application is performed on a sample of 1,695 public schools in the region of Catalonia (Spain). This application shows the effects of spatial autocorrelation on the estimation of the parameters and how these problems are addressed through spatial econometrics models. The results confirm that space has a moderating effect on the relationship between efficiency and school demand, although only in urban municipalities.

General Equilibrium Long-Run Determinants for Spanish FDI: A Spatial Panel Data Approach

Relevância:

90.00% 90.00%

Publicador:

Resumo:

While general equilibrium theories of trade stress the role of third-country effects, little work has been done in the empirical foreign direct investment (FDI) literature to test such spatial linkages. This paper aims to provide further insights into long-run determinants of Spanish FDI by considering not only bilateral but also spatially weighted third-country determinants. The few studies carried out so far have focused on FDI flows in a limited number of countries. However, Spanish FDI outflows have risen dramatically since 1995 and today account for a substantial part of global FDI. Therefore, we estimate recently developed Spatial Panel Data models by Maximum Likelihood (ML) procedures for Spanish outflows (1993-2004) to top-50 host countries. After controlling for unobservable effects, we find that spatial interdependence matters and provide evidence consistent with New Economic Geography (NEG) theories of agglomeration, mainly due to complex (vertical) FDI motivations. Spatial Error Models estimations also provide illuminating results regarding the transmission mechanism of shocks.

Some concepts and models usefulninthe analysis of discrete life time data

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This thesis entitled Reliability Modelling and Analysis in Discrete time Some Concepts and Models Useful in the Analysis of discrete life time data.The present study consists of five chapters. In Chapter II we take up the derivation of some general results useful in reliability modelling that involves two component mixtures. Expression for the failure rate, mean residual life and second moment of residual life of the mixture distributions in terms of the corresponding quantities in the component distributions are investigated. Some applications of these results are also pointed out. The role of the geometric,Waring and negative hypergeometric distributions as models of life lengths in the discrete time domain has been discussed already. While describing various reliability characteristics, it was found that they can be often considered as a class. The applicability of these models in single populations naturally extends to the case of populations composed of sub-populations making mixtures of these distributions worth investigating. Accordingly the general properties, various reliability characteristics and characterizations of these models are discussed in chapter III. Inference of parameters in mixture distribution is usually a difficult problem because the mass function of the mixture is a linear function of the component masses that makes manipulation of the likelihood equations, leastsquare function etc and the resulting computations.very difficult. We show that one of our characterizations help in inferring the parameters of the geometric mixture without involving computational hazards. As mentioned in the review of results in the previous sections, partial moments were not studied extensively in literature especially in the case of discrete distributions. Chapters IV and V deal with descending and ascending partial factorial moments. Apart from studying their properties, we prove characterizations of distributions by functional forms of partial moments and establish recurrence relations between successive moments for some well known families. It is further demonstrated that partial moments are equally efficient and convenient compared to many of the conventional tools to resolve practical problems in reliability modelling and analysis. The study concludes by indicating some new problems that surfaced during the course of the present investigation which could be the subject for a future work in this area.

Statistical Models for Co-occurrence Data

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM--based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.

CAMCR: Computer assisted mixture model analysis for capture-recapture count data

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Population size estimation with discrete or nonparametric mixture models is considered, and reliable ways of construction of the nonparametric mixture model estimator are reviewed and set into perspective. Construction of the maximum likelihood estimator of the mixing distribution is done for any number of components up to the global nonparametric maximum likelihood bound using the EM algorithm. In addition, the estimators of Chao and Zelterman are considered with some generalisations of Zelterman’s estimator. All computations are done with CAMCR, a special software developed for population size estimation with mixture models. Several examples and data sets are discussed and the estimators illustrated. Problems using the mixture model-based estimators are highlighted.

«
1
2
3
4
5
6
7
8
...
62
63
»