963 resultados para count data models
Resumo:
Because data on rare species usually are sparse, it is important to have efficient ways to sample additional data. Traditional sampling approaches are of limited value for rare species because a very large proportion of randomly chosen sampling sites are unlikely to shelter the species. For these species, spatial predictions from niche-based distribution models can be used to stratify the sampling and increase sampling efficiency. New data sampled are then used to improve the initial model. Applying this approach repeatedly is an adaptive process that may allow increasing the number of new occurrences found. We illustrate the approach with a case study of a rare and endangered plant species in Switzerland and a simulation experiment. Our field survey confirmed that the method helps in the discovery of new populations of the target species in remote areas where the predicted habitat suitability is high. In our simulations the model-based approach provided a significant improvement (by a factor of 1.8 to 4 times, depending on the measure) over simple random sampling. In terms of cost this approach may save up to 70% of the time spent in the field.
Resumo:
The objective of this work was to develop neural network models of backpropagation type to estimate solar radiation based on extraterrestrial radiation data, daily temperature range, precipitation, cloudiness and relative sunshine duration. Data from Córdoba, Argentina, were used for development and validation. The behaviour and adjustment between values observed and estimates obtained by neural networks for different combinations of input were assessed. These estimations showed root mean square error between 3.15 and 3.88 MJ m-2 d-1 . The latter corresponds to the model that calculates radiation using only precipitation and daily temperature range. In all models, results show good adjustment to seasonal solar radiation. These results allow inferring the adequate performance and pertinence of this methodology to estimate complex phenomena, such as solar radiation.
Resumo:
Waterproofing agents are widely used to protect leather and textiles in both domestic and occupational activities. An outbreak of acute respiratory syndrome following exposure to waterproofing sprays occurred during the winter 2002-2003 in Switzerland. About 180 cases were reported by the Swiss Toxicological Information Centre between October 2002 and March 2003, whereas fewer than 10 cases per year had been recorded previously. The reported cases involved three brands of sprays containing a common waterproofing mixture, that had undergone a formulation change in the months preceding the outbreak. A retrospective analysis was undertaken in collaboration with the Swiss Toxicological Information Centre and the Swiss Registries for Interstitial and Orphan Lung Diseases to clarify the circumstances and possible causes of the observed health effects. Individual exposure data were generated with questionnaires and experimental emission measurements. The collected data was used to conduct numeric simulation for 102 cases of exposure. A classical two-zone model was used to assess the aerosol dispersion in the near- and far-field during spraying. The resulting assessed dose and exposure levels obtained were spread on large scales, of several orders of magnitude. No dose-response relationship was found between exposure indicators and health effects indicators (perceived severity and clinical indicators). Weak relationships were found between unspecific inflammatory response indicators (leukocytes, C-reactive protein) and the maximal exposure concentration. The results obtained disclose a high interindividual response variability and suggest that some indirect mechanism(s) predominates in the respiratory disease occurrence. Furthermore, no threshold could be found to define a safe level of exposure. These findings suggest that the improvement of environmental exposure conditions during spraying alone does not constitute a sufficient measure to prevent future outbreaks of waterproofing spray toxicity. More efficient preventive measures are needed prior to the marketing and distribution of new waterproofing agents.
Resumo:
BACKGROUND: There is an ever-increasing volume of data on host genes that are modulated during HIV infection, influence disease susceptibility or carry genetic variants that impact HIV infection. We created GuavaH (Genomic Utility for Association and Viral Analyses in HIV, http://www.GuavaH.org), a public resource that supports multipurpose analysis of genome-wide genetic variation and gene expression profile across multiple phenotypes relevant to HIV biology. FINDINGS: We included original data from 8 genome and transcriptome studies addressing viral and host responses in and ex vivo. These studies cover phenotypes such as HIV acquisition, plasma viral load, disease progression, viral replication cycle, latency and viral-host genome interaction. This represents genome-wide association data from more than 4,000 individuals, exome sequencing data from 392 individuals, in vivo transcriptome microarray data from 127 patients/conditions, and 60 sets of RNA-seq data. Additionally, GuavaH allows visualization of protein variation in ~8,000 individuals from the general population. The publicly available GuavaH framework supports queries on (i) unique single nucleotide polymorphism across different HIV related phenotypes, (ii) gene structure and variation, (iii) in vivo gene expression in the setting of human infection (CD4+ T cells), and (iv) in vitro gene expression data in models of permissive infection, latency and reactivation. CONCLUSIONS: The complexity of the analysis of host genetic influences on HIV biology and pathogenesis calls for comprehensive motors of research on curated data. The tool developed here allows queries and supports validation of the rapidly growing body of host genomic information pertinent to HIV research.
Resumo:
Dendritic cells (DCs) are the most potent antigen-presenting cells in the human lung and are now recognized as crucial initiators of immune responses in general. They are arranged as sentinels in a dense surveillance network inside and below the epithelium of the airways and alveoli, where thet are ideally situated to sample inhaled antigen. DCs are known to play a pivotal role in maintaining the balance between tolerance and active immune response in the respiratory system. It is no surprise that the lungs became a main focus of DC-related investigations as this organ provides a large interface for interactions of inhaled antigens with the human body. During recent years there has been a constantly growing body of lung DC-related publications that draw their data from in vitro models, animal models and human studies. This review focuses on the biology and functions of different DC populations in the lung and highlights the advantages and drawbacks of different models with which to study the role of lung DCs. Furthermore, we present a number of up-to-date visualization techniques to characterize DC-related cell interactions in vitro and/or in vivo.
Resumo:
Mountain ecosystems will likely be affected by global warming during the 21st century, with substantial biodiversity loss predicted by species distribution models (SDMs). Depending on the geographic extent, elevation range and spatial resolution of data used in making these models, different rates of habitat loss have been predicted, with associated risk of species extinction. Few coordinated across-scale comparisons have been made using data of different resolution and geographic extent. Here, we assess whether climate-change induced habitat losses predicted at the European scale (10x10' grid cells) are also predicted from local scale data and modeling (25x25m grid cells) in two regions of the Swiss Alps. We show that local-scale models predict persistence of suitable habitats in up to 100% of species that were predicted by a European-scale model to lose all their suitable habitats in the area. Proportion of habitat loss depends on climate change scenario and study area. We find good agreement between the mismatch in predictions between scales and the fine-grain elevation range within 10x10' cells. The greatest prediction discrepancy for alpine species occurs in the area with the largest nival zone. Our results suggest elevation range as the main driver for the observed prediction discrepancies. Local scale projections may better reflect the possibility for species to track their climatic requirement toward higher elevations.
Resumo:
1. Few examples of habitat-modelling studies of rare and endangered species exist in the literature, although from a conservation perspective predicting their distribution would prove particularly useful. Paucity of data and lack of valid absences are the probable reasons for this shortcoming. Analytic solutions to accommodate the lack of absence include the ecological niche factor analysis (ENFA) and the use of generalized linear models (GLM) with simulated pseudo-absences. 2. In this study we tested a new approach to generating pseudo-absences, based on a preliminary ENFA habitat suitability (HS) map, for the endangered species Eryngium alpinum. This method of generating pseudo-absences was compared with two others: (i) use of a GLM with pseudo-absences generated totally at random, and (ii) use of an ENFA only. 3. The influence of two different spatial resolutions (i.e. grain) was also assessed for tackling the dilemma of quality (grain) vs. quantity (number of occurrences). Each combination of the three above-mentioned methods with the two grains generated a distinct HS map. 4. Four evaluation measures were used for comparing these HS maps: total deviance explained, best kappa, Gini coefficient and minimal predicted area (MPA). The last is a new evaluation criterion proposed in this study. 5. Results showed that (i) GLM models using ENFA-weighted pseudo-absence provide better results, except for the MPA value, and that (ii) quality (spatial resolution and locational accuracy) of the data appears to be more important than quantity (number of occurrences). Furthermore, the proposed MPA value is suggested as a useful measure of model evaluation when used to complement classical statistical measures. 6. Synthesis and applications. We suggest that the use of ENFA-weighted pseudo-absence is a possible way to enhance the quality of GLM-based potential distribution maps and that data quality (i.e. spatial resolution) prevails over quantity (i.e. number of data). Increased accuracy of potential distribution maps could help to define better suitable areas for species protection and reintroduction.
Resumo:
Quantifying the spatial configuration of hydraulic conductivity (K) in heterogeneous geological environments is essential for accurate predictions of contaminant transport, but is difficult because of the inherent limitations in resolution and coverage associated with traditional hydrological measurements. To address this issue, we consider crosshole and surface-based electrical resistivity geophysical measurements, collected in time during a saline tracer experiment. We use a Bayesian Markov-chain-Monte-Carlo (McMC) methodology to jointly invert the dynamic resistivity data, together with borehole tracer concentration data, to generate multiple posterior realizations of K that are consistent with all available information. We do this within a coupled inversion framework, whereby the geophysical and hydrological forward models are linked through an uncertain relationship between electrical resistivity and concentration. To minimize computational expense, a facies-based subsurface parameterization is developed. The Bayesian-McMC methodology allows us to explore the potential benefits of including the geophysical data into the inverse problem by examining their effect on our ability to identify fast flowpaths in the subsurface, and their impact on hydrological prediction uncertainty. Using a complex, geostatistically generated, two-dimensional numerical example representative of a fluvial environment, we demonstrate that flow model calibration is improved and prediction error is decreased when the electrical resistivity data are included. The worth of the geophysical data is found to be greatest for long spatial correlation lengths of subsurface heterogeneity with respect to wellbore separation, where flow and transport are largely controlled by highly connected flowpaths.
Resumo:
The objective of this study was to evaluate the performance of stacked species distribution models in predicting the alpha and gamma species diversity patterns of two important plant clades along elevation in the Andes. We modelled the distribution of the species in the Anthurium genus (53 species) and the Bromeliaceae family (89 species) using six modelling techniques. We combined all of the predictions for the same species in ensemble models based on two different criteria: the average of the rescaled predictions by all techniques and the average of the best techniques. The rescaled predictions were then reclassified into binary predictions (presence/absence). By stacking either the original predictions or binary predictions for both ensemble procedures, we obtained four different species richness models per taxa. The gamma and alpha diversity per elevation band (500 m) was also computed. To evaluate the prediction abilities for the four predictions of species richness and gamma diversity, the models were compared with the real data along an elevation gradient that was independently compiled by specialists. Finally, we also tested whether our richness models performed better than a null model of altitudinal changes of diversity based on the literature. Stacking of the ensemble prediction of the individual species models generated richness models that proved to be well correlated with the observed alpha diversity richness patterns along elevation and with the gamma diversity derived from the literature. Overall, these models tend to overpredict species richness. The use of the ensemble predictions from the species models built with different techniques seems very promising for modelling of species assemblages. Stacking of the binary models reduced the over-prediction, although more research is needed. The randomisation test proved to be a promising method for testing the performance of the stacked models, but other implementations may still be developed.
Resumo:
The comparison of radiotherapy techniques regarding secondary cancer risk has yielded contradictory results possibly stemming from the many different approaches used to estimate risk. The purpose of this study was to make a comprehensive evaluation of different available risk models applied to detailed whole-body dose distributions computed by Monte Carlo for various breast radiotherapy techniques including conventional open tangents, 3D conformal wedged tangents and hybrid intensity modulated radiation therapy (IMRT). First, organ-specific linear risk models developed by the International Commission on Radiological Protection (ICRP) and the Biological Effects of Ionizing Radiation (BEIR) VII committee were applied to mean doses for remote organs only and all solid organs. Then, different general non-linear risk models were applied to the whole body dose distribution. Finally, organ-specific non-linear risk models for the lung and breast were used to assess the secondary cancer risk for these two specific organs. A total of 32 different calculated absolute risks resulted in a broad range of values (between 0.1% and 48.5%) underlying the large uncertainties in absolute risk calculation. The ratio of risk between two techniques has often been proposed as a more robust assessment of risk than the absolute risk. We found that the ratio of risk between two techniques could also vary substantially considering the different approaches to risk estimation. Sometimes the ratio of risk between two techniques would range between values smaller and larger than one, which then translates into inconsistent results on the potential higher risk of one technique compared to another. We found however that the hybrid IMRT technique resulted in a systematic reduction of risk compared to the other techniques investigated even though the magnitude of this reduction varied substantially with the different approaches investigated. Based on the epidemiological data available, a reasonable approach to risk estimation would be to use organ-specific non-linear risk models applied to the dose distributions of organs within or near the treatment fields (lungs and contralateral breast in the case of breast radiotherapy) as the majority of radiation-induced secondary cancers are found in the beam-bordering regions.
Resumo:
Although hydrocarbon-bearing fluids have been known from the alkaline igneous rocks of the Khibiny intrusion for many years, their origin remains enigmatic. A recently proposed model of post-magmatic hydrocarbon (HC) generation through Fischer-Tropsch (FT) type reactions suggests the hydration of Fe-bearing phases and release of H-2 which reacts with magmatically derived CO2 to form CH4 and higher HCs. However, new petrographic, microthermometric, laser Raman, bulk gas and isotope data are presented and discussed in the context of previously published work in order to reassess models of HC generation. The gas phase is dominated by CH4 with only minor proportions of higher hydrocarbons. No remnants of the proposed primary CO2-rich fluid are found in the complex. The majority of the fluid inclusions are of secondary nature and trapped in healed microfractures. This indicates a high fluid flux after magma crystallisation. Entrapment conditions for fluid inclusions are 450-550 degrees C at 2.8-4.5 kbar. These temperatures are too high for hydrocarbon gas generation through the FT reaction. Chemical analyses of rims of Fe-rich phases suggest that they are not the result of alteration but instead represent changes in magma composition during crystallisation. Furthermore, there is no clear relationship between the presence of Fe-rich minerals and the abundance of fluid inclusion planes (FIPs) as reported elsewhere. delta C-13 values for methane range from -22.4% to -5.4%, confirming a largely abiogenic origin for the gas. The presence of primary CH4-dominated fluid inclusions and melt inclusions, which contain a methane-rich gas phase, indicates a magmatic origin of the HCs. An increase in methane content, together with a decrease in delta C-13 isotope values towards the intrusion margin suggests that magmatically derived abiogenic hydrocarbons may have mixed with biogenic hydrocarbons derived from the surrounding country rocks. (C) 2006 Elsevier BV. All rights reserved.
Resumo:
Résumé Cette thèse est consacrée à l'analyse, la modélisation et la visualisation de données environnementales à référence spatiale à l'aide d'algorithmes d'apprentissage automatique (Machine Learning). L'apprentissage automatique peut être considéré au sens large comme une sous-catégorie de l'intelligence artificielle qui concerne particulièrement le développement de techniques et d'algorithmes permettant à une machine d'apprendre à partir de données. Dans cette thèse, les algorithmes d'apprentissage automatique sont adaptés pour être appliqués à des données environnementales et à la prédiction spatiale. Pourquoi l'apprentissage automatique ? Parce que la majorité des algorithmes d'apprentissage automatiques sont universels, adaptatifs, non-linéaires, robustes et efficaces pour la modélisation. Ils peuvent résoudre des problèmes de classification, de régression et de modélisation de densité de probabilités dans des espaces à haute dimension, composés de variables informatives spatialisées (« géo-features ») en plus des coordonnées géographiques. De plus, ils sont idéaux pour être implémentés en tant qu'outils d'aide à la décision pour des questions environnementales allant de la reconnaissance de pattern à la modélisation et la prédiction en passant par la cartographie automatique. Leur efficacité est comparable au modèles géostatistiques dans l'espace des coordonnées géographiques, mais ils sont indispensables pour des données à hautes dimensions incluant des géo-features. Les algorithmes d'apprentissage automatique les plus importants et les plus populaires sont présentés théoriquement et implémentés sous forme de logiciels pour les sciences environnementales. Les principaux algorithmes décrits sont le Perceptron multicouches (MultiLayer Perceptron, MLP) - l'algorithme le plus connu dans l'intelligence artificielle, le réseau de neurones de régression généralisée (General Regression Neural Networks, GRNN), le réseau de neurones probabiliste (Probabilistic Neural Networks, PNN), les cartes auto-organisées (SelfOrganized Maps, SOM), les modèles à mixture Gaussiennes (Gaussian Mixture Models, GMM), les réseaux à fonctions de base radiales (Radial Basis Functions Networks, RBF) et les réseaux à mixture de densité (Mixture Density Networks, MDN). Cette gamme d'algorithmes permet de couvrir des tâches variées telle que la classification, la régression ou l'estimation de densité de probabilité. L'analyse exploratoire des données (Exploratory Data Analysis, EDA) est le premier pas de toute analyse de données. Dans cette thèse les concepts d'analyse exploratoire de données spatiales (Exploratory Spatial Data Analysis, ESDA) sont traités selon l'approche traditionnelle de la géostatistique avec la variographie expérimentale et selon les principes de l'apprentissage automatique. La variographie expérimentale, qui étudie les relations entre pairs de points, est un outil de base pour l'analyse géostatistique de corrélations spatiales anisotropiques qui permet de détecter la présence de patterns spatiaux descriptible par une statistique. L'approche de l'apprentissage automatique pour l'ESDA est présentée à travers l'application de la méthode des k plus proches voisins qui est très simple et possède d'excellentes qualités d'interprétation et de visualisation. Une part importante de la thèse traite de sujets d'actualité comme la cartographie automatique de données spatiales. Le réseau de neurones de régression généralisée est proposé pour résoudre cette tâche efficacement. Les performances du GRNN sont démontrées par des données de Comparaison d'Interpolation Spatiale (SIC) de 2004 pour lesquelles le GRNN bat significativement toutes les autres méthodes, particulièrement lors de situations d'urgence. La thèse est composée de quatre chapitres : théorie, applications, outils logiciels et des exemples guidés. Une partie importante du travail consiste en une collection de logiciels : Machine Learning Office. Cette collection de logiciels a été développée durant les 15 dernières années et a été utilisée pour l'enseignement de nombreux cours, dont des workshops internationaux en Chine, France, Italie, Irlande et Suisse ainsi que dans des projets de recherche fondamentaux et appliqués. Les cas d'études considérés couvrent un vaste spectre de problèmes géoenvironnementaux réels à basse et haute dimensionnalité, tels que la pollution de l'air, du sol et de l'eau par des produits radioactifs et des métaux lourds, la classification de types de sols et d'unités hydrogéologiques, la cartographie des incertitudes pour l'aide à la décision et l'estimation de risques naturels (glissements de terrain, avalanches). Des outils complémentaires pour l'analyse exploratoire des données et la visualisation ont également été développés en prenant soin de créer une interface conviviale et facile à l'utilisation. Machine Learning for geospatial data: algorithms, software tools and case studies Abstract The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense machine learning can be considered as a subfield of artificial intelligence. It mainly concerns with the development of techniques and algorithms that allow computers to learn from data. In this thesis machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning? In few words most of machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions for the classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well-suited to be implemented as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modeling and predictions as well as automatic data mapping. They have competitive efficiency to the geostatistical models in low dimensional geographical spaces but are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models interesting for geo- and environmental sciences are presented in details: from theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis functions networks, mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is initial and very important part of data analysis. In this thesis the concepts of exploratory spatial data analysis (ESDA) is considered using both traditional geostatistical approach such as_experimental variography and machine learning. Experimental variography is a basic tool for geostatistical analysis of anisotropic spatial correlations which helps to understand the presence of spatial patterns, at least described by two-point statistics. A machine learning approach for ESDA is presented by applying the k-nearest neighbors (k-NN) method which is simple and has very good interpretation and visualization properties. Important part of the thesis deals with a hot topic of nowadays, namely, an automatic mapping of geospatial data. General regression neural networks (GRNN) is proposed as efficient model to solve this task. Performance of the GRNN model is demonstrated on Spatial Interpolation Comparison (SIC) 2004 data where GRNN model significantly outperformed all other approaches, especially in case of emergency conditions. The thesis consists of four chapters and has the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools - Machine Learning Office. Machine Learning Office tools were developed during last 15 years and was used both for many teaching courses, including international workshops in China, France, Italy, Ireland, Switzerland and for realizing fundamental and applied research projects. Case studies considered cover wide spectrum of the real-life low and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, soil types and hydro-geological units classification, decision-oriented mapping with uncertainties, natural hazards (landslides, avalanches) assessments and susceptibility mapping. Complementary tools useful for the exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.
Resumo:
Because self-reported health status [SRHS] is an ordered response variable, inequality measurement for SRHS data requires a numerical scale for converting individual responses into a summary statistic. The choice of scale is however problematic, since small variations in the numerical scale may reverse the ordering of a given pair of distributions of SRHS data in relation to conventional inequality indices such as the variance. This paper introduces a parametric family of inequality indices, founded on an inequality ordering proposed by Allison and Foster [Allison, R.A., Foster, J., 2004. Measuring health inequalities using qualitative data. Journal of Health Economics 23, 505-524], which satisfy a suitable invariance property with respect to the choice of numerical scale. Several key members of the parametric family are also derived, and an empirical application using data from the Swiss Health Survey illustrates the proposed methodology. [Authors]
Resumo:
Risk maps summarizing landscape suitability of novel areas for invading species can be valuable tools for preventing species' invasions or controlling their spread, but methods employed for development of such maps remain variable and unstandardized. We discuss several considerations in development of such models, including types of distributional information that should be used, the nature of explanatory variables that should be incorporated, and caveats regarding model testing and evaluation. We highlight that, in the case of invasive species, such distributional predictions should aim to derive the best hypothesis of the potential distribution of the species by using (1) all distributional information available, including information from both the native range and other invaded regions; (2) predictors linked as directly as is feasible to the physiological requirements of the species; and (3) modelling procedures that carefully avoid overfitting to the training data. Finally, model testing and evaluation should focus on well-predicted presences, and less on efficient prediction of absences; a k-fold regional cross-validation test is discussed.
Resumo:
US Geological Survey (USGS) based elevation data are the most commonly used data source for highway hydraulic analysis; however, due to the vertical accuracy of USGS-based elevation data, USGS data may be too “coarse” to adequately describe surface profiles of watershed areas or drainage patterns. Additionally hydraulic design requires delineation of much smaller drainage areas (watersheds) than other hydrologic applications, such as environmental, ecological, and water resource management. This research study investigated whether higher resolution LIDAR based surface models would provide better delineation of watersheds and drainage patterns as compared to surface models created from standard USGS-based elevation data. Differences in runoff values were the metric used to compare the data sets. The two data sets were compared for a pilot study area along the Iowa 1 corridor between Iowa City and Mount Vernon. Given the limited breadth of the analysis corridor, areas of particular emphasis were the location of drainage area boundaries and flow patterns parallel to and intersecting the road cross section. Traditional highway hydrology does not appear to be significantly impacted, or benefited, by the increased terrain detail that LIDAR provided for the study area. In fact, hydrologic outputs, such as streams and watersheds, may be too sensitive to the increased horizontal resolution and/or errors in the data set. However, a true comparison of LIDAR and USGS-based data sets of equal size and encompassing entire drainage areas could not be performed in this study. Differences may also result in areas with much steeper slopes or significant changes in terrain. LIDAR may provide possibly valuable detail in areas of modified terrain, such as roads. Better representations of channel and terrain detail in the vicinity of the roadway may be useful in modeling problem drainage areas and evaluating structural surety during and after significant storm events. Furthermore, LIDAR may be used to verify the intended/expected drainage patterns at newly constructed highways. LIDAR will likely provide the greatest benefit for highway projects in flood plains and areas with relatively flat terrain where slight changes in terrain may have a significant impact on drainage patterns.