Biblioteca Digital

913 results for grid-based spatial data

Evaluating the influence of spatial uncertainty in locality points for species distributional modeling

Relevance:

100.00%

Abstract:

1. Species distribution modelling is used increasingly in both applied and theoretical research to predict how species are distributed and to understand attributes of species' environmental requirements. In species distribution modelling, various statistical methods are used that combine species occurrence data with environmental spatial data layers to predict the suitability of any site for that species. While the number of data sharing initiatives involving species' occurrences in the scientific community has increased dramatically over the past few years, various data quality and methodological concerns related to using these data for species distribution modelling have not been addressed adequately.

2. We evaluated how uncertainty in georeferences and associated locational error in occurrences influence species distribution modelling using two treatments: (1) a control treatment where models were calibrated with original, accurate data and (2) an error treatment where data were first degraded spatially to simulate locational error. To incorporate error into the coordinates, we moved each coordinate with a random number drawn from the normal distribution with a mean of zero and a standard deviation of 5 km. We evaluated the influence of error on the performance of 10 commonly used distributional modelling techniques applied to 40 species in four distinct geographical regions.

3. Locational error in occurrences reduced model performance in three of these regions; relatively accurate predictions of species distributions were possible for most species, even with degraded occurrences. Two species distribution modelling techniques, boosted regression trees and maximum entropy, were the best performing models in the face of locational errors. The results obtained with boosted regression trees were only slightly degraded by errors in location, and the results obtained with the maximum entropy approach were not affected by such errors.

4. Synthesis and applications. To use the vast array of occurrence data that exists currently for research and management relating to the geographical ranges of species, modellers need to know the influence of locational error on model quality and whether some modelling techniques are particularly robust to error. We show that certain modelling techniques are particularly robust to a moderate level of locational error and that useful predictions of species distributions can be made even when occurrence data include some error.
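
The error treatment described in point 2 can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: coordinates are assumed to be in a projected system with units of metres, the zero-mean normal offset (standard deviation 5 km) is applied independently to each axis because the abstract does not say how the shift was oriented, and the function name `degrade_coordinates` is hypothetical.

```python
import random


def degrade_coordinates(points, sd_m=5000.0, seed=None):
    """Return a copy of `points` with zero-mean Gaussian noise added to x and y.

    points: iterable of (x, y) tuples in metres (assumed projected coordinates).
    sd_m:   standard deviation of the offset per axis, in metres
            (5 km, as in the abstract; the per-axis choice is an assumption).
    seed:   optional seed so that a degraded data set can be reproduced.
    """
    rng = random.Random(seed)
    return [(x + rng.gauss(0.0, sd_m), y + rng.gauss(0.0, sd_m)) for x, y in points]


if __name__ == "__main__":
    # Two made-up occurrence records, for illustration only.
    original = [(500000.0, 4200000.0), (512500.0, 4187300.0)]
    degraded = degrade_coordinates(original, seed=42)
    for (x0, y0), (x1, y1) in zip(original, degraded):
        print(f"({x0:.0f}, {y0:.0f}) -> ({x1:.0f}, {y1:.0f})")
```

A control treatment would calibrate models on `original`, and the error treatment on `degraded`, so that any difference in model performance can be attributed to the simulated locational error.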

Bayesian classification criterion for forensic multivariate data

Relevance:

100.00%

Abstract:

This study presents a classification criterion for two-class cannabis seedlings. As the cultivation of drug-type cannabis is forbidden in Switzerland, law enforcement authorities regularly ask laboratories to determine the chemotype of cannabis plants from seized material in order to ascertain whether the plantation is legal. In this study, the classification analysis is based on data obtained from the relative proportion of three major leaf compounds measured by gas chromatography interfaced with mass spectrometry (GC-MS). The aim is to discriminate between drug-type (illegal) and fiber-type (legal) cannabis at an early stage of growth. A Bayesian procedure is proposed: a Bayes factor is computed and classification is performed on the basis of the decision maker's specifications (i.e. prior probability distributions on cannabis type and consequences of classification measured by losses). Classification rates are computed with two statistical models and the results are compared. A sensitivity analysis is then performed to assess the robustness of the classification criterion.
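
The decision rule outlined in this abstract (Bayes factor, prior probabilities, losses) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the study works with multivariate GC-MS data and its own statistical models, whereas this sketch uses a single hypothetical feature with normal likelihoods, and the names `bayes_factor` and `classify` and all parameter values are invented.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bayes_factor(x, drug_params, fiber_params):
    """Likelihood ratio p(x | drug) / p(x | fiber) under the two assumed models."""
    return normal_pdf(x, *drug_params) / normal_pdf(x, *fiber_params)


def classify(x, drug_params, fiber_params, prior_drug=0.5,
             loss_false_drug=1.0, loss_false_fiber=1.0):
    """Choose the class with the smaller posterior expected loss.

    loss_false_drug:  loss when a fiber-type plant is classified as drug type.
    loss_false_fiber: loss when a drug-type plant is classified as fiber type.
    Deciding "drug" is optimal when the posterior odds (drug : fiber) exceed
    loss_false_drug / loss_false_fiber.
    """
    prior_odds = prior_drug / (1.0 - prior_drug)
    posterior_odds = bayes_factor(x, drug_params, fiber_params) * prior_odds
    threshold = loss_false_drug / loss_false_fiber
    return "drug" if posterior_odds > threshold else "fiber"


if __name__ == "__main__":
    # Hypothetical (mean, sd) of one feature for each chemotype.
    DRUG, FIBER = (2.0, 1.0), (-2.0, 1.0)
    print(classify(0.2, DRUG, FIBER))                        # equal losses
    print(classify(0.2, DRUG, FIBER, loss_false_drug=10.0))  # costly false accusation
```

Raising `loss_false_drug` moves the threshold so that stronger evidence is needed before a seedling is labelled drug type, which is the role the abstract assigns to the decision maker's loss specification.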

Advanced mapping of environmental data: Geostatistics, Machine Learning and Bayesian Maximum Entropy

Relevance:

100.00%

Abstract:

This book combines geostatistics and global mapping systems to present an up-to-the-minute study of environmental data. Featuring numerous case studies, the reference covers model dependent (geostatistics) and data driven (machine learning algorithms) analysis techniques such as risk mapping, conditional stochastic simulations, descriptions of spatial uncertainty and variability, artificial neural networks (ANN) for spatial data, Bayesian maximum entropy (BME), and more.

The agreement between ipsative and normative questionnaires using compositional data analysis techniques

Relevance:

100.00%

Abstract:

The main instrument used in psychological measurement is the self-report questionnaire. One of its major drawbacks, however, is its susceptibility to response biases. A known strategy to control these biases has been the use of so-called ipsative items. Ipsative items are items that require the respondent to make between-scale comparisons within each item. The selected option determines to which scale the weight of the answer is attributed. Consequently, in questionnaires consisting only of ipsative items, every respondent is allotted an equal amount, i.e. the total score, that each can distribute differently over the scales. Therefore this type of response format yields data that can be considered compositional from its inception. Methodologically oriented psychologists have heavily criticized this type of item format, since the resulting data is also marked by the associated unfavourable statistical properties. Nevertheless, clinicians have kept using these questionnaires to their satisfaction. This investigation therefore aims to evaluate both positions and addresses the similarities and differences between the two data collection methods. The ultimate objective is to formulate a guideline on when to use which type of item format. The comparison is based on data obtained with both an ipsative and a normative version of three psychological questionnaires, which were administered to 502 first-year students in psychology according to a balanced within-subjects design. Previous research only compared the direct ipsative scale scores with the derived ipsative scale scores. The use of compositional data analysis techniques also enables one to compare derived normative score ratios with direct normative score ratios. The addition of the second comparison not only offers the advantage of a better-balanced research strategy. In principle it also allows for parametric testing in the evaluation

MAPI: towards the integrated exploitation of bioinformatics Web Services.

Relevance:

100.00%

Abstract:

BACKGROUND. Bioinformatics is commonly featured as a well assorted list of available web resources. Although diversity of services is positive in general, the proliferation of tools, their dispersion and heterogeneity complicate the integrated exploitation of such data processing capacity. RESULTS. To facilitate the construction of software clients and make integrated use of this variety of tools, we present a modular programmatic application interface (MAPI) that provides the necessary functionality for uniform representation of Web Services metadata descriptors including their management and invocation protocols of the services which they represent. This document describes the main functionality of the framework and how it can be used to facilitate the deployment of new software under a unified structure of bioinformatics Web Services. A notable feature of MAPI is the modular organization of the functionality into different modules associated with specific tasks. This means that only the modules needed for the client have to be installed, and that the module functionality can be extended without the need for re-writing the software client. CONCLUSIONS. The potential utility and versatility of the software library has been demonstrated by the implementation of several currently available clients that cover different aspects of integrated data processing, ranging from service discovery to service invocation with advanced features such as workflows composition and asynchronous services calls to multiple types of Web Services including those registered in repositories (e.g. GRID-based, SOAP, BioMOBY, R-bioconductor, and others).

Adipose tissue concentrations of persistent organic pollutants and total cancer risk in an adult cohort from Southern Spain: preliminary data from year 9 of the follow-up.

Relevance:

100.00%

Abstract:

There is an increasing trend in the incidence of cancer worldwide, and it has been accepted that environmental factors account for an important proportion of the global burden. The present paper reports preliminary findings on the influence of the historical exposure to a group of persistent organic pollutants on total cancer risk, at year 9 in the follow-up of a cohort from Southern Spain. A cohort of 368 participants (median age 51 years) was recruited in 2003. Their historical exposure was estimated by analyzing residues of persistent organic pollutants in adipose tissue. Estimation of cancer incidence was based on data from a population-based cancer registry. Statistical analyses were performed using multivariable Cox-regression models. In males, PCB 153 concentrations were positively associated with total cancer risk, with an adjusted hazard ratio (95% confidence interval) of 1.20 (1.01-1.41) for an increment of 100 ng/g lipid. Our preliminary findings suggest a potential relationship between the historical exposure to persistent organic pollutants and the risk of cancer in men. However, these results should be interpreted with caution and require verification during the future follow-up of this cohort.

Updated survival data of the phase III clinical trial of NovoTTF-100A versus best standard chemotherapy for recurrent glioblastoma

Relevance:

100.00%

Abstract:

NovoTTF-100A (TTF) is a portable device delivering low-intensity, intermediate-frequency, alternating electric fields using noninvasive, disposable scalp electrodes. TTF interferes with tumor cell division, and it has been approved by the US Food and Drug Administration (FDA) for the treatment of recurrent glioblastoma (rGBM) based on data from a phase III trial. This presentation describes the updated survival data 2 years after completing recruitment. Adults with rGBM (KPS ≥ 70) were randomized (stratified by surgery and center) to either continuous TTF (20-24 h/day, 7 days/week) or efficacious chemotherapy based on best physician choice (BPC). The primary endpoint was overall survival (OS), and secondary endpoints were PFS6, 1-year survival, and QOL. Patients were randomized (28 US and European centers) to either TTF alone (n = 120) or BPC (n = 117). Patient characteristics were balanced, median age was 54 years (range, 23-80 years), and median KPS was 80 (range, 50-100). One quarter of the patients had debulking surgery, and over half of the patients were at their second or later recurrence. OS in the intent-to-treat (ITT) population was equivalent in TTF versus BPC patients (median OS, 6.6 vs. 6.0 months; n = 237; p = 0.26; HR = 0.86). With a median follow-up of 33.6 months, long-term survival in the TTF group was higher than that in the BPC group at 2, 3, and 4 years of follow-up (9.3% vs. 6.6%; 8.4% vs. 1.4%; 8.4% vs. 0.0%, respectively). Analysis of patients who received at least one treatment course demonstrated a survival benefit for TTF patients compared to BPC patients (median OS, 7.8 vs. 6.0 months; n = 93 vs. n = 117; p = 0.012; HR = 0.69). In this group, 1-year survival was 28% vs. 20%, and PFS6 was 26.2% vs. 15.2% (p = 0.034). TTF, a noninvasive, novel cancer treatment modality, shows significant therapeutic efficacy with promising long-term survival results. The impact of TTF was more pronounced when comparing only patients who received the minimal treatment course. A large-scale phase III trial in newly diagnosed GBM is ongoing.

Towards reusability and tailorability in collaborative learning systems using IMS-LD and grid services

Relevance:

100.00%

Abstract:

CSCL applications are complex distributed systems that pose special requirements towards achieving success in educational settings. Flexible and efficient design of collaborative activities by educators is a key precondition in order to provide CSCL tailorable systems, capable of adapting to the needs of each particular learning environment. Furthermore, some parts of those CSCL systems should be reused as often as possible in order to reduce development costs. In addition, it may be necessary to employ special hardware devices, computational resources that reside in other organizations, or even exceed the possibilities of one specific organization. Therefore, the proposal of this paper is twofold: collecting collaborative learning designs (scripting) provided by educators, based on well-known best practices (collaborative learning flow patterns) in a standard way (IMS-LD) in order to guide the tailoring of CSCL systems by selecting and integrating reusable CSCL software units; and implementing those units in the form of grid services offered by third party providers. More specifically, this paper outlines a grid-based CSCL system having these features and illustrates its potential scope and applicability by means of a sample collaborative learning scenario.

Wavelet analysis residual kriging vs. neural network residual kriging

Relevance:

100.00%

Abstract:

This paper deals with the problem of spatial data mapping. A new method based on wavelet interpolation and geostatistical prediction (kriging) is proposed. The method - wavelet analysis residual kriging (WARK) - is developed to address the problems that arise for highly variable data in the presence of spatial trends. In these cases stationary prediction models have very limited application. Wavelet analysis is used to model large-scale structures, and kriging of the remaining residuals focuses on small-scale peculiarities. WARK is able to model spatial patterns that feature multiscale structure. In the present work WARK is applied to rainfall data, and the validation results are compared with those obtained from neural network residual kriging (NNRK). NNRK is also a residual-based method, which uses an artificial neural network to model large-scale non-linear trends. The comparison demonstrates the high-quality performance of WARK in predicting hot spots and in reproducing the global statistical characteristics of the distribution and the spatial correlation structure.

Analysis, modelling and classification of geospatial data using machine learning

Relevance:

100.00%

Abstract:

The research considers the problem of spatial data classification using machine learning algorithms: probabilistic neural networks (PNN) and support vector machines (SVM). A simple k-nearest neighbor algorithm is used as a benchmark model. PNN is a neural network reformulation of well-known nonparametric principles of probability density modeling, using a kernel density estimator and Bayesian optimal or maximum a posteriori decision rules. PNN is well suited to problems where not only predictions but also quantification of accuracy and integration of prior information are necessary. An important property of PNNs is that they can easily be used in decision support systems dealing with problems of automatic classification. The support vector machine is an implementation of the principles of statistical learning theory for classification tasks. SVMs have recently been applied successfully to various environmental topics: classification of soil types and hydro-geological units, optimization of monitoring networks, and susceptibility mapping of natural hazards. In the present paper both simulated and real data case studies (low and high dimensional) are considered. The main attention is paid to the detection and learning of spatial patterns by the algorithms applied.
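
The PNN principle described above (a kernel density estimate per class combined with a maximum a posteriori decision) can be sketched as follows. This is a minimal illustration under assumptions, not the software used in the paper: it uses an isotropic Gaussian kernel with one shared bandwidth `sigma`, and the function names and the toy data are hypothetical.

```python
import math


def pnn_posteriors(x, training, sigma=1.0, priors=None):
    """Posterior class probabilities from Gaussian Parzen-window densities.

    training: dict mapping class label -> list of feature tuples.
    sigma:    kernel bandwidth shared by all classes (an assumption).
    priors:   optional dict of prior class probabilities (equal by default).
    """
    labels = list(training)
    if priors is None:
        priors = {c: 1.0 / len(labels) for c in labels}
    scores = {}
    for c in labels:
        # Mean kernel value over the training points of class c. The kernel's
        # normalising constant is identical for all classes and cancels out.
        total = 0.0
        for p in training[c]:
            d2 = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-d2 / (2.0 * sigma ** 2))
        scores[c] = priors[c] * total / len(training[c])
    norm = sum(scores.values())
    if norm == 0.0:
        # Every kernel underflowed (x is far from all data): fall back to priors.
        return dict(priors)
    return {c: s / norm for c, s in scores.items()}


def pnn_classify(x, training, sigma=1.0, priors=None):
    """Maximum a posteriori class label for x."""
    post = pnn_posteriors(x, training, sigma, priors)
    return max(post, key=post.get)


if __name__ == "__main__":
    # Toy spatial samples (x, y) for two made-up soil classes.
    data = {
        "sand": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
        "clay": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
    }
    print(pnn_classify((0.5, 0.5), data), pnn_posteriors((3.0, 3.0), data))
```

Because the output is a set of posterior probabilities rather than only a label, the same function gives the accuracy quantification and prior integration that the abstract highlights.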

Holt Oram syndrome: a registry-based study in Europe.

Relevance:

100.00%

Abstract:

BACKGROUND: Holt-Oram syndrome (HOS) is an autosomal dominant disorder characterised by upper limb anomalies and congenital heart defects. We present epidemiological and clinical aspects of HOS patients using data from EUROCAT (European Surveillance of Congenital Anomalies) registries. METHODS: The study was based on data collected during 1990-2011 by 34 registries. The registries are population-based and use multiple sources of information to collect data on all types of birth using standardized definitions, methodology and coding. Diagnostic criteria for inclusion in the study were the presence of radial ray abnormalities and congenital heart disease (CHD), or the presence of either radial ray anomaly or CHD, with family history of HOS. RESULTS: A total of 73 cases of HOS were identified, including 11 (15.1%) terminations of pregnancy for fetal anomaly (TOPFA) and 62 (84.9%) live births (LB). Out of 73 HOS cases, 30.8% (20/65) were suspected prenatally, 55.4% (36/65) at birth, 10.7% (7/65) in the first week of life, and 3.1% (2/65) in the first year of life. The prenatal detection rate was 39.2% (20/51), with no significant change over the study period. In 55% (11/20) of prenatally detected cases, parents decided to terminate pregnancy. Thumb anomalies were reported in all cases. Agenesis/hypoplasia of radius was present in 49.2% (30/61), ulnar aplasia/hypoplasia in 24.6% (15/61) and humerus hypoplasia/phocomelia in 42.6% (26/61) of patients. Congenital heart defects (CHD) were recorded in 78.7% (48/61) of patients. Isolated septal defects were present in 54.2% (26/48), while 25% (12/48) of patients had complex/severe CHD. The mean prevalence of HOS diagnosed prenatally or in the early years of life in European registries was 0.7 per 100,000 births or 1:135,615 births. CONCLUSIONS: HOS is a rare genetic condition showing regional variation in its prevalence. It is often missed prenatally, in spite of the existence of major structural anomalies. When discovered, parents in 45% (9/20) of cases opt for the continuation of pregnancy. Although a quarter of patients have severe CHD, the overall first week survival is very good, which is important information for counselling purposes.

Advanced Kernel Methods For Remote Sensing Image Classification

Relevance:

100.00%

Abstract:

Devis Tuia, Institut de Géomatique et d'Analyse du Risque, September 2009.

The technical developments of recent years have brought the quantity and quality of digital information to an unprecedented level, as enormous archives of satellite images are available to users. However, even if these advances open more and more possibilities for the use of digital imagery, they also raise several problems of storage and processing. The latter is considered in this thesis: the processing of very high spatial and spectral resolution images is treated with approaches based on data-driven algorithms relying on kernel methods. In particular, the problem of image classification, i.e. the categorization of the image's pixels into a reduced number of classes reflecting spectral and contextual properties, is studied through the different models presented. The emphasis is on algorithmic efficiency and on the simplicity of the proposed approaches, to avoid overly complex models that users would not adopt. The major challenge of the thesis is to remain close to concrete remote sensing problems without losing the methodological interest from the machine learning viewpoint: in this sense, the work aims at building a bridge between the machine learning and remote sensing communities, and all the proposed models have been developed with the need for such a synergy in mind.

Four models are proposed. First, an adaptive model that learns the relevant image features is proposed to solve the problem of high dimensionality and collinearity of the image features. This model automatically provides an accurate classifier and a ranking of the relevance of the individual features (the bands). The scarcity and unreliability of labeled information are the common root of the second and third models: when confronted with such problems, the user can either construct the labeled set iteratively by direct interaction with the machine or use the unlabeled data to increase the robustness and quality of the description of the data. Both solutions have been explored, resulting in two methodological contributions based respectively on active learning and semi-supervised learning. Finally, the more theoretical issue of structured outputs is considered in the last model, which, by integrating output similarity (a source of information not previously considered in remote sensing) into the model, opens new challenges and opportunities for remote sensing image processing.

Magnetic resonance imaging correlates of first-episode psychosis in young adult male patients: combined analysis of grey and white matter.

Relevance:

100.00%

Abstract:

Background: Several patterns of grey and white matter changes have been separately described in young adults with first-episode psychosis. Concomitant investigation of grey and white matter densities in patients with first-episode psychosis without other psychiatric comorbidities that include all relevant imaging markers could provide clues to the neurodevelopmental hypothesis in schizophrenia. Methods: We recruited patients with first-episode psychosis diagnosed according to the DSM-IV-TR and matched controls. All participants underwent magnetic resonance imaging (MRI). Voxel-based morphometry (VBM) analysis and mean diffusivity voxel-based analysis (VBA) were used for grey matter data. Fractional anisotropy and axial, radial and mean diffusivity were analyzed using tract-based spatial statistics (TBSS) for white matter data. Results: We included 15 patients and 16 controls. The mean diffusivity VBA showed significantly greater mean diffusivity in the first-episode psychosis than in the control group in the lingual gyrus bilaterally, the occipital fusiform gyrus bilaterally, the right lateral occipital gyrus and the right inferior temporal gyrus. Moreover, the TBSS analysis revealed a lower fractional anisotropy in the first-episode psychosis than in the control group in the genu of the corpus callosum, minor forceps, corticospinal tract, right superior longitudinal fasciculus, left middle cerebellar peduncle, left inferior longitudinal fasciculus and the posterior part of the fronto-occipital fasciculus. This analysis also revealed greater radial diffusivity in the first-episode psychosis than in the control group in the right corticospinal tract, right superior longitudinal fasciculus and left middle cerebellar peduncle. Limitations: The modest sample size and the absence of women in our series could limit the impact of our results. 
Conclusion: Our results highlight the structural vulnerability of grey matter in posterior areas of the brain among young adult male patients with first-episode psychosis. Moreover, the concomitant greater radial diffusivity within several regions already revealed by the fractional anisotropy analysis supports the idea of a late myelination in patients with first-episode psychosis.

Machine learning for geospatial data : algorithms, software tools and case studies

Relevance:

100.00%

Abstract:

Résumé Cette thèse est consacrée à l'analyse, la modélisation et la visualisation de données environnementales à référence spatiale à l'aide d'algorithmes d'apprentissage automatique (Machine Learning). L'apprentissage automatique peut être considéré au sens large comme une sous-catégorie de l'intelligence artificielle qui concerne particulièrement le développement de techniques et d'algorithmes permettant à une machine d'apprendre à partir de données. Dans cette thèse, les algorithmes d'apprentissage automatique sont adaptés pour être appliqués à des données environnementales et à la prédiction spatiale. Pourquoi l'apprentissage automatique ? Parce que la majorité des algorithmes d'apprentissage automatiques sont universels, adaptatifs, non-linéaires, robustes et efficaces pour la modélisation. Ils peuvent résoudre des problèmes de classification, de régression et de modélisation de densité de probabilités dans des espaces à haute dimension, composés de variables informatives spatialisées (« géo-features ») en plus des coordonnées géographiques. De plus, ils sont idéaux pour être implémentés en tant qu'outils d'aide à la décision pour des questions environnementales allant de la reconnaissance de pattern à la modélisation et la prédiction en passant par la cartographie automatique. Leur efficacité est comparable au modèles géostatistiques dans l'espace des coordonnées géographiques, mais ils sont indispensables pour des données à hautes dimensions incluant des géo-features. Les algorithmes d'apprentissage automatique les plus importants et les plus populaires sont présentés théoriquement et implémentés sous forme de logiciels pour les sciences environnementales. 
Les principaux algorithmes décrits sont le Perceptron multicouches (MultiLayer Perceptron, MLP) - l'algorithme le plus connu dans l'intelligence artificielle, le réseau de neurones de régression généralisée (General Regression Neural Networks, GRNN), le réseau de neurones probabiliste (Probabilistic Neural Networks, PNN), les cartes auto-organisées (SelfOrganized Maps, SOM), les modèles à mixture Gaussiennes (Gaussian Mixture Models, GMM), les réseaux à fonctions de base radiales (Radial Basis Functions Networks, RBF) et les réseaux à mixture de densité (Mixture Density Networks, MDN). Cette gamme d'algorithmes permet de couvrir des tâches variées telle que la classification, la régression ou l'estimation de densité de probabilité. L'analyse exploratoire des données (Exploratory Data Analysis, EDA) est le premier pas de toute analyse de données. Dans cette thèse les concepts d'analyse exploratoire de données spatiales (Exploratory Spatial Data Analysis, ESDA) sont traités selon l'approche traditionnelle de la géostatistique avec la variographie expérimentale et selon les principes de l'apprentissage automatique. La variographie expérimentale, qui étudie les relations entre pairs de points, est un outil de base pour l'analyse géostatistique de corrélations spatiales anisotropiques qui permet de détecter la présence de patterns spatiaux descriptible par une statistique. L'approche de l'apprentissage automatique pour l'ESDA est présentée à travers l'application de la méthode des k plus proches voisins qui est très simple et possède d'excellentes qualités d'interprétation et de visualisation. Une part importante de la thèse traite de sujets d'actualité comme la cartographie automatique de données spatiales. Le réseau de neurones de régression généralisée est proposé pour résoudre cette tâche efficacement. 
Les performances du GRNN sont démontrées par des données de Comparaison d'Interpolation Spatiale (SIC) de 2004 pour lesquelles le GRNN bat significativement toutes les autres méthodes, particulièrement lors de situations d'urgence. La thèse est composée de quatre chapitres : théorie, applications, outils logiciels et des exemples guidés. Une partie importante du travail consiste en une collection de logiciels : Machine Learning Office. Cette collection de logiciels a été développée durant les 15 dernières années et a été utilisée pour l'enseignement de nombreux cours, dont des workshops internationaux en Chine, France, Italie, Irlande et Suisse ainsi que dans des projets de recherche fondamentaux et appliqués. Les cas d'études considérés couvrent un vaste spectre de problèmes géoenvironnementaux réels à basse et haute dimensionnalité, tels que la pollution de l'air, du sol et de l'eau par des produits radioactifs et des métaux lourds, la classification de types de sols et d'unités hydrogéologiques, la cartographie des incertitudes pour l'aide à la décision et l'estimation de risques naturels (glissements de terrain, avalanches). Des outils complémentaires pour l'analyse exploratoire des données et la visualisation ont également été développés en prenant soin de créer une interface conviviale et facile à l'utilisation. Machine Learning for geospatial data: algorithms, software tools and case studies Abstract The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense machine learning can be considered as a subfield of artificial intelligence. It mainly concerns with the development of techniques and algorithms that allow computers to learn from data. In this thesis machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning? 
In few words most of machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions for the classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well-suited to be implemented as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modeling and predictions as well as automatic data mapping. They have competitive efficiency to the geostatistical models in low dimensional geographical spaces but are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models interesting for geo- and environmental sciences are presented in details: from theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis functions networks, mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is initial and very important part of data analysis. In this thesis the concepts of exploratory spatial data analysis (ESDA) is considered using both traditional geostatistical approach such as_experimental variography and machine learning. Experimental variography is a basic tool for geostatistical analysis of anisotropic spatial correlations which helps to understand the presence of spatial patterns, at least described by two-point statistics. 
A machine learning approach to ESDA is presented by applying the k-nearest neighbors (k-NN) method, which is simple and has very good interpretation and visualization properties. An important part of the thesis deals with a current hot topic, namely the automatic mapping of geospatial data. The general regression neural network (GRNN) is proposed as an efficient model for this task. The performance of the GRNN model is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, where the GRNN model significantly outperformed all other approaches, especially under emergency conditions. The thesis consists of four chapters with the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools, Machine Learning Office. The Machine Learning Office tools were developed over the last 15 years and have been used both in many teaching courses, including international workshops in China, France, Italy, Ireland and Switzerland, and in fundamental and applied research projects. The case studies considered cover a wide spectrum of real-life low- and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, classification of soil types and hydro-geological units, decision-oriented mapping with uncertainties, and natural hazard (landslides, avalanches) assessment and susceptibility mapping. Complementary tools useful for exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.
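To make the GRNN idea in the abstract above concrete, here is a minimal sketch, not the Machine Learning Office implementation. It relies on the standard reading of a GRNN as Nadaraya-Watson kernel regression: each prediction is a kernel-weighted mean of the training values. The isotropic Gaussian kernel, the function names `grnn_predict` and `loo_mse`, and the leave-one-out choice of the bandwidth `sigma` are assumptions made for illustration only.

```python
import math


def grnn_predict(train_xy, train_values, query_xy, sigma):
    """Predict at one 2-D location as a Gaussian-weighted mean of training values."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in zip(train_xy, train_values):
        sq_dist = (x - query_xy[0]) ** 2 + (y - query_xy[1]) ** 2
        # Isotropic Gaussian kernel; sigma is the single tunable bandwidth (assumed).
        weight = math.exp(-sq_dist / (2.0 * sigma ** 2))
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    return weighted_sum / weight_total


def loo_mse(train_xy, train_values, sigma):
    """Leave-one-out mean squared error, one possible criterion for tuning sigma."""
    squared_errors = []
    for i in range(len(train_xy)):
        rest_xy = train_xy[:i] + train_xy[i + 1:]
        rest_values = train_values[:i] + train_values[i + 1:]
        prediction = grnn_predict(rest_xy, rest_values, train_xy[i], sigma)
        squared_errors.append((prediction - train_values[i]) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Hypothetical toy measurements on a small grid (not data from the thesis).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
values = [1.0, 2.0, 2.0, 3.0, 2.0]

# Pick the bandwidth with the lowest leave-one-out error among a few candidates.
candidates = [0.25, 0.5, 1.0, 2.0]
best_sigma = min(candidates, key=lambda s: loo_mse(points, values, s))
estimate = grnn_predict(points, values, (0.25, 0.75), best_sigma)
```

Because the only free parameter is `sigma`, the tuning loop can run without user intervention, which is one plausible reason a model of this kind suits automatic mapping; anisotropic kernels or other tuning criteria would be equally valid choices.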

Towards a universal sampling protocol for soil biotas in the humid tropics

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper reviews the methods for the inventory of below-ground biotas in the humid tropics, to document the (hypothesized) loss of soil biodiversity associated with deforestation and agricultural intensification at forest margins. The biotas were grouped into eight categories, each of which corresponded to a major functional group considered important or essential to soil function. An accurate inventory of soil organisms can assist in ecosystem management and help sustain agricultural production. The advantages and disadvantages of transect-based and grid-based sampling methods are discussed, illustrated by published protocols ranging from the original "TSBF transect", through versions developed for the alternatives to Slash-and-Burn Project (ASB) to the final schemes (with variants) adopted by the Conservation and Sustainable Management of Below-ground Biodiversity Project (CSM-BGBD). Consideration is given to the place and importance of replication in below-ground biological sampling and it is argued that the new sampling protocols are inclusive, i.e. designed to sample all eight biotic groups in the same field exercise; spatially scaled, i.e. provide biodiversity data at site, locality, landscape and regional levels, and link the data to land use and land cover; and statistically robust, as shown by a partial randomization of plot locations for sampling.
