947 results for spatial clustering algorithms


Relevance:

30.00% 30.00%

Publisher:

Abstract:

Dissertation submitted for the degree of Master in Informatics Engineering

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The relationships between environmental factors and temporal and spatial variations of benthic communities of three rocky shores of the state of Espírito Santo, Southeast Brazil, were studied. Sampling was conducted every three months, from August 2006 to May 2007, using intersection points. Chthamalus bisinuatus (Pilsbry, 1916) (Crustacea) and Brachidontes spp. (Mollusca) were the most abundant taxa, occupying the upper level of the intertidal zone of the rocky shore. The species richness was higher at the lower levels. The invasive species Isognomon bicolor (C. B. Adams, 1845) (Mollusca) occurred at low densities in the studied areas. The clustering analysis dendrogram indicated a separation of communities based on exposed and sheltered areas. According to the variance analyses, the communities were significantly different among the studied areas and seasons. The extent of wave exposure and shore slope influenced the species variability. The Setibão site showed the highest diversity and richness, most likely due to greater wave exposure. The communities showed greater variation in the lower levels where environmental conditions were less severe, relative to the other levels.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The algorithmic approach to data modelling has developed rapidly in recent years; in particular, methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming at providing the best possible generalization and predictive abilities instead of concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows for the simultaneous modelling of environmental data at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is here learned automatically from data, providing the optimum mixture of short and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide efficient means to model local anomalies that may typically arise at an early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge of the possible existence of such short-scale patterns. This is a possible limitation of the method for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to the mapping of Cs-137 activity given the measurements taken in the region of Briansk following the Chernobyl accident.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In recent years, multi-atlas fusion methods have gained significant attention in medical image segmentation. In this paper, we propose a general Markov Random Field (MRF) based framework that can perform edge-preserving smoothing of the labels at the time of fusing the labels itself. More specifically, we formulate the label fusion problem with MRF-based neighborhood priors, as an energy minimization problem containing a unary data term and a pairwise smoothness term. We present how the existing fusion methods like majority voting, global weighted voting and local weighted voting methods can be reframed to profit from the proposed framework, for generating more accurate segmentations as well as more contiguous segmentations by getting rid of holes and islands. The proposed framework is evaluated for segmenting lymph nodes in 3D head and neck CT images. A comparison of various fusion algorithms is also presented.
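As a rough illustration of an energy with a unary data term and a pairwise smoothness term, the sketch below scores labellings of a 1-D chain of voxels. The unary cost (fraction of atlases disagreeing with a label), the Potts-style pairwise penalty, the weight `beta`, and the toy data are all assumptions made for this example; the abstract does not specify the paper's actual terms.

```python
def unary_cost(votes, label):
    """Fraction of atlases whose vote disagrees with `label` at one voxel."""
    return sum(v != label for v in votes) / len(votes)


def energy(labels, atlas_labels, beta=0.5):
    """Energy of a labelling on a 1-D chain of voxels.

    data term:   sum over voxels of the unary cost (assumed form)
    smoothness:  beta * number of neighbouring voxel pairs with different labels
    """
    per_voxel_votes = list(zip(*atlas_labels))
    data = sum(unary_cost(v, l) for v, l in zip(per_voxel_votes, labels))
    smooth = sum(a != b for a, b in zip(labels, labels[1:]))
    return data + beta * smooth


# Three hypothetical atlases voting on five voxels; two of them produce
# an isolated "island" of label 1 at the middle voxel.
atlases = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
majority = [0, 0, 1, 0, 0]   # plain majority voting keeps the island
smoothed = [0, 0, 0, 0, 0]   # labelling with the island removed

e_majority = energy(majority, atlases)
e_smoothed = energy(smoothed, atlases)
```

With these toy numbers the smoothed labelling has the lower energy, which is the sense in which a smoothness prior can remove islands that majority voting alone would keep.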

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Defining an efficient training set is one of the most delicate phases for the success of remote sensing image classification routines. The complexity of the problem, the limited temporal and financial resources, as well as the high intraclass variance can make an algorithm fail if it is trained with a suboptimal dataset. Active learning aims at building efficient training sets by iteratively improving the model performance through sampling. A user-defined heuristic ranks the unlabeled pixels according to a function of the uncertainty of their class membership and then the user is asked to provide labels for the most uncertain pixels. This paper reviews and tests the main families of active learning algorithms: committee, large margin, and posterior probability-based. For each of them, the most recent advances in the remote sensing community are discussed and some heuristics are detailed and tested. Several challenging remote sensing scenarios are considered, including very high spatial resolution and hyperspectral image classification. Finally, guidelines for choosing a suitable architecture are provided for new and/or inexperienced users.
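The ranking step described above can be sketched for the posterior probability-based family as follows. Entropy is used here as the uncertainty function; that choice, the function names, and the toy posteriors are assumptions for illustration, not the specific heuristics tested in the paper.

```python
import math


def entropy(probs):
    """Shannon entropy of one pixel's class-membership probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def rank_by_uncertainty(posteriors, batch_size):
    """Return the indices of the `batch_size` most uncertain pixels."""
    order = sorted(
        range(len(posteriors)),
        key=lambda i: entropy(posteriors[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return order[:batch_size]


# Hypothetical posteriors for four unlabeled pixels and three classes.
posteriors = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],
]
query = rank_by_uncertainty(posteriors, batch_size=2)
```

In an active learning loop, the pixels in `query` would be shown to the user for labelling, added to the training set, and the model retrained before the next ranking.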

Relevance:

30.00% 30.00%

Publisher:

Abstract:

1. Species distribution modelling is used increasingly in both applied and theoretical research to predict how species are distributed and to understand attributes of species' environmental requirements. In species distribution modelling, various statistical methods are used that combine species occurrence data with environmental spatial data layers to predict the suitability of any site for that species. While the number of data sharing initiatives involving species' occurrences in the scientific community has increased dramatically over the past few years, various data quality and methodological concerns related to using these data for species distribution modelling have not been addressed adequately. 2. We evaluated how uncertainty in georeferences and associated locational error in occurrences influence species distribution modelling using two treatments: (1) a control treatment where models were calibrated with original, accurate data and (2) an error treatment where data were first degraded spatially to simulate locational error. To incorporate error into the coordinates, we moved each coordinate with a random number drawn from the normal distribution with a mean of zero and a standard deviation of 5 km. We evaluated the influence of error on the performance of 10 commonly used distributional modelling techniques applied to 40 species in four distinct geographical regions. 3. Locational error in occurrences reduced model performance in three of these regions; relatively accurate predictions of species distributions were possible for most species, even with degraded occurrences. Two species distribution modelling techniques, boosted regression trees and maximum entropy, were the best performing models in the face of locational errors. The results obtained with boosted regression trees were only slightly degraded by errors in location, and the results obtained with the maximum entropy approach were not affected by such errors. 4. Synthesis and applications. 
To use the vast array of occurrence data that exists currently for research and management relating to the geographical ranges of species, modellers need to know the influence of locational error on model quality and whether some modelling techniques are particularly robust to error. We show that certain modelling techniques are particularly robust to a moderate level of locational error and that useful predictions of species distributions can be made even when occurrence data include some error.
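The error treatment described in the abstract (shifting each coordinate by a draw from a normal distribution with mean zero and standard deviation 5 km) can be sketched as below. The sketch assumes coordinates are already in a projected system measured in kilometres; the function name, the seed, and the sample points are hypothetical.

```python
import random


def degrade_coordinates(points_km, sd_km=5.0, seed=0):
    """Add independent Gaussian noise (mean 0, sd `sd_km`) to each coordinate.

    `points_km` is a list of (x, y) tuples in kilometres (an assumption of
    this sketch; geographic degrees would need projecting first).
    """
    rng = random.Random(seed)  # fixed seed so the degraded set is reproducible
    return [
        (x + rng.gauss(0.0, sd_km), y + rng.gauss(0.0, sd_km))
        for x, y in points_km
    ]


occurrences = [(512.3, 4180.7), (498.1, 4175.2), (530.9, 4201.4)]
degraded = degrade_coordinates(occurrences)
```

Calibrating one model on `occurrences` and another on `degraded` would then mirror the control and error treatments.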

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In a seminal paper [10], Weitz gave a deterministic fully polynomial approximation scheme for counting exponentially weighted independent sets (which is the same as approximating the partition function of the hard-core model from statistical physics) in graphs of degree at most d, up to the critical activity for the uniqueness of the Gibbs measure on the infinite d-regular tree. More recently Sly [8] (see also [1]) showed that this is optimal in the sense that if there is an FPRAS for the hard-core partition function on graphs of maximum degree d for activities larger than the critical activity on the infinite d-regular tree then NP = RP. In this paper we extend Weitz's approach to derive a deterministic fully polynomial approximation scheme for the partition function of general two-state anti-ferromagnetic spin systems on graphs of maximum degree d, up to the corresponding critical point on the d-regular tree. The main ingredient of our result is a proof that for two-state anti-ferromagnetic spin systems on the d-regular tree, weak spatial mixing implies strong spatial mixing. This in turn uses a message-decay argument which extends a similar approach proposed recently for the hard-core model by Restrepo et al [7] to the case of general two-state anti-ferromagnetic spin systems.
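To make the quantity being approximated concrete: the hard-core partition function sums the weight λ^|I| over all independent sets I of a graph, where λ is the activity. The brute-force sketch below evaluates it exactly on a tiny graph. It runs in exponential time and is not the approximation scheme discussed in the abstract; the function name and the example graph are illustrative assumptions.

```python
from itertools import combinations


def hard_core_partition_function(n, edges, lam):
    """Exact Z(lam) = sum over independent sets I of lam ** |I|.

    Vertices are 0..n-1. Every vertex subset is enumerated, so this is
    only usable on very small graphs.
    """
    edge_set = {frozenset(e) for e in edges}
    total = 0.0
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            # A subset is independent if no pair of its vertices is an edge.
            if all(frozenset(p) not in edge_set for p in combinations(subset, 2)):
                total += lam ** k
    return total


# 4-cycle: the independent sets are the empty set, 4 single vertices and
# the 2 pairs of opposite vertices, so Z(lam) = 1 + 4*lam + 2*lam**2.
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
z_at_1 = hard_core_partition_function(4, cycle4, 1.0)
z_at_2 = hard_core_partition_function(4, cycle4, 2.0)
```

At λ = 1 the value is simply the number of independent sets, which is the counting problem the abstract refers to.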

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The paper presents an approach for mapping of precipitation data. The main goal is to perform spatial predictions and simulations of precipitation fields using geostatistical methods (ordinary kriging, kriging with external drift) as well as machine learning algorithms (neural networks). More practically, the objective is to reproduce simultaneously both the spatial patterns and the extreme values. This objective is best reached by models integrating geostatistics and machine learning algorithms. To demonstrate how such models work, two case studies have been considered: first, a 2-day accumulation of heavy precipitation and second, a 6-day accumulation of extreme orographic precipitation. The first example is used to compare the performance of two optimization algorithms (conjugate gradients and Levenberg-Marquardt) of a neural network for the reproduction of extreme values. Hybrid models, which combine geostatistical and machine learning algorithms, are also treated in this context. The second dataset is used to analyze the contribution of radar Doppler imagery when used as external drift or as input in the models (kriging with external drift and neural networks). Model assessment is carried out by comparing independent validation errors as well as analyzing data patterns.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

A methodology of exploratory data analysis investigating the phenomenon of orographic precipitation enhancement is proposed. The precipitation observations obtained from three Swiss Doppler weather radars are analysed for the major precipitation event of August 2005 in the Alps. Image processing techniques are used to detect significant precipitation cells/pixels from radar images while filtering out spurious effects due to ground clutter. The contribution of topography to precipitation patterns is described by an extensive set of topographical descriptors computed from the digital elevation model at multiple spatial scales. Additionally, the motion vector field is derived from subsequent radar images and integrated into a set of topographic features to highlight the slopes exposed to main flows. Following the exploratory data analysis with a recent algorithm of spectral clustering, it is shown that orographic precipitation cells are generated under specific flow and topographic conditions. Repeatability of precipitation patterns in particular spatial locations is found to be linked to specific local terrain shapes, e.g. at the top of hills and on the upwind side of the mountains. This methodology and our empirical findings for the Alpine region provide a basis for building computational data-driven models of orographic enhancement and triggering of precipitation. Copyright (C) 2011 Royal Meteorological Society.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Globalization involves several facility location problems that need to be handled at large scale. Location Allocation (LA) is a combinatorial problem in which the distance among points in the data space matters. Specifically, taking advantage of the distance property of the domain, we exploit the capability of clustering techniques to partition the data space in order to convert an initial large LA problem into several simpler LA problems. In particular, our motivating problem involves a huge geographical area that can be partitioned under overall conditions. We present different types of clustering techniques and then perform a cluster analysis over our dataset in order to partition it. After that, we solve the LA problem by applying a simulated annealing algorithm to the clustered and non-clustered data in order to work out how profitable the clustering is and which of the presented methods is the most suitable.
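The partitioning step can be sketched with a plain k-means (Lloyd's algorithm) that splits the points into groups, each of which would then be passed to the LA solver as a smaller sub-problem. k-means is a stand-in chosen for this example, since the abstract does not say which clustering methods were compared; the naive initialisation and the sample points are also assumptions, and the simulated annealing solver itself is not shown.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with a naive initialisation (the first k points)."""
    centroids = list(points[:k])
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members (if it has any).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids


# Hypothetical demand points forming two well-separated groups.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
assignment, centroids = kmeans(points, k=2)

# One smaller location-allocation sub-problem per cluster.
subproblems = {
    c: [p for p, a in zip(points, assignment) if a == c] for c in range(2)
}
```

Comparing the solver's cost and running time on `subproblems` against the unpartitioned `points` is the kind of evaluation the abstract describes.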

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as complex topographies of the mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning using the digital elevation models in semi-supervised kernel methods. The range of the tools and methodological issues discussed in the study includes feature selection and semi-supervised Support Vector algorithms. The real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In a weighted spatial network, as specified by an exchange matrix, the variances of the spatial values are inversely proportional to the size of the regions. Spatial values are no longer exchangeable under independence, thus weakening the rationale for ordinary permutation and bootstrap tests of spatial autocorrelation. We propose an alternative permutation test for spatial autocorrelation, based upon exchangeable spatial modes, constructed as linear orthogonal combinations of spatial values. The coefficients are obtained as eigenvectors of the standardised exchange matrix appearing in spectral clustering, and generalise to the weighted case the concept of spatial filtering for connectivity matrices. Also, two proposals aimed at transforming an accessibility matrix into an exchange matrix with a priori fixed margins are presented. Two examples (inter-regional migratory flows and binary adjacency networks) illustrate the formalism, rooted in the theory of spectral decomposition for reversible Markov chains.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Polistine wasps are important in Neotropical ecosystems due to their ubiquity and diversity. Inventories have not adequately considered spatial attributes of collected specimens. Spatial data on biodiversity are important for the study and mitigation of anthropogenic impacts on natural ecosystems and for protecting species. We described and analyzed local-scale spatial patterns of collecting records of wasp species, as well as spatial variation of diversity descriptors in a 2500-hectare area of an Amazon forest in Brazil. Rare species comprised the largest fraction of the fauna. Close-range spatial effects were detected for most of the more common species, with clustering of presence data at short distances. Larger spatial lag effects could also be identified in some species, probably constituting cases of exogenous autocorrelation and candidates for explanations based on environmental factors. In a few cases, significant or near-significant correlations were found between five species (of Agelaia, Angiopolybia, and Mischocyttarus) and three studied environmental variables: distance to nearest stream, terrain altitude, and the type of forest canopy. However, associations between these factors and biodiversity variables were generally low. When used as predictors of polistine richness in a linear multiple regression, only the coefficient for the forest canopy variable was significant. Some level of prediction of wasp diversity variables can be attained based on environmental variables, especially vegetation structure. Large-scale landscape and regional studies should be scheduled to address this issue.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

A major challenge in community ecology is a thorough understanding of the processes that govern the assembly and composition of communities in time and space. The growing threat of climate change to the vascular plant biodiversity of fragile ecosystems such as mountains has made it equally imperative to develop comprehensive methodologies to provide insights into how communities are assembled. In this perspective, the primary objective of this PhD thesis is to contribute to the theoretical and methodological development of community ecology, by proposing new solutions to better detect the ecological and evolutionary processes that govern community assembly. As phylogenetic trees provide by far the most advanced tools to integrate the spatial, ecological and evolutionary dynamics of plant communities, they represent the cornerstone on which this work was based. In this thesis, I proposed new solutions to: (i) reveal trends in community assembly on phylogenies, depicted by the transition of signals at the nodes of the different species and lineages responsible for community assembly; (ii) help demonstrate the importance of evolutionarily labile traits in the distribution of mountain plant species. More precisely, I demonstrated that phylogenetic and functional compositional turnover in plant communities was driven by climate and human land use gradients mostly influenced by evolutionarily labile traits; (iii) predict and spatially project the phylogenetic structure of communities using species distribution models, to identify the potential distribution of phylogenetic diversity, as well as areas of high evolutionary potential along elevation. The altitudinal setting of the Diablerets mountains (Switzerland) provided an appropriate model for this study. The elevation gradient served as a compression of large latitudinal variations similar to a collection of islands within a single area, and allowed investigations on a large number of plant communities.
Overall, this thesis highlights that stochastic and deterministic environmental filtering processes mainly influence the phylogenetic structure of plant communities in mountainous areas. Negative density-dependent processes implied through patterns of phylogenetic overdispersion were only detected at the local scale, whereas environmental filtering implied through phylogenetic clustering was observed at both the regional and local scale. Finally, the integration of indices of phylogenetic community ecology with species distribution models revealed the prospects of providing novel and insightful explanations on the potential distribution of phylogenetic biodiversity in high mountain areas. These results generally demonstrate the usefulness of phylogenies in inferring assembly processes, and are worth considering in the theoretical and methodological development of tools to better understand phylogenetic community structure.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The coverage and volume of geo-referenced datasets are extensive and incessantly growing. The systematic capture of geo-referenced information generates large volumes of spatio-temporal data to be analyzed. Clustering and visualization play a key role in the exploratory data analysis and the extraction of knowledge embedded in these data. However, new challenges in visualization and clustering are posed when dealing with the special characteristics of these data: for instance, their complex structures, large quantity of samples, variables involved in a temporal context, high dimensionality and large variability in cluster shapes.

The central aim of my thesis is to propose new algorithms and methodologies for clustering and visualization, in order to assist the knowledge extraction from spatio-temporal geo-referenced data, thus improving decision-making processes.

I present two original algorithms, one for clustering: the Fuzzy Growing Hierarchical Self-Organizing Networks (FGHSON), and the second for exploratory visual data analysis: the Tree-structured Self-organizing Maps Component Planes. In addition, I present methodologies that, combined with FGHSON and the Tree-structured SOM Component Planes, allow the integration of space and time seamlessly and simultaneously in order to extract knowledge embedded in a temporal context.

The originality of the FGHSON lies in its capability to reflect the underlying structure of a dataset in a hierarchical fuzzy way. A hierarchical fuzzy representation of clusters is crucial when data include complex structures with large variability of cluster shapes, variances, densities and number of clusters. The most important characteristics of the FGHSON include: (1) It does not require an a priori setup of the number of clusters. (2) The algorithm executes several self-organizing processes in parallel. Hence, when dealing with large datasets the processes can be distributed, reducing the computational cost. (3) Only three parameters are necessary to set up the algorithm.

In the case of the Tree-structured SOM Component Planes, the novelty of this algorithm lies in its ability to create a structure that allows the visual exploratory data analysis of large high-dimensional datasets. This algorithm creates a hierarchical structure of Self-Organizing Map Component Planes, arranging similar variables' projections in the same branches of the tree. Hence, similarities in variables' behavior can be easily detected (e.g. local correlations, maximal and minimal values and outliers).

Both FGHSON and the Tree-structured SOM Component Planes were applied in several agroecological problems, proving to be very efficient in the exploratory analysis and clustering of spatio-temporal datasets.

In this thesis I also tested three soft competitive learning algorithms. Two of them are well-known unsupervised soft competitive algorithms, namely the Self-Organizing Maps (SOMs) and the Growing Hierarchical Self-Organizing Maps (GHSOMs); the third was our original contribution, the FGHSON. Although the algorithms presented here have been used in several areas, to my knowledge there is no work applying and comparing the performance of those techniques when dealing with spatio-temporal geospatial data, as is presented in this thesis.

I propose original methodologies to explore spatio-temporal geo-referenced datasets through time. Our approach uses time windows to capture temporal similarities and variations by using the FGHSON clustering algorithm. The developed methodologies are used in two case studies. In the first, the objective was to find similar agroecozones through time, and in the second it was to find similar environmental patterns shifted in time.

Several results presented in this thesis have led to new contributions to agroecological knowledge, for instance in sugar cane and blackberry production.

Finally, in the framework of this thesis we developed several software tools: (1) a Matlab toolbox that implements the FGHSON algorithm, and (2) a program called BIS (Bio-inspired Identification of Similar agroecozones), an interactive graphical user interface tool which integrates the FGHSON algorithm with Google Earth in order to show zones with similar agroecological characteristics.