923 resultados para spatial clustering algorithms


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Most countries of Europe, as well as many countries in other parts of the world, are experiencing an increased impact of natural hazards. It is often speculated, but not yet proven, that climate change might influence the frequency and magnitude of certain hydro-meteorological natural hazards. What has certainly been observed is a sharp increase in financial losses caused by natural hazards worldwide. Eventhough Europe appears to be a space that is not affected by natural hazards to such catastrophic extents as other parts of the world are, the damages experienced here are certainly increasing too. Natural hazards, climate change and, in particular, risks have therefore recently been put high on the political agenda of the EU. In the search for appropriate instruments for mitigating impacts of natural hazards and climate change, as well as risks, the integration of these factors into spatial planning practices is constantly receiving higher attention. The focus of most approaches lies on single hazards and climate change mitigation strategies. The current paradigm shift of climate change mitigation to adaptation is used as a basis to draw conclusions and recommendations on what concepts could be further incorporated into spatial planning practices. Especially multi-hazard approaches are discussed as an important approach that should be developed further. One focal point is the definition and applicability of the terms natural hazard, vulnerability and risk in spatial planning practices. Especially vulnerability and risk concepts are so many-fold and complicated that their application in spatial planning has to be analysed most carefully. The PhD thesis is based on six published articles that describe the results of European research projects, which have elaborated strategies and tools for integrated communication and assessment practices on natural hazards and climate change impacts. The papers describe approaches on local, regional and European level, both from theoretical and practical perspectives. Based on these, passed, current and future potential spatial planning applications are reviewed and discussed. In conclusion it is recommended to shift from single hazard assessments to multi-hazard approaches, integrating potential climate change impacts. Vulnerability concepts should play a stronger role than present, and adaptation to natural hazards and climate change should be more emphasized in relation to mitigation. It is outlined that the integration of risk concepts in planning is rather complicated and would need very careful assessment to ensure applicability. Future spatial planning practices should also consider to be more interdisciplinary, i.e. to integrate as many stakeholders and experts as possible to ensure the sustainability of investments.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis presents novel modelling applications for environmental geospatial data using remote sensing, GIS and statistical modelling techniques. The studied themes can be classified into four main themes: (i) to develop advanced geospatial databases. Paper (I) demonstrates the creation of a geospatial database for the Glanville fritillary butterfly (Melitaea cinxia) in the Åland Islands, south-western Finland; (ii) to analyse species diversity and distribution using GIS techniques. Paper (II) presents a diversity and geographical distribution analysis for Scopulini moths at a world-wide scale; (iii) to study spatiotemporal forest cover change. Paper (III) presents a study of exotic and indigenous tree cover change detection in Taita Hills Kenya using airborne imagery and GIS analysis techniques; (iv) to explore predictive modelling techniques using geospatial data. In Paper (IV) human population occurrence and abundance in the Taita Hills highlands was predicted using the generalized additive modelling (GAM) technique. Paper (V) presents techniques to enhance fire prediction and burned area estimation at a regional scale in East Caprivi Namibia. Paper (VI) compares eight state-of-the-art predictive modelling methods to improve fire prediction, burned area estimation and fire risk mapping in East Caprivi Namibia. The results in Paper (I) showed that geospatial data can be managed effectively using advanced relational database management systems. Metapopulation data for Melitaea cinxia butterfly was successfully combined with GPS-delimited habitat patch information and climatic data. Using the geospatial database, spatial analyses were successfully conducted at habitat patch level or at more coarse analysis scales. Moreover, this study showed it appears evident that at a large-scale spatially correlated weather conditions are one of the primary causes of spatially correlated changes in Melitaea cinxia population sizes. In Paper (II) spatiotemporal characteristics of Socupulini moths description, diversity and distribution were analysed at a world-wide scale and for the first time GIS techniques were used for Scopulini moth geographical distribution analysis. This study revealed that Scopulini moths have a cosmopolitan distribution. The majority of the species have been described from the low latitudes, sub-Saharan Africa being the hot spot of species diversity. However, the taxonomical effort has been uneven among biogeographical regions. Paper III showed that forest cover change can be analysed in great detail using modern airborne imagery techniques and historical aerial photographs. However, when spatiotemporal forest cover change is studied care has to be taken in co-registration and image interpretation when historical black and white aerial photography is used. In Paper (IV) human population distribution and abundance could be modelled with fairly good results using geospatial predictors and non-Gaussian predictive modelling techniques. Moreover, land cover layer is not necessary needed as a predictor because first and second-order image texture measurements derived from satellite imagery had more power to explain the variation in dwelling unit occurrence and abundance. Paper V showed that generalized linear model (GLM) is a suitable technique for fire occurrence prediction and for burned area estimation. GLM based burned area estimations were found to be more superior than the existing MODIS burned area product (MCD45A1). However, spatial autocorrelation of fires has to be taken into account when using the GLM technique for fire occurrence prediction. Paper VI showed that novel statistical predictive modelling techniques can be used to improve fire prediction, burned area estimation and fire risk mapping at a regional scale. However, some noticeable variation between different predictive modelling techniques for fire occurrence prediction and burned area estimation existed.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Being at the crossroads of the Old World continents, Western Asia has a unique position through which the dispersal and migration of mammals and the interaction of faunal bioprovinces occurred. Despite its critical position, the record of Miocene mammals in Western Asia is sporadic and there are large spatial and temporal gaps between the known fossil localities. Although the development of the mammalian faunas in the Miocene of the Old World is well known and there is ample evidence for environmental shifts in this epoch, efforts toward quantification of habitat changes and development of chronofaunas based on faunal compositions were mostly neglected. Advancement of chronological, paleoclimatological, and paleogeographical reconstruction tools and techniques and increased numbers of new discoveries in recent decades have brought the need for updating and modification of our level of understanding. We under took fieldwork and systematic study of mammalian trace and body fossils from the northwestern parts of Iran along with analysis of large mammal data from the NOW database. The data analysis was used to study the provinciality, relative abundance, and distribution history of the closed- and open-adapted taxa and chronofaunas in the Miocene of the Old World and Western Asia. The provinciality analysis was carried out, using locality clustering, and the relative abundance of the closed- and open-adapted taxa was surveyed at the family level. The distribution history of the chronofaunas was studied, using faunal resemblance indices and new mapping techniques, together with humidity analysis based on mean ordinated hypsodonty. Paleoichnological studies revealed the abundance of mammalian footprints in several parts of the basins studied, which are normally not fossiliferous in terms of body fossils. The systematic study and biochronology of the newly discovered mammalian fossils in northwestern Iran indicates their close affinities with middle Turolian faunas. Large cranial remains of hipparionine horses, previously unknown in Iran and Western Asia, are among the material studied. The initiation of a new field project in the famous Maragheh locality also brings new opportunities to address questions regarding the chronology and paleoenvironment of this classical site. Provinciality analysis modified our previous level of understandings, indicating the interaction of four provinces in Western Asia. The development of these provinces was apparently due to the presence of high mountain ranges in the area, which affected the dispersal of mammals and also climatic patterns. Higher temperatures and possibly higher co2 levels in the Middle Miocene Climatic Optimum apparently favored the development of the closed forested environments that supported the dominance of the closed-adapted taxa. The increased seasonality and the progressive cooling and drying of the midlatitudes toward the Late Miocene maintained the dominance of open-adapted faunas. It appears that the late Middle Miocene was the time of transition from a more forested to a less forested world. The distribution history of the closed- and open-adapted chronofaunas shows the presence of cosmopolitan and endemic faunas in Western Asia. The closed-adapted faunas, such as the Arabian chronofauna of the late Early‒early Middle Miocene, demonstrated a rapid buildup and gradual decline. The open-adapted chronofaunas, such as the Late Miocene Maraghean fauna, climaxed gradually by filling the opening environments and moving in response to changes in humidity patterns. They abruptly declined due to demise of their favored environments. The Siwalikan chronofauna of the early Late Miocene remained endemic and restricted through all its history. This study highlights the importance of field investigations and indicates that new surveys in the vast areas of Western Asia, which are poorly sampled in terms of fossil mammal localities, can still be promising. Clustering of the localities supports the consistency of formerly known patterns and augments them. Although the quantitative approach to relative abundance history of the closed- and open-adapted mammals harks back to more than half a century ago, it is a novel technique providing robust results. Tracking the history of the chronofaunas in space and time by means of new computational and illustration methods is also a new practice that can be expanded to new areas and time spans.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The concept of feature selection in a nonparametric unsupervised learning environment is practically undeveloped because no true measure for the effectiveness of a feature exists in such an environment. The lack of a feature selection phase preceding the clustering process seriously affects the reliability of such learning. New concepts such as significant features, level of significance of features, and immediate neighborhood are introduced which result in meeting implicitly the need for feature slection in the context of clustering techniques.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Algorithms are described for the basic arithmetic operations and square rooting in a negative base. A new operation called polarization that reverses the sign of a number facilitates subtraction, using addition. Some special features of the negative-base arithmetic are also mentioned.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Bacteria play an important role in many ecological systems. The molecular characterization of bacteria using either cultivation-dependent or cultivation-independent methods reveals the large scale of bacterial diversity in natural communities, and the vastness of subpopulations within a species or genus. Understanding how bacterial diversity varies across different environments and also within populations should provide insights into many important questions of bacterial evolution and population dynamics. This thesis presents novel statistical methods for analyzing bacterial diversity using widely employed molecular fingerprinting techniques. The first objective of this thesis was to develop Bayesian clustering models to identify bacterial population structures. Bacterial isolates were identified using multilous sequence typing (MLST), and Bayesian clustering models were used to explore the evolutionary relationships among isolates. Our method involves the inference of genetic population structures via an unsupervised clustering framework where the dependence between loci is represented using graphical models. The population dynamics that generate such a population stratification were investigated using a stochastic model, in which homologous recombination between subpopulations can be quantified within a gene flow network. The second part of the thesis focuses on cluster analysis of community compositional data produced by two different cultivation-independent analyses: terminal restriction fragment length polymorphism (T-RFLP) analysis, and fatty acid methyl ester (FAME) analysis. The cluster analysis aims to group bacterial communities that are similar in composition, which is an important step for understanding the overall influences of environmental and ecological perturbations on bacterial diversity. A common feature of T-RFLP and FAME data is zero-inflation, which indicates that the observation of a zero value is much more frequent than would be expected, for example, from a Poisson distribution in the discrete case, or a Gaussian distribution in the continuous case. We provided two strategies for modeling zero-inflation in the clustering framework, which were validated by both synthetic and empirical complex data sets. We show in the thesis that our model that takes into account dependencies between loci in MLST data can produce better clustering results than those methods which assume independent loci. Furthermore, computer algorithms that are efficient in analyzing large scale data were adopted for meeting the increasing computational need. Our method that detects homologous recombination in subpopulations may provide a theoretical criterion for defining bacterial species. The clustering of bacterial community data include T-RFLP and FAME provides an initial effort for discovering the evolutionary dynamics that structure and maintain bacterial diversity in the natural environment.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Disease maps are effective tools for explaining and predicting patterns of disease outcomes across geographical space, identifying areas of potentially elevated risk, and formulating and validating aetiological hypotheses for a disease. Bayesian models have become a standard approach to disease mapping in recent decades. This article aims to provide a basic understanding of the key concepts involved in Bayesian disease mapping methods for areal data. It is anticipated that this will help in interpretation of published maps, and provide a useful starting point for anyone interested in running disease mapping methods for areal data. The article provides detailed motivation and descriptions on disease mapping methods by explaining the concepts, defining the technical terms, and illustrating the utility of disease mapping for epidemiological research by demonstrating various ways of visualising model outputs using a case study. The target audience includes spatial scientists in health and other fields, policy or decision makers, health geographers, spatial analysts, public health professionals, and epidemiologists.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The ever expanding growth of the wireless access to the Internet in recent years has led to the proliferation of wireless and mobile devices to connect to the Internet. This has created the possibility of mobile devices equipped with multiple radio interfaces to connect to the Internet using any of several wireless access network technologies such as GPRS, WLAN and WiMAX in order to get the connectivity best suited for the application. These access networks are highly heterogeneous and they vary widely in their characteristics such as bandwidth, propagation delay and geographical coverage. The mechanism by which a mobile device switches between these access networks during an ongoing connection is referred to as vertical handoff and it often results in an abrupt and significant change in the access link characteristics. The most common Internet applications such as Web browsing and e-mail make use of the Transmission Control Protocol (TCP) as their transport protocol and the behaviour of TCP depends on the end-to-end path characteristics such as bandwidth and round-trip time (RTT). As the wireless access link is most likely the bottleneck of a TCP end-to-end path, the abrupt changes in the link characteristics due to a vertical handoff may affect TCP behaviour adversely degrading the performance of the application. The focus of this thesis is to study the effect of a vertical handoff on TCP behaviour and to propose algorithms that improve the handoff behaviour of TCP using cross-layer information about the changes in the access link characteristics. We begin this study by identifying the various problems of TCP due to a vertical handoff based on extensive simulation experiments. We use this study as a basis to develop cross-layer assisted TCP algorithms in handoff scenarios involving GPRS and WLAN access networks. We then extend the scope of the study by developing cross-layer assisted TCP algorithms in a broader context applicable to a wide range of bandwidth and delay changes during a handoff. And finally, the algorithms developed here are shown to be easily extendable to the multiple-TCP flow scenario. We evaluate the proposed algorithms by comparison with standard TCP (TCP SACK) and show that the proposed algorithms are effective in improving TCP behavior in vertical handoff involving a wide range of bandwidth and delay of the access networks. Our algorithms are easy to implement in real systems and they involve modifications to the TCP sender algorithm only. The proposed algorithms are conservative in nature and they do not adversely affect the performance of TCP in the absence of cross-layer information.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The analysis of sequential data is required in many diverse areas such as telecommunications, stock market analysis, and bioinformatics. A basic problem related to the analysis of sequential data is the sequence segmentation problem. A sequence segmentation is a partition of the sequence into a number of non-overlapping segments that cover all data points, such that each segment is as homogeneous as possible. This problem can be solved optimally using a standard dynamic programming algorithm. In the first part of the thesis, we present a new approximation algorithm for the sequence segmentation problem. This algorithm has smaller running time than the optimal dynamic programming algorithm, while it has bounded approximation ratio. The basic idea is to divide the input sequence into subsequences, solve the problem optimally in each subsequence, and then appropriately combine the solutions to the subproblems into one final solution. In the second part of the thesis, we study alternative segmentation models that are devised to better fit the data. More specifically, we focus on clustered segmentations and segmentations with rearrangements. While in the standard segmentation of a multidimensional sequence all dimensions share the same segment boundaries, in a clustered segmentation the multidimensional sequence is segmented in such a way that dimensions are allowed to form clusters. Each cluster of dimensions is then segmented separately. We formally define the problem of clustered segmentations and we experimentally show that segmenting sequences using this segmentation model, leads to solutions with smaller error for the same model cost. Segmentation with rearrangements is a novel variation to the segmentation problem: in addition to partitioning the sequence we also seek to apply a limited amount of reordering, so that the overall representation error is minimized. We formulate the problem of segmentation with rearrangements and we show that it is an NP-hard problem to solve or even to approximate. We devise effective algorithms for the proposed problem, combining ideas from dynamic programming and outlier detection algorithms in sequences. In the final part of the thesis, we discuss the problem of aggregating results of segmentation algorithms on the same set of data points. In this case, we are interested in producing a partitioning of the data that agrees as much as possible with the input partitions. We show that this problem can be solved optimally in polynomial time using dynamic programming. Furthermore, we show that not all data points are candidates for segment boundaries in the optimal solution.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantation is based on the so-called Normalized Maximum Likelihood (NML) distribution which has been shown to possess several important theoretical properties. However, the applications of this modern version of the MDL have been quite rare because of computational complexity problems, i.e., for discrete data, the definition of NML involves an exponential sum, and in the case of continuous data, a multi-dimensional integral usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Matrix decompositions, where a given matrix is represented as a product of two other matrices, are regularly used in data mining. Most matrix decompositions have their roots in linear algebra, but the needs of data mining are not always those of linear algebra. In data mining one needs to have results that are interpretable -- and what is considered interpretable in data mining can be very different to what is considered interpretable in linear algebra. --- The purpose of this thesis is to study matrix decompositions that directly address the issue of interpretability. An example is a decomposition of binary matrices where the factor matrices are assumed to be binary and the matrix multiplication is Boolean. The restriction to binary factor matrices increases interpretability -- factor matrices are of the same type as the original matrix -- and allows the use of Boolean matrix multiplication, which is often more intuitive than normal matrix multiplication with binary matrices. Also several other decomposition methods are described, and the computational complexity of computing them is studied together with the hardness of approximating the related optimization problems. Based on these studies, algorithms for constructing the decompositions are proposed. Constructing the decompositions turns out to be computationally hard, and the proposed algorithms are mostly based on various heuristics. Nevertheless, the algorithms are shown to be capable of finding good results in empirical experiments conducted with both synthetic and real-world data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The metabolism of an organism consists of a network of biochemical reactions that transform small molecules, or metabolites, into others in order to produce energy and building blocks for essential macromolecules. The goal of metabolic flux analysis is to uncover the rates, or the fluxes, of those biochemical reactions. In a steady state, the sum of the fluxes that produce an internal metabolite is equal to the sum of the fluxes that consume the same molecule. Thus the steady state imposes linear balance constraints to the fluxes. In general, the balance constraints imposed by the steady state are not sufficient to uncover all the fluxes of a metabolic network. The fluxes through cycles and alternative pathways between the same source and target metabolites remain unknown. More information about the fluxes can be obtained from isotopic labelling experiments, where a cell population is fed with labelled nutrients, such as glucose that contains 13C atoms. Labels are then transferred by biochemical reactions to other metabolites. The relative abundances of different labelling patterns in internal metabolites depend on the fluxes of pathways producing them. Thus, the relative abundances of different labelling patterns contain information about the fluxes that cannot be uncovered from the balance constraints derived from the steady state. The field of research that estimates the fluxes utilizing the measured constraints to the relative abundances of different labelling patterns induced by 13C labelled nutrients is called 13C metabolic flux analysis. There exist two approaches of 13C metabolic flux analysis. In the optimization approach, a non-linear optimization task, where candidate fluxes are iteratively generated until they fit to the measured abundances of different labelling patterns, is constructed. In the direct approach, linear balance constraints given by the steady state are augmented with linear constraints derived from the abundances of different labelling patterns of metabolites. Thus, mathematically involved non-linear optimization methods that can get stuck to the local optima can be avoided. On the other hand, the direct approach may require more measurement data than the optimization approach to obtain the same flux information. Furthermore, the optimization framework can easily be applied regardless of the labelling measurement technology and with all network topologies. In this thesis we present a formal computational framework for direct 13C metabolic flux analysis. The aim of our study is to construct as many linear constraints to the fluxes from the 13C labelling measurements using only computational methods that avoid non-linear techniques and are independent from the type of measurement data, the labelling of external nutrients and the topology of the metabolic network. The presented framework is the first representative of the direct approach for 13C metabolic flux analysis that is free from restricting assumptions made about these parameters.In our framework, measurement data is first propagated from the measured metabolites to other metabolites. The propagation is facilitated by the flow analysis of metabolite fragments in the network. Then new linear constraints to the fluxes are derived from the propagated data by applying the techniques of linear algebra.Based on the results of the fragment flow analysis, we also present an experiment planning method that selects sets of metabolites whose relative abundances of different labelling patterns are most useful for 13C metabolic flux analysis. Furthermore, we give computational tools to process raw 13C labelling data produced by tandem mass spectrometry to a form suitable for 13C metabolic flux analysis.