947 results for spatial clustering algorithms
Abstract:
Dissertation presented at Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, in fulfilment of the requirements for the Master's degree in Mathematics and Applications, specialization in Actuarial Sciences, Statistics and Operations Research
Abstract:
This paper characterizes medium voltage (MV) electric power consumers using a data clustering approach. The aim is to identify typical load profiles by selecting the best partition of a power consumption database from a pool of data partitions produced by several clustering algorithms. The best partition is selected using several cluster validity indices. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers’ behavior. The data-mining-based methodology presented throughout the paper consists of several steps, namely data pre-processing, application of the clustering algorithms, and evaluation of the quality of the partitions. To validate our approach, a case study with a real database of 1,022 MV consumers was used.
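The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the mean silhouette width as the single validity index (the abstract does not name the indices used), and the load profiles, algorithm names and labels are hypothetical toy data.

```python
from math import dist


def silhouette(points, labels):
    """Mean silhouette width of a partition (higher is better)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return -1.0  # the index is undefined for a single cluster
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_partition(points, candidates):
    """Return the name of the candidate partition with the highest index."""
    return max(candidates, key=lambda name: silhouette(points, candidates[name]))


# Hypothetical two-dimensional "load profiles" and three candidate partitions.
profiles = [(0.20, 0.90), (0.25, 0.85), (0.30, 0.95),
            (0.80, 0.30), (0.85, 0.25), (0.90, 0.35)]
candidates = {
    "algorithm_A": [0, 0, 0, 1, 1, 1],
    "algorithm_B": [0, 1, 0, 1, 0, 1],
    "algorithm_C": [0, 0, 1, 1, 2, 2],
}
chosen = best_partition(profiles, candidates)
```

With several validity indices, as in the paper, one would compute each index per partition and combine the rankings; how the authors combine them is not stated in the abstract.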
Abstract:
This paper presents the characterization of high voltage (HV) electric power consumers based on a data clustering approach. The typical load profiles (TLP) are obtained by selecting the best partition of a power consumption database from a pool of data partitions produced by several clustering algorithms. The choice of the best partition is supported by several cluster validity indices. The proposed data-mining (DM) based methodology, which covers all the steps of the process of knowledge discovery in databases (KDD), includes an automatic data treatment application that preprocesses the initial database, saving time and improving accuracy during this phase. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers’ consumption behavior. To validate our approach, a case study with a real database of 185 HV consumers was used.
Abstract:
A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Information Systems.
Abstract:
This paper analyses forest fires from the perspective of dynamical systems. Forest fires exhibit complex correlations in size, space and time, revealing features often present in complex systems, such as the absence of a characteristic length-scale, or the emergence of long-range correlations and persistent memory. This study addresses a public domain forest fire catalogue containing information on events in Portugal during the period from 1980 to 2012. The data are analysed on an annual basis, modelling the occurrences as sequences of Dirac impulses with amplitude proportional to the burnt area. First, we use mutual information to correlate annual patterns. We use visualization trees, generated by hierarchical clustering algorithms, to compare the data and extract relationships among them. Second, we adopt the Multidimensional Scaling (MDS) visualization tool. MDS generates maps where each object corresponds to a point. Objects that are perceived to be similar to each other are placed close together on the map, forming clusters. The results are analysed in order to extract relationships among the data and to identify forest fire patterns.
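As an illustration of the first step, the mutual information between two annual series can be estimated from discretized values. The sketch below is an assumption-laden example, not the paper's procedure: it presumes the burnt-area series have already been binned into discrete classes (the binning scheme is not given in the abstract), and the two short series are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equally long discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


# Hypothetical burnt-area classes (0 = small, 1 = large) for two years.
year_a = [0, 0, 1, 1]
year_b = [0, 1, 0, 1]
mi_same = mutual_information(year_a, year_a)   # identical patterns
mi_indep = mutual_information(year_a, year_b)  # unrelated patterns
```

A matrix of such pairwise values, converted to dissimilarities, is the kind of input that hierarchical clustering and MDS operate on.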
Abstract:
Further improvements in the implementation of demand response programs are needed in order to take full advantage of this resource, namely for participation in energy and reserve market products, which requires adequate aggregation and remuneration of small-size resources. The present paper focuses on SPIDER, a demand response simulator that has been improved to simulate demand response together with realistic power system simulation. To illustrate the simulator’s capabilities, the paper proposes a methodology focusing on the aggregation of consumers and generators, providing adequate tools for the adoption of the demand response program by the involved players. The proposed methodology focuses on a Virtual Power Player (VPP) that manages and aggregates the available demand response and distributed generation resources in order to satisfy the required electrical energy demand and reserve. The aggregation of resources is addressed by the use of clustering algorithms, and operation costs for the VPP are minimized. The presented case study is based on a set of 32 consumers and 66 distributed generation units, running on 180 distinct operation scenarios.
Abstract:
Dissertation to obtain the Master's degree in Biomedical Engineering
Abstract:
Human Activity Recognition systems require objective and reliable methods that can be used in the daily routine and must offer results consistent with the performed activities. These systems are under development and offer objective and personalized support for several applications, such as healthcare. This thesis aims to create a framework for human activity recognition based on accelerometry signals. Some new features and techniques inspired by audio recognition methodology are introduced in this work, namely Log Scale Power Bandwidth and the application of Markov Models. Forward Feature Selection was adopted as the feature selection algorithm in order to improve clustering performance and limit the computational demands. This method selects the most suitable set of features for activity recognition in accelerometry from a 423-dimensional feature vector. Several Machine Learning algorithms were applied to the accelerometry databases used (the FCHA and PAMAP databases), and these showed promising results in activity recognition. The developed set of algorithms constitutes a significant contribution to the development of reliable evaluation methods of movement disorders for diagnosis and treatment applications.
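Forward feature selection, as named above, is a greedy procedure: starting from an empty set, it repeatedly adds the single feature that most improves a quality score and stops when no addition helps. The following sketch shows that loop under stated assumptions; the feature names and the toy scoring function are hypothetical, and in the thesis the score would be a measure of clustering performance that the abstract does not specify.

```python
def forward_feature_selection(features, score, max_features=None):
    """Greedily add the feature that most improves score(subset).

    `score` is any callable that maps a list of feature names to a number
    (higher is better). Selection stops when no candidate improves the score.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate every remaining feature when added to the current subset.
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in remaining),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break  # no remaining feature improves the current subset
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Hypothetical per-feature usefulness, with a cost of 0.1 per selected feature.
gains = {"mean": 0.5, "std": 0.3, "noise": 0.0, "log_power": 0.15}


def toy_score(subset):
    return sum(gains[f] for f in subset) - 0.1 * len(subset)


chosen, final_score = forward_feature_selection(list(gains), toy_score)
```

Each round costs one score evaluation per remaining feature, which is why the method keeps computational demands manageable compared with searching all subsets of a 423-dimensional vector.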
Abstract:
Aim: We test for the congruence between allele-based range boundaries (break zones) in silicicolous alpine plants and species-based break zones in the silicicolous flora of the European Alps. We also ask whether such break zones coincide with areas of large elevational variation. Location: The European Alps. Methods: On a regular grid laid across the entire Alps, we determined areas of allele- and species-based break zones using respective clustering algorithms, identifying discontinuities in cluster distributions (breaks), and quantifying integrated break densities (break zones). Discontinuities were identified based on the intra-specific genetic variation of 12 species and on the floristic distribution data from 239 species, respectively. Coincidence between the two types of break zones was tested using Spearman's correlation. Break zone densities were also regressed on topographical complexity to test for the effect of elevational variation. Results: We found that two main break zones in the distribution of alleles and species were significantly correlated. Furthermore, we show that these break zones are in topographically complex regions, characterized by massive elevational ranges owing to high mountains and deep glacial valleys. We detected a third break zone in the distribution of species in the eastern Alps, which is not correlated with topographic complexity, and which is also not evident from allelic distribution patterns. Species with the potential for long-distance dispersal tended to show larger distribution ranges than short-distance dispersers. Main conclusions: We suggest that the history of Pleistocene glaciations is the main driver of the congruence between allele-based and species-based distribution patterns, because occurrences of both species and alleles were subject to the same processes (such as extinction, migration and drift) that shaped the distributions of species and genetic lineages. Large elevational ranges have had a profound effect as a dispersal barrier for alleles during post-glacial immigration. Because plant species, unlike alleles, cannot spread via pollen but only via seed, and thus disperse less effectively, we conclude that species break zones are maintained over longer time spans and reflect more ancient patterns than allele break zones.
Conny Thiel-Egenter and Nadir Alvarez contributed equally to this paper and are considered joint first authors.
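The coincidence test mentioned in the Methods relies on Spearman's correlation, which is the Pearson correlation computed on ranks. The sketch below shows that computation with average ranks for ties; the per-cell break densities are hypothetical values, and the study's handling of the grid data is not described in the abstract.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical break densities per grid cell for alleles and for species.
allele_breaks = [0.1, 0.4, 0.2, 0.9, 0.7]
species_breaks = [1.0, 3.5, 2.0, 8.0, 9.0]
rho = spearman(allele_breaks, species_breaks)
```

Because only ranks enter the calculation, the test is insensitive to the different scales of allele-based and species-based break densities.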
Abstract:
Studying patterns of species distributions along elevation gradients is frequently used to identify the primary factors that determine the distribution, diversity and assembly of species. However, despite their crucial role in ecosystem functioning, our understanding of the distribution of below-ground fungi is still limited, calling for more comprehensive studies of fungal biogeography along environmental gradients at various scales (from regional to global). Here, we investigated the richness of taxa of soil fungi and their phylogenetic diversity across a wide range of grassland types along a 2800 m elevation gradient at a large number of sites (213), stratified across a region of the Western Swiss Alps (700 km²). We used 454 pyrosequencing to obtain fungal sequences that were clustered into operational taxonomic units (OTUs). The OTU diversity-area relationship revealed uneven distribution of fungal taxa across the study area (i.e. not all taxa are everywhere) and fine-scale spatial clustering. Fungal richness and phylogenetic diversity were found to be higher at lower temperatures and under higher moisture conditions. Climatic and soil characteristics as well as plant community composition were related to OTU alpha, beta and phylogenetic diversity, with distinct fungal lineages suggesting distinct ecological tolerances. Soil fungi, thus, show lineage-specific biogeographic patterns, even at a regional scale, and follow environmental determinism, mediated by interactions with plants.
Abstract:
The quality of environmental data analysis and the propagation of errors are heavily affected by the representativeness of the initial sampling design [CRE 93, DEU 97, KAN 04a, LEN 06, MUL 07]. Geostatistical methods such as kriging rely on field samples, whose spatial distribution is crucial for the correct detection of the phenomena. Literature about the design of environmental monitoring networks (MN) is widespread, and several interesting books have recently been published [GRU 06, LEN 06, MUL 07] in order to clarify the basic principles of spatial sampling design. An approach to spatial sampling design (monitoring network optimization) based on Support Vector Machines has also been proposed. Nonetheless, modelers often receive real data coming from environmental monitoring networks that suffer from problems of non-homogeneity (clustering). Clustering can be related to preferential sampling or to the impossibility of reaching certain regions.
Abstract:
The purpose of this Master's thesis is to investigate what is required to automatically recognize the similarity of news items. The news items are text-based and have been retrieved from different news sources. The aim is to identify, first, the news items that concern the same matter and, second, those that are not exactly the same matter but are nevertheless related to each other. The thesis investigates which algorithms perform this recognition most effectively in both Finnish and English text. The thesis compares existing algorithms. The goal is to select a combination of algorithms such that 90% of the compared news items are identified correctly. The study uses 2 different clustering algorithms and 3 different stemming algorithms. These algorithms are compared with respect to both their recognition effectiveness and their performance. The Porter algorithm proved to be the best stemming algorithm for comparing both Finnish and English news. Of the clustering algorithms, the simpler one, which is based on various indicator values, proved to be the more effective.
Abstract:
We present a new global method for the identification of hotspots in conservation and ecology. The method is based on the identification of spatial structure properties through cumulative relative frequency distribution curves, and is tested with two case studies: the identification of fish density hotspots and of terrestrial vertebrate species diversity hotspots. Results from the frequency distribution method are compared with those from standard techniques among local, partially local and global methods. The main advantage of our approach is that it is independent of the selection of any threshold, neighborhood, or other parameter of the kind that affects most of the currently available methods for hotspot analysis. The two case studies show how such elements of arbitrariness in the traditional methods influence both the size and the location of the identified hotspots, and how this new global method can be used for a more objective selection of hotspots.
Abstract:
PURPOSE: The natural history of prostate cancer might be driven by the index lesion. We determined the percent of men in whom the index lesion could be defined using transperineal template prostate mapping biopsies. MATERIALS AND METHODS: Included in study were consecutive men undergoing transperineal template prostate mapping biopsies with biopsies grouped into 20 zones. Men with clinically significant disease in only 1 prostate area were considered to have an identifiable index lesion. We evaluated the impact of using 2 definitions of clinically significant disease (Gleason grade pattern 4 and/or lesion volume 0.5 cc or greater) and 2 clustering rules (stringent and tolerant) to define the index lesion. RESULTS: Included in study were 391 men with a median age of 62 years (IQR 58-67) and a median prostate specific antigen of 6.9 ng/ml (IQR 4.8-10.0). Of the men 269 (69%) were previously diagnosed with prostate cancer. By deploying a median of 1.2 cores per ml (IQR 0.9-1.7) cancer was diagnosed in 82.9% of the men (324 of 391) with a median of 6 positive cores (IQR 2-9), a median maximum cancer core length of 5 mm (IQR 3-8) and a total cancer core length per zone of 7 mm (IQR 3-13). Insignificant disease was found in 26.3% to 42.9% of cases. When a stringent spatial relationship was used to define individual lesions, 44.4% to 54.6% of patients had 1 index lesion and 12.7% to 19.1% had more than 1 area with clinically significant disease. These proportions changed to 46.6% to 59.2% and 10.5% to 14.5%, respectively, when less stringent spatial clustering was applied. CONCLUSIONS: Transperineal template prostate mapping biopsies enable the index lesion to be localized in most men with clinically significant disease. This information may be important to select appropriate candidates for targeted therapy and to plan a tailored treatment strategy in men undergoing radical therapy.
Abstract:
In image processing, segmentation algorithms constitute one of the main focuses of research. In this paper, new image segmentation algorithms based on a hard version of the information bottleneck method are presented. The objective of this method is to extract a compact representation of a variable, considered the input, with minimal loss of mutual information with respect to another variable, considered the output. First, we introduce a split-and-merge algorithm based on the definition of an information channel between a set of regions (input) of the image and the intensity histogram bins (output). From this channel, the maximization of the mutual information gain is used to optimize the image partitioning. Then, the merging process of the regions obtained in the previous phase is carried out by minimizing the loss of mutual information. From the inversion of the above channel, we also present a new histogram clustering algorithm based on the minimization of the mutual information loss, where now the input variable represents the histogram bins and the output is given by the set of regions obtained from the above split-and-merge algorithm. Finally, we introduce two new clustering algorithms which show how the information bottleneck method can be applied to the registration channel obtained when two multimodal images are correctly aligned. Different experiments on 2-D and 3-D images show the behavior of the proposed algorithms.
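The merging criterion described above, choosing the merge that loses the least mutual information, can be illustrated on a small joint distribution. This is a sketch under assumptions rather than the paper's algorithm: rows stand for image regions and columns for intensity histogram bins, the probabilities are hypothetical, and the loss is obtained by directly recomputing the mutual information after each trial merge.

```python
from math import log2


def mutual_information(joint):
    """I(X;Y) in bits for a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        p * log2(p / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )


def merge_rows(joint, a, b):
    """Return a new table in which rows a and b are replaced by their sum."""
    merged = [pa + pb for pa, pb in zip(joint[a], joint[b])]
    kept = [row for k, row in enumerate(joint) if k not in (a, b)]
    return kept + [merged]


def cheapest_merge(joint):
    """Find the pair of rows whose merge loses the least mutual information."""
    base = mutual_information(joint)
    pairs = [(a, b) for a in range(len(joint)) for b in range(a + 1, len(joint))]
    return min(
        (base - mutual_information(merge_rows(joint, a, b)), (a, b))
        for a, b in pairs
    )


# Hypothetical joint distribution p(region, intensity bin); entries sum to 1.
joint = [
    [0.30, 0.03],  # region 0: mostly dark pixels
    [0.27, 0.06],  # region 1: mostly dark pixels
    [0.04, 0.30],  # region 2: mostly bright pixels
]
loss, pair = cheapest_merge(joint)
```

Regions 0 and 1 have similar intensity distributions, so merging them costs little information; applying the same functions to the transposed table would correspond to clustering histogram bins instead of regions.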