923 results for spatial clustering algorithms


Relevance:

80.00%

Abstract:

Empirical evidence shows that repositories of business process models used in industrial practice contain significant amounts of duplication. This duplication arises, for example, when the repository covers multiple variants of the same processes or due to copy-pasting. Previous work has addressed the problem of efficiently retrieving exact clones that can be refactored into shared subprocess models. This article studies the broader problem of approximate clone detection in process models. The article proposes techniques for detecting clusters of approximate clones based on two well-known clustering algorithms: DBSCAN and Hierarchical Agglomerative Clustering (HAC). The article also defines a measure of standardizability of an approximate clone cluster, meaning the potential benefit of replacing the approximate clones with a single standardized subprocess. Experiments show that both techniques, in conjunction with the proposed standardizability measure, accurately retrieve clusters of approximate clones that originate from copy-pasting followed by independent modifications to the copied fragments. Additional experiments show that both techniques produce clusters that match those produced by human subjects and that are perceived to be standardizable.
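The record above is prose only. As a rough illustration of the two clustering strategies it names, a hedged sketch in Python might cluster a precomputed fragment-distance matrix with DBSCAN and with average-linkage agglomerative clustering; the random `points` array and all thresholds below are placeholders, not the paper's data or distance measure.

```python
# Hedged sketch: clustering approximate clones from a precomputed
# distance matrix D (a stand-in for fragment distances between
# process-model fragments; D here is synthetic, not the paper's data).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
points = rng.random((30, 4))                      # stand-in fragment features
D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)

# DBSCAN over the precomputed distances: fragments within `eps` of a
# core fragment share a cluster, isolated fragments become noise (-1).
dbscan_labels = DBSCAN(eps=0.4, min_samples=2, metric="precomputed").fit_predict(D)

# Hierarchical agglomerative clustering (average linkage) on the same
# distances, cut at a distance threshold to obtain flat clusters.
Z = linkage(squareform(D, checks=False), method="average")
hac_labels = fcluster(Z, t=0.4, criterion="distance")

print(dbscan_labels)
print(hac_labels)
```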

Relevance:

80.00%

Abstract:

Transit passenger market segmentation enables transit operators to target different classes of transit users for targeted surveys and various operational and strategic planning improvements. However, the existing market segmentation studies in the literature have generally been done using passenger surveys, which have various limitations. The smart card (SC) data from an automated fare collection system facilitate the understanding of the multiday travel pattern of transit passengers and can be used to segment them into identifiable types with similar behaviors and needs. This paper proposes a comprehensive methodology for passenger segmentation solely using SC data. After reconstructing the travel itineraries from SC transactions, this paper adopts the density-based spatial clustering of applications with noise (DBSCAN) algorithm to mine the travel pattern of each SC user. An a priori market segmentation approach then segments transit passengers into four identifiable types. The methodology proposed in this paper helps transit operators understand their passengers and provide them with targeted information and services.
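As a hedged sketch of the pattern-mining step described above, DBSCAN could be run on one card holder's travel days; the feature choice here (first boarding hour plus boarding-stop coordinates) and the parameter values are assumptions, not the paper's specification.

```python
# Hedged sketch: mining one smart-card user's habitual travel pattern
# with DBSCAN. Features and parameters are illustrative assumptions.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# One row per travel day: [first boarding hour, stop latitude, stop longitude]
days = np.array([
    [7.5, -27.47, 153.02],
    [7.6, -27.47, 153.02],
    [7.4, -27.47, 153.03],
    [13.0, -27.50, 153.10],   # an irregular, off-pattern day
])

X = StandardScaler().fit_transform(days)
labels = DBSCAN(eps=0.8, min_samples=2).fit_predict(X)

# Days sharing a cluster label form a habitual pattern; label -1 marks
# irregular travel, which can feed an a priori segmentation step.
print(labels)
```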

Relevance:

80.00%

Abstract:

Objective: We sought to assess the effect of long-term exposure to ambient air pollution on the prevalence of self-reported health outcomes in Australian women.
Design: Cross-sectional study.
Setting and participants: The geocoded residential addresses of 26 991 women across 3 age cohorts in the Australian Longitudinal Study on Women's Health between 2006 and 2011 were linked to nitrogen dioxide (NO2) exposure estimates from a land-use regression model. Annual average NO2 concentrations and residential proximity to roads were used as proxies of exposure to ambient air pollution.
Outcome measures: Self-reported disease presence for diabetes mellitus, heart disease, hypertension, stroke, asthma, chronic obstructive pulmonary disease and self-reported symptoms of allergies, breathing difficulties, chest pain and palpitations.
Methods: Disease prevalence was modelled by population-averaged Poisson regression models estimated by generalised estimating equations. Associations between symptoms and ambient air pollution were modelled by multilevel mixed logistic regression. Spatial clustering was accounted for at the postcode level.
Results: No associations were observed between any of the outcome and exposure variables considered at the 1% significance level after adjusting for known risk factors and confounders.
Conclusions: Long-term exposure to ambient air pollution was not associated with self-reported disease prevalence in Australian women. The observed results may have been due to exposure and outcome misclassification, lack of power to detect weak associations or an actual absence of associations with self-reported outcomes at the relatively low annual average air pollution exposure levels across Australia.
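A hedged sketch of the modelling strategy described in the Methods (a population-averaged Poisson model fitted by GEE with clustering at the postcode level) could look as follows in Python with statsmodels; the variable names (disease, no2, age, postcode) and the toy data frame are placeholders, not the study's dataset.

```python
# Hedged sketch: population-averaged Poisson regression via GEE, with
# spatial clustering handled through exchangeable correlation within
# postcode. All data below are synthetic placeholders.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "disease":  [0, 1, 0, 0, 1, 0, 1, 0],
    "no2":      [8.1, 12.3, 9.7, 7.4, 14.2, 10.5, 13.8, 6.9],  # annual mean NO2
    "age":      [45, 62, 50, 38, 70, 55, 66, 41],
    "postcode": ["4000", "4000", "4101", "4101", "4102", "4102", "4103", "4103"],
})

model = smf.gee(
    "disease ~ no2 + age",
    groups="postcode",                      # cluster observations by postcode
    data=df,
    family=sm.families.Poisson(),           # prevalence ratios via log link
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()
print(result.summary())
```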

Relevance:

80.00%

Abstract:

Cell transition data is obtained from a cellular phone that switches its current serving cell tower. The data consists of a sequence of transition events, which are pairs of cell identifiers and transition times. The focus of this thesis is applying data mining methods to such data, developing new algorithms, and extracting knowledge that will be a solid foundation on which to build location-aware applications. In addition to a thorough exploration of the features of the data, the tools and methods developed in this thesis provide solutions to three distinct research problems. First, we develop clustering algorithms that produce a reliable mapping between cell transitions and physical locations observed by users of mobile devices. The main clustering algorithm operates in an online fashion, and we also consider a number of offline clustering methods for comparison. Second, we define the concept of significant locations, known as bases, and give an online algorithm for determining them. Finally, we consider the task of predicting the movement of the user, based on historical data. We develop a prediction algorithm that considers paths of movement in their entirety, instead of just the most recent movement history. All of the presented methods are evaluated with a significant body of real cell transition data, collected from about one hundred different individuals. The algorithms developed in this thesis are designed to be implemented on a mobile device, and require no extra hardware sensors or network infrastructure. By not relying on external services and keeping the user information as much as possible on the user's own personal device, we avoid privacy issues and let the users control the disclosure of their location information.
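The thesis's own algorithms are not reproduced in this record. As a toy sketch of the general online-clustering idea it alludes to, one could merge cell identifiers into the same place cluster whenever the phone oscillates between them quickly; the time threshold and the union-find grouping below are assumptions for illustration only.

```python
# Toy sketch (not the thesis's algorithm): online grouping of cell IDs
# into location clusters. Cells linked by rapid transitions are merged,
# assuming fast oscillation between towers means one physical place.
FAST_TRANSITION_S = 60          # assumed threshold, in seconds

parent = {}                     # union-find over cell identifiers

def find(c):
    parent.setdefault(c, c)
    while parent[c] != c:
        parent[c] = parent[parent[c]]   # path halving
        c = parent[c]
    return c

def union(a, b):
    parent[find(a)] = find(b)

def process(transitions):
    """transitions: iterable of (timestamp_s, cell_id) pairs, in time order."""
    prev_t, prev_cell = None, None
    for t, cell in transitions:
        if prev_cell is not None and t - prev_t < FAST_TRANSITION_S:
            union(prev_cell, cell)      # fast switch -> same place
        prev_t, prev_cell = t, cell

events = [(0, "A"), (20, "B"), (35, "A"), (4000, "C"), (4010, "D")]
process(events)
clusters = {}
for cell in parent:
    clusters.setdefault(find(cell), []).append(cell)
print(clusters)    # two place clusters: {A, B} and {C, D}
```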

Relevance:

80.00%

Abstract:

The ~2500 km-long Himalaya plate boundary experienced three great earthquakes during the past century, but none of them generated any surface rupture. The segments between the 1905-1934 and the 1897-1950 sources, known as the central and Assam seismic gaps respectively, have long been considered to hold potential for future great earthquakes. This paper addresses two issues concerning earthquakes along the Himalaya plate boundary: one, the absence of surface rupture associated with the great earthquakes, vis-a-vis the purported large slip observed from paleoseismological investigations; and two, the current understanding of the status of the seismic gaps in the Central Himalaya and Assam, in view of the paleoseismological and historical data being gathered. We suggest that the ruptures of earthquakes nucleating on the basal detachment are likely to be restricted by the crustal ramps and thus generate no surface ruptures, whereas those originating on the faults within the wedges promote upward propagation of rupture and displacement, as observed during the 2005 Kashmir earthquake, which showed a peak offset of 7 m. The occasional reactivation of these thrust systems within the duplex zone may also be responsible for the observed temporal and spatial clustering of earthquakes in the Himalaya. Observations presented in this paper suggest that the last major earthquake in the Central Himalaya occurred during AD 1119-1292, rather than in 1505 as suggested in some previous studies, and thus the gap in the plate boundary events is real. As for the Northwestern Himalaya, seismically generated sedimentary features identified in the 1950 source region are generally younger than AD 1400 and evidence for older events is sketchy. The 1897 Shillong earthquake is not a decollement event and its predecessor is probably ~1000 years old. Compared to the Central Himalaya, the Assam Gap is a corridor of low seismicity between two tectonically independent seismogenic source zones and cannot be considered a seismic gap in the conventional sense.

Relevance:

80.00%

Abstract:

The objective of this study was to describe the physical and ichthyological changes occurring seasonally and annually in south San Francisco Bay, based on the results of 2,561 otter trawl and water samples obtained between February 1973 and June 1982. Temperature varied predictably among seasons in a pattern that varied little between years. Salinity also underwent predictable seasonal changes, but the pattern varied substantially between years. The most abundant species of fish were northern anchovy (Engraulis mordax), English sole (Parophrys vetulus), and shiner surfperch (Cymatogaster aggregata). The majority of the common fish species were most abundant during wet years and least abundant in dry years. Numeric diversity was highest during the spring and early summer, with no detectable interannual trends. Species composition changed extensively between seasons and between years, particularly years with extremely high or extremely low freshwater inflows. All the common species exhibited clustered spatial distributions. Such spatial clustering could affect the interpretation of data from estuarine sampling programs. Gobies (Family Gobiidae) were more abundant during flood tides than during ebb tides. English sole were significantly more abundant in shallower areas. Shiner surfperch showed significant differences in abundance between sample areas.

Relevance:

80.00%

Abstract:

Background: Gene expression technologies have opened up new ways to diagnose and treat cancer and other diseases. Clustering algorithms are a useful approach with which to analyze genome expression data. They attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. An important problem associated with gene classification is to discern whether the clustering process can find a relevant partition, as well as the identification of new gene classes. There are two key aspects to classification: the estimation of the number of clusters, and the decision as to whether a new unit (gene, tumor sample, ...) belongs to one of these previously identified clusters or to a new group. Results: ICGE is a user-friendly R package which provides many functions related to this problem: identify the number of clusters using mixed variables of the kind usually encountered by applied biomedical researchers; detect whether the data have a cluster structure; identify whether a new unit belongs to one of the pre-identified clusters or to a novel group; and classify new units into the corresponding cluster. The functions in the ICGE package are accompanied by help files and easy examples to facilitate their use. Conclusions: We demonstrate the utility of ICGE by analyzing simulated and real data sets. The results show that ICGE could be very useful to a broad research community.
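The decision rule described above (assign a new unit to an existing cluster or declare a new group) can be illustrated generically; the sketch below is a plain Python stand-in for the concept, not the ICGE R API, and its threshold rule is an assumption.

```python
# Hedged sketch of the generic decision rule: assign a new unit to the
# nearest existing cluster, or flag a new group when it lies far from
# all of them relative to within-cluster spread. Not the ICGE package.
import numpy as np

def classify_new_unit(x, clusters, new_group_factor=2.0):
    """clusters: list of (n_i, d) arrays of units already grouped."""
    best_label, best_dist, best_spread = None, np.inf, 0.0
    for label, members in enumerate(clusters):
        centroid = members.mean(axis=0)
        # typical within-cluster spread, used to scale the decision
        spread = np.linalg.norm(members - centroid, axis=1).mean()
        dist = np.linalg.norm(x - centroid)
        if dist < best_dist:
            best_label, best_dist, best_spread = label, dist, spread
    if best_dist > new_group_factor * max(best_spread, 1e-9):
        return "new group"
    return best_label

rng = np.random.default_rng(1)
clusters = [rng.normal(0, 1, (20, 5)), rng.normal(6, 1, (20, 5))]
print(classify_new_unit(rng.normal(0, 1, 5), clusters))   # nearest cluster: 0
print(classify_new_unit(np.full(5, 30.0), clusters))      # far away: "new group"
```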

Relevance:

80.00%

Abstract:

The evaluation and comparison of internal cluster validity indices is a critical problem in the clustering area. The methodology used in most of the evaluations assumes that the clustering algorithms work correctly. We propose an alternative methodology that does not make this often false assumption. We compared 7 internal cluster validity indices with both methodologies and concluded that the results obtained with the proposed methodology are more representative of the actual capabilities of the compared indices.
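The record does not list the seven indices compared. As a hedged illustration of scoring partitions without assuming the clustering algorithm got them right, the sketch below evaluates a reasonable partition, an over-split one, and a random one with three common scikit-learn indices (chosen as examples, not the paper's set).

```python
# Hedged illustration: internal validity indices applied to partitions
# of varying quality, including a deliberately bad one.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

partitions = {
    "k=3 (reasonable)": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "k=10 (over-split)": KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X),
    "random labels": np.random.default_rng(0).integers(0, 3, size=len(X)),
}

for name, labels in partitions.items():
    print(name,
          round(silhouette_score(X, labels), 3),          # higher is better
          round(calinski_harabasz_score(X, labels), 1),   # higher is better
          round(davies_bouldin_score(X, labels), 3))      # lower is better
```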

Relevance:

80.00%

Abstract:

Radioactive sources contain radionuclides. A radionuclide is an atom with an unstable nucleus, that is, a nucleus characterized by excess energy available to be emitted. In this process the radionuclide undergoes radioactive decay and emits gamma rays and subatomic particles, constituting ionizing radiation. Radioactivity is therefore the spontaneous emission of energy from unstable atoms. Correct identification of radionuclides can be crucial for planning protective measures, especially in emergency situations, by defining the type of radiation source and its radiological hazard. This dissertation presents the application of the subtractive clustering method, implemented in hardware, to a radioactive element identification system with a fast and efficient response. When implemented in software, clustering algorithms consume a great deal of processing time, so a dedicated implementation on reconfigurable hardware is a good option for embedded systems that require real-time execution as well as low power consumption. The proposed architecture for the subtractive clustering hardware is scalable, allowing additional subtractive clustering units to be included and operated in parallel. This provides greater flexibility to accelerate the process according to time and area constraints. The results show that the cluster centre can be identified with good efficiency, and identifying these points allows the radioactive elements present in a sample to be classified. Using this hardware it was possible to identify more than one cluster centre, which allows more than one radionuclide to be recognized in radioactive sources. These results show that the proposed hardware can be used to develop a portable system for radionuclide identification.
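For reference, a software sketch of the subtractive clustering method the dissertation maps to hardware is shown below (a Chiu-style formulation); the radii, the simplified stopping rule and the synthetic data are assumptions, not the dissertation's configuration.

```python
# Hedged sketch of subtractive clustering in Python, as a software
# reference for the method implemented in hardware by the dissertation.
import numpy as np

def subtractive_clustering(X, ra=0.5, rb=0.75, accept=0.5):
    alpha = 4.0 / ra**2
    beta = 4.0 / rb**2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    potential = np.exp(-alpha * d2).sum(axis=1)           # density potential per point
    centers = []
    first_peak = potential.max()
    while True:
        i = int(potential.argmax())
        if potential[i] < accept * first_peak:            # simplified stopping rule
            break
        centers.append(X[i].copy())
        # subtract the influence of the chosen centre from all potentials
        potential -= potential[i] * np.exp(-beta * ((X - X[i]) ** 2).sum(axis=1))
    return np.array(centers)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.2, 0.05, (40, 2)), rng.normal(0.8, 0.05, (40, 2))])
print(subtractive_clustering(X))    # roughly one centre per simulated peak
```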

Relevance:

80.00%

Abstract:

Clare, A. and King, R.D. (2002) How well do we understand the clusters found in microarray data? In Silico Biology, 2, 0046.

Relevance:

80.00%

Abstract:

Semi-autonomous avatars should be both realistic and believable. The goal is to learn from and reproduce the behaviours of the user-controlled input to enable semi-autonomous avatars to plausibly interact with their human-controlled counterparts. A powerful tool for embedding autonomous behaviour is learning by imitation. Hence, in this paper an ensemble of fuzzy inference systems clusters the user input data to identify natural groupings within the data and to describe the user's movement and actions in a more abstract way. Multiple clustering algorithms are investigated along with a neuro-fuzzy classifier, and an ensemble of fuzzy systems is evaluated.
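As a hedged sketch of one kind of fuzzy clustering such a system might investigate, a minimal fuzzy c-means loop in NumPy is shown below; this is a generic implementation, not the authors' ensemble, and the number of clusters, fuzzifier m and iteration count are assumptions.

```python
# Hedged sketch: a minimal fuzzy c-means (FCM) clustering loop.
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)             # random fuzzy memberships
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1) + 1e-12
        # membership update: inverse-distance ratios raised to 2/(m-1)
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
    return centers, U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
centers, U = fuzzy_c_means(X, c=2)
print(centers)                    # approximately the two simulated group means
print(U[:3].round(2))             # soft memberships for the first few samples
```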

Relevance:

80.00%

Abstract:

This paper describes a methodology that was developed for the classification of Medium Voltage (MV) electricity customers. Starting from a sample of databases resulting from a monitoring campaign, Data Mining (DM) techniques are used to discover a set of typical load profiles of MV consumers and, therefore, to extract knowledge regarding electric energy consumption patterns. In the first stage, several hierarchical clustering algorithms were applied and their clustering performance was compared using adequacy measures. In the second stage, a classification model was developed to allow classifying new consumers into one of the clusters obtained in the previous stage. Finally, the interpretation of the discovered knowledge is presented and discussed.
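A hedged sketch of the two-stage idea follows: compare hierarchical clusterings of daily load profiles with an adequacy measure (silhouette is used here as one common choice, not necessarily the paper's), then fit a simple classifier so new consumers can be assigned to a cluster. The synthetic profiles and chosen linkages are assumptions.

```python
# Hedged sketch: stage 1 compares hierarchical clusterings of load
# profiles; stage 2 trains a classifier on the resulting cluster labels.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 2 * np.pi, 24))
profiles = np.vstack([base + rng.normal(0, 0.1, 24) for _ in range(30)] +
                     [np.roll(base, 8) + rng.normal(0, 0.1, 24) for _ in range(30)])

# Stage 1: try several linkages and keep the best-scoring partition.
best = None
for method in ("ward", "average", "complete"):
    labels = fcluster(linkage(profiles, method=method), t=2, criterion="maxclust")
    score = silhouette_score(profiles, labels)
    if best is None or score > best[0]:
        best = (score, method, labels)

score, method, labels = best
print(method, round(score, 3))

# Stage 2: a classifier trained on the cluster labels assigns new consumers.
clf = KNeighborsClassifier(n_neighbors=3).fit(profiles, labels)
new_consumer = base + rng.normal(0, 0.1, 24)
print(clf.predict([new_consumer]))
```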

Relevance:

80.00%

Abstract:

In recent decades, all over the world, competition in the electric power sector has deeply changed the way this sector's agents play their roles. In most countries, the deregulation process was conducted in stages, beginning with the clients of higher voltage levels and with larger electricity consumption, and later extended to all electrical consumers. The sector liberalization and the operation of competitive electricity markets were expected to lower prices and improve quality of service, leading to greater consumer satisfaction. Transmission and distribution remain noncompetitive business areas, due to the large infrastructure investments required. However, the industry has yet to clearly establish the best business model for transmission in a competitive environment. After generation, the electricity needs to be delivered to the electrical system nodes where demand requires it, taking into consideration transmission constraints and electrical losses. If the amount of power flowing through a certain line is close to or surpasses the safety limits, then cheap but distant generation might have to be replaced by more expensive closer generation to reduce the exceeded power flows. In a congested area, the optimal price of electricity rises to the marginal cost of the local generation or to the level needed to ration demand to the amount of available electricity. Even without congestion, some power will be lost in the transmission system through heat dissipation, so prices reflect that it is more expensive to supply electricity at the far end of a heavily loaded line than close to a generation source. Locational marginal pricing (LMP), resulting from bidding competition, represents electrical and economical values at nodes or in areas that may provide economical indicator signals to the market agents. This article proposes a data-mining-based methodology that helps characterize zonal prices in real power transmission networks. To test our methodology, we used an LMP database from the California Independent System Operator for 2009 to identify economical zones. (CAISO is a nonprofit public benefit corporation charged with operating the majority of California's high-voltage wholesale power grid.) To group the buses into typical classes that represent a set of buses with approximately the same LMP value, we used two-step and k-means clustering algorithms. By analyzing the various LMP components, our goal was to extract knowledge to support the ISO in investment and network-expansion planning.
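A hedged sketch of grouping buses into zones by their LMP profiles with k-means (one of the two algorithms the article uses; two-step clustering is a proprietary method and is not reproduced here) might look as follows; the hourly LMP matrix below is synthetic, not CAISO data.

```python
# Hedged sketch: k-means on a synthetic bus-by-hour LMP matrix to form
# zonal classes. Data and cluster count are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
hours = 24
# rows = buses, columns = hourly LMPs; two synthetic congestion zones
zone_a = 30 + rng.normal(0, 1, (40, hours))
zone_b = 45 + rng.normal(0, 1, (40, hours))           # congested, pricier zone
lmp = np.vstack([zone_a, zone_b])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(lmp)
print(kmeans.labels_[:5], kmeans.labels_[-5:])        # zone membership per bus
print(kmeans.cluster_centers_.mean(axis=1).round(1))  # average zonal price
```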

Relevance:

80.00%

Abstract:

A methodology based on data mining techniques to support the analysis of zonal prices in real transmission networks is proposed in this paper. The methodology uses clustering algorithms to group the buses into typical classes, each comprising a set of buses with similar LMP values. Two different clustering algorithms have been used to determine the LMP clusters: the two-step and K-means algorithms. Adequacy measure indices are used to evaluate the quality of the partitions as well as to identify the best-performing algorithm. The paper includes a case study using a Locational Marginal Prices (LMP) database from the California ISO (CAISO) in order to identify zonal prices.
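As a hedged sketch of the adequacy-measure step, candidate zonal partitions of a synthetic LMP matrix can be scored and compared; silhouette and Davies-Bouldin stand in for the paper's adequacy indices, which are not named in the record above.

```python
# Hedged sketch: score candidate zonal partitions of synthetic LMP data
# with two common internal indices and compare cluster counts.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

rng = np.random.default_rng(1)
lmp = np.vstack([level + rng.normal(0, 1, (30, 24)) for level in (28, 40, 55)])

for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(lmp)
    print(k,
          round(silhouette_score(lmp, labels), 3),      # higher is better
          round(davies_bouldin_score(lmp, labels), 3))  # lower is better
```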

Relevance:

80.00%

Abstract:

This paper analyses forest fires from the perspective of dynamical systems. Forest fires exhibit complex correlations in size, space and time, revealing features often present in complex systems, such as the absence of a characteristic length-scale, or the emergence of long range correlations and persistent memory. This study addresses a public domain forest fires catalogue, containing information on events in Portugal during the period from 1980 up to 2012. The data are analysed on an annual basis, modelling the occurrences as sequences of Dirac impulses with amplitude proportional to the burnt area. First, we consider mutual information to correlate annual patterns. We use visualization trees, generated by hierarchical clustering algorithms, in order to compare and to extract relationships among the data. Second, we adopt the Multidimensional Scaling (MDS) visualization tool. MDS generates maps where each object corresponds to a point. Objects that are perceived to be similar to each other are placed on the map forming clusters. The results are analysed in order to extract relationships among the data and to identify forest fire patterns.
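A hedged sketch of the analysis pipeline described above could compute pairwise mutual information between (synthetic) annual burnt-area series, turn it into a dissimilarity, and then build both a hierarchical-clustering tree and an MDS map from the same precomputed matrix; the binning scheme and the MI-to-distance conversion below are assumptions.

```python
# Hedged sketch: mutual-information dissimilarity between annual series,
# then hierarchical clustering and MDS on the precomputed matrix.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.metrics import mutual_info_score
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
n_years, n_days = 10, 365
series = rng.gamma(0.3, 5.0, (n_years, n_days))          # daily burnt area per year
binned = np.digitize(series, np.quantile(series, [0.5, 0.8, 0.95]))

# pairwise dissimilarity: higher mutual information -> smaller distance
D = np.zeros((n_years, n_years))
for i in range(n_years):
    for j in range(i + 1, n_years):
        mi = mutual_info_score(binned[i], binned[j])
        D[i, j] = D[j, i] = 1.0 / (1.0 + mi)

Z = linkage(squareform(D, checks=False), method="average")   # visualization tree
print(fcluster(Z, t=3, criterion="maxclust"))                 # flat annual groups

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)                 # 2-D map of years
print(coords.round(2))
```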