931 results for k-Means algorithm
Abstract:
Recently there has been considerable interest in dynamic textures due to the explosive growth of multimedia databases. In addition, dynamic texture appears in a wide range of videos, which makes it very important for applications concerned with modeling physical phenomena. Thus, dynamic textures have emerged as a new field of investigation that extends static, or spatial, textures to the spatio-temporal domain. In this paper, we propose a novel approach to dynamic texture segmentation based on automata theory and the k-means algorithm. In this approach, a feature vector is extracted for each pixel by applying deterministic partially self-avoiding walks on three orthogonal planes of the video. These feature vectors are then clustered by the well-known k-means algorithm. Although the k-means algorithm has shown interesting results, it only guarantees convergence to a local minimum, which affects the final segmentation result. In order to overcome this drawback, we compare six initialization methods for k-means. The experimental results demonstrate the effectiveness of the proposed approach compared to state-of-the-art segmentation methods.
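The local-minimum sensitivity mentioned above is commonly mitigated by running k-means from several initializations and keeping the lowest-inertia result. The following is a minimal plain-Python sketch of that idea (illustrative only; the `init` and `seed` parameters are assumptions for the example, and the paper's six specific initialization schemes are not reproduced here):

```python
import random

def kmeans(points, k, iters=50, seed=0, init=None):
    """Lloyd's algorithm from one initialization; returns (centers, inertia)."""
    rng = random.Random(seed)
    centers = list(init) if init is not None else rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest current center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Update step: each center becomes its cluster's mean
        # (an emptied cluster keeps its previous center).
        centers = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    inertia = sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
                  for p in points)
    return centers, inertia

def best_of_restarts(points, k, restarts=10):
    """Re-run from several random seeds and keep the lowest-inertia solution."""
    return min((kmeans(points, k, seed=s) for s in range(restarts)),
               key=lambda r: r[1])
```

Restarts are only one standard remedy; careful seeding schemes such as k-means++ are another, which is the kind of comparison the paper carries out.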
Abstract:
A non-hierarchical K-means algorithm is used to cluster 47 years (1960–2006) of 10-day HYSPLIT backward trajectories to the Pico Mountain (PM) observatory on a seasonal basis. The resulting cluster centers identify the major transport pathways and collectively comprise a long-term climatology of transport to the observatory. This transport climatology improves our ability to interpret the observations made there and our understanding of pollution source regions for the station and the central North Atlantic region. I determine which pathways dominate transport to the observatory and examine the impacts of these transport patterns on the O3, NOy, NOx, and CO measurements made there during 2001–2006. Transport from the U.S., Canada, and the Atlantic most frequently reaches the station, but Europe, east Africa, and the Pacific can also contribute significantly depending on the season. Transport from Canada was correlated with the North Atlantic Oscillation (NAO) in spring and winter, while transport from the Pacific was uncorrelated with the NAO. The highest CO and O3 are observed during spring. Summer is also characterized by high CO and O3, and by the highest NOy and NOx of any season. Previous studies at the station attributed the summertime high CO and O3 to transport of boreal wildfire emissions (for 2002–2004), and boreal fires continued to affect the station during 2005 and 2006. The particle dispersion model FLEXPART was used to calculate anthropogenic and biomass-burning CO tracer values at the station in an attempt to identify the regions responsible for the high CO and O3 observations during spring and the biomass-burning impacts in summer.
Abstract:
Magnetic resonance temperature imaging (MRTI) is recognized as a noninvasive means to provide temperature imaging for guidance in thermal therapies. The most common method of estimating temperature changes in the body using MR is by measuring the water proton resonant frequency (PRF) shift. Calculation of the complex phase difference (CPD) is the method of choice for measuring the PRF indirectly, since it facilitates temperature mapping with high spatiotemporal resolution. Chemical shift imaging (CSI) techniques can provide the PRF directly with high sensitivity to temperature changes while minimizing artifacts commonly seen in CPD techniques. However, CSI techniques are currently limited by poor spatiotemporal resolution. This research intends to develop and validate a CSI-based MRTI technique with intentional spectral undersampling, which allows relaxed acquisition parameters that improve spatiotemporal resolution. An algorithm based on autoregressive moving average (ARMA) modeling is developed and validated to help overcome limitations of Fourier-based analysis, allowing highly accurate and precise PRF estimates. From the determined acquisition parameters and ARMA modeling, robust maps of temperature using the k-means algorithm are generated and validated in laser treatments in ex vivo tissue. The use of non-PRF based measurements provided by the technique is also investigated to aid in the validation of thermal damage predicted by an Arrhenius rate dose model.
Abstract:
We present a test for identifying clusters in high dimensional data based on the k-means algorithm when the null hypothesis is spherical normal. We show that projection techniques used for evaluating validity of clusters may be misleading for such data. In particular, we demonstrate that increasingly well-separated clusters are identified as the dimensionality increases, when no such clusters exist. Furthermore, in a case of true bimodality, increasing the dimensionality makes identifying the correct clusters more difficult. In addition to the original conservative test, we propose a practical test with the same asymptotic behavior that performs well for a moderate number of points and moderate dimensionality. ACM Computing Classification System (1998): I.5.3.
Abstract:
The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be better achieved than what is possible from only a single source.

Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources; representing the information effectively; developing analytical tools; and interpreting the results in the context of the domain.

The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools, SVM and KNN, were used to successfully distinguish between several soil samples.

The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to have better performance than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources.

The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.

The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
Abstract:
Ground Delay Programs (GDP) are sometimes cancelled before their initially planned end time, and for this reason aircraft are delayed when it is no longer needed. Recovering this delay usually leads to extra fuel consumption, since the aircraft will typically depart after having absorbed their assigned delay on the ground and, therefore, will need to cruise at more fuel-consuming speeds. Past research has proposed a speed reduction strategy aimed at splitting the GDP-assigned delay between ground and airborne delay while using the same fuel as in nominal conditions. Being airborne earlier, an aircraft can speed up to its nominal cruise speed and recover part of the GDP delay without incurring extra fuel consumption if the GDP is cancelled earlier than planned. In this paper, all GDP initiatives that occurred at San Francisco International Airport during 2006 are studied and grouped by a K-means algorithm into three different clusters. The centroids of these three clusters have been used to simulate three different GDPs at the airport, using a realistic set of inbound traffic and the Future Air Traffic Management Concepts Evaluation Tool (FACET). The amount of delay that can be recovered using this cruise speed reduction technique, as a function of the GDP cancellation time, has been computed and compared with the delay recovered under the current concept of operations. Simulations have been conducted in a calm wind situation and without considering a radius of exemption. Results indicate that when aircraft depart early and fly at the slower speed, they can recover additional delay in the event the GDP cancels early, compared to current operations where all delay is absorbed prior to take-off. The amount of extra delay recovered varies, and is more significant, in relative terms, for those GDPs with a relatively low amount of demand exceeding the airport capacity.
Abstract:
Radial basis functions can be combined into a network structure that has several advantages over conventional neural network solutions. However, to operate effectively, the number and positions of the basis function centres must be carefully selected. Although no rigorous algorithm exists for this purpose, several heuristic methods have been suggested. In this paper a new method is proposed in which radial basis function centres are selected by the mean-tracking clustering algorithm. The mean-tracking algorithm is compared with k-means clustering, and it is shown to achieve significantly better results in terms of radial basis function performance. As well as being computationally simpler, the mean-tracking algorithm in general selects better centre positions, thus providing the radial basis functions with better modelling accuracy.
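Once centres have been selected (by mean-tracking, k-means, or any other heuristic), fitting the output weights of a Gaussian RBF network reduces to a linear least-squares problem. A self-contained plain-Python sketch for one-dimensional inputs (illustrative only, not the authors' implementation; the Gaussian basis and the `width` parameter are assumptions for the example):

```python
import math

def gaussian_rbf(r, width=1.0):
    """Gaussian basis function of distance r from a centre."""
    return math.exp(-(r / width) ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system Ax = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_rbf_weights(xs, ys, centers, width=1.0):
    """Least-squares output weights via the normal equations (A^T A) w = A^T y."""
    A = [[gaussian_rbf(abs(x - c), width) for c in centers] for x in xs]
    k = len(centers)
    G = [[sum(A[i][p] * A[i][q] for i in range(len(xs))) for q in range(k)]
         for p in range(k)]
    rhs = [sum(A[i][p] * ys[i] for i in range(len(xs))) for p in range(k)]
    return solve(G, rhs)

def predict(x, centers, weights, width=1.0):
    """Evaluate the fitted RBF expansion at a new input x."""
    return sum(w * gaussian_rbf(abs(x - c), width) for w, c in zip(weights, centers))
```

Because only the weights are solved for, the modelling accuracy hinges entirely on the centre positions, which is exactly what the centre-selection comparison above is measuring.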
Abstract:
A fast backward elimination algorithm is introduced based on a QR decomposition and Givens transformations to prune radial-basis-function networks. Nodes are sequentially removed using an increment of error variance criterion. The procedure is terminated by using a prediction risk criterion so as to obtain a model structure with good generalisation properties. The algorithm can be used to postprocess radial basis centres selected using a k-means routine and, in this mode, it provides a hybrid supervised centre selection approach.
Abstract:
A large amount of biological data has been produced in recent years. Important knowledge can be extracted from these data by the use of data analysis techniques. Clustering plays an important role in data analysis by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature; however, each algorithm has its own bias, being more adequate for particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover, it presents a clustering algorithm to solve this formulation that uses GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three other well-known algorithms. The proposed algorithm presented the best clustering results, as confirmed statistically. (C) 2009 Elsevier Ltd. All rights reserved.
Abstract:
Clustering data is a very important task in data mining, image processing and pattern recognition problems. One of the most popular clustering algorithms is Fuzzy C-Means (FCM). This thesis proposes a new way of calculating the cluster centers in the FCM procedure, called ckMeans, and applies it to some variants of FCM, in particular those that use other distances. The goal of this change is to reduce the number of iterations and the processing time of these algorithms without affecting the quality of the partition, or even to improve the number of correct classifications in some cases. We also developed an algorithm based on ckMeans to handle interval data with interval membership degrees. This algorithm allows the representation of data without converting interval data into point data, as happens in other extensions of FCM that deal with interval data. In order to validate the proposed methodologies, a comparison was made between the ckMeans, K-Means and FCM algorithms (since the center calculation proposed here is similar to that of K-Means), considering three different distances and several well-known databases. The results of Interval ckMeans were also compared with those of other clustering algorithms when applied to an interval database containing the minimum and maximum monthly temperatures for a given year in 37 cities distributed across the continents.
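For reference, the standard FCM iteration that ckMeans modifies alternates a membership update with a centre update. A plain-Python sketch of standard FCM with Euclidean distance (illustrative only; the ckMeans centre computation itself is not reproduced here, and the initial centres are supplied by the caller):

```python
def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def fcm(points, centers, m=2.0, iters=100):
    """Standard fuzzy c-means from given initial centres; returns (centers, U)."""
    c = len(centers)
    U = [[0.0] * c for _ in points]
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        for i, p in enumerate(points):
            d = [dist(p, v) for v in centers]
            if 0.0 in d:
                z = d.index(0.0)  # point coincides with a centre: crisp membership
                U[i] = [1.0 if j == z else 0.0 for j in range(c)]
                continue
            for j in range(c):
                U[i][j] = 1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0))
                                    for k in range(c))
        # Centre update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m.
        new = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(len(points))]
            tot = sum(w)
            new.append(tuple(sum(wi * p[dim] for wi, p in zip(w, points)) / tot
                             for dim in range(len(points[0]))))
        centers = new
    return centers, U
```

Since the centre update runs once per iteration, a cheaper centre computation of the kind ckMeans proposes directly reduces per-iteration cost, and a better-placed centre can also cut the number of iterations needed.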
Abstract:
Industrial applications of computer vision sometimes require detection of atypical objects that occur as small groups of pixels in digital images. These objects are difficult to single out because they are small and randomly distributed. In this work we propose an image segmentation method using the novel Ant System-based Clustering Algorithm (ASCA). ASCA models the foraging behaviour of ants, which move through the data space searching for high data-density regions and leave pheromone trails on their path. The pheromone map is used to identify the exact number of clusters and to assign the pixels to these clusters using the pheromone gradient. We applied ASCA to the detection of microcalcifications in digital mammograms and compared its performance with state-of-the-art clustering algorithms such as 1D Self-Organizing Map, k-Means, Fuzzy c-Means and Possibilistic Fuzzy c-Means. The main advantage of ASCA is that the number of clusters need not be known a priori. The experimental results show that ASCA is more efficient than the other algorithms in detecting small clusters of atypical data.
Abstract:
In this paper we present an efficient k-Means clustering algorithm for two-dimensional data. The proposed algorithm re-organizes the dataset into the form of a nested binary tree. Data items are compared at each node with only the two nearest means with respect to each dimension and assigned to the one with the closer mean. The main intuition of our research is as follows: we build the nested binary tree, then scan the data in raster order by in-order traversal of the tree, and lastly compare the data item at each node to only the two nearest means to assign it to the intended cluster. In this way we save computational cost significantly by reducing the number of comparisons with means and by minimal use of the Euclidean distance formula. Our results showed that our method can perform the clustering operation much faster than the classical ones. © Springer-Verlag Berlin Heidelberg 2005
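The core trick above, comparing each item against only its two nearest means, is easiest to see in one dimension, where the sorted means bracket every value. A plain-Python sketch of that one-dimensional case (illustrative only; the paper's nested binary tree over two-dimensional data is not reproduced here):

```python
import bisect

def assign_two_candidates(values, means):
    """For 1-D data, a value's nearest mean is one of the two means that
    bracket it in sorted order, so two comparisons per point suffice."""
    ms = sorted(means)
    labels = []
    for v in values:
        i = bisect.bisect_left(ms, v)
        # Candidate means: the one just below v and the one just above it.
        cands = [j for j in (i - 1, i) if 0 <= j < len(ms)]
        labels.append(min(cands, key=lambda j: abs(ms[j] - v)))
    return ms, labels
```

Each assignment then costs one binary search plus at most two comparisons instead of k full distance evaluations, which is the kind of saving the tree-based scheme generalizes to two dimensions.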
Abstract:
Remotely sensed imagery has been widely used for land use/cover classification thanks to periodic data acquisition and the widespread use of digital image processing systems offering a wide range of classification algorithms. The aim of this work was to evaluate some of the most commonly used supervised and unsupervised classification algorithms under different landscape patterns found in Rondônia, including (1) areas of mid-size farms, (2) fish-bone settlements and (3) a gradient of forest and Cerrado (Brazilian savannah). Comparison with a reference map based on the kappa statistic resulted in good to superior indicators (best results - K-means: k=0.68; k=0.77; k=0.64 and MaxVer: k=0.71; k=0.89; k=0.70, respectively, for the three areas mentioned). Results show that choosing a specific algorithm requires taking into account both its capacity to discriminate among various spectral signatures under different landscape patterns and a cost/benefit analysis considering the different steps performed by the operator producing a land cover/use map. It is suggested that a more systematic assessment of the several implementation options for a specific project is needed prior to beginning a land use/cover mapping job.
Abstract:
Understanding the ecological role of benthic microalgae, a highly productive component of coral reef ecosystems, requires information on their spatial distribution. The spatial extent of benthic microalgae on Heron Reef (southern Great Barrier Reef, Australia) was mapped using data from the Landsat 5 Thematic Mapper sensor, integrated with field measurements of sediment chlorophyll concentration and reflectance. Field-measured sediment chlorophyll concentrations, ranging from 23-1,153 mg chl a m(-2), were classified into low, medium, and high concentration classes (1-170, 171-290, and >291 mg chl a m(-2)) using a K-means clustering algorithm. The mapping process assumed that areas in the Thematic Mapper image exhibiting similar reflectance levels in red and blue bands would correspond to areas of similar chlorophyll a levels. Regions of homogeneous reflectance values corresponding to low, medium, and high chlorophyll levels were identified over the reef sediment zone by applying a standard image classification algorithm to the Thematic Mapper image. The resulting distribution map revealed large-scale (>1 km(2)) patterns in chlorophyll a levels throughout the sediment zone of Heron Reef. Reef-wide estimates of chlorophyll a distribution indicate that benthic microalgae may constitute up to 20% of the total benthic chlorophyll a at Heron Reef, and thus contribute significantly to total primary productivity on the reef.
Abstract:
A methodology based on data mining techniques to support the analysis of zonal prices in real transmission networks is proposed in this paper. The methodology uses clustering algorithms to group the buses into typical classes, each comprising a set of buses with similar Locational Marginal Price (LMP) values. Two different clustering algorithms have been used to determine the LMP clusters: the two-step and K-means algorithms. In order to evaluate the quality of the partition, as well as to identify the best-performing algorithm, adequacy measurement indices are used. The paper includes a case study using an LMP database from the California ISO (CAISO) in order to identify zonal prices.