819 resultados para Hybrid clustering algorithm
Resumo:
K-means algorithm is a well known nonhierarchical method for clustering data. The most important limitations of this algorithm are that: (1) it gives final clusters on the basis of the cluster centroids or the seed points chosen initially, and (2) it is appropriate for data sets having fairly isotropic clusters. But this algorithm has the advantage of low computation and storage requirements. On the other hand, hierarchical agglomerative clustering algorithm, which can cluster nonisotropic (chain-like and concentric) clusters, requires high storage and computation requirements. This paper suggests a new method for selecting the initial seed points, so that theK-means algorithm gives the same results for any input data order. This paper also describes a hybrid clustering algorithm, based on the concepts of multilevel theory, which is nonhierarchical at the first level and hierarchical from second level onwards, to cluster data sets having (i) chain-like clusters and (ii) concentric clusters. It is observed that this hybrid clustering algorithm gives the same results as the hierarchical clustering algorithm, with less computation and storage requirements.
Resumo:
The Capacitated Centered Clustering Problem (CCCP) consists of defining a set of p groups with minimum dissimilarity on a network with n points. Demand values are associated with each point and each group has a demand capacity. The problem is well known to be NP-hard and has many practical applications. In this paper, the hybrid method Clustering Search (CS) is implemented to solve the CCCP. This method identifies promising regions of the search space by generating solutions with a metaheuristic, such as Genetic Algorithm, and clustering them into clusters that are then explored further with local search heuristics. Computational results considering instances available in the literature are presented to demonstrate the efficacy of CS. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
A Software-as-a-Service or SaaS can be delivered in a composite form, consisting of a set of application and data components that work together to deliver higher-level functional software. Components in a composite SaaS may need to be scaled – replicated or deleted, to accommodate the user’s load. It may not be necessary to replicate all components of the SaaS, as some components can be shared by other instances. On the other hand, when the load is low, some of the instances may need to be deleted to avoid resource underutilisation. Thus, it is important to determine which components are to be scaled such that the performance of the SaaS is still maintained. Extensive research on the SaaS resource management in Cloud has not yet addressed the challenges of scaling process for composite SaaS. Therefore, a hybrid genetic algorithm is proposed in which it utilises the problem’s knowledge and explores the best combination of scaling plan for the components. Experimental results demonstrate that the proposed algorithm outperforms existing heuristic-based solutions.
Resumo:
This paper proposes a novel Hybrid Clustering approach for XML documents (HCX) that first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. The empirical analysis reveals that the proposed method is scalable and accurate.
Resumo:
With the size and state of the Internet today, a good quality approach to organizing this mass of information is of great importance. Clustering web pages into groups of similar documents is one approach, but relies heavily on good feature extraction and document representation as well as a good clustering approach and algorithm. Due to the changing nature of the Internet, resulting in a dynamic dataset, an incremental approach is preferred. In this work we propose an enhanced incremental clustering approach to develop a better clustering algorithm that can help to better organize the information available on the Internet in an incremental fashion. Experiments show that the enhanced algorithm outperforms the original histogram based algorithm by up to 7.5%.
Resumo:
Web service composition is an important problem in web service based systems. It is about how to build a new value-added web service using existing web services. A web service may have many implementations, all of which have the same functionality, but may have different QoS values. Thus, a significant research problem in web service composition is how to select a web service implementation for each of the web services such that the composite web service gives the best overall performance. This is so-called optimal web service selection problem. There may be mutual constraints between some web service implementations. Sometimes when an implementation is selected for one web service, a particular implementation for another web service must be selected. This is so called dependency constraint. Sometimes when an implementation for one web service is selected, a set of implementations for another web service must be excluded in the web service composition. This is so called conflict constraint. Thus, the optimal web service selection is a typical constrained ombinatorial optimization problem from the computational point of view. This paper proposes a new hybrid genetic algorithm for the optimal web service selection problem. The hybrid genetic algorithm has been implemented and evaluated. The evaluation results have shown that the hybrid genetic algorithm outperforms other two existing genetic algorithms when the number of web services and the number of constraints are large.
Resumo:
In the real world there are many problems in network of networks (NoNs) that can be abstracted to a so-called minimum interconnection cut problem, which is fundamentally different from those classical minimum cut problems in graph theory. Thus, it is desirable to propose an efficient and effective algorithm for the minimum interconnection cut problem. In this paper we formulate the problem in graph theory, transform it into a multi-objective and multi-constraint combinatorial optimization problem, and propose a hybrid genetic algorithm (HGA) for the problem. The HGA is a penalty-based genetic algorithm (GA) that incorporates an effective heuristic procedure to locally optimize the individuals in the population of the GA. The HGA has been implemented and evaluated by experiments. Experimental results have shown that the HGA is effective and efficient.
Resumo:
Server consolidation using virtualization technology has become an important technology to improve the energy efficiency of data centers. Virtual machine placement is the key in the server consolidation technology. In the past few years, many approaches to the virtual machine placement have been proposed. However, existing virtual machine placement approaches consider the energy consumption by physical machines only, but do not consider the energy consumption in communication network, in a data center. However, the energy consumption in the communication network in a data center is not trivial, and therefore should be considered in the virtual machine placement. In our preliminary research, we have proposed a genetic algorithm for a new virtual machine placement problem that considers the energy consumption in both physical machines and the communication network in a data center. Aiming at improving the performance and efficiency of the genetic algorithm, this paper presents a hybrid genetic algorithm for the energy-efficient virtual machine placement problem. Experimental results show that the hybrid genetic algorithm significantly outperforms the original genetic algorithm, and that the hybrid genetic algorithm is scalable.
Resumo:
This paper presents an efficient hybrid evolutionary optimization algorithm based on combining Ant Colony Optimization (ACO) and Simulated Annealing (SA), called ACO-SA, for distribution feeder reconfiguration (DFR) considering Distributed Generators (DGs). Due to private ownership of DGs, a cost based compensation method is used to encourage DGs in active and reactive power generation. The objective function is summation of electrical energy generated by DGs and substation bus (main bus) in the next day. The approach is tested on a real distribution feeder. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for solving DFR problem.
Resumo:
This paper deals with an efficient hybrid evolutionary optimization algorithm in accordance with combining the ant colony optimization (ACO) and the simulated annealing (SA), so called ACO-SA. The distribution feeder reconfiguration (DFR) is known as one of the most important control schemes in the distribution networks, which can be affected by distributed generations (DGs) for the multi-objective DFR. In such a case, DGs is used to minimize the real power loss, the deviation of nodes voltage and the number of switching operations. The approach is carried out on a real distribution feeder, where the simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for solving the DFR problem.
Resumo:
This paper presents a computational method for eliminating severe stress concentration at the unsupported railhead ends in rail joints through innovative shape optimization of the contact zone, which is complex due to near field nonlinear contact. With a view to minimizing the computational efforts, hybrid genetic algorithm method coupled with parametric finite element has been developed and compared with the traditional genetic algorithm (GA). The shape of railhead top surface where the wheel contacts nonlinearly was optimized using the hybridized GA method. Comparative study of the optimal result and the search efficiency between the traditional and hybrid GA methods has shown that the hybridized GA provides the optimal shape in fewer computational cycles without losing accuracy. The method will be beneficial to solving complex engineering problems involving contact nonlinearity.
Resumo:
The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and unfeasible.
Resumo:
Partitional clustering algorithms, which partition the dataset into a pre-defined number of clusters, can be broadly classified into two types: algorithms which explicitly take the number of clusters as input and algorithms that take the expected size of a cluster as input. In this paper, we propose a variant of the k-means algorithm and prove that it is more efficient than standard k-means algorithms. An important contribution of this paper is the establishment of a relation between the number of clusters and the size of the clusters in a dataset through the analysis of our algorithm. We also demonstrate that the integration of this algorithm as a pre-processing step in classification algorithms reduces their running-time complexity.