160 results for Evolutionary clustering


We propose a new data-induced metric to perform unsupervised data classification (clustering). Our goal is to automatically recognize clusters of non-convex shape. We present a new version of the fuzzy c-means algorithm, based on the data-induced metric, which is capable of identifying non-convex d-dimensional clusters.
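
For reference, a minimal sketch of standard fuzzy c-means using the ordinary Euclidean distance. The data-induced metric proposed above is not reproduced here; using it would mean replacing the distance computation marked in the comments. The function name `fuzzy_c_means`, the fuzzifier `m = 2` and the toy points are illustrative assumptions.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means with Euclidean distance (sketch).

    Returns (centers, u), where u[i][j] is the membership of point i in
    cluster j; each row of u sums to 1. m > 1 is the fuzzifier.
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tw = sum(w)
            centers.append([sum(w[i] * points[i][k] for i in range(n)) / tw
                            for k in range(d)])
        # Membership update: u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1)).
        # A data-induced metric would replace this Euclidean distance.
        for i in range(n):
            dist = [max(sum((points[i][k] - ctr[k]) ** 2 for k in range(d)) ** 0.5, 1e-12)
                    for ctr in centers]
            for j in range(c):
                u[i][j] = 1.0 / sum((dist[j] / dist[l]) ** (2.0 / (m - 1.0))
                                    for l in range(c))
    return centers, u


# Hypothetical toy data: two well-separated groups in 2-D.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fuzzy_c_means(pts, c=2)
```

With Euclidean distance the clusters found are essentially convex; the point of the paper's metric is to lift that restriction.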


Engineering management often involves multiple objectives, such as minimum cost and maximum service capacity. Although solution methods for multiobjective optimization problems have undergone continual development over the past several decades, the methods available to date are not particularly robust, and none of them performs well across broad classes of problems. Because genetic algorithms work with a population of points, they can capture a number of solutions simultaneously and can easily incorporate the concept of a Pareto optimal set into their optimization process. In this paper, a genetic algorithm is modified to handle the rehabilitation planning of bridge decks at the network level by minimizing the rehabilitation cost and the deterioration degree simultaneously.
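
The Pareto optimal set mentioned above rests on a dominance test. A minimal sketch of that test follows: it keeps the non-dominated plans from a list of (rehabilitation cost, deterioration degree) pairs, both minimised. This is only the filtering step a multiobjective GA might apply to its population each generation, not the authors' modified algorithm; the function names and plan values are hypothetical.

```python
def dominates(a, b):
    """True if plan a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(plans):
    """Return the plans not dominated by any other plan (all objectives minimised)."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


# Hypothetical (cost, deterioration) pairs for five rehabilitation plans.
plans = [(100, 0.8), (150, 0.5), (120, 0.9), (200, 0.2), (160, 0.6)]
front = pareto_front(plans)
```

Here (120, 0.9) is dropped because (100, 0.8) is cheaper and less deteriorated, and (160, 0.6) is dropped because of (150, 0.5); the remaining plans are trade-offs that no single plan beats on both objectives.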


Multiple objectives generally exist in transportation infrastructure management, such as minimum cost and maximum service capacity. Although solution methods for multiobjective optimization problems have undergone continual development over the past several decades, the methods available to date are not particularly robust, and none of them performs well across broad classes of problems. Because genetic algorithms work with a population of points, they can capture a number of solutions simultaneously and can easily incorporate the concept of a Pareto optimal set into their optimization process. In this paper, a genetic algorithm is modified for an empirical application: the rehabilitation planning of bridge decks at the network level, minimizing the rehabilitation cost and the deterioration degree simultaneously.


For many clustering algorithms, such as k-means, EM, and CLOPE, the user usually has to set some parameters. Often, these parameters directly or indirectly control the number of clusters, k, to return. Given differing data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in a data set. This is especially true of text collections such as Web documents, images, or biological data. In an effort to improve the effectiveness of clustering, we seek the answer to a fundamental question: how can we effectively estimate the number of clusters in a given data set? As a solution, we propose an efficient method based on spectral analysis of the eigenvalues (not the eigenvectors) of the data set. We first present the relationship between a data set and its underlying spectra, with theoretical and experimental results. We then show how our method can suggest a range of k that is well suited to different analysis contexts. Finally, we present further empirical results showing how the answer to this fundamental question enhances the clustering process for large text collections.
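
The abstract does not spell out its spectral criterion. A widely used related heuristic is the eigengap rule: build a similarity graph, compute the eigenvalues of its normalised Laplacian, and choose k at the largest gap among the smallest eigenvalues. The sketch below assumes a Gaussian affinity with a hand-picked width `sigma` and uses a plain Jacobi eigenvalue routine so that it needs no third-party libraries. It illustrates the general idea only and is not the method of the paper; `estimate_k` and `jacobi_eigenvalues` are hypothetical names.

```python
import math


def jacobi_eigenvalues(a, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, sorted ascending."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def estimate_k(points, sigma=1.0, max_k=None):
    """Eigengap heuristic on the normalised Laplacian of a Gaussian affinity graph.

    Assumes every point has a neighbour with non-negligible affinity
    (otherwise a degree is zero and the normalisation fails).
    """
    n = len(points)
    w = [[0.0 if i == j else
          math.exp(-sum((x - y) ** 2 for x, y in zip(points[i], points[j])) / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    lap = [[(1.0 if i == j else 0.0) - w[i][j] / math.sqrt(deg[i] * deg[j])
            for j in range(n)] for i in range(n)]
    eig = jacobi_eigenvalues(lap)
    max_k = max_k or n - 1
    gaps = [eig[i + 1] - eig[i] for i in range(max_k)]
    return gaps.index(max(gaps)) + 1, eig


# Hypothetical toy data: two well-separated groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
k, eig = estimate_k(pts, sigma=1.0)
```

Each well-separated group contributes one near-zero eigenvalue, so the largest jump appears right after the second eigenvalue. On real text collections the gaps are less clear-cut, which is one reason to report a range of k rather than a single value.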


Clustering of multivariate data is a commonly used technique in ecology, and many approaches to clustering are available. The results from a clustering algorithm are uncertain, but few clustering approaches explicitly acknowledge this uncertainty. One exception is Bayesian mixture modelling, which treats all results probabilistically and allows comparison of multiple plausible classifications of the same data set. We used this method, implemented in the AutoClass program, to classify catchments (watersheds) in the Murray Darling Basin (MDB), Australia, based on their physiographic characteristics (e.g. slope, rainfall, lithology). The most likely classification found nine classes of catchments. Members of each class were aggregated geographically within the MDB. Rainfall and slope were the two most important variables that defined classes. The second-most likely classification was very similar to the first, but had one fewer class. Increasing the nominal uncertainty of continuous data resulted in a most likely classification with five classes, which were again aggregated geographically. Membership probabilities suggested that a small number of cases could be members of either of two classes. Such cases were located on the edges of groups of catchments belonging to one class, adjacent to a group belonging to the second-most likely class. A comparison of the Bayesian approach with a distance-based deterministic method showed that the Bayesian mixture model produced solutions that were more spatially cohesive and intuitively appealing. The probabilistic presentation of results from the Bayesian classification allows richer interpretation, including decisions on how to treat cases that are intermediate between two or more classes, and whether to consider more than one classification. The explicit consideration and presentation of uncertainty makes this approach useful for ecological investigations, where both data and expectations are often highly uncertain.
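
AutoClass performs a full Bayesian search over classifications, including the number of classes, which is beyond a short example. The sketch below shows only the ingredient that yields membership probabilities: expectation-maximisation (EM) for a one-dimensional Gaussian mixture with a fixed number of classes. It is a maximum-likelihood simplification, not AutoClass, and the function names, starting means and data values are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture(xs, means, iters=200, min_var=1e-6):
    """EM for a 1-D Gaussian mixture with len(means) components (sketch).

    Returns (weights, means, variances, resp), where resp[i][j] is the
    membership probability of case i in class j from the final E-step.
    """
    k, n = len(means), len(xs)
    means = list(means)
    mu = sum(xs) / n
    variances = [sum((x - mu) ** 2 for x in xs) / n] * k  # start from the overall variance
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each class for each case.
        resp = []
        for x in xs:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from those probabilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2
                                   for r, x in zip(resp, xs)) / nj, min_var)
    return weights, means, variances, resp


# Hypothetical 1-D attribute (e.g. a scaled rainfall index) for eleven cases;
# the last value lies between the two groups.
xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9, 3.0]
weights, means, variances, resp = em_gaussian_mixture(xs, means=[0.0, 6.0])
```

Inspecting `resp` row by row is the analogue of reading membership probabilities: cases deep inside a group get probabilities near 1, while a row such as `resp[-1]` for the in-between value is where an analyst would look for cases that could belong to more than one class.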


Clustering is widely used in bioinformatics to find gene correlation patterns. Although many algorithms have been proposed, they usually struggle to meet the requirements of both automation and high quality. In this paper, we propose a novel algorithm for clustering genes from their expression profiles. The algorithm has two distinctive features: it takes global, rather than local, gene correlation information into account during clustering; and it incorporates clustering quality measurement into the clustering process to achieve non-parametric, automatic and globally optimal gene clustering. Evaluation on simulated and real gene data sets demonstrates the effectiveness of the algorithm.
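
The abstract does not name its clustering quality measure. The silhouette coefficient is one common choice, and a non-parametric method could use a score like it to compare candidate partitions without fixing the number of clusters in advance. The sketch below computes the mean silhouette of a labelling under Euclidean distance; the function name, expression profiles and labellings are hypothetical.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient of a labelling (Euclidean distance).

    For each point, a = mean distance to its own cluster, b = lowest mean
    distance to any other cluster, and s = (b - a) / max(a, b).
    Points in singleton clusters get s = 0, a common convention.
    Requires at least two clusters.
    """
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical expression profiles of six genes over three conditions.
profiles = [(1.0, 2.0, 3.0), (1.1, 2.1, 2.9), (0.9, 1.9, 3.1),
            (3.0, 2.0, 1.0), (2.9, 2.1, 1.1), (3.1, 1.9, 0.9)]
good = mean_silhouette(profiles, [0, 0, 0, 1, 1, 1])
bad = mean_silhouette(profiles, [0, 1, 0, 1, 0, 1])
```

A higher score indicates tighter, better-separated clusters. Note that this uses plain Euclidean distance on each profile, whereas gene clustering often uses correlation-based distances instead.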


Email overload is a recent problem: people find it increasingly difficult to process the large number of emails they receive daily. The problem is becoming more serious and already affects the normal use of email as a knowledge management tool. It has been recognized that categorizing emails into meaningful groups can greatly reduce the cognitive load of processing emails and is therefore an effective way to manage email overload. However, most current approaches still require significant human input when categorizing emails. In this paper we develop an automatic email clustering system, underpinned by a new nonparametric text clustering algorithm. This system does not require any predefined input parameters and can automatically generate meaningful email clusters. Experiments show that our new algorithm outperforms existing text clustering algorithms in both computational time and clustering quality, as measured by several metrics.


This paper describes a methodology for identifying moving obstacles by obtaining a reliable, sparse optical flow from image sequences. Given a sequence of images, we can detect two types of on-road vehicles: those traveling in the opposite direction and those traveling in the same direction. For both types, distinctive feature points are detected with the Shi and Tomasi corner detector. The pyramidal Lucas–Kanade method for optical flow is then used to match the sparse feature set of one frame to the next frame. By applying k-means clustering to four-component feature vectors, consisting of the coordinates of each feature point and the two components of its optical flow, we can easily compute the cluster centroids, and the objects can then be tracked. Vehicles traveling in the opposite direction produce a diverging vector field, while vehicles traveling in the same direction produce a converging vector field.
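
The clustering step described above can be sketched as plain k-means over (x, y, u, v) vectors, where (x, y) is a feature point and (u, v) its optical-flow displacement. The sketch assumes the flow vectors have already been computed (for example by a pyramidal Lucas–Kanade tracker) and uses deterministic farthest-point initialisation. The `kmeans` helper and the sample tracks are hypothetical.

```python
import math


def kmeans(vectors, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation (sketch)."""
    # Start from the first vector, then repeatedly add the vector farthest
    # from all centres chosen so far.
    centers = [list(vectors[0])]
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centre in the 4-D (x, y, u, v) space.
        labels = [min(range(k), key=lambda j: math.dist(v, centers[j])) for v in vectors]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical (x, y, u, v) tracks for two vehicles moving in different directions.
tracks = [
    (100, 200, 3.0, 0.1), (104, 198, 3.1, 0.0), (98, 205, 2.9, -0.1),
    (400, 220, -4.0, 1.0), (395, 225, -4.2, 0.9), (405, 218, -3.8, 1.1),
]
centers, labels = kmeans(tracks, k=2)
```

Each centroid gives a vehicle's mean position and mean flow. In practice, pixel positions and flow components live on very different scales, so one would likely standardise or weight them before clustering; the toy tracks here are separated enough not to need it.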


For many clustering algorithms, such as k-means, EM, and CLOPE, the user usually has to set some parameters. Often, these parameters directly or indirectly control the number of clusters to return. Given differing data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in a data set. This is especially true of text collections such as Web documents, images or biological data. The fundamental question this paper addresses is: "How can we effectively estimate the natural number of clusters in a given text collection?" To answer it, we propose spectral analysis, which analyzes the eigenvalues (not the eigenvectors) of the collection. We first present the relationship between a text collection and its underlying spectra. We then show how the answer to this question enhances the clustering process. Finally, we conclude with empirical results and related work.


Isogramma manchoukuoensis from the Upper Carboniferous of northeast China is redefined based on re-examination of the type specimens. Isogramma specimens from the Middle Permian of northeastern Japan are reassigned to I. aff. paotechowensis. A new family, Schizopleuroniidae, is proposed to include Schizopleuronia but not Megapleuronia, which belongs to the Megapleuroniidae Liao, 1983. The family Isogrammidae is considered a transitional group in the eichwaldid-isogrammid-schizopleuronid evolutionary lineage within the Dictyonellida. A review of the global distribution of Isogramma species reveals that the genus has a total of 56 species ranging from the Mississippian (Early Carboniferous) to the Lopingian (Late Permian). Isogramma diversified rapidly after its origination in the middle Viséan, and its species diversity remained high throughout the Mississippian. The genus possibly suffered a severe mid-Carboniferous boundary mass extinction, with no Early Carboniferous species surviving this event. Bashkirian Isogramma species show low diversity, followed by a global recovery in the Moscovian. During the latest Carboniferous, Isogramma became highly diversified again. At the Carboniferous–Permian (C/P) transition, Isogramma underwent another dramatic diversity drop, followed by several stepwise declines in diversity during the Early–Middle Permian. The Wuchiapingian I. sinosa is the last Isogramma species.

Ukraine was the possible centre of origin for Isogramma. From Ukraine, Isogramma spread over the Moscow Basin of Russia, Central Europe (Germany, Austria), South Europe (Spain) and West Europe (England, Ireland and Scotland), and migrated to the North American midcontinent and South China during the late Viséan (Early Carboniferous). In Europe, Isogramma migrated to Spain and eastern Europe (Serbia) in the Moscovian; from there it dispersed into Central Asia (Uzbekistan and Kazakhstan) in the Kasimovian–Gzhelian. In the Palaeo-Tethys, Isogramma migrated from South China to northeast and northwest China in the Moscovian, spread over the North China Block during the C/P transition, and moved to Russian Siberia, Japan and the Qiangtang terrane of the Palaeo-Tethys during the Early–Late Permian. In North America, Isogramma spread over the midcontinent during the Late Carboniferous and Early–Middle Permian and migrated to South America (Bolivia) in the latest Carboniferous. Biogeographically, Isogramma was confined principally to the palaeo-tropical and warm to temperate zones throughout the Late Palaeozoic, with the possible exception of the Artinskian, because a questionable species of the genus also occurs in the Transbaikal region of southeast Russia.


This paper presents an algorithm based on the Growing Self-Organizing Map (GSOM), called the High Dimensional Growing Self-Organizing Map with Randomness (HDGSOMr), that can cluster massive high-dimensional data efficiently. The original GSOM algorithm is modified to address the issues associated with massive high-dimensional data. These modifications are presented in detail, together with experimental results on a massive real-world dataset.


In sexually dimorphic ungulates, sexual segregation is hypothesized to have evolved because of sex-specific differences in body size and/or reproductive strategies. We tested these alternative hypotheses in kangaroos, which are ecological analogues of ungulates. Kangaroos exhibit a wide range of body sizes, particularly among mature males, so the effects of body size and sex can be distinguished. We tested predictions derived from these hypotheses by comparing the distribution of three sex–size classes of western grey kangaroos, Macropus fuliginosus, across different habitats, and the composition of groups of kangaroos across seasons. In accordance with the predation risk–reproductive strategy hypothesis, during the non-breeding season females, which were more susceptible to predation than larger males and were accompanied by vulnerable young-at-foot, were over-represented in secure habitats. Large males, which were essentially immune to predation, occurred more often than expected in nutrient-rich habitat, and small males, which faced the competing demands of predator avoidance and feeding, were intermediate between females and large males in their distribution across habitats. During the breeding season, females continued to be over-represented in secure habitats when their newly emerged pouch young were most vulnerable to predation. All males occupied these same habitats to maximize their chances of securing mates. Consistent with the social hypotheses, groups composed of individuals of the same sex, irrespective of body size, were over-represented in the population during the non-breeding season, while during the breeding season all males sought females, so mixed-sex groups predominated. These results indicate that body size and reproductive strategies are both important, yet independent, factors influencing segregation in western grey kangaroos.


The objective of this paper is to derive a computationally efficient genetic pattern learning algorithm that evolutionarily derives the optimal rebalancing weights (i.e. dynamic hedge ratios) for engineering a structured financial product out of a multiasset, best-of option. The stochastic target function is formulated as an expected squared cost of hedging (tracking) error, which is assumed to depend partly on the Markovian process governing the individual asset returns and partly on randomness, i.e. pure white noise. A simple haploid genetic algorithm is advanced as an alternative numerical scheme, which is deemed computationally more efficient than numerically deriving an explicit solution to the formulated optimization model. An extension of the proposed scheme is suggested, in which the genetic algorithm parameters are adapted by fuzzy logic controllers.
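
To make the idea of a simple haploid GA concrete, the sketch below evolves a single static hedge ratio, encoded as a 16-bit chromosome, that minimises the mean squared tracking error between a target return series and one hedging instrument. This is a deliberate simplification: the paper works with dynamic, multi-asset rebalancing weights for a best-of option, and the encoding range [0, 2], population size, mutation rate, helper names and simulated returns here are all illustrative assumptions.

```python
import random

BITS = 16  # hypothetical chromosome length


def decode(bits, lo=0.0, hi=2.0):
    """Map a binary chromosome to a hedge ratio in [lo, hi]."""
    value = int("".join(map(str, bits)), 2)
    return lo + (hi - lo) * value / (2 ** len(bits) - 1)


def tracking_error(h, target, hedge):
    """Mean squared tracking error when hedging `target` with h units of `hedge`."""
    return sum((t - h * x) ** 2 for t, x in zip(target, hedge)) / len(target)


def haploid_ga(target, hedge, pop_size=30, generations=60, p_mut=0.02, seed=1):
    """Simple haploid GA: tournament selection, one-point crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)

    def cost(chrom):
        return tracking_error(decode(chrom), target, hedge)

    pop = [[rng.randint(0, 1) for _ in range(BITS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        nxt = [pop[0][:]]  # elitism: the best chromosome survives unchanged
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 2), key=cost)  # binary tournament
            b = min(rng.sample(pop, 2), key=cost)
            cut = rng.randrange(1, BITS)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    return decode(min(pop, key=cost))


# Simulated daily returns (hypothetical): the target moves 0.7x the hedge plus noise.
rng = random.Random(0)
hedge = [rng.gauss(0.0, 0.01) for _ in range(250)]
target = [0.7 * x + rng.gauss(0.0, 0.002) for x in hedge]
h = haploid_ga(target, hedge)
h_ols = sum(t * x for t, x in zip(target, hedge)) / sum(x * x for x in hedge)
```

Because this single-ratio objective is quadratic, its least-squares minimiser `h_ols` is available in closed form and gives a convenient check on the GA. The multi-asset, dynamic problem in the paper has no such shortcut, which is why the abstract presents the GA as cheaper than deriving an explicit solution.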


The future global distribution of the political regimes of countries, just like that of their economic incomes, displays a surprising tendency toward polarization into only two clubs of convergence at the extremes. This in itself is a persuasive reason to re-examine the logical validity of an endogenous theory of political and economic development inherent in modernization theory. I suggest how a simple evolutionary game-theoretic view of the subject can explain these parallel clubs of convergence in political regimes and economic income within the framework of existing research in democratization theory. I also suggest how instrumental action can be methodically introduced into such a setup through learning strategies adopted by political actors. These strategies, based on the first principles of political competition, are motivated by introducing the theoretical concept of a Credible Polity.