998 results for Evolutionary clustering


Relevance:

20.00%

Publisher:

Abstract:

Clustering a large document collection is challenged not only by the number of documents and the number of dimensions, but also by the number and sizes of the clusters. Traditional clustering methods fail to scale when they need to generate a large number of clusters. Furthermore, when cluster sizes in the solution are heterogeneous, i.e. some of the clusters are large, the similarity measures tend to degrade. A ranking-based clustering method is proposed to deal with these issues in the context of the Social Event Detection task. Ranking scores are used to select a small number of the most relevant clusters against which a document is compared and placed. Additionally, instead of conventional cluster centroids, cluster patches, which are hub-like sets of documents, are proposed to represent clusters. Text, temporal, spatial and visual content information collected from the social event images is used in calculating similarity. Results show that these strategies allow a balance between the performance and the accuracy of the clustering solution produced by the clustering method.

Relevance:

20.00%

Publisher:

Abstract:

Many insect clades, especially within the Diptera (true flies), have been considered classically ‘Gondwanan’, with an inference that distributions derive from vicariance of the southern continents. Assessing the role that vicariance has played in the evolution of austral taxa requires testing the location and tempo of diversification and speciation against the well-established predictions of fragmentation of the ancient super-continent. Several early (anecdotal) hypotheses that current austral distributions originate from the breakup of Gondwana derive from studies of taxa within the family Chironomidae (non-biting midges). With the advent of molecular phylogenetics and biogeographic analytical software, these studies have been revisited and expanded to test such conclusions better. Here we studied the midge genus Stictocladius Edwards, from the subfamily Orthocladiinae, which contains austral-distributed clades that match vicariance-based expectations. We resolve several issues of systematic relationships among morphological species and reveal cryptic diversity within many taxa. Time-calibrated phylogenetic relationships among taxa accorded partially with the predicted tempo from geology. For these apparently vagile insects, vicariance-dated patterns persist for South America and Australia. However, as often found, divergence time estimates for New Zealand at c. 50 mya post-date separation of Zealandia from Antarctica and the remainder of Gondwana, but predate the proposed Oligocene ‘drowning’ of these islands. We detail other such ‘anomalous’ dates and suggest a single common explanation rather than stochastic processes. This could involve synchronous establishment following recovery from ‘drowning’ and/or deleterious warming associated with the mid-Eocene climatic optimum (hence ‘waving’, which refers to cycles of drowning events) plus new availability of topography providing cool running waters, or all these factors in combination. Alternatively, a vicariance explanation remains available, given the uncertain duration of connectivity of Zealandia to Australia–Antarctic–South America via the Lord Howe and Norfolk ridges into the Eocene.

Relevance:

20.00%

Publisher:

Abstract:

This project is a step forward in the study of text mining, where text representation enhanced with semantic information plays a significant role. It develops effective methods of entity-oriented retrieval, semantic relation identification and text clustering that use semantically annotated data. These methods are based on an enriched text representation generated by introducing semantic information extracted from Wikipedia into the input text data. The proposed methods are evaluated against several state-of-the-art benchmark methods on real-life datasets. In particular, this thesis improves the performance of entity-oriented retrieval, identifies different lexical forms for an entity relation and handles the clustering of documents with multiple feature spaces.

Relevance:

20.00%

Publisher:

Abstract:

Evolutionary algorithms are playing an increasingly important role as search methods in cognitive science domains. In this study, methodological issues in the use of evolutionary algorithms were investigated via simulations in which procedures were systematically varied to modify the selection pressures on populations of evolving agents. Traditional roulette wheel, tournament, and variations of these selection algorithms were compared on the “needle-in-a-haystack” problem developed by Hinton and Nowlan in their 1987 study of the Baldwin effect. The task is an important one for cognitive science, as it demonstrates the power of learning as a local search technique in smoothing a fitness landscape that lacks gradient information. One aspect that has continued to foster interest in the problem is the observation of residual learning ability in simulated populations even after long periods of time. Effective evolutionary algorithms balance their search effort between broad exploration of the search space and in-depth exploitation of promising solutions already found. Issues discussed include the differential effects of rank and proportional selection, the tradeoff between migration of populations towards good solutions and maintenance of diversity, and the development of measures that illustrate how each selection algorithm affects the search process over generations. We show that both roulette wheel and tournament algorithms can be modified to appropriately balance search between exploration and exploitation, and effectively eliminate residual learning in this problem.
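The two baseline selection schemes named in this abstract can be sketched as follows. This is a minimal illustration of textbook roulette wheel (fitness-proportional) and tournament selection, not the modified variants the study evaluates; the function names, the `rng` argument and the tournament size `k` are assumptions made for the example.

```python
import random

def roulette_wheel_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        # No fitness signal at all: fall back to a uniform choice.
        return rng.choice(population)
    pick = rng.random() * total
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off

def tournament_select(population, fitnesses, rng, k=2):
    """Sample k distinct contenders uniformly and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]

rng = random.Random(0)
population = ["a", "b", "c"]
print(roulette_wheel_select(population, [1.0, 3.0, 2.0], rng))
print(tournament_select(population, [1.0, 3.0, 2.0], rng, k=2))
```

Roulette wheel selection depends on the absolute fitness values, whereas tournament selection depends only on their ranking, which is one way to see the rank versus proportional distinction the abstract discusses.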

Relevance:

20.00%

Publisher:

Abstract:

A multi-objective design optimization study has been conducted for upstream fuel injection through porous media applied to the first ramp of a two-dimensional scramjet intake. The optimization has been performed by coupling evolutionary algorithms assisted by surrogate modeling and computational fluid dynamics with respect to three design criteria, that is, the maximization of the absolute mixing quantity, total pressure saving, and fuel penetration. A distinct Pareto optimal front has been obtained, highlighting the counteracting behavior of the total pressure against the mixing efficiency and fuel penetration. The injector location and size have been identified as the key design parameters as a result of a sensitivity analysis, with negligible influence of the porous properties in the configurations and conditions considered in the present study. Flowfield visualization has revealed the underlying physics associated with the effects of these dominant parameters on the shock structure and intensity.
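As a rough illustration of what a Pareto optimal front means for three maximized criteria, the sketch below filters a list of candidate designs down to the non-dominated ones. The objective tuples are made-up placeholder values, not results from the study, and the brute-force filter stands in for whatever ranking the authors' evolutionary algorithm actually uses.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Return the points that no other point dominates (O(n^2) brute force)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (mixing, total pressure saving, fuel penetration) scores.
designs = [
    (0.9, 0.2, 0.8),  # strong mixing and penetration, poor pressure saving
    (0.4, 0.7, 0.3),  # good pressure saving, weaker mixing
    (0.3, 0.6, 0.2),  # worse than the second design in every objective
]
print(pareto_front(designs))
```

The first two designs trade one objective against another, so neither dominates the other and both remain on the front; the third is dropped.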

Relevance:

20.00%

Publisher:

Abstract:

High-Order Co-Clustering (HOCC) methods have attracted considerable attention in recent years because of their ability to cluster multiple types of objects simultaneously using all available information. During the clustering process, HOCC methods exploit object co-occurrence information, i.e., inter-type relationships amongst different types of objects, as well as object affinity information, i.e., intra-type relationships amongst objects of the same type. However, it is difficult to learn accurate intra-type relationships in the presence of noise and outliers. Existing HOCC methods consider the p nearest neighbours based on Euclidean distance for the intra-type relationships, which leads to incomplete and inaccurate intra-type relationships. In this paper, we propose a novel HOCC method that incorporates multiple subspace learning with a heterogeneous manifold ensemble to learn complete and accurate intra-type relationships. Multiple subspace learning reconstructs the similarity between any pair of objects that belong to the same subspace. The heterogeneous manifold ensemble is created from two types of intra-type relationships, learnt using a p-nearest-neighbour graph and multiple subspace learning. Moreover, to ensure the robustness of the clustering process, we introduce a sparse error matrix into the matrix decomposition and develop a novel iterative algorithm. Empirical experiments show that the proposed method achieves improved results over state-of-the-art HOCC methods in terms of FScore and NMI.

Relevance:

20.00%

Publisher:

Abstract:

Clustering is an important technique in organising and categorising web-scale documents. The main challenges faced in clustering the billions of documents available on the web are the processing power required and the sheer size of the datasets available. More importantly, it is nigh impossible to generate the labels for a general web document collection containing billions of documents and a vast taxonomy of topics. However, document clusters are most commonly evaluated by comparison to a ground truth set of labels for documents. This paper presents a clustering and labeling solution in which Wikipedia is clustered and hundreds of millions of web documents in ClueWeb12 are mapped onto those clusters. This solution is based on the assumption that Wikipedia contains such a wide range of diverse topics that it represents a small-scale web. We found that it was possible to perform the web-scale document clustering and labeling process on one desktop computer in under a couple of days for the Wikipedia clustering solution containing about 1000 clusters. It takes longer to execute a solution with finer-granularity clusters, such as 10,000 or 50,000. These results were evaluated using a set of external data.

Relevance:

20.00%

Publisher:

Abstract:

Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations, and their analysis requires an efficient approach in order to produce results in the least possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods which have recently been employed to analyse PNSD data; however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study aims to apply several clustering techniques (i.e. K-means, PAM, CLARA and SOM) to PNSD data, in order to identify and apply the optimum technique to PNSD data measured at 25 sites across Brisbane, Australia. A new method, based on the Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data, and the temporal weight of each cluster was also estimated using the GAM. In addition, each cluster was associated with its possible source based on the results of this parameterisation, together with the characteristics of each cluster. The performance of the four clustering techniques was compared using the Dunn index and Silhouette width validation values, and the K-means technique was found to have the highest performance, with five clusters being the optimum. Therefore, five clusters were found within the data using the K-means technique. The diurnal occurrence of each cluster was used together with other air quality parameters, temporal trends and the physical properties of each cluster, in order to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins, including regional background particles, photochemically induced nucleated particles and vehicle-generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectrum to its source, and the GAM was suitable for parameterising the PNSD data. These two techniques can help researchers immensely in analysing PNSD data for characterisation and source apportionment purposes.
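One of the two validation measures mentioned above, the Dunn index, is the smallest distance between points in different clusters divided by the largest distance between points in the same cluster, so larger values indicate compact, well-separated clusters. The sketch below is a minimal illustration on toy 2-D points; the function name and data are assumptions, and real PNSD spectra would be higher-dimensional vectors.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Dunn index for a list of clusters, each a list of equal-length point tuples."""
    # Largest within-cluster distance (cluster diameter); singletons have diameter 0.
    max_diameter = max(
        max((math.dist(a, b) for a, b in combinations(cluster, 2)), default=0.0)
        for cluster in clusters
    )
    # Smallest distance between any two points that lie in different clusters.
    min_separation = min(
        math.dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    )
    if max_diameter == 0.0:
        return math.inf  # every cluster is a single point
    return min_separation / max_diameter

toy_clusters = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)]]
print(dunn_index(toy_clusters))
```

Comparing this value across candidate numbers of clusters, or across clustering techniques, is one way a choice such as "five clusters" could be supported; the brute-force pairwise loops here are quadratic and only suitable for small examples.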

Relevance:

20.00%

Publisher:

Abstract:

China has experienced considerable economic growth since 1978, which was accompanied by unprecedented growth in urbanization and, more recently, by associated rising urban housing and land banking issues. One such issue is that of land hoarding - where real estate developers purchase land to hold unused in the rising market for a future lucrative sale, often several years later. This practice is outlawed in China, where land use is controlled by increasingly strengthened Government policies and inspectors. Despite this, land hoarding continues apace, with the main culprits being the developers and inspectors working subversively. This resembles a game between two players - the inspector and the developer - which provides the setting for this paper in developing an evolutionary game theory model to provide insights into dealing with the dilemmas faced by the players. The logic and dilemma of land banking strategy and illegal land banking issues are analysed, along with the land inspector’s role from a game theory perspective by determining the replication dynamic mechanism and evolutionary stable strategies under the various conditions that the players face. The major factors influencing the actions of land inspectors, on the other hand, are the costs of inspection, no matter if it is strict or indolent, conflict costs, and income and penalties from corruption. From this, it is shown that, when the net loss for corruption (income from corruption minus the penalties for corruption and cost of strict inspections) is less than the cost of strict inspections, the final evolutionary stable strategy of the inspectors is to carry out indolent inspections. Then, whether penalising developers for hoarding is severe or not, the evolutionary strategy for the developer is to hoard. The implications for land use control mechanisms and associated developer-inspector actions and counteractions are then examined in the light of the model's properties.
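The replication dynamic mechanism mentioned in this abstract can be illustrated with standard two-population replicator equations. The payoff values below are purely hypothetical placeholders chosen so that indolent inspection always pays more than strict inspection; they are not the paper's parameters, and the forward-Euler integration is only a simple way to trace the trajectory.

```python
def simulate(x=0.5, y=0.5, dt=0.01, steps=5000):
    """Integrate two-population replicator dynamics with forward Euler.

    x: share of inspectors inspecting strictly; y: share of developers hoarding.
    """
    # Hypothetical inspector payoffs, keyed by (inspector action, developer action).
    insp = {("strict", "hoard"): -2.0, ("strict", "comply"): -2.0,
            ("indolent", "hoard"): 1.0, ("indolent", "comply"): 0.0}
    # Hypothetical developer payoffs, keyed the same way.
    dev = {("strict", "hoard"): 1.0, ("strict", "comply"): 2.0,
           ("indolent", "hoard"): 5.0, ("indolent", "comply"): 2.0}
    for _ in range(steps):
        # Expected payoff of each pure strategy against the other population's mix.
        u_strict = y * insp[("strict", "hoard")] + (1 - y) * insp[("strict", "comply")]
        u_indolent = y * insp[("indolent", "hoard")] + (1 - y) * insp[("indolent", "comply")]
        v_hoard = x * dev[("strict", "hoard")] + (1 - x) * dev[("indolent", "hoard")]
        v_comply = x * dev[("strict", "comply")] + (1 - x) * dev[("indolent", "comply")]
        # A strategy's share grows when it beats the alternative.
        x += dt * x * (1 - x) * (u_strict - u_indolent)
        y += dt * y * (1 - y) * (v_hoard - v_comply)
    return x, y

x_final, y_final = simulate()
print(x_final, y_final)
```

With these placeholder payoffs the shares drift towards indolent inspection and hoarding, which mirrors the qualitative outcome the abstract describes; different payoff assumptions can produce different stable strategies.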

Relevance:

20.00%

Publisher:

Abstract:

The "Humies" awards are an annual competition held in conjunction with the Genetic and Evolutionary Computation Conference (GECCO), in which cash prizes totalling $10,000 are awarded to the most human-competitive results produced by any form of evolutionary computation published in the previous year. This article describes the gold medal-winning entry from the 2012 "Humies" competition, based on the LUDI system for playing, evaluating and creating new board games. LUDI was able to demonstrate human-competitive results in evolving novel board games that have gone on to be commercially published, one of which, Yavalath, has been ranked in the top 2.5% of abstract board games ever invented. Further evidence of human-competitiveness was demonstrated in the evolved games implicitly capturing several principles of good game design, outperforming human designers in at least one case, and going on to inspire a new sub-genre of games.

Relevance:

20.00%

Publisher:

Abstract:

Many complex aeronautical design problems can be formulated with efficient multi-objective evolutionary optimization methods and game strategies. This book describes the role of advanced innovative evolution tools in finding the solution, or the set of solutions, of single- or multi-disciplinary optimization problems. These tools use the concepts of multi-population, asynchronous parallelization and hierarchical topology, which allow different models, including precise, intermediate and approximate models, with each node belonging to a different hierarchical layer handled by a different Evolutionary Algorithm (EA). The efficiency of evolutionary algorithms for both single- and multi-objective optimization problems is significantly improved by coupling EAs with games, and in particular by a new dynamic methodology named “Hybridized Nash-Pareto games”. Multi-objective optimization techniques and robust design problems that take uncertainties into account are introduced and explained in detail. Several applications dealing with civil aircraft and UAV and UCAV systems are implemented numerically and discussed. Applications of increasing optimization complexity are presented, as well as two hands-on test case problems. These examples focus on aeronautical applications and will be useful to the practitioner in the laboratory or in industrial design environments. The evolutionary methods coupled with games presented in this volume can be applied to other areas, including surface and marine transport, structures, biomedical engineering, renewable energy and environmental problems.

Relevance:

20.00%

Publisher:

Abstract:

The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine-grained clustering has not been previously demonstrated. Previous approaches clustered a sample, which limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine-grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show improved cluster quality when assessed with two novel evaluations that use ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and infeasible.