875 results for document clustering


Relevance:

20.00% 20.00%

Publisher:

Abstract:

The need to cluster unknown data in order to better understand its relationship to known data is prevalent throughout science. Besides giving a better understanding of the data itself or insight into a new, unknown object, cluster analysis can help with data processing, data standardization, and outlier detection. Most clustering algorithms are based on known features or expectations, such as the popular partition-based, hierarchical, density-based, grid-based, and model-based algorithms. The choice of algorithm depends on many factors, including the type of data and the reason for clustering, but nearly all rely on some known properties of the data being analyzed. Recently, Li et al. proposed a new universal similarity metric that needs no prior knowledge about the objects. Their similarity metric is based on the Kolmogorov complexity of objects, that is, the length of an object's minimal description. While the Kolmogorov complexity of an object is not computable, in "Clustering by Compression," Cilibrasi and Vitanyi use common compression algorithms to approximate the universal similarity metric and cluster objects with high success. Unfortunately, clustering using compression does not trivially extend to higher dimensions. Here we outline a method to adapt their procedure to images. We test these techniques on images of letters of the alphabet.
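
As a rough illustration of the compression-based approximation mentioned in this abstract, the sketch below computes the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The use of zlib and the function names are assumptions made for this example; the abstract does not say which compressor is used, and the image adaptation it describes is not reproduced here.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed input, a stand-in for Kolmogorov complexity.

    zlib is an assumption for this sketch; any general-purpose compressor could be used.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they share little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)
```

A pairwise matrix of such distances can then be passed to any standard clustering routine, for example a hierarchical method.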

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper, we introduce the Generalized Equality Classifier (GEC) for use as an unsupervised clustering algorithm in categorizing analog data. GEC is based on a formal definition of inexact equality originally developed for voting in fault-tolerant software applications. GEC is defined using a metric space framework. The only parameter in GEC is a scalar threshold, which defines the approximate equality of two patterns. Here, we compare the characteristics of GEC to the ART2-A algorithm (Carpenter, Grossberg, and Rosen, 1991). In particular, we show that GEC with the Hamming distance performs the same optimization as ART2. Moreover, GEC has lower computational requirements than ART2 on serial machines.
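
The idea of clustering with a single scalar threshold on a distance can be sketched as follows. This is not the authors' GEC: the abstract does not give the assignment rule, so the "first representative within the threshold" rule, the choice of the first member as representative, and the names `hamming` and `threshold_cluster` are all assumptions made for illustration.

```python
from typing import List, Sequence


def hamming(a: Sequence, b: Sequence) -> int:
    """Number of positions at which two equal-length patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))


def threshold_cluster(patterns: Sequence[Sequence], threshold: int) -> List[List[int]]:
    """Group pattern indices by approximate equality (hypothetical rule).

    A pattern joins the first cluster whose representative lies within
    `threshold` of it; otherwise it starts a new cluster and becomes
    that cluster's representative.
    """
    representatives: List[Sequence] = []
    clusters: List[List[int]] = []
    for index, pattern in enumerate(patterns):
        for cluster_id, representative in enumerate(representatives):
            if hamming(pattern, representative) <= threshold:
                clusters[cluster_id].append(index)
                break
        else:
            # No representative was close enough: open a new cluster.
            representatives.append(pattern)
            clusters.append([index])
    return clusters
```

With a threshold of 0 this reduces to grouping exactly equal patterns; larger thresholds merge progressively more dissimilar ones.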

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A supersonic expansion containing acetylene seeded into Ar and produced from a circular nozzle is investigated using CW/cavity ring down spectroscopy, in the 1.5 μm range. The results, also involving experiments with pure acetylene and acetylene-He expansions, as well as slit nozzles, demonstrate that the denser central section in the expansion is slightly heated by the formation of acetylene aggregates, resulting in a dip in the monomer absorption line profiles. Acetylene-Ar aggregates are also formed at the edge of the circular nozzle expansion cone. © 2008 Elsevier B.V. All rights reserved.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The receptor deleted in colorectal cancer (DCC) directs dynamic polarizing activities in animals toward its extracellular ligand netrin. How DCC polarizes toward netrin is poorly understood. By performing live-cell imaging of the DCC orthologue UNC-40 during anchor cell invasion in Caenorhabditis elegans, we have found that UNC-40 clusters, recruits F-actin effectors, and generates F-actin in the absence of UNC-6 (netrin). Time-lapse analyses revealed that UNC-40 clusters assemble, disassemble, and reform at periodic intervals in different regions of the cell membrane. This oscillatory behavior indicates that UNC-40 clusters through a mechanism involving interlinked positive (formation) and negative (disassembly) feedback. We show that endogenous UNC-6 and ectopically provided UNC-6 orient and stabilize UNC-40 clustering. Furthermore, the UNC-40-binding protein MADD-2 (a TRIM family protein) promotes ligand-independent clustering and robust UNC-40 polarization toward UNC-6. Together, our data suggest that UNC-6 (netrin) directs polarized responses by stabilizing UNC-40 clustering. We propose that ligand-independent UNC-40 clustering provides a robust and adaptable mechanism to polarize toward netrin.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper presents two multilevel refinement algorithms for the capacitated clustering problem. Multilevel refinement is a collaborative technique capable of significantly aiding the solution process for optimisation problems. The central methodologies of the technique are filtering solutions from the search space and reducing the level of problem detail to be considered at each level of the solution process. The first multilevel algorithm uses a simple tabu search while the other executes a standard local search procedure. Both algorithms demonstrate that the multilevel technique is capable of aiding the solution process for this combinatorial optimisation problem.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This deliverable outlines the design blueprints for the RAGE application scenario games and forms the rest of the scope for WP4's tasks. The game designs have been developed in collaboration with application scenario partners in WP5, and informed by WP1, 2 & 3. Additionally, peer feedback has been provided by game developers across WP4. The designs outline the integration of the RAGE assets developed in WP2 and WP3. Each section describes in detail the gameplay, game dynamics and mechanics, pedagogies, and technical implementation of the RAGE assets in the game applications, as described in detail in WP5's application documents. The full description of the application objectives and associated learning outcomes has been provided in the project's MS2 Application Scenario Outlines document.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The objective of this thesis was to determine whether the establishment and operation of an archives service by the Hudson's Bay Company had an effect on the company's ability to carry out document repairs. Data collection methods included reviews of published material, archival records of the Hudson's Bay Company, and semi-structured interviews. The study found that the Hudson's Bay Company's commitment to operating a modern archives service in accordance with accepted archive administration practices had a substantial effect on its ability to carry out document repairs. The principled approach to repair, as practiced by the Public Record Office, was a major influence. A review of secondary sources placed this development squarely within the context of archival developments in 20th-century England. Overall, the thesis findings add to the growing conversation about conservation history in England, in particular archive conservation history as it occurred outside of the Public Record Office in the 20th century, by discussing how some methods of repair that were devised, adopted, and extended by the Public Record Office in the 19th and 20th centuries were adopted and applied in the 20th century by a well-established business corporation.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Clustering analysis of data from DNA microarray hybridization studies is an essential task for identifying biologically relevant groups of genes. The attribute cluster algorithm (ACA) has provided an attractive way to group and select meaningful genes. However, ACA needs much prior knowledge about the genes to set the number of clusters. In practical applications, if the number of clusters is misspecified, the performance of ACA deteriorates rapidly. In fact, setting this number correctly is very demanding because so little is known about the genes in advance. We propose the Cooperative Competition Cluster Algorithm (CCCA) in this paper. In the algorithm, we assume that both cooperation and competition exist simultaneously between clusters in the process of clustering. By using this principle of cooperative competition, the number of clusters can be found during the clustering process. Experimental results on synthetic data and gene expression data are presented. The results show that CCCA can choose the number of clusters automatically and achieves excellent performance compared with other competing methods.