21 results for Document technologique


Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper presents a new spectral clustering method called correlation preserving indexing (CPI), which is performed in the correlation similarity measure space. In this framework, the documents are projected into a low-dimensional semantic space in which the correlations between the documents in the local patches are maximized while the correlations between the documents outside these patches are simultaneously minimized. Since the intrinsic geometrical structure of the document space is often embedded in the similarities between the documents, correlation as a similarity measure is more suitable for detecting the intrinsic geometrical structure of the document space than Euclidean distance. Consequently, the proposed CPI method can effectively discover the intrinsic structures embedded in high-dimensional document space. The effectiveness of the new method is demonstrated by extensive experiments conducted on various data sets and by comparison with existing document clustering methods.
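The claim that correlation suits document data better than Euclidean distance can be illustrated with a small sketch. It is not the CPI algorithm itself. It assumes correlation is the normalized inner product of two term-frequency vectors (the abstract does not give the exact definition), and the three toy documents are invented for illustration.

```python
import math

def correlation(u, v):
    """Normalized inner product of two term-frequency vectors (assumed definition)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean(u, v):
    """Plain Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical term counts over a four-word vocabulary.
short_doc = [2, 1, 0, 0]    # short document on topic A
long_doc = [20, 10, 0, 0]   # same topic A, ten times longer
other_doc = [0, 0, 1, 2]    # topic B, disjoint vocabulary

print(correlation(short_doc, long_doc), correlation(short_doc, other_doc))
print(euclidean(short_doc, long_doc), euclidean(short_doc, other_doc))
```

In this toy case correlation ranks the longer same-topic document as the closest match, while Euclidean distance places the unrelated document closer simply because it has a similar length.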

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We present a novel method for document clustering using sparse representation of documents in conjunction with spectral clustering. An ℓ1-norm optimization formulation is posed to learn the sparse representation of each document, allowing us to characterize the affinity between documents by considering the overall information instead of traditional pairwise similarities. This document affinity is encoded through a graph on which spectral clustering is performed. The decomposition into multiple subspaces allows documents to be part of a sub-group that shares a smaller set of similar vocabulary, thus allowing for cleaner clusters. Extensive experimental evaluations on two real-world datasets from the Reuters-21578 and 20Newsgroup corpora show that our proposed method consistently outperforms state-of-the-art algorithms. Significantly, the performance improvement over other methods is prominent for these datasets.
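A minimal sketch of how a sparse-representation affinity graph might be built is given below. Several choices are assumptions, not details from the abstract: each document is coded over the other documents (a self-expressive dictionary), the ℓ1 problem is solved with plain ISTA, the affinity is |c_ij| + |c_ji|, and the toy vectors and the names `sparse_code` and `affinity_matrix` are hypothetical. The spectral clustering step on the resulting graph is omitted.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x|."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def sparse_code(target, atoms, lam=0.1, steps=500):
    """ISTA for min_c 0.5 * ||target - sum_j c_j * atoms[j]||^2 + lam * ||c||_1."""
    k, dims = len(atoms), len(target)
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the gradient.
    lipschitz = sum(a * a for atom in atoms for a in atom)
    c = [0.0] * k
    for _ in range(steps):
        recon = [sum(c[j] * atoms[j][d] for j in range(k)) for d in range(dims)]
        resid = [r - t for r, t in zip(recon, target)]
        grad = [sum(atoms[j][d] * resid[d] for d in range(dims)) for j in range(k)]
        c = [soft_threshold(c[j] - grad[j] / lipschitz, lam / lipschitz) for j in range(k)]
    return c

def affinity_matrix(docs, lam=0.1):
    """Code each document over all the others, then symmetrize the coefficients."""
    n = len(docs)
    coeffs = []
    for i in range(n):
        others = [docs[j] for j in range(n) if j != i]
        c = sparse_code(docs[i], others, lam)
        c.insert(i, 0.0)  # a document never represents itself
        coeffs.append(c)
    return [[abs(coeffs[i][j]) + abs(coeffs[j][i]) for j in range(n)] for i in range(n)]

# Hypothetical documents: the first two and the last two use disjoint vocabularies.
docs = [
    [1.0, 0.5, 0.0, 0.0],
    [0.9, 0.6, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.4],
    [0.0, 0.0, 0.8, 0.5],
]
W = affinity_matrix(docs)
for row in W:
    print([round(w, 3) for w in row])
```

On this toy input the nonzero affinities connect only documents that share vocabulary, which is the property a graph-based clustering step would then rely on.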

Relevance:

20.00% 20.00%

Publisher:

Abstract:

An improved evolving model, i.e., Evolving Tree (ETree) with Fuzzy c-Means (FCM), is proposed for undertaking text document visualization problems in this study. ETree forms a hierarchical tree structure in which nodes (i.e., trunks) are allowed to grow and split into child nodes (i.e., leaves), and each node represents a cluster of documents. However, ETree adopts a relatively simple approach to split its nodes. Thus, FCM is adopted as an alternative to perform node splitting in ETree. An experimental study using articles from a flagship conference of Universiti Malaysia Sarawak (UNIMAS), i.e., Engineering Conference (ENCON), is conducted. The experimental results are analyzed and discussed, and the outcome shows that the proposed ETree-FCM model is effective for undertaking text document clustering and visualization problems.
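To make the node-splitting idea concrete, here is a minimal sketch of standard fuzzy c-means applied to the documents held by one node. It is not the authors' ETree-FCM implementation: the function name `fuzzy_c_means`, the 2-D toy points, the choice of initial centers, and the final hard assignment by largest membership are all assumptions made for illustration.

```python
import math

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Standard FCM: alternate membership and center updates; m is the fuzzifier."""
    centers = [list(c) for c in centers]
    exponent = 2.0 / (m - 1.0)
    dims = len(points[0])

    def memberships_for(current_centers):
        rows = []
        for p in points:
            # Clamp distances to avoid division by zero when a point sits on a center.
            dists = [max(math.dist(p, c), 1e-12) for c in current_centers]
            rows.append([1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists])
        return rows

    for _ in range(iters):
        memberships = memberships_for(centers)
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            centers[i] = [sum(w * p[d] for w, p in zip(weights, points)) / total
                          for d in range(dims)]
    return centers, memberships_for(centers)

# Hypothetical 2-D document embeddings stored in one node that is about to split.
node_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, memberships = fuzzy_c_means(node_points, [node_points[0], node_points[-1]])

# Hard split into two child nodes by largest membership (an assumed rule).
labels = [row.index(max(row)) for row in memberships]
print(labels)
```

Each membership row sums to one, so a document near the boundary between two children keeps a graded degree of belonging until the final hard assignment.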

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The role of nurses in cardiothoracic transplantation has evolved over the last 25 years. Transplant nurses work in a variety of roles in collaboration with multidisciplinary teams to manage complex pre- and post-transplantation issues. There is a lack of clarity and consistency regarding the qualifications required to practice transplant nursing, the delineation of roles, and adequate levels of staffing.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Discovering knowledge from unstructured texts is a central theme in data mining and machine learning. We focus on fast discovery of thematic structures from a corpus. Our approach is based on a versatile probabilistic formulation, the restricted Boltzmann machine (RBM), where the underlying graphical model is an undirected bipartite graph. Inference is efficient: a document representation can be computed with a single matrix projection, making RBMs suitable for the massive text corpora available today. Standard RBMs, however, operate on the bag-of-words assumption, ignoring the inherent underlying relational structures among words. This results in less coherent thematic grouping of words. We introduce graph-based regularization schemes that exploit the linguistic structures, which in turn can be constructed from either corpus statistics or domain knowledge. We demonstrate that the proposed technique improves the group coherence, facilitates visualization, provides means for estimation of intrinsic dimensionality, reduces overfitting, and possibly leads to better classification accuracy.
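The "single matrix projection" can be sketched as follows for a standard binary RBM, where the hidden representation of a document is P(h_j = 1 | v) = sigmoid(b_j + Σ_i v_i W_ij). The weights, biases, and toy document below are hand-picked and hypothetical rather than learned, and the graph-based regularization described in the abstract is not shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_representation(visible, weights, hidden_bias):
    """Hidden activation probabilities: one projection followed by a sigmoid."""
    return [
        sigmoid(b_j + sum(v_i * row[j] for v_i, row in zip(visible, weights)))
        for j, b_j in enumerate(hidden_bias)
    ]

# Hypothetical parameters: 4 vocabulary words (rows), 2 hidden "topic" units (columns).
weights = [
    [2.0, -2.0],
    [2.0, -2.0],
    [-2.0, 2.0],
    [-2.0, 2.0],
]
hidden_bias = [0.0, 0.0]

doc = [1, 1, 0, 0]  # binary bag-of-words: only the first two words occur
rep = hidden_representation(doc, weights, hidden_bias)
print(rep)
```

No iterative inference is needed because the hidden units are conditionally independent given the visible layer, which is what makes the representation cheap to compute on large corpora.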