877 resultados para document clustering


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a hierarchical clustering method for semantic Web service discovery. This method aims to improve the accuracy and efficiency of the traditional service discovery using vector space model. The Web service is converted into a standard vector format through the Web service description document. With the help of WordNet, a semantic analysis is conducted to reduce the dimension of the term vector and to make semantic expansion to meet the user’s service request. The process and algorithm of hierarchical clustering based semantic Web service discovery is discussed. Validation is carried out on the dataset.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels have to be built using only the terms in the documents of the collection. This paper presents the SeCLAR (Selecting Candidate Labels using Association Rules) method, which explores the use of association rules for the selection of good candidates for labels of hierarchical document clusters. The candidates are processed by a classical method to generate the labels. The idea of the proposed method is to process each parent-child relationship of the nodes as an antecedent-consequent relationship of association rules. The experimental results show that the proposed method can improve the precision and recall of labels obtained by classical methods. © 2010 Springer-Verlag.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels must be built using all the terms in the documents of the collection. This paper presents the SeCLAR method, which explores the use of association rules in the selection of good candidates for labels of hierarchical document clusters. The purpose of this method is to select a subset of terms by exploring the relationship among the terms of each document. Thus, these candidates can be processed by a classical method to generate the labels. An experimental study demonstrates the potential of the proposed approach to improve the precision and recall of labels obtained by classical methods only considering the terms which are potentially more discriminative. © 2012 - IOS Press and the authors. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Resource Selection (or Query Routing) is an important step in P2P IR. Though analogous to document retrieval in the sense of choosing a relevant subset of resources, resource selection methods have evolved independently from those for document retrieval. Among the reasons for such divergence is that document retrieval targets scenarios where underlying resources are semantically homogeneous, whereas peers would manage diverse content. We observe that semantic heterogeneity is mitigated in the clustered 2-tier P2P IR architecture resource selection layer by way of usage of clustering, and posit that this necessitates a re-look at the applicability of document retrieval methods for resource selection within such a framework. This paper empirically benchmarks document retrieval models against the state-of-the-art resource selection models for the problem of resource selection in the clustered P2P IR architecture, using classical IR evaluation metrics. Our benchmarking study illustrates that document retrieval models significantly outperform other methods for the task of resource selection in the clustered P2P IR architecture. This indicates that clustered P2P IR framework can exploit advancements in document retrieval methods to deliver corresponding improvements in resource selection, indicating potential convergence of these fields for the clustered P2P IR architecture.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We consider the problem of resource selection in clustered Peer-to-Peer Information Retrieval (P2P IR) networks with cooperative peers. The clustered P2P IR framework presents a significant departure from general P2P IR architectures by employing clustering to ensure content coherence between resources at the resource selection layer, without disturbing document allocation. We propose that such a property could be leveraged in resource selection by adapting well-studied and popular inverted lists for centralized document retrieval. Accordingly, we propose the Inverted PeerCluster Index (IPI), an approach that adapts the inverted lists, in a straightforward manner, for resource selection in clustered P2P IR. IPI also encompasses a strikingly simple peer-specific scoring mechanism that exploits the said index for resource selection. Through an extensive empirical analysis on P2P IR testbeds, we establish that IPI competes well with the sophisticated state-of-the-art methods in virtually every parameter of interest for the resource selection task, in the context of clustered P2P IR.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Different types of water bodies, including lakes, streams, and coastal marine waters, are often susceptible to fecal contamination from a range of point and nonpoint sources, and have been evaluated using fecal indicator microorganisms. The most commonly used fecal indicator is Escherichia coli, but traditional cultivation methods do not allow discrimination of the source of pollution. The use of triplex PCR offers an approach that is fast and inexpensive, and here enabled the identification of phylogroups. The phylogenetic distribution of E. coli subgroups isolated from water samples revealed higher frequencies of subgroups A1 and B23 in rivers impacted by human pollution sources, while subgroups D1 and D2 were associated with pristine sites, and subgroup B1 with domesticated animal sources, suggesting their use as a first screening for pollution source identification. A simple classification is also proposed based on phylogenetic subgroup distribution using the w-clique metric, enabling differentiation of polluted and unpolluted sites.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the southern region of Mato Grosso do Sul state, Brazil, a foot-and-mouth disease (FMD) epidemic started in September 2005. A total of 33 outbreaks were detected and 33,741 FMD-susceptible animals were slaughtered and destroyed. There were no reports of FMD cases in other species than bovines. Based on the data of this epidemic, it was carried out an analysis using the K-function and it was observed spatial clustering of outbreaks within a range of 25km. This observation may be related to the dynamics of foot-and-mouth disease spread and to the measures undertaken to control the disease dissemination. The control measures were effective once the disease did not spread to farms more than 47 km apart from the initial outbreaks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Due to both the widespread and multipurpose use of document images and the current availability of a high number of document images repositories, robust information retrieval mechanisms and systems have been increasingly demanded. This paper presents an approach to support the automatic generation of relationships among document images by exploiting Latent Semantic Indexing (LSI) and Optical Character Recognition (OCR). We developed the LinkDI (Linking of Document Images) service, which extracts and indexes document images content, computes its latent semantics, and defines relationships among images as hyperlinks. LinkDI was experimented with document images repositories, and its performance was evaluated by comparing the quality of the relationships created among textual documents as well as among their respective document images. Considering those same document images, we ran further experiments in order to compare the performance of LinkDI when it exploits or not the LSI technique. Experimental results showed that LSI can mitigate the effects of usual OCR misrecognition, which reinforces the feasibility of LinkDI relating OCR output with high degradation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recent efforts in the characterization of air-water flows properties have included some clustering process analysis. A cluster of bubbles is defined as a group of two or more bubbles, with a distinct separation from other bubbles before and after the cluster. The present paper compares the results of clustering processes two hydraulic structures. That is, a large-size dropshaft and a hydraulic jump in a rectangular horizontal channel. The comparison highlighted some significant differences in clustering production and structures. Both dropshaft and hydraulic jump flows are complex turbulent shear flows, and some clustering index may provide some measure of the bubble-turbulence interactions and associated energy dissipation.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The high-affinity receptors for human granulocyte-macrophage colony-stimulating factor (GM-CSF), interleukin-1 (IL-3), and IL-5 are heterodimeric complexes consisting of cytokine-specific alpha subunits and a common signal-transducing beta subunit (h beta c). We have previously demonstrated the oncogenic potential of this group of receptors by identifying constitutively activating point mutations in the extracellular and transmembrane domains of h beta c. We report here a comprehensive screen of the entire h beta c molecule that has led to the identification of additional constitutive point mutations by virtue of their ability to confer factor independence on murine FDC-P1 cells. These mutations were clustered exclusively in a central region of h beta c that encompasses the extracellular membrane-proximal domain, transmembrane domain, and membrane-proximal region of the cytoplasmic domain. Interestingly, most h beta c mutants exhibited cell type-specific constitutive activity, with only two transmembrane domain mutants able to confer factor independence on both murine FDC-P1 and BAF-B03 cells. Examination of the biochemical properties of these mutants in FDC-P1 cells indicated that MAP kinase (ERK1/2), STAT, and JAK2 signaling molecules were constitutively activated. In contrast, only some of the mutant beta subunits were constitutively tyrosine phosphorylated. Taken together; these results highlight key regions involved in h beta c activation, dissociate h beta c tyrosine phosphorylation from MAP kinase and STAT activation, and suggest the involvement of distinct mechanisms by which proliferative signals can be generated by h beta c. (C) 1998 by The American Society of Hematology.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Application of geographic information system (GIS) and global positioning system (GPS) technology in the Hlabisa community-based tuberculosis treatment programme documents the increase in accessibility to treatment after the expansion of the service from health facilities to include community workers and volunteers.