8 results for Vector Space IR, Search Engines, Document Clustering, Document

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo


Relevance:

100.00% 100.00%

Publisher:

Abstract:

XML similarity evaluation has become a central issue in the database and information communities, with applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operation costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
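As a rough illustration of the tree edit distance that this line of work builds on, the following minimal sketch computes a unit-cost edit distance between two ordered labeled trees. It is not the framework described in the abstract: the nested-tuple tree encoding, the function names, and the uniform cost of 1 per insertion, deletion, or relabeling are all assumptions made for the example, and no structural or semantic sub-tree similarity is taken into account.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple of trees.

def size(tree):
    """Number of nodes in a tree."""
    _label, children = tree
    return 1 + sum(size(child) for child in children)

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f1:
        return sum(size(t) for t in f2)   # insert everything in f2
    if not f2:
        return sum(size(t) for t in f1)   # delete everything in f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the rightmost root of f1; its children join the forest.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the rightmost root of f2.
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Map the two rightmost roots onto each other (relabel if labels differ).
    relabel = (forest_distance(kids1, kids2)
               + forest_distance(f1[:-1], f2[:-1])
               + (0 if label1 == label2 else 1))
    return min(delete, insert, relabel)

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

# Two small hypothetical XML-like documents.
doc_a = ("book", (("title", ()), ("author", ()), ("year", ())))
doc_b = ("book", (("title", ()), ("editor", ())))
distance = tree_edit_distance(doc_a, doc_b)
```

Here `doc_a` turns into `doc_b` by relabeling `author` to `editor` and deleting `year`. A comparison framework of the kind described above would replace the uniform costs with costs derived from sub-tree similarity.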

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Models are becoming increasingly important in the software development process. As a consequence, the number of models being used is increasing, and so is the need for efficient mechanisms to search them. Various existing search engines could be used for this purpose, but they lack features to properly search models, mainly because they are strongly focused on text-based search. This paper presents Moogle, a model search engine that uses metamodeling information to create richer search indexes and to allow more complex queries to be performed. The paper also presents the results of an evaluation of Moogle, which showed that the metamodel information improves the accuracy of the search.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Let $G = \mathbb{Z}_{p^k}$ be a cyclic group of prime power order and let $V$ and $W$ be orthogonal representations of $G$ with $V^G = W^G = \{0\}$. Let $S(V)$ be the sphere of $V$ and suppose $f\colon S(V) \to W$ is a $G$-equivariant mapping. We give an estimate for the dimension of the set $f^{-1}\{0\}$ in terms of $V$ and $W$. This extends the Bourgin-Yang version of the Borsuk-Ulam theorem to this class of groups. Using this estimate, we also estimate the size of the $G$-coincidences set of a continuous map from $S(V)$ into a real vector space $W'$.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Bol algebras appear as the tangent algebras of Bol loops. A (left) Bol algebra is a vector space equipped with a binary operation $[a, b]$ and a ternary operation $\{a, b, c\}$ that satisfy five defining identities. If $A$ is a left or right alternative algebra then $A^b$ is a Bol algebra, where $[a, b] := ab - ba$ is the commutator and $\{a, b, c\} := \langle b, c, a \rangle$ is the Jordan associator. A special identity is an identity satisfied by $A^b$ for all right alternative algebras $A$, but not satisfied by the free Bol algebra. We show that there are no special identities of degree $\le 7$, but there are special identities of degree 8. We obtain all the special identities of degree 8 in partition six-two. (C) 2011 Elsevier Inc. All rights reserved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The classification of texts has become a major endeavor with so much electronic material available, for it is an essential task in several applications, including search engines and information retrieval. There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts are those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between various metrics of the complex networks is analyzed with three applications, namely identification of machine translation (MT) systems, evaluation of quality of machine translated texts and authorship recognition. We shall show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of MT texts, on the other hand, high correlation was obtained with methods capable of capturing the semantics. This was expected because the gold standards used are themselves based on word co-occurrence. Notwithstanding, the Katz similarity, which involves semantics and structure in the comparison of texts, achieved the highest correlation with the NIST measurement, indicating that in some cases the combination of both approaches can improve the ability to quantify quality in MT. In authorship recognition, again the topological features were relevant in some contexts, though for the books and authors analyzed good results were obtained with semantic features as well.
Because hybrid approaches encompassing semantic and topological features have not been extensively used, we believe that the methodology proposed here may be useful to enhance text classification considerably, as it combines well-established strategies. (c) 2012 Elsevier B.V. All rights reserved.
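To make the idea of topological features of a text network concrete, here is a minimal sketch that links adjacent words into an undirected network and computes two common metrics, degree and clustering coefficient. The tokenization, the adjacency rule, and the function names are assumptions chosen for illustration. The abstract does not specify how the networks were built or which metrics were used beyond the Katz similarity.

```python
import re
from collections import defaultdict

def build_word_network(text):
    """Undirected network linking each word to the word that follows it."""
    tokens = re.findall(r"[a-z]+", text.lower())
    graph = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:          # ignore self-loops from repeated words
            graph[left].add(right)
            graph[right].add(left)
    return dict(graph)

def degree(graph, node):
    """Number of distinct neighbours of a word."""
    return len(graph[node])

def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = sorted(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for i, u in enumerate(neighbours)
                for v in neighbours[i + 1:] if v in graph[u])
    return 2.0 * links / (k * (k - 1))

# Tiny hypothetical text: the adjacent pairs form a triangle red-green-blue.
network = build_word_network("red green blue red")
features = {word: (degree(network, word), clustering_coefficient(network, word))
            for word in network}
```

Feature vectors like `features`, aggregated per text, could then be fed to any standard classifier alongside semantic features.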

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A subspace representation of a poset $S = \{s_1, \dots, s_t\}$ is given by a system $(V; V_1, \dots, V_t)$ consisting of a vector space $V$ and its subspaces $V_i$ such that $V_i \subseteq V_j$ if $s_i \prec s_j$. For each real-valued vector $\chi = (\chi_1, \dots, \chi_t)$ with positive components, we define a unitary $\chi$-representation of $S$ as a system $(U; U_1, \dots, U_t)$ that consists of a unitary space $U$ and its subspaces $U_i$ such that $U_i \subseteq U_j$ if $s_i \prec s_j$ and satisfies $\chi_1 P_1 + \dots + \chi_t P_t = 1$, in which $P_i$ is the orthogonal projection onto $U_i$. We prove that $S$ has a finite number of unitarily nonequivalent indecomposable $\chi$-representations for each weight $\chi$ if and only if $S$ has a finite number of nonequivalent indecomposable subspace representations; that is, if and only if $S$ does not contain any of Kleiner's critical posets. (c) 2012 Elsevier Inc. All rights reserved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Gelfand and Ponomarev [I.M. Gelfand, V.A. Ponomarev, Remarks on the classification of a pair of commuting linear transformations in a finite dimensional vector space, Funct. Anal. Appl. 3 (1969) 325-326] proved that the problem of classifying pairs of commuting linear operators contains the problem of classifying k-tuples of linear operators for any k. We prove an analogous statement for semilinear operators. (C) 2011 Elsevier Inc. All rights reserved.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

Support Vector Machines (SVMs) have achieved very good performance on different learning problems. However, the success of SVMs depends on an adequate choice of values for a number of parameters (e.g., the kernel and regularization parameters). In the current work, we propose the combination of meta-learning and search algorithms to deal with the problem of SVM parameter selection. In this combination, given a new problem to be solved, meta-learning is employed to recommend SVM parameter values based on parameter configurations that have been successfully adopted in previous similar problems. The parameter values returned by meta-learning are then used as initial search points by a search technique, which further explores the parameter space. In this proposal, we envisioned that the initial solutions provided by meta-learning are located in good regions of the search space (i.e., they are closer to optimum solutions). Hence, the search algorithm would need to evaluate fewer candidate solutions when looking for an adequate solution. In this work, we investigate the combination of meta-learning with two search algorithms: Particle Swarm Optimization and Tabu Search. The implemented hybrid algorithms were used to select the values of two SVM parameters in the regression domain. These combinations were compared with the use of the search algorithms without meta-learning. The experimental results on a set of 40 regression problems showed that, on average, the proposed hybrid methods obtained lower error rates when compared to their components applied in isolation.
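The two-stage idea (recommend starting points from similar past problems, then refine them by search) can be sketched as follows. This is only an illustration under stated assumptions: the meta-features, the stored parameter values, and the `toy_error` function are hypothetical, nearest-neighbour retrieval stands in for the meta-learner, and a simple greedy coordinate search is swapped in for the Particle Swarm Optimization and Tabu Search used in the paper.

```python
import math

def recommend_initial_points(meta_db, new_features, k=2):
    """Return the parameters of the k past problems most similar to the new one."""
    ranked = sorted(meta_db,
                    key=lambda entry: math.dist(entry["features"], new_features))
    return [entry["params"] for entry in ranked[:k]]

def local_search(error_fn, start, step=0.5, iterations=50):
    """Greedy coordinate search from `start`; halves the step when stuck."""
    best = tuple(start)
    best_err = error_fn(best)
    for _ in range(iterations):
        improved = False
        for i in range(len(best)):
            for delta in (-step, step):
                cand = best[:i] + (best[i] + delta,) + best[i + 1:]
                err = error_fn(cand)
                if err < best_err:
                    best, best_err, improved = cand, err, True
        if not improved:
            step /= 2
    return best, best_err

# Hypothetical meta-database: meta-features of past problems and the
# (log C, log gamma) values that worked well on them.
meta_db = [
    {"features": (0.1, 0.9), "params": (1.0, -2.0)},
    {"features": (0.2, 0.8), "params": (1.5, -2.5)},
    {"features": (0.9, 0.1), "params": (-3.0, 4.0)},
]

def toy_error(params):
    """Stand-in for cross-validated SVM error; minimum at (1.25, -2.0)."""
    log_c, log_gamma = params
    return (log_c - 1.25) ** 2 + (log_gamma + 2.0) ** 2

starts = recommend_initial_points(meta_db, (0.15, 0.85), k=2)
results = [local_search(toy_error, start) for start in starts]
best_params, best_error = min(results, key=lambda result: result[1])
```

In a real setting `toy_error` would be replaced by an actual evaluation of an SVM regressor, and the benefit of the recommended starting points would show up as fewer such evaluations.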