41 results for Vector Space IR, Search Engines, Document Clustering, Document


Relevance:

100.00% 100.00%

Publisher:

Abstract:

In most previous research on distributional semantics, Vector Space Models (VSMs) of words are built either from topical information (e.g., documents in which a word is present), or from syntactic/semantic types of words (e.g., dependency parse links of a word in sentences), but not both. In this paper, we explore the utility of combining these two representations to build VSMs for the task of semantic composition of adjective-noun phrases. Through extensive experiments on benchmark datasets, we find that even though a type-based VSM is effective for semantic composition, it is often outperformed by a VSM built using a combination of topic- and type-based statistics. We also introduce a new evaluation task wherein we predict the composed vector representation of a phrase from the brain activity of a human subject reading that phrase. We exploit a large syntactically parsed corpus of 16 billion tokens to build our VSMs, with vectors for both phrases and words, and make them publicly available.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We consider the problem of resource selection in clustered Peer-to-Peer Information Retrieval (P2P IR) networks with cooperative peers. The clustered P2P IR framework presents a significant departure from general P2P IR architectures by employing clustering to ensure content coherence between resources at the resource selection layer, without disturbing document allocation. We propose that such a property could be leveraged in resource selection by adapting well-studied and popular inverted lists for centralized document retrieval. Accordingly, we propose the Inverted PeerCluster Index (IPI), an approach that adapts the inverted lists, in a straightforward manner, for resource selection in clustered P2P IR. IPI also encompasses a strikingly simple peer-specific scoring mechanism that exploits the said index for resource selection. Through an extensive empirical analysis on P2P IR testbeds, we establish that IPI competes well with the sophisticated state-of-the-art methods in virtually every parameter of interest for the resource selection task, in the context of clustered P2P IR.
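The abstract does not spell out the index layout or the scoring formula, so the following is only a minimal sketch of the general idea it describes: an inverted list keyed by term whose postings are peers rather than documents. The names `build_index` and `select_peers`, the input layout, and the summed term-frequency score are all assumptions made for illustration; they are not the IPI scoring mechanism.

```python
from collections import Counter, defaultdict


def build_index(peer_docs):
    """Build a term -> {peer_id: term frequency} inverted index.

    peer_docs is a hypothetical layout: dict of peer_id -> list of document strings.
    """
    index = defaultdict(Counter)
    for peer, docs in peer_docs.items():
        for doc in docs:
            for term in doc.lower().split():
                index[term][peer] += 1
    return index


def select_peers(index, query, k=2):
    """Rank peers by summed term frequency over the query terms (placeholder score)."""
    scores = Counter()
    for term in query.lower().split():
        # .get avoids creating empty postings for unseen terms
        for peer, tf in index.get(term, {}).items():
            scores[peer] += tf
    # Sort by descending score, breaking ties by peer id for determinism
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [peer for peer, _ in ranked[:k]]
```

A query is then routed only to the peers returned by `select_peers`, so peers with no matching postings are never contacted.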

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We prove that for any Hausdorff topological vector space E over the field R there exists a subset A of E such that E is homeomorphic to a subset of A × R and A × R is homeomorphic to a subset of E. Using this fact we prove that E is monotonically normal if and only if E is stratifiable.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The use of bit-level systolic arrays in the design of a vector quantized transformed subband coding system for speech signals is described. It is shown how the major components of this system can be decomposed into a small number of highly regular building blocks that interface directly to one another. These include circuits for the computation of the discrete cosine transform, the inverse discrete cosine transform, and vector quantization codebook search.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A bit-level systolic array system for performing a binary tree Vector Quantization codebook search is described. This consists of a linear chain of regular VLSI building blocks and exhibits data rates suitable for a wide range of real-time applications. A technique is described which reduces the computation required at each node in the binary tree to that of a single inner product operation. This method applies to all the common distortion measures (including the Euclidean distance, the Weighted Euclidean distance and the Itakura-Saito distortion measure) and significantly reduces the hardware required to implement the tree search system. © 1990 Kluwer Academic Publishers.
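For the plain Euclidean distance, one standard way to see why a single inner product per node suffices is that ||x - c0||^2 < ||x - c1||^2 holds exactly when x · (c1 - c0) < (||c1||^2 - ||c0||^2) / 2. The sketch below illustrates that identity in software; it is not necessarily the paper's formulation, it does not cover the weighted or Itakura-Saito cases, and the `Node` layout and function names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    codeword: List[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def branch_params(c_left, c_right):
    """Return (w, t) such that x is closer to c_left exactly when dot(x, w) < t."""
    w = [r - l for l, r in zip(c_left, c_right)]
    t = (dot(c_right, c_right) - dot(c_left, c_left)) / 2.0
    return w, t


def tree_search(root, x):
    """Descend the binary tree using one inner product per internal node."""
    node = root
    while node.left is not None and node.right is not None:
        # In a hardware design w and t would be precomputed and stored per node.
        w, t = branch_params(node.left.codeword, node.right.codeword)
        node = node.left if dot(x, w) < t else node.right
    return node.codeword
```

Because `w` and `t` depend only on the codebook, the per-vector work at each level is the inner product and one comparison.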

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Vector Space Models (VSMs) of Semantics are useful tools for exploring the semantics of single words, and the composition of words to make phrasal meaning. While many methods can estimate the meaning (i.e. vector) of a phrase, few do so in an interpretable way. We introduce a new method (CNNSE) that allows word and phrase vectors to adapt to the notion of composition. Our method learns a VSM that is both tailored to support a chosen semantic composition operation, and whose resulting features have an intuitive interpretation. Interpretability allows for the exploration of phrasal semantics, which we leverage to analyze performance on a behavioral task.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Textual problem-solution repositories are available today in
various forms, most commonly as problem-solution pairs from community
question answering systems. Modern search engines that operate on
the web can suggest possible completions in real-time for users as they
type in queries. We study the problem of generating intelligent query
suggestions for users of customized search systems that enable querying
over problem-solution repositories. Due to the small scale and specialized
nature of such systems, we often do not have the luxury of depending on
query logs for finding query suggestions. We propose a retrieval model
for generating query suggestions for search on a set of problem-solution
pairs. We harness the problem-solution partition inherent in such
repositories to improve upon traditional query suggestion mechanisms
designed for systems that search over general textual corpora. We evaluate
our technique over real problem-solution datasets and illustrate that
our technique provides large and statistically significant

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A search query, being a very concise grounding of user intent, could potentially have many possible interpretations. Search engines hedge their bets by diversifying top results to cover multiple such possibilities so that the user is likely to be satisfied, whatever her intended interpretation. Diversified Query Expansion is the problem of diversifying query expansion suggestions, so that the user can specialize the query to better suit her intent, even before perusing search results. We propose a method, Select-Link-Rank (SLR), that exploits semantic information from Wikipedia to generate diversified query expansions. SLR processes terms and Wikipedia entities collectively in an integrated framework, simultaneously diversifying query expansions and entity recommendations. SLR starts by selecting informative terms from search results of the initial query, links them to Wikipedia entities, performs a diversity-conscious entity scoring, and transfers that scoring to the term space to arrive at query expansion suggestions. Through an extensive empirical analysis and user study, we show that our method outperforms the state-of-the-art diversified query expansion and diversified entity recommendation techniques.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Information retrieval in the age of Internet search engines has become part of ordinary discourse and everyday practice: "Google" is a verb in common usage. Thus far, more attention has been given to practical understanding of information retrieval than to a full theoretical account. In Human Information Retrieval, Julian Warner offers a comprehensive overview of information retrieval, synthesizing theories from different disciplines (information and computer science, librarianship and indexing, and information society discourse) and incorporating such disparate systems as WorldCat and Google into a single, robust theoretical framework. There is a need for such a theoretical treatment, he argues, one that reveals the structure and underlying patterns of this complex field while remaining congruent with everyday practice. Warner presents a labor theoretic approach to information retrieval, building on his previously formulated distinction between semantic and syntactic mental labor, arguing that the description and search labor of information retrieval can be understood as both semantic and syntactic in character. Warner's information science approach is rooted in the humanities and the social sciences but informed by an understanding of information technology and information theory. The chapters offer a progressive exposition of the topic, with illustrative examples to explain the concepts presented. Neither narrowly practical nor largely speculative, Human Information Retrieval meets the contemporary need for a broader treatment of information and information systems.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We say that the Peano theorem holds for a topological vector space $E$ if, for any continuous mapping $f : {\Bbb R}\times E \to E$ and any $(t_0, x_0) \in {\Bbb R}\times E$, the Cauchy problem $\dot x(t) = f(t,x(t))$, $x(t_0) = x_0$, has a solution in some neighborhood of $t_0$. We say that the weak version of the Peano theorem holds for $E$ if, for any continuous map $f : {\Bbb R}\times E \to E$, the equation $\dot x(t) = f(t, x(t))$ has a solution on some interval. We construct an example (answering a question posed by S. G. Lobanov) of a Hausdorff locally convex topological vector space $E$ for which the weak version of the Peano theorem holds and the Peano theorem fails to hold. We also construct a Hausdorff locally convex topological vector space $E$ for which the Peano theorem holds and any barrel in $E$ is neither compact nor sequentially compact.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We construct a countable-dimensional Hausdorff locally convex topological vector space $E$ and a stratifiable closed linear subspace $F \subset E$ such that any linear extension operator from $C_b(F)$ to $C_b(E)$ is unbounded (here $C_b(X)$ stands for the Banach space of continuous bounded real-valued functions on $X$).

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A rapidly increasing number of Web databases have now become accessible via
their HTML form-based query interfaces. Query result pages are dynamically generated
in response to user queries, which encode structured data and are displayed for human
use. Query result pages usually contain other types of information in addition to query
results, e.g., advertisements, navigation bars, etc. The problem of extracting structured data
from query result pages is critical for web data integration applications, such as comparison
shopping, meta-search engines, etc., and has been intensively studied. A number of approaches
have been proposed. As the structures of Web pages become more and more complex, the
existing approaches start to fail, and most of them do not remove irrelevant contents which
may affect the accuracy of data record extraction. We propose an automated approach for
Web data extraction. First, it makes use of visual features and query terms to identify data
sections and extracts data records in these sections. We also represent several content and
visual features of visual blocks in a data section, and use them to filter out noisy blocks.
Second, it measures similarity between data items in different data records based on their
visual and content features, and aligns them into different groups so that the data in the
same group have the same semantics. The results of our experiments with a large set of
Web query result pages in different domains show that our proposed approaches are highly
effective.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We prove that a continuous linear operator T on a topological vector space X with weak topology is mixing if and only if the dual operator T' has no finite dimensional invariant subspaces. This result implies the characterization of hypercyclic operators on the space $\omega$ due to Herzog and Lemmert and implies the result of Bayart and Matheron, who proved that for any hypercyclic operator T on $\omega$, $T\oplus T$ is also hypercyclic.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A tuple $(T_1,\dots,T_n)$ of continuous linear operators on a topological vector space $X$ is called hypercyclic if there is $x\in X$ such that the orbit of $x$ under the action of the semigroup generated by $T_1,\dots,T_n$ is dense in $X$. This concept was introduced by N. Feldman, who raised 7 questions on hypercyclic tuples. We answer the 4 of them that can be dealt with on the level of operators on finite dimensional spaces. In particular, we prove that the minimal cardinality of a hypercyclic tuple of operators on $\C^n$ (respectively, on $\R^n$) is $n+1$ (respectively, $\frac n2+\frac{5+(-1)^n}{4}$), that there are non-diagonalizable tuples of operators on $\R^2$ which possess an orbit that is neither dense nor nowhere dense, and we construct a hypercyclic 6-tuple of operators on $\C^3$ such that every operator commuting with each member of the tuple is non-cyclic.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, we introduce an application of matrix factorization to produce corpus-derived, distributional
models of semantics that demonstrate cognitive plausibility. We find that word representations
learned by Non-Negative Sparse Embedding (NNSE), a variant of matrix factorization, are sparse,
effective, and highly interpretable. To the best of our knowledge, this is the first approach which
yields semantic representation of words satisfying these three desirable properties. Through extensive
experimental evaluations on multiple real-world tasks and datasets, we demonstrate the superiority
of semantic models learned by NNSE over other state-of-the-art baselines.