41 resultados para Vector Space IR, Search Engines, Document Clustering, Document
Resumo:
Social signals and interpretation of carried information is of high importance in Human Computer Interaction. Often used for affect recognition, the cues within these signals are displayed in various modalities. Fusion of multi-modal signals is a natural and interesting way to improve automatic classification of emotions transported in social signals. Throughout most present studies, uni-modal affect recognition as well as multi-modal fusion, decisions are forced for fixed annotation segments across all modalities. In this paper, we investigate the less prevalent approach of event driven fusion, which indirectly accumulates asynchronous events in all modalities for final predictions. We present a fusion approach, handling short-timed events in a vector space, which is of special interest for real-time applications. We compare results of segmentation based uni-modal classification and fusion schemes to the event driven fusion approach. The evaluation is carried out via detection of enjoyment-episodes within the audiovisual Belfast Story-Telling Corpus.
Resumo:
Vector space models (VSMs) represent word meanings as points in a high dimensional space. VSMs are typically created using a large text corpora, and so represent word semantics as observed in text. We present a new algorithm (JNNSE) that can incorporate a measure of semantics not previously used to create VSMs: brain activation data recorded while people read words. The resulting model takes advantage of the complementary strengths and weaknesses of corpus and brain activation data to give a more complete representation of semantics. Evaluations show that the model 1) matches a behavioral measure of semantics more closely, 2) can be used to predict corpus data for unseen words and 3) has predictive power that generalizes across brain imaging technologies and across subjects. We believe that the model is thus a more faithful representation of mental vocabularies.
Resumo:
Let A be a unital dense algebra of linear mappings on a complex vector space X. Let φ = Σn i=1 Mai,bi be a locally quasi-nilpotent elementary operator of length n on A. We show that, if {a1, . . . , an} is locally linearly independent, then the local dimension of V (φ) = span{biaj : 1 ≤ i, j ≤ n} is at most n(n−1) 2 . If ldim V (φ) = n(n−1) 2 , then there exists a representation of φ as φ = Σn i=1 Mui,vi with viuj = 0 for i ≥ j. Moreover, we give a complete characterization of locally quasinilpotent elementary operators of length 3.
Resumo:
We say that a (countably dimensional) topological vector space X is orbital if there is T∈L(X) and a vector x∈X such that X is the linear span of the orbit {Tnx:n=0,1,…}. We say that X is strongly orbital if, additionally, x can be chosen to be a hypercyclic vector for T. Of course, X can be orbital only if the algebraic dimension of X is finite or infinite countable. We characterize orbital and strongly orbital metrizable locally convex spaces. We also show that every countably dimensional metrizable locally convex space X does not have the invariant subset property. That is, there is T∈L(X) such that every non-zero x∈X is a hypercyclic vector for T. Finally, assuming the Continuum Hypothesis, we construct a complete strongly orbital locally convex space.
As a byproduct of our constructions, we determine the number of isomorphism classes in the set of dense countably dimensional subspaces of any given separable infinite dimensional Fréchet space X. For instance, in X=ℓ2×ω, there are exactly 3 pairwise non-isomorphic (as topological vector spaces) dense countably dimensional subspaces.
Resumo:
Resource Selection (or Query Routing) is an important step in P2P IR. Though analogous to document retrieval in the sense of choosing a relevant subset of resources, resource selection methods have evolved independently from those for document retrieval. Among the reasons for such divergence is that document retrieval targets scenarios where underlying resources are semantically homogeneous, whereas peers would manage diverse content. We observe that semantic heterogeneity is mitigated in the clustered 2-tier P2P IR architecture resource selection layer by way of usage of clustering, and posit that this necessitates a re-look at the applicability of document retrieval methods for resource selection within such a framework. This paper empirically benchmarks document retrieval models against the state-of-the-art resource selection models for the problem of resource selection in the clustered P2P IR architecture, using classical IR evaluation metrics. Our benchmarking study illustrates that document retrieval models significantly outperform other methods for the task of resource selection in the clustered P2P IR architecture. This indicates that clustered P2P IR framework can exploit advancements in document retrieval methods to deliver corresponding improvements in resource selection, indicating potential convergence of these fields for the clustered P2P IR architecture.
Resumo:
Clusters of text documents output by clustering algorithms are often hard to interpret. We describe motivating real-world scenarios that necessitate reconfigurability and high interpretability of clusters and outline the problem of generating clusterings with interpretable and reconfigurable cluster models. We develop two clustering algorithms toward the outlined goal of building interpretable and reconfigurable cluster models. They generate clusters with associated rules that are composed of conditions on word occurrences or nonoccurrences. The proposed approaches vary in the complexity of the format of the rules; RGC employs disjunctions and conjunctions in rule generation whereas RGC-D rules are simple disjunctions of conditions signifying presence of various words. In both the cases, each cluster is comprised of precisely the set of documents that satisfy the corresponding rule. Rules of the latter kind are easy to interpret, whereas the former leads to more accurate clustering. We show that our approaches outperform the unsupervised decision tree approach for rule-generating clustering and also an approach we provide for generating interpretable models for general clusterings, both by significant margins. We empirically show that the purity and f-measure losses to achieve interpretability can be as little as 3 and 5%, respectively using the algorithms presented herein.
Resumo:
Support vector machine (SVM) is a powerful technique for data classification. Despite of its good theoretic foundations and high classification accuracy, normal SVM is not suitable for classification of large data sets, because the training complexity of SVM is highly dependent on the size of data set. This paper presents a novel SVM classification approach for large data sets by using minimum enclosing ball clustering. After the training data are partitioned by the proposed clustering method, the centers of the clusters are used for the first time SVM classification. Then we use the clusters whose centers are support vectors or those clusters which have different classes to perform the second time SVM classification. In this stage most data are removed. Several experimental results show that the approach proposed in this paper has good classification accuracy compared with classic SVM while the training is significantly faster than several other SVM classifiers.
Resumo:
Latent semantic indexing (LSI) is a technique used for intelligent information retrieval (IR). It can be used as an alternative to traditional keyword matching IR and is attractive in this respect because of its ability to overcome problems with synonymy and polysemy. This study investigates various aspects of LSI: the effect of the Haar wavelet transform (HWT) as a preprocessing step for the singular value decomposition (SVD) in the key stage of the LSI process; and the effect of different threshold types in the HWT on the search results. The developed method allows the visualisation and processing of the term document matrix, generated in the LSI process, using HWT. The results have shown that precision can be increased by applying the HWT as a preprocessing step, with better results for hard thresholding than soft thresholding, whereas standard SVD-based LSI remains the most effective way of searching in terms of recall value.
Resumo:
This paper considers the musicological aspects of the songs performed by Ophelia in Shakespeare's Hamlet. It proposes a reconsideration of the concept of madness and insanity by an attentive, attuned and learned listening to the songs sung by Ophelia and the ways in which they are performed and received.