891 results for Semantic kernel


Relevance:

20.00%

Abstract:

Topic detection and tracking (TDT) is an area of information retrieval research the focus of which revolves around news events. The problems TDT deals with relate to segmenting news text into cohesive stories, detecting something new, previously unreported, tracking the development of a previously reported event, and grouping together news stories that discuss the same event. The performance of traditional information retrieval techniques based on full-text similarity has remained inadequate for online production systems. It has been difficult to make the distinction between same and similar events. In this work, we explore ways of representing and comparing news documents in order to detect new events and track their development. First, however, we put forward a conceptual analysis of the notions of topic and event. The purpose is to clarify the terminology and align it with the process of news-making and the tradition of story-telling. Second, we present a framework for document similarity that is based on semantic classes, i.e., groups of words with similar meaning. We adopt people, organizations, and locations as semantic classes in addition to general terms. As each semantic class can be assigned its own similarity measure, document similarity can make use of ontologies, e.g., geographical taxonomies. The documents are compared class-wise, and the outcome is a weighted combination of class-wise similarities. Third, we incorporate temporal information into document similarity. We formalize the natural language temporal expressions occurring in the text, and use them to anchor the rest of the terms onto the time-line. Upon comparing documents for event-based similarity, we look not only at matching terms, but also at how near their anchors are on the time-line. Fourth, we experiment with an adaptive variant of the semantic class similarity system.
The news reflects changes in the real world, and in order to keep up, the system has to change its behavior based on the contents of the news stream. We put forward two strategies for rebuilding the topic representations and report experimental results. We run experiments with three annotated TDT corpora. The use of semantic classes increased the effectiveness of topic tracking by 10-30% depending on the experimental setup. The gain in spotting new events remained lower, around 3-4%. Anchoring the text to a time-line based on the temporal expressions gave a further 10% increase in the effectiveness of topic tracking. The gains in detecting new events, again, remained smaller. The adaptive systems did not improve the tracking results.
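The class-wise comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the class labels, and the use of cosine similarity for every class are assumptions made here for illustration (the abstract notes that each class may have its own measure, such as one based on a geographical taxonomy).

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classwise_similarity(doc_a, doc_b, weights):
    """Weighted combination of per-class similarities.

    doc_a, doc_b: dict mapping a semantic class name (hypothetical labels
    such as "terms" or "locations") to a Counter of terms in that class.
    weights: dict mapping class name to a non-negative weight.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = 0.0
    for cls, w in weights.items():
        # Assumption: cosine stands in for whatever class-specific
        # measure a real system would plug in here.
        score += w * cosine(doc_a.get(cls, Counter()), doc_b.get(cls, Counter()))
    return score / total
```

With equal weights, two stories that share all general terms but mention different locations score halfway between a full match and no match, which is the kind of "similar but not same event" distinction the abstract is after.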

Relevance:

20.00%

Abstract:

Natural history collections are an invaluable resource housing a wealth of knowledge with a long tradition of contributing to a wide range of fields such as taxonomy, quarantine, conservation and climate change. It is recognized, however [Smith and Blagoderov 2012], that such physical collections are often heavily underutilized as a result of practical accessibility issues. The digitization of these collections is a step towards removing these access issues, but other hurdles must be addressed before we truly unlock the potential of this knowledge.

Relevance:

20.00%

Abstract:

This paper shows that by using only symbolic language phrases, a mobile robot can purposefully navigate to specified rooms in previously unexplored environments. The robot intelligently organises a symbolic language description of the unseen environment and “imagines” a representative map, called the abstract map. The abstract map is an internal representation of the topological structure and spatial layout of symbolically defined locations. To perform goal-directed exploration, the abstract map creates a high-level semantic plan to reason about spaces beyond the robot’s known world. While completing the plan, the robot uses the metric guidance provided by a spatial layout, and grounded observations of door labels, to efficiently guide its navigation. The system is shown to complete exploration in unexplored spaces by travelling only 13.3% further than the optimal path.

Relevance:

20.00%

Abstract:

Identifying unusual or anomalous patterns in an underlying dataset is an important but challenging task in many applications. The focus of the unsupervised anomaly detection literature has mostly been on vectorised data. However, many applications are more naturally described using higher-order tensor representations. Approaches that vectorise tensorial data can destroy the structural information encoded in the high-dimensional space, and lead to the problem of the curse of dimensionality. In this paper we present the first unsupervised tensorial anomaly detection method, along with a randomised version of our method. Our anomaly detection method, the One-class Support Tensor Machine (1STM), is a generalisation of conventional one-class Support Vector Machines to higher-order spaces. 1STM preserves the multiway structure of tensor data, while achieving significant improvement in accuracy and efficiency over conventional vectorised methods. We then leverage the theory of nonlinear random projections to propose the Randomised 1STM (R1STM). Our empirical analysis on several real and synthetic datasets shows that our R1STM algorithm delivers accuracy comparable to or better than that of a state-of-the-art deep learning method and traditional kernelised approaches for anomaly detection, while being approximately 100 times faster in training and testing.

Relevance:

20.00%

Abstract:

We study diagonal estimates for the Bergman kernels of certain model domains in $\mathbb{C}^2$ near boundary points that are of infinite type. To do so, we need a mild structural condition on the defining functions of interest that facilitates optimal upper and lower bounds. This is a mild condition; unlike earlier studies of this sort, we are able to make estimates for non-convex pseudoconvex domains as well. This condition quantifies, in some sense, how flat a domain is at an infinite-type boundary point. In this scheme of quantification, the model domains considered below range, roughly speaking, from being "mildly infinite-type" to very flat at the infinite-type points.

Relevance:

20.00%

Abstract:

Visual tracking has been a challenging problem in computer vision for decades. The applications of visual tracking are far-reaching, ranging from surveillance and monitoring to smart rooms. The mean-shift (MS) tracker, which has gained attention recently, is known for tracking objects in cluttered environments and for its low computational complexity. The major problem encountered in histogram-based MS is its inability to track rapidly moving objects. In order to track fast-moving objects, we propose a new robust mean-shift tracker that uses both a spatial similarity measure and a color histogram-based similarity measure. The inability of the MS tracker to handle large displacements is circumvented by the spatial similarity-based tracking module, which on its own lacks robustness to changes in the object's appearance. The proposed tracker outperforms the individual trackers, tracking fast-moving objects with better accuracy.

Relevance:

20.00%

Abstract:

Mobile applications are being increasingly deployed on a massive scale in various mobile sensor grid database systems. With the limited resources of mobile devices, how to process the huge number of queries from mobile users with distributed sensor grid databases becomes a critical problem for such mobile systems. While the fundamental semantic cache technique has been investigated for query optimization in sensor grid database systems, the problem is still difficult because more realistic multi-dimensional constraints have not been considered in existing methods. To solve the problem, a new semantic cache scheme is presented in this paper for location-dependent data queries in distributed sensor grid database systems. It considers multi-dimensional constraints or factors in a unified cost model architecture, determines the parameters of the cost model in the scheme by using the concept of Nash equilibrium from game theory, and makes semantic cache decisions from the established cost model. Scenarios with three factors (semantics, time and location) are investigated as special cases, which improve on existing methods. Experiments are conducted to demonstrate the semantic cache scheme presented in this paper for distributed sensor grid database systems.

Relevance:

20.00%

Abstract:

In this note, we point out that a large family of $n \times n$ matrix-valued kernel functions defined on the unit disc $\mathbb{D} \subset \mathbb{C}$, which were constructed recently in [9], behave like the familiar Bergman kernel function on $\mathbb{D}$ in several different ways. We show that a number of questions involving the multiplication operator on the corresponding Hilbert space of holomorphic functions on $\mathbb{D}$ can be answered using this likeness.

Relevance:

20.00%

Abstract:

In this paper we focus on the challenging problem of place categorization and semantic mapping on a robot without environment-specific training. Motivated by their ongoing success in various visual recognition tasks, we build our system upon a state-of-the-art convolutional network. We overcome its closed-set limitations by complementing the network with a series of one-vs-all classifiers that can learn to recognize new semantic classes online. Prior domain knowledge is incorporated by embedding the classification system into a Bayesian filter framework that also ensures temporal coherence. We evaluate the classification accuracy of the system on a robot that maps a variety of places on our campus in real-time. We show how semantic information can boost robotic object detection performance and how the semantic map can be used to modulate the robot’s behaviour during navigation tasks. The system is made available to the community as a ROS module.
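To illustrate how a Bayesian filter can enforce temporal coherence over per-frame place classifications, here is a minimal discrete Bayes filter sketch. It is an assumption-laden illustration rather than the system described above: the function name, the `stay_prob` parameter, and the uniform switching model are all hypothetical.

```python
def bayes_filter_step(belief, likelihood, stay_prob=0.9):
    """One predict/update step of a discrete Bayes filter over place classes.

    belief: dict mapping class name to prior probability (sums to 1).
    likelihood: dict mapping class name to the classifier score for the
        current frame (need not be normalised).
    stay_prob: assumed probability that the place class is unchanged
        between consecutive frames (hypothetical transition model).
    """
    classes = list(belief)
    n = len(classes)
    # Probability mass for switching is spread uniformly over other classes.
    switch = (1.0 - stay_prob) / (n - 1) if n > 1 else 0.0

    # Predict: apply the transition model to the previous belief.
    predicted = {
        c: stay_prob * belief[c] + switch * (1.0 - belief[c]) for c in classes
    }

    # Update: weight the prediction by the current observation likelihood.
    posterior = {c: predicted[c] * likelihood.get(c, 0.0) for c in classes}
    z = sum(posterior.values())
    if z == 0:
        return predicted  # uninformative observation, keep the prediction
    return {c: p / z for c, p in posterior.items()}
```

Calling the function once per frame and feeding each posterior back in as the next belief means a single noisy frame that weakly favours another class does not immediately flip a confident estimate.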

Relevance:

20.00%

Abstract:

This paper gives a new iterative algorithm for kernel logistic regression. It is based on the solution of a dual problem using ideas similar to those of the Sequential Minimal Optimization algorithm for Support Vector Machines. Asymptotic convergence of the algorithm is proved. Computational experiments show that the algorithm is robust and fast. The algorithmic ideas can also be used to give a fast dual algorithm for solving the optimization problem arising in the inner loop of Gaussian Process classifiers.

Relevance:

20.00%

Abstract:

Automatic identification of software faults has enormous practical significance. This requires characterizing program execution behavior and the use of appropriate data mining techniques on the chosen representation. In this paper, we use the sequence of system calls to characterize program execution. The data mining tasks addressed are learning to map system call streams to fault labels and automatic identification of fault causes. Spectrum kernels and SVM are used for the former, while latent semantic analysis is used for the latter. The techniques are demonstrated for the intrusion dataset containing system call traces. The results show that kernel techniques are as accurate as the best available results but are faster by orders of magnitude. We also show that latent semantic indexing is capable of revealing fault-specific features.
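As a rough illustration of a spectrum kernel applied to system call traces, the sketch below counts contiguous length-k subsequences in each trace and takes the inner product of the two count vectors. The function names, the default k, and the example call names are assumptions for illustration; the abstract does not specify them.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous length-k subsequence (k-mer) of a trace."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=3):
    """k-spectrum kernel: inner product of the two k-mer count vectors."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Only k-mers present in both traces contribute to the inner product.
    return sum(counts_a[g] * counts_b[g] for g in counts_a.keys() & counts_b.keys())
```

A kernel matrix built from pairwise `spectrum_kernel` values over a set of traces could then be passed to any SVM implementation that accepts precomputed kernels.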

Relevance:

20.00%

Abstract:

The fluctuation of the distance between a fluorescein-tyrosine pair within a single protein complex was directly monitored in real time by photoinduced electron transfer and found to be a stationary, time-reversible, and non-Markovian Gaussian process. Within the generalized Langevin equation formalism, we experimentally determine the memory kernel $K(t)$, which is proportional to the autocorrelation function of the random fluctuating force. $K(t)$ is a power-law decay, $t^{-0.51 \pm 0.07}$, in a broad range of time scales ($10^{-3}$ to $10$ s). Such a long-time memory effect could have implications for protein functions.

Relevance:

20.00%

Abstract:

A straightforward computation of the list of the words (the 'tail words' of the list) that are distributionally most similar to a given word (the 'head word' of the list) leads to the question: how semantically similar to the head word are the tail words, that is, how similar are their meanings to its meaning? And can we do better? The experiment was done on the nearly 18,000 most frequent nouns in a Finnish newsgroup corpus. These nouns are considered to be distributionally similar to the extent that they occur in the same direct dependency relations with the same nouns, adjectives and verbs. The extent of the similarity of their computational representations is quantified with the information radius. The semantic classification of head-tail pairs is intuitive; some tail words seem to be semantically similar to the head word, some do not. Each such pair is also associated with a number of further distributional variables. Individually, their overlap for the semantic classes is large, but the trained classification-tree models have some success in using combinations to predict the semantic class. The training data consists of a random sample of 400 head-tail pairs with the tail word ranked among the 20 distributionally most similar to the head word, excluding names. The models are then tested on a random sample of another 100 such pairs. The best success rates range from 70% to 92% of the test pairs, where a success means that the model predicted my intuitive semantic class of the pair. This seems somewhat promising when distributional similarity is used to capture semantically similar words. This analysis also includes a general discussion of several different similarity formulas, arranged in three groups: those that apply to sets with graded membership, those that apply to the members of a vector space, and those that apply to probability mass functions.
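The information radius mentioned in this abstract can be sketched as follows. This assumes one common definition, the sum of the two Kullback-Leibler divergences from each distribution to their average (twice the Jensen-Shannon divergence); the abstract does not state which variant or logarithm base was used, so both are assumptions here, as are the function names.

```python
import math


def kl_to_mixture(p, m):
    """Kullback-Leibler divergence D(p || m) in bits.

    Terms with p[k] == 0 contribute nothing, and m[k] > 0 whenever
    p[k] > 0 because m is an average that includes p.
    """
    return sum(pv * math.log2(pv / m[k]) for k, pv in p.items() if pv > 0)


def information_radius(p, q):
    """Information radius of two probability mass functions given as dicts.

    Assumed definition: D(p || m) + D(q || m) with m = (p + q) / 2.
    With base-2 logarithms the value lies between 0 and 2.
    """
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return kl_to_mixture(p, m) + kl_to_mixture(q, m)
```

Here each dict would hold the relative frequencies with which a noun occurs in particular dependency relations; lower values indicate distributionally more similar nouns.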

Relevance:

20.00%

Abstract:

Recent evidence from adult pronoun comprehension suggests that semantic factors such as verb transitivity affect referent salience and thereby anaphora resolution. We tested whether the same semantic factors influence pronoun comprehension in young children. In a visual world study, 3-year-olds heard stories that began with a sentence containing either a high or a low transitivity verb. Looking behaviour to pictures depicting the subject and object of this sentence was recorded as children listened to a subsequent sentence containing a pronoun. Children showed a stronger preference to look to the subject as opposed to the object antecedent in the low transitivity condition. In addition there were general preferences (1) to look to the subject in both conditions and (2) to look more at both potential antecedents in the high transitivity condition. This suggests that children, like adults, are affected by semantic factors, specifically semantic prominence, when interpreting anaphoric pronouns.