903 resultados para Natural language processing (Computer science)
Resumo:
A primary goal of context-aware systems is delivering the right information at the right place and right time to users in order to enable them to make effective decisions and improve their quality of life. There are three key requirements for achieving this goal: determining what information is relevant, personalizing it based on the users’ context (location, preferences, behavioral history etc.), and delivering it to them in a timely manner without an explicit request from them. These requirements create a paradigm that we term as “Proactive Context-aware Computing”. Most of the existing context-aware systems fulfill only a subset of these requirements. Many of these systems focus only on personalization of the requested information based on users’ current context. Moreover, they are often designed for specific domains. In addition, most of the existing systems are reactive - the users request for some information and the system delivers it to them. These systems are not proactive i.e. they cannot anticipate users’ intent and behavior and act proactively without an explicit request from them. In order to overcome these limitations, we need to conduct a deeper analysis and enhance our understanding of context-aware systems that are generic, universal, proactive and applicable to a wide variety of domains. To support this dissertation, we explore several directions. Clearly the most significant sources of information about users today are smartphones. A large amount of users’ context can be acquired through them and they can be used as an effective means to deliver information to users. In addition, social media such as Facebook, Flickr and Foursquare provide a rich and powerful platform to mine users’ interests, preferences and behavioral history. We employ the ubiquity of smartphones and the wealth of information available from social media to address the challenge of building proactive context-aware systems. We have implemented and evaluated a few approaches, including some as part of the Rover framework, to achieve the paradigm of Proactive Context-aware Computing. Rover is a context-aware research platform which has been evolving for the last 6 years. Since location is one of the most important context for users, we have developed ‘Locus’, an indoor localization, tracking and navigation system for multi-story buildings. Other important dimensions of users’ context include the activities that they are engaged in. To this end, we have developed ‘SenseMe’, a system that leverages the smartphone and its multiple sensors in order to perform multidimensional context and activity recognition for users. As part of the ‘SenseMe’ project, we also conducted an exploratory study of privacy, trust, risks and other concerns of users with smart phone based personal sensing systems and applications. To determine what information would be relevant to users’ situations, we have developed ‘TellMe’ - a system that employs a new, flexible and scalable approach based on Natural Language Processing techniques to perform bootstrapped discovery and ranking of relevant information in context-aware systems. In order to personalize the relevant information, we have also developed an algorithm and system for mining a broad range of users’ preferences from their social network profiles and activities. For recommending new information to the users based on their past behavior and context history (such as visited locations, activities and time), we have developed a recommender system and approach for performing multi-dimensional collaborative recommendations using tensor factorization. For timely delivery of personalized and relevant information, it is essential to anticipate and predict users’ behavior. To this end, we have developed a unified infrastructure, within the Rover framework, and implemented several novel approaches and algorithms that employ various contextual features and state of the art machine learning techniques for building diverse behavioral models of users. Examples of generated models include classifying users’ semantic places and mobility states, predicting their availability for accepting calls on smartphones and inferring their device charging behavior. Finally, to enable proactivity in context-aware systems, we have also developed a planning framework based on HTN planning. Together, these works provide a major push in the direction of proactive context-aware computing.
Resumo:
This dissertation applies statistical methods to the evaluation of automatic summarization using data from the Text Analysis Conferences in 2008-2011. Several aspects of the evaluation framework itself are studied, including the statistical testing used to determine significant differences, the assessors, and the design of the experiment. In addition, a family of evaluation metrics is developed to predict the score an automatically generated summary would receive from a human judge and its results are demonstrated at the Text Analysis Conference. Finally, variations on the evaluation framework are studied and their relative merits considered. An over-arching theme of this dissertation is the application of standard statistical methods to data that does not conform to the usual testing assumptions.
Resumo:
Following an early claim by Nelson & McEvoy suggesting that word associations can display `spooky action at a distance behaviour', a serious investigation of the potentially quantum nature of such associations is currently underway. In this paper quantum theory is proposed as a framework suitable for modelling the mental lexicon, specifically the results obtained from both intralist and extralist word association experiments. Some initial models exploring this hypothesis are discussed, and they appear to be capable of substantial agreement with pre-existing experimental data. The paper concludes with a discussion of some experiments that will be performed in order to test these models.
Resumo:
In this paper, we propose an unsupervised segmentation approach, named "n-gram mutual information", or NGMI, which is used to segment Chinese documents into n-character words or phrases, using language statistics drawn from the Chinese Wikipedia corpus. The approach alleviates the tremendous effort that is required in preparing and maintaining the manually segmented Chinese text for training purposes, and manually maintaining ever expanding lexicons. Previously, mutual information was used to achieve automated segmentation into 2-character words. The NGMI approach extends the approach to handle longer n-character words. Experiments with heterogeneous documents from the Chinese Wikipedia collection show good results.
Resumo:
We argue that web service discovery technology should help the user navigate a complex problem space by providing suggestions for services which they may not be able to formulate themselves as (s)he lacks the epistemic resources to do so. Free text documents in service environments provide an untapped source of information for augmenting the epistemic state of the user and hence their ability to search effectively for services. A quantitative approach to semantic knowledge representation is adopted in the form of semantic space models computed from these free text documents. Knowledge of the user’s agenda is promoted by associational inferences computed from the semantic space. The inferences are suggestive and aim to promote human abductive reasoning to guide the user from fuzzy search goals into a better understanding of the problem space surrounding the given agenda. Experimental results are discussed based on a complex and realistic planning activity.
Resumo:
Spoken term detection (STD) popularly involves performing word or sub-word level speech recognition and indexing the result. This work challenges the assumption that improved speech recognition accuracy implies better indexing for STD. Using an index derived from phone lattices, this paper examines the effect of language model selection on the relationship between phone recognition accuracy and STD accuracy. Results suggest that language models usually improve phone recognition accuracy but their inclusion does not always translate to improved STD accuracy. The findings suggest that using phone recognition accuracy to measure the quality of an STD index can be problematic, and highlight the need for an alternative that is more closely aligned with the goals of the specific detection task.
Resumo:
The World Wide Web has become a medium for people to share information. People use Web-based collaborative tools such as question answering (QA) portals, blogs/forums, email and instant messaging to acquire information and to form online-based communities. In an online QA portal, a user asks a question and other users can provide answers based on their knowledge, with the question usually being answered by many users. It can become overwhelming and/or time/resource consuming for a user to read all of the answers provided for a given question. Thus, there exists a need for a mechanism to rank the provided answers so users can focus on only reading good quality answers. The majority of online QA systems use user feedback to rank users’ answers and the user who asked the question can decide on the best answer. Other users who didn’t participate in answering the question can also vote to determine the best answer. However, ranking the best answer via this collaborative method is time consuming and requires an ongoing continuous involvement of users to provide the needed feedback. The objective of this research is to discover a way to recommend the best answer as part of a ranked list of answers for a posted question automatically, without the need for user feedback. The proposed approach combines both a non-content-based reputation method and a content-based method to solve the problem of recommending the best answer to the user who posted the question. The non-content method assigns a score to each user which reflects the users’ reputation level in using the QA portal system. Each user is assigned two types of non-content-based reputations cores: a local reputation score and a global reputation score. The local reputation score plays an important role in deciding the reputation level of a user for the category in which the question is asked. The global reputation score indicates the prestige of a user across all of the categories in the QA system. Due to the possibility of user cheating, such as awarding the best answer to a friend regardless of the answer quality, a content-based method for determining the quality of a given answer is proposed, alongside the non-content-based reputation method. Answers for a question from different users are compared with an ideal (or expert) answer using traditional Information Retrieval and Natural Language Processing techniques. Each answer provided for a question is assigned a content score according to how well it matched the ideal answer. To evaluate the performance of the proposed methods, each recommended best answer is compared with the best answer determined by one of the most popular link analysis methods, Hyperlink-Induced Topic Search (HITS). The proposed methods are able to yield high accuracy, as shown by correlation scores: Kendall correlation and Spearman correlation. The reputation method outperforms the HITS method in terms of recommending the best answer. The inclusion of the reputation score with the content score improves the overall performance, which is measured through the use of Top-n match scores.
Resumo:
Separability is a concept that is very difficult to define, and yet much of our scientific method is implicitly based upon the assumption that systems can sensibly be reduced to a set of interacting components. This paper examines the notion of separability in the creation of bi-ambiguous compounds that is based upon the CHSH and CH inequalities. It reports results of an experiment showing that violations of the CHSH and CH inequality can occur in human conceptual combination.
Resumo:
Measures and theories of information abound, but there are few formalised methods for treating the contextuality that can manifest in different information systems. Quantum theory provides one possible formalism for treating information in context. This paper introduces a quantum-like model of the human mental lexicon, and shows one set of recent experimental data suggesting that concept combinations can indeed behave non-separably. There is some reason to believe that the human mental lexicon displays entanglement.
Resumo:
In computational linguistics, information retrieval and applied cognition, words and concepts are often represented as vectors in high dimensional spaces computed from a corpus of text. These high dimensional spaces are often referred to as Semantic Spaces. We describe a novel and efficient approach to computing these semantic spaces via the use of complex valued vector representations. We report on the practical implementation of the proposed method and some associated experiments. We also briefly discuss how the proposed system relates to previous theoretical work in Information Retrieval and Quantum Mechanics and how the notions of probability, logic and geometry are integrated within a single Hilbert space representation. In this sense the proposed system has more general application and gives rise to a variety of opportunities for future research.