6 resultados para textual data

em QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Massive amount of data that are geo-tagged and associated with text information are being generated at an unprecedented scale. These geo-textual data cover a wide range of topics. Users are interested in receiving up-to-date tweets such that their locations are close to a user specified location and their texts are interesting to users. For example, a user may want to be updated with tweets near her home on the topic “food poisoning vomiting.” We consider the Temporal Spatial-Keyword Top-k Subscription (TaSK) query. Given a TaSK query, we continuously maintain up-to-date top-k most relevant results over a stream of geo-textual objects (e.g., geo-tagged Tweets) for the query. The TaSK query takes into account text relevance, spatial proximity, and recency of geo-textual objects in evaluating its relevance with a geo-textual object. We propose a novel solution to efficiently process a large number of TaSK queries over a stream of geotextual objects. We evaluate the efficiency of our approach on two real-world datasets and the experimental results show that our solution is able to achieve a reduction of the processing time by 70-80% compared with two baselines.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Massive amount of data that are geo-tagged and associated with text information are being generated at an unprecedented scale. These geo-textual data cover a wide range of topics. Users are interested in receiving up-to-date geo-textual objects (e.g., geo-tagged Tweets) such that their locations meet users’ need and their texts are interesting to users. For example, a user may want to be updated with tweets near her home on the topic “dengue fever headache.” In this demonstration, we present SOPS, the Spatial-Keyword Publish/Subscribe System, that is capable of efficiently processing spatial keyword continuous queries. SOPS supports two types of queries: (1) Boolean Range Continuous (BRC) query that can be used to subscribe the geo-textual objects satisfying a boolean keyword expression and falling in a specified spatial region; (2) Temporal Spatial-Keyword Top-k Continuous (TaSK) query that continuously maintains up-to-date top-k most relevant results over a stream of geo-textual objects. SOPS enables users to formulate their queries and view the real-time results over a stream of geotextual objects by browser-based user interfaces. On the server side, we propose solutions to efficiently processing a large number of BRC queries (tens of millions) and TaSK queries over a stream of geo-textual objects.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Master data management (MDM) integrates data from multiple
structured data sources and builds a consolidated 360-
degree view of business entities such as customers and products.
Today’s MDM systems are not prepared to integrate
information from unstructured data sources, such as news
reports, emails, call-center transcripts, and chat logs. However,
those unstructured data sources may contain valuable
information about the same entities known to MDM from
the structured data sources. Integrating information from
unstructured data into MDM is challenging as textual references
to existing MDM entities are often incomplete and
imprecise and the additional entity information extracted
from text should not impact the trustworthiness of MDM
data.
In this paper, we present an architecture for making MDM
text-aware and showcase its implementation as IBM InfoSphere
MDM Extension for Unstructured Text Correlation,
an add-on to IBM InfoSphere Master Data Management
Standard Edition. We highlight how MDM benefits from
additional evidence found in documents when doing entity
resolution and relationship discovery. We experimentally
demonstrate the feasibility of integrating information from
unstructured data sources into MDM.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The past decade had witnessed an unprecedented growth in the amount of available digital content, and its volume is expected to continue to grow the next few years. Unstructured text data generated from web and enterprise sources form a large fraction of such content. Many of these contain large volumes of reusable data such as solutions to frequently occurring problems, and general know-how that may be reused in appropriate contexts. In this work, we address issues around leveraging unstructured text data from sources as diverse as the web and the enterprise within the Case-based Reasoning framework. Case-based Reasoning (CBR) provides a framework and methodology for systematic reuse of historical knowledge that is available in the form of problemsolution
pairs, in solving new problems. Here, we consider possibilities of enhancing Textual CBR systems under three main themes: procurement, maintenance and retrieval. We adapt and build upon the stateof-the-art techniques from data mining and natural language processing in addressing various challenges therein. Under procurement, we investigate the problem of extracting cases (i.e., problem-solution pairs) from data sources such as incident/experience
reports. We develop case-base maintenance methods specifically tuned to text targeted towards retaining solutions such that the utility of the filtered case base in solving new problems is maximized. Further, we address the problem of query suggestions for textual case-bases and show that exploiting the problem-solution partition can enhance retrieval effectiveness by prioritizing more useful query suggestions. Additionally, we illustrate interpretable clustering as a tool to drill-down to domain specific text collections (since CBR systems are usually very domain specific) and develop techniques for improved similarity assessment in social media sources such as microblogs. Through extensive empirical evaluations, we illustrate the improvements that we are able to
achieve over the state-of-the-art methods for the respective tasks.