792 resultados para web data feeds


Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A rapidly increasing number of Web databases are now become accessible via
their HTML form-based query interfaces. Query result pages are dynamically generated
in response to user queries, which encode structured data and are displayed for human
use. Query result pages usually contain other types of information in addition to query
results, e.g., advertisements, navigation bar etc. The problem of extracting structured data
from query result pages is critical for web data integration applications, such as comparison
shopping, meta-search engines etc, and has been intensively studied. A number of approaches
have been proposed. As the structures of Web pages become more and more complex, the
existing approaches start to fail, and most of them do not remove irrelevant contents which
may a®ect the accuracy of data record extraction. We propose an automated approach for
Web data extraction. First, it makes use of visual features and query terms to identify data
sections and extracts data records in these sections. We also represent several content and
visual features of visual blocks in a data section, and use them to ¯lter out noisy blocks.
Second, it measures similarity between data items in di®erent data records based on their
visual and content features, and aligns them into di®erent groups so that the data in the
same group have the same semantics. The results of our experiments with a large set of
Web query result pages in di®erent domains show that our proposed approaches are highly
e®ective.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we propose a new learning approach to Web data annotation, where a support vector machine-based multiclass classifier is trained to assign labels to data items. For data record extraction, a data section re-segmentation algorithm based on visual and content features is introduced to improve the performance of Web data record extraction. We have implemented the proposed approach and tested it with a large set of Web query result pages in different domains. Our experimental results show that our proposed approach is highly effective and efficient.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This talk will present an overview of the ongoing ERCIM project SMARTDOCS (SeMAntically-cReaTed DOCuments) which aims at automatically generating webpages from RDF data. It will particularly focus on the current issues and the investigated solutions in the different modules of the project, which are related to document planning, natural language generation and multimedia perspectives. The second part of the talk will be dedicated to the KODA annotation system, which is a knowledge-base-agnostic annotator designed to provide the RDF annotations required in the document generation process.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

RDFa JSON-LD Microdata

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Traditionally, ontologies describe knowledge representation in a denotational, formalized, and deductive way. In addition, in this paper, we propose a semiotic, inductive, and approximate approach to ontology creation. We define a conceptual framework, a semantics extraction algorithm, and a first proof of concept applying the algorithm to a small set of Wikipedia documents. Intended as an extension to the prevailing top-down ontologies, we introduce an inductive fuzzy grassroots ontology, which organizes itself organically from existing natural language Web content. Using inductive and approximate reasoning to reflect the natural way in which knowledge is processed, the ontology’s bottom-up build process creates emergent semantics learned from the Web. By this means, the ontology acts as a hub for computing with words described in natural language. For Web users, the structural semantics are visualized as inductive fuzzy cognitive maps, allowing an initial form of intelligence amplification. Eventually, we present an implementation of our inductive fuzzy grassroots ontology Thus,this paper contributes an algorithm for the extraction of fuzzy grassroots ontologies from Web data by inductive fuzzy classification.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes a novel application of fuzzy logic to web data mining for two basic problems of a website: popularity and satisfaction. Popularity means that people will visit the website while satisfaction refers to the usefulness of the site. We will illustrate that the popularity of a website is a fuzzy logic problem. It is an important characteristic of a website in order to survive in Internet commerce. The satisfaction of a website is also a fuzzy logic problem that represents the degree of success in the application of information technology to the business. We propose a framework of fuzzy logic for the representation of these two problems based on web data mining techniques to fuzzify the attributes of a website.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In current organizations, valuable enterprise knowledge is often buried under rapidly expanding huge amount of unstructured information in the form of web pages, blogs, and other forms of human text communications. We present a novel unsupervised machine learning method called CORDER (COmmunity Relation Discovery by named Entity Recognition) to turn these unstructured data into structured information for knowledge management in these organizations. CORDER exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We discuss the problems associated with evaluating unsupervised learners and report our initial evaluation experiments in an expert evaluation, a quantitative benchmarking, and an application of CORDER in a social networking tool called BuddyFinder.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present CORDER (COmmunity Relation Discovery by named Entity Recognition) an un-supervised machine learning algorithm that exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We discuss the problems associated with evaluating unsupervised learners and report our initial evaluation experiments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Our proposal aims to display the analysis techniques, methodologies as well as the most relevant results expected within the Exhibitium project framework (http://www.exhibitium.com). Awarded by the BBVA Foundation, the Exhibitium project is being developed by an international consortium of several research groups . Its main purpose is to build a comprehensive and structured data repository about temporary art exhibitions, captured from the web, to make them useful and reusable in various domains through open and interoperable data systems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Introduction: Abundant evidence shows that regular physical activity (PA) is an effective strategy for preventing obesity in people of diverse socioeconomic status (SES) and racial groups. The proportion of PA performed in parks and how this differs by proximate neighborhood SES has not been thoroughly investigated. The present project analyzes online public web data feeds to assess differences in outdoor PA by neighborhood SES in St. Louis, MO, USA.
Methods: First, running and walking routes submitted by users of the website MapMyRun.com were downloaded. The website enables participants to plan, map, record, and share their exercise routes and outdoor activities like runs, walks, and hikes in an online database. Next, the routes were visually illustrated using geographic information systems. Thereafter, using park data and 2010 Missouri census poverty data, the odds of running and walking routes traversing a low-SES neighborhood, and traversing a park in a low-SES neighborhood were examined in comparison to the odds of routes traversing higher-SES neighborhoods and higher-SES parks.
Results: Results show that a majority of running and walking routes occur in or at least traverse through a park. However, this finding does not hold when comparing low-SES neighborhoods to higher-SES neighborhoods in St. Louis. The odds of running in a park in a low-SES neighborhood were 54% lower than running in a park in a higher-SES neighborhood (OR = 0.46, CI = 0.17-1.23). The odds of walking in a park in a low-SES neighborhood were 17% lower than walking in a park in a higher-SES neighborhood (OR = 0.83, CI = 0.26-2.61).
Conclusion: The novel methods of this study include the use of inexpensive, unobtrusive, and publicly available web data feeds to examine PA in parks and differences by neighborhood SES. Emerging technologies like MapMyRun.com present significant advantages to enhance tracking of user-defined PA across large geographic and temporal settings.