867 resultados para Web data


Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A rapidly increasing number of Web databases are now become accessible via
their HTML form-based query interfaces. Query result pages are dynamically generated
in response to user queries, which encode structured data and are displayed for human
use. Query result pages usually contain other types of information in addition to query
results, e.g., advertisements, navigation bar etc. The problem of extracting structured data
from query result pages is critical for web data integration applications, such as comparison
shopping, meta-search engines etc, and has been intensively studied. A number of approaches
have been proposed. As the structures of Web pages become more and more complex, the
existing approaches start to fail, and most of them do not remove irrelevant contents which
may a®ect the accuracy of data record extraction. We propose an automated approach for
Web data extraction. First, it makes use of visual features and query terms to identify data
sections and extracts data records in these sections. We also represent several content and
visual features of visual blocks in a data section, and use them to ¯lter out noisy blocks.
Second, it measures similarity between data items in di®erent data records based on their
visual and content features, and aligns them into di®erent groups so that the data in the
same group have the same semantics. The results of our experiments with a large set of
Web query result pages in di®erent domains show that our proposed approaches are highly
e®ective.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we propose a new learning approach to Web data annotation, where a support vector machine-based multiclass classifier is trained to assign labels to data items. For data record extraction, a data section re-segmentation algorithm based on visual and content features is introduced to improve the performance of Web data record extraction. We have implemented the proposed approach and tested it with a large set of Web query result pages in different domains. Our experimental results show that our proposed approach is highly effective and efficient.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This talk will present an overview of the ongoing ERCIM project SMARTDOCS (SeMAntically-cReaTed DOCuments) which aims at automatically generating webpages from RDF data. It will particularly focus on the current issues and the investigated solutions in the different modules of the project, which are related to document planning, natural language generation and multimedia perspectives. The second part of the talk will be dedicated to the KODA annotation system, which is a knowledge-base-agnostic annotator designed to provide the RDF annotations required in the document generation process.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

RDFa JSON-LD Microdata

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Web data extraction systems are the kernel of information mediators between users and heterogeneous Web data resources. How to extract structured data from semi-structured documents has been a problem of active research. Supervised and unsupervised methods have been devised to learn extraction rules from training sets. However, trying to prepare training sets (especially to annotate them for supervised methods), is very time-consuming. We propose a framework for Web data extraction, which logged usersrsquo access history and exploit them to assist automatic training set generation. We cluster accessed Web documents according to their structural details; define criteria to measure the importance of sub-structures; and then generate extraction rules. We also propose a method to adjust the rules according to historical data. Our experiments confirm the viability of our proposal.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Traditionally, ontologies describe knowledge representation in a denotational, formalized, and deductive way. In addition, in this paper, we propose a semiotic, inductive, and approximate approach to ontology creation. We define a conceptual framework, a semantics extraction algorithm, and a first proof of concept applying the algorithm to a small set of Wikipedia documents. Intended as an extension to the prevailing top-down ontologies, we introduce an inductive fuzzy grassroots ontology, which organizes itself organically from existing natural language Web content. Using inductive and approximate reasoning to reflect the natural way in which knowledge is processed, the ontology’s bottom-up build process creates emergent semantics learned from the Web. By this means, the ontology acts as a hub for computing with words described in natural language. For Web users, the structural semantics are visualized as inductive fuzzy cognitive maps, allowing an initial form of intelligence amplification. Eventually, we present an implementation of our inductive fuzzy grassroots ontology Thus,this paper contributes an algorithm for the extraction of fuzzy grassroots ontologies from Web data by inductive fuzzy classification.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes a novel application of fuzzy logic to web data mining for two basic problems of a website: popularity and satisfaction. Popularity means that people will visit the website while satisfaction refers to the usefulness of the site. We will illustrate that the popularity of a website is a fuzzy logic problem. It is an important characteristic of a website in order to survive in Internet commerce. The satisfaction of a website is also a fuzzy logic problem that represents the degree of success in the application of information technology to the business. We propose a framework of fuzzy logic for the representation of these two problems based on web data mining techniques to fuzzify the attributes of a website.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In current organizations, valuable enterprise knowledge is often buried under rapidly expanding huge amount of unstructured information in the form of web pages, blogs, and other forms of human text communications. We present a novel unsupervised machine learning method called CORDER (COmmunity Relation Discovery by named Entity Recognition) to turn these unstructured data into structured information for knowledge management in these organizations. CORDER exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We discuss the problems associated with evaluating unsupervised learners and report our initial evaluation experiments in an expert evaluation, a quantitative benchmarking, and an application of CORDER in a social networking tool called BuddyFinder.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present CORDER (COmmunity Relation Discovery by named Entity Recognition) an un-supervised machine learning algorithm that exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We discuss the problems associated with evaluating unsupervised learners and report our initial evaluation experiments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Our proposal aims to display the analysis techniques, methodologies as well as the most relevant results expected within the Exhibitium project framework (http://www.exhibitium.com). Awarded by the BBVA Foundation, the Exhibitium project is being developed by an international consortium of several research groups . Its main purpose is to build a comprehensive and structured data repository about temporary art exhibitions, captured from the web, to make them useful and reusable in various domains through open and interoperable data systems.