814 results for Semantic Web, Cineca, data warehouse, Università italiane
Abstract:
With long-term marine surveys and research, and especially with the development of new marine environment monitoring technologies, prodigious amounts of complex marine environmental data are generated and continue to grow rapidly. These data are massive in volume, widely distributed, multi-source, heterogeneous, multi-dimensional, and dynamic in both structure and time. The present study recommends an integrative visualization solution for these data, to enhance the visual display of data and data archives, and to support the joint use of data distributed among different organizations or communities. The study also analyses web services technologies and defines the concept of the marine information grid, then focuses on spatiotemporal visualization and proposes a process-oriented spatiotemporal visualization method. We discuss how marine environmental data can be organized based on this method, and how the organized data are represented for use with web services and stored in a reusable fashion. In addition, we provide an original, integrative visualization architecture based on the explored technologies. Finally, we propose a prototype system for marine environmental data of the South China Sea, with visualizations of Argo floats, sea surface temperature fields, sea current fields, salinity, in-situ investigation data, and ocean stations. The integrative visualization architecture is illustrated in the prototype system, which highlights the process-oriented spatiotemporal visualization method and demonstrates the benefits of the architecture and the methods described in this study.
Abstract:
Many Web applications walk the thin line between the need for dynamic data and the need to meet user performance expectations. In environments where funds are not available to constantly upgrade hardware in line with user demand, alternative approaches need to be considered. This paper introduces a 'data farming' model whereby dynamic data, which is 'grown' in operational applications, is 'harvested' and 'packaged' for various consumer markets. Like any well-managed agricultural operation, crops are harvested according to historical and perceived demand, as inferred by a self-optimising process. This approach aims to make better use of available resources through improved utilisation of system downtime, thereby improving application performance and increasing the availability of key business data.
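The abstract's harvest-by-demand idea can be sketched in a few lines. The sketch below is purely illustrative (the class, method names, and the hit-counter heuristic are assumptions, not the paper's design): queries are served from a static cache when possible, demand is tracked continuously, and a downtime job refreshes the cache with the most-demanded results.

```python
# Illustrative sketch of the 'data farming' idea (names are assumptions):
# during off-peak windows, the most-demanded dynamic queries are 'harvested'
# into a static cache, with demand inferred from a running hit counter.
from collections import Counter

class DataFarm:
    def __init__(self, source, capacity=2):
        self.source = source          # callable: query -> fresh result
        self.capacity = capacity      # how many 'crops' fit in the cache
        self.demand = Counter()       # historical demand per query
        self.cache = {}               # harvested ('packaged') results

    def fetch(self, query):
        """Serve from the cache when possible, recording demand either way."""
        self.demand[query] += 1
        return self.cache.get(query) or self.source(query)

    def harvest(self):
        """Run during system downtime: refresh the top-demand queries."""
        top = [q for q, _ in self.demand.most_common(self.capacity)]
        self.cache = {q: self.source(q) for q in top}

farm = DataFarm(source=lambda q: f"rows for {q}")
for q in ["sales", "sales", "stock", "hr"]:
    farm.fetch(q)
farm.harvest()  # 'sales' (most demanded) is now served statically
```

A real self-optimising harvester would weight recency and cost as well as raw hit counts; the counter here stands in for that process.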
Abstract:
A rapidly increasing number of Web databases have become accessible via their HTML form-based query interfaces. Query result pages are dynamically generated in response to user queries; they encode structured data and are displayed for human use. Query result pages usually contain other types of information in addition to query results, e.g., advertisements and navigation bars. The problem of extracting structured data from query result pages is critical for web data integration applications, such as comparison shopping and meta-search engines, and has been intensively studied, with a number of approaches proposed. As the structures of Web pages become more and more complex, the existing approaches start to fail, and most of them do not remove irrelevant content which may affect the accuracy of data record extraction. We propose an automated approach for Web data extraction. First, it makes use of visual features and query terms to identify data sections and extracts the data records in these sections. We also represent several content and visual features of visual blocks in a data section, and use them to filter out noisy blocks. Second, it measures similarity between data items in different data records based on their visual and content features, and aligns them into groups so that the data in the same group have the same semantics. The results of our experiments with a large set of Web query result pages in different domains show that our proposed approaches are highly effective.
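The alignment step described above, grouping data items across records so that items in a group share the same semantics, can be illustrated with a toy similarity measure. This is a minimal sketch, not the authors' implementation: it uses only token overlap as the content feature and a hand-picked threshold, where the real approach also uses visual features.

```python
# Minimal sketch (assumptions: Jaccard token similarity, threshold 0.15)
# of aligning data items across extracted records into semantic groups.

def token_sim(a, b):
    """Jaccard similarity between the token sets of two data items."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def align_items(records, threshold=0.15):
    """Greedily assign each item to the first group whose representative
    (first) item is similar enough; otherwise start a new group."""
    groups = []  # each group is a list of (record_index, item)
    for ri, record in enumerate(records):
        for item in record:
            for group in groups:
                if token_sim(item, group[0][1]) >= threshold:
                    group.append((ri, item))
                    break
            else:
                groups.append([(ri, item)])
    return groups

records = [
    ["Canon EOS R10 camera", "$899.00", "In stock"],
    ["Nikon Z50 camera", "$859.95", "In stock"],
]
groups = align_items(records)
# product names and availability items align; the two prices share no
# tokens, so a content-only measure leaves them in separate groups
```

The price items ending up ungrouped shows why the paper combines content features with visual features (position, style): visually aligned price blocks would be merged even when their text differs.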
Abstract:
Web databases are now pervasive. Such a database can be accessed only via its query interface (usually an HTML query form). Extracting Web query interfaces is a critical step in data integration across multiple Web databases; it creates a formal representation of a query form by extracting the set of query conditions in it. This paper presents a novel approach to extracting Web query interfaces. In this approach, a generic set of query condition rules is created to define query conditions that are semantically equivalent to SQL search conditions. Query condition rules represent the semantic roles that labels and form elements play in query conditions, and how they are hierarchically grouped into constructs of query conditions. To group labels and form elements in a query form, we explore both their structural proximity in the hierarchy of structures in the query form, which is captured by the tree of nested tags in the HTML code of the form, and their semantic similarity, which is captured by the various short texts used in labels, form elements and their properties. We have implemented the proposed approach, and our experimental results show that it is highly effective.
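The notion of a query condition rule, mapping a (label, form element) pair to an SQL-like search condition, can be sketched as a small rule table. Everything below is an illustrative assumption: the form representation, the element types, and the emitted condition syntax are not the paper's actual grammar.

```python
# Hypothetical sketch of query condition rules: each rule maps one
# (label, form element) pair to a condition semantically equivalent
# to an SQL search condition. Types and syntax are assumptions.

def to_condition(label, element):
    """Map one labelled form element to a parameterised condition string."""
    field = label.lower().replace(" ", "_")
    if element["type"] == "text":
        # free-text inputs become substring matches
        return f"{field} LIKE '%' || :{field} || '%'"
    if element["type"] == "select":
        # drop-down lists become equality conditions
        return f"{field} = :{field}"
    if element["type"] == "range":
        # min/max pairs become BETWEEN conditions
        return f"{field} BETWEEN :{field}_min AND :{field}_max"
    raise ValueError(f"no rule for element type {element['type']}")

form = [
    ("Title", {"type": "text"}),
    ("Category", {"type": "select"}),
    ("Price", {"type": "range"}),
]
conditions = [to_condition(lbl, el) for lbl, el in form]
```

In the paper's setting the hard part is upstream of this mapping: deciding which label belongs to which element, using the HTML tag tree and text similarity, before any rule can fire.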
Abstract:
Web sites that rely on databases for their content are now ubiquitous. Query result pages are dynamically generated from these databases in response to user-submitted queries. Automatically extracting structured data from query result pages is a challenging problem, as the structure of the data is not explicitly represented. While humans have shown good intuition in visually understanding data records on a query result page as displayed by a web browser, no existing approach to data record extraction has made full use of this intuition. We propose a novel approach, in which we make use of the common sources of evidence that humans use to understand data records on a displayed query result page. These include structural regularity, and visual and content similarity between data records displayed on a query result page. Based on these observations we propose new techniques that can identify each data record individually, while ignoring noise items, such as navigation bars and adverts. We have implemented these techniques in a software prototype, rExtractor, and tested it using two datasets. Our experimental results show that our approach achieves significantly higher accuracy than previous approaches. Furthermore, it establishes the case for use of vision-based algorithms in the context of data extraction from web sites.
Abstract:
Automatically determining and assigning shared and meaningful text labels to data extracted from an e-Commerce web page is a challenging problem. An e-Commerce web page can display a list of data records, each of which can contain a combination of data items (e.g. product name and price) and explicit labels, which describe some of these data items. Recent advances in extraction techniques have made it much easier to precisely extract individual data items and labels from a web page, however, there are two open problems: 1. assigning an explicit label to a data item, and 2. determining labels for the remaining data items. Furthermore, improvements in the availability and coverage of vocabularies, especially in the context of e-Commerce web sites, means that we now have access to a bank of relevant, meaningful and shared labels which can be assigned to extracted data items. However, there is a need for a technique which will take as input a set of extracted data items and assign automatically to them the most relevant and meaningful labels from a shared vocabulary. We observe that the Information Extraction (IE) community has developed a great number of techniques which solve problems similar to our own. In this work-in-progress paper we propose our intention to theoretically and experimentally evaluate different IE techniques to ascertain which is most suitable to solve this problem.
Abstract:
In this paper, we propose a new learning approach to Web data annotation, where a support vector machine-based multiclass classifier is trained to assign labels to data items. For data record extraction, a data section re-segmentation algorithm based on visual and content features is introduced to improve the performance of Web data record extraction. We have implemented the proposed approach and tested it with a large set of Web query result pages in different domains. Our experimental results show that our proposed approach is highly effective and efficient.
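The classification step, training a multiclass SVM to assign labels such as "price" or "name" to extracted data items, can be sketched with a tiny hand-crafted feature set. This is an assumed, minimal illustration using scikit-learn's LinearSVC; the paper's classifier uses far richer visual and content features than the three cues shown here.

```python
# Minimal sketch (assumptions: toy features, toy training set) of an
# SVM-based multiclass classifier that assigns labels to data items.
from sklearn.svm import LinearSVC

def features(item):
    """Turn one data item string into a small numeric feature vector."""
    return [
        item.count("$"),                   # currency-marker count
        sum(ch.isdigit() for ch in item),  # digit count
        len(item.split()),                 # token count
    ]

train_items = ["$12.99", "$5.00", "Acme anvil",
               "Deluxe rocket skates", "4.5 stars", "3.0 stars"]
train_labels = ["price", "price", "name", "name", "rating", "rating"]

clf = LinearSVC(C=1.0)
clf.fit([features(i) for i in train_items], train_labels)
pred = clf.predict([features("$7.25"), features("Giant rubber band")])
```

The toy classes are linearly separable in this feature space, so the linear SVM suffices; real query result pages would add visual cues (font, position, block size) as further dimensions.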
Abstract:
Information professionals are currently going through a period of redefinition of their profession, driven by the transformation of information and information processes toward an increasingly electronic mode. Web information systems (WIS), that is, information systems based on Web technologies such as external Web sites, intranets, e-commerce systems and extranets, are among the technologies behind these changes. These systems are increasingly being adopted by organizations and, in particular, by governments in their drive to become electronic. The Canadian federal government is recognized as one of the most innovative with regard to WIS and must adapt its information environment, of which information professionals are a part, to the introduction of these systems. Despite the innovation that WIS represent, few empirical studies have been conducted to identify which participants are needed to put them in place, and no consensus emerges from the literature as to the nature of information professionals' involvement in these systems. This research aims to increase knowledge about the involvement of information professionals in WIS. For the purposes of this research, information professionals are defined as people holding a master's degree in library and information science or any equivalent qualification.
This research examines four research questions concerning: (1) the roles of information professionals described in the government-wide information policies related to WIS, as well as those of the other participants mentioned in direct connection with WIS; (2) the types of WIS in which information professionals are involved; (3) the tasks of information professionals in these WIS; and (4) the other participants working in these systems. A qualitative approach was used to answer these questions, involving four modes of data collection: (1) in-depth, in-person interviews with information professionals involved in WIS; (2) an analysis of the WIS in which these information professionals are involved; (3) an analysis of the government-wide policies related to WIS; and (4) relevant documentation. The information professionals interviewed come from seven departments of the Canadian federal government, selected for their involvement in WIS. The results indicate that the information professionals interviewed are involved in WIS at both the micro and macro levels, that is, in specific WIS as well as globally across all the WIS of a department or of the federal government. These information professionals are involved in all dimensions and phases of WIS development. Content-related tasks predominate, but technological tasks are also very present. Three variables emerge from this study that have an impact on the involvement of information professionals in WIS: the types of WIS, the types of positions held by the information professionals, and the types of governance.
Abstract:
Affiliation: Département de biochimie, Faculté de médecine, Université de Montréal
Abstract:
Resources from the Singapore Summer School 2014 hosted by NUS. ws-summerschool.comp.nus.edu.sg