975 resultados para Web resources
Resumo:
In this paper, three approaches for the assessment of credibility in a web environment will be presented, namely the checklists model, the cognitive authority model and the contextual model. This theoretical framework was used to conduct a study about the assessment of web resources credibility among a sample of 195 students, from elementary and secondary schools in a municipality in Oporto district (Portugal). The practices that young people and children claim to use regarding the use of criteria for web resources selection will be presented. In addition, these results will be discussed and compared with the perceptions that these respondents have demonstrated for the use of criteria to establish or assess authorship, originality, or information resources structure. These results will be also discussed and compared with the perceptions that these respondents have demonstrated for the elements that make up each of these criteria.
Resumo:
Inside this Issue: Dean’s FarewellMilestonesInfo-MagicActive PeopleFriends’ ProgramsExcellence
Resumo:
In spite of the increasing presence of Semantic Web Facilities, only a limited amount of the available resources in the Internet provide a semantic access. Recent initiatives such as the emerging Linked Data Web are providing semantic access to available data by porting existing resources to the semantic web using different technologies, such as database-semantic mapping and scraping. Nevertheless, existing scraping solutions are based on ad-hoc solutions complemented with graphical interfaces for speeding up the scraper development. This article proposes a generic framework for web scraping based on semantic technologies. This framework is structured in three levels: scraping services, semantic scraping model and syntactic scraping. The first level provides an interface to generic applications or intelligent agents for gathering information from the web at a high level. The second level defines a semantic RDF model of the scraping process, in order to provide a declarative approach to the scraping task. Finally, the third level provides an implementation of the RDF scraping model for specific technologies. The work has been validated in a scenario that illustrates its application to mashup technologies
Resumo:
This paper presents a Focused Crawler in order to Get Semantic Web Resources (CSR). Structured data web are available in formats such as Extensible Markup Language (XML), Resource Description Framework (RDF) and Ontology Web Language (OWL) that can be used for processing. One of the main challenges for performing a manual search and download semantic web resources is that this task consumes a lot of time. Our research work propose a focused crawler which allow to download these resources automatically and store them on disk in order to have a collection that will be used for data processing. CRS consists of three layers: (a) The User Interface Layer, (b) The Focus Crawler Layer and (c) The Base Crawler Layer. CSR uses as a selection policie the Shark-Search method. CSR was conducted with two experiments. The first one starts on December 15 2012 at 7:11 am and ends on December 16 2012 at 4:01 were obtained 448,123,537 bytes of data. The CSR ends by itself after to analyze 80,4375 seeds with an unlimited depth. CSR got 16,576 semantic resources files where the 89 % was RDF, the 10 % was XML and the 1% was OWL. The second one was based on the Web Data Commons work of the Research Group Data and Web Science at the University of Mannheim and the Institute AIFB at the Karlsruhe Institute of Technology. This began at 4:46 am of June 2 2013 and 1:37 am June 9 2013. After 162.51 hours of execution the result was 285,279 semantic resources where predominated the XML resources with 99 % and OWL and RDF with 1 % each one.
Resumo:
A location-based search engine must be able to find and assign proper locations to Web resources. Host, content and metadata location information are not sufficient to describe the location of resources as they are ambiguous or unavailable for many documents. We introduce target location as the location of users of Web resources. Target location is content-independent and can be applied to all types of Web resources. A novel method is introduced which uses log files and IN to track the visitors of websites. The experiments show that target location can be calculated for almost all documents on the Web at country level and to the majority of them in state and city levels. It can be assigned to Web resources as a new definition and dimension of location. It can be used separately or with other relevant locations to define the geography of Web resources. This compensates insufficient geographical information on Web resources and would facilitate the design and development of location-based search engines.
Resumo:
In this paper the key features of a two-layered model for describing the semantic of dynamical web resources are introduced. In the current Semantic Web proposal [Berners-Lee et al., 2001] web resources are classified into static ontologies which describes the semantic network of their inter-relationships [Kalianpur, 2001][Handschuh & Staab, 2002] and complex constraints described by logical quantified formula [Boley et al., 2001][McGuinnes & van Harmelen, 2004][McGuinnes et al., 2004], the basic idea is that software agents can use techniques of automatic reasoning in order to relate resources and to support sophisticated web application. On the other hand, web resources are also characterized by their dynamical aspects, which are not adequately addressed by current web models. Resources on the web are dynamical since, in the minimal case, they can appear or disappear from the web and their content is upgraded. In addition, resources can traverse different states, which characterized the resource life-cycle, each resource state corresponding to different possible uses of the resource. Finally most resources are timed, i.e. they information they provide make sense only if contextualised with respect to time, and their validity and accuracy is greatly bounded by time. Temporal projection and deduction based on dynamical and time constraints of the resources can be made and exploited by software agents [Hendler, 2001] in order to make previsions about the availability and the state of a resource, for deciding when consulting the resource itself or in order to deliberately induce a resource state change for reaching some agent goal, such as in the automated planning framework [Fikes & Nilsson, 1971][Bacchus & Kabanza,1998].
Resumo:
Apresentam-se os resultados parcelares de um estudo destinado a promover um melhor conhecimento das estratégias que os jovens em idade escolar (12-18 anos) consideram relevantes para avaliar as fontes de informação disponíveis na Internet. Para o efeito, foi aplicado um inquérito distribuído a uma amostra de 195 alunos de uma escola do 3o ciclo e outra do ensino secundário de um concelho do distrito do Porto. São apresentados e discutidos os resultados acerca da perceção destes alunos quanto aos critérios a aplicar na avaliação das fontes de informação disponíveis na Internet, na vertente da credibilidade. Serão apresenta- das as práticas que os jovens declaram ter relativamente ao uso de critérios de autoria, originalidade, estrutura, atualidade e de comparação para avaliar a credibilidade das fontes de informação. Em complemento, estes resultados serão comparados e discutidos com as perceções que os mesmos inquiridos demonstram possuir relativamente aos elementos que compõem cada um destes critérios. A análise dos dados obtidos é enquadrada e sustentada numa revisão da literatura acerca do conceito de credibilidade, aplicado às fontes de informação disponíveis na Internet. São ainda abordados alguns tópicos relaciona- dos com a inclusão de estratégias de avaliação da credibilidade da informação digital no modelo Big6, um dos modelos de desenvolvimento de competências de literacia da informação mais conhecidos e utilizados nas bibliotecas escolares portuguesas.
Resumo:
BACKGROUND. Bioinformatics is commonly featured as a well assorted list of available web resources. Although diversity of services is positive in general, the proliferation of tools, their dispersion and heterogeneity complicate the integrated exploitation of such data processing capacity. RESULTS. To facilitate the construction of software clients and make integrated use of this variety of tools, we present a modular programmatic application interface (MAPI) that provides the necessary functionality for uniform representation of Web Services metadata descriptors including their management and invocation protocols of the services which they represent. This document describes the main functionality of the framework and how it can be used to facilitate the deployment of new software under a unified structure of bioinformatics Web Services. A notable feature of MAPI is the modular organization of the functionality into different modules associated with specific tasks. This means that only the modules needed for the client have to be installed, and that the module functionality can be extended without the need for re-writing the software client. CONCLUSIONS. The potential utility and versatility of the software library has been demonstrated by the implementation of several currently available clients that cover different aspects of integrated data processing, ranging from service discovery to service invocation with advanced features such as workflows composition and asynchronous services calls to multiple types of Web Services including those registered in repositories (e.g. GRID-based, SOAP, BioMOBY, R-bioconductor, and others).
Resumo:
Semantic Web applications take off is being slower than expected, at least with respect to “real-world” applications and users. One of the main reasons for this lack of adoption is that most Semantic Web user interfaces are still immature from the usability and accessibility points of view. This is due to the novelty of these technologies, but this also motivates the exploration of alternative interaction paradigms, different from the “traditional” Web or Desktop applications ones. Our proposal is realized in the Rhizomer platform, which explores the possibilities of the object–action interaction paradigm at the Web scale. This paradigm is well suited for heterogeneous resource spaces such as those common in the Semantic Web. Resources, described by metadata, correspond to the objects in the paradigm. Semantic web services, which are dynamically associated to these objects, correspond to the actions. The platform is being put into practice in the context of a research project in order to build an open application for media distribution based on Semantic Web technologies. Moreover, its usability and accessibility have been evaluated in this real setting and compared to similar systems.
Resumo:
Current-day web search engines (e.g., Google) do not crawl and index a significant portion of theWeb and, hence, web users relying on search engines only are unable to discover and access a large amount of information from the non-indexable part of the Web. Specifically, dynamic pages generated based on parameters provided by a user via web search forms (or search interfaces) are not indexed by search engines and cannot be found in searchers’ results. Such search interfaces provide web users with an online access to myriads of databases on the Web. In order to obtain some information from a web database of interest, a user issues his/her query by specifying query terms in a search form and receives the query results, a set of dynamic pages that embed required information from a database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for any kind of automatic agents including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale. In this thesis, our primary and key object of study is a huge portion of the Web (hereafter referred as the deep Web) hidden behind web search interfaces. We concentrate on three classes of problems around the deep Web: characterization of deep Web, finding and classifying deep web resources, and querying web databases. Characterizing deep Web: Though the term deep Web was coined in 2000, which is sufficiently long ago for any web-related concept/technology, we still do not know many important characteristics of the deep Web. Another matter of concern is that surveys of the deep Web existing so far are predominantly based on study of deep web sites in English. One can then expect that findings from these surveys may be biased, especially owing to a steady increase in non-English web content. In this way, surveying of national segments of the deep Web is of interest not only to national communities but to the whole web community as well. In this thesis, we propose two new methods for estimating the main parameters of deep Web. We use the suggested methods to estimate the scale of one specific national segment of the Web and report our findings. We also build and make publicly available a dataset describing more than 200 web databases from the national segment of the Web. Finding deep web resources: The deep Web has been growing at a very fast pace. It has been estimated that there are hundred thousands of deep web sites. Due to the huge volume of information in the deep Web, there has been a significant interest to approaches that allow users and computer applications to leverage this information. Most approaches assumed that search interfaces to web databases of interest are already discovered and known to query systems. However, such assumptions do not hold true mostly because of the large scale of the deep Web – indeed, for any given domain of interest there are too many web databases with relevant content. Thus, the ability to locate search interfaces to web databases becomes a key requirement for any application accessing the deep Web. In this thesis, we describe the architecture of the I-Crawler, a system for finding and classifying search interfaces. Specifically, the I-Crawler is intentionally designed to be used in deepWeb characterization studies and for constructing directories of deep web resources. Unlike almost all other approaches to the deep Web existing so far, the I-Crawler is able to recognize and analyze JavaScript-rich and non-HTML searchable forms. Querying web databases: Retrieving information by filling out web search forms is a typical task for a web user. This is all the more so as interfaces of conventional search engines are also web forms. At present, a user needs to manually provide input values to search interfaces and then extract required data from the pages with results. The manual filling out forms is not feasible and cumbersome in cases of complex queries but such kind of queries are essential for many web searches especially in the area of e-commerce. In this way, the automation of querying and retrieving data behind search interfaces is desirable and essential for such tasks as building domain-independent deep web crawlers and automated web agents, searching for domain-specific information (vertical search engines), and for extraction and integration of information from various deep web resources. We present a data model for representing search interfaces and discuss techniques for extracting field labels, client-side scripts and structured data from HTML pages. We also describe a representation of result pages and discuss how to extract and store results of form queries. Besides, we present a user-friendly and expressive form query language that allows one to retrieve information behind search interfaces and extract useful data from the result pages based on specified conditions. We implement a prototype system for querying web databases and describe its architecture and components design.
Resumo:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Resumo:
Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. This survey analyzes the convergence of trends from both areas: an increasing number of researchers is working on improving the results of Web Mining by exploiting semantic structures in the Web, and they make use of Web Mining techniques for building the Semantic Web. Last but not least, these techniques can be used for mining the Semantic Web itself. The Semantic Web is the second-generation WWW, enriched by machine-processable information which supports the user in his tasks. Given the enormous size even of today’s Web, it is impossible to manually enrich all of these resources. Therefore, automated schemes for learning the relevant information are increasingly being used. Web Mining aims at discovering insights about the meaning of Web resources and their usage. Given the primarily syntactical nature of the data being mined, the discovery of meaning is impossible based on these data only. Therefore, formalizations of the semantics of Web sites and navigation behavior are becoming more and more common. Furthermore, mining the Semantic Web itself is another upcoming application. We argue that the two areas Web Mining and Semantic Web need each other to fulfill their goals, but that the full potential of this convergence is not yet realized. This paper gives an overview of where the two areas meet today, and sketches ways of how a closer integration could be profitable.
Resumo:
Lecture 6: RDF and Metadata Lecture slides and exercises for using RDF to describe Web resources
Resumo:
Pós-graduação em Ciência da Informação - FFC