867 results for Web data


Relevance:

70.00%

Publisher:

Abstract:

Graduate Program in Information Science - FFC

Relevance:

70.00%

Publisher:

Abstract:

Graduate Program in Information Science - FFC

Relevance:

70.00%

Publisher:

Abstract:

The web is continuously evolving into a vast collection of data, which creates interest in collecting and merging these data in a meaningful way. Based on such web data, this paper describes the building of an ontology that rests on fuzzy clustering techniques. Through continual harvesting of folksonomies by web agents, an entirely automatic fuzzy grassroots ontology is built. This self-updating ontology can then be used for several practical applications in fields such as web structuring, web searching and web knowledge visualization. A potential application for online reputation analysis, added value and possible future studies are discussed in the conclusion.

Relevance:

70.00%

Publisher:

Abstract:

This paper presents a focused crawler to get semantic web resources (CSR). Structured web data are available in formats such as Extensible Markup Language (XML), Resource Description Framework (RDF) and Web Ontology Language (OWL), which can be used for processing. One of the main challenges of manually searching for and downloading semantic web resources is that the task consumes a lot of time. Our research work proposes a focused crawler that downloads these resources automatically and stores them on disk, in order to build a collection that will be used for data processing. CSR consists of three layers: (a) the User Interface Layer, (b) the Focus Crawler Layer and (c) the Base Crawler Layer. CSR uses the Shark-Search method as its selection policy. CSR was evaluated in two experiments. The first started on December 15, 2012 at 7:11 am and ended on December 16, 2012 at 4:01, obtaining 448,123,537 bytes of data. CSR terminated by itself after analyzing 80,4375 seeds with unlimited depth. It retrieved 16,576 semantic resource files, of which 89% were RDF, 10% XML and 1% OWL. The second experiment was based on the Web Data Commons work of the Research Group Data and Web Science at the University of Mannheim and the Institute AIFB at the Karlsruhe Institute of Technology. It began at 4:46 am on June 2, 2013 and ended at 1:37 am on June 9, 2013. After 162.51 hours of execution, the result was 285,279 semantic resources, predominantly XML resources at 99%, with OWL and RDF at 1% each.

Relevance:

70.00%

Publisher:

Abstract:

Recent developments in service-oriented and distributed computing have created exciting opportunities for the integration of models in service chains to create the Model Web. This offers the potential for orchestrating web data and processing services in complex chains, a flexible approach that exploits the increased access to products and tools and the scalability offered by the Web. However, the uncertainty inherent in data and models must be quantified and communicated in an interoperable way, so that its effects can be assessed as errors propagate through complex automated model chains. We describe a proposed set of tools for handling, characterizing and communicating uncertainty in this context, and show how they can be used to 'uncertainty-enable' Web Services in a model chain. An example implementation is presented, which combines environmental and publicly contributed data to produce estimates of sea-level air pressure, with estimates of uncertainty that incorporate the effects of model approximation as well as the uncertainty inherent in the observational and derived data.

Relevance:

70.00%

Publisher:

Abstract:

Peer reviewed

Relevance:

60.00%

Publisher:

Abstract:

The role of sustainability in urban design is becoming increasingly important as Australia’s cities continue to grow, putting pressure on existing infrastructure such as water, energy and transport. To optimise an urban design, many different aspects such as water, energy, transport and costs need to be taken into account in an integrated way. Integrated software applications that assess urban designs on a large variety of aspects are hardly available. With the upcoming next generation of the Internet, often referred to as the Semantic Web, data can become more machine-interpretable through the development of ontologies that support integrated software systems. Software systems can use these ontologies to perform an intelligent task such as assessing an urban design on a particular aspect. When the ontologies of different applications are aligned, they can share information, resulting in interoperability. Inference such as compliance checks and classifications can support aligning the ontologies. A proof-of-concept implementation has been made to demonstrate and validate the usefulness of machine-interpretable ontologies for urban designs.

Relevance:

60.00%

Publisher:

Abstract:

With the explosion of Web 2.0 applications such as blogs, social and professional networks, and various other types of social media, rich online information and new sources of knowledge flood users and pose a great challenge in terms of information overload. It is critical to use intelligent agent software systems to assist users in finding the right information in an abundance of Web data. Recommender systems can help users deal with the information overload problem efficiently by suggesting items (e.g., information and products) that match users’ personal interests. Recommender technology has been successfully employed in many applications, such as recommending films, music and books. The purpose of this report is to give an overview of existing technologies for building personalized recommender systems in social networking environments, and to propose a research direction for addressing user profiling and cold start problems by exploiting the user-generated content newly available in Web 2.0.

Relevance:

60.00%

Publisher:

Abstract:

We identify relation completion (RC) as one recurring problem that is central to the success of novel big data applications such as Entity Reconstruction and Data Enrichment. Given a semantic relation, RC attempts to link entity pairs between two entity lists under the relation. To accomplish the RC goals, we propose to formulate search queries for each query entity α based on some auxiliary information, so as to detect its target entity β in the set of retrieved documents. For instance, a pattern-based method (PaRE) uses extracted patterns as the auxiliary information in formulating search queries. However, high-quality patterns may decrease the probability of finding suitable target entities. As an alternative, we propose the CoRE method, which uses context terms learned from the text surrounding the expression of a relation as the auxiliary information in formulating queries. Experimental results based on several real-world web data collections demonstrate that CoRE reaches a much higher accuracy than PaRE for the purpose of RC.

Relevance:

60.00%

Publisher:

Abstract:

Web data can often be represented in free tree form; however, few free tree mining methods exist. In this paper, a computationally fast algorithm, FreeS, is presented to discover all frequently occurring free subtrees in a database of labelled free trees. FreeS is designed using an optimal canonical form, BOCF, that can uniquely represent free trees even in the presence of isomorphism. To avoid enumerating false positive candidates, it uses an enumeration approach based on a tree-structure guided scheme. This paper presents lemmas that introduce conditions to conform the generation of free tree candidates during enumeration. An empirical study using both real and synthetic datasets shows that FreeS is scalable and significantly outperforms (i.e., is a few orders of magnitude faster than) the state-of-the-art frequent free tree mining algorithms, HybridTreeMiner and FreeTreeMiner.

Relevance:

60.00%

Publisher:

Abstract:

When authors of scholarly articles decide where to submit their manuscripts for peer review and eventual publication, they often base their choice of journals on very incomplete information about how well the journals serve the authors’ purposes of informing about their research and advancing their academic careers. The purpose of this study was to develop and test a new method for benchmarking scientific journals, providing more information to prospective authors. The method estimates a number of journal parameters, including readership, scientific prestige, time from submission to publication, acceptance rate and service provided by the journal during the review and publication process. Data directly obtainable from the web, data that can be calculated from such data, data obtained from publishers and editors, and data obtained using surveys with authors are used in the method, which has been tested on three different sets of journals, each from a different discipline. We found a number of problems with the different data acquisition methods, which limit the extent to which the method can be used. Publishers and editors are reluctant to disclose important information they have at hand (i.e. journal circulation, web downloads, acceptance rate). The calculation of some important parameters (for instance average time from submission to publication, regional spread of authorship) can be done but requires quite a lot of work. It can be difficult to get reasonable response rates to surveys with authors. All in all, we believe that the method we propose, taking a “service to authors” perspective as a basis for benchmarking scientific journals, is useful and can provide information that is valuable to prospective authors in selected scientific disciplines.

Relevance:

60.00%

Publisher:

Abstract:

Introduction: Abundant evidence shows that regular physical activity (PA) is an effective strategy for preventing obesity in people of diverse socioeconomic status (SES) and racial groups. The proportion of PA performed in parks and how this differs by proximate neighborhood SES has not been thoroughly investigated. The present project analyzes online public web data feeds to assess differences in outdoor PA by neighborhood SES in St. Louis, MO, USA.
Methods: First, running and walking routes submitted by users of the website MapMyRun.com were downloaded. The website enables participants to plan, map, record, and share their exercise routes and outdoor activities like runs, walks, and hikes in an online database. Next, the routes were visually illustrated using geographic information systems. Thereafter, using park data and 2010 Missouri census poverty data, the odds of running and walking routes traversing a low-SES neighborhood, and traversing a park in a low-SES neighborhood were examined in comparison to the odds of routes traversing higher-SES neighborhoods and higher-SES parks.
Results: Results show that a majority of running and walking routes occur in or at least traverse through a park. However, this finding does not hold when comparing low-SES neighborhoods to higher-SES neighborhoods in St. Louis. The odds of running in a park in a low-SES neighborhood were 54% lower than running in a park in a higher-SES neighborhood (OR = 0.46, CI = 0.17-1.23). The odds of walking in a park in a low-SES neighborhood were 17% lower than walking in a park in a higher-SES neighborhood (OR = 0.83, CI = 0.26-2.61).
Conclusion: The novel methods of this study include the use of inexpensive, unobtrusive, and publicly available web data feeds to examine PA in parks and differences by neighborhood SES. Emerging technologies like MapMyRun.com present significant advantages to enhance tracking of user-defined PA across large geographic and temporal settings.
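
The percentages quoted in the Results follow directly from the reported odds ratios. The following is a minimal sketch of that arithmetic, not the authors' analysis code; the function names are hypothetical, and the confidence level of the intervals is not stated in the abstract. It also checks whether each reported interval contains 1, the value that corresponds to no difference in odds.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in odds implied by an odds ratio (negative means lower odds)."""
    return (odds_ratio - 1.0) * 100.0


def interval_includes_no_effect(ci_low: float, ci_high: float) -> bool:
    """True if the interval contains 1.0, the odds ratio meaning 'no difference'."""
    return ci_low <= 1.0 <= ci_high


if __name__ == "__main__":
    # Odds ratios and intervals as reported in the abstract above.
    reported = [
        ("running", 0.46, 0.17, 1.23),
        ("walking", 0.83, 0.26, 2.61),
    ]
    for activity, odds_ratio, ci_low, ci_high in reported:
        change = round(percent_change_in_odds(odds_ratio))
        spans_one = interval_includes_no_effect(ci_low, ci_high)
        print(f"{activity}: {change}% change in odds, interval includes 1: {spans_one}")
```

Both reported intervals contain 1, so the lower odds for low-SES parks should be read as point estimates with wide uncertainty.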

Relevance:

60.00%

Publisher:

Abstract:

Objectives: To report methodology and overall clinical, laboratory and radiographic characteristics for Henoch-Schönlein purpura (HSP), childhood polyarteritis nodosa (c-PAN), c-Wegener granulomatosis (c-WG) and c-Takayasu arteritis (c-TA) classification criteria.
Methods: The preliminary Vienna 2005 consensus conference, which proposed preliminary criteria for paediatric vasculitides, was followed by a EULAR/PRINTO/PRES-supported validation project divided into three main steps. Step 1: retrospective/prospective web-data collection for HSP, c-PAN, c-WG and c-TA, with age at diagnosis ≤ 18 years. Step 2: blinded classification by a consensus panel of a subgroup of 280 cases (128 difficult cases, 152 randomly selected), enabling expert diagnostic verification. Step 3: Ankara 2008 Consensus Conference and statistical evaluation (sensitivity, specificity, area under the curve, kappa-agreement) using as 'gold standard' the final consensus classification or original treating physician diagnosis.
Results: A total of 1183/1398 (85%) samples collected were available for analysis: 827 HSP, 150 c-PAN, 60 c-WG, 87 c-TA and 59 c-other. Prevalence, signs/symptoms, laboratory, biopsy and imaging reports were consistent with the clinical picture of the four c-vasculitides. A representative subgroup of 280 patients was blinded to the treating physician diagnosis and classified by a consensus panel, with kappa-agreement of 0.96 for HSP (95% CI 0.84 to 1), 0.88 for c-WG (95% CI 0.76 to 0.99), 0.84 for c-TA (95% CI 0.73 to 0.96) and 0.73 for c-PAN (95% CI 0.62 to 0.84), with an overall κ of 0.79 (95% CI 0.73 to 0.84).
Conclusion: EULAR/PRINTO/PRES propose validated classification criteria for HSP, c-PAN, c-WG and c-TA, with substantial/almost perfect agreement with the final consensus classification or original treating physician diagnosis.
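
The kappa-agreement figures above measure how far two sets of classifications agree beyond what chance alone would produce. The abstract does not say which kappa variant was used, so the following is only a sketch that assumes Cohen's kappa for two raters; the function name and the four example labels are hypothetical and are not data from the study.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Fraction of cases on which the two raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; treated here as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels for four cases, for illustration only.
    panel = ["HSP", "HSP", "c-PAN", "c-TA"]
    physician = ["HSP", "HSP", "c-PAN", "c-PAN"]
    print(round(cohen_kappa(panel, physician), 2))
```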

Relevance:

60.00%

Publisher:

Abstract:

Graduate Program in Computer Science - IBILCE

Relevance:

60.00%

Publisher:

Abstract:

Online reputation management deals with monitoring and influencing the online record of a person, an organization or a product. The Social Web offers increasingly simple ways to publish and disseminate personal or opinionated information, which can rapidly have a disastrous influence on the online reputation of some of these entities. This dissertation can be split into three parts. In the first part, possible fuzzy clustering applications for the Social Semantic Web are investigated. The second part explores promising Social Semantic Web elements for organizational applications, while in the third part the former two parts are brought together and a fuzzy online reputation analysis framework is introduced and evaluated. The entire PhD thesis is based on literature reviews as well as on argumentative-deductive analyses. The possible applications of Social Semantic Web elements within organizations have been researched using a scenario and an additional case study together with two ancillary case studies, based on qualitative interviews. For the conception and implementation of the online reputation analysis application, a conceptual framework was developed. Employing test installations and prototyping, the essential parts of the framework have been implemented. By following a design science research approach, this PhD has created two artifacts: a framework and a prototype as proof of concept. Both artifacts hinge on two core elements: a (cluster analysis-based) translation of tags used in the Social Web to a computer-understandable fuzzy grassroots ontology for the Semantic Web, and a (Topic Maps-based) knowledge representation system, which facilitates a natural interaction with the fuzzy grassroots ontology. This is beneficial to the identification of unknown but essential Web data that could not be realized through conventional online reputation analysis.
The inherent structure of natural language supports humans not only in communication but also in the perception of the world. Fuzziness is a promising tool for transforming those human perceptions into computer artifacts. Through fuzzy grassroots ontologies, the Social Semantic Web becomes more natural and thus can streamline online reputation management.