809 resultados para Web Mining, Data Mining, User Topic Model, Web User Profiles
Resumo:
Geospatial information of many kinds, from topographic maps to scientific data, is increasingly being made available through web mapping services. These allow georeferenced map images to be served from data stores and displayed in websites and geographic information systems, where they can be integrated with other geographic information. The Open Geospatial Consortium’s Web Map Service (WMS) standard has been widely adopted in diverse communities for sharing data in this way. However, current services typically provide little or no information about the quality or accuracy of the data they serve. In this paper we will describe the design and implementation of a new “quality-enabled” profile of WMS, which we call “WMS-Q”. This describes how information about data quality can be transmitted to the user through WMS. Such information can exist at many levels, from entire datasets to individual measurements, and includes the many different ways in which data uncertainty can be expressed. We also describe proposed extensions to the Symbology Encoding specification, which include provision for visualizing uncertainty in raster data in a number of different ways, including contours, shading and bivariate colour maps. We shall also describe new open-source implementations of the new specifications, which include both clients and servers.
Resumo:
HydroShare is an online, collaborative system being developed for open sharing of hydrologic data and models. The goal of HydroShare is to enable scientists to easily discover and access hydrologic data and models, retrieve them to their desktop or perform analyses in a distributed computing environment that may include grid, cloud or high performance computing model instances as necessary. Scientists may also publish outcomes (data, results or models) into HydroShare, using the system as a collaboration platform for sharing data, models and analyses. HydroShare is expanding the data sharing capability of the CUAHSI Hydrologic Information System by broadening the classes of data accommodated, creating new capability to share models and model components, and taking advantage of emerging social media functionality to enhance information about and collaboration around hydrologic data and models. One of the fundamental concepts in HydroShare is that of a Resource. All content is represented using a Resource Data Model that separates system and science metadata and has elements common to all resources as well as elements specific to the types of resources HydroShare will support. These will include different data types used in the hydrology community and models and workflows that require metadata on execution functionality. The HydroShare web interface and social media functions are being developed using the Drupal content management system. A geospatial visualization and analysis component enables searching, visualizing, and analyzing geographic datasets. The integrated Rule-Oriented Data System (iRODS) is being used to manage federated data content and perform rule-based background actions on data and model resources, including parsing to generate metadata catalog information and the execution of models and workflows. This presentation will introduce the HydroShare functionality developed to date, describe key elements of the Resource Data Model and outline the roadmap for future development.
Resumo:
This article introduces the software program called EthoSeq, which is designed to extract probabilistic behavioral sequences (tree-generated sequences, or TGSs) from observational data and to prepare a TGS-species matrix for phylogenetic analysis. The program uses Graph Theory algorithms to automatically detect behavioral patterns within the observational sessions. It includes filtering tools to adjust the search procedure to user-specified statistical needs. Preliminary analyses of data sets, such as grooming sequences in birds and foraging tactics in spiders, uncover a large number of TGSs which together yield single phylogenetic trees. An example of the use of the program is our analysis of felid grooming sequences, in which we have obtained 1,386 felid grooming TGSs for seven species, resulting in a single phylogeny. These results show that behavior is definitely useful in phylogenetic analysis. EthoSeq simplifies and automates such analyses, uncovers much of the hidden patterns of long behavioral sequences, and prepares this data for further analysis with standard phylogenetic programs. We hope it will encourage many empirical studies on the evolution of behavior.
Resumo:
Mashups are becoming increasingly popular as end users are able to easily access, manipulate, and compose data from several web sources. To support end users, communities are forming around mashup development environments that facilitate sharing code and knowledge. We have observed, however, that end user mashups tend to suffer from several deficiencies, such as inoperable components or references to invalid data sources, and that those deficiencies are often propagated through the rampant reuse in these end user communities. In this work, we identify and specify ten code smells indicative of deficiencies we observed in a sample of 8,051 pipe-like web mashups developed by thousands of end users in the popular Yahoo! Pipes environment. We show through an empirical study that end users generally prefer pipes that lack those smells, and then present eleven specialized refactorings that we designed to target and remove the smells. Our refactorings reduce the complexity of pipes, increase their abstraction, update broken sources of data and dated components, and standardize pipes to fit the community development patterns. Our assessment on the sample of mashups shows that smells are present in 81% of the pipes, and that the proposed refactorings can reduce that number to 16%, illustrating the potential of refactoring to support thousands of end users developing pipe-like mashups.
Resumo:
This work is concerned with the increasing relationships between two distinct multidisciplinary research fields, Semantic Web technologies and scholarly publishing, that in this context converge into one precise research topic: Semantic Publishing. In the spirit of the original aim of Semantic Publishing, i.e. the improvement of scientific communication by means of semantic technologies, this thesis proposes theories, formalisms and applications for opening up semantic publishing to an effective interaction between scholarly documents (e.g., journal articles) and their related semantic and formal descriptions. In fact, the main aim of this work is to increase the users' comprehension of documents and to allow document enrichment, discovery and linkage to document-related resources and contexts, such as other articles and raw scientific data. In order to achieve these goals, this thesis investigates and proposes solutions for three of the main issues that semantic publishing promises to address, namely: the need of tools for linking document text to a formal representation of its meaning, the lack of complete metadata schemas for describing documents according to the publishing vocabulary, and absence of effective user interfaces for easily acting on semantic publishing models and theories.
Resumo:
Il problema relativo alla predizione, la ricerca di pattern predittivi all‘interno dei dati, è stato studiato ampiamente. Molte metodologie robuste ed efficienti sono state sviluppate, procedimenti che si basano sull‘analisi di informazioni numeriche strutturate. Quella testuale, d‘altro canto, è una tipologia di informazione fortemente destrutturata. Quindi, una immediata conclusione, porterebbe a pensare che per l‘analisi predittiva su dati testuali sia necessario sviluppare metodi completamente diversi da quelli ben noti dalle tecniche di data mining. Un problema di predizione può essere risolto utilizzando invece gli stessi metodi : dati testuali e documenti possono essere trasformati in valori numerici, considerando per esempio l‘assenza o la presenza di termini, rendendo di fatto possibile una utilizzazione efficiente delle tecniche già sviluppate. Il text mining abilita la congiunzione di concetti da campi di applicazione estremamente eterogenei. Con l‘immensa quantità di dati testuali presenti, basti pensare, sul World Wide Web, ed in continua crescita a causa dell‘utilizzo pervasivo di smartphones e computers, i campi di applicazione delle analisi di tipo testuale divengono innumerevoli. L‘avvento e la diffusione dei social networks e della pratica di micro blogging abilita le persone alla condivisione di opinioni e stati d‘animo, creando un corpus testuale di dimensioni incalcolabili aggiornato giornalmente. Le nuove tecniche di Sentiment Analysis, o Opinion Mining, si occupano di analizzare lo stato emotivo o la tipologia di opinione espressa all‘interno di un documento testuale. Esse sono discipline attraverso le quali, per esempio, estrarre indicatori dello stato d‘animo di un individuo, oppure di un insieme di individui, creando una rappresentazione dello stato emotivo sociale. L‘andamento dello stato emotivo sociale può condizionare macroscopicamente l‘evolvere di eventi globali? Studi in campo di Economia e Finanza Comportamentale assicurano un legame fra stato emotivo, capacità nel prendere decisioni ed indicatori economici. Grazie alle tecniche disponibili ed alla mole di dati testuali continuamente aggiornati riguardanti lo stato d‘animo di milioni di individui diviene possibile analizzare tali correlazioni. In questo studio viene costruito un sistema per la previsione delle variazioni di indici di borsa, basandosi su dati testuali estratti dalla piattaforma di microblogging Twitter, sotto forma di tweets pubblici; tale sistema include tecniche di miglioramento della previsione basate sullo studio di similarità dei testi, categorizzandone il contributo effettivo alla previsione.
Resumo:
For smart cities applications, a key requirement is to disseminate data collected from both scalar and multimedia wireless sensor networks to thousands of end-users. Furthermore, the information must be delivered to non-specialist users in a simple, intuitive and transparent manner. In this context, we present Sensor4Cities, a user-friendly tool that enables data dissemination to large audiences, by using using social networks, or/and web pages. The user can request and receive monitored information by using social networks, e.g., Twitter and Facebook, due to their popularity, user-friendly interfaces and easy dissemination. Additionally, the user can collect or share information from smart cities services, by using web pages, which also include a mobile version for smartphones. Finally, the tool could be configured to periodically monitor the environmental conditions, specific behaviors or abnormal events, and notify users in an asynchronous manner. Sensor4Cities improves the data delivery for individuals or groups of users of smart cities applications and encourages the development of new user-friendly services.
Resumo:
BACKGROUND The majority of radiological reports are lacking a standard structure. Even within a specialized area of radiology, each report has its individual structure with regards to details and order, often containing too much of non-relevant information the referring physician is not interested in. For gathering relevant clinical key parameters in an efficient way or to support long-term therapy monitoring, structured reporting might be advantageous. OBJECTIVE Despite of new technologies in medical information systems, medical reporting is still not dynamic. To improve the quality of communication in radiology reports, a new structured reporting system was developed for abdominal aortic aneurysms (AAA), intended to enhance professional communication by providing the pertinent clinical information in a predefined standard. METHODS Actual state analysis was performed within the departments of radiology and vascular surgery by developing a Technology Acceptance Model. The SWOT (strengths, weaknesses, opportunities, and threats) analysis focused on optimization of the radiology reporting of patients with AAA. Definition of clinical parameters was achieved by interviewing experienced clinicians in radiology and vascular surgery. For evaluation, a focus group (4 radiologists) looked at the reports of 16 patients. The usability and reliability of the method was validated in a real-world test environment in the field of radiology. RESULTS A Web-based application for radiological "structured reporting" (SR) was successfully standardized for AAA. Its organization comprises three main categories: characteristics of pathology and adjacent anatomy, measurements, and additional findings. Using different graphical widgets (eg, drop-down menus) in each category facilitate predefined data entries. Measurement parameters shown in a diagram can be defined for clinical monitoring and be adducted for quick adjudications. Figures for optional use to guide and standardize the reporting are embedded. Analysis of variance shows decreased average time required with SR to obtain a radiological report compared to free-text reporting (P=.0001). Questionnaire responses confirm a high acceptance rate by the user. CONCLUSIONS The new SR system may support efficient radiological reporting for initial diagnosis and follow-up for AAA. Perceived advantages of our SR platform are ease of use, which may lead to more accurate decision support. The new system is open to communicate not only with clinical partners but also with Radiology Information and Hospital Information Systems.
Resumo:
A vast amount of temporal information is provided on the Web. Even though many facts expressed in documents are time-related, the temporal properties of Web presentations have not received much attention. In database research, temporal databases have become a mainstream topic in recent years. In Web documents, temporal data may exist as meta data in the header and as user-directed data in the body of a document. Whereas temporal data can easily be identified in the semi-structured meta data, it is more difficult to determine temporal data and its role in the body. We propose procedures for maintaining temporal integrity of Web pages and outline different approaches of applying bitemporal data concepts for Web documents. In particular, we regard desirable functionalities of Web repositories and other Web-related tools that may support the Webmasters in managing the temporal data of their Web documents. Some properties of a prototype environment are described.
Resumo:
Recently, the Semantic Web has experienced signi�cant advancements in standards and techniques, as well as in the amount of semantic information available online. Even so, mechanisms are still needed to automatically reconcile semantic information when it is expressed in di�erent natural languages, so that access to Web information across language barriers can be improved. That requires developing techniques for discovering and representing cross-lingual links on the Web of Data. In this paper we explore the different dimensions of such a problem and reflect on possible avenues of research on that topic.
Resumo:
Abstract Due to recent scientific and technological advances in information sys¬tems, it is now possible to perform almost every application on a mobile device. The need to make sense of such devices more intelligent opens an opportunity to design data mining algorithm that are able to autonomous execute in local devices to provide the device with knowledge. The problem behind autonomous mining deals with the proper configuration of the algorithm to produce the most appropriate results. Contextual information together with resource information of the device have a strong impact on both the feasibility of a particu¬lar execution and on the production of the proper patterns. On the other hand, performance of the algorithm expressed in terms of efficacy and efficiency highly depends on the features of the dataset to be analyzed together with values of the parameters of a particular implementation of an algorithm. However, few existing approaches deal with autonomous configuration of data mining algorithms and in any case they do not deal with contextual or resources information. Both issues are of particular significance, in particular for social net¬works application. In fact, the widespread use of social networks and consequently the amount of information shared have made the need of modeling context in social application a priority. Also the resource consumption has a crucial role in such platforms as the users are using social networks mainly on their mobile devices. This PhD thesis addresses the aforementioned open issues, focusing on i) Analyzing the behavior of algorithms, ii) mapping contextual and resources information to find the most appropriate configuration iii) applying the model for the case of a social recommender. Four main contributions are presented: - The EE-Model: is able to predict the behavior of a data mining algorithm in terms of resource consumed and accuracy of the mining model it will obtain. - The SC-Mapper: maps a situation defined by the context and resource state to a data mining configuration. - SOMAR: is a social activity (event and informal ongoings) recommender for mobile devices. - D-SOMAR: is an evolution of SOMAR which incorporates the configurator in order to provide updated recommendations. Finally, the experimental validation of the proposed contributions using synthetic and real datasets allows us to achieve the objectives and answer the research questions proposed for this dissertation.
Resumo:
Semantic Sensor Web infrastructures use ontology-based models to represent the data that they manage; however, up to now, these ontological models do not allow representing all the characteristics of distributed, heterogeneous, and web-accessible sensor data. This paper describes a core ontological model for Semantic Sensor Web infrastructures that covers these characteristics and that has been built with a focus on reusability. This ontological model is composed of different modules that deal, on the one hand, with infrastructure data and, on the other hand, with data from a specific domain, that is, the coastal flood emergency planning domain. The paper also presents a set of guidelines, followed during the ontological model development, to satisfy a common set of requirements related to modelling domain-specific features of interest and properties. In addition, the paper includes the results obtained after an exhaustive evaluation of the developed ontologies along different aspects (i.e., vocabulary, syntax, structure, semantics, representation, and context).
Resumo:
Ubiquitous computing software needs to be autonomous so that essential decisions such as how to configure its particular execution are self-determined. Moreover, data mining serves an important role for ubiquitous computing by providing intelligence to several types of ubiquitous computing applications. Thus, automating ubiquitous data mining is also crucial. We focus on the problem of automatically configuring the execution of a ubiquitous data mining algorithm. In our solution, we generate configuration decisions in a resource aware and context aware manner since the algorithm executes in an environment in which the context often changes and computing resources are often severely limited. We propose to analyze the execution behavior of the data mining algorithm by mining its past executions. By doing so, we discover the effects of resource and context states as well as parameter settings on the data mining quality. We argue that a classification model is appropriate for predicting the behavior of an algorithm?s execution and we concentrate on decision tree classifier. We also define taxonomy on data mining quality so that tradeoff between prediction accuracy and classification specificity of each behavior model that classifies by a different abstraction of quality, is scored for model selection. Behavior model constituents and class label transformations are formally defined and experimental validation of the proposed approach is also performed.