964 results for non profit, linked open data, web scraping, web crawling
Abstract:
The W3C Best Practices for Multilingual Linked Open Data community group was born one year ago during the last MLW workshop in Rome. Today, it continues to lead the effort of a large community towards a shared view of the issues that multilingualism causes on the Web of Data and their possible solutions. Despite our initial optimism, we found identifying best practices for ML-LOD to be a difficult task, one that requires a deep understanding of the Web of Data in its multilingual dimension and of its practical problems. In this talk we will review the group's progress so far, mainly in the identification and analysis of topics, use cases, and design patterns, as well as the future challenges.
Abstract:
The Internet contains countless types of documents and is an influential source of information. Web content is designed for humans to interpret, not for machines. Traditional search systems are imprecise at retrieving information. The government uses and publishes documents on the Web so that citizens and its own organizational sectors can use them, but it lacks tools to support the retrieval of these documents. One example is the Lattes Curriculum Platform (Plataforma de Currículos Lattes), administered by CNPq. The Semantic Web aims to optimize document retrieval by attaching meaning to documents, so that both people and machines can understand the meaning of a piece of information. The lack of semantics in our documents results in ineffective searches, with divergent and ambiguous information. Semantic annotation is the way to bring semantics to documents. The goal of the dissertation is to assemble a framework of Semantic Web concepts that makes it possible to annotate the Lattes Curriculum automatically using open databases (Linked Open Data), which store the meaning of terms and expressions. The research problem is to determine which Semantic Web concepts can contribute to the Automatic Semantic Annotation of the Lattes Curriculum using Linked Open Data (LOD). The Systematic Literature Review presented concepts (manual, automatic, and semi-automatic annotation, intrusive annotation...), tools (Entity Extractor...), and technologies (RDF, RDFa, SPARQL...) related to the topic. Applying these concepts made it possible to create the Semantic Web Lattes System (Sistema Lattes Web Semântico). The system imports the XML curriculum from the Lattes Platform, automatically annotates the available data using the open databases, and supports semantic queries. The system is validated by presenting annotated curricula and by running queries that use external data belonging to the LOD. Finally, the conclusions, the difficulties encountered, and proposals for future work are presented.
Abstract:
POSTDATA is a five-year European Research Council (ERC) Starting Grant project that started in May 2016 and is hosted by the Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain. The context of the project is the corpora of European Poetry (EP), with a special focus on poetic materials from different languages and literary traditions. POSTDATA aims to offer a standardized model in the philological field and a metadata application profile (MAP) for EP in order to build a common classification of all these poetic materials. The information from Spanish, Italian and French repertoires will be published in the Linked Open Data (LOD) ecosystem. Later we expect to extend the model to include additional corpora. There are a number of Web Based Information Systems (WIS) in Europe with repertoires of poems available for human consumption but not in a condition to be accessible and reusable by the Semantic Web. These systems are not interoperable; they are in fact locked in their databases and proprietary software, and are not suitable for linking in the Semantic Web. One way to make this data interoperable is to develop a MAP so that the data can be published in the LOD ecosystem, along with new data created and modeled on the basis of this MAP. Creating a common data model for EP is not simple, since the existing data models are based on conceptualizations and terminology belonging to their own poetical traditions, and each tradition has developed its own idiosyncratic analytical terminology independently over the years. The result of this uncoordinated evolution is a set of varied terminologies that explain analogous metrical phenomena across the different poetic systems, whose correspondences have hardly been studied (see examples in González-Blanco & Rodríguez (2014a and b)). This work has to be done by domain experts before the modeling actually starts.
On the other hand, the development of a MAP is a complex task, so it is imperative to follow a method for this development. In recent years Curado Malta & Baptista (2012, 2013a, 2013b) have been studying the development of MAPs in a Design Science Research (DSR) methodological process in order to define a method for the development of MAPs (see Curado Malta (2014)). The output of this DSR process was a first version of a method for the development of Metadata Application Profiles (Me4MAP) (paper to be published). The DSR process is now in the validation phase of the Relevance Cycle, in which Me4MAP is validated. The development of this MAP for poetry will follow the guidelines of Me4MAP and will in turn be used to validate Me4MAP. The final goal of the POSTDATA project is: i) to publish all the data locked in the WIS as LOD, where any interested agent will be able to build applications over the data in order to serve final users; ii) to build a Web platform where: a) researchers, students and other final users interested in EP will be able to access poems (and their analyses) from all databases; b) researchers, students and other final users will be able to upload poems and digitized images of manuscripts, and fill in the information concerning the analysis of the poem, collaboratively contributing to a LOD dataset of poetry.
Abstract:
Study and analysis of the main techniques in the field of Social Data Analysis. Design and implementation of a software solution written in Java in the Eclipse environment. The software integrates different REST API services to extract social data from Twitter, store it in a non-relational database (built with MongoDB), and manage it. It also supports topic classification and the analysis of aggregate data on the collections of extracted data. Finally, it can display a tree of "reshares", starting from individual selected tweets, and a geolocated map containing the users involved in the chain of reshares and the corresponding "retweet" edges.
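The reshare tree mentioned in this abstract can be pictured with a small sketch. The abstract says the actual software is written in Java; the Python below is only an illustration, and the edge format, the function names `build_reshare_tree` and `render`, and the tweet identifiers are all assumptions that do not come from the thesis.

```python
from collections import defaultdict


def build_reshare_tree(edges):
    """Group (parent_tweet, reshared_tweet) pairs into a parent -> children map.

    The pair format is hypothetical; the thesis does not describe its data model.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    return children


def render(children, root, depth=0):
    """Return the tree below `root` as indented lines, depth-first."""
    lines = ["  " * depth + root]
    for child in children.get(root, []):
        lines.extend(render(children, child, depth + 1))
    return lines


# Made-up identifiers: t2 and t3 reshare t1, and t4 reshares t2.
edges = [("t1", "t2"), ("t1", "t3"), ("t2", "t4")]
tree = build_reshare_tree(edges)
print("\n".join(render(tree, "t1")))
```

A parent-to-children map keeps the sketch short; a real implementation would presumably also carry user and location data on each node to feed the geolocated map.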
Abstract:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
Presentation to ARMA2013 (Association of Research Managers and Administrators)
Abstract:
The paper presents the photographic archive of the Fondazione Zeri as a real-world case of converting a catalogue that is rich in information but poor in interconnections to the Linked Open Data domain, based on the CIDOC-CRM ontology for cultural heritage.
Abstract:
The publication describes a generic program for disambiguating IRIs and literals in Linked Open Data that is highly configurable and therefore applicable in multiple contexts. CALID stands for "Customizable Application for Literal and IRI's Disambiguation". It was created to disambiguate the authors of scientific publications, and this article describes its design, how it is used, and the performance and precision values obtained by testing it on several datasets.
Abstract:
The thesis introduces Investiga, an application created for the thesis that automatically extracts information from scientific articles in PDF format and publishes it according to Linked Open Data principles and formats. The application is based on Task 2 of SemPub 2016, a challenge whose main goal is to improve information extraction from scientific articles in PDF format. Investiga extracts the top-level sections, the figure captions, and the table captions from a given article and creates a graph of the extracted information, appropriately linked together. The thesis also analyzes existing tools for automatic information extraction from PDF documents and their limitations.
Abstract:
Open data refers to publishing data on the web in machine-readable formats for public access. Using open data, innovative applications can be developed to make people's lives easier. In this thesis, based on the open data cases discussed in the literature review, Open Data Lappeenranta is proposed, which publishes open data on the opening hours of shops and stores in the city of Lappeenranta. To show that Open Data Lappeenranta is feasible, the thesis presents the implementation of an open data system that publishes specific data about shops and stores (including their opening hours) on the web in a standard format (JSON). The published open data is used to develop web and mobile applications that demonstrate the benefits of open data in practice. The open data system also provides manual and automatic interfaces that allow shops and stores to maintain their own data in the system. Finally, the thesis proposes a completed version of Open Data Lappeenranta that publishes open data on other fields and businesses in Lappeenranta, beyond stores' data alone.
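As a rough illustration of publishing opening hours as JSON, the following Python sketch serializes a shop record and checks whether the shop is open at a given time. The record layout, the field names, and the helpers `to_open_data` and `is_open` are assumptions made for this example; the abstract does not specify the schema or API the thesis uses.

```python
import json

# Hypothetical record layout; the thesis does not describe its JSON schema.
shops = [
    {
        "name": "Example Shop",
        "city": "Lappeenranta",
        "opening_hours": {
            "mon": ["09:00", "18:00"],
            "sat": ["10:00", "15:00"],
            "sun": None,  # closed
        },
    }
]


def to_open_data(records):
    """Serialize shop records as a JSON document."""
    return json.dumps({"shops": records}, ensure_ascii=False, indent=2)


def is_open(shop, day, time):
    """Return True if `time` (zero-padded HH:MM) is within the hours for `day`."""
    hours = shop["opening_hours"].get(day)
    if not hours:
        return False
    opens, closes = hours
    # Zero-padded HH:MM strings compare correctly as plain strings.
    return opens <= time < closes


document = to_open_data(shops)   # what a publisher might serve
parsed = json.loads(document)    # what a client application would read
```

A client application, such as the web and mobile apps the abstract mentions, would only need the parsed document: `is_open(parsed["shops"][0], "mon", "12:00")` evaluates to `True` for the sample record.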
Abstract:
In recent years, IoT technology has radically transformed many crucial industrial and service sectors such as healthcare. The multifaceted heterogeneity of the devices and of the collected information provides important opportunities to develop innovative systems and services. However, the ubiquitous presence of data silos and the poor semantic interoperability in the IoT landscape constitute a significant obstacle in the pursuit of this goal. Moreover, deriving actionable knowledge from the collected data requires IoT information sources to be analysed using appropriate artificial intelligence techniques such as automated reasoning. In this thesis work, Semantic Web technologies have been investigated as an approach to address both the data integration and the reasoning aspects of modern IoT systems. In particular, the contributions presented in this thesis are the following: (1) the IoT Fitness Ontology, an OWL ontology that has been developed in order to overcome the issue of data silos and enable semantic interoperability in the IoT fitness domain; (2) a Linked Open Data web portal for collecting and sharing IoT health datasets with the research community; (3) a novel methodology for embedding knowledge in rule-defined IoT smart home scenarios; and (4) a knowledge-based IoT home automation system that supports a seamless integration of heterogeneous devices and data sources.
Abstract:
This thesis presents an overview of the Open Data research area and the quantity of evidence available, and establishes the research evidence on the basis of a Systematic Mapping Study (SMS). A total of 621 publications from the years 2005 to 2014 were identified, but only 243 were selected in the review process. The thesis highlights the implications of the proliferation of Open Data principles in the emerging era of accessibility, reusability and sustainability of data transparency. The findings of the mapping study are described in quantitative and qualitative terms, based on the organizational affiliation, country, year of publication, research method, star rating and units of analysis identified. Furthermore, the units of analysis were categorized by development lifecycle, linked open data, type of data, technical platforms, organizations, ontology and semantics, adoption and awareness, intermediaries, security and privacy, and supply of data, which are important components for providing quality open data applications and services. The results of the mapping study help organizations (such as academia, government and industry), researchers and software developers to understand the existing trends in open data, the latest research developments and the demand for future research. In addition, the proposed conceptual framework of Open Data research can be adopted and expanded to strengthen and improve current open data applications.