34 results for non profit, linked open data, web scraping, web crawling

in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevance:

100.00%

Publisher:

Abstract:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance:

100.00%

Publisher:

Abstract:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance:

100.00%

Publisher:

Abstract:

Presentation by Jakob Voß at the Library Network Days (Kirjastoverkkopäivät), October 22, 2014, Helsinki.

Relevance:

100.00%

Publisher:

Abstract:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance:

100.00%

Publisher:

Abstract:

Open data refers to publishing data on the web in machine-readable formats for public access. With open data, innovative applications can be developed that make people's lives easier. In this thesis, building on the open data cases discussed in the literature review, Open Data Lappeenranta is proposed: a service that publishes open data on the opening hours of shops and stores in the city of Lappeenranta. To demonstrate its feasibility, the thesis presents the implementation of an open data system that publishes data about shops and stores, including their opening hours, on the web in a standard format (JSON). The published data is then used to build web and mobile applications that demonstrate the benefits of open data in practice. The system also provides manual and automatic interfaces through which shops and stores can maintain their own data. Finally, the thesis proposes a completed version of Open Data Lappeenranta that publishes open data from other fields and businesses in Lappeenranta, beyond store data alone.
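The abstract does not give the published JSON schema; as a minimal sketch of what one opening-hours record might look like (every field name below is hypothetical, not taken from the thesis), serialized the way a client web or mobile application would consume it:

```python
import json

# Hypothetical record for one store; the thesis does not reproduce its
# actual schema here, so all field names are assumptions.
store = {
    "id": 42,
    "name": "Example Shop",
    "address": "Kauppakatu 1, Lappeenranta",
    "opening_hours": [
        {"day": "Mon-Fri", "opens": "09:00", "closes": "18:00"},
        {"day": "Sat", "opens": "10:00", "closes": "15:00"},
        {"day": "Sun", "opens": None, "closes": None},  # closed on Sundays
    ],
}

# Machine-readable output, as a client application would fetch and parse it.
print(json.dumps(store, indent=2))
```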

Relevance:

100.00%

Publisher:

Abstract:

This thesis presents an overview of the Open Data research area and establishes the state of the research evidence through a Systematic Mapping Study (SMS). A total of 621 publications from the years 2005 to 2014 were identified, of which 243 were selected in the review process. The thesis highlights the implications of the proliferation of Open Data principles in an emerging era of accessibility, reusability, and sustainability of data transparency. The findings of the mapping study are described quantitatively and qualitatively by organizational affiliation, country, year of publication, research method, star rating, and identified units of analysis. The units of analysis were further categorized by development lifecycle, linked open data, type of data, technical platform, organization, ontology and semantics, adoption and awareness, intermediaries, security and privacy, and supply of data, all important components of quality open data applications and services. The results of the mapping study help organizations (in academia, government, and industry), researchers, and software developers to understand the existing trends in open data, the latest research developments, and the demands of future research. In addition, the proposed conceptual framework of Open Data research can be adopted and extended to strengthen and improve current open data applications.

Relevance:

100.00%

Publisher:

Abstract:

Presentation by Anders Söderbäck at the Library Network Days (Kirjastoverkkopäivät), October 26, 2011, Helsinki.

Relevance:

100.00%

Publisher:

Abstract:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance:

100.00%

Publisher:

Abstract:

In this thesis I analyze both public and private firms and organizations, including universities, particularly where potential intrinsic motivation exists. I cover both industrial production, including infrastructure sectors with vertical relations, and the service sector. Ownership is thought to affect cost efficiency partly through differences in the size of wage supplements and other benefits for employees (internal rent capture), and partly via asymmetric information. I also ask whether there are factors other than ownership and competition that can affect the performance of commercial firms and non-profit organizations. These questions are made topical by ongoing reforms in the public sector, especially in connection with so-called New Public Management. I analyze the impact of these reforms on how well an organization functions and on social welfare. The analysis in this thesis is theoretical, but the results are also related to the empirical literature. The thesis is divided into Parts I and II. Part I summarizes the thesis and puts it in context, while Part II consists of five previously published essays. The first two (I–II) are more traditional, in that they are based on homo economicus (the economic man), without taking intrinsic motivation into account. In Essay I (published 2008) we assess the advantages and disadvantages of privatization and deregulation within such a framework, but with an emphasis also on sunk fixed costs and vertical relations. In Essay II (published 2012) we focus on vertical separation, competitive tendering, and privatization in network industries. Essays III–V widen the perspective by introducing potential intrinsic motivation into an agency model. The analysis in Essay III (published 2014) is applied to public ownership and privatization. Essays IV and V (published 2009 and 2013, respectively) extend the analysis to creative industries, in particular work within universities, where the intrinsic motivation of employees can be expected to be decisive. These essays apply an analysis involving an intra-personal game within the framework of an agency model with potential intrinsic motivation. We thus analyze the trade-off between economic incentives and intrinsic motivation.
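The abstract describes the agency model only in outline; as a hedged illustration of the kind of trade-off essays III–V formalize (the notation below is my own assumption, not the thesis's actual model), the agent can be sketched as choosing effort under both monetary incentives and intrinsic motivation:

```latex
% Hypothetical sketch of an agency model with intrinsic motivation;
% all symbols are assumptions, not taken from the thesis.
% The agent chooses effort e to maximize
%   U(e) = w + b q(e) + m v(e) - c(e),
% where w is the base wage, b the incentive (bonus) rate on output q(e),
% m >= 0 the weight of intrinsic motivation over the mission value v(e),
% and c(e) the cost of effort. When m is large, a smaller b can induce
% the same effort: the incentives/intrinsic-motivation trade-off.
U(e) \;=\; w \;+\; b\,q(e) \;+\; m\,v(e) \;-\; c(e)
```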

Relevance:

100.00%

Publisher:

Abstract:

Current-day web search engines (e.g., Google) do not crawl and index a significant portion of the Web, so web users who rely on search engines alone cannot discover or access a large amount of information in the non-indexable part of the Web. Specifically, dynamic pages generated from parameters that a user provides via web search forms (or search interfaces) are not indexed by search engines and cannot be found in search results. Such search interfaces give web users online access to myriad databases on the Web. To obtain information from a web database of interest, a user issues a query by specifying query terms in a search form and receives the query results: a set of dynamic pages that embed the required information from the database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for any kind of automatic agent, including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale. In this thesis, our primary object of study is the huge portion of the Web (hereafter referred to as the deep Web) hidden behind web search interfaces. We concentrate on three classes of problems around the deep Web: characterizing the deep Web, finding and classifying deep web resources, and querying web databases.

Characterizing the deep Web: Though the term deep Web was coined in 2000, a long time ago for any web-related concept or technology, we still do not know many important characteristics of the deep Web. Another concern is that existing surveys of the deep Web are predominantly based on studies of deep web sites in English. Findings from these surveys may therefore be biased, especially given the steady increase in non-English web content. Surveying national segments of the deep Web is thus of interest not only to national communities but to the whole web community. In this thesis, we propose two new methods for estimating the main parameters of the deep Web. We use the suggested methods to estimate the scale of one specific national segment of the Web and report our findings. We also build and make publicly available a dataset describing more than 200 web databases from this national segment of the Web.

Finding deep web resources: The deep Web has been growing at a very fast pace, and it has been estimated that there are hundreds of thousands of deep web sites. Due to the huge volume of information in the deep Web, there has been significant interest in approaches that allow users and computer applications to leverage this information. Most approaches assume that search interfaces to the web databases of interest have already been discovered and are known to query systems. Such assumptions rarely hold, however, mainly because of the large scale of the deep Web: for any given domain of interest there are too many web databases with relevant content. The ability to locate search interfaces to web databases thus becomes a key requirement for any application accessing the deep Web. In this thesis, we describe the architecture of the I-Crawler, a system for finding and classifying search interfaces. The I-Crawler is intentionally designed to be used in deep Web characterization studies and for constructing directories of deep web resources. Unlike almost all other approaches to the deep Web so far, the I-Crawler is able to recognize and analyze JavaScript-rich and non-HTML searchable forms.

Querying web databases: Retrieving information by filling out web search forms is a typical task for a web user, all the more so because the interfaces of conventional search engines are also web forms. At present, a user must manually provide input values to search interfaces and then extract the required data from the result pages. Manually filling out forms is cumbersome and infeasible for complex queries, yet such queries are essential for many web searches, especially in e-commerce. Automating the querying and retrieval of data behind search interfaces is therefore desirable and essential for tasks such as building domain-independent deep web crawlers and automated web agents, searching for domain-specific information (vertical search engines), and extracting and integrating information from various deep web resources. We present a data model for representing search interfaces and discuss techniques for extracting field labels, client-side scripts, and structured data from HTML pages. We also describe a representation of result pages and discuss how to extract and store the results of form queries. In addition, we present a user-friendly and expressive form query language that allows one to retrieve information behind search interfaces and extract useful data from the result pages based on specified conditions. We implement a prototype system for querying web databases and describe its architecture and component design.
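Neither the I-Crawler's code nor the thesis's form query language is reproduced here; the following is a minimal, self-contained sketch of the basic task the abstract describes — detecting a search form on a page and issuing a query through it — using only the Python standard library and a hypothetical URL. A real deep-web crawler must also handle POST forms, JavaScript, sessions, and result-page extraction.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


class FormFinder(HTMLParser):
    """Collects <form> actions/methods and text-input names from a page."""

    def __init__(self):
        super().__init__()
        self.forms = []  # [{"action": ..., "method": ..., "fields": [...]}]

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": a.get("action", ""),
                "method": a.get("method", "get").lower(),
                "fields": [],
            })
        elif tag == "input" and self.forms:
            # A named text input hints that the form is a search interface
            # (inputs with no type attribute default to text in HTML).
            if a.get("type", "text") == "text" and a.get("name"):
                self.forms[-1]["fields"].append(a["name"])


# Hypothetical page holding a search form; not a URL from the thesis.
page_url = "https://example.org/library/search"
html = urlopen(page_url).read().decode("utf-8", errors="replace")

finder = FormFinder()
finder.feed(html)

# Submit a query through the first GET form that has a text field.
for form in finder.forms:
    if form["fields"] and form["method"] == "get":
        query = urlencode({form["fields"][0]: "linked open data"})
        result_url = urljoin(page_url, form["action"]) + "?" + query
        results_html = urlopen(result_url).read()
        print(f"Fetched {len(results_html)} bytes from {result_url}")
        break
```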

Relevance:

100.00%

Publisher:

Abstract:

Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014