978 results for Web archiving approaches


Relevance:

100.00%

Publisher:

Abstract:

Web archiving is a discipline that originated in the field of library and information science and is foreign to the archival world of our country. The first part of this paper offers a brief state of the art on the archiving of web pages and, from an archival perspective, tries to answer questions such as: What does web page archiving consist of? What is it for? Since when has it been practised? Which organizations practise it? How is the web captured and stored? The second part proposes a reflection on the application of web archiving from the standpoint of archival science. Keywords: digital preservation, web archiving, archival science, Internet, national libraries, electronic records, information and communication technologies

Relevance:

100.00%

Publisher:

Abstract:

This report describes web archiving in the National Library of Finland. The National Library of Finland has been archiving the Finnish web on a regular basis since 2006. Web archiving is an important part of the Library's endeavours to collect and preserve Finnish published cultural heritage. In 2010, the amount of harvested data was 200 million files, or 25 terabytes. The report takes the reader through the relevant legislation; internal plans and policies; funding and its allocation; the practices of web archiving; arrangements for the use of the archive; and issues arising from data security, sensitive materials, etc.

Relevance:

100.00%

Publisher:

Abstract:

The Web is currently a privileged space of expression and activity for many communities, where communication practices and documentary practices enrich one another. In its visible or invisible dimension, the Web is also a worldwide documentary reservoir characterized not only by the abundance of information circulating on it, but also by its diversity, complexity and ephemeral nature. Many current Web archiving projects approach this question from the point of view of preserving online publications, without considering it from an archival perspective. Only a few Web archiving projects aim to preserve the organizational or governmental Web. The archival value of the Web, particularly the organizational Web, does not seem to be recognized, despite a sustained effort by certain national archives to disseminate policies for archiving the organizational Web. The purpose of this thesis is to develop a better understanding of the nature of Web archives and to document current practices for archiving the organizational Web. More specifically, this research aims to answer the following three questions: (1) What do organizational Web archiving policies generally recommend? (2) What are the main characteristics of Web archives? (3) What organizational Web archiving practices are in place in organizations in Quebec?
To answer these questions, this exploratory and descriptive research adopted a qualitative approach based on three modes of data collection: the analysis of a corpus of 55 policies and supplementary documents relating to the archiving of the organizational Web; the observation of 11 public websites of organizations in Quebec, together with the observation of a sample of 737 documents produced by these Web systems; and, finally, interviews with 21 participants involved in managing and archiving these websites. The research results show that the websites studied are the product of an organization's conduct of online activities and, at the same time, document the objectives and manifestations of its presence on the Web. New types of documents specific to the organizational Web were identified. Documents that have migrated to the Web have acquired a different context of use and new characteristics. Current management methods must take into account the properties of documents in a Web environment. While some of the sites studied do not archive their public website, others invest in doing so. However, the choices made do not always correspond to the recommendations in the Web archiving policies analysed, and they guarantee neither the longevity of the Web archives nor their long-term usability. This finding led us to propose a model policy adapted to the characteristics of Web archives. This model describes the essential components of a policy for archiving websites, as well as a range of measures that the organization could put in place depending on the results of an analysis of the risks associated with the use of its public website in the conduct of its business.

Relevance:

100.00%

Publisher:

Abstract:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance:

90.00%

Publisher:

Abstract:

Current model-driven Web Engineering approaches (such as OO-H, UWE or WebML) provide a set of methods and supporting tools for the systematic design and development of Web applications. Each method addresses different concerns using separate models (content, navigation, presentation, business logic, etc.), and provides model compilers that produce most of the logic and Web pages of the application from these models. However, these proposals also have some limitations, especially for exchanging models or representing further modeling concerns, such as architectural styles, technology independence, or distribution. A possible solution to these issues is to make model-driven Web Engineering proposals interoperate, so that they can complement each other and exchange models between the different tools. MDWEnet is a recent initiative started by a small group of researchers working on model-driven Web Engineering (MDWE). Its goal is to improve current practices and tools for the model-driven development of Web applications for better interoperability. The proposal is based on the strengths of current model-driven Web Engineering methods, and the existing experience and knowledge in the field. This paper presents the background, motivation, scope, and objectives of MDWEnet. Furthermore, it reports on the MDWEnet results and achievements so far, and its future plan of action.

Relevance:

90.00%

Publisher:

Abstract:

Information and content integration are believed to be a possible solution to the problem of information overload on the Internet. The article is an overview of a simple solution for the integration of information and content on the Web. Previous approaches to content extraction and integration are discussed, followed by the introduction of a novel technology, based on XML processing, to deal with the problems. The article includes lessons learned from solving issues of changing webpage layout, incompatibility with HTML standards and multiplicity of the results returned. The method, which adopts relative XPath queries over the DOM tree, proves to be more robust than previous approaches to Web information integration. Furthermore, the prototype implementation demonstrates the simplicity that enables non-professional users to easily adopt this approach in their day-to-day information management routines.
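
The idea of relative XPath queries mentioned in this abstract can be illustrated with a minimal sketch. It is not the article's implementation: it assumes well-formed XHTML input, uses the limited XPath subset of Python's standard `xml.etree.ElementTree` module, and the two sample pages, the `results` id and the `extract_items` function are hypothetical names chosen for illustration. The extractor first locates a stable anchor element and then queries relative to it, so a layout change above the anchor does not break the extraction.

```python
import xml.etree.ElementTree as ET

# Two hypothetical layouts of the same page: in V2 a banner and a wrapper
# have been added, so any absolute path from the root would change.
PAGE_V1 = """<html><body>
  <div id="results">
    <div class="item"><h2>Alpha</h2><span class="price">10</span></div>
    <div class="item"><h2>Beta</h2><span class="price">20</span></div>
  </div>
</body></html>"""

PAGE_V2 = """<html><body>
  <div class="banner">Ad</div>
  <div class="wrapper">
    <div id="results">
      <div class="item"><h2>Alpha</h2><span class="price">10</span></div>
      <div class="item"><h2>Beta</h2><span class="price">20</span></div>
    </div>
  </div>
</body></html>"""


def extract_items(xhtml):
    """Return (title, price) pairs using queries relative to an anchor node."""
    root = ET.fromstring(xhtml)
    # Locate the anchor anywhere in the tree by a stable attribute.
    anchor = root.find(".//div[@id='results']")
    if anchor is None:
        return []
    items = []
    # Every further query is relative to the anchor or to an item node.
    for item in anchor.findall("./div[@class='item']"):
        title = item.findtext("./h2")
        price = item.findtext("./span[@class='price']")
        items.append((title, price))
    return items


if __name__ == "__main__":
    print(extract_items(PAGE_V1))
    print(extract_items(PAGE_V2))
```

Both calls return the same pairs even though the position of the results block differs between the two layouts. Real-world HTML that is not well-formed would need a tolerant parser before this kind of query can be applied.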

Relevance:

90.00%

Publisher:

Abstract:

Presentation at the IIPC General Assembly, Reykjavik, 12 April, 2016

Relevance:

80.00%

Publisher:

Abstract:

This poster presentation from the May 2015 Florida Library Association Conference, along with the Everglades Explorer discovery portal at http://ee.fiu.edu, demonstrates how traditional bibliographic and curatorial principles can be applied to: 1) selection, cross-walking and aggregation of metadata linking end users to widespread digital resources from multiple silos; 2) harvesting of select PDFs, HTML and media for web archiving and access; 3) selection of CMS domains, sub-domains and folders for targeted searching using an API. Choosing content for this discovery portal is comparable to the past scholarly practice of creating and publishing subject bibliographies, except that metadata and data are housed in relational databases. This new and yet traditional capacity coincides with: growth of bibliographic utilities (MarcEdit); evolution of open-source discovery systems (eXtensible Catalog); development of target-capable web crawling and archiving systems (Archive-It); and specialized search APIs (Google). At the same time, historical and technical changes, specifically the increasing fluidity and re-purposing of syndicated metadata, make this possible. It equally stems from the expansion of freely accessible digitized legacy and born-digital resources. Innovation principles helped frame the process by which the thematic Everglades discovery portal was created at Florida International University. The path to providing more effective searching and co-location of digital scientific, educational and historical material related to the Everglades is contextualized through five concepts found in Dyer and Christensen's "The Innovator's DNA: Mastering the Five Skills of Disruptive Innovators" (2011). The project also aligns with Ranganathan's Laws of Library Science, especially the 4th Law: to "save the time of the user."

Relevance:

40.00%

Publisher:

Abstract:

Recent studies of mobile Web trends show a continuing explosion of mobile-friendly content. However, the increasing number and heterogeneity of mobile devices pose several challenges for Web programmers who want to obtain the delivery context automatically and adapt content to mobile devices. In this process, the device detection phase plays an important role, since inaccurate detection can result in a poor mobile experience for the end user. In this paper we compare the most promising approaches to mobile device detection. Based on this study, we present an architecture for a system that detects devices and delivers uniform m-Learning content to students in a Higher School. We focus mainly on the device capabilities repository, which is manageable and accessible through an API. We detail the structure of the capabilities XML Schema that formalizes the data within the device capabilities XML repository, and the REST Web Service API for selecting the corresponding device capabilities data for a specific request. Finally, we validate our approach by presenting access and usage statistics for the mobile web interface of the proposed system, such as hits and new visitors, mobile platforms, average time on site and rejection rate.

Relevance:

40.00%

Publisher:

Abstract:

PADICAT is the web archive created in 2005 in Catalonia (Spain) by the Library of Catalonia (BC), the National Library of Catalonia, with the aim of collecting, processing and providing permanent access to the digital heritage of Catalonia. Its harvesting strategy is based on the hybrid model (massive harvesting of the .SPA top-level domain; selective compilation of the web site output of Catalan organizations; focused harvesting of public events). The system provides open access to the whole collection on the Internet. We consider it necessary to complement the current search and visualization software with a new open-source software tool, CAT (Curator Archiving Tool), composed of three modules aimed at effectively managing the processes of human cataloguing; publishing directories of the digital resources and special collections; and offering statistical information of added value to end users. Within the framework of the International Internet Preservation Consortium meeting (Vienna 2010), the progress in the development of this new tool and the philosophy that has motivated its design are presented to the international community.

Relevance:

40.00%

Publisher:

Abstract:

Document engineering is the computer science discipline that investigates systems for documents in any form and in all media. As with the relationship between software engineering and software, document engineering is concerned with principles, tools and processes that improve our ability to create, manage, and maintain documents (http://www.documentengineering.org). The ACM Symposium on Document Engineering is an annual meeting of researchers active in document engineering; it is sponsored by ACM through the ACM SIGWEB Special Interest Group. In this editorial, we first point to work carried out in the context of document engineering that is directly related to multimedia tools and applications. We conclude with a summary of the papers presented in this special issue.

Relevance:

40.00%

Publisher:

Abstract:

Server responsiveness and scalability are more important than ever in today’s client/server dominated network environments. Recently, researchers have begun to consider cluster-based computers using commodity hardware as an alternative to expensive specialized hardware for building scalable Web servers. In this paper, we present performance results comparing two cluster-based Web servers based on different server infrastructures: MAC-based dispatching (LSMAC) and IP-based dispatching (LSNAT). Both cluster-based server systems were implemented as application-space programs running on commodity hardware. We point out the advantages and disadvantages of both systems. We also identify when servers should be clustered and when clustering will not improve performance.

Relevance:

30.00%

Publisher:

Abstract:

Maintaining web content can be a difficult task, especially for websites where many users have permission to change the content. Wikis are one example of this kind of website. While they allow knowledge to be disseminated quickly, they also require a great deal of effort to verify the quality of their content. In this thesis we analyse different approaches to website modelling, especially for content verification, and contribute an extension to the VeriFLog tool that makes it better suited to verifying content on collaborative websites.