999 results for web harvesting
Abstract:
This report describes web archiving in the National Library of Finland. The National Library of Finland has been archiving the Finnish web on a regular basis since 2006. Web archiving is an important part of the Library's endeavours to collect and preserve Finnish published cultural heritage. In 2010, the amount of harvested data was 200 million files, or 25 terabytes. The report takes the reader through the relevant legislation; internal plans and policies; funding and its allocation; the practices of web archiving; arrangements for the use of the archive; and issues arising from data security, sensitive materials, etc.
Abstract:
The National Library of Finland is responsible for the collection, preservation and accessibility of Finland's published national heritage, and for its other unique collections. This presentation will give a general overview of the several processes employed in digitization and in the handling of electronic legal deposit. The METS format has been chosen as the container format for digitized materials, and a considerable amount of effort has been put into creating adequate METS profiles. As METS will be heavily relied on as a container format, the practicalities are discussed in some depth. Regarding electronic legal deposit, the National Library has concentrated on large-scale web harvesting. Depositing of e-books is being tested with publishers. The future plans concerning digital preservation will be presented, especially the National Digital Library initiative.
Abstract:
PADICAT is the web archive created in 2005 in Catalonia (Spain) by the Library of Catalonia (BC), the National Library of Catalonia, with the aim of collecting, processing and providing permanent access to the digital heritage of Catalonia. Its harvesting strategy is based on the hybrid model (massive harvesting of the SPA top level domain; selective compilation of the web site output of Catalan organizations; focused harvesting of public events). The system provides open access to the whole collection on the Internet. We consider it necessary to complement the current search and visualization software with a new open source software tool, CAT (Curator Archiving Tool), composed of three modules aimed at effectively managing the processes of human cataloguing; publishing directories of the digital resources and special collections; and offering statistical information of added value to end users. Within the framework of the International Internet Preservation Consortium meeting (Vienna 2010), the progress in the development of this new tool, and the philosophy that has motivated its design, are presented to the international community.
Abstract:
The objective of this study is to develop a Pollution Early Warning System (PEWS) for efficient management of water quality in oyster harvesting areas. To that end, this paper presents a web-enabled, user-friendly PEWS for managing water quality in oyster harvesting areas along the Louisiana Gulf Coast, USA. The PEWS consists of (1) an Integrated Space-Ground Sensing System (ISGSS) gathering data for environmental factors influencing water quality, (2) an Artificial Neural Network (ANN) model for predicting the level of fecal coliform bacteria, and (3) a web-enabled, user-friendly Geographic Information System (GIS) platform for issuing water pollution advisories and managing oyster harvesting waters. The ISGSS (data acquisition system) collects near real-time environmental data from various sources, including NASA MODIS Terra and Aqua satellites and in-situ sensing stations managed by the USGS and the NOAA. The ANN model is developed using the ANN program in MATLAB Toolbox. The ANN model involves a total of 6 independent environmental variables (rainfall, tide, wind, salinity, temperature, and weather type) along with 8 different combinations of the independent variables. The ANN model is constructed and tested using environmental and bacteriological data collected monthly from 2001 to 2011 by the Louisiana Molluscan Shellfish Program at seven oyster harvesting areas on the Louisiana coast, USA. The ANN model is capable of explaining about 76% of the variation in fecal coliform levels for model training data and 44% for independent data. The web-based GIS platform is developed using ArcView GIS and ArcIMS. The web-based GIS system can be employed for mapping fecal coliform levels, predicted by the ANN model, and potential risks of norovirus outbreaks in oyster harvesting waters. The PEWS is able to inform decision-makers of potential risks of fecal pollution and virus outbreak on a daily basis, greatly reducing the risk that contaminated oysters pose to human health.
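The "share of variation explained" quoted above is usually reported as a coefficient of determination (R²). The abstract does not say which statistic was used, so the following is only a minimal sketch under that assumption; the function name `r_squared` and the sample values are hypothetical and are not taken from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Total variation of the observations around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the model predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Invented values for illustration only (e.g. log-transformed bacteria counts).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(round(r_squared(observed, predicted), 3))
```

A gap such as the one between 76% on training data and 44% on independent data is what this statistic would show when a model fits the data it was trained on better than unseen data.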
Abstract:
Landowners and agencies have expressed difficulty finding hunters willing to harvest the female portion of the ungulate populations, and likewise, hunters have expressed difficulty achieving access to private lands. Since 2003, the Montana “DoeCowHunt” website (www.doecowhunt.montana.edu) has provided an avenue to improve hunter-landowner contact and wild ungulate population management. A product of Montana State University Extension Wildlife Program, this website provides a means for hunters and landowners in Montana to contact each other by listing contact information (email address, physical address, and telephone number) for the purpose of harvesting antlerless ungulates. In the first year over 10,000 users visited the site. Of those who actually registered, 11 were landowners and 1334 were hunters. An evaluation survey resulted in a 40% response rate. The survey indicated the average registered landowner had 20 hunter contacts. Many landowners contacted hunters through use of the website but did not register or list their contact information on the site.
Abstract:
The web is continuously evolving into a vast collection of data, which has created interest in collecting and merging these data in a meaningful way. Based on such web data, this paper describes the building of an ontology that rests on fuzzy clustering techniques. Through continual harvesting of folksonomies by web agents, an entirely automatic fuzzy grassroots ontology is built. This self-updating ontology can then be used for several practical applications in fields such as web structuring, web searching and web knowledge visualization. A potential application for online reputation analysis, added value and possible future studies are discussed in the conclusion.
Abstract:
The Internet is evolving toward what is known as the Live Web. In this new stage of its evolution, a multitude of social data streams is placed at the service of users. Thanks to these data sources, users have moved from browsing static web pages to interacting with applications that offer personalized content based on their preferences. Each user interacts daily with multiple applications that issue notifications and alerts; in this sense each user is a source of events, and users often feel overwhelmed and are unable to process all this on-demand information. To cope with this overload, multiple tools have appeared that automate the most common tasks, from inbox managers and social network alert managers to complex CRMs or smart-home hubs. The downside is that, although they solve common problems, they cannot adapt to the needs of each user by offering a personalized solution. Task Automation Services (TAS) entered the scene from 2012 onward to address this limitation. Given their similarity, these services are also regarded as a new, user-centred approach to mash-up technology. Users of these platforms can interconnect services, sensors and other Internet-connected devices, designing the automations that fit their needs. The proposal has been widely accepted by users, which has led a multitude of platforms offering TAS to enter the scene. As this is a new field of research, this thesis presents the main characteristics of TAS, describes their components, and identifies the fundamental dimensions that define them and allow their classification.
This work coins the term Task Automation Service (TAS), giving a formal description of these services and their components (called channels), and provides a reference architecture. There is likewise a lack of tools for describing automation services and automation rules. In this respect, this thesis proposes a common model, realized as the EWE (Evented WEb Ontology) ontology. This model makes it possible to compare and match channels and automations from different TASs, a considerable contribution to the portability of user automations between platforms. Likewise, given the semantic nature of the model, automations can include elements from external sources over which to reason, such as Linked Open Data. Using this model, a dataset of channels and automations has been generated with data obtained from some of the TASs existing on the market. As a final step toward a common model for describing TAS, an algorithm has been developed to learn ontologies automatically from the data in the dataset. This favours the discovery of new channels and reduces the maintenance cost of the model, which is updated semi-automatically. In conclusion, the main contributions of this thesis are: i) describing the state of the art in task automation and coining the term Task Automation Service; ii) developing an ontology for modelling the components of TASs and automations; iii) populating a dataset of channels and automations, used to develop an algorithm for automatic ontology learning; and iv) designing an agent architecture for assisting users in creating automations.
ABSTRACT The new stage in the evolution of the Web (the Live Web or Evented Web) puts many social data streams at the service of users, who no longer browse static web pages but interact with applications that present them with contextual and relevant experiences. Given that each user is a potential source of events, a typical user often gets overwhelmed. To deal with that huge amount of data, multiple automation tools have emerged, ranging from simple social media managers or notification aggregators to complex CRMs or smart-home hubs and apps. As a downside, they cannot be tailored to the needs of every single user. As a natural response to this downside, Task Automation Services broke into the Internet. They may be seen as a new model of mash-up technology for combining social streams, services and connected devices from an end-user perspective: end-users are empowered to connect those streams however they want, designing the automations they need. The platforms that appeared early on grew rapidly in usage, and as a consequence the number of platforms following this approach is growing fast. As this is a novel field, this thesis aims to shed light on it, presenting and exemplifying the main characteristics of Task Automation Services, describing their components, and identifying several dimensions to classify them. This thesis coins the term Task Automation Services (TAS) by providing a formal definition of them and their components (called channels), as well as a TAS reference architecture. There is also a lack of tools for describing automation services and automation rules. In this regard, this thesis proposes a theoretical common model of TAS and formalizes it as the EWE ontology. This model makes it possible to compare channels and automations from different TASs, which has a high impact on interoperability, and enhances automations by providing a mechanism to reason over external sources such as Linked Open Data.
Based on this model, a dataset of components of TAS was built, harvesting data from the web sites of actual TASs. Going a step further towards this common model, an algorithm for categorizing them was designed, enabling their discovery across different TASs. Thus, the main contributions of the thesis are: i) surveying the state of the art on task automation and coining the term Task Automation Service; ii) providing a semantic common model for describing TAS components and automations; iii) populating a categorized dataset of TAS components, used to learn ontologies of particular domains from the TAS perspective; and iv) designing an agent architecture for assisting users in setting up automations, which is aware of their context and acts accordingly.
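To make the idea of channels and automations more concrete, here is a minimal sketch of a rule that connects an event on one channel to an action on another. It is not the EWE ontology or the thesis's reference architecture; the class names (`Channel`, `Rule`, `RuleEngine`), the channel names and the event names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Channel:
    """A service or device that can offer named actions (hypothetical model)."""
    name: str
    actions: Dict[str, Callable[[dict], str]] = field(default_factory=dict)


@dataclass
class Rule:
    """'When this event occurs on one channel, run that action on another.'"""
    event_channel: str
    event_name: str
    action_channel: str
    action_name: str


class RuleEngine:
    def __init__(self) -> None:
        self.channels: Dict[str, Channel] = {}
        self.rules: List[Rule] = []

    def add_channel(self, channel: Channel) -> None:
        self.channels[channel.name] = channel

    def add_rule(self, rule: Rule) -> None:
        self.rules.append(rule)

    def emit(self, channel_name: str, event_name: str, payload: dict) -> List[str]:
        """Run every rule whose trigger matches the event; return action results."""
        results = []
        for rule in self.rules:
            if (rule.event_channel, rule.event_name) == (channel_name, event_name):
                action = self.channels[rule.action_channel].actions[rule.action_name]
                results.append(action(payload))
        return results


# Invented example: a weather event triggers a notification.
weather = Channel("weather")
notifier = Channel(
    "notifier",
    actions={"send_message": lambda p: f"notify: rain expected in {p['city']}"},
)
engine = RuleEngine()
engine.add_channel(weather)
engine.add_channel(notifier)
engine.add_rule(Rule("weather", "rain_forecast", "notifier", "send_message"))
fired = engine.emit("weather", "rain_forecast", {"city": "Madrid"})
print(fired)
```

Describing rules as plain data, separate from the channels that execute them, is one way to picture why a shared model could help move automations between platforms.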
Abstract:
High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates an XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by integrating diverse databases, in response to the need for appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions.
IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.
Abstract:
The use of the web to make government information and services available to citizens has become increasingly significant. It is therefore essential to guarantee that this content and these services are accessible to every citizen, regardless of special needs or any other barriers. In Brazil, Decree-Law No. 5.296/2004 determined that all government bodies should adapt their websites to accessibility criteria by December 2005. To verify how accessibility evolved over the years and what impact this legislation had, this article analyses the accessibility of the websites of the Brazilian state governments using samples collected between 1996 and 2007. The analyses were based on metrics obtained from evaluations with automatic tools. The results indicate that the legislation had little impact on the real improvement of the sites' accessibility in the period indicated, with an improvement only in 2007. More effective public policies are needed so that people with special needs have their rights of access to information and public services on the web more broadly assured.
Abstract:
This paper investigates the concept of piezoaeroelasticity for energy harvesting. The focus is placed on mathematical modeling and experimental validations of the problem of generating electricity at the flutter boundary of a piezoaeroelastic airfoil. An electrical power output of 10.7 mW is delivered to a 100 kΩ load at the linear flutter speed of 9.30 m/s (which is 5.1% larger than the short-circuit flutter speed). The effect of piezoelectric power generation on the linear flutter speed is also discussed and a useful consequence of having nonlinearities in the system is addressed. (C) 2010 American Institute of Physics. [doi:10.1063/1.3427405]
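The figures quoted in this abstract can be related by simple arithmetic. The sketch below assumes that the load is a purely resistive 100 kΩ, that 10.7 mW is the average power dissipated in it, and that "5.1% larger" is measured relative to the short-circuit flutter speed. None of these readings is confirmed by the abstract, and the variable names are illustrative.

```python
import math

power_w = 10.7e-3          # reported electrical power output, in watts
load_ohm = 100e3           # load resistance, assumed to be 100 kOhm
flutter_speed_ms = 9.30    # reported linear flutter speed, in m/s
speed_ratio = 1.051        # assumed: flutter speed / short-circuit flutter speed

# For a resistive load, P = V_rms^2 / R, so V_rms = sqrt(P * R).
v_rms = math.sqrt(power_w * load_ohm)

# Ohm's law gives the corresponding RMS current.
i_rms = v_rms / load_ohm

# Short-circuit flutter speed implied by the 5.1% figure.
short_circuit_speed_ms = flutter_speed_ms / speed_ratio

print(f"V_rms = {v_rms:.1f} V")
print(f"I_rms = {i_rms * 1e3:.3f} mA")
print(f"short-circuit flutter speed = {short_circuit_speed_ms:.2f} m/s")
```

Under these assumptions the load voltage is on the order of tens of volts at a current well below one milliampere.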
Abstract:
With the advent and development of technology, mainly on the Internet, more and more electronic services are being offered to customers in all areas of business, especially information services such as virtual libraries. This article proposes a new opportunity to provide services to virtual library customers, presenting a methodology for implementing electronic services oriented to these customers' life situations. Analytical observation of some national virtual library sites showed that offering services based on life situations and relationship interest situations can promote the service to customers, providing greater satisfaction and, consequently, improving the quality of the information services offered. The visits to those sites and the critical analysis of the data collected during them, supported by the results of bibliographic research, enabled the description of this methodology. The conclusion is that providing services in isolation or according to the user's profile on virtual library sites is not always enough to meet the needs and expectations of customers, which suggests offering these services based on life situations and relationship interest situations as a complement that adds value to the virtual library business. The methodology is relevant because it indicates new opportunities to provide quality virtual library services, serving as a guide for managers of information providers and enabling new means of access to information services for such customers, with a view to proactivity and service integration, in order to definitively solve real problems.
Abstract:
Background: A relative inability to capture a sufficiently large patient population in any one geographic location has traditionally limited research into rare diseases. Methods and Results: Clinicians interested in the rare disease lymphangioleiomyomatosis (LAM) have worked with the LAM Treatment Alliance, the MIT Media Lab, and Clozure Associates to cooperate in the design of a state-of-the-art data coordination platform that can be used for clinical trials and other research focused on the global LAM patient population. This platform is a component of a set of web-based resources, including a patient self-report data portal, aimed at accelerating research in rare diseases in a rigorous fashion. Conclusions: Collaboration between clinicians, researchers, advocacy groups, and patients can create essential community resource infrastructure to accelerate rare disease research. The International LAM Registry is an example of such an effort.
Abstract:
The dynamical discrete web (DyDW), introduced in the recent work of Howitt and Warren, is a system of coalescing simple symmetric one-dimensional random walks which evolve in an extra continuous dynamical time parameter tau. The evolution is by independent updating of the underlying Bernoulli variables indexed by discrete space-time that define the discrete web at any fixed tau. In this paper, we study the existence of exceptional (random) values of tau where the paths of the web do not behave like usual random walks and the Hausdorff dimension of the set of such exceptional tau. Our results are motivated by those about exceptional times for dynamical percolation in high dimension by Haggstrom, Peres and Steif, and in dimension two by Schramm and Steif. The exceptional behavior of the walks in the DyDW is rather different from the situation for the dynamical random walks of Benjamini, Haggstrom, Peres and Steif. For example, we prove that the walk from the origin S(0)(tau) violates the law of the iterated logarithm (LIL) on a set of tau of Hausdorff dimension one. We also discuss how these and other results should extend to the dynamical Brownian web, the natural scaling limit of the DyDW. (C) 2009 Elsevier B.V. All rights reserved.
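The construction described above can be simulated in a few lines. The sketch below is only an illustration, not the paper's formal setup. It attaches an independent ±1 arrow to each space-time point, lets walks follow the arrows (so two walks that meet stay together), and resamples each stored arrow independently as the dynamical time advances. The class name `DiscreteWeb`, the lazy generation of arrows and the resampling rate of 1 per unit of tau are assumptions.

```python
import math
import random


class DiscreteWeb:
    """Coalescing simple symmetric random walks driven by Bernoulli arrows."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.arrows = {}  # (x, t) -> -1 or +1, created on first use

    def arrow(self, x, t):
        # Every walk passing through (x, t) reads the same arrow,
        # which is what makes the walks coalesce.
        key = (x, t)
        if key not in self.arrows:
            self.arrows[key] = self.rng.choice((-1, 1))
        return self.arrows[key]

    def path(self, x0, t0, steps):
        """Positions of the walk started at x0 at time t0, for `steps` steps."""
        xs = [x0]
        for k in range(steps):
            xs.append(xs[-1] + self.arrow(xs[-1], t0 + k))
        return xs

    def evolve(self, dtau, rate=1.0):
        """Advance dynamical time by dtau: each stored arrow is resampled
        independently with probability 1 - exp(-rate * dtau)."""
        p = 1.0 - math.exp(-rate * dtau)
        for key in self.arrows:
            if self.rng.random() < p:
                self.arrows[key] = self.rng.choice((-1, 1))


web = DiscreteWeb(seed=1)
a = web.path(0, 0, 200)   # walk from the origin
b = web.path(2, 0, 200)   # neighbouring walk with the same parity
meet = next((k for k in range(len(a)) if a[k] == b[k]), None)
print("first meeting step:", meet)

web.evolve(dtau=0.1)      # move to a nearby value of tau
a_later = web.path(0, 0, 200)
print("origin path changed:", a_later != a)
```

At any fixed tau the walk from the origin is an ordinary simple symmetric random walk; the exceptional behaviour studied in the paper concerns what happens across the whole continuum of tau values, which a finite simulation like this can only hint at.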
Abstract:
Agricultural management practices that promote net carbon (C) accumulation in the soil have been considered as an important potential mitigation option to combat global warming. The change in the sugarcane harvesting system, to one which incorporates C into the soil from crop residues, is the focus of this work. The main objective was to assess and discuss the changes in soil organic C stocks caused by the conversion of burnt to unburnt sugarcane harvesting systems in Brazil, when considering the main soils and climates associated with this crop. For this purpose, a dataset was obtained from a literature review of soils under sugarcane in Brazil. Although not necessarily from experimental studies, only paired comparisons were examined, and for each site the dominant soil type, topography and climate were similar. The results show a mean annual C accumulation rate of 1.5 Mg ha⁻¹ year⁻¹ for the surface to 30-cm depth (0.73 and 2.04 Mg ha⁻¹ year⁻¹ for sandy and clay soils, respectively) caused by the conversion from a burnt to an unburnt sugarcane harvesting system. The findings suggest that soil should be included in future studies related to life cycle assessment and C footprint of Brazilian sugarcane ethanol.
Abstract:
Introduction: Internet users are increasingly using the worldwide web to search for information relating to their health. This situation makes it necessary to create specialized tools capable of supporting users in their searches. Objective: To apply and compare strategies that were developed to investigate the use of the Portuguese version of Medical Subject Headings (MeSH) for constructing an automated classifier for Brazilian Portuguese-language web-based content within or outside of the field of healthcare, focusing on the lay public. Methods: 3658 Brazilian web pages were used to train the classifier and 606 Brazilian web pages were used to validate it. The strategies proposed were constructed using content-based vector methods for text classification, such that Naive Bayes was used for the task of classifying vector patterns with characteristics obtained through the proposed strategies. Results: A strategy named InDeCS was developed specifically to adapt MeSH for the problem that was put forward. This approach achieved better accuracy for this pattern classification task (0.94 sensitivity, specificity and area under the ROC curve). Conclusions: Because of the significant results achieved by InDeCS, this tool has been successfully applied to the Brazilian healthcare search portal known as Busca Saude. Furthermore, it could be shown that MeSH presents important results when used for the task of classifying web-based content focusing on the lay public. It was also possible to show from this study that MeSH was able to map out mutable non-deterministic characteristics of the web. (c) 2010 Elsevier Inc. All rights reserved.
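As a rough illustration of the classification step mentioned above, the following is a minimal multinomial Naive Bayes text classifier with Laplace smoothing. It is not the InDeCS strategy and does not use MeSH. The class name `NaiveBayesText`, the whitespace tokenization and the four training snippets are hypothetical and invented for this sketch.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace-separated lowercase tokens."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(docs, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            # Laplace (add-one) smoothing over the shared vocabulary.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:             # ignore unseen words
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny invented training set: health-related versus other content.
docs = [
    "sintomas da gripe e tratamento",
    "vacina contra febre amarela",
    "resultado do jogo de futebol",
    "venda de carro novo",
]
labels = ["health", "health", "other", "other"]
model = NaiveBayesText().fit(docs, labels)

print(model.predict("tratamento da febre"))
print(model.predict("jogo de futebol"))
```

In the study, the feature vectors come from the proposed strategies rather than from raw word counts; the sketch only shows the general shape of a Naive Bayes decision.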