964 results for non profit, linked open data, web scraping, web crawling
Abstract:
Open educational resources (OER) promise increased access, participation, quality, and relevance, in addition to cost reduction. These seemingly fantastic promises are based on the supposition that educators and learners will discover existing resources, improve them, and share the results, producing a virtuous cycle of improvement and reuse. By anecdotal metrics, existing web-scale search is not working for OER. This situation impairs the cycle underlying the promise of OER, endangering long-term growth and sustainability. While the scope of the problem is vast, targeted improvements in the areas of curation, indexing, and data exchange can improve the situation and create opportunities for further scale. I explore the ways in which the current system is inadequate, discuss areas for targeted improvement, and describe a prototype system built to test these ideas. I conclude with suggestions for further exploration and development.
Abstract:
Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
Wikipedia is a free, web-based, collaborative, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation. Because Wikipedia is free and open for anyone to edit, article quality can vary considerably: contributors differ in their levels of knowledge and hold different opinions about a topic, so the contributions made by different authors may diverge. It is therefore important to classify articles, so that good-quality articles can be separated from poor-quality ones and the latter can be removed from the database. The aim of this study is to classify Wikipedia articles into two classes, class 0 (poor quality) and class 1 (good quality), using an Adaptive Neuro-Fuzzy Inference System (ANFIS) and data mining techniques. Two ANFIS models were built using the Fuzzy Logic Toolbox [1] available in Matlab: the first is based on rules obtained from the J48 classifier in WEKA, while the second was built from expert knowledge. The data used for this research comprise records for 226 articles taken from the German version of Wikipedia. The dataset consists of 19 inputs and one output, and was preprocessed to remove redundant attributes. The input variables relate to the editors, the contributors, article length, and the article lifecycle. Finally, the methods implemented in this research are compared to analyze the performance of each classification approach.
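The study's actual tooling is WEKA's J48 and Matlab's Fuzzy Logic Toolbox; purely as an illustrative sketch, the Python snippet below uses a scikit-learn decision tree (a C4.5-style learner comparable to J48) to show how crisp rules might be extracted from article features before being converted into fuzzy rules for an ANFIS. The feature names, value ranges, and toy labels are hypothetical, not taken from the paper.

```python
# Illustrative sketch only: the study used WEKA's J48 and Matlab's ANFIS.
# A scikit-learn decision tree stands in here to show how crisp rules can be
# extracted and then serve as seeds for fuzzy rules.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Hypothetical article features: [num_editors, num_edits, article_length, age_days]
X = rng.random((226, 4)) * [50, 500, 20000, 3000]
y = (X[:, 2] > 8000).astype(int)  # toy label: 1 = good quality, 0 = poor

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
# The printed rules play the role of the J48 rules that seeded the first ANFIS.
print(export_text(tree, feature_names=[
    "num_editors", "num_edits", "article_length", "age_days"]))
```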
Abstract:
Background: Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. As with other counting or voting processes, these measurements constitute compositional data and exhibit properties particular to the simplex space, where the sum of the components is constrained. These properties are not present in regular Euclidean spaces, in which hybridization-based microarray data are often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be uninformative for data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Results: Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
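Simcluster itself is a C package, but the core idea, treating counts as compositional data and moving them from the simplex into Euclidean space before clustering, can be sketched in a few lines. The Python snippet below is a minimal illustration using a centered log-ratio (clr) transform, a standard device from compositional data analysis; it is not Simcluster's actual algorithm, and the counts are invented.

```python
# Minimal sketch of the idea behind Simcluster (not its actual implementation):
# transcript counts are compositional, so we map them from the simplex to
# Euclidean space with a centered log-ratio (clr) transform before clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

counts = np.array([[120, 30, 50], [110, 35, 55], [10, 200, 90]], dtype=float)
counts += 0.5                                       # pseudocount: clr needs positives
comp = counts / counts.sum(axis=1, keepdims=True)   # closure onto the simplex
clr = np.log(comp) - np.log(comp).mean(axis=1, keepdims=True)  # clr transform

Z = linkage(clr, method="average")             # Euclidean distance is now valid
print(fcluster(Z, t=2, criterion="maxclust"))  # e.g. [1 1 2]
```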
Abstract:
Over the last few decades, the ever-increasing output of scientific publications has created new challenges in keeping up to date with the literature. In the biomedical area, this growth has introduced new requirements for professionals, e.g., physicians, who have to locate the exact papers they need for their clinical and research work amongst a huge number of publications. Against this backdrop, novel information retrieval methods are ever more necessary. While web search engines are widespread in many areas, facilitating access to all kinds of information, additional tools are required to automatically link information retrieved from these engines to specific biomedical applications. In the case of clinical environments, this also means considering aspects such as patient data security and confidentiality, and structured content, e.g., electronic health records (EHRs). In this scenario, we have developed a new tool to facilitate query building for retrieving scientific literature related to EHRs. Results: We have developed CDAPubMed, an open-source web browser extension that integrates EHR features into biomedical literature retrieval. Clinical users can use CDAPubMed to: (i) load patient clinical documents, i.e., EHRs based on the Health Level 7 Clinical Document Architecture standard (HL7-CDA); (ii) identify terms in these documents that are relevant for scientific literature search, i.e., Medical Subject Headings (MeSH), automatically driven by the CDAPubMed configuration, which advanced users can tune to each specific situation; and (iii) generate and launch literature search queries against a major search engine, i.e., PubMed, to retrieve citations related to the EHR under examination. Conclusions: CDAPubMed is a platform-independent tool designed to facilitate literature searching using keywords contained in specific EHRs. It is visually integrated, as an extension of a widespread web browser, within the standard PubMed interface. It has been tested on a public dataset of HL7-CDA documents and returns significantly fewer citations, since queries are focused on characteristics identified within the EHR. For instance, compared with the more than 200,000 citations retrieved by the generic query 'breast neoplasm', fewer than ten citations were retrieved when ten patient features were added using CDAPubMed. This is an open-source tool that can be freely used for non-profit purposes and integrated with other existing systems.
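CDAPubMed is a browser extension, so the following Python snippet is only a rough sketch of the pipeline the abstract describes: pull coded terms from an HL7-CDA document and combine them into a PubMed query via NCBI's public E-utilities esearch endpoint. The file name and the term-extraction heuristic are assumptions made for illustration.

```python
# Hypothetical sketch of the CDAPubMed pipeline idea (the real tool is a
# browser extension): extract candidate terms from an HL7-CDA document and
# build a PubMed query against the public NCBI E-utilities esearch endpoint.
import urllib.parse
import xml.etree.ElementTree as ET

def extract_terms(cda_path):
    """Collect displayName attributes from coded entries as search candidates."""
    root = ET.parse(cda_path).getroot()
    return sorted({el.get("displayName")
                   for el in root.iter()
                   if el.get("displayName")})

def pubmed_query_url(terms):
    # Joining terms with AND narrows the search, mirroring how adding EHR
    # features cut results from >200,000 to under ten in the paper's example.
    query = " AND ".join(f'"{t}"[MeSH Terms]' for t in terms)
    return ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
            + urllib.parse.urlencode({"db": "pubmed", "term": query}))

# print(pubmed_query_url(extract_terms("patient.xml")))  # "patient.xml" is hypothetical
```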
Abstract:
Current methods and tools that support Linked Data publication have mainly focused so far on static data, without considering the growing amount of streaming data available on the Web. In this paper we describe a case study that involves the publication of static and streaming Linked Data for bike sharing systems and related entities. We describe some of the challenges that we have faced, the solutions that we have explored, the lessons that we have learned, and the opportunities that lie in the future for exploiting Linked Stream Data.
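As a rough illustration of what publishing such data as Linked Data can look like, the Python sketch below serialises a single bike-station observation as RDF with rdflib. The namespace URI and property names are invented for the example and are not the vocabulary used in the case study.

```python
# Minimal sketch (not the paper's actual vocabulary) of publishing one
# bike-sharing observation as Linked Data with rdflib.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

BIKE = Namespace("http://example.org/bikesharing/")  # hypothetical vocabulary
g = Graph()

station = URIRef("http://example.org/bikesharing/station/42")
g.add((station, RDF.type, BIKE.Station))
g.add((station, BIKE.availableBikes, Literal(7, datatype=XSD.integer)))
g.add((station, BIKE.observedAt,
       Literal("2014-06-09T10:00:00", datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))  # streaming data would re-emit this per update
```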
Abstract:
INTAMAP is a Web Processing Service for the automatic spatial interpolation of measured point data. The requirements were (i) the use of open standards for spatial data, such as those developed in the context of the Open Geospatial Consortium (OGC); (ii) a suitable environment for statistical modelling and computation; and (iii) an integrated, open-source solution. The system couples an open-source Web Processing Service (developed by 52°North), which accepts data in the form of standardised XML documents (conforming to the OGC Observations and Measurements standard), with a computing back-end realised in the R statistical environment. The probability distribution of interpolation errors is encoded with UncertML, a markup language designed to encode uncertain data. Automatic interpolation needs to be useful for a wide range of applications, and the algorithms have been designed to cope with anisotropy, extreme values, and data with known error distributions. Besides a fully automatic mode, the system can be used with different levels of user control over the interpolation process.
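INTAMAP's computing back-end is realised in R, so the Python sketch below only illustrates the central step under simplified assumptions: interpolating scattered point measurements while also producing the per-location error distribution that UncertML would encode, here via Gaussian-process regression (closely related to kriging) in scikit-learn. The coordinates and values are invented toy data.

```python
# Illustrative only: INTAMAP's back-end is in R. This shows the core idea of
# automatic interpolation with a per-location error estimate, via
# Gaussian-process regression in scikit-learn.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy measured point data: (x, y) coordinates and observed values.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([1.2, 2.3, 1.9, 3.1])

# WhiteKernel lets the model estimate measurement noise automatically,
# echoing INTAMAP's goal of interpolation without user tuning.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(coords, values)
mean, std = gp.predict([[0.5, 0.5]], return_std=True)
print(mean, std)  # interpolated value plus the uncertainty UncertML would encode
```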
Abstract:
A collaboration between dot.rural at the University of Aberdeen and the iSchool at Northumbria University, POWkist is a pilot study exploring potential uses of currently available linked datasets within the cultural heritage domain. Many privately held family history collections (shoebox archives) remain vulnerable unless a sustainable, affordable and accessible model of citizen-archivist digital preservation can be offered. Citizen-historians have used the web as a platform to preserve cultural heritage; however, with no accessible or sustainable model, these digital footprints have been ad hoc and rarely connected to broader historical research. Similarly, current approaches to connecting material on the web by exploiting linked datasets do not take into account the data characteristics of the cultural heritage domain. Funded by Semantic Media, the POWkist project is investigating how best to capture, curate, connect and present the contents of citizen-historians' shoebox archives in an accessible and sustainable online collection. Using the Curios platform - an open-source digital archive - we have digitised a collection relating to a prisoner of war during WWII (1939-1945). Following a series of user group workshops, POWkist is now connecting these 'made digital' items with the broader web using a semantic technology model and identifying appropriate linked datasets of relevant content, such as DBPedia (a linked dataset derived from Wikipedia) and Ordnance Survey Open Data. We are analysing the characteristics of cultural heritage linked datasets so that these materials are better visualised, contextualised and presented in an attractive and comprehensive user interface. Our paper will consider the issues we have identified and the solutions we are developing, and will include a demonstration of our work-in-progress.
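As a small illustration of the kind of linking described above, the Python sketch below queries DBpedia's public SPARQL endpoint for material about a WWII-related resource using the SPARQLWrapper library. The specific query and resource are illustrative assumptions, not taken from the project.

```python
# Illustrative sketch: fetch the English abstract of a WWII-related DBpedia
# resource, the sort of linked content a shoebox-archive item could point to.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?abstract WHERE {
      <http://dbpedia.org/resource/Prisoner_of_war> dbo:abstract ?abstract .
      FILTER (lang(?abstract) = "en")
    }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["abstract"]["value"][:200])  # first 200 characters of the abstract
```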
Abstract:
Introduction: Internet users are increasingly using the worldwide web to search for information relating to their health. This situation makes it necessary to create specialized tools capable of supporting users in their searches. Objective: To apply and compare strategies developed to investigate the use of the Portuguese version of the Medical Subject Headings (MeSH) for constructing an automated classifier that labels Brazilian Portuguese-language web content, aimed at the lay public, as within or outside the field of healthcare. Methods: 3658 Brazilian web pages were used to train the classifier and 606 Brazilian web pages were used to validate it. The proposed strategies were constructed using content-based vector methods for text classification, with Naive Bayes used to classify vector patterns whose features were obtained through the proposed strategies. Results: A strategy named InDeCS was developed specifically to adapt MeSH to the problem at hand. This approach achieved the best accuracy for this pattern classification task (0.94 sensitivity, specificity, and area under the ROC curve). Conclusions: Because of the significant results achieved by InDeCS, the tool has been successfully applied to the Brazilian healthcare search portal Busca Saúde. Furthermore, MeSH was shown to produce important results when used for classifying web content aimed at the lay public. The study also showed that MeSH was able to map mutable, non-deterministic characteristics of the web.
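As a simplified sketch of the general approach (not the InDeCS implementation, which derives its features from the Portuguese MeSH), the Python snippet below trains a bag-of-words Naive Bayes classifier to separate health-related pages from other content. The training snippets and labels are invented placeholders.

```python
# Simplified sketch of the paper's general approach: Naive Bayes over
# bag-of-words features (InDeCS instead uses MeSH-derived features).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

pages = ["sintomas e tratamento da gripe",       # health
         "receita de bolo de chocolate",         # not health
         "vacina contra febre amarela",          # health
         "resultados do campeonato de futebol"]  # not health
labels = [1, 0, 1, 0]

clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(pages, labels)
print(clf.predict(["prevenção de doenças cardíacas"]))  # expected: [1]
```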
Abstract:
Maintaining web content can be a difficult task, especially on websites where many users have permission to change the content. Wikis are one example of this kind of website: while they enable the rapid dissemination of knowledge, they also demand considerable effort to verify the quality of their content. In this thesis we analyse different approaches to modelling websites, especially for content verification, and we contribute an extension to the VeriFLog tool that makes it better suited to verifying content on collaborative websites.
Abstract:
A new 'Consent Commons' licensing framework is proposed, complementing Creative Commons, to clarify the permissions given for using and reusing clinical and non-clinical digital recordings of people (patients and non-patients) for educational purposes. Consent Commons is a sophisticated expression of ethically based 'digital professionalism', which recognises the rights of patients, carers, their families, teachers, clinicians, students and members of the public to have some say in how their digital recordings are used (including refusing or withdrawing their consent), and is necessary in order to ensure the long-term sustainability of teaching materials, including Open Educational Resources (OER). Consent Commons can ameliorate uncertainty about the status of educational resources depicting people, and can protect institutions from legal risk by underpinning robust and sophisticated policies and promoting best practice in managing their information.
Abstract:
This study compares AU Press with three other traditional (non-open-access) Canadian university presses. The analysis is based on actual physical book sales on Amazon.com and Amazon.ca. The statistical methods include sampling the sales rankings of randomly selected books from each press. The results suggest that there is no significant difference in the rankings of printed books sold by AU Press and the traditional university presses. However, AU Press can demonstrate a significantly larger readership for its books, as evidenced by thousands of downloads of the open electronic versions.
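The abstract does not name the statistical test applied to the sampled rankings; a rank-based test such as Mann-Whitney U would be a natural fit for comparing Amazon sales ranks between two presses. The Python sketch below shows that choice purely as an assumption, with invented ranking numbers.

```python
# Assumption: a rank-based Mann-Whitney U test (the abstract names no test).
# The sales-rank figures below are invented for illustration.
from scipy.stats import mannwhitneyu

au_press_ranks = [152000, 480000, 91000, 230000, 610000]
traditional_ranks = [140000, 520000, 88000, 300000, 575000]

stat, p = mannwhitneyu(au_press_ranks, traditional_ranks)
print(f"U={stat}, p={p:.3f}")  # a large p suggests no significant difference
```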
Abstract:
Much of the initial work on Open Educational Resources (OER) has inevitably concentrated on how to produce the resources themselves and establish the idea in the community. It is now eight years since the term OER was first used and more than ten years since the concept of open content was described; a greater focus is now emerging on the way in which OER can influence policy and change the way educational systems help people learn. The Open University UK and Carnegie Mellon University are working in partnership on the OLnet (Open Learning Network), funded by The William and Flora Hewlett Foundation, with the aims of seeking out evidence for the use and reuse of OER and establishing a network for sharing information about research in the field. This means both gathering evidence and developing approaches for how to research and understand ways of learning in a more open world, particularly in connection with OER, but also looking at other influences.
Abstract:
Over the past year, the Open University of Catalonia library has been designing its new website with one question in mind: how to integrate the library into students' day-to-day study routine, rather than remaining merely a satellite tool. We present the design of a website that, in a virtual library like ours, is not just a website but the library itself. The central point of the site is 'My Library', a space that links library resources to the student's curriculum and course subjects. There, students can save resources as favourites, and comment on or share them. They also have access to all the services the library offers. Resources are imported from multiple tools, such as Millennium, SFX, Metalib and DSpace, into the Drupal CMS. The resources' metadata can then be enriched with contextual information from other sources, for example the course subjects. Finally, the resources can be exported in standard, open data formats, making them available to linked data applications.
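As a hypothetical sketch of the export step described above, the Python snippet below takes one imported record, enriches it with a course subject, and serialises it with rdflib in an open linked-data format. The URIs and the course-subject property are assumptions made for illustration, not the library's actual data model.

```python
# Hypothetical sketch of the enrich-and-export step: a record imported into
# Drupal, linked to a course subject, then serialised as Turtle.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC

UOC = Namespace("http://example.org/uoc/")  # hypothetical institutional namespace
g = Graph()

resource = URIRef("http://example.org/uoc/resource/1234")
g.add((resource, DC.title, Literal("Introduction to Statistics")))
g.add((resource, DC.creator, Literal("J. Smith")))
g.add((resource, UOC.courseSubject,
       URIRef("http://example.org/uoc/subject/stats101")))  # contextual enrichment

print(g.serialize(format="turtle"))  # standard, open format for linked data apps
```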
Abstract:
Development of ectodermal appendages, such as hair, teeth, sweat glands, sebaceous glands, and mammary glands, requires the action of the TNF family ligand ectodysplasin A (EDA). Mutations of the X-linked EDA gene cause reduction or absence of many ectodermal appendages and have been identified as a cause of ectodermal dysplasia in humans, mice, dogs, and cattle. We have generated blocking antibodies, raised in Eda-deficient mice, against the conserved receptor-binding domain of EDA. These antibodies recognize epitopes overlapping the receptor-binding site and prevent EDA from binding and activating EDAR at close to stoichiometric ratios in in vitro binding and activity assays. The antibodies block EDA1 and EDA2 of both mammalian and avian origin and, in vivo, suppress the ability of recombinant Fc-EDA1 to rescue ectodermal dysplasia in Eda-deficient Tabby mice. Moreover, administration of EDA-blocking antibodies to pregnant wild-type mice induced a marked and permanent ectodermal dysplasia in the developing wild-type fetuses. These function-blocking anti-EDA antibodies with wide cross-species reactivity will enable study of the developmental and postdevelopmental roles of EDA in a variety of organisms, and open the route to therapeutic intervention in conditions in which EDA may be implicated.