3 results for Large Data
in JISC Information Environment Repository
Abstract:
Presentation slides produced as part of the Janet network end-to-end performance initiative.
Abstract:
For sign languages used by deaf communities, linguistic corpora have until recently been unavailable, owing to the lack of a writing system and a written culture in these communities and to the very recent advent of digital video. Recent improvements in video and computer technology have now made larger sign language datasets possible; however, large sign language datasets that are fully machine-readable are still elusive. This is due to two challenges:
1. Inconsistencies that arise when signs are annotated by means of spoken/written language.
2. The fact that many parts of signed interaction are not fully composed of lexical signs (the equivalent of words), but instead consist of constructions that are less conventionalised.
As sign language corpus building progresses, the potential for some standards in annotation is beginning to emerge. Before this project, however, there had been no attempts to standardise these practices across corpora, which is required in order to compare data crosslinguistically. The project thus had the following aims:
1. To develop annotation standards for glosses (lexical/word level).
2. To test their reliability and validity.
3. To improve current software tools that facilitate a reliable workflow.
Overall, the project aimed not only to set a standard for the whole field of sign language studies throughout the world but also to make significant advances toward two of the world’s largest machine-readable datasets for sign languages, specifically the BSL Corpus (British Sign Language, http://bslcorpusproject.org) and the Corpus NGT (Sign Language of the Netherlands, http://www.ru.nl/corpusngt).
Abstract:
The possibilities of digital research have altered the production, publication and use of research results. Academic research practice and culture are changing or have already been transformed, but to a large degree the system of academic recognition has not yet adapted to the practices and possibilities of digital research. This applies especially to research data, which are increasingly produced, managed, published and archived, but as yet play hardly any role in research assessment. The aim of the workshop was to bring together experts and stakeholders from research institutions, universities, scholarly societies and funding agencies in order to review, discuss and build on possibilities for implementing a culture of sharing and for integrating the publication of data into research assessment procedures. The report 'The Value of Research Data - Metrics for datasets from a cultural and technical point of view' was presented and discussed. Some of the key findings were that data sharing should be considered normal research practice, and that not sharing should in fact be considered malpractice. Research funders and universities should support and encourage data sharing. There are a number of important aspects to consider when making data count in research and evaluation procedures. Metrics are a necessary tool for monitoring the sharing of datasets. However, data metrics are at present not very well developed, and there is not yet enough experience of what these metrics actually mean. It is important to implement the culture of sharing through codes of conduct in the scientific communities. For further key findings, please read the report.