939 resultados para Scientific data
Resumo:
This thesis provides a query model suitable for context sensitive access to a wide range of distributed linked datasets which are available to scientists using the Internet. The model is designed based on scientific research standards which require scientists to provide replicable methods in their publications. Although there are query models available that provide limited replicability, they do not contextualise the process whereby different scientists select dataset locations based on their trust and physical location. In different contexts, scientists need to perform different data cleaning actions, independent of the overall query, and the model was designed to accommodate this function. The query model was implemented as a prototype web application and its features were verified through its use as the engine behind a major scientific data access site, Bio2RDF.org. The prototype showed that it was possible to have context sensitive behaviour for each of the three mirrors of Bio2RDF.org using a single set of configuration settings. The prototype provided executable query provenance that could be attached to scientific publications to fulfil replicability requirements. The model was designed to make it simple to independently interpret and execute the query provenance documents using context specific profiles, without modifying the original provenance documents. Experiments using the prototype as the data access tool in workflow management systems confirmed that the design of the model made it possible to replicate results in different contexts with minimal additions, and no deletions, to query provenance documents.
Resumo:
This study was undertaken by UKOLN on behalf of the Joint Information Systems Committee (JISC) in the period April to September 2008. Application profiles are metadata schemata which consist of data elements drawn from one or more namespaces, optimized for a particular local application. They offer a way for particular communities to base the interoperability specifications they create and use for their digital material on established open standards. This offers the potential for digital materials to be accessed, used and curated effectively both within and beyond the communities in which they were created. The JISC recognized the need to undertake a scoping study to investigate metadata application profile requirements for scientific data in relation to digital repositories, and specifically concerning descriptive metadata to support resource discovery and other functions such as preservation. This followed on from the development of the Scholarly Works Application Profile (SWAP) undertaken within the JISC Digital Repositories Programme and led by Andy Powell (Eduserv Foundation) and Julie Allinson (RRT UKOLN) on behalf of the JISC. Aims and Objectives 1.To assess whether a single metadata AP for research data, or a small number thereof, would improve resource discovery or discovery-to-delivery in any useful or significant way. 2.If so, then to:a.assess whether the development of such AP(s) is practical and if so, how much effort it would take; b.scope a community uptake strategy that is likely to be successful, identifying the main barriers and key stakeholders. 3.Otherwise, to investigate how best to improve cross-discipline, cross-community discovery-to-delivery for research data, and make recommendations to the JISC and others as appropriate. Approach The Study used a broad conception of what constitutes scientific data, namely data gathered, collated, structured and analysed using a recognizably scientific method, with a bias towards quantitative methods. The approach taken was to map out the landscape of existing data centres, repositories and associated projects, and conduct a survey of the discovery-to-delivery metadata they use or have defined, alongside any insights they have gained from working with this metadata. This was followed up by a series of unstructured interviews, discussing use cases for a Scientific Data Application Profile, and how widely a single profile might be applied. On the latter point, matters of granularity, the experimental/measurement contrast, the quantitative/qualitative contrast, the raw/derived data contrast, and the homogeneous/heterogeneous data collection contrast were discussed. The Study report was loosely structured according to the Singapore Framework for Dublin Core Application Profiles, and in turn considered: the possible use cases for a Scientific Data Application Profile; existing domain models that could either be used or adapted for use within such a profile; and a comparison existing metadata profiles and standards to identify candidate elements for inclusion in the description set profile for scientific data. The report also considered how the application profile might be implemented, its relationship to other application profiles, the alternatives to constructing a Scientific Data Application Profile, the development effort required, and what could be done to encourage uptake in the community. The conclusions of the Study were validated through a reference group of stakeholders.
Resumo:
Wednesday 23rd April 2014 Speaker(s): Willi Hasselbring Organiser: Leslie Carr Time: 23/04/2014 11:00-11:50 Location: B32/3077 File size: 669 Mb Abstract For good scientific practice, it is important that research results may be properly checked by reviewers and possibly repeated and extended by other researchers. This is of particular interest for "digital science" i.e. for in-silico experiments. In this talk, I'll discuss some issues of how software systems and services may contribute to good scientific practice. Particularly, I'll present our PubFlow approach to automate publication workflows for scientific data. The PubFlow workflow management system is based on established technology. We integrate institutional repository systems (based on EPrints) and world data centers (in marine science). PubFlow collects provenance data automatically via our monitoring framework Kieker. Provenance information describes the origins and the history of scientific data in its life cycle, and the process by which it arrived. Thus, provenance information is highly relevant to repeatability and trustworthiness of scientific results. In our evaluation in marine science, we collaborate with the GEOMAR Helmholtz Centre for Ocean Research Kiel.
Resumo:
The purpose of this study was to develop an understanding of the current state of scientific data sharing that stakeholders could use to develop and implement effective data sharing strategies and policies. The study developed a conceptual model to describe the process of data sharing, and the drivers, barriers, and enablers that determine stakeholder engagement. The conceptual model was used as a framework to structure discussions and interviews with key members of all stakeholder groups. Analysis of data obtained from interviewees identified a number of themes that highlight key requirements for the development of a mature data sharing culture.
Resumo:
As scientific workflows and the data they operate on, grow in size and complexity, the task of defining how those workflows should execute (which resources to use, where the resources must be in readiness for processing etc.) becomes proportionally more difficult. While "workflow compilers", such as Pegasus, reduce this burden, a further problem arises: since specifying details of execution is now automatic, a workflow's results are harder to interpret, as they are partly due to specifics of execution. By automating steps between the experiment design and its results, we lose the connection between them, hindering interpretation of results. To reconnect the scientific data with the original experiment, we argue that scientists should have access to the full provenance of their data, including not only parameters, inputs and intermediary data, but also the abstract experiment, refined into a concrete execution by the "workflow compiler". In this paper, we describe preliminary work on adapting Pegasus to capture the process of workflow refinement in the PASOA provenance system.
Resumo:
This work was supported in part by the EU „2nd Generation Open Access Infrastructure for Research in Europe" (OpenAIRE+). The autumn training school Development and Promotion of Open Access to Scientific Information and Research is organized in the frame of the Fourth International Conference on Digital Presentation and Preservation of Cultural and Scientific Heritage—DiPP2014 (September 18–21, 2014, Veliko Tarnovo, Bulgaria, http://dipp2014.math.bas.bg/), organized under the UNESCO patronage. The main organiser is the Institute of Mathematics and Informatics, Bulgarian Academy of Sciences with the support of EU project FOSTER (http://www.fosteropenscience.eu/) and the P. R. Slaveykov Regional Public Library in Veliko Tarnovo, Bulgaria.
Resumo:
Increasingly larger scale applications are generating an unprecedented amount of data. However, the increasing gap between computation and I/O capacity on High End Computing machines makes a severe bottleneck for data analysis. Instead of moving data from its source to the output storage, in-situ analytics processes output data while simulations are running. However, in-situ data analysis incurs much more computing resource contentions with simulations. Such contentions severely damage the performance of simulation on HPE. Since different data processing strategies have different impact on performance and cost, there is a consequent need for flexibility in the location of data analytics. In this paper, we explore and analyze several potential data-analytics placement strategies along the I/O path. To find out the best strategy to reduce data movement in given situation, we propose a flexible data analytics (FlexAnalytics) framework in this paper. Based on this framework, a FlexAnalytics prototype system is developed for analytics placement. FlexAnalytics system enhances the scalability and flexibility of current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and visualization, as well as for large-scale data transfer. Two use cases – scientific data compression and remote visualization – have been applied in the study to verify the performance of FlexAnalytics. Experimental results demonstrate that FlexAnalytics framework increases data transition bandwidth and improves the application end-to-end transfer performance.
Resumo:
This report compares the legal status of research data in the four KE partner countries. The report also addresses where European copyright and database law poses flaws and obstacles to the access to research data and singles out pre-conditions for openly available data. Background of the study Intellectual property right regulations regarding primary research data are a recurrent topic in the discussion on the improvement of access to research data. In fact in the final report of the High Level Expert Group on Scientific Data ‘Riding the Wave’ creating clarity on this was considered very important in improving awareness for all parties involved. According to the recommendations of the report legal issues should be “worked out so that they encourage, and not impede, global data sharing” http://cordis.europa.eu/fp7/ict/e-infrastructure/docs/hlg-sdi-report.pdf. While open access to research data is a widely recognised goal, achieving it remains a challenge. As European national laws still diverge and sometimes remain unclear it can be difficult for interested parties to fully comprehend in which ways open access to research data can be legally obtained. Based on these discussions the Knowledge Exchange working group on primary research data has commissioned a comparative report on the legal status of research data in the four KE partner countries. The study has been conducted by the Centre for Intellectual Property Law (CIER) at Utrecht University. The report aims at informing Knowledge Exchange and associated stakeholders on the state of the law concerning access to research data in the KE partner countries (Germany, Denmark, the Netherlands, and the United Kingdom) and to give an insight in how these laws work in practice. This is explained in several characteristic situations pertaining to open access to research data. The purpose of the report is to identify flaws and obstacles to the access to research data and to single out pre-conditions for openly available data. This is in view of the current discussions concerning open access to research data, especially those originating from publicly funded research. The report intends to be both a description of the status quo of the legislation and a practical instrument to prepare further activities in raising awareness on the potential benefit of improved access to research data, and developing means to support the improved access for research purposes
Resumo:
The personal computer has become commonplace on the desk of most scientists. As hardware costs have plummeted, software capabilities have expanded enormously, permitting the scientist to examine extremely large datasets in novel ways. Advances in networking now permit rapid transfer of large datasets, which can often be used unchanged from one machine to the next. In spite of these significant advances, many scientists still use their personal computers only for word processing or e-mail, or as "dumb terminals". Many are simply unaware of the richness of software now available to statistically analyze and display scientific data in highly innovative ways. This paper presents several examples drawn from actual climate data analysis that illustrate some novel and practical features of several widely-used software packages for Macintosh computers.
Resumo:
For years, choosing the right career by monitoring the trends and scope for different career paths have been a requirement for all youngsters all over the world. In this paper we provide a scientific, data mining based method for job absorption rate prediction and predicting the waiting time needed for 100% placement, for different engineering courses in India. This will help the students in India in a great deal in deciding the right discipline for them for a bright future. Information about passed out students are obtained from the NTMIS ( National technical manpower information system ) NODAL center in Kochi, India residing in Cochin University of science and technology
Resumo:
We describe ncWMS, an implementation of the Open Geospatial Consortium’s Web Map Service (WMS) specification for multidimensional gridded environmental data. ncWMS can read data in a large number of common scientific data formats – notably the NetCDF format with the Climate and Forecast conventions – then efficiently generate map imagery in thousands of different coordinate reference systems. It is designed to require minimal configuration from the system administrator and, when used in conjunction with a suitable client tool, provides end users with an interactive means for visualizing data without the need to download large files or interpret complex metadata. It is also used as a “bridging” tool providing interoperability between the environmental science community and users of geographic information systems. ncWMS implements a number of extensions to the WMS standard in order to fulfil some common scientific requirements, including the ability to generate plots representing timeseries and vertical sections. We discuss these extensions and their impact upon present and future interoperability. We discuss the conceptual mapping between the WMS data model and the data models used by gridded data formats, highlighting areas in which the mapping is incomplete or ambiguous. We discuss the architecture of the system and particular technical innovations of note, including the algorithms used for fast data reading and image generation. ncWMS has been widely adopted within the environmental data community and we discuss some of the ways in which the software is integrated within data infrastructures and portals.