888 results for P2P and networked data management
Abstract:
Görzig, H., Engel, F., Brocks, H., Vogel, T. & Hemmje, M. (2015, August). Towards Data Management Planning Support for Research Data. Paper presented at the ASE International Conference on Data Science, Stanford, United States of America.
Abstract:
This document is the first of three iterations of the DMP that will be formally delivered during the project. Version 2 is due in month 24 and version 3 towards the end of the project. The DMP is thus not a fixed document; it evolves and gains more precision and substance during the lifespan of the project. In this first version we describe the planned research data sets related to the RAGE evaluation and validation activities, and the fifteen principles that will guide data management in RAGE. The former are described in the format of the EU data management template, and the latter in terms of their guiding principle, how we propose to implement them, and when they will be implemented. This document is therefore primarily relevant to WP5 and WP8 members.
Abstract:
As the UK's national marine data centre, a key responsibility of the British Oceanographic Data Centre (BODC) is to provide data management support for the scientific activities of complex multi-disciplinary long-term research programmes. Since the initial cruise in 1995, the NERC-funded Atlantic Meridional Transect (AMT) project has undertaken 18 north–south transects of the Atlantic Ocean. As the project has evolved there has been a steady growth in the number of participants, the volume of data, its complexity and the demand for data. BODC became involved in AMT in 2002, at the beginning of phase II of the programme, and since then has provided continuous support to the AMT and the wider scientific community through data rescue, quality control, processing and the provision of access to the data. The data management is carried out by a team of specialists using a sophisticated infrastructure and hardware to manage, integrate and serve physical, biological and chemical data. Here, we discuss the approach adopted, the techniques applied and some guiding principles for the management of large multi-disciplinary programmes.
Abstract:
To date, the processing of wildlife location data has relied on a diversity of software and file formats. Data management and the following spatial and statistical analyses were undertaken in multiple steps, involving many time-consuming importing/exporting phases. Recent technological advancements in tracking systems have made large, continuous, high-frequency datasets of wildlife behavioral data available, such as those derived from the global positioning system (GPS) and other animal-attached sensor devices. These data can be further complemented by a wide range of other information about the animals’ environment. Management of these large and diverse datasets for modelling animal behaviour and ecology can prove challenging, slowing down analysis and increasing the probability of mistakes in data handling. We address these issues by critically evaluating the requirements for good management of GPS data for wildlife biology. We highlight that dedicated data management tools and expertise are needed. We explore current research in wildlife data management. We suggest a general direction of development, based on a modular software architecture with a spatial database at its core, where interoperability, data model design and integration with remote-sensing data sources play an important role in successful GPS data handling.
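The "spatial database at its core" idea above can be made concrete with a minimal sketch. The schema, species names and coordinates below are invented for illustration (the paper does not prescribe a particular schema); an in-memory SQLite table stands in for a full spatial database such as PostGIS, with a plain bounding-box query in place of true spatial indexing.

```python
import sqlite3

# Hypothetical table of GPS fixes from animal-borne sensors; in a production
# system this would be a spatial database (e.g. PostGIS) with spatial indexes.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE gps_fix (
        animal_id    TEXT,
        acquired_utc TEXT,   -- ISO timestamp from the GPS collar
        lon          REAL,   -- WGS84 decimal degrees
        lat          REAL,
        temperature  REAL    -- example ancillary sensor reading
    )
""")
fixes = [
    ("roe_deer_01", "2024-05-01T06:00:00", 11.05, 46.02, 7.5),
    ("roe_deer_01", "2024-05-01T07:00:00", 11.07, 46.03, 8.1),
    ("roe_deer_02", "2024-05-01T06:30:00", 11.40, 46.20, 6.9),
]
conn.executemany("INSERT INTO gps_fix VALUES (?, ?, ?, ?, ?)", fixes)

# Bounding-box query: all fixes for one animal inside a study area.
rows = conn.execute("""
    SELECT acquired_utc, lon, lat FROM gps_fix
    WHERE animal_id = ? AND lon BETWEEN ? AND ? AND lat BETWEEN ? AND ?
    ORDER BY acquired_utc
""", ("roe_deer_01", 11.0, 11.1, 46.0, 46.1)).fetchall()
print(len(rows))  # 2 fixes fall inside the box
```

Centralising fixes in one queryable table is what removes the repeated import/export steps the abstract criticises: every analysis reads from the same store.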
Abstract:
Background: Long working hours might increase the risk of cardiovascular disease, but prospective evidence is scarce, imprecise, and mostly limited to coronary heart disease. We aimed to assess long working hours as a risk factor for incident coronary heart disease and stroke.
Methods: We identified published studies through a systematic review of PubMed and Embase from inception to Aug 20, 2014. We obtained unpublished data for 20 cohort studies from the Individual-Participant-Data Meta-analysis in Working Populations (IPD-Work) Consortium and open-access data archives. We used cumulative random-effects meta-analysis to combine effect estimates from published and unpublished data.
Findings: We included 25 studies from 24 cohorts in Europe, the USA, and Australia. The meta-analysis of coronary heart disease comprised data for 603 838 men and women who were free from coronary heart disease at baseline; the meta-analysis of stroke comprised data for 528 908 men and women who were free from stroke at baseline. Follow-up for coronary heart disease was 5·1 million person-years (mean 8·5 years), in which 4768 events were recorded, and for stroke was 3·8 million person-years (mean 7·2 years), in which 1722 events were recorded. In cumulative meta-analysis adjusted for age, sex, and socioeconomic status, compared with standard hours (35-40 h per week), working long hours (≥55 h per week) was associated with an increase in risk of incident coronary heart disease (relative risk [RR] 1·13, 95% CI 1·02-1·26; p=0·02) and incident stroke (1·33, 1·11-1·61; p=0·002). The excess risk of stroke remained unchanged in analyses that addressed reverse causation, multivariable adjustments for other risk factors, and different methods of stroke ascertainment (range of RR estimates 1·30-1·42). We recorded a dose-response association for stroke, with RR estimates of 1·10 (95% CI 0·94-1·28; p=0·24) for 41-48 working hours, 1·27 (1·03-1·56; p=0·03) for 49-54 working hours, and 1·33 (1·11-1·61; p=0·002) for 55 working hours or more per week compared with standard working hours (p_trend<0·0001).
Interpretation: Employees who work long hours have a higher risk of stroke than those working standard hours; the association with coronary heart disease is weaker. These findings suggest that more attention should be paid to the management of vascular risk factors in individuals who work long hours.
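The random-effects pooling behind results like these can be sketched briefly. This is a plain DerSimonian–Laird estimator on invented log relative risks and standard errors, not the IPD-Work Consortium's actual (cumulative, individual-participant) computation; it only illustrates how between-study variance widens the pooled estimate's uncertainty.

```python
import math

# Illustrative study-level inputs (NOT the study's data): log RRs and SEs.
log_rr = [math.log(1.10), math.log(1.27), math.log(1.33)]
se = [0.08, 0.10, 0.09]

w = [1 / s**2 for s in se]                        # fixed-effect weights
ybar = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum(w)
q = sum(wi * (yi - ybar)**2 for wi, yi in zip(w, log_rr))  # Cochran's Q
df = len(log_rr) - 1
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)                     # between-study variance

w_re = [1 / (s**2 + tau2) for s in se]            # random-effects weights
pooled = sum(wi * yi for wi, yi in zip(w_re, log_rr)) / sum(w_re)
se_pooled = math.sqrt(1 / sum(w_re))
lo, hi = pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled
print(round(math.exp(pooled), 2),
      round(math.exp(lo), 2), round(math.exp(hi), 2))
```

A cumulative meta-analysis, as used in the paper, would repeat this pooling after adding each study in turn, showing how the estimate stabilises.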
Abstract:
Master data management (MDM) integrates data from multiple structured data sources and builds a consolidated 360-degree view of business entities such as customers and products. Today's MDM systems are not prepared to integrate information from unstructured data sources, such as news reports, emails, call-center transcripts, and chat logs. However, those unstructured data sources may contain valuable information about the same entities known to MDM from the structured data sources. Integrating information from unstructured data into MDM is challenging, as textual references to existing MDM entities are often incomplete and imprecise, and the additional entity information extracted from text should not impact the trustworthiness of MDM data.

In this paper, we present an architecture for making MDM text-aware and showcase its implementation as IBM InfoSphere MDM Extension for Unstructured Text Correlation, an add-on to IBM InfoSphere Master Data Management Standard Edition. We highlight how MDM benefits from additional evidence found in documents when doing entity resolution and relationship discovery. We experimentally demonstrate the feasibility of integrating information from unstructured data sources into MDM.
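The core difficulty the abstract names, matching an incomplete, imprecise textual mention against trusted structured records, can be sketched as a fuzzy-scoring step. The records, mention, and scoring function below are invented for illustration; IBM's actual correlation logic is not described here. Note how the text-derived link is kept separate, with a confidence score, rather than written into the master record.

```python
import difflib

# Hypothetical structured MDM records (illustrative, not IBM's data model).
mdm_records = [
    {"id": 1, "name": "Johnathan Smith", "city": "Boston"},
    {"id": 2, "name": "Jane Smithers",   "city": "Chicago"},
]

def score(mention: str, record: dict) -> float:
    """Fuzzy similarity between a free-text mention and a record's name."""
    return difflib.SequenceMatcher(None, mention.lower(),
                                   record["name"].lower()).ratio()

mention = "John Smith"  # incomplete reference found in, say, an email
best = max(mdm_records, key=lambda r: score(mention, r))

# Keep text-derived evidence as a scored candidate link, so uncertain
# extractions cannot degrade the trusted MDM data themselves.
candidate_link = {"mention": mention, "mdm_id": best["id"],
                  "confidence": round(score(mention, best), 2)}
print(candidate_link["mdm_id"])
```

A real system would combine many such signals (addresses, co-occurring entities, dates) before proposing a correlation for steward review.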
Abstract:
Although overfishing is a concern for many fish stocks, it was for a long time only associated with commercial fishing exploitation, with little or no attention given to recreational fisheries. Recent research has shown, however, that the impact of recreational fishing on particular species can be considerable, and that the recreational harvest needs to be taken into account if fisheries are to be accurately assessed and effectively managed. In Portugal, the first recreational fishing regulations were only recently implemented. However, mirroring other European countries, the regulations lacked scientific support, and specific knowledge of the activity was limited to a few studies with limited coverage. This thesis aimed to characterize the biological and socioeconomic aspects of the recreational shore angling activity in southern Portugal, to investigate whether the regulations in place were adequate and effective, and to provide recommendations for improved management and conservation of the inshore fisheries resources. A combined aerial-roving survey was conducted to gather data on fishing effort, catch, fishing trips and socioeconomic aspects (including anglers' perceptions of regulations) of the recreational angling activity. The analysis of anglers' catches suggested that compliance with daily bag limits was high, with less than 0.5% of creels exceeding the 10 kg angler⁻¹ day⁻¹ bag limit. Overall, 11.5% of the retained fishes were undersized, but non-compliance with minimum size limits was found to be high for some species (e.g. seabass, 73% undersized). In terms of the impact of recreational shore angling, the total estimated catches corresponded to less than 1% of the commercial landings for the same period (shared species). However, shore angling catches of white sea bream (Diplodus sargus) were found to be considerable, corresponding to 65% of the commercial landings (39.4% of total catch).
In terms of anglers' perceptions of the recreational fishing regulations in Portugal, the present study has shown that the majority of anglers accepted the existence of some kind of SRF regulations, but in general there was partial or total disagreement with the recreational fishing restrictions recently put in place. Most anglers perceived themselves as not being involved in the decision-making process and claimed that some restrictions lacked a meaningful rationale (e.g. the prohibition of fishing from piers/jetties). Fishers' awareness of specific aspects of the restrictions (such as the rationale for minimum size limits) was found to be very limited. During the same period, catches from sport fishing competitions were examined to test for differences from the recreational activity in terms of catches, and to evaluate long-term trends in catch and mean size of fish. Catches of the sport fishing competitions were found to differ from those observed for recreational fishing, being dominated by different species (e.g. garfish, mullets), suggesting different fishing strategies of the two types of anglers. High percentages of undersized fish were observed to be captured (and retained) during the competitions (in particular seabass, with 100% undersized), probably as a result of a single allowable minimum size (AMS) of 15 cm applied to all species in competitions. Lastly, catch-and-release fishing experiments were carried out to assess the post-release mortality of three recreationally important species: two-banded sea bream Diplodus vulgaris; black sea bream Spondyliosoma cantharus; and gilthead sea bream Sparus aurata. Post-release mortalities were found to be low (0-12%). The main predictor of mortality for Sparus aurata was anatomical hooking location, with 63% of the fishes that died being deeply hooked. The results support the release of fish, whether from mandatory (e.g. minimum landing sizes) or voluntary practices.
In summary, this thesis has demonstrated that the impact of recreational fishing on particular species is significant and needs to be taken into account for more effective management and stock assessment purposes. It has also highlighted several management issues that should be addressed in order to promote more adequate regulations in the future and prevent non-compliance issues. Periodic monitoring of the recreational fishing activity, including all fishing modes (i.e. spear fishing, boat, and shore angling), would also be beneficial to ensure timely knowledge of the overall recreational fishing activity and support future management actions.
Abstract:
This study aims to optimize the water quality monitoring of a polluted watercourse (Leça River, Portugal) through principal component analysis (PCA) and cluster analysis (CA). These statistical methodologies were applied to physicochemical, bacteriological and ecotoxicological data (obtained with the marine bacterium Vibrio fischeri and the green alga Chlorella vulgaris) from water samples collected monthly at seven monitoring sites during five campaigns (February, May, June, August, and September 2006). The results of some variables were assigned to water quality classes according to national guidelines. Chemical and bacteriological quality data led to classifying the Leça River water quality as "bad" or "very bad". PCA and CA identified monitoring sites with similar pollution patterns, distinguishing site 1 (located in the upstream stretch of the river) from all the sampling sites downstream. Ecotoxicity results corroborated this classification, revealing differences in space and time. The present study includes not only physical, chemical and bacteriological but also ecotoxicological parameters, which opens new perspectives in river water characterization. Moreover, the application of PCA and CA is very useful for optimizing water quality monitoring networks, defining the minimum number of sites and their location. These tools can thus support appropriate management decisions.
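The PCA-then-group workflow described above can be sketched on synthetic data. The matrix below is invented (rows stand for 7 sites, columns for 5 water-quality variables; none of the study's measurements are reproduced): one "upstream" site is given a cleaner profile, and the grouping step is a crude largest-gap split on the first principal component rather than a full cluster analysis.

```python
import numpy as np

# Synthetic monitoring matrix: site 1 (row 0) cleaner, sites 2-7 polluted.
rng = np.random.default_rng(0)
upstream = rng.normal(0.0, 0.3, size=(1, 5))
downstream = rng.normal(3.0, 0.3, size=(6, 5))
x = np.vstack([upstream, downstream])

# Standardize, then project onto the leading principal component.
z = (x - x.mean(axis=0)) / x.std(axis=0)
cov = np.cov(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
pc1 = z @ eigvecs[:, -1]                 # scores on PC1

# Crude grouping: split the sites at the largest gap in sorted PC1 scores.
order = np.argsort(pc1)
gaps = np.diff(pc1[order])
cut = int(np.argmax(gaps))
left = set(order[: cut + 1].tolist())
right = set(order[cut + 1:].tolist())
singleton = left if len(left) < len(right) else right
print(sorted(singleton))  # the upstream site separates from the rest
```

A real analysis would retain several components, inspect loadings to see which variables drive the separation, and use hierarchical clustering in place of the gap split.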
Abstract:
Grasslands in semi-arid regions, like the Mongolian steppes, are facing desertification and degradation due to climate change. Mongolia's main economic activity is extensive livestock production, so this is a pressing matter for decision makers. Remote sensing and Geographic Information Systems provide the tools for advanced ecosystem management and have been widely used for the monitoring and management of pasture resources. This study investigates the highest thematic detail achievable through remote sensing for mapping the steppe vegetation, using medium-resolution earth observation imagery in three districts (soums) of Mongolia: Dzag, Buutsagaan and Khureemaral. After considering different thematic levels of detail for classifying the steppe vegetation, the existing pasture types within the steppe were chosen for mapping. To investigate which combination of data sets yields the best results and which classification algorithm is more suitable for incorporating them, different classification methods were compared for the study area. Sixteen classifications were performed using different combinations of predictors, Landsat-8 data (spectral bands and Landsat-8-derived NDVI) and geophysical data (elevation, mean annual precipitation and mean annual temperature), with two classification algorithms, maximum likelihood and decision tree. Results showed that the best-performing model was the one that incorporated the Landsat-8 bands with mean annual precipitation and mean annual temperature (Model 13), using the decision tree. For maximum likelihood, the model that incorporated the Landsat-8 bands with mean annual precipitation (Model 5) and the one that also added mean annual temperature (Model 13) achieved the highest accuracies. The decision tree models consistently outperformed the maximum likelihood ones.
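A decision tree over spectral and climate predictors, the approach the study found best, amounts to threshold rules per pixel. The rules, thresholds and class names below are invented for illustration (the study's fitted trees are not reproduced); the point is only the shape of a tree combining bands-derived NDVI with precipitation and temperature, as in Model 13.

```python
# Toy rule set in the spirit of Model 13 (spectral + precipitation +
# temperature). All thresholds and class labels are hypothetical.
def classify_pixel(ndvi: float, precip_mm: float, temp_c: float) -> str:
    if ndvi < 0.15:          # sparse vegetation signal
        return "bare/degraded"
    if precip_mm < 150:      # dry climate envelope
        return "desert steppe"
    if temp_c < 1.0:         # cold, higher-elevation envelope
        return "mountain steppe"
    return "typical steppe"

print(classify_pixel(ndvi=0.35, precip_mm=220, temp_c=3.2))  # typical steppe
```

In practice such thresholds are learned from training pixels (e.g. with CART), and the tree is applied band-wise across the whole image stack.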
Abstract:
The telemetry data processing operations intended for a given mission are pre-defined by the onboard telemetry configuration; the mission trajectory and overall telemetry methodology have stabilized in recent years for ISRO vehicles. The telemetry data processing problem is reduced through hierarchical problem reduction, whereby the sequencing of operations evolves as the control task and the operations on data as the function tasks. The inputs, outputs and execution criteria of each function task are captured in tables, which the control task examines in order to schedule a function task once its criteria are met.
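The control-task/function-task split described above can be sketched as a table the control loop scans. The task names, criteria and state fields below are invented for illustration and are not taken from the ISRO design; each entry records when a function task may run and what it produces.

```python
# Hypothetical processing state and function-task table (illustrative only).
state = {"frame_synced": True, "decommutated": False, "calibrated": False}

function_table = [
    {"name": "decommutate",
     "criterion": lambda s: s["frame_synced"],        # needs synced frames
     "effect":    lambda s: s.update(decommutated=True)},
    {"name": "calibrate",
     "criterion": lambda s: s["decommutated"],        # needs raw parameters
     "effect":    lambda s: s.update(calibrated=True)},
]

# Control task: examine the table and schedule each function task
# whose execution criterion is met.
executed = []
for task in function_table:
    if task["criterion"](state):
        task["effect"](state)
        executed.append(task["name"])

print(executed)  # ['decommutate', 'calibrate']
```

Because sequencing lives in the table rather than in code, a new mission configuration changes the table, not the control loop.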
Big Decisions and Sparse Data: Adapting Scientific Publishing to the Needs of Practical Conservation
Abstract:
The biggest challenge in conservation biology is closing the gap between research and practical management. A major obstacle is the fact that many researchers are unwilling to tackle projects likely to produce sparse or messy data because the results would be difficult to publish in refereed journals. The obvious solution to sparse data is to build up results from multiple studies. Consequently, we suggest that there needs to be greater emphasis in conservation biology on publishing papers that can be built on by subsequent research rather than on papers that produce clear results individually. This building approach requires: (1) a stronger theoretical framework, in which researchers attempt to anticipate models that will be relevant in future studies and incorporate expected differences among studies into those models; (2) use of modern methods for model selection and multi-model inference, and publication of parameter estimates under a range of plausible models; (3) explicit incorporation of prior information into each case study; and (4) planning management treatments in an adaptive framework that considers treatments applied in other studies. We encourage journals to publish papers that promote this building approach rather than expecting papers to conform to traditional standards of rigor as stand-alone papers, and believe that this shift in publishing philosophy would better encourage researchers to tackle the most urgent conservation problems.
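Point (2) above, multi-model inference, is commonly operationalised with Akaike weights, which express each candidate model's relative support. The AIC values below are invented for illustration; the computation itself (ΔAIC, relative likelihoods, normalised weights) is the standard one.

```python
import math

# Hypothetical AIC scores for three candidate models (illustrative values).
aic = {"model_A": 210.3, "model_B": 212.1, "model_C": 218.9}

best = min(aic.values())
delta = {m: a - best for m, a in aic.items()}          # ΔAIC per model
rel = {m: math.exp(-d / 2) for m, d in delta.items()}  # relative likelihoods
total = sum(rel.values())
weights = {m: r / total for m, r in rel.items()}       # Akaike weights

print(max(weights, key=weights.get))  # model_A carries the most weight
```

Reporting parameter estimates alongside such weights, rather than only the single best model, is what lets later sparse-data studies be combined with earlier ones.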
Abstract:
The iRODS system, created by the San Diego Supercomputer Center, is a rule-oriented data management system that allows the user to create sets of rules defining how data is to be managed. Each rule corresponds to a particular action or operation (such as checksumming a file), and the system is flexible enough to allow the user to create new rules for new types of operations. The iRODS system can interface to any storage system (provided an iRODS driver is built for that system) and relies on its metadata catalogue to provide a virtual file-system that can handle files of any size and type. However, some storage systems (such as tape systems) do not handle small files efficiently and prefer small files to be packaged up (or "bundled") into larger units. We have developed a system that can bundle small data files of any type into larger units - mounted collections. The system can create collection families and contains its own extensible metadata, including metadata on which family a collection belongs to. The mounted collection system can work standalone and is being incorporated into the iRODS system to enhance the system's flexibility in handling small files. In this paper we describe the motivation for creating a mounted collection system, its architecture, and how it has been incorporated into the iRODS system. We describe the different technologies used to create the mounted collection system and provide some performance numbers.
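The bundling idea can be sketched with a plain tar archive standing in for a mounted collection: many small files become one larger unit for the backing store, while a side catalogue keeps per-file metadata, including the collection family. The file names, family label and catalogue fields are invented for illustration; the actual mounted-collection format and metadata schema are those of the paper, not this sketch.

```python
import io
import tarfile

buf = io.BytesIO()   # stands in for the large unit written to tape/disk
catalog = []         # stands in for the collection's extensible metadata

# Bundle three small files into one archive, recording metadata per file.
with tarfile.open(fileobj=buf, mode="w") as bundle:
    for i in range(3):
        payload = f"small data file {i}".encode()
        info = tarfile.TarInfo(name=f"collection/file_{i}.dat")
        info.size = len(payload)
        bundle.addfile(info, io.BytesIO(payload))
        catalog.append({"name": info.name,
                        "family": "demo_family",   # hypothetical family tag
                        "size": info.size})

# The backing store only ever sees 'buf'; members are listed via the bundle.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as bundle:
    names = bundle.getnames()
print(names)
```

The catalogue is what lets a virtual file-system resolve an individual small file to (collection, offset) without the storage system ever handling it directly.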
Abstract:
Climate-G is a large-scale distributed testbed devoted to climate change research. It is an unfunded effort, started in 2008, involving a wide community in both Europe and the US. The testbed is an interdisciplinary effort involving partners from several institutions and joining expertise in the fields of climate change and computational science. Its main goal is to allow scientists to carry out geographical and cross-institutional data discovery, access, analysis, visualization and sharing of climate data. It represents an attempt to address challenging data and metadata management issues in a real environment. This paper presents a complete overview of the Climate-G testbed, highlighting the most important results achieved since the beginning of the project.