985 results for Open Science Data Cloud
Abstract:
Clinical Research Data Quality Literature Review and Pooled Analysis
We present a literature review and secondary analysis of data accuracy in clinical research and related secondary data uses. A total of 93 papers meeting our inclusion criteria were categorized according to the data processing methods. Quantitative data accuracy information was abstracted from the articles and pooled. Our analysis demonstrates that the accuracy associated with data processing methods varies widely, with error rates ranging from 2 errors per 10,000 fields to 5019 errors per 10,000 fields. Medical record abstraction was associated with the highest error rates (70–5019 errors per 10,000 fields). Data entered and processed at healthcare facilities had error rates comparable to those of data processed at central data processing centers. Error rates for data processed with single entry in the presence of on-screen checks were comparable to those for double-entered data. While data processing and cleaning methods may explain a significant amount of the variability in data accuracy, additional factors not resolvable here likely exist.

Defining Data Quality for Clinical Research: A Concept Analysis
Despite notable previous attempts by experts to define data quality, the concept remains ambiguous and subject to the vagaries of natural language. This lack of clarity continues to hamper research related to data quality issues. We present a formal concept analysis of data quality, which builds on and synthesizes previously published work. We further posit that discipline-level specificity may be required to achieve the desired definitional clarity. To this end, we combine work from the clinical research domain with findings from the general data quality literature to produce a discipline-specific definition and operationalization for data quality in clinical research. While the results are helpful to clinical research, the methodology of concept analysis may be useful in other fields to clarify data quality attributes and to achieve operational definitions.

Medical Record Abstractor’s Perceptions of Factors Impacting the Accuracy of Abstracted Data
Medical record abstraction (MRA) is known to be a significant source of data errors in secondary data uses. Factors impacting the accuracy of abstracted data are not reported consistently in the literature. Two Delphi processes were conducted with experienced medical record abstractors to assess abstractors’ perceptions of these factors. The Delphi process identified 9 factors that were not found in the literature, and differed from the literature by 5 factors in the top 25%. The Delphi results refuted seven factors reported in the literature as impacting the quality of abstracted data. The results provide insight into and indicate content validity of a significant number of the factors reported in the literature. Further, the results indicate general consistency between the perceptions of clinical research medical record abstractors and registry and quality improvement abstractors.

Distributed Cognition Artifacts on Clinical Research Data Collection Forms
Medical record abstraction, a primary mode of data collection in secondary data use, is associated with high error rates. Distributed cognition in medical record abstraction has not been studied as a possible explanation for abstraction errors. We employed the theory of distributed representation and representational analysis to systematically evaluate cognitive demands in medical record abstraction and the extent of external cognitive support employed in a sample of clinical research data collection forms. We show that the cognitive load required for abstraction in 61% of the sampled data elements was high, exceedingly so in 9%. Further, the data collection forms did not support external cognition for the most complex data elements. High working memory demands are a possible explanation for the association of data errors with data elements requiring abstractor interpretation, comparison, mapping or calculation. The representational analysis used here can be used to identify data elements with high cognitive demands.
Abstract:
Considerable advances have been made in recent years in field data assimilation, remote sensing and ecosystem modeling, yet our global view of phytoplankton biogeography beyond chlorophyll biomass is still a cursory taxonomic picture with vast areas of the open ocean requiring field validations. High performance liquid chromatography (HPLC) pigment data combined with inverse methods offer an advantage over many other phytoplankton quantification measures by providing an immediate perspective of the whole phytoplankton community in a sample as a function of chlorophyll biomass. Historically, such chemotaxonomic analysis has been conducted mainly at local spatial and temporal scales in the ocean. Here, we apply a widely tested inverse approach, CHEMTAX, to a global climatology of pigment observations from HPLC. This study marks the first systematic and objective global application of CHEMTAX, yielding a seasonal climatology comprising ~1500 1° x 1° global grid points of the major phytoplankton pigment types in the ocean characterizing cyanobacteria, haptophytes, chlorophytes, cryptophytes, dinoflagellates, and diatoms, with results validated against prior regional studies where possible. Key findings from this new global view of specific phytoplankton abundances from pigments are a) the large global proportion of marine haptophytes (comprising 32 ± 5% of total chlorophyll), whose biogeochemical functional roles are relatively unknown, and b) the contrasting spatial scales of complexity in global community structure that can be explained in part by regional oceanographic conditions. These publicly accessible results will guide future parameterizations of marine ecosystem models exploring the link between phytoplankton community structure and marine biogeochemical cycles.
Abstract:
Pteropods are a group of holoplanktonic gastropods for which global biomass distribution patterns remain poorly resolved. The aim of this study was to collect and synthesize existing pteropod (Gymnosomata, Thecosomata and Pseudothecosomata) abundance and biomass data, in order to evaluate the global distribution of pteropod carbon biomass, with a particular emphasis on its seasonal, temporal and vertical patterns. We collected 25 902 data points from several online databases and a number of scientific articles. The biomass data have been gridded onto a 360 x 180° grid, with a vertical resolution of 33 WOA depth levels, and converted to NetCDF format. Data were collected between 1951 and 2010, with sampling depths ranging from 0 to 1000 m. Pteropod biomass data were either extracted directly or derived by converting abundance to biomass with pteropod-specific length-to-weight conversions. In the Northern Hemisphere (NH) the data were distributed evenly throughout the year, whereas sampling in the Southern Hemisphere (SH) was biased towards the austral summer months. 86% of all biomass values were located in the NH, most (42%) within the latitudinal band of 30-50° N. The range of global biomass values spanned over three orders of magnitude, with a mean and median biomass concentration of 8.2 mg C l-1 (SD = 61.4) and 0.25 mg C l-1, respectively, for all data points, and with a mean of 9.1 mg C l-1 (SD = 64.8) and a median of 0.25 mg C l-1 for non-zero biomass values. The highest mean and median biomass concentrations in the NH were located between 40-50° N (mean biomass: 68.8 mg C l-1 (SD = 213.4), median biomass: 2.5 mg C l-1), while in the SH they were within the 70-80° S latitudinal band (mean: 10.5 mg C l-1 (SD = 38.8) and median: 0.2 mg C l-1). Biomass values were lowest in the equatorial regions. 
A broad range of biomass concentrations was observed at all depths, with the biomass peak located in the surface layer (0-25 m) and values generally decreasing with depth. However, biomass peaks were located at different depths in different ocean basins: 0-25 m depth in the N Atlantic, 50-100 m in the Pacific, 100-200 m in the Arctic, 200-500 m in the Brazilian region and >500 m in the Indo-Pacific region. Biomass in the NH was relatively invariant over the seasonal cycle, but more seasonally variable in the SH. The collected database provides a valuable tool for modellers for the study of ecosystem processes and global biogeochemical cycles.
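The horizontal gridding step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a 360 x 180 grid of 1° cells, longitudes given in the -180 to 180 convention, and simple averaging of all points falling in a cell; the function names `cell_index` and `grid_mean` are hypothetical, and the authors' actual procedure (including the assignment to WOA depth levels) is not specified here.

```python
import math
from collections import defaultdict


def cell_index(lon, lat):
    """Map a position to (column, row) on an assumed 360 x 180 grid of 1-degree cells.

    Columns run west to east from -180, rows run south to north from -90.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    # Wrap longitude into [0, 360) relative to -180, then take the cell column.
    col = min(int(math.floor((lon + 180.0) % 360.0)), 359)
    # A latitude of exactly 90 is placed in the northernmost row.
    row = min(int(math.floor(lat + 90.0)), 179)
    return col, row


def grid_mean(points):
    """Average biomass values per grid cell.

    points: iterable of (lon, lat, biomass) tuples.
    Returns a dict mapping (column, row) to the mean biomass in that cell.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for lon, lat, value in points:
        acc = sums[cell_index(lon, lat)]
        acc[0] += value
        acc[1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}
```

Cells with no observations are simply absent from the result, which mirrors the sparse coverage described in the abstract.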
Abstract:
CFC-11 (CCl3F), CFC-12 (CCl2F2), HF, and SF6 products from limb-viewing satellite instruments are provided in the form of monthly zonal mean time series obtained from HALOE, MIPAS, ACE-FTS, and HIRDLS within the time period 1991-2010. The data products are made available as part of the Stratosphere-troposphere Processes And their Role in Climate (SPARC) Data Initiative. The trace gas time series extend from the mid-troposphere to as high as the mesosphere. The zonal monthly mean time series are calculated on the SPARC Data Initiative climatology grid using 5° latitude bins and 28 pressure levels. The zonal monthly mean volume mixing ratio (VMR) and the standard deviation, along with the number of averaged data values, are given for each month, latitude bin, and pressure level. Furthermore, the mean, minimum, and maximum local solar time, the average latitude, and the average day of the month within each bin are provided for one selected pressure level. The time series of all variables are saved in a consistent NetCDF format.
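The per-bin statistics described in this abstract (mean VMR, standard deviation, and number of averaged values for each month, 5° latitude bin, and pressure level) can be sketched as follows. This is an illustration under stated assumptions, not the SPARC Data Initiative code: the names `latitude_bin` and `zonal_monthly_stats` are hypothetical, samples are assumed to arrive already assigned to a pressure level, and the population standard deviation is used because the abstract does not say which estimator was applied.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

BIN_WIDTH_DEG = 5.0  # 5-degree latitude bins, as stated in the abstract


def latitude_bin(lat):
    """Return the index (0..35, south to north) of the 5-degree bin containing lat."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    # A latitude of exactly 90 is placed in the northernmost bin.
    return min(int(math.floor((lat + 90.0) / BIN_WIDTH_DEG)), 35)


def zonal_monthly_stats(samples):
    """Group samples by (year, month, latitude bin, pressure level).

    samples: iterable of (year, month, lat, pressure_hpa, vmr) tuples.
    Returns a dict mapping each group key to its mean, standard deviation
    (population form, an assumption) and sample count.
    """
    groups = defaultdict(list)
    for year, month, lat, pressure_hpa, vmr in samples:
        groups[(year, month, latitude_bin(lat), pressure_hpa)].append(vmr)
    return {
        key: {"mean": mean(values), "std": pstdev(values), "n": len(values)}
        for key, values in groups.items()
    }
```

Keeping the count alongside the mean and standard deviation lets a user judge how well sampled each bin is.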
Abstract:
The smallest marine phytoplankton, collectively termed picophytoplankton, have been routinely enumerated by flow cytometry since the late 1980s, during cruises throughout most of the world ocean. We compiled a database of 40,946 data points, with separate abundance entries for Prochlorococcus, Synechococcus and picoeukaryotes. We use average conversion factors for each of the three groups to convert the abundance data to carbon biomass. After gridding with 1° spacing, the database covers 2.4% of the ocean surface area, with the best data coverage in the North Atlantic, the South Pacific and North Indian basins. The average picophytoplankton biomass is 12 ± 22 µg C L-1 or 1.9 g C m-2. We estimate a total global picophytoplankton biomass, excluding N2-fixers, of 0.53 - 0.74 Pg C (17 - 39 % Prochlorococcus, 12 - 15 % Synechococcus and 49 - 69 % picoeukaryotes). Future efforts in this area of research should focus on reporting calibrated cell size, and collecting data in undersampled regions.
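The abundance-to-carbon conversion described in this abstract amounts to multiplying each group's cell count by a per-cell carbon factor and summing. The sketch below assumes abundances in cells per mL and factors in fg C per cell; the factor values are placeholders chosen only to make the arithmetic visible and are not the averages used in the database, and the function name `carbon_biomass_ug_per_l` is hypothetical.

```python
FG_PER_UG = 1e9   # femtograms per microgram
ML_PER_L = 1e3    # millilitres per litre

# Placeholder conversion factors in fg C per cell. These are NOT the values
# used by the database authors; substitute the published factors before use.
PLACEHOLDER_FACTORS_FG_C_PER_CELL = {
    "Prochlorococcus": 50.0,
    "Synechococcus": 200.0,
    "picoeukaryotes": 1500.0,
}


def carbon_biomass_ug_per_l(abundance_cells_per_ml,
                            factors=PLACEHOLDER_FACTORS_FG_C_PER_CELL):
    """Convert group abundances (cells per mL) to total carbon biomass (ug C per L).

    abundance_cells_per_ml: dict mapping group name to cells per mL.
    factors: dict mapping group name to fg C per cell.
    """
    total = 0.0
    for group, cells_per_ml in abundance_cells_per_ml.items():
        # cells/mL * fg C/cell = fg C/mL; scale to per litre, then fg to ug.
        total += cells_per_ml * factors[group] * ML_PER_L / FG_PER_UG
    return total
```

With the placeholder factors, a sample containing 1e5 Prochlorococcus, 1e4 Synechococcus and 1e3 picoeukaryote cells per mL works out to 5 + 2 + 1.5 = 8.5 µg C per L.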
Abstract:
A comprehensive hydroclimatic dataset is presented for the 2011 water year to improve understanding of hydrologic processes in the rain-snow transition zone. A dataset with this quality and quantity of soil depth, soil texture, soil moisture, and soil temperature data is extremely rare in the scientific literature. Standard meteorological and snow cover data are included for the entire 2011 water year, which contained several rain-on-snow events. Surface soil textures and soil depths from 57 points are presented as well as soil texture profiles from 14 points. Meteorological data include continuous hourly shielded, unshielded, and wind-corrected precipitation, wind speed, air temperature, relative humidity, dew point temperature, and incoming solar and thermal radiation data. Sub-surface data include hourly soil moisture data from multiple depths from 7 soil profiles within the catchment, and soil temperatures from multiple depths from 2 soil profiles. Hydrologic response data include hourly stream discharge from the catchment outlet weir, continuous snow depths from one location, intermittent snow depths from 5 locations, and snow depth and density data from ten weekly snow surveys. Though it represents only a single water year, the presentation of both above- and below-ground hydrologic conditions makes it one of the most detailed and complete hydroclimatic datasets from the climatically sensitive rain-snow transition zone for a wide range of modeling and descriptive studies.