798 results for Data-Intensive Science
Abstract:
An overview of the growth of open access, open data and open science policies, with a critical appraisal of the issues affecting them. Example policies and a roadmap for open access, open research data and open science are included.
Abstract:
This study deals with the problem of how to collect genuine and useful data about science classroom practices while preserving the complex and holistic nature of teaching and learning. Additionally, we were looking for an instrument that would allow comparability and verifiability for teaching and research purposes. Given the multimodality of teaching and learning processes, we developed the multimodal narrative (MN), which describes what happens during a task and incorporates data such as examples of students’ work.
Abstract:
Intensification of agricultural production without sound management and regulation can lead to severe environmental problems, as in Western Santa Catarina State, Brazil, where intensive swine production has caused large accumulations of manure and, consequently, water pollution. Natural resource scientists are asked by decision-makers for advice on management and regulatory decisions. Distributed environmental models are useful tools, since they can be used to explore the consequences of various management practices. However, in many areas of the world, quantitative data for model calibration and validation are lacking. The data-intensive distributed environmental model AgNPS was applied in a data-poor environment, the upper catchment (2,520 ha) of the Ariranhazinho River, near the city of Seara, in Santa Catarina State. Steps included data preparation, cell size selection, sensitivity analysis, model calibration and application to different management scenarios. The model was calibrated based on a best guess for model parameters and on a pragmatic sensitivity analysis. The parameters were adjusted to match model outputs (runoff volume, peak runoff rate and sediment concentration) closely with the sparse observed data. A modelling grid cell resolution of 150 m gave appropriate results at acceptable computational cost. The rainfall-runoff response of the AgNPS model was calibrated using three separate rainfall ranges (< 25, 25-60, > 60 mm). Predicted sediment concentrations were consistently six to ten times higher than observed, probably due to sediment trapping along vegetated channel banks. Predicted N and P concentrations in stream water ranged from just below to well above regulatory norms. Expert knowledge of the area, in addition to experience reported in the literature, was able to compensate in part for the limited calibration data. Several scenarios (actual, recommended and excessive manure applications, and point-source pollution from swine operations) could be compared by the model, using a relative ranking rather than quantitative predictions.
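The pragmatic sensitivity analysis described above can be read as a one-at-a-time perturbation of a best-guess parameter set, checking how strongly each parameter moves the calibrated outputs. A minimal Python sketch of the idea; the toy model and the parameter names (curve_number, usle_k) are hypothetical stand-ins for AgNPS inputs, not the study's actual code:

```python
def toy_model(params):
    # Hypothetical stand-in for an AgNPS run: returns runoff and sediment outputs.
    cn, k = params["curve_number"], params["usle_k"]
    runoff = 0.5 * cn - 20.0            # toy runoff response (mm)
    sediment = 80.0 * k * runoff        # toy sediment concentration (mg/l)
    return {"runoff": runoff, "sediment": sediment}

def sensitivity(base, name, step=0.10):
    """One-at-a-time sensitivity: relative output change per relative
    change of a single input parameter."""
    perturbed = dict(base)
    perturbed[name] *= (1 + step)
    out0, out1 = toy_model(base), toy_model(perturbed)
    return {k: ((out1[k] - out0[k]) / out0[k]) / step for k in out0}

base = {"curve_number": 75.0, "usle_k": 0.3}
for p in base:
    print(p, sensitivity(base, p))
```

Parameters with the largest relative sensitivities are the natural candidates to adjust first when matching the sparse observations.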
Abstract:
OBJECTIVE: Critically ill patients are at high risk of malnutrition. Insufficient nutritional support remains a widespread problem despite guidelines. The aim of this study was to measure the clinical impact of a two-step interdisciplinary quality nutrition program. DESIGN: Prospective interventional study over three periods (A, baseline; B and C, intervention periods). SETTING: Mixed intensive care unit within a university hospital. PATIENTS: Five hundred seventy-two patients (age 59 ± 17 yrs) requiring >72 hrs of intensive care unit treatment. INTERVENTION: Two-step quality program: 1) bottom-up implementation of a feeding guideline; and 2) additional presence of an intensive care unit dietitian. The nutrition protocol was based on the European guidelines. MEASUREMENTS AND MAIN RESULTS: Anthropometric data, intensive care unit severity scores, energy delivery, cumulated energy balance (daily, day 7, and discharge), feeding route (enteral, parenteral, combined, none, oral), length of intensive care unit and hospital stay, and mortality were collected. Altogether 5800 intensive care unit days were analyzed. Patients in period A were healthier, with lower Simplified Acute Physiology Score values and a lower proportion of "rapidly fatal" McCabe scores. Energy delivery and balance increased gradually: the impact was particularly marked on the cumulated energy deficit on day 7, which improved from -5870 kcal to -3950 kcal (p < .001). Feeding technique changed significantly, with a progressive increase in days with nutrition therapy (A: 59% of days, B: 69%, C: 71%, p < .001); use of enteral nutrition increased from A to B (stable in C), and days on combined and parenteral nutrition increased progressively. Oral energy intakes were low (mean: 385 kcal/day, 6 kcal/kg/day). Hospital mortality increased with severity of condition in periods B and C. CONCLUSION: A bottom-up protocol improved nutritional support. The presence of the intensive care unit dietitian provided significant additional progress, related to the early introduction and route of feeding, and achieved an overall better early energy balance.
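The cumulated energy balance used as an endpoint above is simple arithmetic: the running sum of delivered minus prescribed energy. A minimal sketch with hypothetical per-day numbers (not the study's data):

```python
# Cumulated energy balance at day 7: sum of (delivered - target) per day.
# All numbers below are hypothetical illustrations, not study data.
target_kcal = [2000] * 7                                   # prescribed daily target
delivered_kcal = [400, 800, 1100, 1400, 1600, 1800, 1900]  # energy actually delivered
deficit_day7 = sum(d - t for d, t in zip(delivered_kcal, target_kcal))
print(deficit_day7)  # -5000 kcal cumulated deficit at day 7
```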
Abstract:
Machine learning provides tools for the automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, owing to a number of advantages these methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have received the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, or how these techniques can be implemented efficiently. The contributions of this thesis are as follows. First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, one of the most well-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts: Part I provides the background for the research work and summarizes the most central results, while Part II consists of the five original research articles that are the main contribution of this thesis.
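For the linear case, the regularized pairwise least-squares loss that RankRLS minimizes admits a closed-form solution: summed over all pairs, the pairwise loss is a quadratic form in the Laplacian of the complete preference graph. A minimal numpy sketch of this idea, with illustrative variable names (this is not the thesis code, which also covers kernels and the cross-validation shortcuts):

```python
import numpy as np

def rank_rls_fit(X, y, lam=1.0):
    """Fit a linear RankRLS-style model by minimizing the regularized
    pairwise least-squares loss
        sum_{i,j} ((y_i - y_j) - (w.x_i - w.x_j))^2 + lam * ||w||^2.
    With all pairs considered, the loss equals (y - Xw)^T L (y - Xw)
    for the Laplacian L = n*I - 1 1^T, giving a closed-form solution."""
    n, d = X.shape
    L = n * np.eye(n) - np.ones((n, n))   # Laplacian of the complete preference graph
    A = X.T @ L @ X + lam * np.eye(d)
    b = X.T @ L @ y
    return np.linalg.solve(A, b)

# Toy usage: the learned scores order new objects by predicted relevance.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=50)
w = rank_rls_fit(X, y, lam=1.0)
print(np.argsort(-(X @ w))[:5])  # indices of the top-ranked objects
```

The point of the matrix-algebra shortcut is that the all-pairs loss never needs to be enumerated pair by pair; the same kind of observation underlies the efficient cross-validation algorithms mentioned in the abstract.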
Abstract:
Presentation at the Nordic Perspectives on Open Access and Open Science seminar, Helsinki, October 15, 2013
Abstract:
Workshop at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
As part of a large European coastal operational oceanography project (ECOOP), we have developed a web portal for the display and comparison of model and in situ marine data. The distributed model and in situ datasets are accessed via an Open Geospatial Consortium Web Map Service (WMS) and Web Feature Service (WFS), respectively. These services were developed independently and were readily integrated for the purposes of the ECOOP project, illustrating the ease of interoperability that results from adherence to international standards. The key feature of the portal is the ability to display co-plotted time series of the in situ and model data and to quantify the misfits between the two. By using standards-based web technology, we allow the user to quickly and easily explore over twenty model data feeds and compare these with dozens of in situ data feeds without being concerned with the low-level details of differing file formats or the physical location of the data. Scientific and operational benefits of this work include model validation, quality control of observations, data assimilation and decision support in near real time. In these areas it is essential to be able to bring together data streams from often disparate locations.
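As an illustration of the interoperability the abstract describes, a client can discover and fetch data from such standards-based services with off-the-shelf tooling. A minimal sketch using the OWSLib library; the endpoint URLs and the layer name are hypothetical stand-ins for the ECOOP services:

```python
from owslib.wms import WebMapService
from owslib.wfs import WebFeatureService

# Hypothetical endpoints standing in for the ECOOP model and in situ services.
wms = WebMapService("http://example.org/ecoop/wms", version="1.1.1")
wfs = WebFeatureService("http://example.org/ecoop/wfs", version="1.0.0")

print(list(wms.contents))  # model data layers advertised by the WMS
print(list(wfs.contents))  # in situ feature types advertised by the WFS

# Fetch a rendered map of one (hypothetical) model layer over a bounding box.
img = wms.getmap(layers=["sea_surface_temperature"],
                 srs="EPSG:4326",
                 bbox=(-15.0, 45.0, 5.0, 60.0),
                 size=(512, 512),
                 format="image/png")
open("model_sst.png", "wb").write(img.read())
```

Because both services describe themselves through standard GetCapabilities documents, the same client code works against any conformant WMS or WFS, which is exactly what makes the co-plotting of disparate data feeds straightforward.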
Abstract:
New compilations of African pollen and lake data are compared with climate (CCM1, NCAR, Boulder) and vegetation (BIOME 1.2, GSG, Lund) simulations for the last glacial maximum (LGM) and early to mid-Holocene (EMH). The simulated LGM climate was ca 4°C colder and drier than present, with the maximum reduction in precipitation in semi-arid regions. Biome simulations show a lowering of montane vegetation belts and an expansion of southern xerophytic associations, but no change in the distribution of deserts and tropical rain forests. The lakes show LGM conditions similar to or drier than present throughout northern and tropical Africa. Pollen data indicate a lowering of montane vegetation belts, the stability of the Sahara, and a reduction of rain forest. The paleoenvironmental data are consistent with the simulated changes in temperature and moisture budgets, although they suggest the climate model underestimates equatorial aridity. EMH simulations show temperatures slightly lower than present and increased monsoonal precipitation in the eastern Sahara and East Africa. Biome simulations show an upward shift of montane vegetation belts, fragmentation of xerophytic vegetation in southern Africa, and a major northward shift of the southern margin of the eastern Sahara. The lakes indicate conditions wetter than present across northern Africa. Pollen data show an upward shift of the montane forests, a northward shift of the southern margin of the Sahara, and a major extension of tropical rain forest. The lake and pollen data confirm monsoon expansion in eastern Africa, but the climate model fails to simulate the wet conditions in western Africa.
Abstract:
We present an overview of the MELODIES project, which is developing new data-intensive environmental services based on data from Earth Observation satellites, government databases, national and European agencies and more. We focus here on the capabilities and benefits of the project’s “technical platform”, which applies cloud computing and Linked Data technologies to enable the development of these services, providing flexibility and scalability.
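On the Linked Data side, services built on such a platform can typically be queried over SPARQL. A minimal sketch using SPARQLWrapper; the endpoint URL is a hypothetical stand-in for a MELODIES service, and the catalogue is assumed to be described with the standard DCAT vocabulary:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint standing in for a MELODIES dataset catalogue.
sparql = SPARQLWrapper("http://example.org/melodies/sparql")
sparql.setQuery("""
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct:  <http://purl.org/dc/terms/>
    SELECT ?dataset ?title WHERE {
        ?dataset a dcat:Dataset ;
                 dct:title ?title .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["dataset"]["value"], row["title"]["value"])
```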
Abstract:
Includes bibliography
Abstract:
The miniaturization race in the hardware industry, aimed at continually increasing transistor density on a die, no longer brings corresponding application performance improvements. One of the most promising alternatives is to exploit the heterogeneous nature of common applications in hardware. Supported by reconfigurable computation, which has already proved its efficiency in accelerating data-intensive applications, this concept promises a breakthrough in contemporary technology development. Memory organization in such heterogeneous reconfigurable architectures becomes critical. Two primary aspects introduce a sophisticated trade-off. On the one hand, a memory subsystem should provide a well-organized distributed data structure and guarantee the required data bandwidth. On the other hand, it should hide the heterogeneous hardware structure from the end user in order to support feasible high-level programmability of the system. This thesis explores heterogeneous reconfigurable hardware architectures and presents possible solutions to the problem of memory organization and data structure. Using the MORPHEUS heterogeneous platform as an example, the discussion follows the complete design cycle, from decision making and justification to hardware realization. Particular emphasis is placed on methods to support high system performance, meet application requirements, and provide a user-friendly programmer interface. As a result, the research introduces a complete heterogeneous platform enhanced with a hierarchical memory organization, which accomplishes its task by separating computation from communication, providing the reconfigurable engines with computation and configuration data, and unifying the heterogeneous computational devices through local storage buffers. It is distinguished from related solutions by its distributed data-flow organization, specifically engineered mechanisms for operating on data in local domains, a particular communication infrastructure based on a Network-on-Chip, and thorough methods to prevent computation and communication stalls. In addition, a novel technique to accelerate memory access was developed and implemented.
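The separation of computation from communication through local storage buffers can be illustrated with double buffering: while the compute engine consumes one buffer, the transfer engine refills the other, so neither stalls waiting for the bus. A minimal Python sketch of the concept (the thesis targets hardware and a Network-on-Chip; this is only a software analogue with hypothetical block sizes):

```python
import threading, queue

# Two local buffers: the compute engine drains one while the "DMA"
# thread (standing in for NoC transfers) refills the other.
EMPTY, FULL = queue.Queue(), queue.Queue()
for buf in ([0] * 256, [0] * 256):
    EMPTY.put(buf)

def dma_engine(blocks):
    for block in blocks:
        buf = EMPTY.get()         # wait for a free local buffer
        buf[:len(block)] = block  # transfer the next data block into it
        FULL.put(buf)             # hand it over to the compute engine
    FULL.put(None)                # end-of-stream marker

def compute_engine():
    while (buf := FULL.get()) is not None:
        print(sum(buf))           # placeholder for the real kernel
        EMPTY.put(buf)            # release the buffer for refilling

blocks = [[i] * 256 for i in range(4)]
t = threading.Thread(target=dma_engine, args=(blocks,))
t.start()
compute_engine()
t.join()
```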