963 resultados para Scientific data
Resumo:
First discussion on compositional data analysis is attributable to Karl Pearson, in 1897. However, notwithstanding the recent developments on algebraic structure of the simplex, more than twenty years after Aitchison’s idea of log-transformations of closed data, scientific literature is again full of statistical treatments of this type of data by using traditional methodologies. This is particularly true in environmental geochemistry where besides the problem of the closure, the spatial structure (dependence) of the data have to be considered. In this work we propose the use of log-contrast values, obtained by a simplicial principal component analysis, as LQGLFDWRUV of given environmental conditions. The investigation of the log-constrast frequency distributions allows pointing out the statistical laws able to generate the values and to govern their variability. The changes, if compared, for example, with the mean values of the random variables assumed as models, or other reference parameters, allow defining monitors to be used to assess the extent of possible environmental contamination. Case study on running and ground waters from Chiavenna Valley (Northern Italy) by using Na+, K+, Ca2+, Mg2+, HCO3-, SO4 2- and Cl- concentrations will be illustrated
Resumo:
Embora o objectivo de redução de acidentes laborais seja frequentemente invocado para justificar uma aplicação preventiva de testes de álcool e drogas no trabalho, há poucas evidências estatisticamente relevantes das pressupostas causalidade e correlação negativa entre a sujeição aos testes e os posteriores acidentes. Os dados de testes e dos acidentes ocorridos com os colaboradores de uma transportadora ferroviária portuguesa de âmbito nacional, durante anos recentes, começam agora a ser explorados, em busca de relações entre estas e outras variáveis biográficas. - Although the aim of reducing occupational accidents is frequently cited to justify preventive drug and alcohol testing at work, there is little statistically significant evidence of the assumed causality and negative correlation between exposure to testing and subsequent accidents. Data mining of tests and accidents involving employees of a Portuguese national wide railway transportation company, during recent years, is now beginning in search of relations between these and other biographical variables.
Resumo:
Virtual globe technology holds many exciting possibilities for environmental science. These easy-to-use, intuitive systems provide means for simultaneously visualizing four-dimensional environmental data from many different sources, enabling the generation of new hypotheses and driving greater understanding of the Earth system. Through the use of simple markup languages, scientists can publish and consume data in interoperable formats without the need for technical assistance. In this paper we give, with examples from our own work, a number of scientific uses for virtual globes, demonstrating their particular advantages. We explain how we have used Web Services to connect virtual globes with diverse data sources and enable more sophisticated usage such as data analysis and collaborative visualization. We also discuss the current limitations of the technology, with particular regard to the visualization of subsurface data and vertical sections.
Resumo:
This paper gives an overview of the project Changing Coastlines: data assimilation for morphodynamic prediction and predictability. This project is investigating whether data assimilation could be used to improve coastal morphodynamic modeling. The concept of data assimilation is described, and the benefits that data assimilation could bring to coastal morphodynamic modeling are discussed. Application of data assimilation in a simple 1D morphodynamic model is presented. This shows that data assimilation can be used to improve the current state of the model bathymetry, and to tune the model parameter. We now intend to implement these ideas in a 2D morphodynamic model, for two study sites. The logistics of this are considered, including model design and implementation, and data requirement issues. We envisage that this work could provide a means for maintaining up-to-date information on coastal bathymetry, without the need for costly survey campaigns. This would be useful for a range of coastal management issues, including coastal flood forecasting.
Resumo:
The service-oriented approach to performing distributed scientific research is potentially very powerful but is not yet widely used in many scientific fields. This is partly due to the technical difficulties involved in creating services and workflows and the inefficiency of many workflow systems with regard to handling large datasets. We present the Styx Grid Service, a simple system that wraps command-line programs and allows them to be run over the Internet exactly as if they were local programs. Styx Grid Services are very easy to create and use and can be composed into powerful workflows with simple shell scripts or more sophisticated graphical tools. An important feature of the system is that data can be streamed directly from service to service, significantly increasing the efficiency of workflows that use large data volumes. The status and progress of Styx Grid Services can be monitored asynchronously using a mechanism that places very few demands on firewalls. We show how Styx Grid Services can interoperate with with Web Services and WS-Resources using suitable adapters.
Resumo:
As part of a large European coastal operational oceanography project (ECOOP), we have developed a web portal for the display and comparison of model and in situ marine data. The distributed model and in situ datasets are accessed via an Open Geospatial Consortium Web Map Service (WMS) and Web Feature Service (WFS) respectively. These services were developed independently and readily integrated for the purposes of the ECOOP project, illustrating the ease of interoperability resulting from adherence to international standards. The key feature of the portal is the ability to display co-plotted timeseries of the in situ and model data and the quantification of misfits between the two. By using standards-based web technology we allow the user to quickly and easily explore over twenty model data feeds and compare these with dozens of in situ data feeds without being concerned with the low level details of differing file formats or the physical location of the data. Scientific and operational benefits to this work include model validation, quality control of observations, data assimilation and decision support in near real time. In these areas it is essential to be able to bring different data streams together from often disparate locations.
Resumo:
This paper discusses many of the issues associated with formally publishing data in academia, focusing primarily on the structures that need to be put in place for peer review and formal citation of datasets. Data publication is becoming increasingly important to the scientific community, as it will provide a mechanism for those who create data to receive academic credit for their work and will allow the conclusions arising from an analysis to be more readily verifiable, thus promoting transparency in the scientific process. Peer review of data will also provide a mechanism for ensuring the quality of datasets, and we provide suggestions on the types of activities one expects to see in the peer review of data. A simple taxonomy of data publication methodologies is presented and evaluated, and the paper concludes with a discussion of dataset granularity, transience and semantics, along with a recommended human-readable citation syntax.
Resumo:
Elephant poaching and the ivory trade remain high on the agenda at meetings of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). Well-informed debates require robust estimates of trends, the spatial distribution of poaching, and drivers of poaching. We present an analysis of trends and drivers of an indicator of elephant poaching of all elephant species. The site-based monitoring system known as Monitoring the Illegal Killing of Elephants (MIKE), set up by the 10th Conference of the Parties of CITES in 1997, produces carcass encounter data reported mainly by anti-poaching patrols. Data analyzed were site by year totals of 6,337 carcasses from 66 sites in Africa and Asia from 2002–2009. Analysis of these observational data is a serious challenge to traditional statistical methods because of the opportunistic and non-random nature of patrols, and the heterogeneity across sites. Adopting a Bayesian hierarchical modeling approach, we used the proportion of carcasses that were illegally killed (PIKE) as a poaching index, to estimate the trend and the effects of site- and country-level factors associated with poaching. Important drivers of illegal killing that emerged at country level were poor governance and low levels of human development, and at site level, forest cover and area of the site in regions where human population density is low. After a drop from 2002, PIKE remained fairly constant from 2003 until 2006, after which it increased until 2008. The results for 2009 indicate a decline. Sites with PIKE ranging from the lowest to the highest were identified. The results of the analysis provide a sound information base for scientific evidence-based decision making in the CITES process.
Resumo:
Methods for producing nonuniform transformations, or regradings, of discrete data are discussed. The transformations are useful in image processing, principally for enhancement and normalization of scenes. Regradings which “equidistribute” the histogram of the data, that is, which transform it into a constant function, are determined. Techniques for smoothing the regrading, dependent upon a continuously variable parameter, are presented. Generalized methods for constructing regradings such that the histogram of the data is transformed into any prescribed function are also discussed. Numerical algorithms for implementing the procedures and applications to specific examples are described.
Resumo:
The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data that is used to populate the second component, and a data warehouse that contains important molecular properties. These properties may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: firstly, how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories — this is an important and challenging aspect of P-found, due to the large data volumes involved and the desire of scientists to maintain control of their own data. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.
Resumo:
Data quality is a difficult notion to define precisely, and different communities have different views and understandings of the subject. This causes confusion, a lack of harmonization of data across communities and omission of vital quality information. For some existing data infrastructures, data quality standards cannot address the problem adequately and cannot fulfil all user needs or cover all concepts of data quality. In this study, we discuss some philosophical issues on data quality. We identify actual user needs on data quality, review existing standards and specifications on data quality, and propose an integrated model for data quality in the field of Earth observation (EO). We also propose a practical mechanism for applying the integrated quality information model to a large number of datasets through metadata inheritance. While our data quality management approach is in the domain of EO, we believe that the ideas and methodologies for data quality management can be applied to wider domains and disciplines to facilitate quality-enabled scientific research.
Resumo:
With the introduction of new observing systems based on asynoptic observations, the analysis problem has changed in character. In the near future we may expect that a considerable part of meteorological observations will be unevenly distributed in four dimensions, i.e. three dimensions in space and one in time. The term analysis, or objective analysis in meteorology, means the process of interpolating observed meteorological observations from unevenly distributed locations to a network of regularly spaced grid points. Necessitated by the requirement of numerical weather prediction models to solve the governing finite difference equations on such a grid lattice, the objective analysis is a three-dimensional (or mostly two-dimensional) interpolation technique. As a consequence of the structure of the conventional synoptic network with separated data-sparse and data-dense areas, four-dimensional analysis has in fact been intensively used for many years. Weather services have thus based their analysis not only on synoptic data at the time of the analysis and climatology, but also on the fields predicted from the previous observation hour and valid at the time of the analysis. The inclusion of the time dimension in objective analysis will be called four-dimensional data assimilation. From one point of view it seems possible to apply the conventional technique on the new data sources by simply reducing the time interval in the analysis-forecasting cycle. This could in fact be justified also for the conventional observations. We have a fairly good coverage of surface observations 8 times a day and several upper air stations are making radiosonde and radiowind observations 4 times a day. If we have a 3-hour step in the analysis-forecasting cycle instead of 12 hours, which is applied most often, we may without any difficulties treat all observations as synoptic. No observation would thus be more than 90 minutes off time and the observations even during strong transient motion would fall within a horizontal mesh of 500 km * 500 km.