970 results for Datasets


Relevance:

20.00%

Abstract:

With the ability to collect and store increasingly large datasets on modern computers comes the need to process the data in a way that is useful to a geostatistician or application scientist. Although the storage requirements scale only linearly with the number of observations in the dataset, the computational complexity of likelihood-based geostatistics scales quadratically in memory and cubically in time. Various methods have been proposed and are extensively used in an attempt to overcome these complexity issues. This thesis introduces a number of principled techniques for treating large datasets with an emphasis on three main areas: reduced complexity covariance matrices, sparsity in the covariance matrix and parallel algorithms for distributed computation. These techniques are presented individually, but it is also shown how they can be combined to produce techniques for further improving computational efficiency.
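
The quadratic memory and cubic time costs mentioned above can be illustrated with a minimal sketch of a dense Gaussian log-likelihood. This is not code from the thesis: the exponential covariance model, the function names and the default parameters are assumptions chosen only for illustration.

```python
import math

def exp_cov(xs, variance=1.0, length=1.0, nugget=1e-6):
    """Dense n x n exponential covariance matrix (assumed model): O(n^2) memory."""
    n = len(xs)
    return [[variance * math.exp(-abs(xs[i] - xs[j]) / length)
             + (nugget if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular Cholesky factor; the nested loops cost O(n^3) time."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def gaussian_loglik(y, cov):
    """Zero-mean Gaussian log-likelihood: -0.5 * (y' C^-1 y + log|C| + n log(2 pi))."""
    n = len(y)
    l = cholesky(cov)
    # Forward substitution solves L z = y, so that y' C^-1 y = z' z.
    z = []
    for i in range(n):
        z.append((y[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i])
    quad = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(l[i][i]) for i in range(n))
    return -0.5 * (quad + logdet + n * math.log(2.0 * math.pi))

# Usage: three observation locations on a line and their observed values.
locations = [0.0, 1.0, 2.5]
values = [0.3, -0.1, 0.8]
print(gaussian_loglik(values, exp_cov(locations)))
```

Doubling the number of observations quadruples the size of the covariance matrix and multiplies the factorisation work by roughly eight, which is the scaling that reduced-complexity, sparse and parallel approaches try to avoid.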

Relevance:

20.00%

Abstract:

Although the importance of dataset fitness-for-use evaluation and intercomparison is widely recognised within the GIS community, no practical tools have yet been developed to support such interrogation. GeoViQua aims to develop a GEO label which will visually summarise and allow interrogation of key informational aspects of geospatial datasets upon which users rely when selecting datasets for use. The proposed GEO label will be integrated in the Global Earth Observation System of Systems (GEOSS) and will be used as a value and trust indicator for datasets accessible through the GEO Portal. As envisioned, the GEO label will act as a decision support mechanism for dataset selection and thereby hopefully improve user recognition of the quality of datasets. To date, we have conducted three user studies to (1) identify the informational aspects of geospatial datasets upon which users rely when assessing dataset quality and trustworthiness, (2) elicit initial user views on a GEO label and its potential role and (3) evaluate prototype label visualisations. Our first study revealed that, when evaluating quality of data, users consider 8 facets: dataset producer information; producer comments on dataset quality; dataset compliance with international standards; community advice; dataset ratings; links to dataset citations; expert value judgements; and quantitative quality information. Our second study confirmed the relevance of these facets in terms of the community-perceived function that a GEO label should fulfil: users and producers of geospatial data supported the concept of a GEO label that provides a drill-down interrogation facility covering all 8 informational aspects. Consequently, we developed three prototype label visualisations and evaluated their comparative effectiveness and user preference via a third user study to arrive at a final graphical GEO label representation.
When integrated in the GEOSS, an individual GEO label will be provided for each dataset in the GEOSS clearinghouse (or other data portals and clearinghouses) based on its available quality information. Producer and feedback metadata documents are being used to dynamically assess information availability and generate the GEO labels. The producer metadata document can either be a standard ISO compliant metadata record supplied with the dataset, or an extended version of a GeoViQua-derived metadata record, and is used to assess the availability of a producer profile, producer comments, compliance with standards, citations and quantitative quality information. GeoViQua is also currently developing a feedback server to collect and encode (as metadata records) user and producer feedback on datasets; these metadata records will be used to assess the availability of user comments, ratings, expert reviews and user-supplied citations for a dataset. The GEO label will provide drill-down functionality which will allow a user to navigate to a GEO label page offering detailed quality information for its associated dataset. At this stage, we are developing the GEO label service that will be used to provide GEO labels on demand based on supplied metadata records. In this presentation, we will provide a comprehensive overview of the GEO label development process, with specific emphasis on the GEO label implementation and integration into the GEOSS.
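
The availability assessment described above can be sketched as a lookup over the eight facets. This is an illustrative sketch only: the facet keys, the dictionary-based record layout and the function name are hypothetical and do not reflect the actual GeoViQua metadata schema or GEO label service.

```python
# Facets assessed from the producer metadata document (per the abstract).
# The key names are hypothetical, chosen for illustration.
PRODUCER_FACETS = ["producer_profile", "producer_comments",
                   "standards_compliance", "citations", "quantitative_quality"]
# Facets assessed from the feedback metadata records (per the abstract).
FEEDBACK_FACETS = ["user_comments", "ratings", "expert_reviews", "citations"]

def facet_availability(producer_doc, feedback_doc):
    """Return a facet -> bool map saying whether any information is available."""
    availability = {}
    for facet in PRODUCER_FACETS:
        availability[facet] = bool(producer_doc.get(facet))
    for facet in FEEDBACK_FACETS:
        # Citations can come from either document, so combine with "or".
        available = bool(feedback_doc.get(facet))
        availability[facet] = availability.get(facet, False) or available
    return availability

# Usage with two toy records.
producer = {"producer_profile": "Example Agency", "standards_compliance": "ISO 19115"}
feedback = {"ratings": [4, 5], "citations": ["user-supplied citation"]}
print(facet_availability(producer, feedback))
```

Each of the eight resulting flags would correspond to one segment of a label, with the drill-down page showing the underlying information where a flag is true.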

Relevance:

20.00%

Abstract:

Analysing the molecular polymorphism and interactions of DNA, RNA and proteins is of fundamental importance in biology. Predicting functions of polymorphic molecules is important in order to design more effective medicines. Analysing major histocompatibility complex (MHC) polymorphism is important for understanding mate choice, epitope-based vaccine design and transplantation rejection, among other areas. Most of the existing exploratory approaches cannot analyse these datasets because of the large number of molecules with a high number of descriptors per molecule. This thesis develops novel methods for data projection in order to explore high-dimensional biological datasets by visualising them in a low-dimensional space. With increasing dimensionality, some existing data visualisation methods such as generative topographic mapping (GTM) become computationally intractable. We propose variants of these methods, where we use log-transformations at certain steps of the expectation maximisation (EM) based parameter learning process, to make them tractable for high-dimensional datasets. We demonstrate these proposed variants on both synthetic data and an electrostatic potential dataset of MHC class-I. We also propose to extend a latent trait model (LTM), suitable for visualising high-dimensional discrete data, to simultaneously estimate feature saliency as an integrated part of the parameter learning process of a visualisation model. This LTM variant not only gives better visualisation by modifying the projection map based on feature relevance, but also helps users to assess the significance of each feature. Another problem which is not addressed much in the literature is the visualisation of mixed-type data. We propose to combine GTM and LTM in a principled way where appropriate noise models are used for each type of data in order to visualise mixed-type data in a single plot. We call this model a generalised GTM (GGTM).
We also propose to extend the GGTM model to estimate feature saliencies while training a visualisation model, and this is called GGTM with feature saliency (GGTM-FS). We demonstrate the effectiveness of these proposed models for both synthetic and real datasets. We evaluate visualisation quality using quality metrics such as the distance distortion measure and rank-based measures: trustworthiness, continuity, and mean relative rank errors with respect to data space and latent space. In cases where the labels are known, we also use quality metrics of KL divergence and nearest-neighbour classification error in order to determine the separation between classes. We demonstrate the efficacy of these proposed models for both synthetic and real biological datasets, with a main focus on the MHC class-I dataset.
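
The abstract does not say which EM steps are log-transformed. One common place where high dimensionality causes numerical trouble is the E-step, where per-component likelihoods underflow to zero before they can be normalised. The sketch below shows the standard log-sum-exp remedy under that assumption; the function name and the equal-prior setting are illustrative and not taken from the thesis.

```python
import math

def log_responsibilities(log_liks):
    """Normalise per-component log-likelihoods log p(x | k) into log posterior
    responsibilities (equal priors assumed) using the log-sum-exp trick."""
    m = max(log_liks)
    # Subtracting the maximum keeps at least one exponent equal to exp(0) = 1.
    log_norm = m + math.log(sum(math.exp(v - m) for v in log_liks))
    return [v - log_norm for v in log_liks]

# Usage: log-likelihoods this small are typical when summing over many dimensions.
log_liks = [-1000.0, -1001.0, -1002.0]
resp = [math.exp(v) for v in log_responsibilities(log_liks)]
print(resp, sum(resp))
```

Exponentiating the raw values first would give 0.0 for every component and a division by zero, whereas the log-domain version returns well-defined responsibilities that sum to one.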

Relevance:

20.00%

Abstract:

Geospatial data have become a crucial input for the scientific community for understanding the environment and developing environmental management policies. The Global Earth Observation System of Systems (GEOSS) Clearinghouse is a catalogue and search engine that provides access to Earth Observation metadata. However, metadata are often not easily understood by users, especially when presented in ISO XML encoding. The data quality information included in the metadata is essential for users to select datasets that suit their needs. This work aims to help users to understand the quality information held in metadata records and to provide the results to geospatial users in an understandable and comparable way. Thus, we have developed an enhanced tool (Rubric-Q) for visually assessing the metadata quality information and quantifying the degree of metadata population. Rubric-Q is an extension of a previous NOAA Rubric tool used as a metadata training and improvement instrument. The paper also presents a thorough assessment of the quality information by applying the Rubric-Q to all dataset metadata records available in the GEOSS Clearinghouse. The results reveal that just 8.7% of the datasets have some quality element described in the metadata, 63.4% have some lineage element documented, and merely 1.2% have some usage element described. © 2013 IEEE.

Relevance:

20.00%

Abstract:

The sharing of near real-time traceability knowledge in supply chains plays a central role in coordinating business operations and is a key driver for their success. However, before traceability datasets received from external partners can be integrated with datasets generated internally within an organisation, they need to be validated against information recorded for the physical goods received as well as against bespoke rules defined to ensure uniformity, consistency and completeness within the supply chain. In this paper, we present a knowledge-driven framework for the runtime validation of critical constraints on incoming traceability datasets encapsulated as EPCIS event-based linked pedigrees. Our constraints are defined using SPARQL queries and SPIN rules. We present a novel validation architecture based on the integration of the Apache Storm framework for real-time, distributed computation with popular Semantic Web/Linked Data libraries and exemplify our methodology on an abstraction of the pharmaceutical supply chain.

Relevance:

20.00%

Abstract:

Heterogeneous datasets arise naturally in most applications due to the use of a variety of sensors and measuring platforms. Such datasets can be heterogeneous in terms of the error characteristics and sensor models. Treating such data is most naturally accomplished using a Bayesian or model-based geostatistical approach; however, such methods generally scale rather badly with the size of the dataset, and require computationally expensive Monte Carlo based inference. Recently within the machine learning and spatial statistics communities many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation which is chosen such that there is minimal information loss. In this paper a sequential Bayesian framework for inference in such projected processes is presented. The observations are considered one at a time, which avoids the need for high dimensional integrals typically required in a Bayesian approach. A C++ library, gptk, which is part of the INTAMAP web service, is introduced which implements projected, sequential estimation and adds several novel features. In particular the library includes the ability to use a generic observation operator, or sensor model, to permit data fusion. It is also possible to cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the covariance parameters is explored, including the impact of the projected process approximation on likelihood profiles. We illustrate the projected sequential method in application to synthetic and real datasets. Limitations and extensions are discussed. © 2010 Elsevier Ltd.
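
The idea of processing observations one at a time in a reduced rank model can be sketched for the simplest case: a fixed set of m basis functions with Gaussian weights and Gaussian observation noise. This is a simplified stand-in, not the gptk API or the paper's algorithm; the function name and arguments are hypothetical, and the non-Gaussian error handling described above is not covered.

```python
def sequential_update(mu, cov, phi, y, noise_var):
    """Update the Gaussian posterior N(mu, cov) over m basis weights with a
    single observation y = phi . w + noise, where noise ~ N(0, noise_var)."""
    m = len(mu)
    s = [sum(cov[i][j] * phi[j] for j in range(m)) for i in range(m)]  # cov @ phi
    v = sum(phi[i] * s[i] for i in range(m)) + noise_var  # predictive variance of y
    resid = y - sum(phi[i] * mu[i] for i in range(m))     # prediction error
    new_mu = [mu[i] + s[i] * resid / v for i in range(m)]
    new_cov = [[cov[i][j] - s[i] * s[j] / v for j in range(m)] for i in range(m)]
    return new_mu, new_cov

# Usage: two basis functions; each observation carries its own basis vector,
# which plays the role of a (linear) observation operator.
mu, cov = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
for phi, y in [([1.0, 0.5], 2.0), ([0.2, 1.0], -1.0)]:
    mu, cov = sequential_update(mu, cov, phi, y, noise_var=0.5)
print(mu, cov)
```

Each update costs O(m^2) regardless of how many observations have already been absorbed, so the total cost grows linearly with the number of observations rather than cubically.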

Relevance:

20.00%

Abstract:

In the framework of the global energy balance, the radiative energy exchanges between Sun, Earth and space are now accurately quantified from new satellite missions. Much less is known about the magnitude of the energy flows within the climate system and at the Earth surface, which cannot be directly measured by satellites. In addition to satellite observations, here we make extensive use of the growing number of surface observations to constrain the global energy balance not only from space, but also from the surface. We combine these observations with the latest modeling efforts performed for the 5th IPCC assessment report to infer best estimates for the global mean surface radiative components. Our analyses favor global mean downward surface solar and thermal radiation values near 185 and 342 W m⁻², respectively, which are most compatible with surface observations. Combined with an estimated surface absorbed solar radiation and thermal emission of 161 W m⁻² and 397 W m⁻², respectively, this leaves 106 W m⁻² of surface net radiation available for distribution amongst the non-radiative surface energy balance components. The climate models overestimate the downward solar and underestimate the downward thermal radiation, thereby simulating nevertheless an adequate global mean surface net radiation by error compensation. This also suggests that, globally, the simulated surface sensible and latent heat fluxes, around 20 and 85 W m⁻² on average, are realistic values. The findings of this study are compiled into a new global energy balance diagram, which may be able to reconcile currently disputed inconsistencies between energy and water cycle estimates.
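
The surface net radiation figure follows from the quoted components: absorbed solar plus downward thermal minus thermal emission. The short sketch below reproduces that arithmetic using only the values stated in the abstract; the function and key names are illustrative.

```python
def surface_energy_balance():
    """Reproduce the global mean surface budget (all values in W per square metre)."""
    absorbed_solar = 161.0     # surface absorbed solar radiation
    downward_thermal = 342.0   # downward surface thermal radiation
    thermal_emission = 397.0   # surface thermal emission
    sensible, latent = 20.0, 85.0  # approximate ("around") turbulent heat fluxes

    net_radiation = absorbed_solar + downward_thermal - thermal_emission
    # Part of the net radiation not accounted for by the two turbulent fluxes.
    residual = net_radiation - (sensible + latent)
    return {"net_radiation": net_radiation, "residual": residual}

print(surface_energy_balance())
```

The net radiation comes out at 106, matching the abstract. The small remainder after subtracting the sensible and latent heat fluxes is within the rounding implied by the approximate flux values.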

Relevance:

20.00%

Abstract:

The development of models of marine ecosystems in the Southern Ocean is becoming increasingly important as a means of understanding and managing impacts such as exploitation and climate change. Collating data from disparate sources, and understanding biases or uncertainties inherent in those data, are important first steps for improving ecosystem models. This review focuses on seals that breed in ice habitats of the Southern Ocean (i.e. the crabeater seal, Lobodon carcinophaga; Ross seal, Ommatophoca rossii; leopard seal, Hydrurga leptonyx; and Weddell seal, Leptonychotes weddellii). Data on populations (abundance and trends in abundance), distribution and habitat use (movement, key habitat and environmental features) and foraging (diet) are summarised, and potential biases and uncertainties inherent in those data are identified and discussed. Spatial and temporal gaps in knowledge of the populations, habitats and diet of each species are also identified.

Relevance:

20.00%

Abstract:

This paper introduces two new datasets on national-level elections from 1975 to 2004. The data are grouped into two separate datasets, the Quality of Elections Data and the Data on International Election Monitoring. Together these datasets provide original information on elections, election observation and election quality, and will enable researchers to study a variety of research questions. The datasets will be publicly available and are maintained at a project website.

Relevance:

20.00%

Abstract:

The MAREDAT atlas covers 11 types of plankton, ranging in size from bacteria to jellyfish. Together, these plankton groups determine the health and productivity of the global ocean and play a vital role in the global carbon cycle. Working within a uniform and consistent spatial and depth grid (map) of the global ocean, the researchers compiled thousands and tens of thousands of data points to identify regions of plankton abundance and scarcity as well as areas of data abundance and scarcity. At many of the grid points, the MAREDAT team accomplished the difficult conversion from abundance (numbers of organisms) to biomass (carbon mass of organisms). The MAREDAT atlas provides an unprecedented global data set for ecological and biochemical analysis and modeling as well as a clear mandate for compiling additional existing data and for focusing future data gathering efforts on key groups in key areas of the ocean. The present data set presents depth integrated values of diazotrophs Gamma-A nifH genes abundance, computed from a collection of source data sets.

Relevance:

20.00%

Abstract:

The MAREDAT atlas covers 11 types of plankton, ranging in size from bacteria to jellyfish. Together, these plankton groups determine the health and productivity of the global ocean and play a vital role in the global carbon cycle. Working within a uniform and consistent spatial and depth grid (map) of the global ocean, the researchers compiled thousands and tens of thousands of data points to identify regions of plankton abundance and scarcity as well as areas of data abundance and scarcity. At many of the grid points, the MAREDAT team accomplished the difficult conversion from abundance (numbers of organisms) to biomass (carbon mass of organisms). The MAREDAT atlas provides an unprecedented global data set for ecological and biochemical analysis and modeling as well as a clear mandate for compiling additional existing data and for focusing future data gathering efforts on key groups in key areas of the ocean. The present collection presents the original data sets used to compile global distributions of diazotroph abundance, biomass and nitrogen fixation rates.

Relevance:

20.00%

Abstract:

Annual precipitation for the last 2,500 years was reconstructed for northeastern Qinghai from living and archaeological juniper trees. A dominant feature of the precipitation of this area is a high degree of variability in mean rainfall at annual, decadal, and centennial scales, with many wet and dry periods that are corroborated by other paleoclimatic indicators. Reconstructed values of annual precipitation vary mostly from 100 to 300 mm and thus are no different from the modern instrumental record in Dulan. However, relatively dry years with below-average precipitation occurred more frequently in the past than in the present. Periods of relatively dry years occurred during 74-25 BC, AD 51-375, 426-500, 526-575, 626-700, 1100-1225, 1251-1325, 1451-1525, 1651-1750 and 1801-1825. Periods with a relatively wet climate occurred during AD 376-425, 576-625, 951-1050, 1351-1375, 1551-1600 and the present. This variability is probably related to latitudinal positions of winter frontal storms. Another key feature of precipitation in this area is an apparently direct relationship between interannual variability in rainfall and temperature, whereby increased warming in the future might lead to increased flooding and droughts. Such increased climatic variability might then impact human societies of the area, much as the climate has done for the past 2,500 years.

Relevance:

20.00%

Abstract:

The MAREDAT atlas covers 11 types of plankton, ranging in size from bacteria to jellyfish. Together, these plankton groups determine the health and productivity of the global ocean and play a vital role in the global carbon cycle. Working within a uniform and consistent spatial and depth grid (map) of the global ocean, the researchers compiled thousands and tens of thousands of data points to identify regions of plankton abundance and scarcity as well as areas of data abundance and scarcity. At many of the grid points, the MAREDAT team accomplished the difficult conversion from abundance (numbers of organisms) to biomass (carbon mass of organisms). The MAREDAT atlas provides an unprecedented global data set for ecological and biochemical analysis and modeling as well as a clear mandate for compiling additional existing data and for focusing future data gathering efforts on key groups in key areas of the ocean. The present data set presents depth integrated values of diazotrophs abundance and biomass, computed from a collection of source data sets.

Relevance:

20.00%

Abstract:

The MAREDAT atlas covers 11 types of plankton, ranging in size from bacteria to jellyfish. Together, these plankton groups determine the health and productivity of the global ocean and play a vital role in the global carbon cycle. Working within a uniform and consistent spatial and depth grid (map) of the global ocean, the researchers compiled thousands and tens of thousands of data points to identify regions of plankton abundance and scarcity as well as areas of data abundance and scarcity. At many of the grid points, the MAREDAT team accomplished the difficult conversion from abundance (numbers of organisms) to biomass (carbon mass of organisms). The MAREDAT atlas provides an unprecedented global data set for ecological and biochemical analysis and modeling as well as a clear mandate for compiling additional existing data and for focusing future data gathering efforts on key groups in key areas of the ocean. The present data set presents depth integrated values of diazotrophs nitrogen fixation rates, computed from a collection of source data sets.