919 results for VLE data sets
Abstract:
PanGet is a tool for downloading multiple data sets from PANGAEA. It uses the PANGAEA data set ID, which is unique and forms part of the DOI. As a first step, a list of the IDs of the data sets to be downloaded must be created. There are two ways to define this collection of data sets. The tool then downloads the data sets on the ID list. Failed downloads are written to the file *_failed.txt. The functionality of PanGet is also part of the programs Pan2Applic (choose File > Download PANGAEA datasets...) and PanTool2 (choose Basic tools > Download PANGAEA datasets...).
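A batch download of this kind could be sketched as follows. This is an illustration only, not PanGet's actual code: the `download_all` function, the `fetch` callable, the per-ID output file names, and the `batch` prefix are all assumptions. Only the ID list and the `*_failed.txt` log come from the description above.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Iterable


def download_all(ids: Iterable[int],
                 fetch: Callable[[int], str],
                 out_dir: Path,
                 prefix: str = "batch") -> list[int]:
    """Fetch every data set on the ID list and return the IDs that failed.

    `fetch` is a hypothetical callable that returns the data set text for
    one ID and raises an exception when the download fails.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    failed: list[int] = []
    for ds_id in ids:
        try:
            text = fetch(ds_id)
        except Exception:
            # Record the failure and continue with the remaining IDs.
            failed.append(ds_id)
            continue
        (out_dir / f"{ds_id}.txt").write_text(text, encoding="utf-8")
    if failed:
        # One failed ID per line, mirroring the *_failed.txt idea.
        log = out_dir / f"{prefix}_failed.txt"
        log.write_text("\n".join(str(i) for i in failed) + "\n",
                       encoding="utf-8")
    return failed
```

Passing the download step in as a callable keeps the loop independent of any particular network client, so the failure log can be exercised without network access.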
Abstract:
Managing large medical image collections is an increasingly demanding and important issue in many hospitals and other medical settings. A huge amount of this information is generated daily, which requires robust and agile systems. In this paper we present a distributed multi-agent system capable of managing very large medical image datasets. In this approach, agents extract low-level information from images and store it in a data structure implemented in a relational database. The data structure can also store semantic information related to images and to particular regions. A distinctive aspect of our work is that a single image can be divided so that the resulting sub-images can be stored and managed separately by different agents, improving performance in data access and processing. The system also makes it possible to apply region-based operations and filters to images, which facilitates image classification. These operations can be performed directly on the data structures in the database.
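The idea of dividing one image into sub-images handled by different agents might look roughly like the sketch below. It is not the authors' system: the list-of-rows image representation, the fixed rectangular tiles, the round-robin assignment, and the names `split_tiles` and `assign_round_robin` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]          # assumed: a grayscale image as rows of pixels
TileKey = Tuple[int, int]        # (top, left) corner of a tile


def split_tiles(image: Image, tile_h: int, tile_w: int) -> Dict[TileKey, Image]:
    """Cut the image into tiles keyed by their top-left corner."""
    width = len(image[0]) if image else 0
    tiles: Dict[TileKey, Image] = {}
    for top in range(0, len(image), tile_h):
        for left in range(0, width, tile_w):
            tiles[(top, left)] = [row[left:left + tile_w]
                                  for row in image[top:top + tile_h]]
    return tiles


def assign_round_robin(tiles: Dict[TileKey, Image],
                       n_agents: int) -> Dict[int, List[TileKey]]:
    """Distribute tile keys over agents 0..n_agents-1 in turn."""
    assignment: Dict[int, List[TileKey]] = {a: [] for a in range(n_agents)}
    for i, key in enumerate(sorted(tiles)):
        assignment[i % n_agents].append(key)
    return assignment
```

Keying each tile by its position lets a region-based operation be routed to the agent that holds the relevant sub-image.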
Abstract:
Using recently developed Lagrangian tools, we examine altimeter data and numerical simulations obtained from the HYCOM model in the Gulf of Mexico. Our data correspond to the months just after the Deepwater Horizon oil spill in 2010. Our Lagrangian analysis provides a skeleton that allows the interpretation of transport routes over the ocean surface. The transport routes are further verified by simultaneously studying the evolution of several drifters launched during those months in the Gulf of Mexico. We find that there exist Lagrangian structures that account for the dynamics of the drifters, although the agreement depends on the quality of the data. We discuss the impact of the Lagrangian tools on the assessment of the predictive capacity of these data sets.
Abstract:
Funding: The International Primary Care Respiratory Group (IPCRG) provided funding for this research project as an UNLOCK group study, for which the funding was obtained through an unrestricted grant from Novartis AG, Basel, Switzerland. The latter funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Database access for the OPCRD was provided by the Respiratory Effectiveness Group (REG) and Research in Real Life; the OPCRD statistical analysis was funded by REG. The Bocholtz Study was funded by PICASSO for COPD, an initiative of Boehringer Ingelheim, Pfizer and the Caphri Research Institute, Maastricht University, The Netherlands.
Abstract:
A statistical modeling approach is proposed for use in searching large microarray data sets for genes that have a transcriptional response to a stimulus. The approach is unrestricted with respect to the timing, magnitude or duration of the response, or the overall abundance of the transcript. The statistical model makes an accommodation for systematic heterogeneity in expression levels. Corresponding data analyses provide gene-specific information, and the approach provides a means for evaluating the statistical significance of such information. To illustrate this strategy we have derived a model to depict the profile expected for a periodically transcribed gene and used it to look for budding yeast transcripts that adhere to this profile. Using objective criteria, this method identifies 81% of the known periodic transcripts and 1,088 genes, which show significant periodicity in at least one of the three data sets analyzed. However, only one-quarter of these genes show significant oscillations in at least two data sets and can be classified as periodic with high confidence. The method provides estimates of the mean activation and deactivation times, induced and basal expression levels, and statistical measures of the precision of these estimates for each periodic transcript.
Abstract:
This article considers the handling of unbalanced data. PNN and MLP are used as classification models. The problem of estimating model performance when the training set is unbalanced is solved. Several methods (a clustering approach and a boosting approach) are considered useful for dealing with the problem of unbalanced input data.
Abstract:
"OM91-0512"--P. [80].
Abstract:
"September 1986."
Abstract:
"July 2002."
Abstract:
Retrieving large amounts of information over wide area networks, including the Internet, is problematic due to issues arising from latency of response, lack of direct memory access to data-serving resources, and fault tolerance. This paper describes a design pattern for handling the results of queries that return large amounts of data. Typically these queries would be made by a client process across a wide area network (or the Internet), through one or more middle tiers, to a relational database residing on a remote server. The solution combines several data retrieval strategies: iterators for traversing data sets and providing an appropriate level of abstraction to the client, double-buffering of data subsets, multi-threaded data retrieval, and query slicing. This design has recently been implemented and incorporated into the framework of a commercial software product developed at Oracle Corporation.
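One way these strategies can fit together is sketched below. It is a minimal illustration under stated assumptions, not the design described in the paper or any Oracle API: `fetch_slice(offset, limit)` is a hypothetical callable standing in for one sliced query against the remote database, and `iter_query` is an invented name. The generator gives the client a plain iterator, while a single worker thread fills the next slice (the back buffer) as the current one is consumed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterator, Sequence


def iter_query(fetch_slice: Callable[[int, int], Sequence[Any]],
               slice_size: int = 100) -> Iterator[Any]:
    """Yield rows one by one while the next slice is fetched in the background.

    `fetch_slice(offset, limit)` is assumed to return at most `limit` rows
    starting at `offset`; a short or empty slice marks the end of the data.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        offset = 0
        pending = pool.submit(fetch_slice, offset, slice_size)
        while True:
            current = pending.result()        # front buffer; re-raises fetch errors
            last = len(current) < slice_size
            if not last:
                # Start filling the back buffer before handing out rows.
                offset += slice_size
                pending = pool.submit(fetch_slice, offset, slice_size)
            yield from current
            if last:
                return
```

A client would simply write `for row in iter_query(fetch_slice): ...`. At most two slices are held at any time, so memory use is bounded by the slice size rather than by the size of the full result.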
Abstract:
The principled statistical application of Gaussian random field models used in geostatistics has historically been limited to data sets of a small size. This limitation is imposed by the requirement to store and invert the covariance matrix of all the samples to obtain a predictive distribution at unsampled locations, or to use likelihood-based covariance estimation. Various ad hoc approaches to solve this problem have been adopted, such as selecting a neighborhood region and/or a small number of observations to use in the kriging process, but these have no sound theoretical basis and it is unclear what information is being lost. In this article, we present a Bayesian method for estimating the posterior mean and covariance structures of a Gaussian random field using a sequential estimation algorithm. By imposing sparsity in a well-defined framework, the algorithm retains a subset of “basis vectors” that best represent the “true” posterior Gaussian random field model in the relative entropy sense. This allows a principled treatment of Gaussian random field models on very large data sets. The method is particularly appropriate when the Gaussian random field model is regarded as a latent variable model, which may be nonlinearly related to the observations. We show the application of the sequential, sparse Bayesian estimation in Gaussian random field models and discuss its merits and drawbacks.
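For context, the size limitation mentioned above can be seen in the standard predictive equations for a Gaussian random field. The notation is assumed here and is not taken from the article: a zero-mean field with covariance function $k$, $n$ observations $\mathbf{y}$ with independent Gaussian noise of variance $\sigma^2$, the $n \times n$ covariance matrix $K$ of the sampled locations, and the vector $\mathbf{k}_*$ of covariances between the samples and an unsampled location $x_*$.

```latex
\[
\begin{aligned}
\mu(x_*)      &= \mathbf{k}_*^{\top} \left( K + \sigma^2 I \right)^{-1} \mathbf{y}, \\
\sigma^2(x_*) &= k(x_*, x_*) - \mathbf{k}_*^{\top} \left( K + \sigma^2 I \right)^{-1} \mathbf{k}_* .
\end{aligned}
\]
```

Storing $K$ requires $O(n^2)$ memory and a direct solve costs $O(n^3)$ operations, which is why sparse approximations that retain only a subset of basis vectors become attractive for very large data sets.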
Abstract:
Recently, within the machine learning and spatial statistics communities, many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation, chosen so that the information loss is minimal. In this paper a sequential framework for inference in such projected processes is presented, in which the observations are considered one at a time. We introduce a C++ library for carrying out such projected, sequential estimation, which adds several novel features. In particular, we have incorporated the ability to use a generic observation operator, or sensor model, to permit data fusion. We can also cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the variogram parameters is based on maximum likelihood estimation. We illustrate the projected sequential method in application to synthetic and real data sets. We discuss the software implementation and suggest possible future extensions.