931 resultados para large data sets


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The thermodynamic consistency of almost 90 VLE data series, including isothermal and isobaric conditions for systems of both total and partial miscibility in the liquid phase, has been examined by means of the area and point-to-point tests. In addition, the Gibbs energy of mixing function calculated from these experimental data has been inspected, with some rather surprising results: certain data sets exhibiting high dispersion or leading to Gibbs energy of mixing curves inconsistent with the total or partial miscibility of the liquid phase, surprisingly, pass the tests. Several possible inconsistencies in the tests themselves or in their application are discussed. Related to this is a very interesting and ambitious initiative that arose within the NIST organization: the development of an algorithm to assess the quality of experimental VLE data. The present paper questions the applicability of two of the five tests that are combined in the algorithm. It further shows that the deviation of the experimental VLE data from the correlation obtained by a given model, the basis of some point-to-point tests, should not be used to evaluate the quality of these data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Questions of handling unbalanced data considered in this article. As models for classification, PNN and MLP are used. Problem of estimation of model performance in case of unbalanced training set is solved. Several methods (clustering approach and boosting approach) considered as useful to deal with the problem of input data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mode of access: Internet.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

"July 2002."

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This special issue is a collection of the selected papers published on the proceedings of the First International Conference on Advanced Data Mining and Applications (ADMA) held in Wuhan, China in 2005. The articles focus on the innovative applications of data mining approaches to the problems that involve large data sets, incomplete and noise data, or demand optimal solutions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An interactive hierarchical Generative Topographic Mapping (HGTM) ¸iteHGTM has been developed to visualise complex data sets. In this paper, we build a more general visualisation system by extending the HGTM visualisation system in 3 directions: bf (1) We generalize HGTM to noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM) developed in ¸iteKabanpami. bf (2) We give the user a choice of initializing the child plots of the current plot in either em interactive, or em automatic mode. In the interactive mode the user interactively selects ``regions of interest'' as in ¸iteHGTM, whereas in the automatic mode an unsupervised minimum message length (MML)-driven construction of a mixture of LTMs is employed. bf (3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualisation plots, since they can highlight the boundaries between data clusters. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. We illustrate our approach on a toy example and apply our system to three more complex real data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recently, we have developed the hierarchical Generative Topographic Mapping (HGTM), an interactive method for visualization of large high-dimensional real-valued data sets. In this paper, we propose a more general visualization system by extending HGTM in three ways, which allows the user to visualize a wider range of data sets and better support the model development process. 1) We integrate HGTM with noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM). This enables us to visualize data of inherently discrete nature, e.g., collections of documents, in a hierarchical manner. 2) We give the user a choice of initializing the child plots of the current plot in either interactive, or automatic mode. In the interactive mode, the user selects "regions of interest," whereas in the automatic mode, an unsupervised minimum message length (MML)-inspired construction of a mixture of LTMs is employed. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. 3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualization plots, since they can highlight the boundaries between data clusters. We illustrate our approach on a toy example and evaluate it on three more complex real data sets. © 2005 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Information extraction or knowledge discovery from large data sets should be linked to data aggregation process. Data aggregation process can result in a new data representation with decreased number of objects of a given set. A deterministic approach to separable data aggregation means a lesser number of objects without mixing of objects from different categories. A statistical approach is less restrictive and allows for almost separable data aggregation with a low level of mixing of objects from different categories. Layers of formal neurons can be designed for the purpose of data aggregation both in the case of deterministic and statistical approach. The proposed designing method is based on minimization of the of the convex and piecewise linear (CPL) criterion functions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

long-term research on freshwater ecosystems provides insights that can be difficult to obtain from other approaches. Widespread monitoring of ecologically relevant water-quality parameters spanning decades can facilitate important tests of ecological principles. Unique long-term data sets and analytical tools are increasingly available, allowing for powerful and synthetic analyses across sites. long-term measurements or experiments in aquatic systems can catch rare events, changes in highly variable systems, time-lagged responses, cumulative effects of stressors, and biotic responses that encompass multiple generations. Data are available from formal networks, local to international agencies, private organizations, various institutions, and paleontological and historic records; brief literature surveys suggest much existing data are not synthesized. Ecological sciences will benefit from careful maintenance and analyses of existing long-term programs, and subsequent insights can aid in the design of effective future long-term experimental and observational efforts. long-term research on freshwaters is particularly important because of their value to humanity.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Approaches to quantify the organic carbon accumulation on a global scale generally do not consider the small-scale variability of sedimentary and oceanographic boundary conditions along continental margins. In this study, we present a new approach to regionalize the total organic carbon (TOC) content in surface sediments (<5 cm sediment depth). It is based on a compilation of more than 5500 single measurements from various sources. Global TOC distribution was determined by the application of a combined qualitative and quantitative-geostatistical method. Overall, 33 benthic TOC-based provinces were defined and used to process the global distribution pattern of the TOC content in surface sediments in a 1°x1° grid resolution. Regional dependencies of data points within each single province are expressed by modeled semi-variograms. Measured and estimated TOC values show good correlation, emphasizing the reasonable applicability of the method. The accumulation of organic carbon in marine surface sediments is a key parameter in the control of mineralization processes and the material exchange between the sediment and the ocean water. Our approach will help to improve global budgets of nutrient and carbon cycles.