982 results for data aggregation


Relevance:

40.00%

Publisher:

Abstract:

The data streaming model provides an attractive framework for one-pass summarization of massive data sets at a single observation point. However, in an environment where multiple data streams arrive at a set of distributed observation points, sketches must be computed remotely and then must be aggregated through a hierarchy before queries may be conducted. As a result, many sketch-based methods for the single stream case do not apply directly, as either the error introduced becomes large, or because the methods assume that the streams are non-overlapping. These limitations hinder the application of these techniques to practical problems in network traffic monitoring and aggregation in sensor networks. To address this, we develop a general framework for evaluating and enabling robust computation of duplicate-sensitive aggregate functions (e.g., SUM and QUANTILE), over data produced by distributed sources. We instantiate our approach by augmenting the Count-Min and Quantile-Digest sketches to apply in this distributed setting, and analyze their performance. We conclude with experimental evaluation to validate our analysis.
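To make the aggregation step concrete, here is a minimal sketch of a standard Count-Min sketch with a merge operation, as might run at two observation points before their summaries are combined. This is the textbook structure, not the augmented variant the abstract proposes. The class name, the default `width` and `depth`, and the SHA-256 based hashing are all assumptions chosen for illustration.

```python
import hashlib


class CountMinSketch:
    """Textbook Count-Min sketch (illustrative; parameters are assumptions)."""

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Derive one hash per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the best guess
        # and never falls below the true count.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )

    def merge(self, other):
        # Sketches with identical shape and hashing combine by cell-wise sum.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must share width and depth")
        merged = CountMinSketch(self.width, self.depth)
        for row in range(self.depth):
            merged.table[row] = [
                a + b for a, b in zip(self.table[row], other.table[row])
            ]
        return merged


node_a = CountMinSketch()
node_b = CountMinSketch()
node_a.add("flow-1", 3)
node_b.add("flow-1", 2)
node_b.add("flow-2", 1)
merged = node_a.merge(node_b)
```

Because the merge is a plain sum, an observation reported by both nodes is counted twice. That is the overlap problem the abstract describes for duplicate-sensitive aggregates.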

Relevance:

40.00%

Publisher:

Abstract:

This article discusses a range of regression techniques specifically tailored to building aggregation operators from empirical data. These techniques identify optimal parameters of aggregation operators from various classes (triangular norms, uninorms, copulas, ordered weighted aggregation (OWA), generalized means, and compensatory and general aggregation operators), while allowing one to preserve specific properties such as commutativity or associativity. © 2003 Wiley Periodicals, Inc.

Relevance:

40.00%

Publisher:

Abstract:

This paper treats the problem of fitting general aggregation operators with an unfixed number of arguments to empirical data. We discuss methods applicable to associative operators (t-norms, t-conorms, uninorms and nullnorms), means, and Choquet-integral-based operators with respect to a universal fuzzy measure. Special attention is paid to k-order additive symmetric fuzzy measures.

Relevance:

40.00%

Publisher:

Abstract:

We address the issue of identifying various classes of aggregation operators from empirical data in a way that also preserves the ordering of the outputs. It is argued that the ordering of the outputs is more important than their numerical values; however, the usual data fitting methods are concerned only with fitting the values. We formulate preservation of the ordering as a standard mathematical programming problem that can be solved by standard numerical methods.

Relevance:

40.00%

Publisher:

Abstract:

Theoretical advances in modelling aggregation of information produced a wide range of aggregation operators, applicable to almost every practical problem. The most important classes of aggregation operators include triangular norms, uninorms, generalised means and OWA operators.
With such a variety, an important practical problem has emerged: how to fit the parameters/weights of these families of aggregation operators to observed data? How to estimate quantitatively whether a given class of operators is suitable as a model in a given practical setting? Aggregation operators are rather special classes of functions, and thus they require specialised regression techniques that enforce important theoretical properties, such as commutativity or associativity. My presentation will address this issue in detail and will discuss various regression methods applicable specifically to t-norms, uninorms and generalised means. I will also demonstrate software implementing these regression techniques, which allows practitioners to paste their data and obtain optimal parameters of the chosen family of operators.
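As a small illustration of one operator family named above, the following sketch evaluates an OWA operator and shows the commutativity property that a fitting procedure would need to respect. The function name `owa` and the example weights are assumptions. This is not the software mentioned in the abstract.

```python
def owa(values, weights):
    """Ordered weighted averaging: weights apply to the sorted inputs."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Sorting first is what makes the operator independent of input order.
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


example_weights = [0.5, 0.3, 0.2]  # hypothetical weights, not fitted to data
result_one = owa([0.2, 0.9, 0.5], example_weights)
result_two = owa([0.9, 0.5, 0.2], example_weights)
```

Setting the weights to `[1, 0, 0]` recovers the maximum and `[0, 0, 1]` the minimum, so fitting the weights to data amounts to choosing a point between those extremes.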

Relevance:

40.00%

Publisher:

Abstract:

This article examines the construction of aggregation functions from data by minimizing the least absolute deviation criterion. We formulate various instances of such problems as linear programming problems. We consider the cases in which the data are provided as intervals and in which the ordering of the outputs needs to be preserved, and show that the linear programming formulation remains valid in such cases. This feature is very valuable in practice, since the standard simplex method can be used.
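One way such a formulation might look, taking a weighted arithmetic mean as the operator family (an assumption made here for concreteness; the article covers several instances): given data $(x_k, y_k)$, $k = 1, \dots, K$, with $x_k \in [0,1]^n$, the absolute residuals are split into non-negative parts $r_k^+$ and $r_k^-$, which turns the least absolute deviation fit into a linear program. The symbols $w_i$, $r_k^{\pm}$, $K$ and $n$ are notation introduced for this sketch.

```latex
\begin{aligned}
\min_{w,\; r^{+},\; r^{-}} \quad & \sum_{k=1}^{K} \left( r_k^{+} + r_k^{-} \right) \\
\text{s.t.} \quad & \sum_{i=1}^{n} w_i x_{ki} - y_k = r_k^{+} - r_k^{-}, && k = 1, \dots, K, \\
& \sum_{i=1}^{n} w_i = 1, \qquad w_i \ge 0, && i = 1, \dots, n, \\
& r_k^{+} \ge 0, \quad r_k^{-} \ge 0, && k = 1, \dots, K, \\
& \sum_{i=1}^{n} w_i \left( x_{ki} - x_{ji} \right) \ge 0 && \text{for all pairs with } y_j \le y_k .
\end{aligned}
```

The last family of constraints is a possible encoding of output-order preservation. It is linear in $w$, so the problem stays a linear program.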

Relevance:

40.00%

Publisher:

Abstract:

In this work we analyze the key issue of the relationship that should hold between the operators in a family {An} of aggregation operators in order to understand whether they properly define a consistent whole. Here we extend some of the ideas about the stability of a family of aggregation operators into a more general framework, formally defining the notions of i-L and j-R strict stability for families of aggregation operators. The notion of strict stability of order k is introduced as well. Finally, we also present an application of the strict stability conditions to deal with missing data problems in an information aggregation process. For this analysis, we have focused on the weighted mean family and the quasi-arithmetic weighted mean families.

Relevance:

40.00%

Publisher:

Abstract:

Applaggin (Agkistrodon piscivorus piscivorus platelet-aggregation inhibitor) is a potent inhibitor of blood platelet aggregation derived from the venom of the North American water moccasin. The protein consists of 71 amino acids, is rich in cysteines, contains the sequence-recognition site of adhesion proteins at positions 50-52 (Arg-Gly-Asp) and shares high sequence homology with other snake-venom disintegrins such as echistatin, kistrin and trigramin. Single crystals of applaggin have been grown and X-ray diffraction data have been collected to a resolution of 3.2 Å. The crystals belong to space group P4(1)2(1)2 (or its enantiomorph), with unit-cell dimensions a = b = 63.35 Å, c = 74.18 Å and two molecules per asymmetric unit. Molecular replacement using models constructed from the NMR structures of echistatin and kistrin has not been successful in producing a trial structure for applaggin.

Relevance:

40.00%

Publisher:

Abstract:

A problem frequently encountered in Data Envelopment Analysis (DEA) is that the total number of inputs and outputs included tends to be too large relative to the sample size. One way to counter this problem is to combine several inputs (or outputs) into meaningful aggregate variables, thereby reducing the dimension of the input (or output) vector. A direct effect of input aggregation is to reduce the number of constraints. This, in turn, alters the optimal value of the objective function. In this paper, we show how a statistical test proposed by Banker (1993) may be applied to test the validity of a specific way of aggregating several inputs. An empirical application using data from Indian manufacturing for the year 2002-03 is included as an example of the proposed test.

Relevance:

40.00%

Publisher:

Abstract:

The use of supervised learning techniques for fitting weights and/or generator functions of weighted quasi-arithmetic means – a special class of idempotent and nondecreasing aggregation functions – to empirical data has already been considered in a number of papers. Nevertheless, there are still some important issues that have not been discussed in the literature yet. In the first part of this two-part contribution we deal with the concept of regularization, a quite standard technique from machine learning applied so as to increase the fit quality on test and validation data samples. Due to the constraints on the weighting vector, it turns out that quite different methods can be used in the current framework, as compared to regression models. Moreover, it is worth noting that so far fitting weighted quasi-arithmetic means to empirical data has only been performed approximately, via the so-called linearization technique. In this paper we consider exact solutions to such special optimization tasks and indicate cases where linearization leads to much worse solutions.
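For readers unfamiliar with the operator being fitted, a weighted quasi-arithmetic mean can be evaluated as in the sketch below: a generator function `g` is applied to each input, the results are averaged with the weights, and the inverse generator maps the average back. The function and parameter names are assumptions, and no fitting or regularization is performed here.

```python
import math


def weighted_quasi_arithmetic_mean(values, weights, g, g_inv):
    """Compute g_inv(sum_i w_i * g(x_i)) for a strictly monotone generator g."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return g_inv(sum(w * g(x) for w, x in zip(weights, values)))


# g = log gives the weighted geometric mean; the identity would give the
# ordinary weighted arithmetic mean.
geometric = weighted_quasi_arithmetic_mean([1.0, 4.0], [0.5, 0.5], math.log, math.exp)

# Idempotency: aggregating identical inputs returns that same value.
idempotent = weighted_quasi_arithmetic_mean(
    [3.0, 3.0, 3.0], [0.2, 0.3, 0.5], math.log, math.exp
)
```

The constraints checked on `weights` (non-negative, summing to one) are the same constraints on the weighting vector that the abstract says change which regularization methods are usable.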

Relevance:

40.00%

Publisher:

Abstract:

The use of supervised learning techniques for fitting weights and/or generator functions of weighted quasi-arithmetic means – a special class of idempotent and nondecreasing aggregation functions – to empirical data has already been considered in a number of papers. Nevertheless, there are still some important issues that have not been discussed in the literature yet. In the second part of this two-part contribution we deal with a quite common situation in which we have inputs coming from different sources, describing a similar phenomenon, but which have not been properly normalized. In such a case, idempotent and nondecreasing functions cannot be used to aggregate them unless proper preprocessing is performed. The proposed idempotization method, based on the notion of B-splines, allows for an automatic calibration of independent variables. The introduced technique is applied in an R source code plagiarism detection system.

Relevance:

30.00%

Publisher:

Abstract:

The development of research data management infrastructure and services and making research data more discoverable and accessible to the research community is a key priority at the national, state and individual university level. This paper will discuss and reflect upon a collaborative project between Griffith University and the Queensland University of Technology to commission a Metadata Hub or Metadata Aggregation service based upon open source software components. It will describe the role that metadata aggregation services play in modern research infrastructure and argue that this role is a critical one.