100 results for Approximate Sum Rule


Abstract:

Rensch’s rule, which states that the magnitude of sexual size dimorphism tends to increase with increasing body size, has evolved independently in three lineages of large herbivorous mammals: bovids (antelopes), cervids (deer), and macropodids (kangaroos). This pattern can be explained by a model that combines allometry, life-history theory, and energetics. The key features are that female group size increases with increasing body size and that males have evolved under sexual selection to grow large enough to control these groups of females. The model predicts relationships among body size and female group size, male and female age at first breeding, death and growth rates, and energy allocation of males to produce body mass and weapons. Model predictions are well supported by data for these megaherbivores. The model suggests hypotheses for why some other sexually dimorphic taxa, such as primates and pinnipeds (seals and sea lions), do or do not conform to Rensch’s rule.

Abstract:

In a world where massive amounts of data are recorded on a large scale, we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology to predict the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not scale well on large datasets. One such alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.

Abstract:

In a world where data is captured on a large scale, the major challenge for data mining algorithms is to be able to scale up to large datasets. There are two main approaches to inducing classification rules: one is the divide and conquer approach, also known as the top down induction of decision trees; the other is the separate and conquer approach. A considerable amount of work has been done on scaling up the divide and conquer approach. However, very little work has been conducted on scaling up the separate and conquer approach. In this work we describe a parallel framework that allows the parallelisation of a certain family of separate and conquer algorithms, the Prism family. Parallelisation helps the Prism family of algorithms to harvest additional computer resources in a network of computers in order to make the induction of classification rules scale better on large datasets. Our framework also incorporates a pre-pruning facility for parallel Prism algorithms.
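
The separate and conquer strategy named in this abstract can be illustrated with a small serial sketch in the spirit of the basic Prism algorithm. It is not the parallel framework or the pre-pruning facility the abstract describes; the function name `induce_prism_rules`, the toy dataset `ROWS` and the tie-breaking choice are assumptions made for illustration only.

```python
from collections import Counter


def induce_prism_rules(rows, target):
    """Induce modular rules as (conditions, class) pairs from categorical data.

    rows is a list of dicts; target is the key holding the class label.
    """
    attributes = [key for key in rows[0] if key != target]
    rules = []
    for cls in sorted({row[target] for row in rows}):
        remaining = list(rows)  # each class starts again from the full training set
        while any(row[target] == cls for row in remaining):
            covered, conditions = remaining, {}
            # Conquer: specialise the rule until it covers only `cls`
            # or every attribute has been used.
            while (any(row[target] != cls for row in covered)
                   and len(conditions) < len(attributes)):
                best = None
                for attr in attributes:
                    if attr in conditions:
                        continue
                    totals = Counter(row[attr] for row in covered)
                    hits = Counter(row[attr] for row in covered if row[target] == cls)
                    for value, n_hit in hits.items():
                        # Highest p(cls | attr = value); ties go to larger coverage.
                        score = (n_hit / totals[value], n_hit)
                        if best is None or score > best[0]:
                            best = (score, attr, value)
                _, attr, value = best
                conditions[attr] = value
                covered = [row for row in covered if row[attr] == value]
            rules.append((conditions, cls))
            # Separate: drop the instances the new rule covers.
            remaining = [row for row in remaining
                         if not all(row[a] == v for a, v in conditions.items())]
    return rules


# Hypothetical toy data, not taken from any of the papers listed here.
ROWS = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

for conditions, cls in induce_prism_rules(ROWS, "play"):
    print(conditions, "->", cls)
```

Each rule is built independently of the others, which is what makes the rule set modular rather than a tree; the outer loop over classes and rules is also the natural unit of work one might distribute, although how the framework in the abstract does so is not stated here.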

Abstract:

Top Down Induction of Decision Trees (TDIDT) is the most commonly used method of constructing a model from a dataset in the form of classification rules to classify previously unseen data. Alternative algorithms have been developed, such as the Prism algorithm. Prism constructs modular rules that are qualitatively better than rules induced by TDIDT. However, with the increasing size of databases, many existing rule learning algorithms have proved to be computationally expensive on large datasets. To tackle the problem of scalability, parallel classification rule induction algorithms have been introduced. As TDIDT is the most popular classifier, even though there are strongly competitive alternative algorithms, most parallel approaches to inducing classification rules are based on TDIDT. In this paper we describe work on a distributed classifier that induces classification rules in a parallel manner based on Prism.

Abstract:

Induction of classification rules is one of the most important technologies in data mining. Most of the work in this field has concentrated on the Top Down Induction of Decision Trees (TDIDT) approach. However, alternative approaches have been developed such as the Prism algorithm for inducing modular rules. Prism often produces qualitatively better rules than TDIDT but suffers from higher computational requirements. We investigate approaches that have been developed to minimize the computational requirements of TDIDT, in order to find analogous approaches that could reduce the computational requirements of Prism.

Abstract:

The fast increase in the size and number of databases demands data mining approaches that are scalable to large amounts of data. This has led to the exploration of parallel computing technologies in order to perform data mining tasks concurrently using several processors. Parallelization seems to be a natural and cost-effective way to scale up data mining technologies. One of the most important of these data mining technologies is the classification of newly recorded data. This paper surveys advances in parallelization in the field of classification rule induction.

Abstract:

Advances in hardware and software in the past decade make it possible to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence of these advances in order to cope with the real-time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data of continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, some applications require the data to be analysed in real time as soon as it is captured. This is the case, for example, when the data stream is infinite, fast changing, or simply too large to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule-based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new and removing old rules. It differs from the more popular decision tree based classifiers in that it tends to leave data instances unclassified rather than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting, which will also address the problem of changing feature domain values.

Abstract:

The paper considers second kind integral equations of the form $\phi (x) = g(x) + \int_S {k(x,y)} \phi (y)ds(y)$ (abbreviated $\phi = g + K\phi $), in which S is an infinite cylindrical surface of arbitrary smooth cross section. The “truncated equation” (abbreviated $\phi _a = E_a g + K_a \phi _a $), obtained by replacing S by $S_a $, a closed bounded surface of class $C^2 $, the boundary of a section of the interior of S of length $2a$, is also discussed. Conditions on k are obtained (in particular, implying that K commutes with the operation of translation in the direction of the cylinder axis) which ensure that $I - K$ is invertible, that $I - K_a $ is invertible and $(I - K_a )^{ - 1} $ is uniformly bounded for all sufficiently large a, and that $\phi _a $ converges to $\phi $ in an appropriate sense as $a \to \infty $. Uniform stability and convergence results for a piecewise constant boundary element collocation method for the truncated equations are also obtained. A boundary integral equation, which models three-dimensional acoustic scattering from an infinite rigid cylinder, illustrates the application of the above results to prove existence of solution (of the integral equation and the corresponding boundary value problem) and convergence of a particular collocation method.

Abstract:

An improved sum-product estimate for subsets of a finite field whose order is not prime is provided. It is shown, under certain conditions, that $\max \{ |A + A|, |A \cdot A| \} \gg |A|^{12/11} / (\log_2 |A|)^{5/11}$. This new estimate matches, up to a logarithmic factor, the current best known bound obtained over prime fields by Rudnev.
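
The quantities $|A + A|$ and $|A \cdot A|$ in the estimate above can be computed directly for small sets. The sketch below is only an illustration of the sum-product phenomenon: it works in the prime field of integers modulo 101 for simplicity, which is not the non-prime-order setting of the abstract, and the function name `sum_and_product_sets` and the example sets are assumptions.

```python
def sum_and_product_sets(elements, p):
    """Return (A + A, A * A) for a set A of residues modulo a prime p."""
    a_set = {a % p for a in elements}
    sums = {(a + b) % p for a in a_set for b in a_set}
    products = {(a * b) % p for a in a_set for b in a_set}
    return sums, products


P = 101  # illustrative prime modulus (assumption)

# An arithmetic progression has a small sum set but a large product set.
ARITHMETIC = range(1, 11)
# A geometric progression has a small product set but a large sum set.
GEOMETRIC = [pow(2, k, P) for k in range(10)]

for name, elements in (("arithmetic", ARITHMETIC), ("geometric", GEOMETRIC)):
    sums, products = sum_and_product_sets(elements, P)
    print(name, "|A+A| =", len(sums), "|A.A| =", len(products))
```

In both cases at least one of the two sets is much larger than the ten-element set itself, which is the behaviour a sum-product estimate quantifies.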

Abstract:

Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions. As the practical implementation of ABC requires computations based on vectors of summary statistics, rather than full data sets, a central question is how to derive low-dimensional summary statistics from the observed data with minimal loss of information. In this article we provide a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature. The methods are split into three nonmutually exclusive classes consisting of best subset selection methods, projection techniques and regularization. In addition, we introduce two new methods of dimension reduction. The first is a best subset selection method based on Akaike and Bayesian information criteria, and the second uses ridge regression as a regularization procedure. We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets.

Abstract:

Many modern statistical applications involve inference for complex stochastic models, where it is easy to simulate from the models, but impossible to calculate likelihoods. Approximate Bayesian computation (ABC) is a method of inference for such models. It replaces calculation of the likelihood by a step which involves simulating artificial data for different parameter values, and comparing summary statistics of the simulated data with summary statistics of the observed data. Here we show how to construct appropriate summary statistics for ABC in a semi-automatic manner. We aim for summary statistics which will enable inference about certain parameters of interest to be as accurate as possible. Theoretical results show that optimal summary statistics are the posterior means of the parameters. Although these cannot be calculated analytically, we use an extra stage of simulation to estimate how the posterior means vary as a function of the data; and we then use these estimates of our summary statistics within ABC. Empirical results show that our approach is a robust method for choosing summary statistics that can result in substantially more accurate ABC analyses than the ad hoc choices of summary statistics that have been proposed in the literature. We also demonstrate advantages over two alternative methods of simulation-based inference.
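
The basic ABC step described in this abstract (simulate data for different parameter values, then compare summary statistics) can be sketched as plain rejection sampling. This is not the semi-automatic construction of summary statistics that the abstract proposes: the summary statistic here is simply the sample mean, chosen by hand, and the Gaussian model, uniform prior, tolerance and the name `abc_rejection` are all assumptions made for illustration.

```python
import random
import statistics


def abc_rejection(observed, n_draws=5000, tolerance=0.2, seed=0):
    """Rejection ABC for the mean of a unit-variance Gaussian (toy model).

    Keeps prior draws whose simulated sample mean lies within `tolerance`
    of the observed sample mean.
    """
    rng = random.Random(seed)
    s_obs = statistics.fmean(observed)  # hand-picked summary statistic
    n = len(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from an assumed uniform prior
        simulated = [rng.gauss(theta, 1.0) for _ in range(n)]
        if abs(statistics.fmean(simulated) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


# Hypothetical observed data generated with true mean 2.0.
_data_rng = random.Random(1)
OBSERVED = [_data_rng.gauss(2.0, 1.0) for _ in range(20)]
POSTERIOR = abc_rejection(OBSERVED)

print("accepted draws:", len(POSTERIOR))
print("approximate posterior mean:", statistics.fmean(POSTERIOR))
```

No likelihood is evaluated anywhere; the accuracy of the approximation depends on the tolerance and on how informative the summary statistic is, which is the choice the abstract sets out to automate.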

Abstract:

We establish Maximum Principles which apply to vectorial approximate minimizers of the general integral functional of the Calculus of Variations. Our main result is a version of the Convex Hull Property. The primary advance compared to results already existing in the literature is that we have dropped the quasiconvexity assumption of the integrand in the gradient term. The lack of weak lower semicontinuity is compensated by introducing a nonlinear convergence technique, based on the approximation of the projection onto a convex set by reflections and on the invariance of the integrand in the gradient term under the Orthogonal Group. Maximum Principles are implied for the relaxed solution in the case of non-existence of minimizers and for minimizing solutions of the Euler–Lagrange system of PDE.