986 results for sample complexity


Relevance:

100.00%

Abstract:

Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus A³√((log n)/m) (ignoring log A and log m factors), where m is the number of training patterns. This may explain the generalization performance of neural networks, particularly when the number of training examples is considerably smaller than the number of weights. It also supports heuristics (such as weight decay and early stopping) that attempt to keep the weights small during training. The proof techniques appear to be useful for the analysis of other pattern classifiers: when the input domain is a totally bounded metric space, we use the same approach to give upper bounds on misclassification probability for classifiers with decision boundaries that are far from the training examples.
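The quoted penalty term is easy to evaluate numerically. Below is a minimal sketch of the bound as stated in the abstract (with the log A, log m, and constant factors ignored, exactly as the abstract does; the function name and example inputs are ours):

```python
import math

def bartlett_bound(train_error_estimate, A, n, m):
    """Simplified form of the weight-size generalization bound quoted
    in the abstract: misclassification probability <= training error
    estimate plus A^3 * sqrt(log(n) / m), ignoring log A, log m and
    constant factors (a hypothetical simplification for illustration).
    A: bound on the sum of weight magnitudes per unit
    n: input dimension
    m: number of training patterns
    """
    return train_error_estimate + A**3 * math.sqrt(math.log(n) / m)

# The penalty shrinks with more data but grows quickly with weight size,
# which is the intuition behind weight decay and early stopping.
print(bartlett_bound(0.05, A=2.0, n=100, m=10_000))     # ≈ 0.22
print(bartlett_bound(0.05, A=2.0, n=100, m=1_000_000))  # ≈ 0.067
```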

Relevance:

100.00%

Abstract:

In this paper, we bound the generalization error of a class of Radial Basis Function networks, for certain well-defined function learning tasks, in terms of the number of parameters and the number of examples. We show that the total generalization error is partly due to the insufficient representational capacity of the network (because of its finite size) and partly due to insufficient information about the target function (because of the finite number of samples). We make several observations about generalization error which are valid irrespective of the approximation scheme. Our result also sheds light on how to choose an appropriate network architecture for a particular problem.
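The decomposition into approximation error (finite network size) and estimation error (finite sample size) can be seen empirically. The sketch below is not the paper's construction: it fits Gaussian RBF networks by ridge-regularized least squares, and the target function, kernel width, and size grid are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
target = lambda x: np.sin(3 * x) + 0.5 * np.cos(7 * x)  # arbitrary target

def rbf_features(x, centers, width=0.2):
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width**2))

def fit_and_test(n_centers, n_samples):
    x = rng.uniform(-1, 1, n_samples)
    y = target(x) + 0.1 * rng.standard_normal(n_samples)  # noisy samples
    centers = np.linspace(-1, 1, n_centers)
    Phi = rbf_features(x, centers)
    # ridge-regularized least squares for the output weights
    w = np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(n_centers), Phi.T @ y)
    xt = np.linspace(-1, 1, 2000)
    return np.mean((rbf_features(xt, centers) @ w - target(xt)) ** 2)

# More centers reduce approximation error; more samples reduce estimation error.
for k, m in [(5, 1000), (40, 1000), (40, 50), (40, 100000)]:
    print(f"centers={k:3d} samples={m:6d} test MSE={fit_and_test(k, m):.4f}")
```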

Relevance:

100.00%

Abstract:

The Vapnik-Chervonenkis (VC) dimension is a combinatorial measure of a certain class of machine learning problems, which may be used to obtain upper and lower bounds on the number of training examples needed to learn to prescribed levels of accuracy. Most of the known bounds apply to the Probably Approximately Correct (PAC) framework, which is the framework within which we work in this paper. For a learning problem with some known VC dimension, much is known about the order of growth of the sample-size requirement of the problem, as a function of the PAC parameters. The exact value of the sample-size requirement is, however, less well known, and depends heavily on the particular learning algorithm being used. This is a major obstacle to the practical application of the VC dimension. Hence it is important to know exactly how the sample-size requirement depends on the VC dimension, and with that in mind, we describe a general algorithm for learning problems having VC dimension 1. Its sample-size requirement is minimal (as a function of the PAC parameters), and turns out to be the same for all non-trivial learning problems having VC dimension 1. While the method used cannot be naively generalised to higher VC dimension, it suggests that optimal algorithm-dependent bounds may improve substantially on current upper bounds.
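For context, the classical distribution-free upper bound on the PAC sample-size requirement for a class of VC dimension d can be computed directly. The sketch below uses the standard Blumer, Ehrenfeucht, Haussler, and Warmuth (1989) bound, not the VC-dimension-1 algorithm of the paper:

```python
import math

def pac_sample_bound(d, epsilon, delta):
    """Classical sufficient sample size for PAC learning a class of
    VC dimension d to accuracy epsilon with confidence 1 - delta
    (Blumer et al. 1989; the constants come from that generic bound
    and are known not to be optimal)."""
    return math.ceil(max(
        (4 / epsilon) * math.log2(2 / delta),
        (8 * d / epsilon) * math.log2(13 / epsilon),
    ))

# The gap between such generic upper bounds and the exact, algorithm-dependent
# requirement is what motivates the VC dimension 1 analysis in the paper.
print(pac_sample_bound(d=1, epsilon=0.1, delta=0.05))   # ≈ 562
print(pac_sample_bound(d=10, epsilon=0.1, delta=0.05))  # ≈ 5618
```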

Relevance:

60.00%

Abstract:

The paper "the importance of convexity in learning with squared loss" gave a lower bound on the sample complexity of learning with quadratic loss using a nonconvex function class. The proof contains an error. We show that the lower bound is true under a stronger condition that holds for many cases of interest.

Relevance:

60.00%

Abstract:

The performance of postdetection integration (PDI) techniques for the detection of Global Navigation Satellite Systems (GNSS) signals in the presence of uncertainties in frequency offsets, noise variance, and unknown data bits is studied. It is shown that the conventional PDI techniques are generally not robust to uncertainty in the data bits and/or the noise variance. Two new modified PDI techniques are proposed, and they are shown to be robust to these uncertainties. The receiver operating characteristic (ROC) and sample complexity performance of the PDI techniques in the presence of model uncertainties are analytically derived. It is shown that the proposed methods significantly outperform existing methods, and hence they could become increasingly important as GNSS receivers attempt to push the envelope on the minimum signal-to-noise ratio (SNR) for reliable detection.
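As background for what "conventional PDI" refers to: a noncoherent post-detection integration statistic accumulates the squared magnitudes of successive coherent correlator outputs, which removes the unknown data-bit signs. A minimal sketch follows (the block count, SNR, and signal model are arbitrary illustrative assumptions, and this is not one of the paper's proposed robust variants):

```python
import numpy as np

rng = np.random.default_rng(1)

def npdi_statistic(y):
    """Conventional noncoherent PDI: sum of squared magnitudes of the
    K coherent correlator outputs y[0..K-1]. Squaring removes the
    unknown data-bit signs and any residual carrier phase."""
    return np.sum(np.abs(y) ** 2)

K, snr = 20, 0.5                       # blocks, per-block SNR (illustrative)
bits = rng.choice([-1.0, 1.0], K)      # unknown data bits flip the sign
noise = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
y_h1 = np.sqrt(snr) * bits + noise     # signal present
y_h0 = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)

print("H1 statistic:", npdi_statistic(y_h1))  # tends to exceed...
print("H0 statistic:", npdi_statistic(y_h0))  # ...the noise-only value
```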

Relevance:

60.00%

Abstract:

We consider the problem of Probably Approximately Correct (PAC) learning of a binary classifier from noisy labeled examples acquired from multiple annotators (each characterized by a respective classification noise rate). First, we consider the complete information scenario, where the learner knows the noise rates of all the annotators. For this scenario, we derive a sample complexity bound for the Minimum Disagreement Algorithm (MDA) on the number of labeled examples to be obtained from each annotator. Next, we consider the incomplete information scenario, where each annotator is strategic and holds the respective noise rate as private information. For this scenario, we design a cost-optimal procurement auction mechanism along the lines of Myerson's optimal auction design framework in a non-trivial manner. This mechanism satisfies the incentive compatibility property, thereby facilitating the learner to elicit the true noise rates of all the annotators.
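A minimal sketch of the Minimum Disagreement idea as described: select the hypothesis that disagrees least with the noisy labels. The threshold hypothesis class, the toy data, and the equal pooling of annotators are our assumptions; the paper's bound treats per-annotator sample counts, which this sketch ignores:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: 1-D threshold classifiers as the hypothesis class (assumption).
thresholds = np.linspace(0, 1, 101)
true_t = 0.42
noise_rates = [0.1, 0.2, 0.3]          # one known rate per annotator

x = rng.uniform(0, 1, 300)
clean = (x > true_t).astype(int)
points, labels = [], []
for eta in noise_rates:                # each annotator labels every point
    flip = rng.random(x.size) < eta    # flip labels with probability eta
    points.append(x)
    labels.append(np.where(flip, 1 - clean, clean))

X = np.concatenate(points)
Y = np.concatenate(labels)

# Minimum Disagreement: pick the hypothesis with fewest label disagreements.
disagreements = [np.sum((X > t).astype(int) != Y) for t in thresholds]
print("MDA estimate:", thresholds[int(np.argmin(disagreements))])  # near 0.42
```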

Relevance:

60.00%

Abstract:

We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (run-time) computational complexity and the (training-time) sample complexity scale linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects. We present a multi-class boosting procedure (joint boosting) that reduces the computational and sample complexity by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required, and therefore the computational cost, is observed to scale approximately logarithmically with the number of classes. The features selected jointly tend to be edge-like, generic features typical of many natural structures, rather than specific object parts. These generic features generalize better and considerably reduce the computational cost of multi-class object detection.
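The core of one joint boosting round can be sketched as follows: a regression stump is fit by weighted least squares jointly to a subset of classes, and the feature/threshold/subset combination with the lowest total weighted error is kept. This simplified sketch searches all class subsets exhaustively rather than using the paper's greedy approximation, and it omits the per-class constants:

```python
import numpy as np
from itertools import combinations

def shared_stump_error(xf, Z, W, thresh, shared):
    """Weighted squared error of one regression stump (value a where
    feature > thresh, else b) shared across the classes in `shared`.
    Z[c], W[c]: +/-1 targets and boosting weights for class c."""
    mask = xf > thresh
    wa = sum(W[c][mask].sum() for c in shared) + 1e-12
    wb = sum(W[c][~mask].sum() for c in shared) + 1e-12
    a = sum((W[c][mask] * Z[c][mask]).sum() for c in shared) / wa
    b = sum((W[c][~mask] * Z[c][~mask]).sum() for c in shared) / wb
    return sum((W[c] * (Z[c] - np.where(mask, a, b)) ** 2).sum()
               for c in shared)

def best_shared_stump(X, Z, W, thresholds):
    """Return (error, feature, threshold, shared classes) minimizing the
    total weighted error over all features, thresholds, and class subsets."""
    classes = range(len(Z))
    candidates = [
        (shared_stump_error(X[:, f], Z, W, t, s), f, t, s)
        for f in range(X.shape[1])
        for t in thresholds
        for k in classes
        for s in combinations(classes, k + 1)
    ]
    return min(candidates)
```

In the full procedure the winning stump is added to each shared class's additive classifier and the weights W[c] are updated multiplicatively before the next round, as in standard boosting.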

Relevance:

60.00%

Abstract:

The analysis of energy detector systems is a well-studied topic in the literature: numerous models have been derived describing the behaviour of single and multiple antenna architectures operating in a variety of radio environments. However, in many cases of interest, these models are not in a closed form and so their evaluation requires the use of numerical methods. In general, these are computationally expensive, which can cause difficulties in certain scenarios, such as in the optimisation of device parameters on low-cost hardware. The problem becomes acute in situations where the signal-to-noise ratio is small and reliable detection is to be ensured, or where the number of samples of the received signal is large. Furthermore, due to the analytic complexity of the models, further insight into the behaviour of various system parameters of interest is not readily apparent. In this thesis, an approximation-based approach is taken towards the analysis of such systems. By focusing on the situations where exact analyses become complicated, and making a small number of astute simplifications to the underlying mathematical models, it is possible to derive novel, accurate and compact descriptions of system behaviour. Approximations are derived for the analysis of energy detectors with single and multiple antennae operating on additive white Gaussian noise (AWGN) and independent and identically distributed Rayleigh, Nakagami-m and Rice channels; in the multiple antenna case, approximations are derived for systems with maximal ratio combiner (MRC), equal gain combiner (EGC) and square law combiner (SLC) diversity. In each case, error bounds are derived describing the maximum error resulting from the use of the approximations. In addition, it is demonstrated that the derived approximations require fewer computations of simple functions than any of the exact models available in the literature. Consequently, the regions of applicability of the approximations directly complement the regions of applicability of the available exact models. Further novel approximations for other system parameters of interest, such as sample complexity, minimum detectable signal-to-noise ratio and diversity gain, are also derived. In the course of the analysis, a novel theorem describing the convergence of the chi-square, noncentral chi-square and gamma distributions towards the normal distribution is derived. The theorem describes a tight upper bound on the error resulting from the application of the central limit theorem to random variables of the aforementioned distributions, and gives a much better description of the resulting error than existing Berry-Esseen type bounds. A second novel theorem, providing an upper bound on the maximum error resulting from the use of the central limit theorem to approximate the noncentral chi-square distribution when the noncentrality parameter is a multiple of the number of degrees of freedom, is also derived.
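The flavor of the CLT-based approach can be illustrated with the textbook energy detector on AWGN: approximating the chi-square test statistics by Gaussians yields a closed form for the sample complexity. This sketch uses the standard approximation for a real Gaussian stochastic signal, not the refined error-bounded approximations derived in the thesis:

```python
from scipy.stats import norm

def sample_complexity(pfa, pd, snr):
    """Approximate number of samples N an energy detector needs to reach
    false-alarm probability pfa and detection probability pd at the given
    linear SNR, using the central limit theorem on the chi-square test
    statistic (real Gaussian signal model; standard textbook result)."""
    qfa, qd = norm.isf(pfa), norm.isf(pd)
    return 2 * ((qfa - (1 + snr) * qd) / snr) ** 2

# The N ~ 1/SNR^2 scaling is why low-SNR detection is sample-hungry.
for snr_db in (-10, -15, -20):
    snr = 10 ** (snr_db / 10)
    print(snr_db, "dB:", round(sample_complexity(1e-2, 0.9, snr)))
```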

Relevance:

60.00%

Abstract:

One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's transformations can be modeled as a Markov chain, whose state is partially observable to the agent and affected by its actions; such processes are known as partially observable Markov decision processes (POMDPs). While the environment's dynamics are assumed to obey certain rules, the agent does not know them and must learn. In this dissertation we focus on the agent's adaptation as captured by the reinforcement learning framework. This means learning a policy, a mapping of observations into actions, based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. The set of policies is constrained by the architecture of the agent's controller. POMDPs require a controller to have a memory. We investigate controllers with memory, including controllers with external memory, finite state controllers and distributed controllers for multi-agent systems. For these various controllers we work out the details of the algorithms which learn by ascending the gradient of expected cumulative reinforcement. Building on statistical learning theory and experiment design theory, a policy evaluation algorithm is developed for the case of experience re-use. We address the question of sufficient experience for uniform convergence of policy evaluation and obtain sample complexity bounds for various estimators. Finally, we demonstrate the performance of the proposed algorithms on several domains, the most complex of which is simulated adaptive packet routing in a telecommunication network.
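A minimal sketch of the gradient-ascent idea (REINFORCE-style) for a stochastic reactive policy: a memoryless controller is the simplest member of the controller families studied, and the two-observation toy environment below is an invented stand-in:

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs, n_act = 2, 2
theta = np.zeros((n_obs, n_act))       # policy parameters (logits)

def policy(obs):
    p = np.exp(theta[obs] - theta[obs].max())
    return p / p.sum()

def episode(T=20):
    """Toy partially observable chain (assumption): the agent is rewarded
    for matching the hidden state, which it sees only through noise."""
    traj, total = [], 0.0
    state = rng.integers(2)
    for _ in range(T):
        obs = state if rng.random() < 0.8 else 1 - state  # noisy observation
        a = rng.choice(n_act, p=policy(obs))
        r = 1.0 if a == state else 0.0
        traj.append((obs, a))
        total += r
        state = rng.integers(2)
    return traj, total

for it in range(2000):                 # ascend the gradient of expected return
    traj, R = episode()
    for obs, a in traj:
        grad = -policy(obs)
        grad[a] += 1.0                 # d log pi(a|obs) / d theta[obs]
        theta[obs] += 0.01 * R * grad

print(policy(0), policy(1))  # should come to favor matching the observation
```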

Relevance:

60.00%

Abstract:

In most classical frameworks for learning from examples, it is assumed that examples are randomly drawn and presented to the learner. In this paper, we consider the possibility of a more active learner who is allowed to choose his/her own examples. Our investigations are carried out in a function approximation setting. In particular, using arguments from optimal recovery (Micchelli and Rivlin, 1976), we develop an adaptive sampling strategy (equivalent to adaptive approximation) for arbitrary approximation schemes. We provide a general formulation of the problem and show how it can be regarded as sequential optimal recovery. We demonstrate the application of this general formulation to two special cases of functions on the real line: 1) monotonically increasing functions and 2) functions with bounded derivative. An extensive investigation of the sample complexity of approximating these functions is conducted, yielding both theoretical and empirical results on test functions. Our theoretical results (stated in PAC-style), along with the simulations, demonstrate the superiority of our active scheme over both passive learning and classical optimal recovery. The analysis of active function approximation is conducted in a worst-case setting, in contrast with other Bayesian paradigms obtained from optimal design (MacKay, 1992).
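For the monotone case, the adaptive strategy has a simple form: an increasing function is confined to the axis-aligned rectangle between consecutive samples, so the next query goes to the interval whose rectangle is largest. The bisection-of-the-worst-rectangle rule below is our paraphrase of that idea, not code from the paper:

```python
import numpy as np

def active_sample_monotone(f, a, b, n_queries):
    """Adaptively sample a monotonically increasing f on [a, b]:
    always bisect the interval with the largest uncertainty rectangle
    (width * rise), which bounds the worst-case approximation error."""
    xs, ys = [a, b], [f(a), f(b)]
    for _ in range(n_queries):
        areas = [(xs[i + 1] - xs[i]) * (ys[i + 1] - ys[i])
                 for i in range(len(xs) - 1)]
        i = int(np.argmax(areas))
        x_new = 0.5 * (xs[i] + xs[i + 1])
        xs.insert(i + 1, x_new)
        ys.insert(i + 1, f(x_new))
    return np.array(xs), np.array(ys)

# Samples concentrate where the function rises steeply.
xs, ys = active_sample_monotone(
    lambda x: 1 / (1 + np.exp(-20 * (x - 0.7))), 0.0, 1.0, 30)
print(np.round(xs, 3))
```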

Relevance:

60.00%

Abstract:

We discuss a formulation for active example selection for function learning problems. This formulation is obtained by adapting Fedorov's optimal experiment design to the learning problem. We specifically show how to analytically derive example selection algorithms for certain well-defined function classes. We then explore the behavior and sample complexity of such active learning algorithms. Finally, we view object detection as a special case of function learning and show how our formulation reduces to a useful heuristic for choosing examples to reduce the generalization error.
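In the linear-regression special case, the optimal-experiment-design recipe reduces to a one-line selection rule: query the candidate input with the largest predictive variance under the current design. A sketch of that rule (the candidate pool, feature dimension, and ridge term are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(4)

def next_query(X_seen, pool, ridge=1e-6):
    """D-optimal-style active selection for linear regression: choose x
    from the pool maximizing x^T (X^T X)^{-1} x, i.e. the input whose
    label the current design predicts least confidently."""
    M = np.linalg.inv(X_seen.T @ X_seen + ridge * np.eye(X_seen.shape[1]))
    variances = np.einsum('ij,jk,ik->i', pool, M, pool)
    return int(np.argmax(variances))

pool = rng.uniform(-1, 1, (200, 3))    # candidate inputs
X_seen = pool[:5].copy()               # initial design
for _ in range(10):
    i = next_query(X_seen, pool)
    X_seen = np.vstack([X_seen, pool[i]])  # query x_i, observe its label
print("design size:", len(X_seen))
```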

Relevance:

60.00%

Abstract:

Platelets are small blood cells vital for hemostasis. Following vascular damage, platelets adhere to collagens and activate, forming a thrombus that plugs the wound and prevents blood loss. Stimulation of the platelet collagen receptor glycoprotein VI (GPVI) allows recruitment of proteins to receptor-proximal signaling complexes on the inner leaflet of the plasma membrane. These proteins are often present at low concentrations; therefore, signaling-complex characterization using mass spectrometry is limited due to high sample complexity. We describe a method that facilitates detection of signaling proteins concentrated on membranes. Peripheral membrane proteins (reversibly associated with membranes) were eluted from human platelets with alkaline sodium carbonate. Liquid-phase isoelectric focusing and gel electrophoresis were used to identify proteins that changed in levels on membranes from GPVI-stimulated platelets. Immunoblot analysis verified protein recruitment to platelet membranes, and subsequent protein phosphorylation was preserved. Hsp47, a collagen binding protein, was among the proteins identified and found to be exposed on the surface of GPVI-activated platelets. Inhibition of Hsp47 abolished platelet aggregation in response to collagen, while only partially reducing aggregation in response to other platelet agonists. We propose that Hsp47 may therefore play a role in hemostasis and thrombosis.

Relevance:

40.00%

Abstract:

In this paper, a time series complexity analysis of dense array electroencephalogram signals is carried out using the recently introduced Sample Entropy (SampEn) measure. This statistic quantifies the regularity in signals recorded from systems that can vary from the purely deterministic to the purely stochastic realm. The present analysis is conducted with the objective of gaining insight into complexity variations related to changing brain dynamics, for EEG recorded in three conditions: a passive, eyes-closed state; a mental arithmetic task; and the same mental task carried out after a physical exertion task. It is observed that the statistic is a robust quantifier of complexity, suited for short physiological signals such as the EEG, and that it points to the specific brain regions that exhibit lowered complexity during the mental task state as compared to a passive, relaxed state. In the case of mental tasks carried out before and after the performance of a physical exercise, the statistic can detect the variations brought in by the intermediate fatigue-inducing exercise period. This enhances its utility in detecting subtle changes in brain state, which can find wider application in EEG-based brain studies.
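For reference, the SampEn statistic itself is short to implement: count matching template pairs of length m and of length m+1 within tolerance r (Chebyshev distance, self-matches excluded) and take the negative log of their ratio. A direct, unoptimized sketch:

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample Entropy: -ln(A/B), where B counts pairs of length-m
    templates within tolerance r, and A does the same at length m+1.
    r is conventionally r_factor times the signal's standard deviation."""
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()

    def count_matches(mm):
        templates = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        count = 0
        for i in range(len(templates)):
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(d <= r)     # slicing from i+1 excludes self-matches
        return count

    B, A = count_matches(m), count_matches(m + 1)
    return -np.log(A / B)

rng = np.random.default_rng(5)
print(sample_entropy(rng.standard_normal(1000)))       # high: irregular signal
print(sample_entropy(np.sin(np.arange(1000) * 0.05)))  # low: regular signal
```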

Relevance:

40.00%

Abstract:

Complexity in time series is an intriguing feature of living dynamical systems, with potential use for identification of system state. Although various methods have been proposed for measuring physiologic complexity, uncorrelated time series are often assigned high values of complexity, erroneously classifying them as complex physiological signals. Here, we propose and discuss a method for complex system analysis based on a generalized statistical formalism and surrogate time series. Sample entropy (SampEn) was rewritten, inspired by Tsallis generalized entropy, as a function of the parameter q (qSampEn). qSDiff curves, defined as the difference between the qSampEn of the original and surrogate series, were calculated. We evaluated qSDiff for 125 real heart rate variability (HRV) dynamics, divided into groups of 70 healthy, 44 congestive heart failure (CHF), and 11 atrial fibrillation (AF) subjects, and for simulated series of stochastic and chaotic processes. The evaluations showed that, for nonperiodic signals, qSDiff curves have a maximum point (qSDiff_max) for q ≠ 1. The values of q at which the maximum occurs and at which qSDiff is zero were also evaluated. Only the qSDiff_max values were capable of distinguishing the HRV groups (p-values 5.10 × 10⁻³, 1.11 × 10⁻⁷, and 5.50 × 10⁻⁷ for healthy vs. CHF, healthy vs. AF, and CHF vs. AF, respectively), consistently with the concept of physiologic complexity, and they suggest a potential use for chaotic system analysis. (C) 2012 American Institute of Physics. [http://dx.doi.org/10.1063/1.4758815]
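The generalization can be sketched by replacing the natural logarithm in SampEn with the Tsallis q-logarithm. Note that this specific form, and the shuffled surrogates used below, are our assumptions for illustration, not necessarily the paper's exact definitions:

```python
import numpy as np

rng = np.random.default_rng(6)

def q_log(x, q):
    """Tsallis q-logarithm; recovers ln(x) in the limit q -> 1."""
    return np.log(x) if abs(q - 1.0) < 1e-9 else (x ** (1 - q) - 1) / (1 - q)

def match_ratio(x, m, r):
    """A/B: ratio of length-(m+1) to length-m template matches
    (Chebyshev distance, self-matches excluded)."""
    def count(mm):
        t = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        return sum(np.sum(np.max(np.abs(t[i + 1:] - t[i]), axis=1) <= r)
                   for i in range(len(t)))
    return count(m + 1) / count(m)

def q_sampen(x, q, m=2, r_factor=0.2):
    # hypothetical q-generalization: apply the q-logarithm to A/B
    return -q_log(match_ratio(x, m, r_factor * np.std(x)), q)

x = np.cumsum(rng.standard_normal(800))   # toy correlated series
surrogate = rng.permutation(x)            # shuffled surrogate (assumption)
qs = np.linspace(0.3, 2.5, 12)
q_sdiff = [q_sampen(surrogate, q) - q_sampen(x, q) for q in qs]
print("qSDiff peaks at q =", qs[int(np.argmax(q_sdiff))])  # typically q != 1
```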