953 results for VC-dimension


Relevance:

100.00%

Publisher:

Abstract:

We generalize the classical notion of Vapnik–Chervonenkis (VC) dimension to ordinal VC-dimension, in the context of logical learning paradigms. Logical learning paradigms encompass the numerical learning paradigms commonly studied in Inductive Inference. A logical learning paradigm is defined as a set W of structures over some vocabulary, and a set D of first-order formulas that represent data. The sets of models of ϕ in W, where ϕ varies over D, generate a natural topology over W. We show that if D is closed under boolean operators, then the notion of ordinal VC-dimension offers a perfect characterization for the problem of predicting the truth of the members of D in a member of W, with an ordinal bound on the number of mistakes. This shows that the notion of VC-dimension has a natural interpretation in Inductive Inference, when cast into a logical setting. We also study the relationships between predictive complexity, selective complexity—a variation on predictive complexity—and mind change complexity. The assumptions that D is closed under boolean operators and that W is compact often play a crucial role in establishing connections between these concepts. We then consider a computable setting with effective versions of the complexity measures, and show that the equivalence between ordinal VC-dimension and predictive complexity fails. More precisely, we prove that the effective ordinal VC-dimension of a paradigm can be defined when all other effective notions of complexity are undefined. On a better note, when W is compact, all effective notions of complexity are defined, though they are not related as in the noncomputable version of the framework.
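
As background for the generalization described in this abstract, the classical VC dimension can be stated in a few lines; the symbols C (concept class) and X (domain) below are illustrative notation, not the paper's.

```latex
% Classical Vapnik–Chervonenkis dimension of a concept class C over a domain X:
% the largest cardinality of a finite set S that C shatters, i.e. whose every
% subset is cut out by some concept in C.
\[
  \operatorname{VC}(C) \;=\; \sup\bigl\{\, |S| : S \subseteq X \text{ finite},\;
  \{\, c \cap S : c \in C \,\} = 2^{S} \,\bigr\}
\]
```

As the abstract indicates, the ordinal VC-dimension replaces this single number by an ordinal, matching the ordinal mistake bounds discussed above.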

Relevance:

100.00%

Publisher:

Abstract:

A high-level relation between Karl Popper’s ideas on “falsifiability of scientific theories” and the notion of “overfitting” in statistical learning theory can be easily traced. However, it was pointed out that at the level of technical details the two concepts are significantly different. One possible explanation that we suggest is that the process of falsification is an active process, whereas statistical learning theory is mainly concerned with supervised learning, which is a passive process of learning from examples arriving from a stationary distribution. We show that concepts that are closer (although still distant) to Karl Popper’s definitions of falsifiability can be found in the domain of learning using membership queries, and derive relations between Popper’s dimension, exclusion dimension, and the VC-dimension.

Relevance:

100.00%

Publisher:

Abstract:

The Vapnik-Chervonenkis (VC) dimension is a combinatorial measure of a certain class of machine learning problems, which may be used to obtain upper and lower bounds on the number of training examples needed to learn to prescribed levels of accuracy. Most of the known bounds apply to the Probably Approximately Correct (PAC) framework, which is the framework within which we work in this paper. For a learning problem with some known VC dimension, much is known about the order of growth of the sample-size requirement of the problem, as a function of the PAC parameters. The exact value of the sample-size requirement is, however, less well known, and depends heavily on the particular learning algorithm being used. This is a major obstacle to the practical application of the VC dimension. Hence it is important to know exactly how the sample-size requirement depends on the VC dimension, and with that in mind, we describe a general algorithm for learning problems having VC dimension 1. Its sample-size requirement is minimal (as a function of the PAC parameters), and turns out to be the same for all non-trivial learning problems having VC dimension 1. While the method used cannot be naively generalised to higher VC dimension, it suggests that optimal algorithm-dependent bounds may improve substantially on current upper bounds.
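
A brute-force computation of the quantity discussed above (not the paper's algorithm) can make the definition concrete; the helper below is written for small finite classes only and checks shattering directly.

```python
from itertools import combinations

def vc_dimension(domain, concepts):
    """Brute-force VC dimension of a finite concept class over a finite domain.

    `concepts` is an iterable of sets (subsets of `domain`).  A set S is
    shattered when every one of its 2^|S| subsets arises as c & S for some
    concept c.  Exponential cost; illustration only.
    """
    concepts = [frozenset(c) for c in concepts]
    for k in range(len(domain), 0, -1):          # try the largest sets first
        for S in map(frozenset, combinations(domain, k)):
            if len({c & S for c in concepts}) == 2 ** len(S):
                return k
    return 0

# Threshold concepts on {1, 2, 3, 4} form a class of VC dimension 1,
# the setting targeted by the algorithm described in the abstract.
domain = [1, 2, 3, 4]
thresholds = [{x for x in domain if x >= t} for t in range(1, 6)]
print(vc_dimension(domain, thresholds))  # prints 1
```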

Relevance:

60.00%

Publisher:

Abstract:

We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical VC dimension, empirical VC entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
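
The maximal-discrepancy penalty mentioned above admits a compact computation; the sketch below assumes an even number of training points with ±1 labels and a generic `fit_erm(X, y)` routine, a hypothetical stand-in for whatever empirical risk minimizer is available for the class under study.

```python
import numpy as np

def maximal_discrepancy(X, y, fit_erm):
    """Maximal discrepancy between the two halves of the training data.

    Under 0/1 loss, max over f of [err(f; first half) - err(f; second half)]
    is attained by running ERM on a copy of the sample in which the labels
    of the first half are flipped (labels assumed to be +1/-1).
    """
    half = len(y) // 2
    y_flipped = y.copy()
    y_flipped[:half] = -y_flipped[:half]            # flip first-half labels
    f = fit_erm(X, y_flipped)                       # ERM on the flipped sample
    preds = f.predict(X)
    err_first = np.mean(preds[:half] != y[:half])   # errors on original labels
    err_second = np.mean(preds[half:] != y[half:])
    return err_first - err_second
```

The expected maximal discrepancy mentioned in the abstract can be estimated in the same spirit, by averaging over independently drawn random label flips (Monte Carlo integration).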

Relevance:

60.00%

Publisher:

Abstract:

We present new expected risk bounds for binary and multiclass prediction, and resolve several recent conjectures on sample compressibility due to Kuzmin and Warmuth. By exploiting the combinatorial structure of concept class F, Haussler et al. achieved a VC(F)/n bound for the natural one-inclusion prediction strategy. The key step in their proof is a d = VC(F) bound on the graph density of a subgraph of the hypercube—the one-inclusion graph. The first main result of this paper is a density bound of n·choose(n-1, ≤d-1)/choose(n, ≤d) < d, which positively resolves a conjecture of Kuzmin and Warmuth relating to their unlabeled Peeling compression scheme and also leads to an improved one-inclusion mistake bound. The proof uses a new form of VC-invariant shifting and a group-theoretic symmetrization. Our second main result is an algebraic topological property of maximum classes of VC-dimension d as being d-contractible simplicial complexes, extending the well-known characterization that d = 1 maximum classes are trees. We negatively resolve a minimum degree conjecture of Kuzmin and Warmuth—the second part to a conjectured proof of correctness for Peeling—that every class has one-inclusion minimum degree at most its VC-dimension. Our final main result is a k-class analogue of the d/n mistake bound, replacing the VC-dimension by the Pollard pseudo-dimension and the one-inclusion strategy by its natural hypergraph generalization. This result improves on known PAC-based expected risk bounds by a factor of O(log n) and is shown to be optimal up to an O(log k) factor. The combinatorial technique of shifting takes a central role in understanding the one-inclusion (hyper)graph and is a running theme throughout.

Relevance:

60.00%

Publisher:

Abstract:

We present new expected risk bounds for binary and multiclass prediction, and resolve several recent conjectures on sample compressibility due to Kuzmin and Warmuth. By exploiting the combinatorial structure of concept class F, Haussler et al. achieved a VC(F)/n bound for the natural one-inclusion prediction strategy. The key step in their proof is a d=VC(F) bound on the graph density of a subgraph of the hypercube—the one-inclusion graph. The first main result of this report is a density bound of n·choose(n-1, ≤d-1)/choose(n, ≤d) < d, which positively resolves a conjecture of Kuzmin and Warmuth relating to their unlabeled Peeling compression scheme and also leads to an improved one-inclusion mistake bound. The proof uses a new form of VC-invariant shifting and a group-theoretic symmetrization. Our second main result is an algebraic topological property of maximum classes of VC-dimension d as being d-contractible simplicial complexes, extending the well-known characterization that d=1 maximum classes are trees. We negatively resolve a minimum degree conjecture of Kuzmin and Warmuth—the second part to a conjectured proof of correctness for Peeling—that every class has one-inclusion minimum degree at most its VC-dimension. Our final main result is a k-class analogue of the d/n mistake bound, replacing the VC-dimension by the Pollard pseudo-dimension and the one-inclusion strategy by its natural hypergraph generalization. This result improves on known PAC-based expected risk bounds by a factor of O(log n) and is shown to be optimal up to an O(log k) factor. The combinatorial technique of shifting takes a central role in understanding the one-inclusion (hyper)graph and is a running theme throughout.
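
For readability, the "choose(n, ≤d)" notation appearing in this density bound is the standard sum of binomial coefficients:

```latex
% Notation used in the density bound above.
\[
  \binom{n}{\le d} \;=\; \sum_{i=0}^{d} \binom{n}{i},
  \qquad\text{so the stated bound reads}\qquad
  \frac{n\,\binom{n-1}{\le d-1}}{\binom{n}{\le d}} \;<\; d .
\]
```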

Relevance:

60.00%

Publisher:

Abstract:

The generalization performance of the SVM classifier depends mainly on the VC dimension and the dimensionality of the data. By reducing the VC dimension of the SVM classifier, its generalization performance is expected to increase. In the present paper, we argue that the VC dimension of the SVM classifier can be reduced by applying bootstrapping and dimensionality reduction techniques. Experimental results showed that bootstrapping the original data and bootstrapping the projected (dimensionally reduced) data improved the performance of the SVM classifier.
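
As a rough illustration of the kind of pipeline described above (not the authors' exact procedure), bootstrapping and PCA-based dimensionality reduction can be combined with an SVM as follows, assuming scikit-learn:

```python
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.utils import resample

def bootstrap_pca_svm(X_train, y_train, X_test, n_components=10, seed=0):
    # Bootstrap the original training data (sampling with replacement).
    X_bs, y_bs = resample(X_train, y_train, random_state=seed)

    # Project onto a lower-dimensional subspace (dimensionality reduction).
    pca = PCA(n_components=n_components).fit(X_bs)
    X_bs_p, X_test_p = pca.transform(X_bs), pca.transform(X_test)

    # Fit the SVM on the bootstrapped, projected data and predict.
    clf = SVC(kernel="rbf").fit(X_bs_p, y_bs)
    return clf.predict(X_test_p)
```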

Relevance:

60.00%

Publisher:

Abstract:

The Support Vector (SV) machine is a novel type of learning machine, based on statistical learning theory, which contains polynomial classifiers, neural networks, and radial basis function (RBF) networks as special cases. In the RBF case, the SV algorithm automatically determines centers, weights and threshold such as to minimize an upper bound on the expected test error. The present study is devoted to an experimental comparison of these machines with a classical approach, where the centers are determined by k-means clustering and the weights are found using error backpropagation. We consider three machines, namely a classical RBF machine, an SV machine with Gaussian kernel, and a hybrid system with the centers determined by the SV method and the weights trained by error backpropagation. Our results show that on the US postal service database of handwritten digits, the SV machine achieves the highest test accuracy, followed by the hybrid approach. The SV approach is thus not only theoretically well-founded, but also superior in a practical application.
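
A compact sketch of the "classical" RBF machine in this comparison: centers chosen by k-means, then an output layer trained on the Gaussian activations. A logistic-regression fit stands in here for the error-backpropagation training of the output weights (a simplifying assumption), using scikit-learn:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def classical_rbf_fit(X, y, n_centers=50, gamma=0.1, seed=0):
    # Centers determined by k-means clustering of the training inputs.
    km = KMeans(n_clusters=n_centers, n_init=10, random_state=seed).fit(X)
    centers = km.cluster_centers_

    def rbf_features(Z):
        # Gaussian activations exp(-gamma * ||z - c||^2) for each center c.
        sq_dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq_dists)

    # Output weights fitted on the RBF activations.
    out = LogisticRegression(max_iter=1000).fit(rbf_features(X), y)
    return lambda Z: out.predict(rbf_features(Z))
```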

Relevance:

60.00%

Publisher:

Abstract:

Spam is commonly defined as unsolicited email messages, and the goal of spam categorization is to distinguish between spam and legitimate email messages. Spam used to be considered a mere nuisance, but due to the abundant amounts of spam being sent today, it has progressed from being a nuisance to becoming a major problem. Spam filtering is able to control the problem in a variety of ways. Much research in spam filtering has been centred on the more sophisticated classifier-related issues, and machine learning for spam classification is currently an important research issue. Support Vector Machines (SVMs) are a new learning method that achieves substantial improvements over the currently preferred methods and behaves robustly across a variety of different learning tasks. Because of their ability to handle high-dimensional input with few irrelevant features, and their high accuracy, SVMs are of particular interest to researchers for categorizing spam. This paper explores and identifies the use of different learning algorithms for classifying spam and legitimate e-mail messages. A comparative analysis of the filtering techniques is also presented.
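
A minimal sketch of the SVM-based filtering approach discussed above, assuming scikit-learn; the message strings and labels are toy placeholders, not data from the paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in corpus; a real filter would be trained on a labelled mail archive.
messages = ["win money now!!!", "meeting moved to 10am tomorrow", "claim your free prize"]
labels = ["spam", "ham", "spam"]

# TF-IDF bag-of-words features feeding a linear SVM: the high-dimensional,
# sparse regime in which SVMs behave robustly, as the abstract notes.
spam_filter = make_pipeline(TfidfVectorizer(), LinearSVC())
spam_filter.fit(messages, labels)
print(spam_filter.predict(["free money, click here"]))
```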

Relevance:

30.00%

Publisher:

Abstract:

Startups’ contributions to economic growth have been widely recognized. However, the funding gap is often a problem limiting startups’ development. To some extent, VC can be a means to solve this problem: VC is one of the optimal financial intermediaries for startups. This dissertation focuses on two streams of VC research: the criteria used by venture capitalists to evaluate startups, and the effect of VC on innovation. First, although many criteria have been analyzed, the effect of startup reputation on VC funding has not been empirically assessed. Yet reputation is usually positively related to firm performance, which may affect VC funding. Analyzing reputation along a generalized-visibility dimension and a generalized-favorability dimension, using a sample of 200 startups founded from 1995 onwards and operating in the UK MNT sector, we show that both dimensions of reputation positively influence the likelihood of receiving VC funding. We also find that management team heterogeneity positively influences the likelihood of receiving VC funding. Second, studies investigating the effect of venture capital on innovation have frequently resorted to patent data. However, innovation is a process leading from invention to successful commercialization, and while patents capture the upstream side of innovative performance, they poorly describe its downstream side. By reflecting the introduction of new products or services, trademarks can complete the picture, but empirical studies on trademarking in startups are rare. Analyzing a sample of 192 startups founded from 1996 onwards and operating in the UK MNT sector, we find that VC funding has a positive effect on the propensity to register trademarks, as well as on the number and breadth of trademarks.

Relevance:

20.00%

Publisher:

Abstract:

A method is presented for the development of a regional Landsat-5 Thematic Mapper (TM) and Landsat-7 Enhanced Thematic Mapper plus (ETM+) spectral greenness index, coherent with a six-dimensional index set, based on a single ETM+ spectral image of a reference landscape. The first three indices of the set are determined by a polar transformation of the first three principal components of the reference image and relate to scene brightness, percent foliage projective cover (FPC) and water related features. The remaining three principal components, of diminishing significance with respect to the reference image, complete the set. The reference landscape, a 2200 km² area containing a mix of cattle pasture, native woodland and forest, is located near Injune in South East Queensland, Australia. The indices developed from the reference image were tested using TM spectral images from 19 regionally dispersed areas in Queensland, representative of dissimilar landscapes containing woody vegetation ranging from tall closed forest to low open woodland. Examples of image transformations and two-dimensional feature space plots are used to demonstrate image interpretations related to the first three indices. Coherent, sensible interpretations of landscape features in images composed of the first three indices can be made in terms of brightness (red), foliage cover (green) and water (blue). A limited comparison is made with similar existing indices. The proposed greenness index was found to be very strongly related to FPC and insensitive to smoke. A novel Bayesian, bounded-space modelling method was used to validate the greenness index as a good predictor of FPC. Airborne LiDAR (Light Detection and Ranging) estimates of FPC along transects of the 19 sites provided the training and validation data. Other spectral indices from the set were found to be useful as model covariates that could improve FPC predictions. They act to adjust the greenness/FPC relationship to suit different spectral backgrounds. The inclusion of an external meteorological covariate showed that further improvements to regional-scale predictions of FPC could be gained over those based on spectral indices alone.
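
Purely as an illustration of the general construction (a polar transformation of leading principal components), and not a reproduction of the paper's reference-image coefficients, the first steps might look like this:

```python
import numpy as np

def pc_polar_axes(pixels):
    """pixels: (n_pixels, n_bands) array of reflectances; illustration only."""
    centred = pixels - pixels.mean(axis=0)
    # Principal components via SVD of the centred band matrix.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt.T                      # columns are PC1, PC2, PC3, ...
    # Polar form of the first two components: a brightness-like radius and an
    # angle that a calibration step would rotate to align with foliage cover.
    radius = np.hypot(scores[:, 0], scores[:, 1])
    angle = np.arctan2(scores[:, 1], scores[:, 0])
    return radius, angle, scores[:, 2]           # third axis left untransformed
```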

Relevance:

20.00%

Publisher:

Abstract:

In a mini review from 2002, Tyler Jacks and Robert Weinberg commented on the pioneering three-dimensional (3D) culture work from Bissell laboratories and concluded: “Suddenly the study of cancer cells in two dimensions seems quaint if not archaic.” The relevance of this statement for planning and executing mechanistic biological studies and advanced drug testing has been largely disregarded by both academic researchers and the pharmaceutical and biomedical industry in the twenty-first century.

Relevance:

20.00%

Publisher:

Abstract:

Medical personnel serving with the Defence Forces have contributed to the evolution of trauma treatment and the advancement of prehospital care within the military environment. This paper investigates the stories of an Australian Medical Officer, Sir Neville Howse, and two stretcher bearers, Private John Simpson (Kirkpatrick) and Private Martin O’Meara. In particular, it describes the gruelling conditions under which they performed their roles, and reflects on the legacy that they have left behind in Australian society. While it is widely acknowledged that conflicts such as World War One should never have happened, as civilian and defence force paramedics, we should never forget the service and sacrifice of defence force medical personnel and their contribution to the body of knowledge on the treatment of trauma. These men and women bravely provided emergency care in the most harrowing conditions possible. However, men like Martin O’Meara may not have been given the same status in society today as Sir Neville Howse or Simpson and his donkey, due to the public’s lack of awareness and acceptance of war neurosis and conditions such as post-traumatic stress disorder, reactive psychosis and somatoform disorders which were suffered by many soldiers during their wartime service and on their return home after fighting in war.

Relevance:

20.00%

Publisher:

Abstract:

In this paper, a rate-based flow control scheme based upon per-VC virtual queuing is proposed for the Available Bit Rate (ABR) service in ATM. In this scheme, each VC in a shared buffer is assigned a virtual queue, which is a counter. To achieve a specific kind of fairness, an appropriate scheduler is applied to the virtual queues. Each VC's bottleneck rate (fair share) is derived from its virtual cell departure rate. This approach of deriving a VC's fair share is simple and accurate. By controlling each VC with respect to its virtual queue and queue build-up in the shared buffer, network congestion is avoided. The principle of the control scheme is first illustrated by max–min flow control, which is realised by scheduling the virtual queues in round-robin. Further application of the control scheme is demonstrated with the achievement of weighted fairness through weighted round robin scheduling. Simulation results show that with a simple computation, the proposed scheme achieves the desired fairness exactly and controls network congestion effectively.
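
A toy sketch of the per-VC virtual-queuing idea described above: each VC gets a counter that mirrors its arrivals, a round-robin scheduler serves the counters, and a VC's fair share is read off from its virtual departure rate. Cell timing, RM-cell feedback and buffer thresholds are omitted; the class and method names are illustrative, not from the paper.

```python
from collections import deque

class VirtualQueues:
    """Per-VC virtual queues (counters) served in round-robin order."""

    def __init__(self, vc_ids):
        self.counts = {vc: 0 for vc in vc_ids}       # virtual queue lengths
        self.departures = {vc: 0 for vc in vc_ids}   # virtual cells served
        self.order = deque(vc_ids)                   # round-robin service order

    def arrival(self, vc):
        self.counts[vc] += 1                          # mirror a real cell arrival

    def serve_one_round(self):
        # One round-robin pass: at most one virtual cell served per backlogged VC.
        for _ in range(len(self.order)):
            vc = self.order[0]
            self.order.rotate(-1)
            if self.counts[vc] > 0:
                self.counts[vc] -= 1
                self.departures[vc] += 1

    def fair_share(self, vc, elapsed_slots):
        # Bottleneck-rate (fair-share) estimate: the VC's virtual departure rate.
        return self.departures[vc] / elapsed_slots
```

Replacing the plain round-robin pass with a weighted round-robin pass gives the weighted-fairness variant described in the abstract.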