4 resultados para Hypothesis testing

em Massachusetts Institute of Technology


Relevância:

60.00% 60.00%

Publicador:

Resumo:

We introduce and explore an approach to estimating statistical significance of classification accuracy, which is particularly useful in scientific applications of machine learning where high dimensionality of the data and the small number of training examples render most standard convergence bounds too loose to yield a meaningful guarantee of the generalization ability of the classifier. Instead, we estimate statistical significance of the observed classification accuracy, or the likelihood of observing such accuracy by chance due to spurious correlations of the high-dimensional data patterns with the class labels in the given training set. We adopt permutation testing, a non-parametric technique previously developed in classical statistics for hypothesis testing in the generative setting (i.e., comparing two probability distributions). We demonstrate the method on real examples from neuroimaging studies and DNA microarray analysis and suggest a theoretical analysis of the procedure that relates the asymptotic behavior of the test to the existing convergence bounds.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Object recognition is complicated by clutter, occlusion, and sensor error. Since pose hypotheses are based on image feature locations, these effects can lead to false negatives and positives. In a typical recognition algorithm, pose hypotheses are tested against the image, and a score is assigned to each hypothesis. We use a statistical model to determine the score distribution associated with correct and incorrect pose hypotheses, and use binary hypothesis testing techniques to distinguish between them. Using this approach we can compare algorithms and noise models, and automatically choose values for internal system thresholds to minimize the probability of making a mistake.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we bound the generalization error of a class of Radial Basis Function networks, for certain well defined function learning tasks, in terms of the number of parameters and number of examples. We show that the total generalization error is partly due to the insufficient representational capacity of the network (because of its finite size) and partly due to insufficient information about the target function (because of finite number of samples). We make several observations about generalization error which are valid irrespective of the approximation scheme. Our result also sheds light on ways to choose an appropriate network architecture for a particular problem.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis describes some aspects of a computer system for doing medical diagnosis in the specialized field of kidney disease. Because such a system faces the spectre of combinatorial explosion, this discussion concentrates on heuristics which control the number of concurrent hypotheses and efficient "compiled" representations of medical knowledge. In particular, the differential diagnosis of hematuria (blood in the urine) is discussed in detail. A protocol of a simulated doctor/patient interaction is presented and analyzed to determine the crucial structures and processes involved in the diagnosis procedure. The data structure proposed for representing medical information revolves around elementary hypotheses which are activated when certain disposing of findings, activating hypotheses, evaluating hypotheses locally and combining hypotheses globally is examined for its heuristic implications. The thesis attempts to fit the problem of medical diagnosis into the framework of other Artifcial Intelligence problems and paradigms and in particular explores the notions of pure search vs. heuristic methods, linearity and interaction, local vs. global knowledge and the structure of hypotheses within the world of kidney disease.