5 results for MDL
in Helda - Digital Repository of University of Helsinki
Abstract:
The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantiation is based on the so-called Normalized Maximum Likelihood (NML) distribution, which has been shown to possess several important theoretical properties. However, applications of this modern version of MDL have been quite rare because of computational complexity problems: for discrete data, the definition of NML involves an exponential sum, and for continuous data, a multi-dimensional integral that is usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
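For orientation, the NML distribution and the stochastic complexity mentioned in this abstract are usually written as follows for discrete data. The notation is an assumption chosen for this note rather than taken from the dissertation: $x^n$ is the observed sample, $\mathcal{M}$ the model class, and $\hat{\theta}(\cdot)$ the maximum likelihood estimate.

```latex
P_{\mathrm{NML}}(x^n \mid \mathcal{M})
  = \frac{P\bigl(x^n \mid \hat{\theta}(x^n), \mathcal{M}\bigr)}
         {\sum_{y^n} P\bigl(y^n \mid \hat{\theta}(y^n), \mathcal{M}\bigr)},
\qquad
\mathrm{SC}(x^n \mid \mathcal{M}) = -\log P_{\mathrm{NML}}(x^n \mid \mathcal{M}).
```

The sum in the denominator runs over every possible sample $y^n$ of length $n$, so the number of terms grows exponentially in $n$. This is the exponential sum the abstract refers to.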
Abstract:
In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data, with an application to mobile device positioning. In the second part of the Thesis, we discuss so-called Bayesian network classifiers and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts and to noise reduction in digital signals.
Abstract:
Minimum Description Length (MDL) is an information-theoretic principle that can be used for model selection and other statistical inference tasks. There are various ways to use the principle in practice. One theoretically valid way is to use the normalized maximum likelihood (NML) criterion. Due to computational difficulties, this approach has not been used very often. This thesis presents efficient floating-point algorithms that make it possible to compute the NML for multinomial, Naive Bayes and Bayesian forest models. None of the presented algorithms relies on asymptotic analysis, and for the first two model classes we also discuss how to compute exact rational number solutions.
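To make the multinomial case concrete, here is a minimal Python sketch of an exact rational-number computation of the NML normalizing sum. It is not the thesis's algorithm, which this abstract does not describe. It assumes a known recurrence for the multinomial normalizer, C(k+2, n) = C(k+1, n) + (n/k) C(k, n), and checks it against a brute-force sum over all count vectors. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def binary_normalizer(n):
    """C(2, n): NML normalizer for a 2-valued multinomial, sample size n >= 1."""
    total = Fraction(0)
    for h in range(n + 1):
        # Maximized likelihood of any sequence with h ones, times the
        # number of such sequences. Python evaluates 0 ** 0 as 1.
        total += comb(n, h) * Fraction(h, n) ** h * Fraction(n - h, n) ** (n - h)
    return total


def multinomial_normalizer(k, n):
    """C(k, n) via the assumed recurrence C(j+2, n) = C(j+1, n) + (n/j) C(j, n)."""
    if k < 1 or n < 1:
        raise ValueError("k and n must be positive integers")
    prev = Fraction(1)            # C(1, n) is always 1
    if k == 1:
        return prev
    curr = binary_normalizer(n)   # C(2, n)
    for j in range(1, k - 1):     # each step produces C(j + 2, n)
        prev, curr = curr, curr + Fraction(n, j) * prev
    return curr


def brute_force_normalizer(k, n):
    """Reference value: sum the maximized likelihood over all count vectors."""
    def count_vectors(parts, total):
        if parts == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in count_vectors(parts - 1, total - first):
                yield (first,) + rest

    result = Fraction(0)
    for counts in count_vectors(k, n):
        coef = factorial(n)
        term = Fraction(1)
        for h in counts:
            coef //= factorial(h)           # builds the multinomial coefficient
            term *= Fraction(h, n) ** h     # maximized likelihood factor
        result += coef * term
    return result


if __name__ == "__main__":
    for k in range(1, 5):
        for n in range(1, 6):
            assert multinomial_normalizer(k, n) == brute_force_normalizer(k, n)
    print(multinomial_normalizer(3, 2))
```

Using `Fraction` keeps every intermediate value exact, in the spirit of the rational-number solutions mentioned above. Replacing it with floats would trade exactness for speed.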
Abstract:
Simplicity is a powerful principle of inductive inference. It is present in many everyday situations as an informal rule of thumb according to which the simplest explanation is the best. The principle of simplicity, or Occam's razor, can also be applied as a basis for statistical inference. Its formal version, the so-called minimum description length principle (MDL principle), ranks alternative hypotheses according to which of them allows the shortest description of the data, when the description also includes the hypothesis itself. Methods from information theory and data compression are applied to determine the description length. In this article, I present some concepts of information theory. The second half of the article goes through the basics of the MDL principle.