Exact Probability Distribution versus Entropy


Author(s): Andersson, Kerstin
Date(s)

2014

Abstract

The problem addressed concerns the determination of the average number of successive attempts of guessing a word of a certain length consisting of letters with given probabilities of occurrence. Both first- and second-order approximations to a natural language are considered. The guessing strategy used is guessing words in decreasing order of probability. When word and alphabet sizes are large, approximations are necessary in order to estimate the number of guesses. Several kinds of approximations are discussed, demonstrating moderate requirements regarding both memory and central processing unit (CPU) time. When considering realistic sizes of alphabets and words (100), the number of guesses can be estimated within minutes with reasonable accuracy (a few percent) and may therefore constitute an alternative to, e.g., various entropy expressions. For many probability distributions, the density of the logarithm of probability products is close to a normal distribution. For those cases, it is possible to derive an analytical expression for the average number of guesses. The proportion of guesses needed on average compared to the total number decreases almost exponentially with the word length. The leading term in an asymptotic expansion can be used to estimate the number of guesses for large word lengths. Comparisons with analytical lower bounds and entropy expressions are also provided.
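As a point of reference for the approximations described above, the following minimal Python sketch (not taken from the paper; the letter probabilities and word lengths are hypothetical) computes the exact average number of guesses under a first-order model with independently drawn letters, using the stated strategy of guessing words in decreasing order of probability. Because it enumerates every possible word, it is feasible only for small alphabets and short words, which is precisely why approximations are needed at realistic sizes.

    # Minimal sketch (not the paper's approximation scheme): exact average
    # number of guesses under a first-order model, i.e. letters drawn i.i.d.
    # A word ranked k-th in decreasing order of probability costs k guesses;
    # the average weights each rank by the word's probability.
    from itertools import product
    from math import prod

    def average_guesses(letter_probs, word_length):
        # The probability of a word is the product of its letter probabilities.
        word_probs = [prod(p) for p in product(letter_probs, repeat=word_length)]
        word_probs.sort(reverse=True)  # optimal guessing order
        return sum(rank * p for rank, p in enumerate(word_probs, start=1))

    # Hypothetical example: a 3-letter alphabet with skewed probabilities.
    probs = [0.5, 0.3, 0.2]
    for n in (1, 2, 4, 8):
        g = average_guesses(probs, n)
        total = len(probs) ** n
        print(f"length {n}: avg guesses = {g:.2f} of {total} words "
              f"({g / total:.3%} of total)")

The near-normality observation can be read as a central limit effect: the log-probability of a word is a sum of independent per-letter log-probabilities, so for long words its density approaches a normal distribution whose mean is minus the word length times the letter entropy (in nats).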

Format

application/pdf

Identifier

http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-34247

doi:10.3390/e16105198

ISI:000344459500003

Language(s)

eng

Publisher

Karlstads universitet, Institutionen för matematik och datavetenskap

Basel, Switzerland: MDPI AG

Relation

Entropy, ISSN 1099-4300, 2014, 16:10, pp. 5198-5210

Rights

info:eu-repo/semantics/openAccess

Keywords #information entropy #security #guessing

Type

Article in journal

info:eu-repo/semantics/article

text