Fast content-based file type identification


Author(s): Ahmed, Irfan; Lhee, Kyung-Suk; Shin, Hyun-Jung; Hong, Man-Pyo
Contributor(s)

Sujeet, Shenoi

Peterson, Bert

Date(s)

08/08/2011

Abstract

Digital forensic examiners often need to identify the type of a file or file fragment based only on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds classification by sampling several blocks from the file. Experimental results demonstrate that up to a fifteen-fold reduction in file size analysis time can be achieved with limited impact on accuracy.
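The two speed-ups named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the block size, the number of blocks, the random block-selection strategy, and the "keep the k most frequent byte values" selection rule are all assumptions made for illustration, and the classifier itself is omitted.

```python
import random
from collections import Counter
from typing import List, Sequence


def sample_blocks(data: bytes, block_size: int, num_blocks: int, seed: int = 0) -> bytes:
    """Return num_blocks randomly chosen, block-aligned blocks of data.

    Hypothetical helper: the sampling strategy (uniform random, fixed seed)
    is an assumption, not taken from the paper.
    """
    total = (len(data) + block_size - 1) // block_size  # number of blocks in the file
    if total <= num_blocks:
        return data  # file is small enough to analyze in full
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(total), num_blocks))
    return b"".join(data[i * block_size:(i + 1) * block_size] for i in chosen)


def byte_frequency(data: bytes) -> List[float]:
    """Relative frequency of each of the 256 byte values in data."""
    counts = Counter(data)
    n = len(data) or 1  # avoid division by zero on empty input
    return [counts.get(b, 0) / n for b in range(256)]


def top_k_features(distributions: Sequence[Sequence[float]], k: int) -> List[int]:
    """Pick the k byte values with the highest summed frequency.

    Assumed selection rule: rank byte values by total frequency across the
    training distributions and keep the top k (ties broken by byte value).
    """
    totals = [sum(d[b] for d in distributions) for b in range(256)]
    ranked = sorted(range(256), key=lambda b: (-totals[b], b))
    return sorted(ranked[:k])


if __name__ == "__main__":
    content = b"<html><body>hello</body></html>" * 500
    sampled = sample_blocks(content, block_size=512, num_blocks=4)
    freq = byte_frequency(sampled)
    features = top_k_features([freq], k=8)
    reduced_vector = [freq[b] for b in features]  # input to a classifier
    print(len(sampled), features, reduced_vector)
```

Under these assumptions the classifier would see a vector of k values computed from a few blocks rather than 256 values computed from the whole file, which is where the reduction in analysis time would come from.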

Format

application/pdf

Identifier

http://eprints.qut.edu.au/41535/

Relation

http://eprints.qut.edu.au/41535/1/41535.pdf

http://www.ifiptc11.org/index.php?id=411&no_cache=1

Ahmed, Irfan, Lhee, Kyung-Suk, Shin, Hyun-Jung, & Hong, Man-Pyo (2011) Fast content-based file type identification. In Sujeet, Shenoi & Peterson, Bert (Eds.) 7th Annual IFIP WG 11.9 International Conference on Digital Forensics, January 30 - February 2, 2011, Orlando, Florida.

Rights

Copyright 2011 Springer

This is the author's version of the work. The conference proceedings, published by Springer Verlag, will be available via SpringerLink: http://www.springerlink.com

Source

Information Security Institute

Keywords #080303 Computer System Security #File type identification #File content classification #Byte frequency

Type

Conference Paper