Fast content-based file type identification

**Author(s):** Ahmed, Irfan; Lhee, Kyung-Suk; Shin, Hyun-Jung; Hong, Man-Pyo
Contributor(s)	Sujeet, Shenoi; Peterson, Bert
Date(s)	08/08/2011
Abstract	Digital forensic examiners often need to identify the type of a file or file fragment based only on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time-consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds classification by sampling several blocks from the file. Experimental results demonstrate that up to a fifteen-fold reduction in file size analysis time can be achieved with limited impact on accuracy.
Format	application/pdf
Identifier	http://eprints.qut.edu.au/41535/
Relation	http://eprints.qut.edu.au/41535/1/41535.pdf http://www.ifiptc11.org/index.php?id=411&no_cache=1 Ahmed, Irfan, Lhee, Kyung-Suk, Shin, Hyun-Jung, & Hong, Man-Pyo (2011) Fast content-based file type identification. In Sujeet, Shenoi & Peterson, Bert (Eds.) 7th Annual IFIP WG 11.9 International Conference on Digital Forensics, January 30 - February 2, 2011, Orlando, Florida.
Rights	Copyright 2011 Springer. This is the author-version of the work. Conference proceedings, published by Springer Verlag, will be available via SpringerLink. http://www.springerlink.com
Source	Information Security Institute
Keywords	#080303 Computer System Security #File type identification #File content classification #Byte frequency
Type	Conference Paper
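
The two techniques summarized in the abstract (keeping only a subset of byte-value features chosen by frequency of occurrence, and building the byte frequency distribution from a few sampled blocks instead of the whole file) can be illustrated with a minimal sketch. This record does not give the paper's parameters or its classifier, so the function names, the block size of 512 bytes, the 8 sampled blocks, the random sampling strategy, and the feature count `k` are all assumptions made for illustration only.

```python
import random
from collections import Counter


def sample_blocks(data: bytes, block_size: int = 512,
                  num_blocks: int = 8, seed: int = 0) -> bytes:
    """Concatenate num_blocks randomly chosen blocks (assumed strategy).

    If the input has no more than num_blocks blocks, it is returned whole.
    """
    total = (len(data) + block_size - 1) // block_size  # last block may be short
    if total <= num_blocks:
        return data
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(total), num_blocks))
    return b"".join(data[i * block_size:(i + 1) * block_size] for i in chosen)


def byte_frequency(data: bytes) -> list[float]:
    """Return the normalized 256-bin byte frequency distribution."""
    counts = Counter(data)  # iterating bytes yields integers 0..255
    n = len(data) or 1      # avoid division by zero on empty input
    return [counts.get(b, 0) / n for b in range(256)]


def select_features(training_bfds: list[list[float]], k: int = 32) -> list[int]:
    """Pick the k byte values with the highest mean frequency (assumed criterion)."""
    mean = [sum(col) / len(training_bfds) for col in zip(*training_bfds)]
    top = sorted(range(256), key=lambda b: mean[b], reverse=True)[:k]
    return sorted(top)


def feature_vector(data: bytes, features: list[int], **sampling) -> list[float]:
    """Reduced feature vector computed from sampled blocks only."""
    bfd = byte_frequency(sample_blocks(data, **sampling))
    return [bfd[b] for b in features]
```

The reduced vectors returned by `feature_vector` would then be passed to whichever statistical classifier is in use; both the shorter vector and the smaller amount of data read are where the time savings would come from in a scheme of this kind.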
