Fast content-based file type identification
Contributor(s) |
Sujeet, Shenoi; Peterson, Bert |
---|---|
Date(s) |
08/08/2011
|
Abstract |
Digital forensic examiners often need to identify the type of a file or file fragment based only on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds classification by sampling several blocks from the file. Experimental results demonstrate that up to a fifteen-fold reduction in file size analysis time can be achieved with limited impact on accuracy. |
Format |
application/pdf |
Identifier | |
Relation |
http://eprints.qut.edu.au/41535/1/41535.pdf http://www.ifiptc11.org/index.php?id=411&no_cache=1 Ahmed, Irfan, Lhee, Kyung-Suk, Shin, Hyun-Jung, & Hong, Man-Pyo (2011) Fast content-based file type identification. In Sujeet, Shenoi & Peterson, Bert (Eds.) 7th Annual IFIP WG 11.9 International Conference on Digital Forensics, January 30 - February 2, 2011, Orlando, Florida. |
Rights |
Copyright 2011 Springer. This is the author version of the work. The conference proceedings, published by Springer Verlag, will be available via SpringerLink: http://www.springerlink.com |
Source |
Information Security Institute |
Keywords | #080303 Computer System Security #File type identification #File content classification #Byte frequency |
Type |
Conference Paper |
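
The two speed-ups named in the abstract (sampling a few blocks instead of reading the whole file, and keeping only a subset of byte-value features) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the block size, the number of blocks, the random sampling strategy, and the "keep the k most frequent byte values" rule are all assumptions made for illustration only.

```python
import os
import random


def sampled_byte_frequencies(path, block_size=512, num_blocks=8, seed=0):
    """Estimate a normalized byte frequency distribution from sampled blocks.

    Hypothetical helper: block_size, num_blocks and uniform random offsets
    are illustrative assumptions, not values taken from the paper.
    """
    size = os.path.getsize(path)
    counts = [0] * 256
    total = 0
    rng = random.Random(seed)
    max_start = max(size - block_size, 0)
    with open(path, "rb") as f:
        for _ in range(num_blocks):
            # Jump to a random offset and read one block.
            f.seek(rng.randint(0, max_start))
            block = f.read(block_size)
            for b in block:
                counts[b] += 1
            total += len(block)
    if total == 0:
        return [0.0] * 256
    return [c / total for c in counts]


def select_frequent_features(freqs, k=32):
    """Return the k byte values with the highest frequency, in ascending order.

    Assumed selection rule; the paper's exact criterion may differ.
    """
    top = sorted(range(256), key=lambda b: freqs[b], reverse=True)[:k]
    return sorted(top)
```

The reduced feature vector, `[freqs[b] for b in select_frequent_features(freqs)]`, could then be passed to any statistical classifier. Because only `num_blocks * block_size` bytes are read, the cost of building the distribution no longer grows with file size.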