19 resultados para hashing
em Queensland University of Technology - ePrints Archive
Resumo:
Robust image hashing seeks to transform a given input image into a shorter hashed version using a key-dependent non-invertible transform. These image hashes can be used for watermarking, image integrity authentication or image indexing for fast retrieval. This paper introduces a new method of generating image hashes based on extracting Higher Order Spectral features from the Radon projection of an input image. The feature extraction process is non-invertible, non-linear and different hashes can be produced from the same image through the use of random permutations of the input. We show that the transform is robust to typical image transformations such as JPEG compression, noise, scaling, rotation, smoothing and cropping. We evaluate our system using a verification-style framework based on calculating false match, false non-match likelihoods using the publicly available Uncompressed Colour Image database (UCID) of 1320 images. We also compare our results to Swaminathan’s Fourier-Mellin based hashing method with at least 1% EER improvement under noise, scaling and sharpening.
Resumo:
In this article, we study the security of the IDEA block cipher when it is used in various simple-length or double-length hashing modes. Even though this cipher is still considered as secure, we show that one should avoid its use as internal primitive for block cipher based hashing. In particular, we are able to generate instantaneously free-start collisions for most modes, and even semi-free-start collisions, pseudo-preimages or hash collisions in practical complexity. This work shows a practical example of the gap that exists between secret-key and known or chosen-key security for block ciphers. Moreover, we also settle the 20-year-old standing open question concerning the security of the Abreast-DM and Tandem-DM double-length compression functions, originally invented to be instantiated with IDEA. Our attacks have been verified experimentally and work even for strengthened versions of IDEA with any number of rounds.
Resumo:
Determination of sequence similarity is a central issue in computational biology, a problem addressed primarily through BLAST, an alignment based heuristic which has underpinned much of the analysis and annotation of the genomic era. Despite their success, alignment-based approaches scale poorly with increasing data set size, and are not robust under structural sequence rearrangements. Successive waves of innovation in sequencing technologies – so-called Next Generation Sequencing (NGS) approaches – have led to an explosion in data availability, challenging existing methods and motivating novel approaches to sequence representation and similarity scoring, including adaptation of existing methods from other domains such as information retrieval. In this work, we investigate locality-sensitive hashing of sequences through binary document signatures, applying the method to a bacterial protein classification task. Here, the goal is to predict the gene family to which a given query protein belongs. Experiments carried out on a pair of small but biologically realistic datasets (the full protein repertoires of families of Chlamydia and Staphylococcus aureus genomes respectively) show that a measure of similarity obtained by locality sensitive hashing gives highly accurate results while offering a number of avenues which will lead to substantial performance improvements over BLAST..
Resumo:
This thesis studies document signatures, which are small representations of documents and other objects that can be stored compactly and compared for similarity. This research finds that document signatures can be effectively and efficiently used to both search and understand relationships between documents in large collections, scalable enough to search a billion documents in a fraction of a second. Deliverables arising from the research include an investigation of the representational capacity of document signatures, the publication of an open-source signature search platform and an approach for scaling signature retrieval to operate efficiently on collections containing hundreds of millions of documents.
Resumo:
Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these were not so far linked to general purpose, signature file based, search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature file based indexing and retrieval, performance that is comparable to that of state of the art inverted file based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and positions the file signatures model in the class of Vector Space retrieval models.
Resumo:
Robust hashing is an emerging field that can be used to hash certain data types in applications unsuitable for traditional cryptographic hashing methods. Traditional hashing functions have been used extensively for data/message integrity, data/message authentication, efficient file identification and password verification. These applications are possible because the hashing process is compressive, allowing for efficient comparisons in the hash domain but non-invertible meaning hashes can be used without revealing the original data. These techniques were developed with deterministic (non-changing) inputs such as files and passwords. For such data types a 1-bit or one character change can be significant, as a result the hashing process is sensitive to any change in the input. Unfortunately, there are certain applications where input data are not perfectly deterministic and minor changes cannot be avoided. Digital images and biometric features are two types of data where such changes exist but do not alter the meaning or appearance of the input. For such data types cryptographic hash functions cannot be usefully applied. In light of this, robust hashing has been developed as an alternative to cryptographic hashing and is designed to be robust to minor changes in the input. Although similar in name, robust hashing is fundamentally different from cryptographic hashing. Current robust hashing techniques are not based on cryptographic methods, but instead on pattern recognition techniques. Modern robust hashing algorithms consist of feature extraction followed by a randomization stage that introduces non-invertibility and compression, followed by quantization and binary encoding to produce a binary hash output. In order to preserve robustness of the extracted features, most randomization methods are linear and this is detrimental to the security aspects required of hash functions. Furthermore, the quantization and encoding stages used to binarize real-valued features requires the learning of appropriate quantization thresholds. How these thresholds are learnt has an important effect on hashing accuracy and the mere presence of such thresholds are a source of information leakage that can reduce hashing security. This dissertation outlines a systematic investigation of the quantization and encoding stages of robust hash functions. While existing literature has focused on the importance of quantization scheme, this research is the first to emphasise the importance of the quantizer training on both hashing accuracy and hashing security. The quantizer training process is presented in a statistical framework which allows a theoretical analysis of the effects of quantizer training on hashing performance. This is experimentally verified using a number of baseline robust image hashing algorithms over a large database of real world images. This dissertation also proposes a new randomization method for robust image hashing based on Higher Order Spectra (HOS) and Radon projections. The method is non-linear and this is an essential requirement for non-invertibility. The method is also designed to produce features more suited for quantization and encoding. The system can operate without the need for quantizer training, is more easily encoded and displays improved hashing performance when compared to existing robust image hashing algorithms. The dissertation also shows how the HOS method can be adapted to work with biometric features obtained from 2D and 3D face images.
Resumo:
The invention relates to a method for monitoring user activity on a mobile device, comprising an input and an output unit, comprising the following steps preferably in the following order: detecting and / or logging user activity on said input unit, identifying a foreground running application, hashing of a user-interface-element management list of the foreground running application, and creating a screenshot comprising items displayed on said input unit. The invention also relates to a method for analyzing user activity at a server, comprising the following step: obtaining at least one of an information about detected and / or logged user activity, an information about a foreground running application, a hashed user-interface-element management list and a screenshot from a mobile device. Further, a computer program product is provided, comprising one or more computer readable media having computer executable instructions for performing the steps of at least one of the aforementioned methods.
Resumo:
This paper describes a new method of indexing and searching large binary signature collections to efficiently find similar signatures, addressing the scalability problem in signature search. Signatures offer efficient computation with acceptable measure of similarity in numerous applications. However, performing a complete search with a given search argument (a signature) requires a Hamming distance calculation against every signature in the collection. This quickly becomes excessive when dealing with large collections, presenting issues of scalability that limit their applicability. Our method efficiently finds similar signatures in very large collections, trading memory use and precision for greatly improved search speed. Experimental results demonstrate that our approach is capable of finding a set of nearest signatures to a given search argument with a high degree of speed and fidelity.
Resumo:
RC4-Based Hash Function is a new proposed hash function based on RC4 stream cipher for ultra low power devices. In this paper, we analyse the security of the function against collision attack. It is shown that the attacker can find collision and multi-collision messages with complexity only 6 compress function operations and negligible memory with time complexity 2 13. In addition, we show the hashing algorithm can be distinguishable from a truly random sequence with probability close to one.
Resumo:
In this chapter, we discuss four related areas of cryptology, namely, authentication, hashing, message authentication codes (MACs), and digital signatures. These topics represent active and growing research topics in cryptology. Space limitations allow us to concentrate only on the essential aspects of each topic. The bibliography is intended to supplement our survey. We have selected those items which providean overview of the current state of knowledge in the above areas. Authentication deals with the problem of providing assurance to a receiver that a communicated message originates from a particular transmitter, and that the received message has the same content as the transmitted message. A typical authentication scenario occurs in computer networks, where the identity of two communicating entities is established by means of authentication. Hashing is concerned with the problem of providing a relatively short digest–fingerprint of a much longer message or electronic document. A hashing function must satisfy (at least) the critical requirement that the fingerprints of two distinct messages are distinct. Hashing functions have numerous applications in cryptology. They are often used as primitives to construct other cryptographic functions. MACs are symmetric key primitives that provide message integrity against active spoofing by appending a cryptographic checksum to a message that is verifiable only by the intended recipient of the message. Message authentication is one of the most important ways of ensuring the integrity of information that is transferred by electronic means. Digital signatures provide electronic equivalents of handwritten signatures. They preserve the essential features of handwritten signatures and can be used to sign electronic documents. Digital signatures can potentially be used in legal contexts.
Resumo:
Modular arithmetic has often been regarded as something of a mathematical curiosity, at least by those unfamiliar with its importance to both abstract algebra and number theory, and with its numerous applications. However, with the ubiquity of fast digital computers, and the need for reliable digital security systems such as RSA, this important branch of mathematics is now considered essential knowledge for many professionals. Indeed, computer arithmetic itself is, ipso facto, modular. This chapter describes how the modern graphical spreadsheet may be used to clearly illustrate the basics of modular arithmetic, and to solve certain classes of problems. Students may then gain structural insight and the foundations laid for applications to such areas as hashing, random number generation, and public-key cryptography.
Resumo:
Universal One-Way Hash Functions (UOWHFs) may be used in place of collision-resistant functions in many public-key cryptographic applications. At Asiacrypt 2004, Hong, Preneel and Lee introduced the stronger security notion of higher order UOWHFs to allow construction of long-input UOWHFs using the Merkle-Damgård domain extender. However, they did not provide any provably secure constructions for higher order UOWHFs. We show that the subset sum hash function is a kth order Universal One-Way Hash Function (hashing n bits to m < n bits) under the Subset Sum assumption for k = O(log m). Therefore we strengthen a previous result of Impagliazzo and Naor, who showed that the subset sum hash function is a UOWHF under the Subset Sum assumption. We believe our result is of theoretical interest; as far as we are aware, it is the first example of a natural and computationally efficient UOWHF which is also a provably secure higher order UOWHF under the same well-known cryptographic assumption, whereas this assumption does not seem sufficient to prove its collision-resistance. A consequence of our result is that one can apply the Merkle-Damgård extender to the subset sum compression function with ‘extension factor’ k+1, while losing (at most) about k bits of UOWHF security relative to the UOWHF security of the compression function. The method also leads to a saving of up to m log(k+1) bits in key length relative to the Shoup XOR-Mask domain extender applied to the subset sum compression function.
Resumo:
This thesis presents new methods for classification and thematic grouping of billions of web pages, at scales previously not achievable. This process is also known as document clustering, where similar documents are automatically associated with clusters that represent various distinct topic. These automatically discovered topics are in turn used to improve search engine performance by only searching the topics that are deemed relevant to particular user queries.
Resumo:
At CRYPTO 2006, Halevi and Krawczyk proposed two randomized hash function modes and analyzed the security of digital signature algorithms based on these constructions. They showed that the security of signature schemes based on the two randomized hash function modes relies on properties similar to the second preimage resistance rather than on the collision resistance property of the hash functions. One of the randomized hash function modes was named the RMX hash function mode and was recommended for practical purposes. The National Institute of Standards and Technology (NIST), USA standardized a variant of the RMX hash function mode and published this standard in the Special Publication (SP) 800-106. In this article, we first discuss a generic online birthday existential forgery attack of Dang and Perlner on the RMX-hash-then-sign schemes. We show that a variant of this attack can be applied to forge the other randomize-hash-then-sign schemes. We point out practical limitations of the generic forgery attack on the RMX-hash-then-sign schemes. We then show that these limitations can be overcome for the RMX-hash-then-sign schemes if it is easy to find fixed points for the underlying compression functions, such as for the Davies-Meyer construction used in the popular hash functions such as MD5 designed by Rivest and the SHA family of hash functions designed by the National Security Agency (NSA), USA and published by NIST in the Federal Information Processing Standards (FIPS). We show an online birthday forgery attack on this class of signatures by using a variant of Dean’s method of finding fixed point expandable messages for hash functions based on the Davies-Meyer construction. This forgery attack is also applicable to signature schemes based on the variant of RMX standardized by NIST in SP 800-106. We discuss some important applications of our attacks and discuss their applicability on signature schemes based on hash functions with ‘built-in’ randomization. Finally, we compare our attacks on randomize-hash-then-sign schemes with the generic forgery attacks on the standard hash-based message authentication code (HMAC).