92 results for Pattern matching
at Indian Institute of Science - Bangalore - India
Abstract:
Network Intrusion Detection Systems (NIDS) intercept the traffic at an organization's network periphery to thwart intrusion attempts. A signature-based NIDS compares the intercepted packets against its database of known vulnerabilities and malware signatures to detect such cyber attacks. These signatures are represented using Regular Expressions (REs) and strings. Regular Expressions, because of their higher expressive power, are preferred over simple strings for writing these signatures. We present the Cascaded Automata Architecture, which performs memory-efficient Regular Expression pattern matching using existing string matching solutions. The proposed architecture performs Regular Expression pattern matching in two stages. We replace the substring and character class components of the Regular Expression with new symbols, and we address the challenges involved in this approach. We augment the Word-based Automata, obtained from the rewritten Regular Expressions, with counter-based states and length-bound transitions to perform Regular Expression pattern matching. We evaluated our architecture on Regular Expressions taken from Snort rulesets. We were able to reduce the number of automata states by between 50% and 85%. Additionally, we could reduce the number of transitions by a factor of 3, leading to a further reduction in memory requirements.
Abstract:
Over the past few years, studies of cultured neuronal networks have opened up avenues for understanding the ion channels, receptor molecules, and synaptic plasticity that may form the basis of learning and memory. The hippocampal neurons from rats are dissociated and cultured on a surface containing a grid of 64 electrodes. The signals from these 64 electrodes are acquired using a fast data acquisition system, MED64 (Alpha MED Sciences, Japan), at a sampling rate of 20 K samples per second with a precision of 16 bits per sample. A few minutes of acquired data runs into a few hundred megabytes. The data processing for the neural analysis is highly compute-intensive because the volume of data is huge. The major processing requirements include noise removal, pattern recovery, pattern matching, and clustering. In order to interface a neuronal colony to the physical world, these computations need to be performed in real time. A single processor such as a desktop computer may not be adequate to meet these computational requirements. Parallel computing is a method used to satisfy the real-time computational requirements of a neuronal system that interacts with an external world while increasing the flexibility and scalability of the application. In this work, we developed a parallel neuronal system using a multi-node digital signal processing system. With 8 processors, the system is able to compute and map incoming signals, segmented over a period of 200 ms, into an action in a trained cluster system in real time.
Abstract:
Fragment Finder 2.0 is a web-based interactive computing server which can be used to retrieve structurally similar protein fragments from 25 and 90% nonredundant data sets. The computing server identifies structurally similar fragments using the protein backbone C alpha angles. In addition, the identified fragments can be superimposed using either of the two structural superposition programs, STAMP and PROFIT, provided in the server. The freely available Java plug-in Jmol has been interfaced with the server for the visualization of the query and superposed fragments. The server is the updated version of a previously developed search engine and employs an in-house-developed fast pattern matching algorithm. This server can be accessed freely over the World Wide Web through the URL http://cluster.physics.iisc.ernet.in/ff/.
Abstract:
Background: The function of a protein can be deciphered with higher accuracy from its structure than from its amino acid sequence. Due to the huge gap in the available protein sequence and structural space, tools that can generate functionally homogeneous clusters using only the sequence information hold great importance. For this, traditional alignment-based tools work well in most cases, and clustering is performed on the basis of sequence similarity. However, in the case of multi-domain proteins, the alignment quality might be poor due to varied lengths of the proteins, domain shuffling or circular permutations. Multi-domain proteins are ubiquitous in nature; hence, alignment-free tools, which overcome the shortcomings of alignment-based protein comparison methods, are required. Further, existing tools classify proteins using only domain-level information and hence miss out on the information encoded in the tethered regions or accessory domains. Our method, on the other hand, takes into account the full-length sequence of a protein, consolidating the complete sequence information to understand a given protein better. Results: Our web server, CLAP (Classification of Proteins), is one such alignment-free software for automatic classification of protein sequences. It utilizes a pattern-matching algorithm that assigns local matching scores (LMS) to residues that are part of the matched patterns between the two sequences being compared. CLAP works on full-length sequences and does not require prior domain definitions. Pilot studies undertaken previously on protein kinases and immunoglobulins have shown that CLAP yields clusters that have high functional and domain architectural similarity. Moreover, parsing at a statistically determined cut-off resulted in clusters that corroborated the sub-family level classification of that particular domain family.
Conclusions: CLAP is a useful protein-clustering tool, independent of domain assignment, domain order, sequence length and domain diversity. Our method can be used for any set of protein sequences, yielding functionally relevant clusters with high domain architectural homogeneity. The CLAP web server is freely available for academic use at http://nslab.mbu.iisc.ernet.in/clap/.
Abstract:
Template matching is concerned with measuring the similarity between patterns of two objects. This paper proposes a memory-based reasoning approach for pattern recognition of binary images with a large template set. Memory-based reasoning intrinsically requires a large database. Moreover, some binary image recognition problems inherently need large template sets, such as the recognition of Chinese characters, which needs thousands of templates. The proposed algorithm is based on the Connection Machine, which is the most massively parallel machine to date, and uses a multiresolution method to search for the matching template. The approach uses the pyramid data structure for the multiresolution representation of templates and the input image pattern. For a given binary image, it scans the template pyramid, searching for the match. Our algorithm matches a binary image of N × N pixels in O(log N) time, independent of the number of templates. Implementation of the proposed scheme is described in detail.
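As a rough illustration of the pyramid idea described in this abstract, the following is a minimal sequential sketch, not the paper's Connection Machine implementation. The OR-reduction of 2 × 2 blocks, the Hamming-distance pruning rule, and all function and template names are assumptions made for illustration; the abstract does not specify them. Being sequential, the sketch does not reproduce the O(log N) bound, which relies on massive parallelism.

```python
def reduce_or(img):
    """Halve the resolution: each output pixel is the OR of a 2x2 block."""
    half = len(img) // 2
    return [[img[2 * r][2 * c] | img[2 * r][2 * c + 1]
             | img[2 * r + 1][2 * c] | img[2 * r + 1][2 * c + 1]
             for c in range(half)] for r in range(half)]

def build_pyramid(img):
    """Return levels ordered coarsest (1x1) to finest (N x N); N is a power of two."""
    levels = [img]
    while len(levels[-1]) > 1:
        levels.append(reduce_or(levels[-1]))
    return levels[::-1]

def hamming(a, b):
    """Count the pixels at which two equally sized binary images differ."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def match(image, templates):
    """Coarse-to-fine search: at each level keep only the closest templates (a heuristic)."""
    image_pyramid = build_pyramid(image)
    template_pyramids = {name: build_pyramid(t) for name, t in templates.items()}
    candidates = list(templates)
    for level, image_level in enumerate(image_pyramid):
        dists = {name: hamming(image_level, template_pyramids[name][level])
                 for name in candidates}
        best = min(dists.values())
        candidates = [name for name in candidates if dists[name] == best]
    return candidates[0]

# Hypothetical 4x4 templates, used only to exercise the sketch.
TEMPLATES = {
    "bar": [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]],
    "dot": [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]],
    "full": [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]],
}
```

Because templates that already disagree at a coarse level are discarded early, most of the work happens on small images; the pruning is heuristic, so a noisy input could in principle discard the true best match.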
Abstract:
Cross domain and cross-modal matching has many applications in the field of computer vision and pattern recognition. A few examples are heterogeneous face recognition, cross view action recognition, etc. This is a very challenging task since the data in two domains can differ significantly. In this work, we propose a coupled dictionary and transformation learning approach that models the relationship between the data in both domains. The approach learns a pair of transformation matrices that map the data in the two domains in such a manner that they share common sparse representations with respect to their own dictionaries in the transformed space. The dictionaries for the two domains are learnt in a coupled manner with an additional discriminative term to ensure improved recognition performance. The dictionaries and the transformation matrices are jointly updated in an iterative manner. The applicability of the proposed approach is illustrated by evaluating its performance on different challenging tasks: face recognition across pose, illumination and resolution, heterogeneous face recognition and cross view action recognition. Extensive experiments on five datasets namely, CMU-PIE, Multi-PIE, ChokePoint, HFB and IXMAS datasets and comparisons with several state-of-the-art approaches show the effectiveness of the proposed approach. (C) 2015 Elsevier B.V. All rights reserved.
Abstract:
The mode of action of xylanase and beta-glucosidase purified from the culture filtrate of Humicola lanuginosa (Griffon and Maublanc) Bunce on the xylan extracted from sugarcane bagasse, on two commercially available xylans (larchwood and oat spelt), on xylooligomers and on arabinoxylooligomers was studied. While larchwood and oat spelt xylans were hydrolyzed to the same extent in 24 h, sugarcane bagasse xylan was hydrolyzed to a lesser extent in the same period. It was found that the rate of hydrolysis of xylooligomers by xylanase increased with increasing chain length, while beta-glucosidase acted rather slowly on all the oligomers tested. Xylanase exhibited predominantly "endo" action on xylooligomers, attacking the xylan chain at random, while beta-glucosidase had "exo" action, releasing one xylose residue at a time. On arabinoxylooligomers, however, xylanase exhibited "exo" action. Thus, it appears that the presence of the arabinose substituent has, in some way, rendered the terminal xylose-xylose linkage more susceptible to xylanase action. It was also observed that even after extensive hydrolysis with both the enzymes, substantial amounts of the parent arabinoxylooligomer remained unhydrolyzed, together with the accumulation of arabinoxylobiose. It can therefore be concluded that the presence of the arabinose substituent in the xylan chain results in linkages that offer resistance to both xylanase and beta-glucosidase action.
Abstract:
The success of automatic speaker recognition in laboratory environments suggests applications in forensic science for establishing the identity of individuals on the basis of features extracted from speech. A theoretical model for such a verification scheme for continuous normally distributed features is developed. The three cases of using a) a single feature, b) multiple independent measurements of a single feature, and c) multiple independent features are explored. The number of independent features needed for a reliable personal identification is computed based on the theoretical model and an exploratory study of some speech features.
Abstract:
A simple sequential thinning algorithm for peeling off pixels along contours is described. An adaptive algorithm obtained by incorporating shape adaptivity into this sequential process is also given. The distortions in the skeleton at the right-angle and acute-angle corners are minimized in the adaptive algorithm. The asymmetry of the skeleton, which is characteristic of sequential algorithms and is due to the presence of T-corners in some even-thickness patterns, is eliminated. The performance (in terms of time requirements and shape preservation) is compared with that of a modern thinning algorithm.
Abstract:
An adaptive learning scheme, based on a fuzzy approximation to the gradient descent method for training a pattern classifier using unlabeled samples, is described. The objective function defined for the fuzzy ISODATA clustering procedure is used as the loss function for computing the gradient. Learning is based on simultaneous fuzzy decision making and estimation. It uses conditional fuzzy measures on unlabeled samples. An exponential membership function is assumed for each class, and the parameters constituting these membership functions are estimated, using the gradient, in a recursive fashion. The induced possibility of occurrence of each class is useful for estimation and is computed using 1) the membership of the new sample in that class and 2) the previously computed average possibility of occurrence of the same class. An inductive entropy measure is defined in terms of the induced possibility distribution to measure the extent of learning. The method is illustrated with relevant examples.
Abstract:
The minimum cost classifier, when general cost functions are associated with the tasks of feature measurement and classification, is formulated as a decision graph which does not reject class labels at intermediate stages. Noting its complexities, a heuristic procedure to simplify this scheme to a binary decision tree is presented. The optimization of the binary tree in this context is carried out using dynamic programming. This technique is applied to the voiced-unvoiced-silence classification in speech processing.
Abstract:
The statistical minimum risk pattern recognition problem, when the classification costs are random variables of unknown statistics, is considered. Using medical diagnosis as a possible application, the problem of learning the optimal decision scheme is studied for a two-class two-action case, as a first step. This reduces to the problem of learning the optimum threshold (for taking appropriate action) on the a posteriori probability of one class. A recursive procedure for updating an estimate of the threshold is proposed. The estimation procedure does not require the knowledge of actual class labels of the sample patterns in the design set. The adaptive scheme of using the present threshold estimate for taking action on the next sample is shown to converge, in probability, to the optimum. The results of a computer simulation study of three learning schemes demonstrate the theoretically predictable salient features of the adaptive scheme.
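The reduction to a single threshold mentioned in this abstract can be sketched with the standard two-class, two-action minimum-risk argument. The notation is assumed here and is not taken from the paper: $p = P(\omega_1 \mid x)$ is the a posteriori probability of class $\omega_1$, and $\lambda_{ij}$ is the expected cost of taking action $a_i$ when the true class is $\omega_j$, with the costs assumed independent of $x$ and with $\lambda_{21} > \lambda_{11}$ and $\lambda_{12} > \lambda_{22}$.

```latex
% Conditional risks of the two actions given the observation x
R(a_1 \mid x) = \lambda_{11}\, p + \lambda_{12}\,(1 - p), \qquad
R(a_2 \mid x) = \lambda_{21}\, p + \lambda_{22}\,(1 - p)

% Choose a_1 when its risk is smaller
R(a_1 \mid x) < R(a_2 \mid x)
\;\Longleftrightarrow\;
p\,(\lambda_{21} - \lambda_{11}) > (1 - p)\,(\lambda_{12} - \lambda_{22})

% Equivalent threshold test on the posterior probability
p > \theta^{*}, \qquad
\theta^{*} = \frac{\lambda_{12} - \lambda_{22}}
                  {(\lambda_{21} - \lambda_{11}) + (\lambda_{12} - \lambda_{22})}
```

Under these assumptions the threshold depends only on the mean costs, so when the cost statistics are unknown the threshold itself has to be estimated, which is the learning problem the abstract describes.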
Abstract:
We address the problem of jointly using multiple noisy speech patterns for automatic speech recognition (ASR), given that they come from the same class. If the user utters a word K times, the ASR system should try to use the information content in all K patterns of the word simultaneously and improve its speech recognition accuracy compared to that of single-pattern speech recognition. To address this problem, we recently proposed a Multi Pattern Dynamic Time Warping (MPDTW) algorithm to align the K patterns by finding the least distortion path between them. A Constrained Multi Pattern Viterbi algorithm was used on this aligned path for isolated word recognition (IWR). In this paper, we explore the possibility of using only the MPDTW algorithm for IWR. We also study the properties of the MPDTW algorithm. We show that using only 2 noisy test patterns (10 percent burst noise at -5 dB SNR) reduces the noisy speech recognition error rate by 37.66 percent when compared to single-pattern recognition using the Dynamic Time Warping algorithm.
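For orientation, the single-pattern baseline named at the end of this abstract, classic two-sequence Dynamic Time Warping, can be sketched as below. This is not the MPDTW algorithm, which aligns K patterns jointly; the scalar features, the absolute-difference local cost, and the names `dtw_distance` and `recognize` are assumptions made for illustration.

```python
import math

def dtw_distance(a, b):
    """Least accumulated distortion between two sequences of scalar features."""
    n, m = len(a), len(b)
    # cost[i][j] holds the best accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed moves: advance in a only, in b only, or in both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]

def recognize(test_pattern, templates):
    """Isolated word recognition: return the label of the nearest template."""
    return min(templates, key=lambda label: dtw_distance(test_pattern, templates[label]))
```

A joint alignment of K patterns replaces the two-dimensional cost table with a K-dimensional one, which is where the multi-pattern approach departs from this baseline.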
Abstract:
AlI3 is an easily accessible and versatile ether-cleaving reagent.