Biblioteca Digital

929 resultados para regular expressions

Describing syntax with star-free regular expressions

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Koskenniemen Äärellistilaisen leikkauskieliopin (FSIG) lauseopilliset rajoitteet ovat loogisesti vähemmän kompleksisia kuin mihin niissä käytetty formalismi vittaisi. Osoittautuukin että vaikka Voutilaisen (1994) englannin kielelle laatima FSIG-kuvaus käyttää useita säännöllisten lausekkeiden laajennuksia, kieliopin kuvaus kokonaisuutenaan palautuu äärelliseen yhdistelmään unionia, komplementtia ja peräkkäinasettelua. Tämä on oleellinen parannus ENGFSIG:n descriptiiviseen kompleksisuuteen. Tulos avaa ovia FSIG-kuvauksen loogisten ominaisuuksien syvemmälle analyysille ja FSIG kuvausten mahdolliselle optimoinnillle. Todistus sisältää uuden kaavan, joka kääntää Koskenniemien rajoiteoperaation ilman markkerimerkkejä.

From regular expressions to deterministic automata

Relevância:

100.00% 100.00%

Publicador:

Improving Recall of Regular Expressions for Information Extraction

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Learning or writing regular expressions to identify instances of a specific
concept within text documents with a high precision and recall is challenging.
It is relatively easy to improve the precision of an initial regular expression
by identifying false positives covered and tweaking the expression to avoid the
false positives. However, modifying the expression to improve recall is difficult
since false negatives can only be identified by manually analyzing all documents,
in the absence of any tools to identify the missing instances. We focus on partially
automating the discovery of missing instances by soliciting minimal user
feedback. We present a technique to identify good generalizations of a regular
expression that have improved recall while retaining high precision. We empirically
demonstrate the effectiveness of the proposed technique as compared to
existing methods and show results for a variety of tasks such as identification of
dates, phone numbers, product names, and course numbers on real world datasets

Object oriented regular expressions

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Regular expressions are used to parse textual data to match patterns and extract variables. They have been implemented in a vast number of programming languages with a significant quantity of research devoted to improving their operational efficiency. However, regular expressions are limited to finding linear matches. Little research has been done in the field of object-oriented results which would allow textual or binary data to be converted to multi-layered objects. This is significantly relevant as many of todaypsilas data formats are object-based. This paper extends our previous work by detailing an algorithmic approach to perform object-oriented parsing, and provides an initial study of benchmarks of the algorithms of our contribution

Regular Expressions in Pharo

Relevância:

100.00% 100.00%

Publicador:

Dynamically Reconfigurable Regular Expression Matching Architecture

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Regular Expressions are generic representations for a string or a collection of strings. This paper focuses on implementation of a regular expression matching architecture on reconfigurable fabric like FPGA. We present a Nondeterministic Finite Automata based implementation with extended regular expression syntax set compared to previous approaches. We also describe a dynamically reconfigurable generic block that implements the supported regular expression syntax. This enables formation of the regular expression hardware by a simple cascade of generic blocks as well as a possibility for reconfiguring the generic blocks to change the regular expression being matched. Further,we have developed an HDL code generator to obtain the VHDL description of the hardware for any regular expression set. Our optimized regular expression engine achieves a throughput of 2.45 Gbps. Our dynamically reconfigurable regular expression engine achieves a throughput of 0.8 Gbps using 12 FPGA slices per generic block on Xilinx Virtex2Pro FPGA.

Efficient Regular Expression Pattern Matching for Network Intrusion Detection Systems using Modified Word-based Automata

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Network Intrusion Detection Systems (NIDS) intercept the traffic at an organization's network periphery to thwart intrusion attempts. Signature-based NIDS compares the intercepted packets against its database of known vulnerabilities and malware signatures to detect such cyber attacks. These signatures are represented using Regular Expressions (REs) and strings. Regular Expressions, because of their higher expressive power, are preferred over simple strings to write these signatures. We present Cascaded Automata Architecture to perform memory efficient Regular Expression pattern matching using existing string matching solutions. The proposed architecture performs two stage Regular Expression pattern matching. We replace the substring and character class components of the Regular Expression with new symbols. We address the challenges involved in this approach. We augment the Word-based Automata, obtained from the re-written Regular Expressions, with counter-based states and length bound transitions to perform Regular Expression pattern matching. We evaluated our architecture on Regular Expressions taken from Snort rulesets. We were able to reduce the number of automata states between 50% to 85%. Additionally, we could reduce the number of transitions by a factor of 3 leading to further reduction in the memory requirements.

NFA decomposition and multiprocessing architecture for parallel regular expression processing

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This work presents a novel algorithm for decomposing NFA automata into one-state-active modules for parallel execution on Multiprocessor Systems on Chip (MP-SoC). Furthermore, performance related studies based on a 16-PE system for Snort, Bro and Linux-L7 regular expressions are presented. ©2009 IEEE.

Efficient FPGA-based regular expression pattern matching

Relevância:

70.00% 70.00%

Publicador:

Resumo:

An approach to the automatic generation of efficient Field Programmable Gate Arrays (FPGAs) circuits for the Regular Expression-based (RegEx) Pattern Matching problems is presented. Using a novel design strategy, as proposed, circuits that are highly area-and-time-efficient can be automatically generated for arbitrary sets of regular expressions. This makes the technique suitable for applications that must handle very large sets of patterns at high speed, such as in the network security and intrusion detection application domains. We have combined several existing techniques to optimise our solution for such domains and proposed the way the whole process of dynamic generation of FPGAs for RegEX pattern matching could be automated efficiently.

SSMBS: a web server to locate sequentially separated motifs in biological sequences

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The identification of sequence (amino acids or nucleotides) motifs in a particular order in biological sequences has proved to be of interest. This paper describes a computing server, SSMBS, which can locate anddisplay the occurrences of user-defined biologically important sequence motifs (a maximum of five) present in a specific order in protein and nucleotide sequences. While the server can efficiently locate motifs specified using regular expressions, it can also find occurrences of long and complex motifs. The computation is carried out by an algorithm developed using the concepts of quantifiers in regular expressions. The web server is available to users around the clock at http://dicsoft1.physics.iisc.ernet.in/ssmbs/.

A Method to Find Sequentially Separated Motifs in Biological Sequences (SSMBS)

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Sequence motifs occurring in a particular order in proteins or DNA have been proved to be of biological interest. In this paper, a new method to locate the occurrences of up to five user-defined motifs in a specified order in large proteins and in nucleotide sequence databases is proposed. It has been designed using the concept of quantifiers in regular expressions and linked lists for data storage. The application of this method includes the extraction of relevant consensus regions from biological sequences. This might be useful in clustering of protein families as well as to study the correlation between positions of motifs and their functional sites in DNA sequences.

An Efficient Constraint Grammar Parser based on Inward Deterministic Automata

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Pappret conceptualizes parsning med Constraint Grammar på ett nytt sätt som en process med två viktiga representationer. En representation innehåller lokala tvetydighet och den andra sammanfattar egenskaperna hos den lokala tvetydighet klasser. Båda representationer manipuleras med ren finite-state metoder, men deras samtrafik är en ad hoc -tillämpning av rationella potensserier. Den nya tolkningen av parsning systemet har flera praktiska fördelar, bland annat det inåt deterministiska sättet att beräkna, representera och räkna om alla potentiella tillämpningar av reglerna i meningen.

Constraint-based Mining of Frequent Arrangements of Temporal Intervals

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The problem of discovering frequent arrangements of temporal intervals is studied. It is assumed that the database consists of sequences of events, where an event occurs during a time-interval. The goal is to mine temporal arrangements of event intervals that appear frequently in the database. The motivation of this work is the observation that in practice most events are not instantaneous but occur over a period of time and different events may occur concurrently. Thus, there are many practical applications that require mining such temporal correlations between intervals including the linguistic analysis of annotated data from American Sign Language as well as network and biological data. Two efficient methods to find frequent arrangements of temporal intervals are described; the first one is tree-based and uses depth first search to mine the set of frequent arrangements, whereas the second one is prefix-based. The above methods apply efficient pruning techniques that include a set of constraints consisting of regular expressions and gap constraints that add user-controlled focus into the mining process. Moreover, based on the extracted patterns a standard method for mining association rules is employed that applies different interestingness measures to evaluate the significance of the discovered patterns and rules. The performance of the proposed algorithms is evaluated and compared with other approaches on real (American Sign Language annotations and network data) and large synthetic datasets.

Genetic Algorithm Search for Predictive Patterns in Multidimensional Time Series

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Based on an algorithm for pattern matching in character strings, we implement a pattern matching machine that searches for occurrences of patterns in multidimensional time series. Before the search process takes place, time series are encoded in user-designed alphabets. The patterns, on the other hand, are formulated as regular expressions that are composed of letters from these alphabets and operators. Furthermore, we develop a genetic algorithm to breed patterns that maximize a user-defined fitness function. In an application to financial data, we show that patterns bred to predict high exchange rates volatility in training samples retain statistically significant predictive power in validation samples.

Deciding Kleene Algebra Terms Equivalence in Coq

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents a mechanically verified implementation of an algorithm for deciding the equivalence of Kleene algebra terms within the Coq proof assistant. The algorithm decides equivalence of two given regular expressions through an iterated process of testing the equivalence of their partial derivatives and does not require the construction of the corresponding automata. Recent theoretical and experimental research provides evidence that this method is, on average, more efficient than the classical methods based on automata. We present some performance tests, comparisons with similar approaches, and also introduce a generalization of the algorithm to decide the equivalence of terms of Kleene algebra with tests. The motivation for the work presented in this paper is that of using the libraries developed as trusted frameworks for carrying out certified program verification.

«
1
2
3
4
5
6
7
8
...
61
62
»