36 results for Automated Cryptanalysis
in Aston University Research Archive
Abstract:
Ontologies have become widely accepted as the main method for representing knowledge in Knowledge Management (KM) applications. Given the continuous, rapid change and dynamic nature of knowledge in all fields, automated methods for constructing ontologies are of great importance. All ontologies or taxonomies currently in use have been hand built and require considerable manpower to keep up to date. Taxonomies are less logically rigorous than ontologies, and in this paper we consider the requirements for a system which automatically constructs taxonomies. There are a number of potentially useful methods for constructing hierarchically organised concepts from a collection of texts, and there are a number of automatic methods which permit one to associate one word with another. The important issue for the successful development of this research area is to identify techniques for labelling the relation between two candidate terms, if one exists. We consider a number of possible approaches and argue that the majority are unsuitable for our requirements.
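The central technical step this abstract points to, labelling the relation between two candidate terms, can be illustrated with a minimal sketch. This is not the authors' system: it applies Hearst-style lexico-syntactic patterns to a toy corpus, and the corpus sentences, the two patterns and the "is-a" label are illustrative assumptions.

import re

# Each pattern yields a (hyponym, hypernym) pair, i.e. an "is-a" relation label.
PATTERNS = [
    (re.compile(r"(\w+),? such as (\w+)", re.I), lambda m: (m.group(2), m.group(1))),  # "X such as Y"
    (re.compile(r"(\w+) and other (\w+)", re.I), lambda m: (m.group(1), m.group(2))),  # "Y and other X"
]

def label_relations(corpus):
    """Return (hyponym, hypernym, 'is-a') triples found by pattern matching."""
    triples = []
    for sentence in corpus:
        for pattern, extract in PATTERNS:
            for match in pattern.finditer(sentence):
                hypo, hyper = extract(match)
                triples.append((hypo.lower(), hyper.lower(), "is-a"))
    return triples

if __name__ == "__main__":
    corpus = [
        "Vehicles such as cars must be registered.",
        "Cars and other vehicles are taxed annually.",
    ]
    for triple in label_relations(corpus):
        print(triple)   # e.g. ('cars', 'vehicles', 'is-a')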
Abstract:
While the retrieval of existing designs to prevent unnecessary duplication of parts is a recognised strategy in the control of design costs, the available techniques to achieve this, even in product data management systems, are limited in performance or require large resources. A novel system has been developed, based on a new version of an existing coding system (CAMAC), that allows automatic coding of engineering drawings and their subsequent retrieval using a drawing of the desired component as the input. The ability to find designs using a detail drawing rather than a textual description is a significant achievement in itself. Previous testing of the system has demonstrated this capability, but if a means could be found to retrieve parts from a simple sketch, its practical application would be much more effective. This paper describes the development and testing of such a search capability using a database of over 3000 engineering components.
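As a rough illustration of the retrieval idea, and not of the CAMAC coding scheme itself, the sketch below reduces each stored drawing to a small numeric feature vector and ranks drawings by distance to the vector extracted from a query sketch. The feature names and values are purely hypothetical.

import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_vector, database, top_n=3):
    """Return the top_n drawing ids whose vectors lie closest to the query."""
    ranked = sorted(database.items(), key=lambda kv: distance(query_vector, kv[1]))
    return [drawing_id for drawing_id, _ in ranked[:top_n]]

if __name__ == "__main__":
    # Hypothetical features: (aspect ratio, hole count, outline complexity)
    database = {
        "bracket-001": (2.1, 4, 0.35),
        "shaft-017":   (6.0, 0, 0.10),
        "flange-042":  (1.0, 8, 0.55),
    }
    sketch = (2.0, 4, 0.30)   # features extracted from the query sketch
    print(retrieve(sketch, database, top_n=2))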
Abstract:
In designing new products, the ability to retrieve drawings of existing components is important if costs are to be controlled by preventing unnecessary duplication of parts. Component coding and classification systems have been used successfully for these purposes but suffer from high operational costs and poor usability, arising directly from the manual nature of the coding process itself. A new version of an existing coding system (CAMAC) has been developed to reduce costs by automatically coding engineering drawings. Usability is improved by supporting searches based on a drawing or sketch of the desired component. Test results from a database of several thousand drawings are presented.
Abstract:
Derivational morphology proposes meaningful connections between words and is largely unrepresented in lexical databases. This thesis presents a project to enrich a lexical database with morphological links and to evaluate their contribution to disambiguation. A lexical database with sense distinctions was required. WordNet was chosen because of its free availability and widespread use. Its suitability was assessed through critical evaluation with respect to specifications and criticisms, using a transparent, extensible model. The identification of serious shortcomings suggested a portable enrichment methodology, applicable to alternative resources. Although 40% of the most frequent words are prepositions, they have been largely ignored by computational linguists, so addition of prepositions was also required. The preferred approach to morphological enrichment was to infer relations from phenomena discovered algorithmically. Both existing databases and existing algorithms can capture regular morphological relations, but cannot capture exceptions correctly; neither of them provides any semantic information. Some morphological analysis algorithms are subject to the fallacy that morphological analysis can be performed simply by segmentation. Morphological rules, grounded in observation and etymology, govern associations between and attachment of suffixes and contribute to defining the meaning of morphological relationships. Specifying character substitutions circumvents the segmentation fallacy. Morphological rules are prone to undergeneration, minimised through a variable lexical validity requirement, and overgeneration, minimised by rule reformulation and restricting monosyllabic output. Rules take into account the morphology of ancestor languages through co-occurrences of morphological patterns. Multiple rules applicable to an input suffix need their precedence established. The resistance of prefixations to segmentation has been addressed by identifying linking vowel exceptions and irregular prefixes. The automatic affix discovery algorithm applies heuristics to identify meaningful affixes and is combined with morphological rules into a hybrid model, fed only with empirical data, collected without supervision. Further algorithms apply the rules optimally to automatically pre-identified suffixes and break words into their component morphemes. To handle exceptions, stoplists were created in response to initial errors and fed back into the model through iterative development, leading to 100% precision, contestable only on lexicographic criteria. Stoplist length is minimised by special treatment of monosyllables and reformulation of rules. 96% of words and phrases are analysed. 218,802 directed derivational links have been encoded in the lexicon rather than the wordnet component of the model because the lexicon provides the optimal clustering of word senses. Both links and analyser are portable to an alternative lexicon. The evaluation uses the extended gloss overlaps disambiguation algorithm. The enriched model outperformed WordNet in terms of recall without loss of precision. Failure of all experiments to outperform disambiguation by frequency reflects on WordNet sense distinctions.
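The rule-based, lexically validated analysis described above can be illustrated with a minimal sketch, not the thesis implementation: each rule specifies a character substitution for a suffix, and a proposed derivational link is kept only if the reconstructed source word is present in the lexicon. The rules and the toy lexicon below are illustrative assumptions.

# Each rule: (derived suffix, replacement), e.g. "happiness" -> "happy"
RULES = [
    ("iness", "y"),     # happiness -> happy
    ("ation", "e"),     # derivation -> derive
    ("er",    ""),      # teacher -> teach
]

LEXICON = {"happy", "derive", "teach"}   # toy sense-bearing lexicon

def derivational_links(word, rules=RULES, lexicon=LEXICON):
    """Yield (derived, source, suffix) links validated against the lexicon."""
    for suffix, replacement in rules:
        if word.endswith(suffix):
            source = word[: -len(suffix)] + replacement
            if source in lexicon:        # the lexical validity requirement
                yield (word, source, f"-{suffix}")

if __name__ == "__main__":
    for w in ["happiness", "derivation", "teacher", "corner"]:
        print(w, list(derivational_links(w)))
    # "corner" yields no link: the rule "-er" proposes the source "corn",
    # which is absent from the toy lexicon, so the spurious analysis is blocked.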
Abstract:
The G-protein coupled receptors, or GPCRs, comprise simultaneously one of the largest and one of the most multi-functional protein families known to modern-day molecular bioscience. From a drug discovery and pharmaceutical industry perspective, the GPCRs constitute one of the most commercially and economically important groups of proteins known. The GPCRs undertake numerous vital metabolic functions and interact with a hugely diverse range of small and large ligands. Many different methodologies have been developed to efficiently and accurately classify the GPCRs. These range from motif-based techniques to machine learning as well as a variety of alignment-free techniques based on the physicochemical properties of sequences. We review here the available methodologies for the classification of GPCRs. Part of this work focuses on how we have tried to build the intrinsically hierarchical nature of sequence relations, implicit within the family, into an adaptive approach to classification. Importantly, we also allude to some of the key innate problems in developing an effective approach to classifying the GPCRs: the lack of sequence similarity between the six classes that comprise the GPCR family and the low sequence similarity to other family members evinced by many newly revealed members of the family.
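A minimal sketch of one alignment-free strategy of the kind reviewed here, not of any specific published classifier, is given below: each sequence is mapped to its amino-acid composition and assigned to the class with the nearest centroid. The training sequences and class labels are illustrative placeholders.

from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequence):
    """Fractional amino-acid composition as a fixed-length feature vector."""
    counts = Counter(sequence)
    total = max(len(sequence), 1)
    return [counts.get(aa, 0) / total for aa in AMINO_ACIDS]

def centroid(vectors):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(sequence, centroids):
    """Assign the sequence to the class whose centroid is nearest."""
    vec = composition(sequence)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))

if __name__ == "__main__":
    training = {   # toy training sets; sequences are illustrative only
        "ClassA": ["MNGTEGPNFYVPFSNKTGVV", "MDSPIQIFRGEPGPTCAPSA"],
        "ClassB": ["MRPHLSLLLLLLLLACQPQV", "MAGLAALLLLQQSSSYLWPL"],
    }
    centroids = {label: centroid([composition(s) for s in seqs])
                 for label, seqs in training.items()}
    print(classify("MNGTEGPNFYVPMSNKTGVV", centroids))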
Abstract:
INTAMAP is a web processing service for the automatic interpolation of measured point data. Requirements were (i) using open standards for spatial data such as those developed in the context of the Open Geospatial Consortium (OGC), (ii) using a suitable environment for statistical modelling and computation, and (iii) producing an open source solution. The system couples the 52-North web processing service, accepting data in the form of an Observations and Measurements (O&M) document, with a computing back-end realized in the R statistical environment. The probability distribution of interpolation errors is encoded with UncertML, a new markup language for encoding uncertain data. Automatic interpolation needs to be useful for a wide range of applications, and the algorithms have been designed to cope with anisotropies and extreme values. In the light of the INTAMAP experience, we discuss the lessons learnt.
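The generic task can be illustrated with a small sketch, plain inverse-distance weighting in Python rather than the project's R-based statistical back-end: point measurements are interpolated onto arbitrary prediction locations without user interaction. The coordinates and measurement values below are illustrative assumptions.

import math

def idw(observations, location, power=2.0):
    """Inverse-distance-weighted estimate at `location` from (x, y, value) observations."""
    x0, y0 = location
    num = den = 0.0
    for x, y, value in observations:
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:
            return value   # prediction location coincides with an observation
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

if __name__ == "__main__":
    obs = [(0.0, 0.0, 12.1), (1.0, 0.0, 14.7), (0.0, 1.0, 11.3), (1.0, 1.0, 15.2)]
    grid = [(x / 2.0, y / 2.0) for x in range(3) for y in range(3)]
    for point in grid:
        print(point, round(idw(obs, point), 2))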
Abstract:
The INTAMAP FP6 project has developed an interoperable framework for real-time automatic mapping of critical environmental variables by extending spatial statistical methods and employing open, web-based data exchange protocols and visualisation tools. This paper will give an overview of the underlying problem and of the project, and will discuss which problems it has solved and which open problems seem most relevant to address next. The interpolation problem that INTAMAP solves is the generic problem of spatial interpolation of environmental variables without user interaction, based on measurements of e.g. PM10, rainfall or gamma dose rate, at arbitrary locations or over a regular grid covering the area of interest. It deals with problems of varying spatial resolution of measurements, the interpolation of averages over larger areas, and with providing information on the interpolation error to the end-user. In addition, monitoring network optimisation is addressed in a non-automatic context.
Abstract:
Since the advent of High Level Programming Languages (HLPLs) in the early 1950s, researchers have sought ways to automate the construction of HLPL compilers. To this end a variety of Translator Writing Tools (TWTs) have been developed in the last three decades. However, only a very few of these tools have gained significant commercial acceptance. This thesis re-examines traditional compiler construction techniques, along with a number of previous TWTs, and proposes a new improved tool for automated compiler construction called the Aston Compiler Constructor (ACC). This new tool allows the specification of complete compilation systems using a high level compiler oriented specification notation called the Compiler Construction Language (CCL). This specification notation is based on a modern variant of Backus Naur Form (BNF) and an extended variant of Attribute Grammars (AGs). The implementation and processing of the CCL is discussed along with an extensive CCL example. The CCL is shown to have extensive expressive power, to be convenient to use and highly readable, and thus to be a superior alternative to earlier TWTs and to traditional compiler construction techniques. The execution performance of CCL specifications is evaluated and shown to be acceptable. A number of related areas are also addressed, including tools for the rapid construction of individual compiler components, and tools for the construction of compilation systems for multiprocessor operating systems and hardware. This latter area is expected to become of particular interest in future years due to the anticipated increased use of multiprocessor architectures.
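The attribute-grammar idea underlying the CCL can be illustrated with a small sketch, which is not CCL itself: a BNF-style grammar for arithmetic expressions whose productions carry a synthesized "value" attribute, realised here as a recursive-descent evaluator. The grammar and tokenizer are illustrative assumptions.

# Grammar (BNF-style), with the synthesized attribute noted per production:
#   expr   ::= term { "+" term }        expr.value = sum of term.value
#   term   ::= factor { "*" factor }    term.value = product of factor.value
#   factor ::= NUMBER | "(" expr ")"
import re

def tokenize(text):
    return re.findall(r"\d+|[()+*]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, token=None):
        current = self.peek()
        if token is not None and current != token:
            raise SyntaxError(f"expected {token!r}, got {current!r}")
        self.pos += 1
        return current

    def expr(self):
        value = self.term()              # synthesized attribute flows upward
        while self.peek() == "+":
            self.eat("+")
            value += self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.eat("*")
            value *= self.factor()
        return value

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        return int(self.eat())

if __name__ == "__main__":
    print(Parser(tokenize("2*(3+4)+5")).expr())   # prints 19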
Abstract:
A series of N1-benzylideneheteroarylcarboxamidrazones was prepared in an automated fashion and tested against Mycobacterium fortuitum in a rapid screen for antimycobacterial activity. Many of the compounds from this series were also tested against Mycobacterium tuberculosis, and the usefulness of M. fortuitum as a rapid, initial screen for anti-tubercular activity was evaluated. Various deletions were made to the N1-benzylideneheteroarylcarboxamidrazone structure in order to establish the minimum structural requirements for activity. The N1-benzylideneheteroarylcarboxamidrazones were then subjected to molecular modelling studies, and their activities against M. fortuitum and M. tuberculosis were analysed using quantitative structure-activity relationship (QSAR) techniques in the computational package TSAR (Oxford Molecular Ltd.). A set of equations predictive of antimycobacterial activity was thereby obtained. The series of N1-benzylideneheteroarylcarboxamidrazones was also tested against a multidrug-resistant strain of Staphylococcus aureus (MRSA), followed by a panel of Gram-positive and Gram-negative bacteria if activity was observed for MRSA. A set of antimycobacterial N1-benzylideneheteroarylcarboxamidrazones was thereby discovered, the best of which had MICs against M. fortuitum in the range 4-8 μg ml⁻¹ and displayed 94% inhibition of M. tuberculosis at a concentration of 6.25 μg ml⁻¹. The antimycobacterial activity of these compounds appeared to be specific, since the same compounds were shown to be inactive against other classes of organisms. Compounds which were found to be sufficiently active in any screen were also tested for their toxicity against human mononuclear leucocytes. Polyethylene glycol (PEG) was used as a soluble polymeric support for the synthesis of some fatty acid derivatives containing an isoxazoline group, which may inhibit mycolic acid synthesis in mycobacteria. Both the PEG-bound products and the cleaved, isolated products themselves were tested against M. fortuitum, and some low levels of antimycobacterial activity were observed, which may serve as lead compounds for further studies.
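The QSAR step can be illustrated with a minimal sketch, not the TSAR analysis performed in the study: an ordinary least-squares fit of an activity value against a few molecular descriptors. The descriptor names and all numeric values below are hypothetical and are not data from the study.

import numpy as np

# Columns: logP, molar refractivity, dipole moment (hypothetical descriptors)
X = np.array([
    [2.1, 45.0, 3.2],
    [3.4, 52.0, 2.8],
    [1.8, 40.0, 4.1],
    [2.9, 49.0, 3.5],
    [3.1, 55.0, 2.2],
])
y = np.array([1.2, 2.0, 0.9, 1.6, 1.9])   # e.g. log(1/MIC), hypothetical values

# Add an intercept column and solve the least-squares problem.
A = np.column_stack([np.ones(len(X)), X])
coefficients, *_ = np.linalg.lstsq(A, y, rcond=None)
predicted = A @ coefficients

print("coefficients:", np.round(coefficients, 3))
print("predicted activities:", np.round(predicted, 2))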