10 resultados para Database application
em Aston University Research Archive
Resumo:
This thesis describes the development of a complete data visualisation system for large tabular databases, such as those commonly found in a business environment. A state-of-the-art 'cyberspace cell' data visualisation technique was investigated and a powerful visualisation system using it was implemented. Although allowing databases to be explored and conclusions drawn, it had several drawbacks, the majority of which were due to the three-dimensional nature of the visualisation. A novel two-dimensional generic visualisation system, known as MADEN, was then developed and implemented, based upon a 2-D matrix of 'density plots'. MADEN allows an entire high-dimensional database to be visualised in one window, while permitting close analysis in 'enlargement' windows. Selections of records can be made and examined, and dependencies between fields can be investigated in detail. MADEN was used as a tool for investigating and assessing many data processing algorithms, firstly data-reducing (clustering) methods, then dimensionality-reducing techniques. These included a new 'directed' form of principal components analysis, several novel applications of artificial neural networks, and discriminant analysis techniques which illustrated how groups within a database can be separated. To illustrate the power of the system, MADEN was used to explore customer databases from two financial institutions, resulting in a number of discoveries which would be of interest to a marketing manager. Finally, the database of results from the 1992 UK Research Assessment Exercise was analysed. Using MADEN allowed both universities and disciplines to be graphically compared, and supplied some startling revelations, including empirical evidence of the 'Oxbridge factor'.
Resumo:
Derivational morphology proposes meaningful connections between words and is largely unrepresented in lexical databases. This thesis presents a project to enrich a lexical database with morphological links and to evaluate their contribution to disambiguation. A lexical database with sense distinctions was required. WordNet was chosen because of its free availability and widespread use. Its suitability was assessed through critical evaluation with respect to specifications and criticisms, using a transparent, extensible model. The identification of serious shortcomings suggested a portable enrichment methodology, applicable to alternative resources. Although 40% of the most frequent words are prepositions, they have been largely ignored by computational linguists, so addition of prepositions was also required. The preferred approach to morphological enrichment was to infer relations from phenomena discovered algorithmically. Both existing databases and existing algorithms can capture regular morphological relations, but cannot capture exceptions correctly; neither of them provide any semantic information. Some morphological analysis algorithms are subject to the fallacy that morphological analysis can be performed simply by segmentation. Morphological rules, grounded in observation and etymology, govern associations between and attachment of suffixes and contribute to defining the meaning of morphological relationships. Specifying character substitutions circumvents the segmentation fallacy. Morphological rules are prone to undergeneration, minimised through a variable lexical validity requirement, and overgeneration, minimised by rule reformulation and restricting monosyllabic output. Rules take into account the morphology of ancestor languages through co-occurrences of morphological patterns. Multiple rules applicable to an input suffix need their precedence established. The resistance of prefixations to segmentation has been addressed by identifying linking vowel exceptions and irregular prefixes. The automatic affix discovery algorithm applies heuristics to identify meaningful affixes and is combined with morphological rules into a hybrid model, fed only with empirical data, collected without supervision. Further algorithms apply the rules optimally to automatically pre-identified suffixes and break words into their component morphemes. To handle exceptions, stoplists were created in response to initial errors and fed back into the model through iterative development, leading to 100% precision, contestable only on lexicographic criteria. Stoplist length is minimised by special treatment of monosyllables and reformulation of rules. 96% of words and phrases are analysed. 218,802 directed derivational links have been encoded in the lexicon rather than the wordnet component of the model because the lexicon provides the optimal clustering of word senses. Both links and analyser are portable to an alternative lexicon. The evaluation uses the extended gloss overlaps disambiguation algorithm. The enriched model outperformed WordNet in terms of recall without loss of precision. Failure of all experiments to outperform disambiguation by frequency reflects on WordNet sense distinctions.
Resumo:
The continuing threat of infectious disease and future pandemics, coupled to the continuous increase of drug-resistant pathogens, makes the discovery of new and better vaccines imperative. For effective vaccine development, antigen discovery and validation is a prerequisite. The compilation of information concerning pathogens, virulence factors and antigenic epitopes has resulted in many useful databases. However, most such immunological databases focus almost exclusively on antigens where epitopes are known and ignore those for which epitope information was unavailable. We have compiled more than 500 antigens into the AntigenDB database, making use of the literature and other immunological resources. These antigens come from 44 important pathogenic species. In AntigenDB, a database entry contains information regarding the sequence, structure, origin, etc. of an antigen with additional information such as B and T-cell epitopes, MHC binding, function, gene-expression and post translational modifications, where available. AntigenDB also provides links to major internal and external databases. We shall update AntigenDB on a rolling basis, regularly adding antigens from other organisms and extra data analysis tools. AntigenDB is available freely at http://www.imtech.res.in/raghava/antigendb and its mirror site http://www.bic.uams.edu/raghava/antigendb.
Resumo:
Computer-Based Learning systems of one sort or another have been in existence for almost 20 years, but they have yet to achieve real credibility within Commerce, Industry or Education. A variety of reasons could be postulated for this, typically: - cost - complexity - inefficiency - inflexibility - tedium Obviously different systems deserve different levels and types of criticism, but it still remains true that Computer-Based Learning (CBL) is falling significantly short of its potential. Experience of a small, but highly successful CBL system within a large, geographically distributed industry (the National Coal Board) prompted an investigation into currently available packages, the original intention being to purchase the most suitable software and run it on existing computer hardware, alongside existing software systems. It became apparent that none of the available CBL packages were suitable, and a decision was taken to develop an in-house Computer-Assisted Instruction system according to the following criteria: - cheap to run; - easy to author course material; - easy to use; - requires no computing knowledge to use (as either an author or student) ; - efficient in the use of computer resources; - has a comprehensive range of facilities at all levels. This thesis describes the initial investigation, resultant observations and the design, development and implementation of the SCHOOL system. One of the principal characteristics c£ SCHOOL is that it uses a hierarchical database structure for the storage of course material - thereby providing inherently a great deal of the power, flexibility and efficiency originally required. Trials using the SCHOOL system on IBM 303X series equipment are also detailed, along with proposed and current development work on what is essentially an operational CBL system within a large-scale Industrial environment.
Resumo:
Database systems have a user interface one of the components of which will normally be a query language which is based on a particular data model. Typically data models provide primitives to define, manipulate and query databases. Often these primitives are designed to form self-contained query languages. This thesis describes a prototype implementation of a system which allows users to specify queries against the database in a query language whose primitives are not those provided by the actual model on which the database system is based, but those provided by a different data model. The implementation chosen is the Functional Query Language Front End (FQLFE). This uses the Daplex functional data model and query language. Using FQLFE, users can specify the underlying database (based on the relational model) in terms of Daplex. Queries against this specified view can then be made in Daplex. FQLFE transforms these queries into the query language (Quel) of the underlying target database system (Ingres). The automation of part of the Daplex function definition phase is also described and its implementation discussed.
Resumo:
An implementation of a Lexical Functional Grammar (LFG) natural language front-end to a database is presented, and its capabilities demonstrated by reference to a set of queries used in the Chat-80 system. The potential of LFG for such applications is explored. Other grammars previously used for this purpose are briefly reviewed and contrasted with LFG. The basic LFG formalism is fully described, both as to its syntax and semantics, and the deficiencies of the latter for database access application shown. Other current LFG implementations are reviewed and contrasted with the LFG implementation developed here specifically for database access. The implementation described here allows a natural language interface to a specific Prolog database to be produced from a set of grammar rule and lexical specifications in an LFG-like notation. In addition to this the interface system uses a simple database description to compile metadata about the database for later use in planning the execution of queries. Extensions to LFG's semantic component are shown to be necessary to produce a satisfactory functional analysis and semantic output for querying a database. A diverse set of natural language constructs are analysed using LFG and the derivation of Prolog queries from the F-structure output of LFG is illustrated. The functional description produced from LFG is proposed as sufficient for resolving many problems of quantification and attachment.
Resumo:
This thesis presents a new approach to designing large organizational databases. The approach emphasizes the need for a holistic approach to the design process. The development of the proposed approach was based on a comprehensive examination of the issues of relevance to the design and utilization of databases. Such issues include conceptual modelling, organization theory, and semantic theory. The conceptual modelling approach presented in this thesis is developed over three design stages, or model perspectives. In the semantic perspective, concept definitions were developed based on established semantic principles. Such definitions rely on meaning - provided by intension and extension - to determine intrinsic conceptual definitions. A tool, called meaning-based classification (MBC), is devised to classify concepts based on meaning. Concept classes are then integrated using concept definitions and a set of semantic relations which rely on concept content and form. In the application perspective, relationships are semantically defined according to the application environment. Relationship definitions include explicit relationship properties and constraints. The organization perspective introduces a new set of relations specifically developed to maintain conformity of conceptual abstractions with the nature of information abstractions implied by user requirements throughout the organization. Such relations are based on the stratification of work hierarchies, defined elsewhere in the thesis. Finally, an example of an application of the proposed approach is presented to illustrate the applicability and practicality of the modelling approach.
Resumo:
The Teallach project has adapted model-based user-interface development techniques to the systematic creation of user-interfaces for object-oriented database applications. Model-based approaches aim to provide designers with a more principled approach to user-interface development using a variety of underlying models, and tools which manipulate these models. Here we present the results of the Teallach project, describing the tools developed and the flexible design method supported. Distinctive features of the Teallach system include provision of database-specific constructs, comprehensive facilities for relating the different models, and support for a flexible design method in which models can be constructed and related by designers in different orders and in different ways, to suit their particular design rationales. The system then creates the desired user-interface as an independent, fully functional Java application, with automatically generated help facilities.
Resumo:
A protein's isoelectric point or pI corresponds to the solution pH at which its net surface charge is zero. Since the early days of solution biochemistry, the pI has been recorded and reported, and thus literature reports of pI abound. The Protein Isoelectric Point database (PIP-DB) has collected and collated these data to provide an increasingly comprehensive database for comparison and benchmarking purposes. A web application has been developed to warehouse this database and provide public access to this unique resource. PIP-DB is a web-enabled SQL database with an HTML GUI front-end. PIP-DB is fully searchable across a range of properties.
Resumo:
The Protein pKa Database (PPD) v1.0 provides a compendium of protein residue-specific ionization equilibria (pKa values), as collated from the primary literature, in the form of a web-accessible postgreSQL relational database. Ionizable residues play key roles in the molecular mechanisms that underlie many biological phenomena, including protein folding and enzyme catalysis. The PPD serves as a general protein pKa archive and as a source of data that allows for the development and improvement of pKa prediction systems. The database is accessed through an HTML interface, which offers two fast, efficient search methods: an amino acid-based query and a Basic Local Alignment Search Tool search. Entries also give details of experimental techniques and links to other key databases, such as National Center for Biotechnology Information and the Protein Data Bank, providing the user with considerable background information.