904 results for INFORMATION EXTRACTION FROM DOCUMENTS
Abstract:
Information systems are corporate resources, therefore information systems development must be aligned with corporate strategy. This thesis proposes that effective strategic alignment of information systems requires information systems development, information systems planning and strategic management to be united. Literature in these areas is examined, breaching the academic boundaries which separate these areas, to contribute a synthesised approach to the strategic alignment of information systems development. Previous work in information systems planning has extended information systems development techniques, such as data modelling, into strategic planning activities, neglecting techniques of strategic management. Examination of strategic management in this thesis identifies parallel trends in strategic management and information systems development; the premises of the learning school of strategic management are similar to those of soft systems approaches to information systems development. It is therefore proposed that strategic management can be supported by a soft systems approach. Strategic management tools and techniques frame individual views of a strategic situation; soft systems approaches can integrate these diverse views to explore the internal and external environments of an organisation. The information derived from strategic analysis justifies the need for an information system and provides a starting point for information systems development. This is demonstrated by a composite framework which enables each information system to be justified according to its direct contribution to corporate strategy. The proposed framework was developed through action research conducted in a number of organisations of varying types. This suggests that the framework can be widely used to support the strategic alignment of information systems development, thereby contributing to organisational success.
Abstract:
SPOT simulation imagery was acquired for a test site in the Forest of Dean in Gloucestershire, U.K. These data were qualitatively and quantitatively evaluated for their potential application in forest resource mapping and management. A variety of techniques are described for enhancing the image with the aim of providing species-level discrimination within the forest. Visual interpretation of the imagery was more successful than automated classification. The heterogeneity within the forest classes, and in particular between the forest and urban classes, resulted in poor discrimination using traditional `per-pixel' automated methods of classification. Different means of assessing classification accuracy are proposed. Two techniques for measuring textural variation were investigated in an attempt to improve classification accuracy. The first of these, a sequential segmentation method, was found to be beneficial. The second, a parallel segmentation method, resulted in little improvement, though this may be related to a combination of the resolution and the size of the texture extraction area. The effect on classification accuracy of combining the SPOT simulation imagery with other data types is investigated. A grid cell encoding technique was selected as most appropriate for storing digitised topographic (elevation, slope) and ground truth data. Topographic data were shown to improve species-level classification, though with sixteen classes overall accuracies were consistently below 50%. Neither sub-division into age groups nor the incorporation of principal components and a band ratio significantly improved classification accuracy. It is concluded that SPOT imagery will not permit species-level classification within forested areas as diverse as the Forest of Dean. The imagery will be most useful as part of a multi-stage sampling scheme. The use of texture analysis is highly recommended for extracting maximum information content from the data. Incorporation of the imagery into a GIS will both aid discrimination and provide a useful management tool.
Abstract:
The primary objective of this research was to understand what kinds of knowledge and skills people use in `extracting' relevant information from text and to assess the extent to which expert systems techniques could be applied to automate the process of abstracting. The approach adopted in this thesis is based on research in cognitive science, information science, psycholinguistics and textlinguistics. The study addressed the significance of domain knowledge and heuristic rules by developing an information extraction system, called INFORMEX. This system, implemented partly in SPITBOL and partly in PROLOG, used a set of heuristic rules to analyse five scientific papers of expository type, to interpret the content in relation to the key abstract elements and to extract a set of sentences recognised as relevant for abstracting purposes. The analysis of these extracts revealed that an adequate abstract could be generated. Furthermore, INFORMEX showed that a rule-based system was a suitable computational model to represent experts' knowledge and strategies. This computational technique provided the basis for a new approach to the modelling of cognition. It showed how experts tackle the task of abstracting by integrating formal knowledge as well as experiential learning. This thesis demonstrated that empirical and theoretical knowledge can be effectively combined in expert systems technology to provide a valuable starting point for automatic abstracting.
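The kind of rule-based sentence selection that INFORMEX performs can be illustrated with a minimal sketch. The cue phrases, weights and positional bonus below are hypothetical assumptions for illustration, not the heuristics of the original SPITBOL/PROLOG system:

```python
import re

# Hypothetical cue-phrase rules with weights; INFORMEX's actual
# heuristics are not reproduced here.
CUE_PHRASES = {
    "we propose": 2, "this paper": 2, "in conclusion": 3,
    "results show": 3, "the aim": 2, "we present": 2,
}

def score_sentence(sentence: str, position: int, total: int) -> int:
    """Score a sentence by cue phrases plus a bonus for opening/closing position."""
    s = sentence.lower()
    score = sum(w for cue, w in CUE_PHRASES.items() if cue in s)
    if position == 0 or position == total - 1:  # first/last sentences often key
        score += 1
    return score

def extract(text: str, top_n: int = 2) -> list[str]:
    """Return the top_n highest-scoring sentences, kept in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score_sentence(sentences[i], i, len(sentences)),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_n])]

doc = ("We propose a new method for abstracting. "
       "The corpus was collected over two years. "
       "Results show that heuristic rules suffice.")
print(extract(doc, top_n=2))
```

Selected sentences are emitted in their original order, so the extract reads as a coherent mini-abstract.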
Abstract:
This thesis presents a study of the sources of new product ideas and the development of new product proposals in an organisation in the UK Computer Industry. The thesis extends the work of von Hippel by showing how the phenomenon which he describes as "the Customer Active Paradigm for new product idea generation" can be observed to operate in this Industry. Furthermore, this thesis contrasts his Customer Active Paradigm with the more usually encountered Manufacturer Active Paradigm. In a second area, the thesis draws a number of conclusions relating to methods of market research, confirming existing observations and demonstrating the suitability of flexible interview strategies in certain circumstances. The thesis goes on to demonstrate the importance of free information flow within the organisation, making it more likely that sought and unsought opportunities can be exploited. It is shown that formal information flows and documents are a necessary but not sufficient means of influencing the formation of the organisation's dominant ideas on new product areas. The findings also link the work of Tushman and Katz on the role of "Gatekeepers" with the work of von Hippel by showing that the role of gatekeeper is particularly appropriate and useful to an organisation changing from Customer Active to Manufacturer Active methods of idea generation. Finally, the thesis provides conclusions relating to the exploitation of specific new product opportunities facing the sponsoring organisation.
Abstract:
This Thesis addresses the problem of automated false-positive-free detection of epileptic events by the fusion of information extracted from simultaneously recorded electro-encephalographic (EEG) and electrocardiographic (ECG) time-series. The approach relies on a biomedical case for the coupling of the Brain and Heart systems through the central autonomic network during temporal lobe epileptic events: neurovegetative manifestations associated with temporal lobe epileptic events consist of alterations to the cardiac rhythm. From a neurophysiological perspective, epileptic episodes are characterised by a loss of complexity of the state of the brain. The description of arrhythmias observed during temporal lobe epileptic events, from a probabilistic perspective, and the description of the complexity of the state of the brain, from an information theory perspective, are integrated in a fusion-of-information framework towards temporal lobe epileptic seizure detection. The main contributions of the Thesis include the introduction of a biomedical case for the coupling of the Brain and Heart systems during temporal lobe epileptic seizures, partially reported in the clinical literature; the investigation of measures for the characterisation of ictal events from the EEG time-series towards their integration in a fusion-of-knowledge framework; the probabilistic description of arrhythmias observed during temporal lobe epileptic events towards their integration in a fusion-of-knowledge framework; and the investigation of the different levels of the fusion-of-information architecture at which to perform the combination of information extracted from the EEG and ECG time-series. The method designed in the Thesis for the false-positive-free automated detection of epileptic events achieved a false-positive rate of zero on the dataset of long-term recordings used in the Thesis.
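The false-positive-suppressing effect of combining two modalities can be illustrated at the decision level with a minimal sketch. The AND-rule and the per-epoch detector outputs below are assumptions for illustration only; the Thesis investigates several fusion levels, and this is not its actual architecture:

```python
def fuse_decisions(eeg_flags, ecg_flags):
    """Decision-level AND fusion: declare a seizure only when both
    modalities flag the same epoch, so single-channel false alarms are vetoed."""
    return [e and c for e, c in zip(eeg_flags, ecg_flags)]

# Hypothetical per-epoch detections (True = event flagged in that epoch).
eeg = [False, True, True, False, True]   # e.g. EEG complexity drop detected
ecg = [False, True, False, False, True]  # e.g. cardiac-rhythm alteration detected
print(fuse_decisions(eeg, ecg))  # only epochs flagged by both survive
```

The isolated EEG flag in epoch 2 is rejected, trading sensitivity for a lower false-positive rate, which is the behaviour a false-positive-free detector optimises for.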
Abstract:
In this paper, we discuss how discriminative training can be applied to the hidden vector state (HVS) model in different task domains. The HVS model is a discrete hidden Markov model (HMM) in which each HMM state represents the state of a push-down automaton with a finite stack size. In previous applications, maximum-likelihood estimation (MLE) was used to derive the parameters of the HVS model. However, MLE makes a number of assumptions, some of which unfortunately do not hold. Discriminative training, which does not make such assumptions, can improve the performance of the HVS model by discriminating the correct hypothesis from the competing hypotheses. Experiments have been conducted in two domains: the travel domain for the semantic parsing task, using the DARPA Communicator data and the Air Travel Information Services (ATIS) data, and the bioinformatics domain for the information extraction task, using the GENIA corpus. The results demonstrate modest improvements in the performance of the HVS model using discriminative training. In the travel domain, discriminative training of the HVS model gives a relative error reduction rate of 31 percent in F-measure when compared with MLE on the DARPA Communicator data and 9 percent on the ATIS data. In the bioinformatics domain, a relative error reduction rate of 4 percent in F-measure is achieved on the GENIA corpus.
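The reported "relative error reduction rate in F-measure" can be reproduced arithmetically under the usual reading that the error is 1 − F. The F-measure values below are hypothetical illustrations chosen to yield the 31 percent figure, not numbers from the paper:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def relative_error_reduction(f_baseline, f_new):
    """Relative reduction of the F-measure error (1 - F)."""
    return (f_new - f_baseline) / (1.0 - f_baseline)

# Hypothetical scores: MLE baseline F = 0.90, discriminative training F = 0.931.
base, new = 0.90, 0.931
print(round(100 * relative_error_reduction(base, new)))  # → 31 (percent)
```

Note that a fixed absolute gain in F translates into a larger relative error reduction when the baseline error is already small, which is why relative rates are the customary way such improvements are reported.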
Abstract:
The management and sharing of complex data, information and knowledge is a fundamental and growing concern in the Water and other Industries for a variety of reasons. For example, risks and uncertainties associated with climate and other changes require knowledge to prepare for a range of future scenarios and potential extreme events. Formal ways in which knowledge can be established and managed can help deliver efficiencies in acquisition, structuring and filtering, providing only the essential aspects of the knowledge really needed. Ontologies are a key technology for this knowledge management. The construction of ontologies is a considerable overhead on any knowledge management programme; hence current computer science research is investigating generating ontologies automatically from documents using text mining and natural language techniques. As an example of this, results from the application of the Text2Onto tool to stakeholder documents for a project on sustainable water cycle management in new developments are presented. It is concluded that by adopting ontological representations sooner, rather than later, in an analytical process, decision makers will be able to make better use of highly knowledgeable systems containing automated services to ensure that sustainability considerations are included.
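A crude sketch of the concept-extraction step that such ontology-learning tools automate is shown below. The plain frequency ranking is only a stand-in for the relevance measures a tool like Text2Onto actually combines, and the sample documents and stopword list are invented:

```python
import re
from collections import Counter

# Minimal invented stopword list for the toy example.
STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "is", "on", "with"}

def extract_candidate_terms(docs, top_n=3):
    """Rank single-word candidate concepts by corpus frequency --
    a crude stand-in for relevance measures such as RTF or TF-IDF."""
    counts = Counter()
    for doc in docs:
        for tok in re.findall(r"[a-z]+", doc.lower()):
            if tok not in STOPWORDS:
                counts[tok] += 1
    return [term for term, _ in counts.most_common(top_n)]

# Invented stakeholder-document snippets.
docs = [
    "Water reuse and water demand in new developments",
    "Sustainable water cycle management for new developments",
]
print(extract_candidate_terms(docs))
```

Frequent domain terms become candidate ontology concepts; a real pipeline would add multi-word term detection, lemmatisation and relation extraction on top of this ranking.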
Abstract:
Yorick Wilks is a central figure in the fields of Natural Language Processing and Artificial Intelligence. His influence extends to many areas of these fields and includes contributions to Machine Translation, word sense disambiguation, dialogue modeling and Information Extraction. This book celebrates the work of Yorick Wilks from the perspective of his peers. It consists of original chapters, each of which analyses an aspect of his work and links it to current thinking in that area. His work has spanned over four decades but is shown to be pertinent to recent developments in language processing, such as the Semantic Web. This volume forms a two-part set together with Words and Intelligence I, Selected Works by Yorick Wilks, by the same editors.
Abstract:
Information extraction or knowledge discovery from large data sets should be linked to a data aggregation process. Data aggregation can result in a new data representation with a decreased number of objects of a given set. A deterministic approach to separable data aggregation means a smaller number of objects without mixing of objects from different categories. A statistical approach is less restrictive and allows for almost separable data aggregation with a low level of mixing of objects from different categories. Layers of formal neurons can be designed for the purpose of data aggregation in both the deterministic and the statistical approach. The proposed designing method is based on minimization of convex and piecewise-linear (CPL) criterion functions.
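A perceptron-style hinge sum is one standard example of a convex and piecewise-linear (CPL) criterion, and the sketch below minimizes it by subgradient descent on a toy separable set. This is an assumed illustration of CPL minimization in general, not the paper's designing method for layers of formal neurons:

```python
import numpy as np

def cpl_criterion(w, X, y):
    """Perceptron-style CPL criterion: sum of hinge penalties
    max(0, 1 - y_i * <w, x_i>) -- convex and piecewise linear in w."""
    margins = y * (X @ w)
    return np.maximum(0.0, 1.0 - margins).sum()

def minimize_cpl(X, y, steps=200, lr=0.1):
    """Plain subgradient descent on the CPL criterion."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        active = margins < 1.0  # points still inside the penalised region
        grad = -(y[active, None] * X[active]).sum(axis=0)
        w -= lr * grad
    return w

# Two linearly separable categories (bias folded in as a constant feature).
X = np.array([[1.0, 2.0, 1], [2.0, 3.0, 1], [-1.0, -2.0, 1], [-2.0, -1.0, 1]])
y = np.array([1, 1, -1, -1])
w = minimize_cpl(X, y)
print(cpl_criterion(w, X, y))  # driven to 0.0: the categories are separated
```

When the criterion reaches zero, no objects from different categories are mixed by the resulting hyperplane, matching the deterministic (separable) aggregation case described above.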
Abstract:
This paper considers the technology of developing ontologies for new domains. We discuss the main principles of ontology development, automatic methods for extracting terms from domain texts, and the types of ontology relations.
Abstract:
Floods represent the most devastating natural hazards in the world, affecting more people and causing more property damage than any other natural phenomena. One of the important problems associated with flood monitoring is flood extent extraction from satellite imagery, since it is impractical to acquire the flood area through field observations. This paper presents a method for flood extent extraction from synthetic-aperture radar (SAR) images that is based on intelligent computations. In particular, we apply artificial neural networks, Kohonen's self-organizing maps (SOMs), for SAR image segmentation and classification. We tested our approach on data from three different satellite sensors: ERS-2/SAR (during flooding on the Tisza river, Ukraine and Hungary, 2001), ENVISAT/ASAR WSM (Wide Swath Mode) and RADARSAT-1 (during flooding on the Huaihe river, China, 2007). The obtained results demonstrate the efficiency of our approach.
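The segmentation step can be sketched with a deliberately minimal 1-D SOM over scalar pixel values, with the neighbourhood shrunk to the winning unit only. The backscatter values below are invented; a real SAR pipeline would train a 2-D map on image patches rather than single pixels:

```python
import numpy as np

def train_som(samples, n_units=2, epochs=50, lr=0.5):
    """Minimal 1-D Kohonen SOM on scalar pixel values: each unit's weight
    moves toward the samples it wins; with a winner-only neighbourhood
    this degenerates to online k-means, kept tiny for illustration."""
    weights = np.linspace(samples.min(), samples.max(), n_units)
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # decaying learning rate
        for x in samples:
            winner = int(np.argmin(np.abs(weights - x)))
            weights[winner] += rate * (x - weights[winner])
    return weights

def classify(samples, weights):
    """Assign each pixel to the nearest SOM unit (unit 0 = darker class)."""
    return np.argmin(np.abs(samples[:, None] - weights[None, :]), axis=1)

# Hypothetical SAR backscatter: smooth water surfaces appear dark (low values),
# land clutter appears bright (high values).
pixels = np.array([0.05, 0.08, 0.06, 0.7, 0.8, 0.75, 0.07, 0.72])
w = train_som(pixels)
print(classify(pixels, w))  # water pixels map to unit 0, land to unit 1
```

Because open water returns very low backscatter in SAR imagery, even this crude two-unit map separates flooded from non-flooded pixels on the toy data.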
Abstract:
Many organic compounds present in water resources cause irreversible damage to human health and the ecosystem. Among these hazardous substances, phenolic compounds play an important role in current contamination. The utilization of membrane technology is increasing exponentially in drinking water production and waste water treatment. The removal of organic compounds by nanofiltration membranes is characterized not only by molecular sieving effects but also by membrane-solute interactions. The influence of the sieving parameters (molecular weight and molecular diameter) and the physicochemical interactions (dissociation constant and molecular hydrophobicity) on the membrane rejection of the organic solutes was studied. The molecular hydrophobicity is expressed as the logarithm of the octanol-water partition coefficient. This paper proposes a method that can be used for symbolic knowledge extraction from a trained neural network, once it has been trained to the desired performance; the method is based on detecting the most important variables in problems where multicollinearity exists among the input variables.
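The abstract does not spell out the extraction procedure, but one standard weight-based way to rank the input variables of a trained single-hidden-layer network is Garson-style connection-weight analysis, sketched below. The weight matrices and the variable names are hypothetical, not values from the paper:

```python
import numpy as np

def input_importance(W_in, W_out):
    """Garson-style importance from the connection weights of a trained
    single-hidden-layer network: each input's share of |input->hidden|
    weight per hidden unit, scaled by that unit's |hidden->output| weight,
    summed over units and normalised to sum to 1."""
    contrib = np.abs(W_in) / np.abs(W_in).sum(axis=0, keepdims=True)
    weighted = contrib * np.abs(W_out)[None, :]
    importance = weighted.sum(axis=1)
    return importance / importance.sum()

# Hypothetical trained weights: 4 inputs (molecular weight, molecular
# diameter, pKa, log Kow) x 2 hidden units, and 2 hidden->output weights.
W_in = np.array([[0.9, 0.8],
                 [0.1, 0.2],
                 [0.3, 0.1],
                 [0.7, 0.9]])
W_out = np.array([1.0, 0.5])
imp = input_importance(W_in, W_out)
print(imp.argmax())  # index of the dominant input variable
```

Weight-based rankings like this are sensitive to multicollinearity among inputs, which is precisely the condition the proposed method is said to address.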
Abstract:
In this paper, we propose an unsupervised methodology to automatically discover pairs of semantically related words by highlighting their local environment and evaluating their semantic similarity in local and global semantic spaces. This proposal differs from previous research in that it tries to take the best of two different methodologies, i.e. semantic space models and information extraction models. It can be applied to extract close semantic relations, it limits the search space and it is unsupervised.
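The semantic-space half of such a proposal can be illustrated minimally: represent each word by its co-occurrence counts with context words and compare words by cosine similarity. The toy sentences are invented, and the windowing is simplified to whole sentences rather than the paper's local environments:

```python
import numpy as np
from collections import Counter

def cooccurrence_vectors(sentences, context_words):
    """Build a tiny semantic space: each word is represented by counts of
    the context words appearing in the same sentence."""
    counts = {}
    for sent in sentences:
        toks = sent.lower().split()
        for w in set(toks):
            vec = counts.setdefault(w, Counter())
            for c in toks:
                if c != w:
                    vec[c] += 1
    vocab = sorted(context_words)
    return {w: np.array([counts[w][c] for c in vocab], float) for w in counts}

def cosine(u, v):
    """Cosine similarity between two co-occurrence vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sentences = [
    "doctors treat patients",
    "nurses treat patients",
    "doctors prescribe medicine",
    "nurses prescribe medicine",
]
context = {"treat", "patients", "prescribe", "medicine", "doctors", "nurses"}
vecs = cooccurrence_vectors(sentences, context)
print(round(cosine(vecs["doctors"], vecs["nurses"]), 2))
```

Words sharing the same contexts ("doctors"/"nurses") end up with near-identical vectors and high cosine similarity, which is the signal the methodology exploits to propose them as a semantically related pair.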
Abstract:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May 2016.