7 results for automatic content extraction

in Aston University Research Archive


Relevance: 40.00%

Abstract:

The primary objective of this research was to understand what kinds of knowledge and skills people use in `extracting' relevant information from text, and to assess the extent to which expert systems techniques could be applied to automate the process of abstracting. The approach adopted in this thesis is based on research in cognitive science, information science, psycholinguistics and textlinguistics. The study addressed the significance of domain knowledge and heuristic rules by developing an information extraction system, called INFORMEX. This system, which was implemented partly in SPITBOL and partly in PROLOG, used a set of heuristic rules to analyse five scientific papers of expository type, to interpret the content in relation to the key abstract elements, and to extract a set of sentences recognised as relevant for abstracting purposes. The analysis of these extracts revealed that an adequate abstract could be generated. Furthermore, INFORMEX showed that a rule-based system was a suitable computational model to represent experts' knowledge and strategies. This computational technique provided the basis for a new approach to the modelling of cognition. It showed how experts tackle the task of abstracting by integrating formal knowledge as well as experiential learning. This thesis demonstrated that empirical and theoretical knowledge can be effectively combined in expert systems technology to provide a valuable starting point for automatic abstracting.
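The rule-based extraction strategy described above can be sketched in outline. The cue phrases and scoring weights below are hypothetical illustrations of the rule style, not the actual INFORMEX heuristics (which were implemented in SPITBOL and PROLOG):

```python
import re

# Hypothetical cue phrases; INFORMEX's real heuristic rules are not
# reproduced here -- these only illustrate the rule-based style.
CUE_PHRASES = ["the objective of", "we conclude", "results show", "in summary"]

def score_sentence(sentence, position, total):
    """Apply simple heuristic rules and return a relevance score."""
    score = 0
    lowered = sentence.lower()
    for cue in CUE_PHRASES:
        if cue in lowered:
            score += 2  # cue-phrase rule
    if position == 0 or position == total - 1:
        score += 1      # location rule: openings and closings matter
    return score

def extract(text, top_k=2):
    """Return the top_k highest-scoring sentences, in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = [(score_sentence(s, i, len(sentences)), i, s)
              for i, s in enumerate(sentences)]
    best = sorted(sorted(scored, reverse=True)[:top_k], key=lambda t: t[1])
    return [s for _, _, s in best]
```

On a short expository text, the cue-phrase and location rules tend to select the objective and conclusion sentences, mirroring the kind of extract the thesis describes.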

Relevance: 30.00%

Abstract:

This research was undertaken to develop a process for the direct solvent extraction of castor oil seeds. A literature survey confirmed the desirability of establishing such a process, with emphasis on the decortication, size reduction, detoxification-deallergenization, and solvent extraction operations. A novel process was developed for the dehulling of castor seeds, which consists of pressurizing the beans and then suddenly releasing the pressure to vacuum. The degree of dehulling varied according to the pressure applied and the size of the beans. Some of the batches were difficult to hull, and this phenomenon was investigated using the scanning electron microscope and by thickness and compressive strength measurements. The other variables studied to lesser degrees included residence time, moisture content, and temperature. The method was successfully extended to cocoa beans and, with modifications, to peanuts. The possibility of continuous operation was looked into, and a mechanism was suggested to explain how the method works. The work on toxins and allergens included an extensive literature survey on the properties of these substances and the methods developed for their deactivation. Part of the work involved setting up an assay method for measuring their concentration in the beans and cake, but technical difficulties prevented the completion of this aspect of the project. An appraisal of the existing deactivation methods was made in the course of searching for new ones. A new method of reducing the size of oilseeds was introduced in this research: it involved freezing the beans in cardice and milling them in a coffee grinder. The method was found to be quick, efficient, and reliable. An application of the freezing technique was successful in dehulling soybeans and de-skinning peanut kernels. The literature on the solvent extraction of oilseeds, especially castor, was reviewed; the survey covered processes, equipment, solvents, and the mechanism of leaching. Three solvents were experimentally investigated: cyclohexane, ethanol, and acetone. Extraction with liquid ammonia and liquid butane was not effective under the conditions studied. Based on the results of the research, a process has been suggested for the direct solvent extraction of castor seeds, the various sections of the process have been analysed, and the factors affecting the economics of the process are discussed.

Relevance: 30.00%

Abstract:

Geometric information relating to most engineering products is available in the form of orthographic drawings or 2D data files. For many recent computer-based applications, such as Computer Integrated Manufacturing (CIM), these data are required in the form of a sophisticated model based on Constructive Solid Geometry (CSG) concepts. A recent novel technique in this area transfers 2D engineering drawings directly into a 3D solid model called `the first approximation'. In many cases, however, this does not represent the real object. In this thesis, a new method is proposed and developed to enhance this model. This method uses the notion of expanding an object in terms of other solid objects, which are either primitive or first approximation models. To achieve this goal, in addition to the prepared subroutine to calculate the first approximation model of input data, two other wireframe models are found for extraction of sub-objects. One is the wireframe representation of the input, and the other is the wireframe of the first approximation model. A new fast method is developed for the latter special-case wireframe, which is named the `first approximation wireframe model'. This method avoids the use of a solid modeller. Detailed descriptions of algorithms and implementation procedures are given. These techniques also make use of dashed-line information to improve the model. Different practical examples are given to illustrate the functioning of the program. Finally, a recursive method is employed to automatically modify the output model towards the real object. Some suggestions for further work are made to increase the domain of objects covered and provide a commercially usable package. It is concluded that the current method promises the production of accurate models for a large class of objects.
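The idea of a first approximation built from orthographic views can be illustrated with a minimal voxel sketch, assuming boolean occupancy grids for the three views. The thesis itself works with CSG and wireframe models, so this is only an analogy for how intersecting the view extrusions yields a first approximation that may still over-estimate the real object:

```python
def first_approximation(front, top, side, n):
    """Voxel (x, y, z) is solid iff all three orthographic views cover it.

    front[z][x], top[y][x] and side[z][y] are hypothetical boolean
    occupancy grids standing in for the 2D drawing views; intersecting
    the three extrusions gives the (possibly over-full) first approximation.
    """
    solid = set()
    for x in range(n):
        for y in range(n):
            for z in range(n):
                if front[z][x] and top[y][x] and side[z][y]:
                    solid.add((x, y, z))
    return solid
```

For a cube, the three full silhouettes reproduce the object exactly; for objects with concavities hidden from all three views, the intersection is larger than the real object, which is exactly the gap the thesis's enhancement method addresses.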

Relevance: 30.00%

Abstract:

Online communities are prime sources of information. The Web is rich with forums and Question Answering (Q&A) communities where people go to seek answers to all kinds of questions. Most systems employ manual answer-rating procedures to encourage people to provide quality answers and to help users locate the best answers in a given thread. However, in the datasets we collected from three online communities, we found that half of their threads lacked best-answer markings. This stresses the need for methods to assess the quality of available answers in order to: 1) provide automated ratings to fill in for, or support, manually assigned ones; and 2) assist users browsing such answers by filtering in potential best answers. In this paper, we collected data from three online communities and converted it to RDF based on the SIOC ontology. We then explored an approach for predicting best answers using a combination of content, user, and thread features. We show how the influence of such features on predicting best answers differs across communities. Further, we demonstrate how certain features unique to some of our community systems can boost the predictability of best answers.
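A minimal sketch of combining content, user, and thread features into a best-answer prediction, assuming hand-picked weights and pre-normalised feature values. The paper learns the influence of each feature per community; the feature names and weights here are hypothetical:

```python
# Hypothetical weights: one content feature (length), one user feature
# (reputation), one thread feature (position). The paper derives such
# influence per community; these fixed values are only illustrative.
WEIGHTS = {"length": 0.3, "reputation": 0.5, "position": 0.2}

def answer_score(answer):
    """Weighted combination of features, all assumed scaled to [0, 1]."""
    return sum(WEIGHTS[k] * answer[k] for k in WEIGHTS)

def predict_best(answers):
    """Return the answer with the highest combined feature score."""
    return max(answers, key=answer_score)
```

In the paper's setting the weights would come from a trained model and differ between communities; a community-specific feature (e.g. a vote count available only on some systems) would simply be another weighted term.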

Relevance: 30.00%

Abstract:

We present a new method for term extraction from a domain-relevant corpus using natural language processing, for the purposes of semi-automatic ontology learning. The literature shows that topical words occur in bursts. We find that the ranking of extracted terms is insensitive to the choice of population model, but that calculating frequencies relative to the burst size, rather than the document length in words, yields significantly different results.
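The burst-relative frequency idea can be sketched as follows, assuming a term's occurrences are given as token positions and that occurrences within a hypothetical `max_gap` of each other belong to the same burst (the paper's actual burst model is not reproduced here):

```python
def burst_spans(positions, max_gap=50):
    """Group sorted term occurrences into bursts: consecutive occurrences
    closer than max_gap tokens share a burst. max_gap is an assumed
    threshold, not a value from the paper."""
    bursts = []
    start = prev = positions[0]
    for p in positions[1:]:
        if p - prev > max_gap:
            bursts.append((start, prev))
            start = p
        prev = p
    bursts.append((start, prev))
    return bursts

def burst_relative_frequency(positions, max_gap=50):
    """Frequency relative to total burst size rather than document length."""
    total_span = sum(end - start + 1 for start, end in burst_spans(positions, max_gap))
    return len(positions) / total_span
```

Dividing by the total burst span rather than the document length rewards terms that cluster tightly, which is the distinction the abstract reports as yielding significantly different results.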

Relevance: 30.00%

Abstract:

Microposts are small fragments of social media content that have been published using a lightweight paradigm (e.g. Tweets, Facebook likes, foursquare check-ins). Microposts have been used for a variety of applications (e.g. sentiment analysis, opinion mining, trend analysis) by gleaning useful information, often using third-party concept extraction tools. Such tools have seen very large uptake in the last few years, along with the creation and adoption of new methods for concept extraction. However, the evaluation of such efforts has been largely confined to document corpora (e.g. news articles), calling into question the suitability of concept extraction tools and methods for Micropost data. This report describes the Making Sense of Microposts Workshop (#MSM2013) Concept Extraction Challenge, hosted in conjunction with the 2013 World Wide Web conference (WWW'13). The Challenge dataset comprised a manually annotated training corpus of Microposts and an unlabelled test corpus. Participants were set the task of engineering a concept extraction system for a defined set of concepts. Out of a total of 22 complete submissions, 13 were accepted for presentation at the workshop; the submissions covered methods ranging from sequence mining algorithms for attribute extraction to part-of-speech tagging for Micropost cleaning and rule-based and discriminative models for token classification. In this report we describe the evaluation process and explain the performance of different approaches in different contexts.
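A naive rule-based baseline for Micropost concept extraction might look like the sketch below: it strips mentions, URLs and hash signs, then treats runs of capitalised tokens as concept candidates. This is an illustrative toy, not one of the Challenge submissions, and it deliberately ignores the token-classification models the report describes:

```python
import re

def extract_concepts(micropost):
    """Naive candidate extraction for short social-media text.

    Micropost-specific cleaning (mentions, URLs, hash signs) is applied
    first, precisely because tools tuned on news articles lack it; then
    maximal runs of capitalised tokens are returned as candidates.
    """
    text = re.sub(r"(@\w+|https?://\S+|#)", " ", micropost)
    return re.findall(r"\b(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\b", text)
```

Such a baseline over-generates (sentence-initial words are caught too) and misses lowercase or all-caps concepts, which illustrates why the Challenge drew entries using POS tagging and discriminative token classifiers instead.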