5 results for Automatic term extraction

in Aston University Research Archive


Relevance: 100.00%

Abstract:

Automatic Term Recognition (ATR) is a fundamental processing step that precedes more complex tasks such as semantic search and ontology learning. Of the many methodologies available in the literature, only a few can handle both single-word and multi-word terms. In this paper we compare five such algorithms and propose a combined approach that uses a voting mechanism. We evaluated all six approaches (the five algorithms plus the voting approach) on two different corpora and show that the voting algorithm performs best on one corpus (a collection of texts from Wikipedia) but less well on the Genia corpus (a standard life-science corpus). This indicates that the choice and design of the corpus have a major impact on the evaluation of term recognition algorithms. Our experiments also showed that single-word terms can be equally important and make up a fairly large proportion of terms in certain domains. As a result, algorithms that ignore single-word terms may cause problems for tasks built on top of ATR. Effective ATR systems also need to take into account both the unstructured text and the structured aspects, which means that information extraction techniques need to be integrated into the term recognition process.
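The abstract does not specify how the voting mechanism combines the five algorithms. As an illustration only, one common way to merge several ranked term lists is a Borda-style vote, sketched below. The function name `borda_vote` and the scoring rule (a term at position i in a list of length n earns n − i points) are assumptions, not the paper's method.

```python
from collections import defaultdict


def borda_vote(rankings: list[list[str]]) -> list[str]:
    """Merge several ranked candidate-term lists with a Borda-style vote.

    ASSUMPTION: this scoring rule is illustrative; the paper's actual voting
    scheme is not described in the abstract.

    Each ranking lists terms best-first. A term at position i in a list of
    length n receives n - i points; a term missing from a list receives no
    points from that list. Ties are broken alphabetically for determinism.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, term in enumerate(ranking):
            scores[term] += n - i
    # Highest total score first; alphabetical order among equal scores.
    return sorted(scores, key=lambda term: (-scores[term], term))


rankings = [
    ["gene expression", "protein", "cell line"],
    ["protein", "gene expression", "transcription factor"],
    ["protein", "cell line", "gene expression"],
]
print(borda_vote(rankings))
```

With these three hypothetical rankings, "protein" scores 8, "gene expression" 6, "cell line" 3 and "transcription factor" 1, so the single-word term "protein" comes out on top. A scheme like this treats single-word and multi-word candidates uniformly, which matters given the abstract's point that single-word terms can be important.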

Relevance: 100.00%

Abstract:

We present a new method for extracting terms from a domain-relevant corpus using natural language processing, for the purpose of semi-automatic ontology learning. The literature shows that topical words occur in bursts. We find that the ranking of extracted terms is insensitive to the choice of population model, but that calculating frequencies relative to the burst size, rather than to the document length in words, yields significantly different results.
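To make the contrast between the two normalisations concrete, the sketch below computes a term's frequency both per document length and per burst size. The abstract does not define a burst, so this sketch assumes a burst is the token span from the term's first to its last occurrence in a document; the function name `relative_frequencies` is likewise hypothetical.

```python
def relative_frequencies(tokens: list[str], term: str) -> tuple[float, float]:
    """Return (frequency per document length, frequency per burst size).

    ASSUMPTION: the 'burst' is taken to be the token span from the term's
    first to its last occurrence, inclusive. This is an illustrative
    definition, not necessarily the one used in the paper.
    """
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    if not positions:
        # Term absent (or empty document): both frequencies are zero.
        return 0.0, 0.0
    count = len(positions)
    burst_size = positions[-1] - positions[0] + 1
    return count / len(tokens), count / burst_size


# A 20-token toy document with two terms, each occurring three times.
tokens = ["w"] * 20
for i in (2, 3, 5):
    tokens[i] = "ontology"  # clustered occurrences
for i in (0, 10, 19):
    tokens[i] = "gene"      # spread-out occurrences
print(relative_frequencies(tokens, "ontology"))
print(relative_frequencies(tokens, "gene"))
```

Both toy terms have the same per-document frequency (3/20 = 0.15), but the clustered term "ontology" has a burst-relative frequency of 3/4 = 0.75, while "gene", whose occurrences span the whole document, stays at 0.15. This is the kind of divergence that can reorder a term ranking.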

Relevance: 30.00%

Abstract:

Geometric information relating to most engineering products is available in the form of orthographic drawings or 2D data files. Many recent computer-based applications, such as Computer Integrated Manufacturing (CIM), require these data in the form of a sophisticated model based on Constructive Solid Geometry (CSG) concepts. A recent novel technique in this area transfers 2D engineering drawings directly into a 3D solid model called 'the first approximation'. In many cases, however, this model does not represent the real object. In this thesis, a new method is proposed and developed to enhance this model. The method uses the notion of expanding an object in terms of other solid objects, which are either primitives or first approximation models. To achieve this, in addition to the subroutine prepared to calculate the first approximation model of the input data, two further wireframe models are found for extracting sub-objects: the wireframe representation of the input, and the wireframe of the first approximation model. A new fast method, which avoids the use of a solid modeller, is developed for the latter special-case wireframe, named the 'first approximation wireframe model'. Detailed descriptions of the algorithms and implementation procedures are given. These techniques also consider the use of dashed-line information to improve the model. Various practical examples are given to illustrate how the program works. Finally, a recursive method is employed to automatically modify the output model towards the real object. Some suggestions for further work are made to increase the range of objects covered and to provide a commercially usable package. It is concluded that the current method promises to produce accurate models for a large class of objects.

Relevance: 30.00%

Abstract:

The primary objective of this research was to understand what kinds of knowledge and skills people use in 'extracting' relevant information from text, and to assess the extent to which expert systems techniques could be applied to automate the process of abstracting. The approach adopted in this thesis draws on research in cognitive science, information science, psycholinguistics and textlinguistics. The study addressed the significance of domain knowledge and heuristic rules by developing an information extraction system called INFORMEX. This system, implemented partly in SPITBOL and partly in PROLOG, used a set of heuristic rules to analyse five expository scientific papers, to interpret their content in relation to the key abstract elements, and to extract a set of sentences recognised as relevant for abstracting purposes. Analysis of these extracts revealed that an adequate abstract could be generated from them. Furthermore, INFORMEX showed that a rule-based system was a suitable computational model for representing experts' knowledge and strategies. This computational technique provided the basis for a new approach to modelling cognition: it showed how experts tackle the task of abstracting by integrating formal knowledge with experiential learning. This thesis demonstrated that empirical and theoretical knowledge can be effectively combined in expert systems technology to provide a valuable starting point for automatic abstracting.
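The general idea of heuristic rules that map sentences to key abstract elements can be sketched in a few lines. This is not INFORMEX, which was written in SPITBOL and PROLOG. The cue-phrase rules, the element names ("purpose", "method", "result", "conclusion") and the function `extract_for_abstract` below are all hypothetical, chosen only to show the shape of a rule-based extractor.

```python
import re

# HYPOTHETICAL heuristic rules: each pairs an abstract element with a
# cue-phrase pattern. These are illustrative, not INFORMEX's actual rules.
RULES = [
    ("purpose", re.compile(r"\b(this paper|we propose|the aim of)\b", re.I)),
    ("method", re.compile(r"\b(we used|using|was measured)\b", re.I)),
    ("result", re.compile(r"\b(results? show|we found|revealed)\b", re.I)),
    ("conclusion", re.compile(r"\b(we conclude|in conclusion|suggests that)\b", re.I)),
]


def extract_for_abstract(text: str) -> list[tuple[str, str]]:
    """Return (element, sentence) pairs for sentences that fire a rule.

    Sentences are split naively after '.', '!' or '?' followed by
    whitespace. Rules are tried in order and the first match wins;
    sentences that match no rule are left out of the extract.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    extracted = []
    for sentence in sentences:
        for element, pattern in RULES:
            if pattern.search(sentence):
                extracted.append((element, sentence))
                break
    return extracted


sample = (
    "In this paper we propose a rule-based extractor. "
    "The weather was pleasant. "
    "Results show that 80 percent of sentences were relevant. "
    "We conclude that rules help."
)
for element, sentence in extract_for_abstract(sample):
    print(element, "->", sentence)
```

Keeping the rules as plain data separate from the matching loop echoes the expert-systems design the abstract describes: domain knowledge can be revised by editing rules without touching the control logic.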