20 results for Text feature extraction

in Aston University Research Archive


Relevance:

100.00% 100.00%

Publisher:

Abstract:

DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT

Relevance:

100.00% 100.00%

Publisher:

Abstract:

DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Rotation invariance is important for an iris recognition system, since changes of head orientation and binocular vergence may cause eye rotation. Conventional methods of iris recognition cannot achieve true rotation invariance. They achieve only approximate rotation invariance, either by rotating the feature vector before matching or by unwrapping the iris ring at different initial angles. These methods add complexity, and when the rotation is beyond a certain range their error rates may increase substantially. To solve this problem, a new rotation-invariant approach to iris feature extraction based on the non-separable wavelet is proposed in this paper. Firstly, a bank of non-separable orthogonal wavelet filters is used to capture characteristics of the iris. Secondly, a method based on Markov random fields is used to capture rotation-invariant iris features. Finally, two-class kernel Fisher classifiers are adopted for classification. Experimental results on public iris databases show that the proposed approach has a low error rate and achieves true rotation invariance. © 2010.
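The conventional workaround mentioned in this abstract (rotating the feature vector before matching) can be illustrated with a minimal sketch. It assumes a binary feature code compared by Hamming distance, which the abstract does not specify; the function names and the `max_shift` parameter are hypothetical. It is not the wavelet and Markov random field method the paper proposes.

```python
def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def min_shift_distance(probe, gallery, max_shift):
    """Smallest Hamming fraction over circular shifts in [-max_shift, max_shift].

    Each shift stands in for one candidate eye rotation (an assumption of
    this sketch). Only rotations inside the search range are compensated.
    """
    n = len(probe)
    best = 1.0
    for s in range(-max_shift, max_shift + 1):
        k = s % n  # normalise the shift to 0..n-1
        shifted = probe[-k:] + probe[:-k] if k else probe
        best = min(best, hamming_fraction(shifted, gallery))
    return best


gallery = [1, 0, 0, 1, 1, 0, 1, 0]
probe = gallery[2:] + gallery[:2]  # same code, rotated by two positions

print(min_shift_distance(probe, gallery, max_shift=2))  # rotation inside the range
print(min_shift_distance(probe, gallery, max_shift=1))  # rotation outside the range
```

The second call shows the limitation the abstract points to: once the true rotation falls outside the searched range, the match score degrades, and widening the range costs one extra comparison per shift.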

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This paper surveys the context of feature extraction by neural network approaches, and compares and contrasts their behaviour as prospective data visualisation tools in a real world problem. We also introduce and discuss a hybrid approach which allows us to control the degree of discriminatory and topographic information in the extracted feature space.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This thesis is a study of the generation of topographic mappings - dimension reducing transformations of data that preserve some element of geometric structure - with feed-forward neural networks. As an alternative to established methods, a transformational variant of Sammon's method is proposed, where the projection is effected by a radial basis function neural network. This approach is related to the statistical field of multidimensional scaling, and from that the concept of a 'subjective metric' is defined, which permits the exploitation of additional prior knowledge concerning the data in the mapping process. This then enables the generation of more appropriate feature spaces for the purposes of enhanced visualisation or subsequent classification. A comparison with established methods for feature extraction is given for data taken from the 1992 Research Assessment Exercise for higher educational institutions in the United Kingdom. This is a difficult high-dimensional dataset, and illustrates well the benefit of the new topographic technique. A generalisation of the proposed model is considered for implementation of the classical multidimensional scaling (MDS) routine. This is related to Oja's principal subspace neural network, whose learning rule is shown to descend the error surface of the proposed MDS model. Some of the technical issues concerning the design and training of topographic neural networks are investigated. It is shown that neural network models can be less sensitive to entrapment in the sub-optimal local minima that badly affect the standard Sammon algorithm, and tend to exhibit good generalisation as a result of implicit weight decay in the training process. It is further argued that for ideal structure retention, the network transformation should be perfectly smooth for all inter-data directions in input space. Finally, there is a critique of optimisation techniques for topographic mappings, and a new training algorithm is proposed. A convergence proof is given, and the method is shown to produce lower-error mappings more rapidly than previous algorithms.
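For orientation, the quantity that a Sammon-style mapping tries to minimise can be sketched as follows. This is the standard Sammon stress between input-space and projected distances, written with hypothetical function names; it is only the error measure, not the radial basis function network or the training algorithm proposed in the thesis, and it assumes no two input points coincide.

```python
import math


def pairwise_distances(points):
    """Euclidean distance for every unordered pair (i, j) with i < j."""
    n = len(points)
    return {
        (i, j): math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    }


def sammon_stress(high, low):
    """Sammon stress between input points `high` and their projections `low`.

    Each squared distance error is divided by the input-space distance,
    so small input distances carry more weight. Assumes distinct inputs.
    """
    d_high = pairwise_distances(high)
    d_low = pairwise_distances(low)
    total = sum(d_high.values())
    return sum((d_high[k] - d_low[k]) ** 2 / d_high[k] for k in d_high) / total


# Three collinear points in 3-D and a 1-D projection that keeps all distances.
high = [(0, 0, 0), (3, 4, 0), (6, 8, 0)]
faithful = [(0,), (5,), (10,)]
collapsed = [(0,), (0,), (0,)]

print(sammon_stress(high, faithful))   # distances preserved
print(sammon_stress(high, collapsed))  # all structure lost
```

A projection that preserves every inter-point distance gives zero stress, while one that collapses all points together gives a stress of one, which makes the measure a convenient way to compare mappings of the same data.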

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This paper presents a novel approach to water pollution detection from remotely sensed, low-platform mounted, visible band camera images. We examine the feasibility of unsupervised segmentation for labelling slick regions (oily spills on the water surface). Adaptive and non-adaptive filtering is combined with density modelling of the obtained textural features. Particular effort is concentrated on textural feature extraction from raw intensity images using filter banks, and on adaptive feature extraction from the resulting output coefficients. Segmentation in the extracted feature space is achieved using Gaussian mixture models (GMM).

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This study considers the application of image analysis in petrography and investigates the possibilities for advancing existing techniques by introducing feature extraction and analysis capabilities of a higher level than those currently employed. The aim is to construct relevant, useful descriptions of crystal form and inter-crystal relations in polycrystalline igneous rock sections. Such descriptions cannot be derived until the `ownership' of boundaries between adjacent crystals has been established: this is the fundamental problem of crystal boundary assignment. An analysis of this problem establishes key image features which reveal boundary ownership; a set of explicit analysis rules is presented. A petrographic image analysis scheme based on these principles is outlined and the implementation of key components of the scheme considered. An algorithm for the extraction and symbolic representation of image structural information is developed. A new multiscale analysis algorithm which produces a hierarchical description of the linear and near-linear structure on a contour is presented in detail. Novel techniques for symmetry analysis are developed. The analyses considered contribute both to the solution of the boundary assignment problem and to the construction of geologically useful descriptions of crystal form. The analysis scheme which is developed employs grouping principles such as collinearity, parallelism, symmetry and continuity, so providing a link between this study and more general work in perceptual grouping and intermediate level computer vision. Consequently, the techniques developed in this study may be expected to find wider application beyond the petrographic domain.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

A recent novel approach to the visualisation and analysis of datasets, and one which is particularly applicable to those of a high dimension, is discussed in the context of real applications. A feed-forward neural network is utilised to effect a topographic, structure-preserving, dimension-reducing transformation of the data, with an additional facility to incorporate different degrees of associated subjective information. The properties of this transformation are illustrated on synthetic and real datasets, including the 1992 UK Research Assessment Exercise for funding in higher education. The method is compared and contrasted to established techniques for feature extraction, and related to topographic mappings, the Sammon projection and the statistical field of multidimensional scaling.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Good estimates of ecosystem complexity are essential for a number of ecological tasks: from biodiversity estimation, to forest structure variable retrieval, to feature extraction by edge detection and generation of multifractal surface as neutral models for e.g. feature change assessment. Hence, measuring ecological complexity over space becomes crucial in macroecology and geography. Many geospatial tools have been advocated in spatial ecology to estimate ecosystem complexity and its changes over space and time. Among these tools, free and open source options especially offer opportunities to guarantee the robustness of algorithms and reproducibility. In this paper we will summarize the most straightforward measures of spatial complexity available in the Free and Open Source Software GRASS GIS, relating them to key ecological patterns and processes.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Motivation: In molecular biology, molecular events describe observable alterations of biomolecules, such as binding of proteins or RNA production. These events might be responsible for drug reactions or the development of certain diseases. As such, biomedical event extraction, the process of automatically detecting descriptions of molecular interactions in research articles, has attracted substantial research interest recently. Event trigger identification, detecting the words describing the event types, is a crucial prerequisite step in the pipeline process of biomedical event extraction. Taking the event types as classes, event trigger identification can be viewed as a classification task. For each word in a sentence, a trained classifier predicts whether the word corresponds to an event type, and which event type, based on the context features. Therefore, a well-designed feature set with a good level of discrimination and generalization is crucial for the performance of event trigger identification. Results: In this article, we propose a novel framework for event trigger identification. In particular, we learn biomedical domain knowledge from a large text corpus built from Medline and embed it into word features using neural language modeling. The embedded features are then combined with the syntactic and semantic context features using the multiple kernel learning method. The combined feature set is used for training the event trigger classifier. Experimental results on the gold standard corpus show that an improvement of more than 2.5% in F-score is achieved by the proposed framework when compared with the state-of-the-art approach, demonstrating the effectiveness of the proposed framework. © 2014 The Author 2014. The source code for the proposed framework is freely available and can be downloaded at http://cse.seu.edu.cn/people/zhoudeyu/ETI_Sourcecode.zip.
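The per-word classification setup described in this abstract depends on context features for each token. A minimal sketch of one such feature extractor is given below; the window size, the feature names and the `<PAD>` marker are assumptions made for illustration, and the sketch covers only surface-word context, not the syntactic, semantic or embedded features the framework combines.

```python
def context_features(tokens, index, window=2):
    """Surface-word context features for the token at `index`.

    Returns a dict holding the lower-cased target word and its neighbours
    within `window` positions; positions outside the sentence are padded.
    """
    feats = {"word": tokens[index].lower()}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        pos = index + offset
        inside = 0 <= pos < len(tokens)
        feats[f"w[{offset:+d}]"] = tokens[pos].lower() if inside else "<PAD>"
    return feats


sentence = ["IL-2", "binding", "activates", "transcription"]

# One feature dict per word; a trained classifier would map each dict
# to an event type or to "not a trigger".
for i, token in enumerate(sentence):
    print(token, context_features(sentence, i, window=1))
```

Each dictionary would be passed to a classifier that decides whether the word is an event trigger and, if so, of which type.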

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This work examines prosody modelling for the Standard Yorùbá (SY) language in the context of computer text-to-speech synthesis applications. The thesis of this research is that it is possible to develop a practical prosody model by using appropriate computational tools and techniques which combine acoustic data with an encoding of the phonological and phonetic knowledge provided by experts. Our prosody model is conceptualised around a modular holistic framework. The framework is implemented using the Relational Tree (R-Tree) techniques (Ehrich and Foith, 1976). R-Tree is a sophisticated data structure that provides a multi-dimensional description of a waveform. A Skeletal Tree (S-Tree) is first generated using algorithms based on the tone phonological rules of SY. Subsequent steps update the S-Tree by computing the numerical values of the prosody dimensions. To implement the intonation dimension, fuzzy control rules were developed based on data from native speakers of Yorùbá. The Classification And Regression Tree (CART) and the Fuzzy Decision Tree (FDT) techniques were tested in modelling the duration dimension. The FDT was selected based on its better performance. An important feature of our R-Tree framework is its flexibility, in that it facilitates the independent implementation of the different dimensions of prosody, i.e. duration and intonation, using different techniques and their subsequent integration. Our approach provides us with a flexible and extendible model that can also be used to implement, study and explain the theory behind aspects of the phenomena observed in speech prosody.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The present thesis investigates mode related aspects in biology lecture discourse and attempts to identify the position of this variety along the spontaneous spoken versus planned written language continuum. Nine lectures (of 43,000 words) consisting of three sets of three lectures each, given by three lecturers at Aston University, make up the corpus. The indeterminacy of the results obtained from the investigation of grammatical complexity as measured in subordination motivates the need to take the analysis beyond sentence level to the study of mode related aspects in the use of sentence-initial connectives, sub-topic shifting and paraphrase. It is found that biology lecture discourse combines features typical of speech and writing at sentence as well as discourse level: thus, subordination is used more than co-ordination, but sentences of one degree of complexity are favoured; some sentence-initial connectives are only found in uses typical of spoken language, but sub-topic shift signalling (generally introduced by a connective) typical of planned written language is a major feature of the lectures; syntactic and lexical revision and repetition, and interrupted structures, are found in the sub-topic shift signalling utterance and paraphrase, but the text is also amenable to analysis into sentence-like units. On the other hand, it is also found that: (1) while there are some differences in the use of a given feature, inter-speaker variation is on the whole not significant; (2) mode related aspects are often motivated by the didactic function of the variety; and (3) the structuring of the text follows a sequencing whose boundaries are marked by sub-topic shifting and the summary paraphrase. This study enables us to draw four theoretical conclusions: (1) mode related aspects cannot be approached as a simple dichotomy, since a combination of aspects of both speech and writing is found in a given feature. It is necessary to go to the level of textual features to identify mode related aspects; (2) homogeneity is dominant in this sample of lectures, which suggests that there is a high level of standardization in this variety; (3) the didactic function of the variety is manifested in some mode related aspects; (4) the features studied play a role in the structuring of the text.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Working within the framework of the branch of Linguistics known as discourse analysis, and more specifically within the current approach of genre analysis, this thesis presents an analysis of the English of economic forecasting. The language of economic forecasting is highly specialised and follows certain conventions of structure and style. This research project identifies these characteristics and explains them in terms of their communicative function. The work is based on a corpus of texts published in economic reports and surveys by major corporate bodies. These documents are targeted at an international expert readership familiar with this genre. The data is analysed at two broad levels: firstly, the macro-level of text structure which is described in terms of schema-theory, a currently influential model of analysis, and, secondly, the micro-level of authors' strategies for modulating the predictions which form the key move in the forecasting schema. The thesis aims to contribute to the newly developing field of genre analysis in a number of ways: firstly, by a coverage of a hitherto neglected but intrinsically interesting and important genre (Economic Forecasting); secondly, by testing the applicability of existing models of analysis at the level of schematic structure and proposing a genre-specific model; thirdly by offering insights into the nature of modulation of propositions which is often broadly classified as 'hedging' or 'modality', and which has been recently described as 'an area for prolonged fieldwork'. This phenomenon is shown to be a key feature of this particular genre. It is suggested that this thesis, in addition to its contribution to the theory of genre analysis, provides a useful basis for work by teachers of English for Economics, an important area of English for Specific Purposes.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Web APIs have gained increasing popularity in recent Web service technology development owing to the simplicity of their technology stack and the proliferation of mashups. However, efficiently discovering Web APIs and the relevant documentation on the Web is still a challenging task, even with the best resources available on the Web. In this paper we cast the problem of detecting Web API documentation as a text classification problem: classifying a given Web page as Web API associated or not. We propose a supervised generative topic model called feature latent Dirichlet allocation (feaLDA), which offers a generic probabilistic framework for automatic detection of Web APIs. feaLDA not only captures the correspondence between data and the associated class labels, but also provides a mechanism for incorporating side information, such as labelled features automatically learned from data, that can effectively help improve classification performance. Extensive experiments on our Web API documentation dataset show that the feaLDA model outperforms three strong supervised baselines, namely naive Bayes, support vector machines and the maximum entropy model, by over 3% in classification accuracy. In addition, feaLDA also gives superior performance when compared against other existing supervised topic models.
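To make the binary page-classification framing concrete, here is a minimal sketch of a multinomial naive Bayes classifier, one of the baselines named in the abstract. It is not feaLDA. The toy training pages, the label names and the function names are invented for illustration, and add-one smoothing is an assumption of this sketch.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Collect counts from (tokens, label) pairs for multinomial naive Bayes."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy pages, invented for this sketch: "api" means Web API associated.
train = [
    ("get post endpoint json".split(), "api"),
    ("rest endpoint parameters response".split(), "api"),
    ("recipe pasta tomato".split(), "other"),
    ("football match score".split(), "other"),
]
model = train_nb(train)
print(predict_nb(model, "endpoint json response".split()))
print(predict_nb(model, "pasta recipe".split()))
```

A real experiment would tokenise crawled pages and evaluate on held-out data; the sketch shows only how a page is reduced to word counts and assigned to one of two classes.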