906 resultados para Text feature extraction


Relevância:

90.00% 90.00%

Publicador:

Resumo:

Author identification is the problem of identifying the author of an anonymous text or text whose authorship is in doubt from a given set of authors. The works by different authors are strongly distinguished by quantifiable features of the text. This paper deals with the attempts made on identifying the most likely author of a text in Malayalam from a list of authors. Malayalam is a Dravidian language with agglutinative nature and not much successful tools have been developed to extract syntactic & semantic features of texts in this language. We have done a detailed study on the various stylometric features that can be used to form an authors profile and have found that the frequencies of word collocations can be used to clearly distinguish an author in a highly inflectious language such as Malayalam. In our work we try to extract the word level and character level features present in the text for characterizing the style of an author. Our first step was towards creating a profile for each of the candidate authors whose texts were available with us, first from word n-gram frequencies and then by using variable length character n-gram frequencies. Profiles of the set of authors under consideration thus formed, was then compared with the features extracted from anonymous text, to suggest the most likely author.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper presents an efficient Online Handwritten character Recognition System for Malayalam Characters (OHR-M) using Kohonen network. It would help in recognizing Malayalam text entered using pen-like devices. It will be more natural and efficient way for users to enter text using a pen than keyboard and mouse. To identify the difference between similar characters in Malayalam a novel feature extraction method has been adopted-a combination of context bitmap and normalized (x, y) coordinates. The system reported an accuracy of 88.75% which is writer independent with a recognition time of 15-32 milliseconds

Relevância:

90.00% 90.00%

Publicador:

Resumo:

There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seeking value in this huge collection requires organization; much of the work of organizing documents can be automated through text classification. The accuracy and our understanding of such systems greatly influences their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, improve the tools that are available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting basic properties and proposing ways for its extension and improvement. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem which gives an explanation for the improvements that can be found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly-used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly-used feature selection algorithm and develop a statistics-based framework for text feature selection. Greater understanding of Naive Bayes and the properties of text allows us to make better use of it in text classification.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Simultaneous Localization and Mapping (SLAM) do not result in consistent maps of large areas because of gradual increase of the uncertainty for long term missions. In addition, as the size of the map grows the computational cost increases, making SLAM solutions unsuitable for on-line applications. This thesis surveys SLAM approaches paying special attention to those approaches aimed to work on large scenarios. Special focus is given to existing underwater SLAM applications. A technique based on using independent local maps together with a global stochastic map is presented. This technique is called Selective Submap Joining SLAM (SSJS). A global map contains relative transformations between local maps, which are updated once a new loop is detected. Maps sharing several features are fused, maintaining the correlation between landmarks and vehicle's pose. The use of local maps reduces computational costs and improves map consistency as compared to state of the art techniques.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this paper, we introduce a novel high-level visual content descriptor which is devised for performing semantic-based image classification and retrieval. The work can be treated as an attempt to bridge the so called “semantic gap”. The proposed image feature vector model is fundamentally underpinned by the image labelling framework, called Collaterally Confirmed Labelling (CCL), which incorporates the collateral knowledge extracted from the collateral texts of the images with the state-of-the-art low-level image processing and visual feature extraction techniques for automatically assigning linguistic keywords to image regions. Two different high-level image feature vector models are developed based on the CCL labelling of results for the purposes of image data clustering and retrieval respectively. A subset of the Corel image collection has been used for evaluating our proposed method. The experimental results to-date already indicates that our proposed semantic-based visual content descriptors outperform both traditional visual and textual image feature models.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Algorithms for computer-aided diagnosis of dementia based on structural MRI have demonstrated high performance in the literature, but are difficult to compare as different data sets and methodology were used for evaluation. In addition, it is unclear how the algorithms would perform on previously unseen data, and thus, how they would perform in clinical practice when there is no real opportunity to adapt the algorithm to the data at hand. To address these comparability, generalizability and clinical applicability issues, we organized a grand challenge that aimed to objectively compare algorithms based on a clinically representative multi-center data set. Using clinical practice as the starting point, the goal was to reproduce the clinical diagnosis. Therefore, we evaluated algorithms for multi-class classification of three diagnostic groups: patients with probable Alzheimer's disease, patients with mild cognitive impairment and healthy controls. The diagnosis based on clinical criteria was used as reference standard, as it was the best available reference despite its known limitations. For evaluation, a previously unseen test set was used consisting of 354 T1-weighted MRI scans with the diagnoses blinded. Fifteen research teams participated with a total of 29 algorithms. The algorithms were trained on a small training set (n = 30) and optionally on data from other sources (e.g., the Alzheimer's Disease Neuroimaging Initiative, the Australian Imaging Biomarkers and Lifestyle flagship study of aging). The best performing algorithm yielded an accuracy of 63.0% and an area under the receiver-operating-characteristic curve (AUC) of 78.8%. In general, the best performances were achieved using feature extraction based on voxel-based morphometry or a combination of features that included volume, cortical thickness, shape and intensity. The challenge is open for new submissions via the web-based framework: http://caddementia.grand-challenge.org.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Multispectral iris recognition uses information from multiple bands of the electromagnetic spectrum to better represent certain physiological characteristics of the iris texture and enhance obtained recognition accuracy. This paper addresses the questions of single versus cross spectral performance and compares score-level fusion accuracy for different feature types, combining different wavelengths to overcome limitations in less constrained recording environments. Further it is investigated whether Doddington's “goats” (users who are particularly difficult to recognize) in one spectrum also extend to other spectra. Focusing on the question of feature stability at different wavelengths, this work uses manual ground truth segmentation, avoiding bias by segmentation impact. Experiments on the public UTIRIS multispectral iris dataset using 4 feature extraction techniques reveal a significant enhancement when combining NIR + Red for 2-channel and NIR + Red + Blue for 3-channel fusion, across different feature types. Selective feature-level fusion is investigated and shown to improve overall and especially cross-spectral performance without increasing the overall length of the iris code.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper investigates the potential of fusion at normalisation/segmentation level prior to feature extraction. While there are several biometric fusion methods at data/feature level, score level and rank/decision level combining raw biometric signals, scores, or ranks/decisions, this type of fusion is still in its infancy. However, the increasing demand to allow for more relaxed and less invasive recording conditions, especially for on-the-move iris recognition, suggests to further investigate fusion at this very low level. This paper focuses on the approach of multi-segmentation fusion for iris biometric systems investigating the benefit of combining the segmentation result of multiple normalisation algorithms, using four methods from two different public iris toolkits (USIT, OSIRIS) on the public CASIA and IITD iris datasets. Evaluations based on recognition accuracy and ground truth segmentation data indicate high sensitivity with regards to the type of errors made by segmentation algorithms.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The project introduces an application using computer vision for Hand gesture recognition. A camera records a live video stream, from which a snapshot is taken with the help of interface. The system is trained for each type of count hand gestures (one, two, three, four, and five) at least once. After that a test gesture is given to it and the system tries to recognize it.A research was carried out on a number of algorithms that could best differentiate a hand gesture. It was found that the diagonal sum algorithm gave the highest accuracy rate. In the preprocessing phase, a self-developed algorithm removes the background of each training gesture. After that the image is converted into a binary image and the sums of all diagonal elements of the picture are taken. This sum helps us in differentiating and classifying different hand gestures.Previous systems have used data gloves or markers for input in the system. I have no such constraints for using the system. The user can give hand gestures in view of the camera naturally. A completely robust hand gesture recognition system is still under heavy research and development; the implemented system serves as an extendible foundation for future work.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The ever increasing spurt in digital crimes such as image manipulation, image tampering, signature forgery, image forgery, illegal transaction, etc. have hard pressed the demand to combat these forms of criminal activities. In this direction, biometrics - the computer-based validation of a persons' identity is becoming more and more essential particularly for high security systems. The essence of biometrics is the measurement of person’s physiological or behavioral characteristics, it enables authentication of a person’s identity. Biometric-based authentication is also becoming increasingly important in computer-based applications because the amount of sensitive data stored in such systems is growing. The new demands of biometric systems are robustness, high recognition rates, capability to handle imprecision, uncertainties of non-statistical kind and magnanimous flexibility. It is exactly here that, the role of soft computing techniques comes to play. The main aim of this write-up is to present a pragmatic view on applications of soft computing techniques in biometrics and to analyze its impact. It is found that soft computing has already made inroads in terms of individual methods or in combination. Applications of varieties of neural networks top the list followed by fuzzy logic and evolutionary algorithms. In a nutshell, the soft computing paradigms are used for biometric tasks such as feature extraction, dimensionality reduction, pattern identification, pattern mapping and the like.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The acquisition and update of Geographic Information System (GIS) data are typically carried out using aerial or satellite imagery. Since new roads are usually linked to georeferenced pre-existing road network, the extraction of pre-existing road segments may provide good hypotheses for the updating process. This paper addresses the problem of extracting georeferenced roads from images and formulating hypotheses for the presence of new road segments. Our approach proceeds in three steps. First, salient points are identified and measured along roads from a map or GIS database by an operator or an automatic tool. These salient points are then projected onto the image-space and errors inherent in this process are calculated. In the second step, the georeferenced roads are extracted from the image using a dynamic programming (DP) algorithm. The projected salient points and corresponding error estimates are used as input for this extraction process. Finally, the road center axes extracted in the previous step are analyzed to identify potential new segments attached to the extracted, pre-existing one. This analysis is performed using a combination of edge-based and correlation-based algorithms. In this paper we present our approach and early implementation results.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Parkinson's disease (PD) automatic identification has been actively pursued over several works in the literature. In this paper, we deal with this problem by applying evolutionary-based techniques in order to find the subset of features that maximize the accuracy of the Optimum-Path Forest (OPF) classifier. The reason for the choice of this classifier relies on its fast training phase, given that each possible solution to be optimized is guided by the OPF accuracy. We also show results that improved other ones recently obtained in the context of PD automatic identification. © 2011 IEEE.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Feature selection aims to find the most important information from a given set of features. As this task can be seen as an optimization problem, the combinatorial growth of the possible solutions may be in-viable for a exhaustive search. In this paper we propose a new nature-inspired feature selection technique based on the bats behaviour, which has never been applied to this context so far. The wrapper approach combines the power of exploration of the bats together with the speed of the Optimum-Path Forest classifier to find the set of features that maximizes the accuracy in a validating set. Experiments conducted in five public datasets have demonstrated that the proposed approach can outperform some well-known swarm-based techniques. © 2012 IEEE.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Feature selection has been actively pursued in the last years, since to find the most discriminative set of features can enhance the recognition rates and also to make feature extraction faster. In this paper, the propose a new feature selection called Binary Cuckoo Search, which is based on the behavior of cuckoo birds. The experiments were carried out in the context of theft detection in power distribution systems in two datasets obtained from a Brazilian electrical power company, and have demonstrated the robustness of the proposed technique against with several others nature-inspired optimization techniques. © 2013 IEEE.