845 resultados para Arabic word segmentation


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Speech is typically a multimodal phenomenon, yet few studies have focused on the exclusive contributions of visual cues to language acquisition. To address this gap, we investigated whether visual prosodic information can facilitate speech segmentation. Previous research has demonstrated that language learners can use lexical stress and pitch cues to segment speech and that learners can extract this information from talking faces. Thus, we created an artificial speech stream that contained minimal segmentation cues and paired it with two synchronous facial displays in which visual prosody was either informative or uninformative for identifying word boundaries. Across three familiarisation conditions (audio stream alone, facial streams alone, and paired audiovisual), learning occurred only when the facial displays were informative to word boundaries, suggesting that facial cues can help learners solve the early challenges of language acquisition.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes the efforts at MILE lab, IISc, to create a 100,000-word database each in Kannada and Tamil for the design and development of Online Handwritten Recognition. It has been collected from over 600 users in order to capture the variations in writing style. We describe features of the scripts and how the number of symbols were reduced to be able to effectively train the data for recognition. The list of words include all the characters, Kannada and Indo-Arabic numerals, punctuations and other symbols. A semi-automated tool for the annotation of data from stroke to word level is used. It segments each word into stroke groups and also acts as a validation mechanism for segmentation. The tool displays the stroke, stroke groups and aksharas of a word and hence can be used to study the various styles of writing, delayed strokes and for assigning quality tags to the words. The tool is currently being used for annotating Tamil and Kannada data. The output is stored in a standard XML format.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Parallel sub-word recognition (PSWR) is a new model that has been proposed for language identification (LID) which does not need elaborate phonetic labeling of the speech data in a foreign language. The new approach performs a front-end tokenization in terms of sub-word units which are designed by automatic segmentation, segment clustering and segment HMM modeling. We develop PSWR based LID in a framework similar to the parallel phone recognition (PPR) approach in the literature. This includes a front-end tokenizer and a back-end language model, for each language to be identified. Considering various combinations of the statistical evaluation scores, it is found that PSWR can perform as well as PPR, even with broad acoustic sub-word tokenization, thus making it an efficient alternative to the PPR system.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Scenic word images undergo degradations due to motion blur, uneven illumination, shadows and defocussing, which lead to difficulty in segmentation. As a result, the recognition results reported on the scenic word image datasets of ICDAR have been low. We introduce a novel technique, where we choose the middle row of the image as a sub-image and segment it first. Then, the labels from this segmented sub-image are used to propagate labels to other pixels in the image. This approach, which is unique and distinct from the existing methods, results in improved segmentation. Bayesian classification and Max-flow methods have been independently used for label propagation. This midline based approach limits the impact of degradations that happens to the image. The segmented text image is recognized using the trial version of Omnipage OCR. We have tested our method on ICDAR 2003 and ICDAR 2011 datasets. Our word recognition results of 64.5% and 71.6% are better than those of methods in the literature and also methods that competed in the Robust reading competition. Our method makes an implicit assumption that degradation is not present in the middle row.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We have benchmarked the maximum obtainable recognition accuracy on five publicly available standard word image data sets using semi-automated segmentation and a commercial OCR. These images have been cropped from camera captured scene images, born digital images (BDI) and street view images. Using the Matlab based tool developed by us, we have annotated at the pixel level more than 3600 word images from the five data sets. The word images binarized by the tool, as well as by our own midline analysis and propagation of segmentation (MAPS) algorithm are recognized using the trial version of Nuance Omnipage OCR and these two results are compared with the best reported in the literature. The benchmark word recognition rates obtained on ICDAR 2003, Sign evaluation, Street view, Born-digital and ICDAR 2011 data sets are 83.9%, 89.3%, 79.6%, 88.5% and 86.7%, respectively. The results obtained from MAPS binarized word images without the use of any lexicon are 64.5% and 71.7% for ICDAR 2003 and 2011 respectively, and these values are higher than the best reported values in the literature of 61.1% and 41.2%, respectively. MAPS results of 82.8% for BDI 2011 dataset matches the performance of the state of the art method based on power law transform.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In optical character recognition of very old books, the recognition accuracy drops mainly due to the merging or breaking of characters. In this paper, we propose the first algorithm to segment merged Kannada characters by using a hypothesis to select the positions to be cut. This method searches for the best possible positions to segment, by taking into account the support vector machine classifier's recognition score and the validity of the aspect ratio (width to height ratio) of the segments between every pair of cut positions. The hypothesis to select the cut position is based on the fact that a concave surface exists above and below the touching portion. These concave surfaces are noted down by tracing the valleys in the top contour of the image and similarly doing it for the image rotated upside-down. The cut positions are then derived as closely matching valleys of the original and the rotated images. Our proposed segmentation algorithm works well for different font styles, shapes and sizes better than the existing vertical projection profile based segmentation. The proposed algorithm has been tested on 1125 different word images, each containing multiple merged characters, from an old Kannada book and 89.6% correct segmentation is achieved and the character recognition accuracy of merged words is 91.2%. A few points of merge are still missed due to the absence of a matched valley due to the specific shapes of the particular characters meeting at the merges.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In recent years, the use of morphological decomposition strategies for Arabic Automatic Speech Recognition (ASR) has become increasingly popular. Systems trained on morphologically decomposed data are often used in combination with standard word-based approaches, and they have been found to yield consistent performance improvements. The present article contributes to this ongoing research endeavour by exploring the use of the 'Morphological Analysis and Disambiguation for Arabic' (MADA) tools for this purpose. System integration issues concerning language modelling and dictionary construction, as well as the estimation of pronunciation probabilities, are discussed. In particular, a novel solution for morpheme-to-word conversion is presented which makes use of an N-gram Statistical Machine Translation (SMT) approach. System performance is investigated within a multi-pass adaptation/combination framework. All the systems described in this paper are evaluated on an Arabic large vocabulary speech recognition task which includes both Broadcast News and Broadcast Conversation test data. It is shown that the use of MADA-based systems, in combination with word-based systems, can reduce the Word Error Rates by up to 8.1 relative. © 2012 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Statistical learning can be used to extract the words from continuous speech. Gómez, Bion, and Mehler (Language and Cognitive Processes, 26, 212–223, 2011) proposed an online measure of statistical learning: They superimposed auditory clicks on a continuous artificial speech stream made up of a random succession of trisyllabic nonwords. Participants were instructed to detect these clicks, which could be located either within or between words. The results showed that, over the length of exposure, reaction times (RTs) increased more for within-word than for between-word clicks. This result has been accounted for by means of statistical learning of the between-word boundaries. However, even though statistical learning occurs without an intention to learn, it nevertheless requires attentional resources. Therefore, this process could be affected by a concurrent task such as click detection. In the present study, we evaluated the extent to which the click detection task indeed reflects successful statistical learning. Our results suggest that the emergence of RT differences between within- and between-word click detection is neither systematic nor related to the successful segmentation of the artificial language. Therefore, instead of being an online measure of learning, the click detection task seems to interfere with the extraction of statistical regularities.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Ce mémoire présente une étude de la morphologie de ce qui est généralement appelé le pluriel nominal du persan (parler de Téhéran) dans le cadre d’une théorie de la morphologie basée sur le mot : Whole Word Morphology, développée par Ford et Singh (1991). Ce modèle lexicaliste adopte une position plus forte que les modèles proposés par Aronoff (1976) et Anderson (1992) en n’admettant aucune opération morphologique sur des unités plus petites que le mot. Selon cette théorie, une description morphologique consiste en l’énumération des Stratégies de Formation de Mots (SFM), licencées chacunes par au moins deux paires de mots ayant la même covariation formelle et sémantique. Tous les SFM suit le même schéma. Nous avons répertorié 49 SFM regroupant les pluriels et les collectifs. Nous constatons qu’il est difficile de saisir le pluriel nominal du persan en tant que catégorie syntaxique et que les différentes « marques du pluriel » présentées dans la littérature ne constituent pas un ensemble homogène : elles partagent toutes un sens de pluralité qui cependant varie d’une interprétation référentielle à une interprétation collective non-référentielle. Cette étude vise la déscription de la compétence morphologique, ce qui ne dépend d’aucune considération extralinguistique. Nous argumentons notamment contre la dichotomie arabe/persan généralement admise dans la littérature. Nous avons également fourni des explications quant à la production des pluriels doubles et avons discuté de la variation supposée du fait d’un choix multiple de « marques du pluriel ».

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Speech is often a multimodal process, presented audiovisually through a talking face. One area of speech perception influenced by visual speech is speech segmentation, or the process of breaking a stream of speech into individual words. Mitchel and Weiss (2013) demonstrated that a talking face contains specific cues to word boundaries and that subjects can correctly segment a speech stream when given a silent video of a speaker. The current study expanded upon these results, using an eye tracker to identify highly attended facial features of the audiovisual display used in Mitchel and Weiss (2013). In Experiment 1, subjects were found to spend the most time watching the eyes and mouth, with a trend suggesting that the mouth was viewed more than the eyes. Although subjects displayed significant learning of word boundaries, performance was not correlated with gaze duration on any individual feature, nor was performance correlated with a behavioral measure of autistic-like traits. However, trends suggested that as autistic-like traits increased, gaze duration of the mouth increased and gaze duration of the eyes decreased, similar to significant trends seen in autistic populations (Boratston & Blakemore, 2007). In Experiment 2, the same video was modified so that a black bar covered the eyes or mouth. Both videos elicited learning of word boundaries that was equivalent to that seen in the first experiment. Again, no correlations were found between segmentation performance and SRS scores in either condition. These results, taken with those in Experiment, suggest that neither the eyes nor mouth are critical to speech segmentation and that perhaps more global head movements indicate word boundaries (see Graf, Cosatto, Strom, & Huang, 2002). Future work will elucidate the contribution of individual features relative to global head movements, as well as extend these results to additional types of speech tasks.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

cum triplici versione Latina, & scholijs Thomae Erpenii, cujus & alphabetum Arabicum praemittitur.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This study presents a detailed contrastive description of the textual functioning of connectives in English and Arabic. Particular emphasis is placed on the organisational force of connectives and their role in sustaining cohesion. The description is intended as a contribution for a better understanding of the variations in the dominant tendencies for text organisation in each language. The findings are expected to be utilised for pedagogical purposes, particularly in improving EFL teaching of writing at the undergraduate level. The study is based on an empirical investigation of the phenomenon of connectivity and, for optimal efficiency, employs computer-aided procedures, particularly those adopted in corpus linguistics, for investigatory purposes. One important methodological requirement is the establishment of two comparable and statistically adequate corpora, also the design of software and the use of existing packages and to achieve the basic analysis. Each corpus comprises ca 250,000 words of newspaper material sampled in accordance to a specific set of criteria and assembled in machine readable form prior to the computer-assisted analysis. A suite of programmes have been written in SPITBOL to accomplish a variety of analytical tasks, and in particular to perform a battery of measurements intended to quantify the textual functioning of connectives in each corpus. Concordances and some word lists are produced by using OCP. Results of these researches confirm the existence of fundamental differences in text organisation in Arabic in comparison to English. This manifests itself in the way textual operations of grouping and sequencing are performed and in the intensity of the textual role of connectives in imposing linearity and continuity and in maintaining overall stability. Furthermore, computation of connective functionality and range of operationality has identified fundamental differences in the way favourable choices for text organisation are made and implemented.