968 results for word recognition
Abstract:
Joint decoding of multiple speech patterns to improve speech recognition performance is important, especially in the presence of noise. In this paper, we propose a Multi-Pattern Viterbi Algorithm (MPVA) to jointly decode and recognize multiple speech patterns for automatic speech recognition (ASR). The MPVA is a generalization of the Viterbi algorithm that jointly decodes multiple patterns given a Hidden Markov Model (HMM). Unlike the previously proposed two-stage Constrained Multi-Pattern Viterbi Algorithm (CMPVA), the MPVA is a single-stage algorithm. MPVA has the advantage that it can be extended to connected word recognition (CWR) and continuous speech recognition (CSR) problems. MPVA is shown to provide better speech recognition performance than the earlier techniques: using only two repetitions of noisy speech patterns (-5 dB SNR, 10% burst noise), the word error rate using MPVA decreased by 28.5% compared with individual decoding. © 2010 Elsevier B.V. All rights reserved.
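For orientation, the standard single-pattern Viterbi algorithm that the MPVA is said to generalize can be sketched as follows. This is not the MPVA itself: the abstract does not give the joint recursion, so only the classical algorithm is shown. The function name `viterbi` and the two-state toy HMM (states "A" and "B", binary observations) are illustrative assumptions and do not come from the paper.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Classical Viterbi decoding of ONE observation sequence in the log domain.

    Returns (best_log_prob, best_state_path).
    """
    # delta[s]: best log-probability of any state path that ends in s
    delta = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            # best predecessor of state s at this time step
            prev = max(states, key=lambda p: delta[p] + log_trans[p][s])
            new_delta[s] = delta[prev] + log_trans[prev][s] + log_emit[s][o]
            ptr[s] = prev
        delta = new_delta
        backpointers.append(ptr)
    # backtrack from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


def _logs(table):
    """Convert a nested dict of probabilities into log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Hypothetical toy HMM, used only to exercise the function.
STATES = ["A", "B"]
LOG_START = {"A": math.log(0.5), "B": math.log(0.5)}
LOG_TRANS = _logs({"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}})
LOG_EMIT = _logs({"A": {0: 0.9, 1: 0.1}, "B": {0: 0.1, 1: 0.9}})

best_log_prob, best_path = viterbi([0, 0, 1, 1], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
```

Decoding each repetition of a word separately with a routine like this corresponds to the "individual decoding" baseline mentioned in the abstract; the MPVA instead decodes the repetitions jointly.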
Abstract:
The following topics were dealt with: document analysis and recognition; multimedia document processing; character recognition; document image processing; cheque processing; form processing; music processing; document segmentation; electronic documents; character classification; handwritten character recognition; information retrieval; postal automation; font recognition; Indian language OCR; handwriting recognition; performance evaluation; graphics recognition; oriental character recognition; and word recognition.
Abstract:
Scenic word images undergo degradations due to motion blur, uneven illumination, shadows and defocussing, which make segmentation difficult. As a result, the recognition results reported on the scenic word image datasets of ICDAR have been low. We introduce a novel technique, where we choose the middle row of the image as a sub-image and segment it first. Then, the labels from this segmented sub-image are used to propagate labels to other pixels in the image. This approach, which is distinct from the existing methods, results in improved segmentation. Bayesian classification and Max-flow methods have been independently used for label propagation. This midline-based approach limits the impact of the degradations that affect the image. The segmented text image is recognized using the trial version of OmniPage OCR. We have tested our method on the ICDAR 2003 and ICDAR 2011 datasets. Our word recognition results of 64.5% and 71.6% are better than those of methods in the literature and also of methods that competed in the Robust Reading competition. Our method makes an implicit assumption that degradation is not present in the middle row.
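The midline idea can be illustrated with a deliberately simplified sketch. Everything specific here is an assumption rather than the authors' method: the middle row is split by its mean intensity (the abstract does not say how the sub-image is segmented), text is assumed darker than background, and labels are propagated by assigning each pixel to the nearer class mean, which is what Bayesian classification reduces to for two equal-variance Gaussian classes with equal priors. The Max-flow variant is not shown, and the function name `midline_segment` is hypothetical.

```python
def midline_segment(gray):
    """Segment a grayscale image (list of rows of intensities) using
    statistics gathered from its middle row only.

    Returns a same-shaped list of booleans, True where a pixel is
    labelled as text. Assumes dark text on a lighter background.
    """
    mid = gray[len(gray) // 2]

    # Step 1: segment the middle row (assumed rule: split at its mean).
    thresh = sum(mid) / len(mid)
    fg_seed = [v for v in mid if v < thresh]
    bg_seed = [v for v in mid if v >= thresh]
    if not fg_seed or not bg_seed:
        raise ValueError("middle row does not contain both classes")

    # Step 2: summarise each class by its mean intensity.
    mu_fg = sum(fg_seed) / len(fg_seed)
    mu_bg = sum(bg_seed) / len(bg_seed)

    # Step 3: propagate labels to every pixel by nearest class mean.
    return [[abs(v - mu_fg) < abs(v - mu_bg) for v in row] for row in gray]


# Tiny hypothetical image: a dark vertical stroke on a bright background.
image = [
    [200, 50, 200],
    [210, 40, 190],
    [205, 60, 195],
]
mask = midline_segment(image)
```

The sketch also makes the abstract's stated limitation concrete: if the middle row is itself degraded, the seed statistics are wrong and every propagated label inherits the error.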
Abstract:
Does language-specific orthography help language detection and lexical access in naturalistic bilingual contexts? This study investigates how L2 orthotactic properties influence bilingual language detection in bilingual societies and the extent to which they modulate lexical access and single word processing. Language specificity of naturalistically learnt L2 words was manipulated by including bigram combinations that could be either L2 language-specific or common to the two languages known by bilinguals. A group of balanced bilinguals and a group of highly proficient but unbalanced bilinguals who grew up in a bilingual society were tested, together with a group of monolinguals (for control purposes). All the participants completed a speeded language detection task and a progressive demasking task. Results showed that the use of information about orthotactic rules across languages depends on the task demands at hand and on participants' proficiency in the second language. The influence of language orthotactic rules during language detection, lexical access and word identification is discussed according to the most prominent models of bilingual word recognition.
Abstract:
The paper describes the architecture of VODIS, a voice operated database inquiry system, and presents some experiments which investigate the effects on performance of varying the level of a priori syntactic constraints. The VODIS system includes a novel mechanism for incorporating context-free grammatical constraints directly into the word recognition algorithm. This allows the degree of a priori constraint to be smoothly varied and provides for the controlled generation of multiple alternatives. The results show that when the spoken input deviates from the predefined task grammar, a combination of weak a priori syntax rules in conjunction with full a posteriori parsing on a lattice of alternative word matches provides the most robust recognition performance. © 1991.
Abstract:
This paper describes two applications in speech recognition of the use of stochastic context-free grammars (SCFGs) trained automatically via the Inside-Outside Algorithm. First, SCFGs are used to model VQ encoded speech for isolated word recognition and are compared directly to HMMs used for the same task. It is shown that SCFGs can model this low-level VQ data accurately and that a regular grammar based pre-training algorithm is effective both for reducing training time and obtaining robust solutions. Second, an SCFG is inferred from a transcription of the speech used to train a phoneme-based recognizer in an attempt to model phonotactic constraints. When used as a language model, this SCFG gives improved performance over a comparable regular grammar or bigram. © 1991.
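As background for the Inside-Outside Algorithm mentioned above, the "inside" pass for an SCFG in Chomsky normal form can be sketched as below. This shows only how the probability of a string is computed under a fixed grammar, not the re-estimation step or the pre-training scheme described in the paper. The function name `inside_probability`, the dictionary layout of the rules, and the one-nonterminal toy grammar are illustrative assumptions.

```python
from collections import defaultdict


def inside_probability(words, lexical, binary, start="S"):
    """Inside pass for an SCFG in Chomsky normal form.

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    Returns P(start derives words), summed over all parses.
    """
    n = len(words)
    # beta[i, j, A] = probability that nonterminal A derives words[i:j]
    beta = defaultdict(float)

    # Spans of length 1 come from lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                beta[i, i + 1, a] += p

    # Longer spans combine two shorter spans at every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    beta[i, j, a] += p * beta[i, k, b] * beta[k, j, c]

    return beta[0, n, start]


# Hypothetical toy grammar: S -> S S (0.4) | "a" (0.6)
LEXICAL = {("S", "a"): 0.6}
BINARY = {("S", "S", "S"): 0.4}

p_aaa = inside_probability(["a", "a", "a"], LEXICAL, BINARY)
```

For the string "a a a" the toy grammar has two parses, each with probability 0.4 × 0.4 × 0.6³ = 0.03456, so the inside probability is their sum, 0.06912.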
Abstract:
Reading is an important human-specific skill obtained through extensive learning experience, and it relies on the ability to rapidly recognize single words. According to behavioral studies, the most important stage of reading is the representation of the "visual word form", which is independent of the surface visual features of the reading materials. The prelexical visual word form representation is characterized by abstract, highly efficient, and precise processing. Neuroimaging and neuropsychological studies have investigated the neural basis underlying visual word form processing. On the basis of a summary of the existing literature, the current thesis aimed to address three fundamental questions concerning the neural basis of word recognition. First, is there a dedicated neural network that is specialized for word recognition? Second, is orthographic information represented in the putative word/character selective region, the visual word form area (VWFA)? Third, what is the role of reading experience in the genesis of the VWFA: is experience, rather than evolutionary selectivity, the main driver shaping the VWFA? Nineteen Chinese literate volunteers, 5 Chinese illiterates and 4 native English speakers participated in this study and performed perceptual tasks during fMRI scanning. To address the first question, we compared the differential responses to three categories of visual objects, i.e., faces, line drawings of objects and Chinese characters, and defined the region of interest (ROI) for the next experiment. To address the second question, Chinese character orthography was manipulated to reveal possible differential responses to real characters, false characters, radical combinations, and stroke combinations in the regions defined by the first experiment. 
To examine the role of reading experience in the genesis of specialization for characters, the responses to unfamiliar Chinese characters in Chinese illiterates and native English speakers were compared with those in the Chinese literates, and we tracked the change in cortical activation after short-term reading training in the illiterates. Data were analyzed in two dimensions. Both BOLD signal amplitude and the spatial distribution pattern across multiple voxels were used to systematically investigate the responsiveness of the left fusiform gyrus to Chinese characters. Our results provide strong and clear evidence for the existence of functionally specialized regions in the human ventral occipital-temporal cortex. In the skilled readers, a region specialized for written words could be consistently found in the lateral part of the left fusiform gyrus, line drawings in the medial part and faces in the middle. Our results further show that spatial distribution analysis, a method that has not been commonly used in neuroimaging of reading, appears to be a more effective measure of category specialization in visual object processing. Although we failed to provide evidence that the VWFA processes orthographic information in terms of signal intensity, we do show that the response pattern for real characters and radical collections in this area is different from that for false characters and random stroke combinations. Our last set of experiments suggests that the selective bias to reading material is clearly experience dependent. The response to unknown characters in both English speakers/readers and Chinese illiterates is fundamentally different from that of the skilled Chinese readers. The response pattern for unknown characters is more similar to that for line drawings than to a weak version of the response to characters in skilled Chinese readers. 
Short-term training is not sufficient to produce VWFA bias even when tested with learned characters; rather, the learned characters generated an overall upward shift in the activation of the left fusiform region. Formation of a dedicated region specialized for visual words/characters might depend on long-term extensive reading experience, or there might be a critical period for reading acquisition.
Abstract:
The present study investigated the effects of using an assistive software homophone tool on the assisted proofreading performance and unassisted basic skills of secondary-level students with reading difficulties. Students aged 13 to 15 years proofread passages for homophonic errors under three conditions: with the homophone tool, with homophones highlighted only, or with no help. The group using the homophone tool significantly outperformed the other two groups on assisted proofreading and outperformed the others on unassisted spelling, although not significantly. Remedial (unassisted) improvements in automaticity of word recognition, homophone proofreading, and basic reading were found over all groups. Results elucidate the differential contributions of each function of the homophone tool and suggest that with the proper training, assistive software can help not only students with diagnosed disabilities but also those with generally weak reading skills.
Abstract:
The ability to learn new reading vocabulary was assessed in 30 grade 3 poor readers reading approximately one to two years below grade level; the results of the assessment were compared to the performance of 33 normal readers in grade 3 as obtained from an earlier study that employed the same approach and stimuli. The purpose of the study was to examine the strategies employed by poor readers in the acquisition of new reading vocabulary. Students were randomly assigned to either a treatment group (Mixed Phonics Explicit) or a control group (Phonics Implicit). Subjects in the Mixed Phonics Explicit group received explicit letter/sound correspondence training. Subjects in the Phonics Implicit group were asked to re-read the presented pseudo-words, receiving corrective feedback when necessary. The stimuli on which the subjects were trained were a list of six pseudo-words presented in sentences as surnames. The training involved a teaching and test format on each trial for a total of six trials or until criterion had been reached. The results suggested that both normal and poor readers engage in visual learning and verbal coding when acquiring new reading vocabulary. However, poor readers appear to engage in less verbal coding than normal readers. Between-group comparisons showed no difference between poor and normal readers in trials and errors to criterion on the visual recognition memory measure. However, normal readers performed significantly better in reading their visual recognition choices.
Abstract:
This study compared the relative effectiveness of two computerized remedial reading programs in improving the reading word recognition, rate, and comprehension of adolescent readers demonstrating significant and longstanding reading difficulties. One of the programs involved was the Autoskill Component Reading Subskills Program, which provides instruction in isolated letters, syllables, and words, to a point of rapid automatic responding. This program also incorporates reading disability subtypes in its approach. The second program, Read It Again, Sam, delivers a repeated reading strategy. The study also examined the feasibility of using peer tutors in association with these two programs. Grade 9 students at a secondary vocational school who satisfied specific criteria with respect to cognitive and reading ability participated. Eighteen students were randomly assigned to three matched groups, based on prior screening on a battery of reading achievement tests. Two groups received training with one of the computer programs; the third group acted as a control and received the remedial reading program offered within the regular classroom. The groups met daily with a trained tutor for approximately 35 minutes, and were required to accumulate twenty hours of instruction. At the conclusion of the program, the pretest battery was repeated. No significant differences were found in the treatment effects of the two computer groups. Each of the two treatment groups achieved significantly improved reading word recognition and rate, relative to the control group. Comprehension gains were modest. The treatment groups demonstrated a significant gain, relative to the control group, on one of the three comprehension measures; only trends toward a gain were noted on the remaining two measures. The tutoring partnership appeared to be a viable alternative for the teacher seeking to provide individualized computerized remedial programs for adolescent unskilled readers. 
Both programs took advantage of computer technology in providing individualized drill and practice, instant feedback, and ongoing recordkeeping. With limited cautions, each of these programs was considered effective and practical for use with adolescent unskilled readers.
Abstract:
This study examined the effects that a training program in phonological awareness had on the early writing skills of children in a Grade One class in the Lincoln County Separate school system. The intent of the training program was to provide consistent and systematic practice in the manipulation of the phonological structure of language. The games and activities of the training program were related to a framework of developmental phonological skills and practised in a group setting during an unstructured period of the regular classroom schedule. The training program operated three days in a six-day cycle for approximately twenty minutes a day, from November until mid-March. All children were tested at the outset and conclusion of the study to determine level of functioning in letter identification, word recognition, verbal intelligence, phonological awareness and spelling. Results of the pre-tests and post-tests were compared to determine differences between the experimental and control groups over time. In addition, a systematic analysis of the children's writing looked at the development of the spelling of regular and irregular words. The results of this study provided strong support for the hypothesis that the treatment group would progress through the stages of early writing development more quickly than children without such training. On the basis of differences between the groups over time, it was evident that training in phonological awareness had a direct positive effect on the spelling of regular words for children during the early stages of writing. The training program did not have a significant effect on the spelling of irregular words. Test results evaluating phonological awareness indicated a significant difference within each group over time but no significance between the groups during the experimental period. 
It would appear that the results of these tests reflect maturational changes in the child rather than causal effects of the training program. Nor did the effects of the training program transfer significantly to other aspects of language. Although some of the hypotheses considered were not supported by the study, the results do indicate that children during the early stages of writing development can benefit from a training program in phonological awareness. The theoretical direction for effective programming as a result of this study is discussed. The educational implications of training phonological awareness concurrent to beginning efforts in writing are considered.
Abstract:
The present study explored processing strategies used by individuals when they begin to read a script. Stimuli were artificial words created from symbols and based on an alphabetic system. The words were presented to Grade Nine and Ten students, with variations included in the difficulty of orthography and word familiarity, and then scores were recorded on the mean number of trials for defined learning variables. Qualitative findings revealed that subjects learned parts of the visual and auditory features of words prior to hooking up the visual stimulus to the word's name. Performance measures which appear to affect the rate of learning were as follows: auditory short-term memory, auditory delayed short-term memory, visual delayed short-term memory, and word attack or decoding skills. Qualitative data emerging in verbal reports by the subjects revealed that the strategies they perceived themselves to use were graphic, phonetic decoding, and word reading.
Abstract:
This qualitative study stemmed from a concern about the perceived decline in students' reading motivation after the early years of schooling, which has been attributed to the disconnect between the media students are accustomed to using outside the classroom and the media they predominantly use within the classroom. This research documented the effectiveness of a digital children's literature program and a postreading multimedia program on eight grade 1 students' reading motivation, word recognition, and comprehension abilities. Eight students were given ten 25-minute sessions with the software program over 15 weeks. Preprogram, interim-program, and postprogram qualitative data were collected from students, teachers, and parents through questionnaires, interviews, standardized reading assessment tools, classroom observations, field notes, and student behaviour observation checklists. Findings are summarized into 3 themes. The motivational aspects and constructivist styles of instruction in the digital reading programs may have contributed to 5 student participants' increased participation in online storybook reading at home. Qualitative data revealed that the digital children's literature program and multimedia postreading activities seemed to have a positive influence on the majority of grade 1 student participants' reading motivation, word recognition, and listening comprehension skills. These findings suggest the promise of multimedia and Internet-based reading software programs in supporting students with reading and/or behavioural difficulties. In keeping with current educational initiatives and efforts, increased use of media literacy practices in the grade 1 curriculum is suggested.
Abstract:
Based on the theoretical framework of Dressler and Dziubalska-Kołaczyk (2006a,b), the Strong Morphonotactic Hypothesis will be tested. It assumes that phonotactics helps in the decomposition of words into morphemes: if a certain sequence occurs only, or only by default, over a morpheme boundary and is thus a prototypical morphonotactic sequence, it should be processed faster and more accurately than a purely phonotactic sequence. Studies on typical and atypical first language acquisition in English, Lithuanian and Polish have shown significant differences between the acquisition of morphonotactic and phonotactic consonant clusters: morphonotactic clusters are acquired earlier and faster by typically developing children, but are more problematic for children with Specific Language Impairment. However, results on acquisition are less clear for German. The focus of this contribution is whether and how German-speaking adults differentiate between morphonotactic and phonotactic consonant clusters and vowel-consonant sequences in visual word recognition. It investigates whether sub-lexical letter sequences are found faster when the target sequence is separated from the word stem by a morphological boundary than when it is part of a morphological root. An additional factor that is addressed concerns the position of the target cluster in the word. Due to the bathtub effect, sequences in peripheral positions in a word are more salient and thus facilitate processing more than word-internal positions. Moreover, for adults the primacy effect most favors word-initial position (whereas for young children the recency effect most favors word-final position). Our study discusses effects of phonotactic vs. morphonotactic cluster status and of position within the word.
Abstract:
The main objective of this thesis was to quantify and compare the effort required to recognize speech in noise in young adults and older adults with normal hearing and normal visual acuity (with or without corrective lenses). The effort associated with speech perception is related to the attentional and cognitive resources required to understand speech. The first study (Experiment 1) aimed to evaluate the effort associated with auditory speech recognition (hearing a talker), whereas the second study (Experiment 2) aimed to evaluate the effort associated with audiovisual speech recognition (hearing and seeing a talker's face). Effort was measured in two different ways. First, a behavioural approach was used, based on an experimental paradigm known as the dual-task paradigm: a word recognition task was paired with a vibrotactile pattern recognition task. In addition, effort was quantified using a questionnaire asking participants to rate the effort associated with the behavioural tasks. Both effort measures were used under two different experimental conditions: 1) equated level, that is, when the level of the noise masking the speech was the same for all participants; and 2) equated performance, that is, when the noise level was adjusted so that performance on the word recognition task was identical for the two groups of participants. The performance levels obtained on the vibrotactile task revealed that older adults expend more effort than young adults in both experimental conditions, regardless of the perceptual modality in which the speech stimuli are presented (i.e., auditory only or audiovisual). 
Overall, the "cost" associated with performance on the vibrotactile task was highest for older adults when speech was presented in the audiovisual modality. Although visual cues can improve audiovisual speech recognition, our results suggest that they can also place an additional load on the resources used to process information. This additional load has detrimental consequences for performance on the word recognition and vibrotactile pattern recognition tasks when they are carried out under dual-task conditions. Consistent with previous studies, the correlation coefficients computed from the data of Experiment 1 and Experiment 2 support the notion that behavioural dual-task measures and questionnaire responses assess different dimensions of the effort associated with speech recognition. Because the effort associated with speech perception depends on auditory and cognitive factors, a third study was completed to explore whether auditory working memory helps explain the variance in the data on the effort associated with speech perception. In addition, these analyses made it possible to compare the response patterns obtained for these two factors in young adults and older adults. For young adults, the results of a sequential regression analysis showed that a measure of auditory capacity (span size) was related to effort, whereas a measure of auditory processing (alphabetical recall) was related to the accuracy with which words were recognized when presented under dual-task conditions. 
However, these same relationships were not present in the data obtained for the group of older adults, nor in the data obtained when the speech recognition tasks were performed in the audiovisual modality. Further studies are needed to identify the cognitive factors underlying the effort associated with speech perception, particularly in older adults.