8 results for word recognition

in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevance:

70.00%

Abstract:

In this paper we present the application of Hidden Conditional Random Fields (HCRFs) to modelling speech for visual speech recognition. HCRFs may be easily adapted to model long-range dependencies across an observation sequence. As a result, visual word recognition performance can be improved, as the model is able to take a more contextual approach to generating state sequences. Results are presented from a speaker-dependent, isolated-digit visual speech recognition task, with comparisons against a baseline HMM system. We first show that word recognition rates on clean video using HCRFs can be improved by increasing the number of past and future observations taken into account by each state. Second, we compare model performance under various levels of video compression on the test set. As far as we are aware, this is the first attempted use of HCRFs for visual speech recognition.
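The idea of letting each state see more past and future observations can be illustrated with a simple context-window sketch. This is not the HCRF model from the abstract, whose windowing mechanism is not described here; the function name `window_features` and the edge-padding rule are assumptions made for illustration only.

```python
from typing import List, Sequence


def window_features(
    frames: Sequence[Sequence[float]], past: int, future: int
) -> List[List[float]]:
    """Concatenate each frame with `past` earlier and `future` later frames.

    Frames near the edges are padded by repeating the first or last frame
    (an assumption; other padding schemes are equally plausible).
    """
    if not frames:
        return []
    last = len(frames) - 1
    stacked: List[List[float]] = []
    for t in range(len(frames)):
        row: List[float] = []
        for offset in range(-past, future + 1):
            idx = min(max(t + offset, 0), last)  # clamp to a valid frame index
            row.extend(frames[idx])
        stacked.append(row)
    return stacked


# Hypothetical one-dimensional visual features for four video frames.
frames = [[0.0], [1.0], [2.0], [3.0]]
windowed = window_features(frames, past=1, future=1)
```

Increasing `past` and `future` widens the context available at each time step, at the cost of a larger feature vector per frame.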

Relevance:

60.00%

Abstract:

The present study investigated the effects of using an assistive software homophone tool on the assisted proofreading performance and unassisted basic skills of secondary-level students with reading difficulties. Students aged 13 to 15 years proofread passages for homophonic errors under three conditions: with the homophone tool, with homophones highlighted only, or with no help. The group using the homophone tool significantly outperformed the other two groups on assisted proofreading and outperformed the others on unassisted spelling, although not significantly. Remedial (unassisted) improvements in automaticity of word recognition, homophone proofreading, and basic reading were found over all groups. Results elucidate the differential contributions of each function of the homophone tool and suggest that with the proper training, assistive software can help not only students with diagnosed disabilities but also those with generally weak reading skills.

Relevance:

60.00%

Abstract:

Background
Learning to read is a key goal during primary school: reading difficulties may curtail children’s learning trajectories. Controversy remains regarding what types of interventions are effective for children at risk for academic failure, such as children in disadvantaged areas. We present data from a complex intervention to test the hypothesis that phonic skills and word recognition abilities are a pivotal and specific causal mechanism for the development of reading skills in children at risk for poorer literacy outcomes.
Method
Over 500 pupils across 16 primary schools took part in a Cluster Randomised Controlled Trial from school year 1 to year 3. Schools were randomly allocated to the intervention or the control arm. The intervention involved a literacy-rich after-school programme. Children attending schools in the control arm of the study received the curriculum normally provided. Children in both arms completed batteries of language, phonic skills, and reading tests every year. We used multilevel mediation models to investigate mediating processes between intervention and outcomes.
Findings
Children who took part in the intervention displayed improvements in reading skills compared to those in the control arm. Results indicated a significant indirect effect of the intervention via phonics encoding.
Discussion
The results suggest that the intervention was effective in improving reading abilities of children at risk, and this effect was mediated by improving children’s phonic skills. This has relevance for designing interventions aimed at improving literacy skills of children exposed to socio-economic disadvantage. Results also highlight the importance of methods to investigate causal pathways from intervention to outcomes.

Relevance:

30.00%

Abstract:

In this paper, we present a new approach to visual speech recognition which improves contextual modelling by combining Inter-Frame Dependent and Hidden Markov Models. This approach captures contextual information in visual speech that may be lost using a Hidden Markov Model alone. We apply contextual modelling to a large speaker-independent isolated digit recognition task, and compare our approach to two commonly adopted feature-based techniques for incorporating speech dynamics. Results are presented from baseline feature-based systems and the combined modelling technique. We illustrate that both of these techniques achieve similar levels of performance when used independently. However, significant improvements in performance can be achieved through a combination of the two. In particular, we report an improvement in excess of 17% relative Word Error Rate in comparison to our best baseline system.
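For readers unfamiliar with the metric, the following minimal sketch shows how Word Error Rate and a relative improvement over a baseline are conventionally computed. The transcripts and error rates are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline error removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: one substitution and one deletion in four words.
wer = word_error_rate("one two three four", "one too three")
# Hypothetical error rates chosen only to illustrate the formula.
gain = relative_improvement(0.20, 0.165)
```

A relative figure such as the one reported is measured against the baseline's error rate, so it is larger than the absolute difference between the two error rates.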

Relevance:

30.00%

Abstract:

A scalable large-vocabulary, speaker-independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation, which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1 reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms, which, coupled with a backend search of 5000 words, has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented with a clock frequency of 133 MHz.
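As a rough software illustration of the kind of Gaussian calculation an acoustic model requires, the sketch below scores one feature vector against a single Gaussian. It assumes a diagonal covariance, which is common in HMM acoustic models but is not stated in the abstract, and it says nothing about how the FPGA design is organised.

```python
import math
from typing import Sequence


def diag_gaussian_log_likelihood(
    x: Sequence[float], mean: Sequence[float], var: Sequence[float]
) -> float:
    """Log-density of x under a Gaussian with diagonal covariance `var`."""
    if not (len(x) == len(mean) == len(var)):
        raise ValueError("x, mean and var must have the same length")
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        # Per-dimension normalisation term plus scaled squared distance.
        total += math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi
    return -0.5 * total


# Hypothetical two-dimensional feature vector scored at the mean.
score = diag_gaussian_log_likelihood([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```

Each dimension contributes an independent term, which is one reason this computation lends itself to pipelined and parallel evaluation across many models.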

Relevance:

30.00%

Abstract:

There is considerable interest in creating embedded speech recognition hardware using the weighted finite state transducer (WFST) technique, but there are performance and memory usage challenges. Two system optimization techniques are presented to address this: one improves token propagation by removing the WFST epsilon input arcs; the other, a one-pass adaptive pruning algorithm, gives a dramatic reduction in the active nodes to be computed. Results for memory and bandwidth are given for a 5,000-word vocabulary, showing better practical performance than conventional WFST. This is then exploited in an adaptive pruning algorithm that reduces the active nodes from 30,000 to 4,000 with only a 2 percent sacrifice in speech recognition accuracy. These optimizations lead to a simpler design with deterministic performance.
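The abstract does not give the details of its pruning algorithm. As a generic illustration of how the set of active nodes can be bounded during decoding, the sketch below combines beam pruning with a hard cap on the number of survivors; the function name `prune_tokens` and both parameters are assumptions, not the authors' method.

```python
import heapq
from typing import Dict, Hashable


def prune_tokens(
    tokens: Dict[Hashable, float], beam: float, max_active: int
) -> Dict[Hashable, float]:
    """Keep tokens within `beam` of the best cost, capped at `max_active`.

    `tokens` maps a decoder state to its accumulated cost (lower is better).
    """
    if not tokens:
        return {}
    best = min(tokens.values())
    # Beam pruning: drop anything too far behind the best token.
    survivors = {s: c for s, c in tokens.items() if c <= best + beam}
    # Cap pruning: if still too many, keep only the cheapest ones.
    if len(survivors) > max_active:
        kept = heapq.nsmallest(max_active, survivors.items(), key=lambda kv: kv[1])
        survivors = dict(kept)
    return survivors


# Hypothetical active tokens after one frame of decoding.
active = {"a": 1.0, "b": 2.0, "c": 3.5, "d": 9.0}
pruned = prune_tokens(active, beam=3.0, max_active=2)
```

A fixed cap of this kind bounds the work per frame, which is consistent with the deterministic performance the abstract mentions, though the trade-off against accuracy depends on the values chosen.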

Relevance:

30.00%

Abstract:

Taking in recent advances in neuroscience and digital technology, Gander and Garland assess the state of the inter-arts in America and the Western world, exploring and questioning the primacy of affect in an increasingly hypertextual everyday environment. In this analysis they signal a move beyond W. J. T. Mitchell’s coinage of the ‘imagetext’ to an approach that centres the reader-viewer in a recognition, after John Dewey, of ‘art as experience’. New thinking in cognitive and computer sciences about the relationship between the body and the mind challenges any established definitions of ‘embodiment’, ‘materiality’, ‘virtuality’ and even ‘intelligence’, they argue, whilst ‘Extended Mind Theory’, they note, marries our cognitive processes with the material forms with which we engage, confirming and complicating Marshall McLuhan’s insight, decades ago, that ‘all media are “extensions of man”’. In this chapter, Gander and Garland open paths and suggest directions into understandings and critical interpretations of new and emerging imagetext worlds and experiences.

Relevance:

30.00%

Abstract:

Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.

Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: ‘semi-structured’ and ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed at the prediction of chunks of the report text containing information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested respectively on sets of 635 and 163 manually classified or annotated reports, from the Northern Ireland Cancer Registry.

Results: The best result of 99.4% accuracy – which included only one semi-structured report predicted as unstructured – was produced by the layout classifier with the k-nearest neighbour algorithm, using the binary term occurrence word vector type with stopword filter and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports, performance ranged from 0.97 to 0.94 in precision and from 0.92 to 0.83 in recall, while for unstructured reports it ranged from 0.91 to 0.64 in precision and from 0.68 to 0.41 in recall. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured ones.

Conclusions: These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.
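The layout classifier described above pairs binary term occurrence vectors with a nearest-neighbour vote. The sketch below shows that combination in plain Python rather than RapidMiner; the stopword list, the Hamming distance measure, the value of k, and the toy reports are all assumptions made for illustration and do not reproduce the study's configuration or data.

```python
from collections import Counter
from typing import List, Set, Tuple

STOPWORDS = {"the", "a", "of", "and", "is", "in"}  # tiny illustrative list


def binary_terms(text: str) -> Set[str]:
    """Binary term occurrence: the set of non-stopword tokens present."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def knn_predict(train: List[Tuple[str, str]], text: str, k: int = 3) -> str:
    """Label `text` by majority vote among its k nearest training reports."""
    query = binary_terms(text)
    # Hamming distance between binary vectors = size of the symmetric difference.
    ranked = sorted(train, key=lambda item: len(binary_terms(item[0]) ^ query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy reports, not taken from the registry data.
train = [
    ("tumour size: 22 mm grade: 2 nodes: 0", "semi-structured"),
    ("tumour size: 15 mm grade: 3 nodes: 2", "semi-structured"),
    ("the specimen shows an invasive carcinoma measuring about 22 mm", "unstructured"),
    ("sections show a tumour of the breast with no nodal involvement", "unstructured"),
]
label = knn_predict(train, "tumour size: 30 mm grade: 1 nodes: 4", k=3)
```

Because only the presence of a term matters, recurring field labels in semi-structured reports dominate the distance, which gives some intuition for why layout is easy to separate with this representation.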