32 results for hand-drawn visual language recognition
in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast
Abstract:
We present results of a study into the performance of a variety of different image transform-based feature types for speaker-independent visual speech recognition of isolated digits. This includes the first reported use of features extracted using a discrete curvelet transform. The study will show a comparison of some methods for selecting features of each feature type and show the relative benefits of both static and dynamic visual features. The performance of the features will be tested on both clean video data and also video data corrupted in a variety of ways to assess each feature type's robustness to potential real-world conditions. One of the test conditions involves a novel form of video corruption we call jitter which simulates camera and/or head movement during recording.
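The abstract does not say how jitter is generated. As a rough illustration only, one way to simulate camera or head movement is to translate each frame by a small random offset. The function names, the `max_shift` parameter and the zero padding below are all assumptions, not the authors' method.

```python
import random


def shift_frame(frame, dx, dy, fill=0):
    """Translate a 2-D frame (list of rows) by (dx, dy) pixels.

    Pixels shifted in from outside the frame are set to `fill`.
    """
    height, width = len(frame), len(frame[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < height and 0 <= src_x < width:
                shifted[y][x] = frame[src_y][src_x]
    return shifted


def add_jitter(frames, max_shift=2, seed=0):
    """Apply an independent random translation to every frame.

    Hypothetical stand-in for the 'jitter' corruption: each frame is
    moved by up to `max_shift` pixels horizontally and vertically.
    """
    rng = random.Random(seed)
    jittered = []
    for frame in frames:
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        jittered.append(shift_frame(frame, dx, dy))
    return jittered
```

Larger values of `max_shift` would correspond to more severe movement. A real experiment would presumably operate on the mouth region of each video frame rather than on toy lists.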
Abstract:
In this paper we present the application of Hidden Conditional Random Fields (HCRFs) to modelling speech for visual speech recognition. HCRFs may be easily adapted to model long range dependencies across an observation sequence. As a result visual word recognition performance can be improved as the model is able to take more of a contextual approach to generating state sequences. Results are presented from a speaker-dependent, isolated digit, visual speech recognition task using comparisons with a baseline HMM system. We firstly illustrate that word recognition rates on clean video using HCRFs can be improved by increasing the number of past and future observations being taken into account by each state. Secondly we compare model performances using various levels of video compression on the test set. As far as we are aware this is the first attempted use of HCRFs for visual speech recognition.
Abstract:
In this paper, we present a new approach to visual speech recognition which improves contextual modelling by combining Inter-Frame Dependent and Hidden Markov Models. This approach captures contextual information in visual speech that may be lost using a Hidden Markov Model alone. We apply contextual modelling to a large speaker independent isolated digit recognition task, and compare our approach to two commonly adopted feature based techniques for incorporating speech dynamics. Results are presented from baseline feature based systems and the combined modelling technique. We illustrate that both of these techniques achieve similar levels of performance when used independently. However significant improvements in performance can be achieved through a combination of the two. In particular we report an improvement in excess of 17% relative Word Error Rate in comparison to our best baseline system.
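For readers unfamiliar with the metric, a relative Word Error Rate improvement is the reduction in WER expressed as a fraction of the baseline WER. The short sketch below shows the arithmetic; the example figures are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Return the relative WER reduction as a fraction of the baseline.

    Both arguments are word error rates in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical figures: a baseline WER of 10% reduced to 8%
# is a 2-point absolute gain but a 20% relative improvement.
example = relative_wer_reduction(10.0, 8.0)
```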
Abstract:
This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances, with corruption added to the video stream, the audio stream, or both, using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and to any fixed-weighted integration approach, both in clean conditions and when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
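The abstract gives no formula, so the following is only a loose sketch of what frame-by-frame weight selection by posterior could look like. It assumes, hypothetically, that for each frame the audio and video log-likelihoods of every class are combined as `w * audio + (1 - w) * video`, and that the weight giving the highest class posterior is kept. The function names and the candidate weight grid are illustrative, not the authors' implementation.

```python
import math


def log_softmax(scores):
    """Normalise log-scores into log-posteriors (uniform prior assumed)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def select_stream_weight(audio_loglik, video_loglik, candidate_weights):
    """Pick the audio weight whose combined stream gives the top posterior.

    audio_loglik / video_loglik: per-class log-likelihoods for one frame.
    Returns (best_weight, best_log_posterior).
    """
    best_weight, best_log_post = None, -math.inf
    for w in candidate_weights:
        combined = [w * a + (1.0 - w) * v
                    for a, v in zip(audio_loglik, video_loglik)]
        top = max(log_softmax(combined))
        if top > best_log_post:
            best_weight, best_log_post = w, top
    return best_weight, best_log_post
```

In this toy form, a stream whose likelihoods are nearly flat across classes (as might happen under heavy noise) yields a low maximum posterior, so the selection drifts towards the more discriminative stream without any explicit noise measurement.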
Abstract:
This article discusses the relationship between three language communities in Europe with variant levels of official recognition, namely Kashub, Sorb, and Silesian, and the institutions of their host states as regards their respective use, promotion, and revitalization. Most language communities across the world campaign for recognition within a geographic/political region, or on the basis of a historic/group identity to ensure their language's use and status. The examples discussed here illustrate that language recognition and policies resulting therefrom and promoting official monolingualism strengthen the symbolic status of the language but contribute little to the functionality of language communities outside the area. As this article illustrates, in increasingly multilingual societies, language policies cut off its speakers from the political, economic, and social opportunities accessible through the medium of languages that lack official recognition locally. © 2014 Taylor & Francis Group, LLC.
Abstract:
Recent studies suggested that the control of hand movements in catching involves continuous vision-based adjustments. More insight into these adjustments may be gained by examining the effects of occluding different parts of the ball trajectory. Here, we examined the effects of such occlusion on lateral hand movements when catching balls approaching from different directions, with the occlusion conditions presented in blocks or in randomized order. The analyses showed that late occlusion only had an effect during the blocked presentation, and early occlusion only during the randomized presentation. During the randomized presentation movement biases were more leftward if the preceding trial was an early occlusion trial. The effect of early occlusion during the randomized presentation suggests that the observed leftward movement bias relates to the rightward visual acceleration inherent to the ball trajectories used, while its absence during the blocked presentation seems to reflect trial-by-trial adaptations in the visuomotor gain, reminiscent of dynamic gain control in the smooth pursuit system. The movement biases during the late occlusion block were interpreted in terms of an incomplete motion extrapolation--a reduction of the velocity gain--caused by the fact that participants never saw the to-be-extrapolated part of the ball trajectory. These results underscore that continuous movement adjustments for catching do not only depend on visual information, but also on visuomotor adaptations based on non-visual information.
Abstract:
International exhibitions were greatly responsible for the modernization of western society. The motive for these events was based on the possibility of enhancing the country’s international status abroad. The genesis of world exhibitions came from the conviction that humanity as a whole would improve the continual flow of new practical applications, the development of modern communication techniques and the social need for a medium that could acquaint the general public with changes in technology, economy and society.
Since the first national industrial exhibitions in Paris during the eighteenth century, and especially from the first Great Exhibition in London’s Hyde Park in 1851, these international events spread steadily all over Europe and the United States, reaching Latin America at the beginning of the twentieth century. The work of professionals such as Daniel Burnham, Werner Hegemann and Elbert Peets made the relation between exhibitions and urban transformation a much more connected one, setting a precedent for subsequent exhibitions.
In Buenos Aires, the celebration of the centennial of independence from Spain in 1910 had many meanings and repercussions. A series of factors allowed for a moment of change in the city. Official optimism, economic progress, inequality and social conflict made this a suitable time for transformation. With the organization of the Exposición Internacional the government had, among others, one specific aim: to achieve a network of visual tools to set the feeling of belonging and provide an identity for the mixture of cultures that populated the city of Buenos Aires at the time. Another important objective of the government was to put Buenos Aires at the level of European cities.
Foreign professionals had a great influence on the conceptual and factual shaping of the exhibition and on the subsequent changes in the urban condition. The exhibition had an important role in the ways of thinking about the city and in the leisure ideas it introduced. The exhibition, as a didactic tool, worked as a precedent for conceiving leisure spaces in the future. Urban and landscape planners such as Joseph Bouvard and Charles Thays were instrumental in much of the design of the Exhibition, but it was not only the architects and designers who shaped the identity of the fair. Other visitors such as Jules Huret or Georges Clemenceau were responsible for giving the city an international image it did not previously have.
This paper will explore on the one hand the significance of the exhibition of 1910 for the shaping of the city and its image; and on the other hand, the role of foreign professionals and the reach these influences had.
Abstract:
This paper presents a multi-language framework for FPGA hardware development which aims to satisfy the dual requirement of high-level hardware design and efficient hardware implementation. The central idea of this framework is the integration of different hardware languages in a way that harnesses the best features of each language. This is illustrated in this paper by the integration of two hardware languages in the form of HIDE: a structured hardware language which provides more abstract and elegant hardware descriptions and compositions than are possible in traditional hardware description languages such as VHDL or Verilog, and Handel-C: an ANSI C-like hardware language which allows software and hardware engineers alike to target FPGAs from high-level algorithmic descriptions. On the one hand, HIDE has proven to be very successful in the description and generation of highly optimised parameterisable FPGA circuits from geometric descriptions. On the other hand, Handel-C has also proven to be very successful in the rapid design and prototyping of FPGA circuits from algorithmic application descriptions. The proposed integrated framework hence harnesses HIDE for the generation of highly optimised circuits for regular parts of algorithms, while Handel-C is used as a top-level design language from which HIDE functionality is dynamically invoked. The overall message of this paper is that there need not be an exclusive choice between different hardware design flows. Rather, an integrated framework where different design flows can seamlessly interoperate should be adopted. Although the idea might seem simple prima facie, it could have serious implications for the design of future generations of hardware languages.
Abstract:
This paper provides a summary of our studies on robust speech recognition based on a new statistical approach – the probabilistic union model. We consider speech recognition given that part of the acoustic features may be corrupted by noise. The union model is a method for basing the recognition on the clean part of the features, thereby reducing the effect of the noise on recognition. To this end, the union model is similar to the missing feature method. However, the two methods achieve this end through different routes. The missing feature method usually requires the identity of the noisy data for noise removal, while the union model combines the local features based on the union of random events, to reduce the dependence of the model on information about the noise. We previously investigated the applications of the union model to speech recognition involving unknown partial corruption in frequency band, in time duration, and in feature streams. Additionally, a combination of the union model with conventional noise-reduction techniques was studied, as a means of dealing with a mixture of known or trainable noise and unknown unexpected noise. In this paper, a unified review, in the context of dealing with unknown partial feature corruption, is provided into each of these applications, giving the appropriate theory and implementation algorithms, along with an experimental evaluation.
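The abstract does not give the union model's formula. As a hedged illustration of the idea of "basing the recognition on the clean part of the features" without knowing which part is clean, the sketch below assumes one commonly described form: with N feature streams and up to M of them corrupted, the score is the sum, over every choice of N - M streams, of the product of their likelihoods. The function name and the `order` parameter are assumptions for illustration.

```python
import math
from itertools import combinations


def union_score(stream_likelihoods, order):
    """Sum of products over all subsets that leave out `order` streams.

    stream_likelihoods: per-stream likelihoods for one class.
    order: assumed maximum number of corrupted streams (M).
    With order == 0 this reduces to the ordinary product of all streams.
    """
    n = len(stream_likelihoods)
    if not 0 <= order < n:
        raise ValueError("order must satisfy 0 <= order < number of streams")
    keep = n - order
    return sum(math.prod(subset)
               for subset in combinations(stream_likelihoods, keep))


# Hypothetical likelihoods: the third stream is badly corrupted.
likelihoods = [0.5, 0.4, 0.001]
full_product = union_score(likelihoods, order=0)   # dominated by the bad stream
order_one = union_score(likelihoods, order=1)      # dominated by the clean pair
```

The point of the sketch is that the order-1 score is governed by the product of the two clean streams, so a single corrupted stream no longer drags the class score towards zero, and no knowledge of which stream is corrupted is required.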
Abstract:
We present a novel approach to goal recognition based on a two-stage paradigm of graph construction and analysis. First, a graph structure called a Goal Graph is constructed to represent the observed actions, the state of the world, and the achieved goals as well as various connections between these nodes at consecutive time steps. Then, the Goal Graph is analysed at each time step to recognise those partially or fully achieved goals that are consistent with the actions observed so far. The Goal Graph analysis also reveals valid plans for the recognised goals or part of these goals. Our approach to goal recognition does not need a plan library. It does not suffer from the problems in the acquisition and hand-coding of large plan libraries, neither does it have the problems in searching the plan space of exponential size. We describe two algorithms for Goal Graph construction and analysis in this paradigm. These algorithms are both provably sound, polynomial-time, and polynomial-space. The number of goals recognised by our algorithms is usually very small after a sequence of observed actions has been processed. Thus the sequence of observed actions is well explained by the recognised goals with little ambiguity. We have evaluated these algorithms in the UNIX domain, in which excellent performance has been achieved in terms of accuracy, efficiency, and scalability.
Abstract:
Purpose
– Information science has been conceptualized as a partly unreflexive response to developments in information and computer technology, and, most powerfully, as part of the gestalt of the computer. The computer was viewed as an historical accident in the original formulation of the gestalt. An alternative, and timely, approach to understanding, and then dissolving, the gestalt would be to address the motivating technology directly, fully recognizing it as a radical human construction. This paper aims to address the issues.
Design/methodology/approach
– The paper adopts a social epistemological perspective and is concerned with collective, rather than primarily individual, ways of knowing.
Findings
– Information technology tends to be received as objectively given, autonomously developing, and causing but not itself caused, by the language of discussions in information science. It has also been characterized as artificial, in the sense of unnatural, and sometimes as threatening. Attitudes to technology are implied, rather than explicit, and can appear weak when articulated, corresponding to collective repression.
Research limitations/implications
– Receiving technology as objectively given has an analogy with the Platonist view of mathematical propositions as discovered, in its exclusion of human activity, opening up the possibility of a comparable critique which insists on human agency.
Originality/value
– Apprehensions of information technology have been raised to consciousness, exposing their limitations.