903 results for hand-drawn visual language recognition
Abstract:
Mandarin Chinese is based on characters which are syllabic in nature and morphological in meaning. All spoken languages have syllabotactic rules which govern the construction of syllables and their allowed sequences. These constraints are not as restrictive as those learned from word sequences, but they can provide additional useful linguistic information. Hence, it is possible to improve speech recognition performance by appropriately combining these two types of constraints. For the Chinese language considered in this paper, character level language models (LMs) can be used as a first level approximation to allowed syllable sequences. To test this idea, word and character level n-gram LMs were trained on 2.8 billion words (equivalent to 4.3 billion characters) of texts from a wide collection of text sources. Both hypothesis and model based combination techniques were investigated to combine word and character level LMs. Significant character error rate reductions of up to 7.3% relative were obtained on a state-of-the-art Mandarin Chinese broadcast audio recognition task using an adapted history dependent multi-level LM that performs a log-linear combination of character and word level LMs. This supports the hypothesis that character or syllable sequence models are useful for improving Mandarin speech recognition performance.
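The log-linear combination of word- and character-level LM scores described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name and the interpolation weight `lam` are assumptions for the example.

```python
def log_linear_combine(logp_word, logp_char, lam=0.5):
    """Log-linear interpolation of two language-model log-probabilities.

    lam weights the word-level score and (1 - lam) the character-level
    score; in practice the weight would be tuned on held-out data.
    """
    return lam * logp_word + (1.0 - lam) * logp_char


# Example: a hypothesis scored at log P = -2.0 by the word-level LM and
# log P = -4.0 by the character-level LM, with equal weights:
combined = log_linear_combine(-2.0, -4.0, lam=0.5)  # -3.0
```

The combined value is an unnormalized score, which is sufficient for ranking competing recognition hypotheses during rescoring.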
Abstract:
Current commercial dialogue systems typically use hand-crafted grammars for Spoken Language Understanding (SLU) operating on the top one or two hypotheses output by the speech recogniser. These systems are expensive to develop and they suffer from significant degradation in performance when faced with recognition errors. This paper presents a robust method for SLU based on features extracted from the full posterior distribution of recognition hypotheses encoded in the form of word confusion networks. Following [1], the system uses SVM classifiers operating on n-gram features, trained on unaligned input/output pairs. Performance is evaluated on both an off-line corpus and on-line in a live user trial. It is shown that a statistical discriminative approach to SLU operating on the full posterior ASR output distribution can substantially improve performance both in terms of accuracy and overall dialogue reward. Furthermore, additional gains can be obtained by incorporating features from the previous system output. © 2012 IEEE.
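As a rough illustration of extracting n-gram features from the posterior distribution encoded in a word confusion network: expected n-gram counts can be computed from the arc posteriors. The data layout and the slot-independence assumption below are simplifications for the sketch, not the paper's exact formulation.

```python
from collections import defaultdict
from itertools import product

def expected_ngram_counts(confnet, n=2):
    """Expected n-gram counts over a word confusion network.

    confnet: a list of time slots, where each slot is a list of
    (word, posterior) pairs over competing hypotheses. Slots are
    treated as independent (a common simplification), so the expected
    count of an n-gram is the product of the posteriors of its arcs.
    """
    counts = defaultdict(float)
    for i in range(len(confnet) - n + 1):
        # enumerate every combination of one arc per slot in the window
        for arcs in product(*confnet[i:i + n]):
            ngram = tuple(word for word, _ in arcs)
            prob = 1.0
            for _, posterior in arcs:
                prob *= posterior
            counts[ngram] += prob
    return dict(counts)


# Two slots: "flight"/"light" compete, then "to" is certain.
cn = [[("flight", 0.9), ("light", 0.1)], [("to", 1.0)]]
features = expected_ngram_counts(cn, n=2)
# {("flight", "to"): 0.9, ("light", "to"): 0.1}
```

The resulting expected counts can then serve as a sparse feature vector for an SVM classifier of the kind the abstract describes.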
Abstract:
The color change induced by triple hydrogen-bonding recognition between melamine and a cyanuric acid derivative grafted on the surface of gold nanoparticles can be used for reliable detection of melamine. Since such a color change can be readily seen by the naked eye, the method enables on-site and real-time detection of melamine in raw milk and infant formula even at a concentration as low as 2.5 ppb without the aid of any advanced instruments.
Abstract:
This research project is a study of the role of fixation and visual attention in object recognition. In this project, we build an active vision system which can recognize a target object in a cluttered scene efficiently and reliably. Our system integrates visual cues like color and stereo to perform figure/ground separation, yielding candidate regions on which to focus attention. Within each image region, we use stereo to extract features that lie within a narrow disparity range about the fixation position. These selected features are then used as input to an alignment-style recognition system. We show that visual attention and fixation significantly reduce the complexity and the false identifications in model-based recognition using Alignment methods. We also demonstrate that stereo can be used effectively as a figure/ground separator without the need for accurate camera calibration.
Abstract:
A system for visual recognition is described, with implications for the general problem of representation of knowledge to assist control. The immediate objective is a computer system that will recognize objects in a visual scene, specifically hammers. The computer receives an array of light intensities from a device like a television camera. It is to locate and identify the hammer if one is present. The computer must produce from the numerical "sensory data" a symbolic description that constitutes its perception of the scene. Of primary concern is the control of the recognition process. Control decisions should be guided by the partial results obtained on the scene. If a hammer handle is observed this should suggest that the handle is part of a hammer and advise where to look for the hammer head. The particular knowledge that a handle has been found combines with general knowledge about hammers to influence the recognition process. This use of knowledge to direct control is denoted here by the term "active knowledge". A descriptive formalism is presented for visual knowledge which identifies the relationships relevant to the active use of the knowledge. A control structure is provided which can apply knowledge organized in this fashion actively to the processing of a given scene.
Abstract:
Methods are presented (1) to partition or decompose a visual scene into the bodies forming it; (2) to position these bodies in three-dimensional space, by combining two scenes that make a stereoscopic pair; (3) to find the regions or zones of a visual scene that belong to its background; (4) to carry out the isolation of objects in (1) when the input has inaccuracies. Running computer programs implement the methods, and many examples illustrate their behavior. The input is a two-dimensional line-drawing of the scene, assumed to contain three-dimensional bodies possessing flat faces (polyhedra); some of them may be partially occluded. Suggestions are made for extending the work to curved objects. Some comparisons are made with human visual perception. The main conclusion is that it is possible to separate a picture or scene into the constituent objects exclusively on the basis of monocular geometric properties (on the basis of pure form); in fact, successful methods are shown.
Abstract:
A framework for the simultaneous localization and recognition of dynamic hand gestures is proposed. At the core of this framework is a dynamic space-time warping (DSTW) algorithm that aligns a pair of query and model gestures in both space and time. For every frame of the query sequence, feature detectors generate multiple hand region candidates. Dynamic programming is then used to compute both a global matching cost, which is used to recognize the query gesture, and a warping path, which aligns the query and model sequences in time and also finds the best hand candidate region in every query frame. The proposed framework includes translation invariant recognition of gestures, a desirable property for many HCI systems. The performance of the approach is evaluated on a dataset of hand-signed digits gestured by people wearing short sleeve shirts, in front of a background containing other non-hand skin-colored objects. The algorithm simultaneously localizes the gesturing hand and recognizes the hand-signed digit. Although DSTW is illustrated in a gesture recognition setting, the proposed algorithm is a general method for matching time series that allows for multiple candidate feature vectors to be extracted at each time step.
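The core dynamic-programming idea, aligning a model sequence against a query in which every frame offers several candidate feature vectors, can be sketched as below. This is a simplified illustration of DSTW under a Euclidean frame distance, not the authors' implementation.

```python
def dstw(model, query_candidates):
    """Dynamic space-time warping (sketch).

    model: list of feature vectors, one per model frame.
    query_candidates: list over query frames, each entry a list of
    candidate feature vectors (e.g. several detected hand regions).
    Returns the minimum global matching cost over all warping paths
    and per-frame candidate choices.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    M, Q = len(model), len(query_candidates)
    INF = float("inf")
    # D[i][j][k]: best cost of aligning model[:i+1] with query[:j+1]
    # when candidate k is chosen at query frame j
    D = [[[INF] * len(query_candidates[j]) for j in range(Q)]
         for _ in range(M)]
    for i in range(M):
        for j in range(Q):
            for k, cand in enumerate(query_candidates[j]):
                d = dist(model[i], cand)
                if i == 0 and j == 0:
                    D[i][j][k] = d
                    continue
                best = INF
                if i > 0:                # stretch model time, same query frame
                    best = min(best, D[i - 1][j][k])
                if j > 0:                # stretch query time
                    best = min(best, min(D[i][j - 1]))
                if i > 0 and j > 0:      # diagonal step
                    best = min(best, min(D[i - 1][j - 1]))
                D[i][j][k] = d + best
    return min(D[M - 1][Q - 1])
```

Backtracking through the table would recover both the time alignment and the chosen hand candidate in each query frame, which is how the framework localizes the hand while recognizing the gesture.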
Abstract:
Locating hands in sign language video is challenging due to a number of factors. Hand appearance varies widely across signers due to anthropometric variations and varying levels of signer proficiency. Video can be captured under varying illumination, camera resolutions, and levels of scene clutter, e.g., high-res video captured in a studio vs. low-res video gathered by a web cam in a user’s home. Moreover, the signers’ clothing varies, e.g., skin-toned clothing vs. contrasting clothing, short-sleeved vs. long-sleeved shirts, etc. In this work, the hand detection problem is addressed in an appearance matching framework. The Histogram of Oriented Gradient (HOG) based matching score function is reformulated to allow non-rigid alignment between pairs of images to account for hand shape variation. The resulting alignment score is used within a Support Vector Machine hand/not-hand classifier for hand detection. The new matching score function yields improved performance (in ROC area and hand detection rate) over the Vocabulary Guided Pyramid Match Kernel (VGPMK) and the traditional, rigid HOG distance on American Sign Language video gestured by expert signers. The proposed match score function is computationally less expensive (for training and testing), has fewer parameters and is less sensitive to parameter settings than VGPMK. The proposed detector works well on test sequences from an inexpert signer in a non-studio setting with cluttered background.
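The idea of a HOG matching score that tolerates non-rigid shifts between image pairs can be sketched as a cell-neighborhood search: each HOG cell in one image matches its best counterpart within a small radius in the other. This is an illustrative simplification, not the paper's reformulated score function; the plain-list data layout and squared-difference cell distance are assumptions for the example.

```python
def aligned_hog_distance(hog_a, hog_b, radius=1):
    """Shift-tolerant distance between two HOG grids (sketch).

    hog_a, hog_b: nested lists [row][col][bin] of gradient histograms
    on the same grid. Each cell of hog_a is matched against the best
    cell of hog_b within `radius` cells, approximating a non-rigid
    alignment between the two images.
    """
    rows, cols = len(hog_a), len(hog_a[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            best = float("inf")
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        # squared difference between the two histograms
                        d = sum((a - b) ** 2
                                for a, b in zip(hog_a[r][c], hog_b[rr][cc]))
                        best = min(best, d)
            total += best
    return total
```

With `radius=0` this reduces to a rigid HOG distance; a positive radius absorbs small hand-shape deformations. Such an alignment-aware score could then feed a hand/not-hand SVM classifier of the kind the abstract describes.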
Abstract:
Recent studies suggested that the control of hand movements in catching involves continuous vision-based adjustments. More insight into these adjustments may be gained by examining the effects of occluding different parts of the ball trajectory. Here, we examined the effects of such occlusion on lateral hand movements when catching balls approaching from different directions, with the occlusion conditions presented in blocks or in randomized order. The analyses showed that late occlusion only had an effect during the blocked presentation, and early occlusion only during the randomized presentation. During the randomized presentation movement biases were more leftward if the preceding trial was an early occlusion trial. The effect of early occlusion during the randomized presentation suggests that the observed leftward movement bias relates to the rightward visual acceleration inherent to the ball trajectories used, while its absence during the blocked presentation seems to reflect trial-by-trial adaptations in the visuomotor gain, reminiscent of dynamic gain control in the smooth pursuit system. The movement biases during the late occlusion block were interpreted in terms of an incomplete motion extrapolation (a reduction of the velocity gain) caused by the fact that participants never saw the to-be-extrapolated part of the ball trajectory. These results underscore that continuous movement adjustments for catching do not only depend on visual information, but also on visuomotor adaptations based on non-visual information.
Abstract:
International exhibitions were greatly responsible for the modernization of western society. The motive for these events was the possibility of enhancing the country's international status abroad. The genesis of world exhibitions came from the conviction that humanity as a whole would improve, the continual flow of new practical applications, the development of modern communication techniques, and the social need for a medium that could acquaint the general public with changes in technology, economy and society.
Since the first national industrial exhibitions in Paris during the eighteenth century, and especially since the first Great Exhibition in London's Hyde Park in 1851, these international events spread steadily all over Europe and the United States, reaching Latin America at the beginning of the twentieth century. The work of professionals such as Daniel Burnham, Werner Hegemann and Elbert Peets made the relation between exhibitions and urban transformation a much closer one, setting a precedent for subsequent exhibitions.
In Buenos Aires, the celebration of the centennial of independence from Spain in 1910 had many meanings and repercussions. A series of factors allowed for a moment of change in the city. Official optimism, economic progress, inequality and social conflict made this a suitable time for transformation. With the organization of the Exposición Internacional, the government had, among others, one specific aim: to create a network of visual tools to instill a feeling of belonging and provide an identity for the mixture of cultures that populated the city of Buenos Aires at the time. Another important objective of the government was to put Buenos Aires at the level of European cities.
Foreign professionals had a great influence on the conceptual and factual shaping of the exhibition and on the subsequent changes in the urban condition. The exhibition played an important role in ways of thinking about the city and in the leisure ideas it introduced. As a didactic tool, the exhibition served as a precedent for conceiving leisure spaces in the future. Urban and landscape planners such as Joseph Bouvard and Charles Thays were instrumental in a great part of the design of the Exhibition, but it was not only the architects and designers who shaped the identity of the fair. Other visitors, such as Jules Huret or Georges Clemenceau, were responsible for giving the city an international image it did not previously have.
This paper will explore, on the one hand, the significance of the 1910 exhibition for the shaping of the city and its image, and, on the other, the role of foreign professionals and the reach of their influence.
Abstract:
Handwritten character recognition has always been a frontier area of research in the field of pattern recognition and image processing, and there is a large demand for OCR on handwritten documents. Although sufficient studies have been performed on foreign scripts such as Chinese, Japanese and Arabic characters, only very few works can be traced for handwritten character recognition of Indian scripts, especially the South Indian scripts. This paper provides an overview of offline handwritten character recognition in South Indian scripts, namely Malayalam, Tamil, Kannada and Telugu.