903 results for hand-drawn visual language recognition
Abstract:
New low-cost sensors and open free libraries for 3D image processing are making important advances in robot vision applications possible, such as three-dimensional object recognition, semantic mapping, navigation and localization of robots, human detection and/or gesture recognition for human-machine interaction. In this paper, a novel method for recognizing and tracking the fingers of a human hand is presented. The method is based on point clouds from range images captured by an RGBD sensor. It works in real time and does not require visual marks, camera calibration or prior knowledge of the environment. Moreover, it works successfully even when multiple objects appear in the scene or when the ambient lighting changes. Furthermore, the method was designed as the basis of a human interface for remotely controlling domestic or industrial devices. In this paper, the method was tested by operating a robotic hand: first, the human hand was recognized and the fingers were detected; second, the movement of the fingers was analysed and mapped so that it could be imitated by a robotic hand.
Abstract:
New low-cost sensors and new open free libraries for 3D image processing are enabling important advances in robot vision applications, such as three-dimensional object recognition, semantic mapping, navigation and localization of robots, human detection and/or gesture recognition for human-machine interaction. In this paper, a method to recognize the human hand and track its fingers is proposed. This new method is based on point clouds from RGBD range images. It does not require visual marks, camera calibration, knowledge of the environment or complex, expensive acquisition systems. Furthermore, the method has been implemented to create a human interface for moving a robot hand. The human hand is recognized and the movement of its fingers is analyzed. Afterwards, the movement is imitated by a Barret hand, using communication events programmed with ROS.
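The two abstracts above describe broadly the same pipeline: segment the scene from an RGBD point cloud, isolate the hand, then track the fingers. Below is a minimal sketch of the first stages, assuming the Open3D library; the file name, thresholds and nearest-cluster heuristic are illustrative choices, not details taken from the papers.

```python
import numpy as np
import open3d as o3d

# Load one RGB-D frame as a point cloud (hypothetical file name).
pcd = o3d.io.read_point_cloud("frame_0001.pcd")

# Remove the dominant plane (table / wall) with RANSAC.
_, plane_idx = pcd.segment_plane(distance_threshold=0.01,
                                 ransac_n=3,
                                 num_iterations=200)
objects = pcd.select_by_index(plane_idx, invert=True)

# Cluster the remaining points; each cluster is a candidate object,
# which is how the method can cope with multiple objects in the scene.
labels = np.asarray(objects.cluster_dbscan(eps=0.02, min_points=50))

# Illustrative heuristic: take the cluster nearest the sensor as the hand.
best_label, best_dist = None, np.inf
for lbl in set(labels[labels >= 0]):
    centroid = np.asarray(objects.points)[labels == lbl].mean(axis=0)
    dist = np.linalg.norm(centroid)
    if dist < best_dist:
        best_label, best_dist = lbl, dist

hand = objects.select_by_index(np.where(labels == best_label)[0].tolist())
print(f"hand candidate: {len(hand.points)} points at {best_dist:.2f} m")
```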
Abstract:
In this report we summarize the state of the art of speech emotion recognition from the signal processing point of view. On the basis of multi-corpus experiments with machine-learning classifiers, we observe that existing approaches to supervised machine learning lead to database-dependent classifiers that cannot be applied to multi-language speech emotion recognition without additional training, because they discriminate the emotion classes according to the training language used. As experimental results show that humans can perform language-independent categorisation, we drew a parallel between machine recognition and the cognitive process and tried to discover the sources of these divergent results. The analysis suggests that the main difference is that speech perception allows the extraction of language-independent features, although language-dependent features are incorporated at all levels of the speech signal and play a strong discriminative role in human perception. Based on several results in related domains, we further suggest that the cognitive process of emotion recognition relies on categorisation, assisted by a hierarchical structure of the emotional categories that exists in the cognitive space of all humans. We propose a strategy for developing language-independent machine emotion recognition, based on the identification of language-independent speech features and the use of additional information from visual (expression) features.
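The report's central observation, that supervised classifiers become database-dependent, is usually exposed by a cross-corpus evaluation: train on one language's corpus, then score both in-corpus and on a second corpus. The sketch below illustrates only that protocol, with synthetic stand-ins for acoustic features (the constant shift plays the role of language-dependent coloration); it is not the report's actual experiment.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
P = rng.normal(size=(4, 26))  # shared emotion "prototypes", 4 classes

def corpus(n, shift):
    """Stand-in for per-utterance acoustic features (e.g. MFCC statistics)."""
    X = rng.normal(size=(n, 26)) + shift
    y = (X @ P.T).argmax(axis=1)  # emotions depend on the same prototypes
    return X, y

X_a, y_a = corpus(500, shift=0.0)                  # training language
X_b, y_b = corpus(500, shift=rng.normal(size=26))  # other language: shifted features

clf = SVC(kernel="rbf").fit(X_a, y_a)
print("same-corpus accuracy: ", clf.score(X_a, y_a))   # high
print("cross-corpus accuracy:", clf.score(X_b, y_b))   # degrades under the shift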
Improved speech recognition using adaptive audio-visual fusion via a stochastic secondary classifier
Abstract:
Recovering position from sensor information is an important problem in mobile robotics, known as localisation. Localisation requires a map or some other description of the environment to provide the robot with a context in which to interpret sensor data. The mobile robot system under discussion uses an artificial neural representation of position. Building a geometrical map of the environment with a single camera and artificial neural networks is difficult; it is simpler instead to learn position as a function of the visual input. Usually when learning images, an intermediate representation is employed. An appropriate starting point for a biologically plausible image representation is the complex cells of the visual cortex, which have invariance properties that appear useful for localisation. The effectiveness for localisation of two different complex cell models is evaluated. Finally, the ability of a simple neural network with single-shot learning to recognise these representations and localise a robot is examined.
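Complex cells of the kind mentioned here are commonly modelled with the classic "energy model": a quadrature pair of Gabor filters whose squared outputs are summed, giving a response invariant to the phase (local shift) of the stimulus. A minimal sketch follows, assuming scikit-image and illustrative parameters; the abstract does not specify which two models were actually compared.

```python
import numpy as np
from skimage import data
from skimage.filters import gabor

image = data.camera().astype(float) / 255.0  # any grayscale view of a place

def complex_cell_response(img, frequency, theta):
    """Energy model: quadrature Gabor pair, squared and summed."""
    even, odd = gabor(img, frequency=frequency, theta=theta)
    return np.sqrt(even**2 + odd**2)  # phase-invariant (shift-tolerant) energy

# Pooling a small bank over orientations gives a compact descriptor that a
# simple network could associate with place labels in a single shot.
orientations = np.linspace(0.0, np.pi, 4, endpoint=False)
descriptor = np.array([complex_cell_response(image, 0.2, t).mean()
                       for t in orientations])
print("place descriptor:", descriptor)
```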
Abstract:
This paper reviews the reliability and validity of visual analogue scales (VAS) in terms of (1) their ability to predict feeding behaviour, (2) their sensitivity to experimental manipulations, and (3) their reproducibility. VAS correlate with, but do not reliably predict, energy intake to the extent that they could be used as a proxy for energy intake. They do predict meal initiation in subjects eating their normal diets in their normal environment. Under laboratory conditions, subjectively rated motivation to eat using VAS is sensitive to experimental manipulations and has been found to be reproducible in relation to those experimental regimens. Other work has found them not to be reproducible in relation to repeated protocols. On balance, it would appear, inasmuch as it is possible to quantify, that VAS exhibit a good degree of within-subject reliability and validity in that they predict, with reasonable certainty, meal initiation and amount eaten, and are sensitive to experimental manipulations. This reliability and validity appears more pronounced under the controlled (but more artificial) conditions of the laboratory, where the signal-to-noise ratio in experiments appears to be elevated relative to real life. It appears that VAS are best used in within-subject, repeated-measures designs where the effect of different treatments can be compared under similar circumstances. They are best used in conjunction with other measures (e.g. feeding behaviour, changes in plasma metabolites) rather than as proxies for these variables. New hand-held electronic appetite rating systems (EARS) have been developed to increase the reliability of data capture and decrease investigator workload. Recent studies have compared these with traditional pen-and-paper (P&P) VAS. The EARS have been found to be sensitive to experimental manipulations and reproducible relative to P&P. However, subjects appear to exhibit a significantly more constrained use of the scale when using the EARS relative to the P&P. For this reason it is recommended that the two techniques are not used interchangeably.
Abstract:
Probabilistic robotics, most often applied to the problem of simultaneous localisation and mapping (SLAM), requires measures of uncertainty to accompany observations of the environment. This paper describes how uncertainty can be characterised for a vision system that locates coloured landmarks in a typical laboratory environment. The paper describes a model of the uncertainty in segmentation, the internal camera model and the mounting of the camera on the robot. It explains the implementation of the system on a laboratory robot, and provides experimental results that show the coherence of the uncertainty model.
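A common way to characterise such uncertainty is first-order (Jacobian) propagation of segmentation noise through the camera model. The toy sketch below propagates the variance of a landmark's segmented image centroid through a pinhole model to a bearing variance of the kind a probabilistic filter can consume; all numbers are illustrative, and the paper's full model also covers the camera's internal parameters and its mounting on the robot.

```python
import numpy as np

f_px  = 600.0   # focal length in pixels (from calibration)
cx    = 320.0   # principal point, x
u     = 415.0   # segmented landmark centroid column
var_u = 4.0     # centroid variance due to segmentation noise (px^2)

# Bearing from the pinhole model and its Jacobian w.r.t. the pixel coordinate.
theta = np.arctan2(u - cx, f_px)
J = f_px / (f_px**2 + (u - cx)**2)   # d(theta)/du

var_theta = J * var_u * J            # 1-D case of  J Sigma J^T
print(f"bearing {np.degrees(theta):.2f} deg, "
      f"sigma {np.degrees(np.sqrt(var_theta)):.3f} deg")
```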
Abstract:
In this paper we present a novel algorithm for localization during navigation that performs matching over local image sequences. Instead of calculating the single location most likely to correspond to a current visual scene, the approach finds candidate matching locations within every section (subroute) of all learned routes. Through this approach, we reduce the demands upon the image processing front-end, requiring it only to pick the best matching image correctly from within a short local image sequence, rather than globally. We applied this algorithm to a challenging downhill mountain biking visual dataset in which there was significant perceptual or environmental change between repeated traverses of the environment, and compared performance with that of the feature-based algorithm FAB-MAP. The results demonstrate the potential for localization using visual sequences, even when there are no visual features that can be reliably detected.
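The key idea, matching over short sequences rather than single frames, can be illustrated in a few lines: build a difference matrix between query and reference descriptors, then score constant-velocity trajectories through it. The sketch below uses random stand-ins for image descriptors and is in the spirit of the abstract (and of SeqSLAM-style methods), not a reconstruction of the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
ref = rng.normal(size=(200, 64))                  # descriptors along a learned route
query = ref[80:100] + rng.normal(0.0, 0.5, size=(20, 64))  # noisy revisit of frames 80-99

# Difference matrix D[i, j]: distance between query frame i and reference frame j.
D = np.linalg.norm(query[:, None, :] - ref[None, :, :], axis=2)

# Score every constant-velocity path (slope 1 here) through D and keep the best,
# so the whole sequence, not any single frame, decides the match.
L = len(query)
starts = np.arange(len(ref) - L)
scores = np.array([D[np.arange(L), s + np.arange(L)].sum() for s in starts])
print("best matching subroute starts at reference frame", scores.argmin())  # ~80
```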
Abstract:
As teacher/researchers interested in the pursuit of socially-just outcomes in early childhood education, we give the form and function of language a special position in our work. We believe that mastering a range of literacy competences includes not only the technical skills for learning, but also the resources for viewing and constructing the world (Freire and Macedo, 1987). Rather than seeing knowledge about language as the accumulation of technical skills alone, the viewpoint to which we subscribe treats knowledge about language as a dialectic that evolves from, is situated in, and contributes to a social arena (Halliday, 1978). We do not shy away from this position just because children are in the early years of schooling. In ‘Playing with Grammar’, we focus on the Foundation to Year 2 grouping, in line with the Australian Curriculum, Assessment and Reporting Authority’s (hereafter ACARA) advice on the ‘nature of learners’ (ACARA, 2013). With our focus on the early years of schooling comes our acknowledgement of the importance and complexity of play. At a time when accountability in education has moved many teachers to a sense of urgency to prove language and literacy achievement (Genishi and Dyson, 2009), we encourage space to revisit what we know about literature choices and learning experiences and bring these together to facilitate language learning. We can neither ignore nor overemphasise the importance of play for the development of language through: the opportunities presented for creative use and practice; social interactions for real purposes; and identifying and solving problems in the lives of young children (Marsh and Hallet, 2008). We argue that by engaging young children in opportunities to play with language we are ultimately empowering them to be active in their language learning and, in the process, fostering a love of language and the intricacies it holds. Our goal in this publication is to provide a range of highly practical strategies for scaffolding young children through some of the Content Descriptions from the Australian Curriculum English Version 5.0, hereafter AC:E V5.0 (ACARA, 2013). This recently released curriculum offers a new theoretical approach to building children’s knowledge about language. The AC:E V5.0 uses selected traditional terms through an approach developed in systemic functional linguistics (see Halliday and Matthiessen, 2004) to highlight the dynamic forms and functions of multimodal language in texts. For example, the following statement, taken from the ‘Language: Knowing about the English language’ strand, states: English uses standard grammatical terminology within a contextual framework, in which language choices are seen to vary according to the topics at hand, the nature and proximity of the relationships between the language users, and the modalities or channels of communication available (ACARA, 2013). Put simply, traditional grammar terms are used within a functional framework made up of field, tenor, and mode. An understanding of genre is noted with the reference to a ‘contextual framework’. The ‘topics at hand’ concern the field or subject matter of the text. The ‘relationships between the language users’ is a description of tenor. There is reference to ‘modalities’, such as spoken, written or visual text. We posit that this innovative approach is necessary for working with contemporary multimodal and cross-cultural texts (see Exley and Mills, 2012).
We believe there is enormous power in using literature to expose children to the richness of language and in turn develop language and literacy skills. Taking time to look at language patterns within actual literature is a pathway to ‘…capture interest, stir the imagination and absorb the [child]’ into the world of language and literacy (Saxby, 1993, p. 55). In the following three sections, we have tried to remain faithful to our interpretation of the AC:E V5.0 Content Descriptions without giving an exhaustive explanation of the grammatical terms. Other excellent tomes, such as Derewianka (2011), Humphrey, Droga and Feez (2012), and Rossbridge and Rushton (2011) provide these more comprehensive explanations as does the AC:E V5.0 Glossary. We’ve reproduced some of the AC:E V5.0 glossary at the end of this publication. Our focus is on the structure and unfolding of the learning experiences. We outline strategies for working with children in Foundation, Year 1 and Year 2 by providing some demonstration learning experiences based on texts we’ve selected, but maintain that the affordances of these strategies will only be realised when teaching and learning is purposively tied to authentic projects in local contexts. We strongly encourage you not to use only the resource texts we’ve selected, but to capitalise upon your skill for identifying the language features in the texts you and the children are studying and adapt some of the strategies we have outlined. Each learning experience is connected to one of the Content Descriptions from the AC:E V5.0 and contains an experience specific purpose, a suggested resource text and a sequence for the experience that always commences with an orientation to text followed by an examination of a particular grammatical resource. We expect that each of these learning experiences will take a couple if not a few teaching episodes to work through, especially if children are meeting a concept for the first time. We hope you use as much, or as little, of each experience as is needed. Our plans allow for focused discussion, shared exploration and opportunities to revisit the same text for the purpose of enhancing meaning making. We do not want the teaching of grammar to slip into a crisis of irrelevance or to be seen as a series of worksheet drills with finite answers. Strategies for effective practice, however, have much portability. We are both very keen to hear from teachers who are adopting and adapting these learning experiences in their classrooms. Please email us on b.exley@qut.edu.au or lkervin@uow.edu.au. We’d love to continue the conversation with you over time.
Abstract:
The integration of separate, yet complementary, cortical pathways appears to play a role in visual perception and action when intercepting objects. The ventral system is responsible for object recognition and identification, while the dorsal system facilitates continuous regulation of action. This dual-system model implies that empirically manipulating different visual information sources during performance of an interceptive action might lead to the emergence of distinct gaze and movement pattern profiles. To test this idea, we recorded hand kinematics and eye movements of participants as they attempted to catch balls projected from a novel apparatus that synchronised or de-synchronised accompanying video images of a throwing action and ball trajectory. Results revealed that ball catching performance was less successful when patterns of hand movements and gaze behaviours were constrained by the absence of advance perceptual information from the thrower's actions. Under these task constraints, participants began tracking the ball later, followed less of its trajectory, and adapted their actions by initiating movements later and moving the hand faster. There were no performance differences when the throwing action image and ball speed were synchronised or de-synchronised, since hand movements were closely linked to information from the ball trajectory. Results are interpreted relative to the two-visual-system hypothesis, demonstrating that accurate interception requires integration of advance visual information from the kinematics of the throwing action and from the ball's flight trajectory.
Abstract:
To identify and categorize complex stimuli such as familiar objects or speech, the human brain integrates information that is abstracted at multiple levels from its sensory inputs. Using cross-modal priming for spoken words and sounds, this functional magnetic resonance imaging study identified 3 distinct classes of visuoauditory incongruency effects: visuoauditory incongruency effects were selective for 1) spoken words in the left superior temporal sulcus (STS), 2) environmental sounds in the left angular gyrus (AG), and 3) both words and sounds in the lateral and medial prefrontal cortices (IFS/mPFC). From a cognitive perspective, these incongruency effects suggest that prior visual information influences the neural processes underlying speech and sound recognition at multiple levels, with the STS being involved in phonological, AG in semantic, and mPFC/IFS in higher conceptual processing. In terms of neural mechanisms, effective connectivity analyses (dynamic causal modeling) suggest that these incongruency effects may emerge via greater bottom-up effects from early auditory regions to intermediate multisensory integration areas (i.e., STS and AG). This is consistent with a predictive coding perspective on hierarchical Bayesian inference in the cortex where the domain of the prediction error (phonological vs. semantic) determines its regional expression (middle temporal gyrus/STS vs. AG/intraparietal sulcus).
Abstract:
This new volume, Exploring with Grammar in the Primary Years (Exley, Kervin & Mantei, 2014), follows on from Playing with Grammar in the Early Years (Exley & Kervin, 2013). We extend our thanks to the ALEA membership for their take-up of the first volume and the vibrant conversations around our first attempt at developing a pedagogy for the teaching of grammar in the early years. Your engagement at locally held ALEA events has motivated us to complete this second volume and reassert our interest in the pursuit of socially-just outcomes in the primary years. As noted in Exley and Kervin (2013), we believe that mastering a range of literacy competences includes not only the technical skills for learning, but also the resources for viewing and constructing the world (Freire and Macedo, 1987). Rather than seeing knowledge about language as the accumulation of technical skills alone, the viewpoint to which we subscribe treats knowledge about language as a dialectic that evolves from, is situated in, and contributes to active participation within a social arena (Halliday, 1978). We acknowledge that to explore is to engage in processes of discovery as we look closely and examine the opportunities before us. As such, we draw on Janks’ (2000; 2014) critical literacy theory to underpin many of the learning experiences in this text. Janks (2000) argues that effective participation in society requires knowledge about how the power of language promotes the views, beliefs and values of certain groups to the exclusion of others. Powerful language users can identify not only how readers are positioned by these views, but also the ways these views are conveyed through the design of the text, that is, the combination of vocabulary, syntax, image, movement and sound. Similarly, powerful designers of texts can make careful modal choices in written and visual design to promote certain perspectives that position readers and viewers in new ways to consider more diverse points of view. As the title of our text suggests, our activities are designed to support learners in exploring the design of texts to achieve certain purposes and to consider the potential for the sharing of their own views through text production. In Exploring with Grammar in the Primary Years, we focus on the Year 3 to Year 6 grouping, in line with the Australian Curriculum, Assessment and Reporting Authority’s (hereafter ACARA) advice on the ‘nature of learners’ (ACARA, 2014). Our goal in this publication is to provide a range of highly practical strategies for scaffolding students’ learning through some of the Content Descriptions from the Australian Curriculum: English Version 7.2, hereafter AC:E (ACARA, 2014). We continue to express our belief in the power of using whole texts from a range of authentic sources, including high quality children’s literature, the internet, and examples of community-based texts, to expose students to the richness of language. Taking time to look at language patterns within actual texts is a pathway to ‘…capture interest, stir the imagination and absorb the [child]’ into the world of language and literacy (Saxby, 1993, p. 55). It is our intention to be more overt this time and send a stronger message that our learning experiences are simply ‘sample’ activities rather than a teachers’ workbook or a program of study to be followed.
We’re hoping that teachers and students will continue to explore their bookshelves, the internet and their community for texts that provide powerful opportunities to engage with language-based learning experiences. In the following three sections, we have tried to remain faithful to our interpretation of the AC:E Content Descriptions without giving an exhaustive explanation of the grammatical terms. This recently released curriculum offers a new theoretical approach to building students’ knowledge about language. The AC:E uses selected traditional terms through an approach developed in systemic functional linguistics (see Halliday and Matthiessen, 2004) to highlight the dynamic forms and functions of multimodal language in texts. For example, the following statement, taken from the ‘Language: Knowing about the English language’ strand states: English uses standard grammatical terminology within a contextual framework, in which language choices are seen to vary according to the topics at hand, the nature and proximity of the relationships between the language users, and the modalities or channels of communication available (ACARA, 2014). Put simply, traditional grammar terms are used within a functional framework made up of field, tenor, and mode. An understanding of genre is noted with the reference to a ‘contextual framework’. The ‘topics at hand’ concern the field or subject matter of the text. The ‘relationships between the language users’ is a description of tenor. There is reference to ‘modalities’, such as spoken, written or visual text. We posit that this innovative approach is necessary for working with contemporary multimodal and cross-cultural texts (see Exley & Mills, 2012). Other excellent tomes, such as Derewianka (2011), Humphrey, Droga and Feez (2012), and Rossbridge and Rushton (2011) provide more comprehensive explanations of this unique metalanguage, as does the AC:E Glossary. We’ve reproduced some of the AC:E Glossary at the end of this publication. We’ve also kept the same layout for our learning experiences, ensuring that our teacher notes are not only succinct but also prudent in their placement. Each learning experience is connected to a Content Description from the AC:E and contains an experience with an identified purpose, suggested resource text and a possible sequence for the experience that always commences with an orientation to text followed by an examination of a particular grammatical resource. Our plans allow for focused discussion, shared exploration and opportunities to revisit the same text for the purpose of enhancing meaning making. Some learning experiences finish with deconstruction of a stimulus text while others invite students to engage in the design of new texts. We encourage you to look for opportunities in your own classrooms to move from text deconstruction to text design. In this way, students can express not only their emerging grammatical understandings, but also the ways they might position readers or viewers through the creation of their own texts. We expect that each of these learning experiences will vary in the time taken. Some may indeed take a couple if not a few teaching episodes to work through, especially if students are meeting a concept or a pedagogical strategy for the first time. We hope you use as much, or as little, of each experience as is needed for your students. We do not want the teaching of grammar to slip into a crisis of irrelevance or to be seen as a series of worksheet drills with finite answers. 
We firmly believe that strategies for effective deconstruction and design practice, however, have much portability. We three are very keen to hear from teachers who are adopting and adapting these learning experiences in their classrooms. Please email us on b.exley@qut.edu.au, lkervin@uow.edu.au or jessicam@uow.edu.au. We’d love to continue the conversation with you over time. Beryl Exley, Lisa Kervin & Jessica Mantei
Abstract:
Visual information in the form of the speaker's lip movements has been shown to improve the performance of speech recognition and search applications. In our previous work, we proposed cross-database training of synchronous hidden Markov models (SHMMs) to make use of large, publicly available external audio databases in addition to the relatively small given audio-visual database. In this work, the cross-database training approach is improved by performing an additional audio adaptation step, which enables audio-visual SHMMs to benefit from audio observations of the external audio models before the visual modality is added to them. The proposed approach outperforms the baseline cross-database training approach in clean and noisy environments in terms of phone recognition accuracy as well as spoken term detection (STD) accuracy.
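The abstract does not spell out the adaptation step here. As a rough stand-in for the general idea, the sketch below applies classic MAP adaptation of a Gaussian mean, pulling a model estimated on a large external audio corpus toward the scarce in-domain audio-visual data before the visual stream is added; the relevance factor, dimensions and data are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
prior_mean = np.zeros(13)                        # mean from the large external audio corpus
av_frames = rng.normal(0.5, 1.0, size=(40, 13))  # scarce in-domain audio-visual frames

# MAP adaptation of the mean: interpolate between the prior and the in-domain
# sample mean, weighted by the amount of in-domain data actually observed.
tau = 10.0   # relevance factor: larger values trust the external model more
n = len(av_frames)
adapted_mean = (tau * prior_mean + av_frames.sum(axis=0)) / (tau + n)
print("adapted mean (first 4 dims):", adapted_mean[:4])
```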
Abstract:
Purpose: Optical blur and ageing are known to affect driving performance, but their effects on drivers' eye movements are poorly understood. This study examined the effects of optical blur and age on eye movement patterns and performance on the DriveSafe slide recognition test, which is purported to predict fitness to drive.
Methods: Twenty young (27.1 ± 4.6 years) and 20 older (73.3 ± 5.7 years) visually normal drivers performed the DriveSafe under two visual conditions: best-corrected vision and with +2.00 DS blur. The DriveSafe is a visual recognition slide test consisting of brief presentations of static, real-world driving scenes containing different road users (pedestrians, bicycles and vehicles). Participants reported the types, relative positions and direction of travel of the road users in each image; the score was the number of correctly reported items (maximum score of 128). Eye movements were recorded while participants performed the DriveSafe test using a Tobii TX300 eye tracking system.
Results: There was a significant main effect of blur on DriveSafe scores (best-corrected: 114.9 vs blur: 93.2; p < 0.001). There was also a significant age-by-blur interaction on the DriveSafe scores (p < 0.001), such that the young drivers were more negatively affected by blur than the older drivers (reductions of 22% and 13% respectively; p < 0.001): with best-corrected vision, the young drivers performed better than the older drivers (DriveSafe scores: 118.4 vs 111.5; p = 0.001), while with blur, the young drivers performed worse than the older drivers (88.6 vs 95.9; p = 0.009). For the eye movement patterns, blur significantly reduced the number of fixations on road users (best-corrected: 5.1 vs blur: 4.5; p < 0.001), fixation duration on road users (2.0 s vs 1.8 s; p < 0.001) and saccade amplitudes (7.4° vs 6.7°; p < 0.001). A main effect of age on eye movements was also found, where older drivers made smaller saccades than the young drivers (6.7° vs 7.4°; p < 0.001).
Conclusions: Blur reduced DriveSafe scores for both age groups, and this effect was greater for the young drivers. The decrease in the number of fixations and fixation duration on road users, as well as the reduction in saccade amplitudes under the blurred condition, highlights the difficulty experienced in performing the task in the presence of optical blur, suggesting that uncorrected refractive errors may have a detrimental impact on aspects of driving performance.