302 results for audio-visual systems
Abstract:
This paper considers the question of designing a fully image based visual servo control for a dynamic system. The work is motivated by the ongoing development of image based visual servo control of small aerial robotic vehicles. The observed targets considered are coloured blobs on a flat surface to which the normal direction is known. The theoretical framework is directly applicable to the case of markings on a horizontal floor or landing field. The image features used are a first order spherical moment for position and an image flow measurement for velocity. A fully non-linear adaptive control design is provided that ensures global stability of the closed-loop system. © 2005 IEEE.
Abstract:
The detection of voice activity is a challenging problem, especially when the level of acoustic noise is high. Most current approaches only utilise the audio signal, making them susceptible to acoustic noise. An obvious approach to overcome this is to use the visual modality. The current state-of-the-art visual feature extraction technique is one that uses a cascade of visual features (i.e. 2D-DCT, feature mean normalisation, interstep LDA). In this paper, we investigate the effectiveness of this technique for the task of visual voice activity detection (VAD), and analyse each stage of the cascade and quantify the relative improvement in performance gained by each successive stage. The experiments were conducted on the CUAVE database and our results highlight that the dynamics of the visual modality can be used to good effect to improve visual voice activity detection performance.
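The first two stages of the cascade named in this abstract (2D-DCT followed by feature mean normalisation) can be sketched as below. This is a minimal illustration and not the authors' implementation: the function names, the square block input, and the choice to keep a 3x3 corner of low-frequency coefficients are all assumptions, and the LDA stage is omitted.

```python
import math


def dct2(block):
    """Orthonormal 2D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def alpha(k):
        # Scaling factors that make the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def low_frequency_features(block, keep=3):
    """Keep the top-left keep x keep coefficients (an assumed selection rule)."""
    coeffs = dct2(block)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


def mean_normalise(frames):
    """Subtract the per-dimension mean computed over all frames of an utterance."""
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]
```

In use, each mouth-region image would be passed through `low_frequency_features`, and the resulting per-frame vectors for one utterance through `mean_normalise`, before any later stage of the cascade.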
Abstract:
Visual servoing has been a viable method of robot manipulator control for more than a decade. Initial developments involved position-based visual servoing (PBVS), in which the control signal exists in Cartesian space. The younger method, image-based visual servoing (IBVS), has seen considerable development in recent years. PBVS and IBVS offer tradeoffs in performance, and neither can solve all tasks that may confront a robot. In response to these issues, several methods have been devised that partition the control scheme, allowing some motions to be performed in the manner of a PBVS system, while the remaining motions are performed using an IBVS approach. To date, there has been little research that explores the relative strengths and weaknesses of these methods. In this paper we present such an evaluation. We have chosen three recent visual servo approaches for evaluation in addition to the traditional PBVS and IBVS approaches. We posit a set of performance metrics that measure quantitatively the performance of a visual servo controller for a specific task. We then evaluate each of the candidate visual servo methods for four canonical tasks with simulations and with experiments in a robotic work cell.
Abstract:
Visual activity detection of lip movements can be used to overcome the poor performance of voice activity detection based solely in the audio domain, particularly in noisy acoustic conditions. However, most of the research conducted in visual voice activity detection (VVAD) has neglected to address variability in the visual domain, such as viewpoint variation. In this paper we investigate the effectiveness of the visual information from the speaker’s frontal and profile views (i.e. left and right side views) for the task of VVAD. As far as we are aware, our work constitutes the first real attempt to study this problem. We describe our visual front end approach and the Gaussian mixture model (GMM) based VVAD framework, and report the experimental results using the freely available CUAVE database. The experimental results show that VVAD is indeed possible from profile views, and we give a quantitative comparison of VVAD based on frontal and profile views. The results presented are useful in the development of multi-modal Human Machine Interaction (HMI) using a single camera, where the speaker’s face may not always be frontal.
Abstract:
Process-aware information systems, ranging from generic workflow systems to dedicated enterprise information systems, use work-lists to offer so-called work items to users. In real scenarios, users can be confronted with a very large number of work items that stem from multiple cases of different processes. In this jungle of work items, users may find it hard to choose the right item to work on next. The system cannot autonomously decide which is the right work item, since the decision is also dependent on conditions that are somehow outside the system. For instance, what is “best” for an organisation should be mediated with what is “best” for its employees. Current work-list handlers show work items as a simple sorted list and therefore do not provide much decision support for choosing the right work item. Since the work-list handler is the dominant interface between the system and its users, it is worthwhile to provide an intuitive graphical interface that uses contextual information about work items and users to provide suggestions about prioritisation of work items. This paper uses the so-called map metaphor to visualise work items and resources (e.g., users) in a sophisticated manner. Moreover, based on distance notions, the work-list handler can suggest the next work item by considering different perspectives. For example, urgent work items of a type that suits the user may be highlighted. The underlying map and distance notions may be of a geographical nature (e.g., a map of a city or office building), but may also be based on process designs, organisational structures, social networks, due dates, calendars, etc. The framework proposed in this paper is generic and can be applied to any process-aware information system. 
Moreover, in order to show its practical feasibility, the paper discusses a full-fledged implementation developed in the context of the open-source workflow environment YAWL, together with two real examples stemming from two very different scenarios. The results of an initial usability evaluation of the implementation are also presented, which provide a first indication of the validity of the approach.
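The idea of suggesting the next work item from combined distance notions can be illustrated with a small sketch. It is not taken from the YAWL implementation: the `WorkItem` fields, the two distance functions, and the weighted-sum combination are hypothetical choices made only to show how several perspectives (here, location and due date) might be merged into one ranking.

```python
import math
from dataclasses import dataclass


@dataclass
class WorkItem:
    name: str
    x: float                 # position on the map (assumed 2D coordinates)
    y: float
    hours_until_due: float   # time remaining before the due date


def geographic_distance(item, user_pos):
    """Euclidean distance between the work item and the user on the map."""
    return math.hypot(item.x - user_pos[0], item.y - user_pos[1])


def urgency_distance(item):
    """Treat time until the due date as a distance: urgent items are 'closer'."""
    return max(item.hours_until_due, 0.0)


def suggest_next(items, user_pos, w_geo=1.0, w_due=1.0):
    """Return the item with the smallest weighted sum of the two distances."""
    return min(
        items,
        key=lambda it: w_geo * geographic_distance(it, user_pos)
        + w_due * urgency_distance(it),
    )
```

Changing the weights switches perspective: with `w_due=0` only proximity matters, while a large `w_due` makes the suggestion follow due dates.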
Abstract:
Safety at railway level crossings (RLX) is one part of a wider picture of safety within the whole transport system. Governments, the rail industry and road organisations have used a variety of countermeasures for many years to improve RLX safety. New types of interventions are required in order to reduce the number of crashes and associated social costs at railway crossings. This paper presents the results of a large research program which aimed to assess the effectiveness of emerging Intelligent Transport Systems (ITS) interventions, both on-road and in-vehicle based, to improve the safety of car drivers at RLXs in Australia. The three most promising technologies selected from the literature review and focus groups were tested in an advanced driving simulator to provide a detailed assessment of their effects on driver behaviour. The three interventions were: (i) an in-vehicle visual warning using a GPS/smartphone navigation-like system; (ii) an in-vehicle audio warning; and (iii) an on-road intervention known as a valet system (warning lights on the road surface activated as a train approaches). The effects of these technologies on 57 participants were assessed in a systematic approach focusing on the safety of the intervention, effects on the road traffic around the crossings and drivers’ acceptance of the technology. Given that the ITS interventions were likely to provide a benefit by improving the driver’s awareness of the crossing status in low visibility conditions, such conditions were investigated through curves in the track before arriving at the crossing. ITS interventions were also expected to improve driver behaviour at crossings with high traffic (blocking back issue), which were also investigated at active crossings.
The key findings are: (i) interventions at passive crossings are likely to provide safety benefits; (ii) the benefits of ITS interventions on driver behaviour at active crossings are limited; (iii) the trialled ITS interventions did not show any issues in terms of driver distraction, driver acceptance or traffic delays; (iv) these interventions are easy to use and do not increase driver workload substantially; (v) participants’ intention to use the technology is high; and (vi) participants saw most value in succinct messages about approaching trains as opposed to knowing the RLX locations or the imminence of a collision with a train.
Abstract:
This paper presents a low-bandwidth multi-robot communication system designed to serve as a backup communication channel in the event a robot suffers a network device fault. While much research has been performed in the area of distributing network communication across multiple robots within a system, individual robots are still susceptible to hardware failure. In the past, such robots would simply be removed from service, and their tasks re-allocated to other members. However, there are times when a faulty robot might be crucial to a mission, or be able to contribute in a less communication-intensive area. By allowing robots to encode and decode messages into unique sequences of DTMF symbols, called words, our system is able to facilitate continued low-bandwidth communication between robots without access to network communication. Our results have shown that the system is capable of permitting robots to negotiate task initiation and termination, and is flexible enough to permit a pair of robots to perform a simple turn-taking task.
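Encoding a message as a sequence of DTMF symbols can be sketched as follows. The abstract does not describe the paper's actual word format, so the scheme here is purely an assumption: each byte is split into two 4-bit halves and each half is mapped to one of the 16 standard DTMF keys. Tone generation and detection are outside the scope of the sketch.

```python
# The 16 standard DTMF keys; the ordering used as a 4-bit alphabet is an assumption.
DTMF_SYMBOLS = "0123456789ABCD*#"


def encode_word(message: bytes) -> str:
    """Map each byte to two DTMF symbols (high nibble first)."""
    symbols = []
    for byte in message:
        symbols.append(DTMF_SYMBOLS[byte >> 4])
        symbols.append(DTMF_SYMBOLS[byte & 0x0F])
    return "".join(symbols)


def decode_word(word: str) -> bytes:
    """Invert encode_word; raises ValueError on malformed input."""
    if len(word) % 2 != 0:
        raise ValueError("a word must contain an even number of symbols")
    # str.index raises ValueError for characters outside the DTMF alphabet.
    values = [DTMF_SYMBOLS.index(symbol) for symbol in word]
    return bytes(
        (values[i] << 4) | values[i + 1] for i in range(0, len(values), 2)
    )
```

A robot could, for example, send `encode_word(b"GO")` to signal task initiation and recover the payload with `decode_word` on the receiving side; the message contents are hypothetical.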
Abstract:
This thesis presents a new vision-based decision and control strategy for automated aircraft collision avoidance that can be realistically applied to the See and Avoid problem. The effectiveness of the control strategy positions the research as a major contribution toward realising the simultaneous operation of manned and unmanned aircraft within civilian airspace. Key developments include novel classical and visual predictive control frameworks, and a performance evaluation technique aligned with existing aviation practice and applicable to autonomous systems. The overall approach is demonstrated through experimental results on a small multirotor unmanned aircraft, and through high fidelity probabilistic simulation studies.
Abstract:
Intelligent Transport Systems (ITS) have the potential to substantially reduce the number of crashes caused by human errors at railway level crossings. Such systems, however, will only exert an influence on driving behaviour if they are accepted by the driver. This study aimed at assessing driver acceptance of different ITS interventions designed to enhance driver behaviour at railway crossings. Fifty-eight participants, divided into three groups, took part in a driving simulator study in which three ITS devices were tested: an in-vehicle visual ITS, an in-vehicle audio ITS, and an on-road valet system. Driver acceptance of each ITS intervention was assessed in a questionnaire guided by the Technology Acceptance Model and the Theory of Planned Behaviour. Overall, results indicated that the strongest intentions to use the ITS devices belonged to participants exposed to the road-based valet system at passive crossings. The utility of both models in explaining drivers’ intention to use the systems is discussed, with results showing greater support for the Theory of Planned Behaviour. Directions for future studies, along with strategies that target attitudes and subjective norms to increase drivers’ behavioural intentions, are also discussed.
Abstract:
Whisper Our Futures was an invited design proposal to produce a major public artwork for the State of Queensland’s 150th Anniversary Celebrations. It involved a network of 100 individual scrolling digital text boxes, each with an individual audio system, arranged together in a tessellated format. This form (specified by the originating brief) both mimicked the soaring gothic arches typical of Queensland cathedrals and was suggestive of their stained glass windows. Each text module presented a message in both visual and audible forms for Queenslanders living 150 years hence, spoken by both the general public and prominent figures. In this way the work was designed as a focus of future hope, historical reflection and inspiration to visitors to Queensland cathedrals throughout the entire year of celebrations (2009). The work was planned to premiere at Brisbane’s main Anglican Cathedral and then tour to nine other state cathedrals throughout 2009. Two staged proposals and budgets were invited throughout 2007. After the second successful proposal stage the State Premier and cabinet changed, which ultimately led to the public art components being dropped from the program. The proposal currently remains on file at the Queensland Premier’s Office.
Abstract:
Generative media systems present an opportunity for users to leverage computational systems to make sense of complex media forms through interactive and collaborative experiences. Generative music and art are relatively new phenomena that use procedural invention as a creative technique to produce music and visual media. These kinds of systems present a range of affordances that can facilitate new kinds of relationships with music and media performance and production. Early systems have demonstrated the potential to provide access to collaborative ensemble experiences to users with little formal musical or artistic expertise. This paper examines the relational affordances of these systems evidenced by selected field data drawn from the Network Jamming Project. These generative performance systems enable access to unique ensemble experiences with very little musical knowledge or skill, and they further offer the possibility of unique interactive relationships with artists and musical knowledge through collaborative performance. In this presentation I will focus on demonstrating how these simulated experiences might lead to understandings that may be of educational and social benefit. Conference participants will be invited to jam in real time using virtual interfaces and to view video artifacts that demonstrate an interactive relationship with artists.
Abstract:
Current multimedia Web search engines still use keywords as the primary means to search. Due to the richness of multimedia content, general users constantly experience difficulties in formulating textual queries that are representative enough for their needs. As a result, query reformulation becomes an inevitable part of most multimedia searches. Previous Web query formulation studies did not investigate the modification sequences and thus can only report limited findings on reformulation behavior. In this study, we propose an automatic approach to examine multimedia query reformulation using large-scale transaction logs. The key findings show that search term replacement is the dominant type of modification in visual searches but is less important in audio searches. Image search users prefer the specified search strategy more than video and audio users do. There is also a clear tendency to replace terms with synonyms or associated terms in visual queries. The analysis of the search strategies in different types of multimedia searching provides some insights into users’ searching behavior, which can contribute to the design of future query formulation assistance for keyword-based Web multimedia retrieval systems.
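An automatic examination of reformulations in a transaction log could start from a term-level comparison of consecutive queries, as in the sketch below. The category names and the rules that assign them are assumptions made for illustration; the abstract does not specify the study's actual classification scheme.

```python
def classify_reformulation(previous: str, current: str) -> str:
    """Label how `current` modifies `previous`, comparing whitespace-split terms."""
    prev_terms = set(previous.lower().split())
    curr_terms = set(current.lower().split())

    if prev_terms == curr_terms:
        return "repeat"              # same terms, possibly reordered
    if not prev_terms & curr_terms:
        return "new"                 # no shared term: treated as a fresh query

    added = curr_terms - prev_terms
    removed = prev_terms - curr_terms
    if added and removed:
        return "replacement"         # some terms swapped for others
    if added:
        return "specification"       # terms only added: narrower query
    return "generalisation"          # terms only removed: broader query


def classify_session(queries):
    """Classify each consecutive pair of queries in one user session."""
    return [
        classify_reformulation(a, b) for a, b in zip(queries, queries[1:])
    ]
```

Counting the labels returned by `classify_session` separately for image, video and audio logs would give the kind of per-media comparison the abstract reports. Detecting synonym replacement would need a lexical resource and is not attempted here.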