903 results for hand-drawn visual language recognition
Abstract:
This thesis addresses the problem of detecting and describing the same scene points in different wide-angle images taken by the same camera at different viewpoints. This is a core competency of many vision-based localisation tasks including visual odometry and visual place recognition. Wide-angle cameras have a large field of view that can exceed a full hemisphere, and the images they produce contain severe radial distortion. When compared to traditional narrow field of view perspective cameras, more accurate estimates of camera egomotion can be found using the images obtained with wide-angle cameras. The ability to accurately estimate camera egomotion is a fundamental primitive of visual odometry, and this is one of the reasons for the increased popularity in the use of wide-angle cameras for this task. Their large field of view also enables them to capture images of the same regions in a scene taken at very different viewpoints, and this makes them suited for visual place recognition. However, the ability to estimate the camera egomotion and recognise the same scene in two different images is dependent on the ability to reliably detect and describe the same scene points, or ‘keypoints’, in the images. Most algorithms used for this purpose are designed almost exclusively for perspective images. Applying algorithms designed for perspective images directly to wide-angle images is problematic as no account is made for the image distortion. The primary contribution of this thesis is the development of two novel keypoint detectors, and a method of keypoint description, designed for wide-angle images. Both reformulate the Scale-Invariant Feature Transform (SIFT) as an image processing operation on the sphere. As the image captured by any central projection wide-angle camera can be mapped to the sphere, applying these variants to an image on the sphere enables keypoints to be detected in a manner that is invariant to image distortion.
Each of the variants is required to find the scale-space representation of an image on the sphere, and they differ in the approaches they use to do this. Extensive experiments using real and synthetically generated wide-angle images are used to validate the two new keypoint detectors and the method of keypoint description. The better of these two new keypoint detectors is applied to vision-based localisation tasks including visual odometry and visual place recognition using outdoor wide-angle image sequences. As part of this work, the effect of keypoint coordinate selection on the accuracy of egomotion estimates using the Direct Linear Transform (DLT) is investigated, and a simple weighting scheme is proposed which attempts to account for the uncertainty of keypoint positions during detection. A word reliability metric is also developed for use within a visual ‘bag of words’ approach to place recognition.
Abstract:
This short film, created by David Megarrity and Luke Monsour, experimented within a short timeframe with the challenge of superimposition of hand-drawn backgrounds, non-verbal action, and a short, sharp shoot. The aim was also to find a single piece of standalone music that would act as an unedited soundtrack. It won Best Queensland Film at the Woodford Film Festival in 2005, and was screened at Base-Court, Lausanne, Switzerland in 2006, and the Westgarth Film Festival 2005. It was acquired by comedy website minimovie in 2007.
Abstract:
This paper investigates the use of the dimensionality-reduction techniques weighted linear discriminant analysis (WLDA) and weighted median Fisher discriminant analysis (WMFD) before probabilistic linear discriminant analysis (PLDA) modeling for the purpose of improving speaker verification performance in the presence of high inter-session variability. Recently it was shown that WLDA techniques can provide improvement over traditional linear discriminant analysis (LDA) for channel compensation in i-vector based speaker verification systems. We show in this paper that the speaker discriminative information that is available in the distance between pairs of speakers clustered in the development i-vector space can also be exploited in heavy-tailed PLDA modeling by using the weighted discriminant approaches prior to PLDA modeling. Based upon the results presented within this paper using the NIST 2008 Speaker Recognition Evaluation dataset, we believe that WLDA and WMFD projections before PLDA modeling can provide an improved approach when compared to uncompensated PLDA modeling for i-vector based speaker verification systems.
Abstract:
Understanding complex systems within the human body presents a unique challenge for medical engineers and health practitioners. One significant issue is the ability to communicate their research findings to audiences with limited medical knowledge or understanding of the behaviour and composition of such structures. Much of what is known about the human body is currently communicated through abstract representations which include raw data sets, hand drawn illustrations or cellular automata. The development of 3D Computer Graphics Animation has provided a new medium for communicating these abstract concepts to audiences in new ways. This paper presents an approach for the visualisation of human articular cartilage deterioration using 3D Computer Graphics Animation. The animated outcome of this research introduces the complex interior structure of human cartilage to audiences with limited medical engineering knowledge.
Abstract:
In this paper we demonstrate passive vision-based localization in environments more than two orders of magnitude darker than the current benchmark using a $100 webcam and a $500 camera. Our approach uses the camera’s maximum exposure duration and sensor gain to achieve appropriately exposed images even in unlit night-time environments, albeit with extreme levels of motion blur. Using the SeqSLAM algorithm, we first evaluate the effect of variable motion blur caused by simulated exposures of 132 ms to 10000 ms duration on localization performance. We then use actual long exposure camera datasets to demonstrate day-night localization in two different environments. Finally, we perform a statistical analysis that compares the baseline performance of matching unprocessed greyscale images to using patch normalization and local neighbourhood normalization, the two key SeqSLAM components. Our results and analysis show for the first time why the SeqSLAM algorithm is effective, and demonstrate the potential for cheap camera-based localization systems that function across extreme perceptual change.
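As a rough illustration of the patch normalization step mentioned in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes an image is a list of rows of grey levels, and that each patch is normalized to zero mean and unit standard deviation. The function names `patch_normalize` and `mean_abs_difference` are hypothetical.

```python
from statistics import mean, pstdev


def patch_normalize(image, patch_size):
    """Normalize each patch_size x patch_size block to zero mean, unit std.

    Assumption: `image` is a non-empty list of equal-length rows of numbers.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, patch_size):
        for c0 in range(0, cols, patch_size):
            # Collect the pixel coordinates of this patch (clipped at the border).
            cells = [(r, c)
                     for r in range(r0, min(r0 + patch_size, rows))
                     for c in range(c0, min(c0 + patch_size, cols))]
            values = [image[r][c] for r, c in cells]
            mu = mean(values)
            sigma = pstdev(values)
            for r, c in cells:
                # A flat patch carries no contrast, so it maps to zero.
                out[r][c] = (image[r][c] - mu) / sigma if sigma > 0 else 0.0
    return out


def mean_abs_difference(a, b):
    """Mean absolute per-pixel difference between two same-sized images."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


# Toy example: the "night" image is the "day" image at one tenth the brightness.
day = [[10, 20], [30, 40]]
night = [[1, 2], [3, 4]]
raw = mean_abs_difference(day, night)
normalized = mean_abs_difference(patch_normalize(day, 2), patch_normalize(night, 2))
```

In this toy case the raw images differ substantially, while the normalized images are nearly identical, which hints at why such normalization can help matching across large brightness changes.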
Abstract:
Process models are often used to visualize and communicate workflows to involved stakeholders. Unfortunately, process modeling notations can be complex and need specific knowledge to be understood. Storyboards, as a visual language to illustrate workflows as sequences of images, provide natural visualization features that allow for better communication, to provide insight to people from non-process modelling expert domains. This paper proposes a visualization approach using a 3D virtual world environment to visualize storyboards for business process models. A prototype was built to present its applicability via generating output with examples of five major process model patterns and two non-trivial use cases. Illustrative results for the approach show the promise of using a 3D virtual world to visualize complex process models in an unambiguous and intuitive manner.
Abstract:
We refer to an ongoing endeavour that aims to assist Indigenous communities in Australia in persisting their personal and cultural memories linked to temporally dynamic interactions in situ. The design enables Indigenous users to upload items they collect themselves (e.g. photographs, audio, video) using mobile phones in their traditional lands into a topographical simulation, and then to associate these items with their own hand-drawn markings in the simulation. The design responds to the rich interconnectedness between Indigenous culture and the land and the need to converge spatial information technologies with practices that are not, inherently, conditioned by the geometries of the West. We propose that the design approach contributes to thinking about ways that mobile guides can respond to multiple realities and corporeal and affective phenomena.
Abstract:
For robots operating in outdoor environments, a number of factors, including weather, time of day, rough terrain, high speeds, and hardware limitations, make performing vision-based simultaneous localization and mapping with current techniques infeasible due to factors such as image blur and/or underexposure, especially on smaller platforms and low-cost hardware. In this paper, we present novel visual place-recognition and odometry techniques that address the challenges posed by low lighting, perceptual change, and low-cost cameras. Our primary contribution is a novel two-step algorithm that combines fast low-resolution whole image matching with a higher-resolution patch-verification step, as well as image saliency methods that simultaneously improve performance and decrease computing time. The algorithms are demonstrated using consumer cameras mounted on a small vehicle in a mixed urban and vegetated environment and a car traversing highway and suburban streets, at different times of day and night and in various weather conditions. The algorithms achieve reliable mapping over the course of a day, both when incrementally incorporating new visual scenes from different times of day into an existing map, and when using a static map comprising visual scenes captured at only one point in time. Using the two-step place-recognition process, we demonstrate for the first time single-image, error-free place recognition at recall rates above 50% across a day-night dataset without prior training or utilization of image sequences. This place-recognition performance enables topologically correct mapping across day-night cycles.
Abstract:
A collection of oral history recordings, photographs, hand drawn maps, videos and speech notes relating to the 2011 Queensland floods and the major flood event that occurred in Toowoomba and the Lockyer Valley region on 10 January 2011: a flash flood (described as an 'inland tsunami') which devastatingly took 21 human lives. The collection, amassed by Toowoomba-based journalist Amanda Gearing for her Master of Arts degree, includes 86 oral history recordings of flood survivors and rescuers in Spring Bluff, Murphys Creek, Toowoomba, Withcott, Postmans Ridge, Helidon, Carpendale and Grantham as well as digital photographs and videos taken by a number of those interviewed including those taken by Amanda Gearing and other locals. The interviews are very personal and powerful recollections of the experience of the flood event. Some recall feelings of fear and despair and tell of trauma and loss which continues well after the flood event. All are stories of resilience and hope, of rebuilding lives, of lessons learnt, and recommendations in order to avoid the same devastating results in future disasters.
Abstract:
This research has made contributions to the area of spoken term detection (STD), defined as the process of finding all occurrences of a specified search term in a large collection of speech segments. It proposes the use of visual information, in the form of the speaker's lip movements, in addition to audio, together with the use of the topic of the speech segments and the expected frequency of words in the target speech domain. By using this complementary information, an improvement in STD performance has been achieved, which enables efficient search for key words in large collections of multimedia documents.
Abstract:
Domain-invariant representations are key to addressing the domain shift problem where the training and test examples follow different distributions. Existing techniques that have attempted to match the distributions of the source and target domains typically compare these distributions in the original feature space. This space, however, may not be directly suitable for such a comparison, since some of the features may have been distorted by the domain shift, or may be domain specific. In this paper, we introduce a Domain Invariant Projection approach: an unsupervised domain adaptation method that overcomes this issue by extracting the information that is invariant across the source and target domains. More specifically, we learn a projection of the data to a low-dimensional latent space where the distance between the empirical distributions of the source and target examples is minimized. We demonstrate the effectiveness of our approach on the task of visual object recognition and show that it outperforms state-of-the-art methods on a standard domain adaptation benchmark dataset.
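The idea of scoring a projection by how closely it aligns the source and target distributions can be sketched as follows. This is only an illustration under stated assumptions. The abstract does not name the distance it minimizes, so the sketch swaps in the squared distance between projected sample means (a linear-kernel maximum mean discrepancy) and merely evaluates that objective for two hand-picked candidate projections rather than learning one. All function names are hypothetical.

```python
def project(points, W):
    """Map each d-dimensional point x to k dimensions via y = W^T x.

    Assumption: W is a d x k matrix given as a list of d rows of length k.
    """
    k = len(W[0])
    return [[sum(x[i] * W[i][j] for i in range(len(x))) for j in range(k)]
            for x in points]


def mean_vector(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def linear_mmd2(source, target, W):
    """Squared distance between the projected source and target means."""
    ms = mean_vector(project(source, W))
    mt = mean_vector(project(target, W))
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


# Toy data: feature 0 is shared across domains, feature 1 is shifted by 5.
source = [[0.0, 0.0], [2.0, 0.0]]
target = [[0.0, 5.0], [2.0, 5.0]]

keep_shared = [[1.0], [0.0]]   # projects onto the domain-invariant feature
keep_shifted = [[0.0], [1.0]]  # projects onto the domain-specific feature

score_shared = linear_mmd2(source, target, keep_shared)
score_shifted = linear_mmd2(source, target, keep_shifted)
```

A learning procedure would search over `W` for the lowest score. Here the projection onto the shared feature scores lower than the projection onto the shifted feature, so it would be preferred.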
Abstract:
This study examines the current National Solid Waste Policy, regulated by Decree No. 7.404/10, focusing on the legal mechanisms that guarantee the integration of collectors of recyclable and reusable materials into the shared responsibility for the life cycle of products, a group whose history is marked by labour exploitation and social invisibility. With the aim of analysing the conditions of applicability of the mechanisms in Law No. 12.305/10 directed at social and environmental recognition, as well as at the legal protection of the rights of this social group, we first clarify the basic conceptual aspects needed to understand the issue of social inequities, and verify the importance of using the theory of fundamental human needs as an appropriate instrument for interpreting this form of social exclusion. Furthermore, this work sets out to discuss the main contemporary theoretical currents used in the study of optimising the satisfaction of fundamental human needs, and to theorise, philosophically, that such needs function as a justificatory premise for the attribution of specific rights and institutional obligations. Methodologically, this is a qualitative study in which data were gathered deductively through a literature review involving newspapers, magazines, books, dissertations, theses, projects, laws, decrees and internet searches of institutional websites. The procedural method adopted was descriptive-analytical; in addition, field research was conducted inductively in two recycling cooperatives in the city of Campina Grande-PB.
The studies revealed that the social group under analysis falls within the context of people who require optimisation of the satisfaction of their fundamental needs, and that there is consistent and sustainable theoretical argument to this effect. It was concluded that, despite the commitment expressed in Law No. 12.305/10 to valuing the work of the collectors, an interpretive effort is needed regarding the mechanisms for social inclusion, economic empowerment, and social and environmental recognition of this category. It was also concluded that the strategies for integrating collectors into the shared responsibility for the life cycle of products, created by the solid waste legislation, were designed on the basis of the public authorities' recognition of collectors in selective collection and the inclusion of collectors in reverse logistics, guaranteeing market conditions and access to resources; however, the main challenge appears to be innovation in the very way public policies for the sector are conceived.
Abstract:
The motor system responds to perturbations with reflexes, such as the vestibulo-ocular reflex or stretch reflex, whose gains adapt in response to novel and fixed changes in the environment, such as magnifying spectacles or standing on a tilting platform. Here we demonstrate a reflex response to shifts in the hand's visual location during reaching, which occurs before the onset of voluntary reaction time, and investigate how its magnitude depends on statistical properties of the environment. We examine the change in reflex response to two different distributions of visuomotor discrepancies, both of which have zero mean and equal variance across trials. Critically one distribution is task relevant and the other task irrelevant. The task-relevant discrepancies are maintained to the end of the movement, whereas the task-irrelevant discrepancies are transient such that no discrepancy exists at the end of the movement. The reflex magnitude was assessed using identical probe trials under both distributions. We find opposite directions of adaptation of the reflex response under these two distributions, with increased reflex magnitudes for task-relevant variability and decreased reflex magnitudes for task-irrelevant variability. This demonstrates modulation of reflex magnitudes in the absence of a fixed change in the environment, and shows that reflexes are sensitive to the statistics of tasks with modulation depending on whether the variability is task relevant or task irrelevant.
Abstract:
From its origins in the US electronics sector in the 1970s, technology roadmapping has been adapted (and adopted) widely, for many different innovation, strategy and policy applications. Communication is commonly cited as one of the key benefits of roadmapping, particularly in terms of the process that brings different organizational perspectives together, with the roadmap providing a common visual 'language'. There is significant demand for methods that are agile, in the sense of being rapid, flexible and effective to apply, focused on strategic decisions and actions. 'Fast-start' roadmapping workshop techniques enable key stakeholders to address strategic issues efficiently using the visual structure of roadmaps to capture, discuss, prioritize, explore and communicate. This paper presents the learning from a set of five diverse applications of the fast-start approach in the Basque Country, which demonstrate the agility of the technique.
Abstract:
Among other things, the on-site inspection process is mainly concerned with finding the right design and specifications information needed to inspect each newly constructed segment or element. While inspecting steel erection, for example, inspectors need to locate the right drawings for each member and the corresponding specifications sections that describe, among other things, the allowable deviations in placement. These information-seeking tasks are highly monotonous, time-consuming and often error-prone, due to the high similarity of drawings and constructed elements and the abundance of information involved, which can confuse the inspector. To address this problem, this paper presents the first steps of research that is investigating the requirements of a computer vision-based approach to automatically identify “as-built” information and use it to retrieve “as-designed” project information for field construction, inspection, and maintenance tasks. Under this approach, a visual pattern recognition model was developed that aims to allow automatic identification of construction entities and materials visible in the camera’s field of view at a given time and location, and automatic retrieval of relevant design and specifications information.