Digital Library

902 results for Visual Performance

Acoustic adaptation in cross database audio visual SHMM training for phonetic spoken term detection

Abstract:

Visual information in the form of lip movements of the speaker has been shown to improve the performance of speech recognition and search applications. In our previous work, we proposed cross database training of synchronous hidden Markov models (SHMMs) to make use of external large and publicly available audio databases in addition to the relatively small given audio visual database. In this work, the cross database training approach is improved by performing an additional audio adaptation step, which enables audio visual SHMMs to benefit from audio observations of the external audio models before adding visual modality to them. The proposed approach outperforms the baseline cross database training approach in clean and noisy environments in terms of phone recognition accuracy as well as spoken term detection (STD) accuracy.

Incorporating visual information for spoken term detection

Abstract:

Spoken term detection (STD) is the task of looking up a spoken term in a large volume of speech segments. To provide fast search, speech segments are first indexed into an intermediate representation using speech recognition engines, which provide multiple hypotheses for each speech segment. Approximate matching techniques are usually applied at the search stage to compensate for the poor performance of automatic speech recognition engines during indexing. Recently, using visual information in addition to audio information has been shown to improve phone recognition performance, particularly in noisy environments. In this paper, we make use of visual information in the form of lip movements of the speaker in the indexing stage and investigate its effect on STD performance. In particular, we investigate whether gains in phone recognition accuracy carry through the approximate matching stage to provide similar gains in the final audio-visual STD system over a traditional audio-only approach. We also investigate the effect of using visual information on STD performance in different noise environments.
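
The approximate matching step described above can be illustrated with a small sketch. The abstract does not say which matching technique the authors use, so the example below assumes a plain Levenshtein (edit) distance over phone symbols and a sliding window the same length as the query; the function names, the phone labels and the `max_distance` parameter are all hypothetical.

```python
def edit_distance(query, segment):
    """Levenshtein distance between two phone sequences (lists of symbols)."""
    prev = list(range(len(segment) + 1))
    for i, q in enumerate(query, start=1):
        curr = [i]
        for j, s in enumerate(segment, start=1):
            cost = 0 if q == s else 1
            curr.append(min(prev[j] + 1,          # delete a query phone
                            curr[j - 1] + 1,      # insert a segment phone
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def search_term(query, indexed_phones, max_distance=1):
    """Return (start, distance) for every window close enough to the query.

    Assumption: windows have the same length as the query, which keeps the
    sketch short; a real system would also search lattices of hypotheses.
    """
    hits = []
    n = len(query)
    for start in range(len(indexed_phones) - n + 1):
        d = edit_distance(query, indexed_phones[start:start + n])
        if d <= max_distance:
            hits.append((start, d))
    return hits


# Hypothetical index containing one misrecognized and one near-miss occurrence.
index = ["dh", "ax", "k", "ah", "t", "s", "ae", "t"]
hits = search_term(["k", "ae", "t"], index, max_distance=1)
```

Raising `max_distance` tolerates more recognition errors at the cost of more false alarms, which is the trade-off that better phone recognition (for example from added visual information) would be expected to ease.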

Comparison of visual inspection and structural-health monitoring as bridge condition assessment methods

Abstract:

This paper presents the results of a research project aimed at examining the capabilities and challenges of two distinct but not mutually exclusive approaches to in-service bridge assessment: visual inspection and installed monitoring systems. In this study, the intended functionality of both approaches was evaluated on its ability to identify potential structural damage and to provide decision-making support. Inspection and monitoring are compared in terms of their functional performance, cost, and barriers (real and perceived) to implementation. Both methods have strengths and weaknesses across the metrics analyzed, and it is likely that a hybrid evaluation technique that adopts both approaches will optimize efficiency of condition assessment and ultimately lead to better decision making.

A performance that requires two people

Abstract:

This work was a performance piece that took place at West Space as part of the 'Concerted Efforts' exhibition. For three hours, Antoinette J. Citizen and Courtney Coombs listed activities that require two people. The resulting list then remained in the gallery as an installed object. The work explores the role of collaboration in art practice as well as in society more broadly.

Functional activation during the Rapid Visual Information Processing task in a middle aged cohort: An fMRI study

Abstract:

The Rapid Visual Information Processing (RVIP) task, a serial discrimination task in which task performance is believed to reflect sustained attention capabilities, is widely used in behavioural research and increasingly in neuroimaging studies. To date, functional neuroimaging research into the RVIP has been undertaken using block analyses, reflecting the sustained processing involved in the task, but not necessarily the transient processes associated with individual trial performance. Furthermore, this research has been limited to young cohorts. This study assessed the behavioural and functional magnetic resonance imaging (fMRI) outcomes of the RVIP task using both block and event-related analyses in a healthy middle-aged cohort (mean age = 53.56 years, n = 16). The results show that the version of the RVIP used here is sensitive to changes in attentional demand processes, with participants achieving a 43% accuracy hit rate in the experimental task compared with 96% accuracy in the control task. As shown by previous research, the block analysis revealed an increase in activation in a network of frontal, parietal, occipital and cerebellar regions. The event-related analysis showed a similar network of activation, seemingly omitting regions involved in the processing of the task (as shown in the block analysis), such as occipital areas and the thalamus, providing an indication of a network of regions involved in correct trial performance. Frontal (superior and inferior frontal gyri), parietal (precuneus, inferior parietal lobe) and cerebellar regions were shown to be active in both the block and event-related analyses, suggesting their importance in sustained attention/vigilance. These networks and the differences between them are discussed in detail, as well as implications for future research in middle-aged cohorts.

On Visual Detection of Highly-occluded Objects for Harvesting Automation in Horticulture

Abstract:

Developing accurate and reliable crop detection algorithms is an important step for harvesting automation in horticulture. This paper presents a novel approach to visual detection of highly-occluded fruits. We use a conditional random field (CRF) on multi-spectral image data (colour and Near-Infrared Reflectance, NIR) to model two classes: crop and background. To describe these two classes, we explore a range of visual-texture features including local binary patterns, histograms of oriented gradients, and learned auto-encoder features. The proposed methods are evaluated using hand-labelled images from a dataset captured on a commercial capsicum farm. Experimental results are presented, and performance is evaluated in terms of the Area Under the Curve (AUC) of the precision-recall curves. Our current results achieve a maximum performance of 0.81 AUC when combining all of the texture features in conjunction with colour information.
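
For readers unfamiliar with the evaluation metric, the sketch below shows one common way to compute the area under a precision-recall curve from per-pixel (or per-region) scores and binary labels. It is an illustration under stated assumptions rather than the authors' evaluation code: the trapezoidal integration, the choice to start the curve at recall 0 with the first observed precision, and the function names are all assumptions, and tied scores are not grouped.

```python
def precision_recall_points(scores, labels):
    """Sweep the decision threshold from high to low.

    Returns a list of (recall, precision) pairs, one per ranked sample.
    labels are 1 for crop and 0 for background; at least one 1 is required.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def pr_auc(points):
    """Trapezoidal area under the precision-recall curve."""
    area = 0.0
    prev_recall, prev_precision = 0.0, points[0][1]
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2.0
        prev_recall, prev_precision = recall, precision
    return area


# Toy data: the two positives are ranked first and third.
auc = pr_auc(precision_recall_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

A detector that ranks every crop sample above every background sample scores 1.0 under this definition; other interpolation conventions give slightly different values, so reported AUC figures are only comparable when the same convention is used.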

A visual profile of Queensland Indigenous children

Abstract:

Purpose: Little is known about the prevalence of refractive error, binocular vision, and other visual conditions in Australian Indigenous children. This is important given the association of these visual conditions with reduced reading performance in the wider population, which may also contribute to the suboptimal reading performance reported in this population. The aim of this study was to develop a visual profile of Queensland Indigenous children. Methods: Vision testing was performed on 595 primary schoolchildren in Queensland, Australia. Vision parameters measured included visual acuity, refractive error, color vision, nearpoint of convergence, horizontal heterophoria, fusional vergence range, accommodative facility, AC/A ratio, visual motor integration, and rapid automatized naming. Near heterophoria, nearpoint of convergence, and near fusional vergence range were used to classify convergence insufficiency (CI). Results: Although refractive error (Indigenous, 10%; non-Indigenous, 16%; p = 0.04) and strabismus (Indigenous, 0%; non-Indigenous, 3%; p = 0.03) were significantly less common in Indigenous children, CI was twice as prevalent (Indigenous, 10%; non-Indigenous, 5%; p = 0.04). Reduced visual information processing skills were more common in Indigenous children (reduced visual motor integration [Indigenous, 28%; non-Indigenous, 16%; p < 0.01] and slower rapid automatized naming [Indigenous, 67%; non-Indigenous, 59%; p = 0.04]). The prevalence of visual impairment (reduced visual acuity) and color vision deficiency was similar between groups. Conclusions: Indigenous children have less refractive error and strabismus than their non-Indigenous peers. However, CI and reduced visual information processing skills were more common in this group. Given that vision screenings primarily target visual acuity assessment and strabismus detection, this is an important finding as many Indigenous children with CI and reduced visual information processing may be missed. Emphasis should be placed on identifying children with CI and reduced visual information processing given the potential effect of these conditions on school performance.

Repeatable Condition-Invariant Visual Odometry for Sequence-Based Place Recognition

Abstract:

This paper describes a vision-only system for place recognition in environments that are traversed at different times of day, when changing conditions drastically affect visual appearance, and at different speeds, where places aren’t visited at a consistent linear rate. The major contribution is the removal of wheel-based odometry from the previously presented algorithm (SMART), allowing the technique to operate on any camera-based device; in our case a mobile phone. While we show that the direct application of visual odometry to our night-time datasets does not achieve a level of performance typically needed, the VO requirements of SMART are orthogonal to typical usage: firstly, only the magnitude of the velocity is required, and secondly, the calculated velocity signal only needs to be repeatable in any one part of the environment over day and night cycles, but not necessarily globally consistent. Our results show that the smoothing effect of motion constraints is highly beneficial for achieving a locally consistent, lighting-independent velocity estimate. We also show that the advantage of our patch-based technique used previously for frame recognition, surprisingly, does not transfer to VO, where SIFT demonstrates equally good performance. Nevertheless, we present the SMART system using only vision, which performs sequence-based place recognition in extreme low-light conditions where standard 6-DOF VO fails and that improves place recognition performance over odometry-less benchmarks, approaching that of wheel odometry.

Visual completion in an illusory figure

Abstract:

The visual systems of humans and animals represent physical reality in a modified way, depending on the specific demands that the species in question has for survival. The ability to perceive visual illusions is found in independently evolved visual systems, from honeybees to humans. In humans, the ability emerges early, at the age of four months. Thus the perception of illusion is likely to reflect visual processes of fundamental importance for object perception in natural vision. The experiments reported in this thesis employed various modifications of the Kanizsa triangle, a drawn configuration composed of three black disks with missing sectors on a white background. The sectors appear to form the tips of a triangle. The visual system completes the physically empty area between the disks, generally called inducers, giving rise to the perception of an illusory triangle. The illusory triangle consists of an illusory surface bounded by illusory contours; the triangle appears brighter than the background and appears to lie above it. If the sectors are coloured, the colour fills the illusory area, a phenomenon known as neon colour spreading. We investigated spatial limitations on the perception of Kanizsa-type illusions and how other stimuli and viewing parameters affected these limitations. We also studied complex configurations (thick, bent, mobile and chromatic inducers) to determine whether illusions combining several attributes can be perceived. The results suggest that the visual system is highly effective in completing a percept. The perception of an illusory figure is spatially scale invariant when perceived at threshold. The processing time and the number of fixations modify the percept, making the perception of the illusion more probable in various viewing conditions. Furthermore, the fact that the illusion can be perceived when only one inducer is physically present at any given moment indicates the potential of single inducers.
Apparently, modelling illusory figure perception will require a combination of low-level, local processes and higher-level integrative processes. Our studies with stimuli combining several attributes relevant to object perception demonstrate that the perception of an illusory figure is flexible and is maintained even when it contains colour and volume and when it is shown in movement. All in all, the results confirm the assumed importance of the visual processes related to the perception of illusory figures in everyday viewing. This is indicated by the variety of inducer modifications that can be made without destroying the percept. Furthermore, the illusion can acquire additional attributes from such modifications. Due to individual differences in the perception of illusory figures, universal values for absolute performance are not always meaningful, but stable trends and general relations do exist.

Visual search and eye movements: Studies of perceptual span

Abstract:

In visual search one tries to find the currently relevant item among other, irrelevant items. In the present study, visual search performance for complex objects (characters, faces, computer icons and words) was investigated, along with the contribution of different stimulus properties, such as luminance contrast between characters and background, set size, stimulus size, colour contrast, spatial frequency, and stimulus layout. Subjects were required to search for a target object among distracter objects in two-dimensional stimulus arrays. The outcome measure was threshold search time, that is, the presentation duration of the stimulus array required by the subject to find the target with a certain probability. It reflects the time used for visual processing separated from the time used for decision making and manual reactions. The duration of stimulus presentation was controlled by an adaptive staircase method. The number and duration of eye fixations, saccade amplitude, and perceptual span, i.e., the number of items that can be processed during a single fixation, were measured. It was found that search performance was correlated with the number of fixations needed to find the target. Search time and the number of fixations increased with increasing stimulus set size. On the other hand, several complex objects could be processed during a single fixation, i.e., within the perceptual span. Search time and the number of fixations depended on object type as well as luminance contrast. The size of the perceptual span was smaller for more complex objects, and decreased with decreasing luminance contrast within object type, especially for very low contrasts. In addition, the size and shape of perceptual span explained the changes in search performance for different stimulus layouts in word search.
Perceptual span was scale invariant for a 16-fold range of stimulus sizes, i.e., the number of items processed during a single fixation was independent of retinal stimulus size or viewing distance. It is suggested that saccadic visual search consists of both serial (eye movements) and parallel (processing within perceptual span) components, and that the size of the perceptual span may explain the effectiveness of saccadic search in different stimulus conditions. Further, low-level visual factors, such as the anatomical structure of the retina, peripheral stimulus visibility and resolution requirements for the identification of different object types are proposed to constrain the size of the perceptual span, and thus, limit visual search performance. Similar methods were used in a clinical study to characterise the visual search performance and eye movements of neurological patients with chronic solvent-induced encephalopathy (CSE). In addition, the data about the effects of different stimulus properties on visual search in normal subjects were presented as simple practical guidelines, so that the limits of human visual perception could be taken into account in the design of user interfaces.
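
The abstract states that presentation duration was controlled by an adaptive staircase but does not give the rule. As a rough illustration only, the sketch below assumes a two-down/one-up rule with multiplicative steps, which for a probabilistic observer converges near the 70.7%-correct point; the function name, step size, starting duration and stopping criteria are all hypothetical.

```python
def run_staircase(respond, start_ms=1000.0, step=0.8,
                  max_reversals=8, max_trials=500):
    """Two-down/one-up staircase on stimulus presentation duration.

    respond(duration_ms) returns True when the target was found.
    The threshold estimate is the mean duration at the reversal points.
    """
    duration = start_ms
    correct_streak = 0
    last_direction = 0          # -1 = shortening, +1 = lengthening
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= max_reversals:
            break
        if respond(duration):
            correct_streak += 1
            if correct_streak < 2:
                continue        # two hits in a row are needed to shorten
            correct_streak = 0
            direction = -1
        else:
            correct_streak = 0
            direction = 1
        if last_direction and direction != last_direction:
            reversals.append(duration)
        last_direction = direction
        duration = duration * step if direction < 0 else duration / step
    if not reversals:
        raise ValueError("no reversals observed; threshold outside tested range")
    return sum(reversals) / len(reversals)


# Idealized observer who always finds the target given at least 300 ms.
estimate = run_staircase(lambda duration_ms: duration_ms >= 300.0)
```

With a real observer, `respond` would present the stimulus array for the given duration and record whether the target was reported correctly; the estimate then approximates the threshold search time used as the outcome measure.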

Dynamics of contour, object and face processing in the human visual cortex

Abstract:

The neural basis of visual perception can be understood only when the sequence of cortical activity underlying successful recognition is known. The early steps in this processing chain, from retina to the primary visual cortex, are highly local, and the perception of more complex shapes requires integration of the local information. In Study I of this thesis, the progression from local to global visual analysis was assessed by recording cortical magnetoencephalographic (MEG) responses to arrays of elements that either did or did not form global contours. The results demonstrated two spatially and temporally distinct stages of processing: the first, emerging 70 ms after stimulus onset around the calcarine sulcus, was sensitive to local features only, whereas the second, starting at 130 ms across the occipital and posterior parietal cortices, reflected the global configuration. To explore the links between cortical activity and visual recognition, Studies II and III presented subjects with recognition tasks of varying levels of difficulty. The occipito-temporal responses from 150 ms onwards were closely linked to recognition performance, in contrast to the 100-ms mid-occipital responses. The averaged responses increased gradually as a function of recognition performance, and further analysis (Study III) showed the single response strengths to be graded as well. Study IV addressed the attention dependence of the different processing stages: occipito-temporal responses peaking around 150 ms depended on the content of the visual field (faces vs. houses), whereas the later and more sustained activity was strongly modulated by the observers' attention. Hemodynamic responses paralleled the pattern of the more sustained electrophysiological responses. Study V assessed the temporal processing capacity of the human object recognition system. Above sufficient luminance, contrast and size of the object, the processing speed was not limited by such low-level factors.
Taken together, these studies demonstrate several distinct stages in the cortical activation sequence underlying the object recognition chain, reflecting the level of feature integration, difficulty of recognition, and direction of attention.

Developmental, functional brain imaging and electrophysiological evidence of visual and auditory working memory

Abstract:

Intact function of working memory (WM) is essential for children and adults to cope with everyday life. Children with deficits in WM mechanisms have learning difficulties that are often accompanied by behavioral problems. The neural processes subserving WM, and brain structures underlying this system, continue to develop during childhood until adolescence and young adulthood. With functional magnetic resonance imaging (fMRI) it is possible to investigate the organization and development of WM. The present thesis aimed to investigate, using behavioral and neuroimaging methods, whether mnemonic processing of spatial and nonspatial visual information is segregated in the developing and mature human brain. A further aim in this research was to investigate the organization and development of audiospatial and visuospatial information processing in WM. The behavioral results showed that spatial and nonspatial visual WM processing is segregated in the adult brain. The fMRI results in children suggested that memory load related processing of spatial and nonspatial visual information engages common cortical networks, whereas selective attention to either type of stimuli recruits partially segregated areas in the frontal, parietal and occipital cortices. Deactivation mechanisms that are important in the performance of WM tasks in adults are already operational in healthy school-aged children. Electrophysiological evidence suggested segregated mnemonic processing of visual and auditory location information. The results on the development of audiospatial and visuospatial WM demonstrate that WM performance improves with age, suggesting functional maturation of underlying cognitive processes and brain areas. The development of the performance of spatial WM tasks follows a different time course in boys and girls, indicating a larger degree of immaturity in the male than in the female WM systems.
Furthermore, the differences in mastering auditory and visual WM tasks may indicate that visual WM reaches functional maturity earlier than the corresponding auditory system. Spatial WM deficits may underlie some learning difficulties and behavioral problems related to impulsivity, difficulties in concentration, and hyperactivity. Alternatively, anxiety or depressive symptoms may affect WM function and the ability to concentrate, being thus the primary cause of poor academic achievement in children.

MPC controlled multirotor with suspended slung load: System architecture and visual load detection

Abstract:

There is an increased interest in the use of Unmanned Aerial Vehicles (UAVs) for load transportation, from environmental remote sensing to construction and parcel delivery. One of the main challenges is accurate control of the load position and trajectory. This paper presents an assessment of real flight trials for the control of an autonomous multi-rotor with a suspended slung load using only visual feedback to determine the load position. This method uses an onboard camera to take advantage of a common visual marker detection algorithm to robustly detect the load location. The load position is calculated using an onboard processor, and transmitted over a wireless network to a ground station integrating MATLAB/SIMULINK and Robot Operating System (ROS) and a Model Predictive Controller (MPC) to control both the load and the UAV. To evaluate the system performance, the position of the load determined by the visual detection system in real flight is compared with data received by a motion tracking system. The multi-rotor position tracking performance is also analyzed by conducting flight trials using perfect load position data and data obtained only from the visual system. Results show very accurate estimation of the load position (~5% offset) using only the visual system and demonstrate that an external motion tracking system is not needed for this task.

Spectral Tuning and Adaptation to Different Light Environments of Mysid Visual Pigments

Abstract:

In the present thesis, questions of spectral tuning, the relation of spectral and thermal properties of visual pigments, and evolutionary adaptation to different light environments were addressed using a group of small crustaceans of the genus Mysis as a model. The study was based on microspectrophotometric measurements of visual pigment absorbance spectra, electrophysiological measurements of spectral sensitivities of dark-adapted eyes, and sequencing of the opsin gene retrieved through PCR. The spectral properties were related to the spectral transmission of the respective light environments, as well as to the phylogenetic histories of the species. The photoactivation energy (Ea) was estimated from temperature effects on spectral sensitivity in the long-wavelength range, and calculations were made for optimal quantum catch and optimal signal-to-noise ratio in the different light environments. The opsin amino acid sequences of spectrally characterized individuals were compared to find candidate residues for spectral tuning. The general purpose was to clarify to what extent and on what time scale adaptive evolution has driven the functional properties of (mysid) visual pigments towards optimal performance in different light environments. An ultimate goal was to find the molecular mechanisms underlying the spectral tuning and to understand the balance between evolutionary adaptation and molecular constraints. The totally consistent segregation of absorption maxima (λmax) into (shorter-wavelength) marine and (longer-wavelength) freshwater populations suggests that truly adaptive evolution is involved in tuning the visual pigment for optimal performance, driven by selection for high absolute visual sensitivity. On the other hand, the similarity in λmax and opsin sequence between several populations of freshwater M. relicta in spectrally different lakes highlights the limits to adaptation set by evolutionary history and time.
A strong inverse correlation between Ea and λmax was found among all visual pigments studied in these respects, including those of M. relicta and 10 species of vertebrate pigments, and this was used to infer thermal noise. The conceptual signal-to-noise ratios thus calculated for pigments with different λmax in the Baltic Sea and Lake Pääjärvi light environments supported the notion that spectral adaptation works towards maximizing the signal-to-noise ratio rather than quantum catch as such. Judged by the shape of absorbance spectra, the visual pigments of all populations of M. relicta and M. salemaai used exclusively the A2 chromophore (3,4-dehydroretinal). A comparison of amino acid substitutions between M. relicta and M. salemaai indicated that mysid shrimps have a small number of readily available tuning sites to shift between a shorter- and a longer-wavelength opsin. However, phylogenetic history seems to have prevented marine M. relicta from converting back to the (presumably) ancestral opsin form, and thus the more recent reinvention of marine spectral sensitivity has been accomplished by some other novel mechanism, yet to be found.

Similarity relations in visual search predict rapid visual categorization

Abstract:

How do we perform rapid visual categorization? It is widely thought that categorization involves evaluating the similarity of an object to other category items, but the underlying features and similarity relations remain unknown. Here, we hypothesized that categorization performance is based on perceived similarity relations between items within and outside the category. To this end, we measured the categorization performance of human subjects on three diverse visual categories (animals, vehicles, and tools) and across three hierarchical levels (superordinate, basic, and subordinate levels among animals). For the same subjects, we measured their perceived pair-wise similarities between objects using a visual search task. Regardless of category and hierarchical level, we found that the time taken to categorize an object could be predicted using its similarity to members within and outside its category. We were able to account for several classic categorization phenomena, such as (a) the longer times required to reject category membership; (b) the longer times to categorize atypical objects; and (c) differences in performance across tasks and across hierarchical levels. These categorization times were also accounted for by a model that extracts coarse structure from an image. The striking agreement observed between categorization and visual search suggests that these two disparate tasks depend on a shared coarse object representation.
