841 results for visual object detection
Abstract:
We have developed a Hierarchical Look-Ahead Trajectory Model (HiLAM) that incorporates the firing pattern of medial entorhinal grid cells in a planning circuit that includes interactions with hippocampus and prefrontal cortex. We show the model’s flexibility in representing large real world environments using odometry information obtained from challenging video sequences. We acquire the visual data from a camera mounted on a small tele-operated vehicle. The camera has a panoramic field of view with its focal point approximately 5 cm above the ground level, similar to what would be expected from a rat’s point of view. Using established algorithms for calculating perceptual speed from the apparent rate of visual change over time, we generate raw dead reckoning information which loses spatial fidelity over time due to error accumulation. We rectify the loss of fidelity by exploiting the loop-closure detection ability of a biologically inspired, robot navigation model termed RatSLAM. The rectified motion information serves as a velocity input to the HiLAM to encode the environment in the form of grid cell and place cell maps. Finally, we show goal directed path planning results of HiLAM in two different environments, an indoor square maze used in rodent experiments and an outdoor arena more than two orders of magnitude larger than the indoor maze. Together these results bridge for the first time the gap between higher fidelity bio-inspired navigation models (HiLAM) and more abstracted but highly functional bio-inspired robotic mapping systems (RatSLAM), and move from simulated environments into real-world studies in rodent-sized arenas and beyond.
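To illustrate why raw dead reckoning loses spatial fidelity, the minimal sketch below integrates speed and turn-rate samples into a pose estimate. It is not HiLAM or RatSLAM code: the function name, the time step and the small constant turn-rate bias are all assumptions chosen only to show how a tiny per-step error grows into a large position error when nothing like loop closure corrects it.

```python
import math

def dead_reckon(samples, dt=1.0, start=(0.0, 0.0, 0.0)):
    """Integrate (speed, turn_rate) samples into a list of (x, y, heading) poses."""
    x, y, heading = start
    path = [(x, y, heading)]
    for speed, turn_rate in samples:
        heading += turn_rate * dt
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        path.append((x, y, heading))
    return path

# True motion: a straight line at 1 unit per step.
true_samples = [(1.0, 0.0)] * 100
# Hypothetical measurement with a small constant turn-rate bias (assumption).
biased_samples = [(1.0, 0.002)] * 100

true_path = dead_reckon(true_samples)
est_path = dead_reckon(biased_samples)

def position_error(i):
    """Distance between the true and the estimated position after i steps."""
    tx, ty, _ = true_path[i]
    ex, ey, _ = est_path[i]
    return math.hypot(ex - tx, ey - ty)

for i in (10, 50, 100):
    print(i, round(position_error(i), 3))
```

The printed error grows with the number of steps, which is the accumulation effect the abstract describes.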
Abstract:
The problem of clustering a large document collection is challenged not only by the number of documents and the number of dimensions, but also by the number and sizes of the clusters. Traditional clustering methods fail to scale when they need to generate a large number of clusters. Furthermore, when the cluster sizes in the solution are heterogeneous, i.e. some of the clusters are large, the similarity measures tend to degrade. A ranking-based clustering method is proposed to deal with these issues in the context of the Social Event Detection task. Ranking scores are used to select a small number of the most relevant clusters against which a document is compared and placed. Additionally, instead of conventional cluster centroids, cluster patches, which are hub-like sets of documents, are proposed to represent clusters. Text, temporal, spatial and visual content information collected from the social event images is utilized in calculating similarity. Results show that these strategies achieve a balance between the performance and the accuracy of the clustering solution produced by the method.
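The two-stage idea (rank clusters cheaply, then compare the document only against the top few) can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: documents are plain token sets, the ranking score is vocabulary overlap, similarity is Jaccard, and a "patch" is simply the first few members of a cluster rather than a true hub-like set. All names and parameter values are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of tokens."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_clusters(doc, clusters, k):
    """Return the ids of up to k clusters whose vocabulary overlaps doc the most."""
    scored = []
    for cid, cluster in clusters.items():
        score = len(doc & cluster["vocab"])
        scored.append((score, cid))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [cid for score, cid in scored[:k] if score > 0]

def assign(doc, clusters, k=2, threshold=0.3, patch_size=3):
    """Place doc in the best of the top-k ranked clusters, or open a new one."""
    best_cid, best_sim = None, 0.0
    for cid in rank_clusters(doc, clusters, k):
        patch = clusters[cid]["patch"]
        # Mean similarity to the patch documents stands in for a centroid.
        sim = sum(jaccard(doc, member) for member in patch) / len(patch)
        if sim > best_sim:
            best_cid, best_sim = cid, sim
    if best_cid is None or best_sim < threshold:
        best_cid = len(clusters)
        clusters[best_cid] = {"vocab": set(), "patch": []}
    cluster = clusters[best_cid]
    cluster["vocab"] |= doc
    if len(cluster["patch"]) < patch_size:
        cluster["patch"].append(doc)
    return best_cid

clusters = {}
docs = [
    {"concert", "berlin", "rock"},
    {"concert", "berlin", "band"},
    {"marathon", "london", "run"},
    {"marathon", "london", "race"},
]
labels = [assign(d, clusters) for d in docs]
print(labels)
```

Because only the top-k clusters are compared in full, the cost per document does not grow with the total number of clusters as quickly as an exhaustive comparison would.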
Abstract:
This paper deals with constrained image-based visual servoing of circular and conical spiral motion about an unknown object approximating a single image point feature. Effective visual control of such trajectories has many applications for small unmanned aerial vehicles, including surveillance and inspection, forced landing (homing), and collision avoidance. A spherical camera model is used to derive a novel visual-predictive controller (VPC) using stability-based design methods for general nonlinear model-predictive control. In particular, a quasi-infinite horizon visual-predictive control scheme is derived. A terminal region, which is used as a constraint in the controller structure, can be used to guide appropriate reference image features for spiral tracking with respect to nominal stability and feasibility. Robustness properties are also discussed with respect to parameter uncertainty and additive noise. A comparison with competing visual-predictive control schemes is made, and some experimental results using a small quad rotor platform are given.
Abstract:
This paper outlines the approach taken by the Speech, Audio, Image and Video Technologies laboratory and the Applied Data Mining Research Group (SAIVT-ADMRG) in the 2014 MediaEval Social Event Detection (SED) task. We participated in the event-based clustering subtask (subtask 1), and focused on investigating the incorporation of image features as another source of data to aid clustering. In particular, we developed a descriptor based on super-pixel segmentation that allows a low-dimensional feature incorporating both colour and texture information to be extracted and used within the popular bag-of-visual-words (BoVW) approach.
Green-fluorescent protein facilitates rapid in vivo detection of genetically transformed plant cells
Abstract:
Early detection of plant transformation events is necessary for the rapid establishment and optimization of plant transformation protocols. We have assessed modified versions of the green fluorescent protein (GFP) from Aequorea victoria as early reporters of plant transformation using a dissecting fluorescence microscope with appropriate filters. Gfp-expressing cells from four different plant species (sugarcane, maize, lettuce, and tobacco) were readily distinguished, following either Agrobacterium-mediated or particle bombardment-mediated transformation. The identification of gfp-expressing sugarcane cells allowed for the elimination of a high proportion of non-expressing explants and also enabled visual selection of dividing transgenic cells, an early step in the generation of transgenic organisms. The recovery of transgenic cell clusters was streamlined by the ability to visualize gfp-expressing tissues in vitro.
Abstract:
Our aim was to make a quantitative comparison of the response of the different visual cortical areas to selective stimulation of the two different cone-opponent pathways [long- and medium-wavelength (L/M)- and short-wavelength (S)-cone-opponent] and the achromatic pathway under equivalent conditions. The appropriate stimulus-contrast metric for the comparison of colour and achromatic sensitivity is unknown, however, and so a secondary aim was to investigate whether equivalent fMRI responses of each cortical area are predicted by stimulus contrast matched in multiples of detection threshold that approximately equates for visibility, or direct (cone) contrast matches in which psychophysical sensitivity is uncorrected. We found that the fMRI response across the two colour and achromatic pathways is not well predicted by threshold-scaled stimuli (perceptual visibility) but is better predicted by cone contrast, particularly for area V1. Our results show that the early visual areas (V1, V2, V3, VP and hV4) all have robust responses to colour. No area showed an overall colour preference, however, until anterior to V4 where we found a ventral occipital region that has a significant preference for chromatic stimuli, indicating a functional distinction from earlier areas. We found that all of these areas have a surprisingly strong response to S-cone stimuli, at least as great as the L/M response, suggesting a relative enhancement of the S-cone cortical signal. We also identified two areas (V3A and hMT+) with a significant preference for achromatic over chromatic stimuli, indicating a functional grouping into a dorsal pathway with a strong magnocellular input.
Abstract:
Informed by Kristeva's formulation of affect and Winnicott's Holding Environment, this practice-led visual art project is an exploration into how sensitivity to the physical sensation of trembling can sustain a creative practice. Building upon this is a further enquiry into what the significance of the affective experience of trembling is for an ethics of affect in contemporary art. I have done this through object and video-based installations informed by my own experience of trembling. This has been further informed by the work of artists like Louise Bourgeois, Dennis Del Favero and Willie Doherty. The creative outcomes contribute to the discourse around ethical responses to affect by extending and developing on the works of these artists.
Abstract:
Even though crashes between trains and road users are rare events at railway level crossings, they are one of the major safety concerns for the Australian railway industry. Near-miss events at level crossings occur more frequently, and can provide more information about factors leading to level crossing incidents. In this paper we introduce a video analytic approach for automatically detecting and localizing vehicles from cameras mounted on trains for detecting near-miss events. To detect and localize vehicles at level crossings we extract patches from an image and classify each patch for detecting vehicles. We developed a region proposal algorithm for generating patches, and we use a Convolutional Neural Network (CNN) for classifying each patch. To localize vehicles in images we combine the patches that are classified as vehicles according to their CNN scores and positions. We compared our system with the Deformable Part Models (DPM) and Regions with CNN features (R-CNN) object detectors. Experimental results on a railway dataset show that the recall rate of our proposed system is 29% higher than what can be achieved with DPM or R-CNN detectors.
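The abstract does not spell out how positively classified patches are combined, so the sketch below shows only one plausible scheme: keep patches above a score threshold, greedily group those that overlap the highest-scoring patch of a group, and average each group's coordinates weighted by score. The function names, thresholds and example boxes are assumptions for illustration and should not be read as the paper's algorithm.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def merge_patches(patches, score_min=0.5, iou_min=0.3):
    """Greedily group overlapping positive patches and average them by score.

    patches: list of (box, score) with box = (x1, y1, x2, y2).
    Returns one score-weighted box per group.
    """
    positives = sorted((p for p in patches if p[1] >= score_min),
                       key=lambda p: p[1], reverse=True)
    groups = []
    for box, score in positives:
        for group in groups:
            # Compare against the highest-scoring patch that seeded the group.
            if iou(box, group[0][0]) >= iou_min:
                group.append((box, score))
                break
        else:
            groups.append([(box, score)])
    merged = []
    for group in groups:
        total = sum(score for _, score in group)
        merged.append(tuple(
            sum(box[i] * score for box, score in group) / total
            for i in range(4)
        ))
    return merged

# Hypothetical patches with made-up CNN scores.
patches = [
    ((10, 10, 50, 50), 0.9),
    ((14, 12, 54, 52), 0.6),
    ((200, 80, 240, 120), 0.8),
    ((300, 300, 340, 340), 0.2),  # rejected: below score_min
]
detections = merge_patches(patches)
print(len(detections))
```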
Abstract:
This research has made contributions to the area of spoken term detection (STD), defined as the process of finding all occurrences of a specified search term in a large collection of speech segments. The use of visual information in the form of the speaker's lip movements in addition to audio, together with the topic of the speech segments and the expected frequency of words in the target speech domain, is proposed. By using this complementary information, an improvement in STD performance has been achieved, which enables efficient search for key words in large collections of multimedia documents.
Abstract:
Purpose: Little is known about the prevalence of refractive error, binocular vision, and other visual conditions in Australian Indigenous children. This is important given the association of these visual conditions with reduced reading performance in the wider population, which may also contribute to the suboptimal reading performance reported in this population. The aim of this study was to develop a visual profile of Queensland Indigenous children. Methods: Vision testing was performed on 595 primary schoolchildren in Queensland, Australia. Vision parameters measured included visual acuity, refractive error, color vision, nearpoint of convergence, horizontal heterophoria, fusional vergence range, accommodative facility, AC/A ratio, visual motor integration, and rapid automatized naming. Near heterophoria, nearpoint of convergence, and near fusional vergence range were used to classify convergence insufficiency (CI). Results: Although refractive error (Indigenous, 10%; non-Indigenous, 16%; p = 0.04) and strabismus (Indigenous, 0%; non-Indigenous, 3%; p = 0.03) were significantly less common in Indigenous children, CI was twice as prevalent (Indigenous, 10%; non-Indigenous, 5%; p = 0.04). Reduced visual information processing skills were more common in Indigenous children (reduced visual motor integration [Indigenous, 28%; non-Indigenous, 16%; p < 0.01] and slower rapid automatized naming [Indigenous, 67%; non-Indigenous, 59%; p = 0.04]). The prevalence of visual impairment (reduced visual acuity) and color vision deficiency was similar between groups. Conclusions: Indigenous children have less refractive error and strabismus than their non-Indigenous peers. However, CI and reduced visual information processing skills were more common in this group. Given that vision screenings primarily target visual acuity assessment and strabismus detection, this is an important finding as many Indigenous children with CI and reduced visual information processing may be missed.
Emphasis should be placed on identifying children with CI and reduced visual information processing given the potential effect of these conditions on school performance.
Abstract:
The visual systems of humans and animals represent physical reality in a modified way, depending on the specific demands that the species in question has for survival. The ability to perceive visual illusions is found in independently evolved visual systems, from honeybees to humans. In humans, the ability emerges early, at the age of four months. Thus the perception of illusion is likely to reflect visual processes of fundamental importance for object perception in natural vision. The experiments reported in this thesis employed various modifications of the Kanizsa triangle, a drawn configuration composed of three black disks with missing sectors on a white background. The sectors appear to form the tips of a triangle. The visual system completes the physically empty area between the disks, generally called inducers, giving the perception of an illusory triangle. The illusory triangle consists of an illusory surface bounded by illusory contours; the triangle appears brighter than the background and appears to lie above it. If the sectors are coloured, the colour fills the illusory area, a phenomenon known as neon colour spreading. We investigated spatial limitations on the perception of Kanizsa-type illusions and how other stimuli and viewing parameters affected these limitations. We also studied complex configurations (thick, bent, mobile and chromatic inducers) to determine whether illusions combining several attributes can be perceived. The results suggest that the visual system is highly effective in completing a percept. The perception of an illusory figure is spatially scale invariant when perceived at threshold. The processing time and the number of fixations modify the percept, making the perception of the illusion more probable in various viewing conditions. Furthermore, the fact that the illusion can be perceived when only one inducer is physically present at any given moment indicates the potential of single inducers.
Apparently, modelling illusory figure perception will require a combination of low-level, local processes and higher-level integrative processes. Our studies with stimuli combining several attributes relevant to object perception demonstrate that the perception of an illusory figure is flexible and is maintained even when it contains colour and volume and when it is shown in motion. All in all, the results confirm the assumed importance of the visual processes related to the perception of illusory figures in everyday viewing. This is indicated by the variety of inducer modifications that can be made without destroying the percept. Furthermore, the illusion can acquire additional attributes from such modifications. Due to individual differences in the perception of illusory figures, universal values for absolute performance are not always meaningful, but stable trends and general relations do exist.
Abstract:
In visual search one tries to find the currently relevant item among other, irrelevant items. In the present study, visual search performance for complex objects (characters, faces, computer icons and words) was investigated, along with the contribution of different stimulus properties, such as luminance contrast between characters and background, set size, stimulus size, colour contrast, spatial frequency, and stimulus layout. Subjects were required to search for a target object among distractor objects in two-dimensional stimulus arrays. The outcome measure was threshold search time, that is, the presentation duration of the stimulus array required by the subject to find the target with a certain probability. It reflects the time used for visual processing separated from the time used for decision making and manual reactions. The duration of stimulus presentation was controlled by an adaptive staircase method. The number and duration of eye fixations, saccade amplitude, and perceptual span, i.e., the number of items that can be processed during a single fixation, were measured. It was found that search performance was correlated with the number of fixations needed to find the target. Search time and the number of fixations increased with increasing stimulus set size. On the other hand, several complex objects could be processed during a single fixation, i.e., within the perceptual span. Search time and the number of fixations depended on object type as well as luminance contrast. The size of the perceptual span was smaller for more complex objects, and decreased with decreasing luminance contrast within object type, especially for very low contrasts. In addition, the size and shape of perceptual span explained the changes in search performance for different stimulus layouts in word search.
Perceptual span was scale invariant for a 16-fold range of stimulus sizes, i.e., the number of items processed during a single fixation was independent of retinal stimulus size or viewing distance. It is suggested that saccadic visual search consists of both serial (eye movements) and parallel (processing within perceptual span) components, and that the size of the perceptual span may explain the effectiveness of saccadic search in different stimulus conditions. Further, low-level visual factors, such as the anatomical structure of the retina, peripheral stimulus visibility and resolution requirements for the identification of different object types are proposed to constrain the size of the perceptual span, and thus, limit visual search performance. Similar methods were used in a clinical study to characterise the visual search performance and eye movements of neurological patients with chronic solvent-induced encephalopathy (CSE). In addition, the data about the effects of different stimulus properties on visual search in normal subjects were presented as simple practical guidelines, so that the limits of human visual perception could be taken into account in the design of user interfaces.
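The abstract says only that presentation duration was controlled by an adaptive staircase, without giving the rule. As a minimal sketch, the code below implements a common two-down/one-up variant: duration shrinks after two consecutive correct responses and grows after each error, and the threshold is estimated from the reversal points. The step factor, the starting duration and the deterministic simulated observer are all assumptions for illustration.

```python
def run_staircase(respond, start=1000.0, factor=0.8, n_reversals=8):
    """Two-down/one-up staircase on presentation duration (ms).

    respond(duration) returns True when the target was found.
    Duration shrinks after two consecutive hits and grows after each miss.
    Returns the mean duration at the reversal points as a threshold estimate.
    """
    duration = start
    hits_in_row = 0
    last_direction = 0      # -1 = getting shorter, +1 = getting longer
    reversals = []
    while len(reversals) < n_reversals:
        if respond(duration):
            hits_in_row += 1
            if hits_in_row < 2:
                continue            # repeat the same duration once more
            hits_in_row = 0
            direction = -1
        else:
            hits_in_row = 0
            direction = +1
        if last_direction and direction != last_direction:
            reversals.append(duration)
        last_direction = direction
        duration = duration * factor if direction < 0 else duration / factor
    return sum(reversals) / len(reversals)

# Hypothetical deterministic observer: finds the target whenever the array
# is shown for at least 300 ms.
def observer(duration):
    return duration >= 300.0

estimate = run_staircase(observer)
print(round(estimate, 1))
```

With a real, noisy observer the two-down/one-up rule converges near the duration that yields roughly 71% correct responses, which is one way to define "finding the target with a certain probability".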
Abstract:
The synchronization of neuronal activity, especially in the beta- (14-30 Hz)/gamma- (30-80 Hz) frequency bands, is thought to provide a means for the integration of anatomically distributed processing and for the formation of transient neuronal assemblies. Thus non-stimulus locked (i.e. induced) gamma-band oscillations are believed to underlie feature binding and the formation of neuronal object representations. On the other hand, the functional roles of neuronal oscillations in slower theta- (4-8 Hz) and alpha- (8-14 Hz) frequency bands remain controversial. In addition, early stimulus-locked activity has been largely ignored, as it is believed to reflect merely the physical properties of sensory stimuli. With human neuromagnetic recordings, both the functional roles of gamma- and alpha-band oscillations and the significance of early stimulus-locked activity in neuronal processing were examined in this thesis. Study I of this thesis shows that even the stimulus-locked (evoked) gamma oscillations were sensitive to high-level stimulus features for speech and non-speech sounds, suggesting that they may underlie the formation of early neuronal object representations for stimuli with a behavioural relevance. Study II shows that neuronal processing for consciously perceived and unperceived stimuli differed as early as 30 ms after stimulus onset. This study also showed that the alpha-band oscillations selectively correlated with conscious perception. Study III, in turn, shows that prestimulus alpha-band oscillations influence the subsequent detection and processing of sensory stimuli. Further, in Study IV, we asked whether phase synchronization between distinct frequency bands is present in cortical circuits. This study revealed prominent task-sensitive phase synchrony between alpha and beta/gamma oscillations. Finally, the implications of Studies II, III, and IV for the broader scientific context are analysed in the last study of this thesis (V).
I suggest in this thesis that neuronal processing may be extremely fast and that the evoked response is important for cognitive processes. I also propose that alpha oscillations define the global neuronal workspace of perception, action, and consciousness and, further, that cross-frequency synchronization is required for the integration of neuronal object representations into the global neuronal workspace.
Abstract:
The earliest stages of human cortical visual processing can be conceived as extraction of local stimulus features. However, more complex visual functions, such as object recognition, require integration of multiple features. Recently, neural processes underlying feature integration in the visual system have been under intensive study. A specialized mid-level stage preceding the object recognition stage has been proposed to account for the processing of contours, surfaces and shapes as well as configuration. This thesis consists of four experimental, psychophysical studies on human visual feature integration. In two studies, the classification image method, a recently developed psychophysical reverse correlation method, was used. In this method visual noise is added to near-threshold stimuli. By investigating the relationship between random features in the noise and the observer's perceptual decision in each trial, it is possible to estimate what features of the stimuli are critical for the task. The method allows visualizing the critical features that are used in a psychophysical task directly as a spatial correlation map, yielding an effective "behavioral receptive field". Visual context is known to modulate the perception of stimulus features. Some of these interactions are quite complex, and it is not known whether they reflect early or late stages of perceptual processing. The first study investigated the mechanisms of collinear facilitation, where nearby collinear Gabor flankers increase the detectability of a central Gabor. The behavioral receptive field of the mechanism mediating the detection of the central Gabor stimulus was measured by the classification image method. The results show that collinear flankers increase the extent of the behavioral receptive field for the central Gabor, in the direction of the flankers. The increased sensitivity at the ends of the receptive field suggests a low-level explanation for the facilitation.
The second study investigated how visual features are integrated into percepts of surface brightness. A novel variant of the classification image method with a brightness matching task was used. Many theories assume that perceived brightness is based on the analysis of luminance border features. Here, for the first time, this assumption was directly tested. The classification images show that the perceived brightness of both an illusory Craik-O'Brien-Cornsweet stimulus and a real uniform step stimulus depends solely on the border. Moreover, the spatial tuning of the features remains almost constant when the stimulus size is changed, suggesting that brightness perception is based on the output of a single spatial frequency channel. The third and fourth studies investigated global form integration in random-dot Glass patterns. In these patterns, a global form can be immediately perceived if even a small proportion of random dots are paired to dipoles according to a geometrical rule. In the third study the discrimination of orientation structure in highly coherent concentric and Cartesian (straight) Glass patterns was measured. The results showed that the global form was more efficiently discriminated in concentric patterns. The fourth study investigated how form detectability depends on the global regularity of the Glass pattern. The local structure was either Cartesian or curved. It was shown that randomizing the local orientation deteriorated the performance only with the curved pattern. The results give support for the idea that curved and Cartesian patterns are processed in at least partially separate neural systems.
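The reverse-correlation idea behind the classification image method can be sketched in a few lines. The example below is a toy simulation, not the thesis procedure: it uses one-dimensional noise, a simulated observer whose decision depends only on two pixels (an assumed template), and the simplest estimator, the mean noise on "yes" trials minus the mean noise on "no" trials. Trial count, pixel count and seed are arbitrary assumptions.

```python
import random

def classification_image(n_trials=4000, n_pixels=8, seed=0):
    """Estimate a 1-D classification image by reverse correlation."""
    rng = random.Random(seed)
    # Hypothetical observer template: only pixels 3 and 4 drive the decision.
    template = [0.0] * n_pixels
    template[3] = template[4] = 1.0

    sums = {True: [0.0] * n_pixels, False: [0.0] * n_pixels}
    counts = {True: 0, False: 0}
    for _ in range(n_trials):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_pixels)]
        # Simulated decision: "yes" when the template response is positive.
        said_yes = sum(t * n for t, n in zip(template, noise)) > 0.0
        counts[said_yes] += 1
        for i, n in enumerate(noise):
            sums[said_yes][i] += n

    # Mean noise on "yes" trials minus mean noise on "no" trials.
    return [sums[True][i] / counts[True] - sums[False][i] / counts[False]
            for i in range(n_pixels)]

image = classification_image()
print([round(v, 2) for v in image])
```

The recovered values are large at the two template pixels and close to zero elsewhere, which is the sense in which the map acts as a "behavioral receptive field".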
Abstract:
The aim of this thesis is to develop a fully automatic lameness detection system that operates in a milking robot. The instrumentation, measurement software, algorithms for data analysis and a neural network model for lameness detection were developed. Automatic milking has become a common practice in dairy husbandry, and in the year 2006 about 4000 farms worldwide used over 6000 milking robots. There is a worldwide movement with the objective of fully automating every process from feeding to milking. Increase in automation is a consequence of increasing farm sizes, the demand for more efficient production and the growth of labour costs. As the level of automation increases, the time that the cattle keeper uses for monitoring animals often decreases. This has created a need for systems for automatically monitoring the health of farm animals. The popularity of milking robots also offers a new and unique possibility to monitor animals in a single confined space up to four times daily. Lameness is a crucial welfare issue in the modern dairy industry. Limb disorders cause serious welfare, health and economic problems especially in loose housing of cattle. Lameness causes losses in milk production and leads to early culling of animals. These costs could be reduced with early identification and treatment. At present, only a few methods for automatically detecting lameness have been developed, and the most common methods used for lameness detection and assessment are various visual locomotion scoring systems. The problem with locomotion scoring is that it needs experience to be conducted properly, it is labour intensive as an on-farm method and the results are subjective. A four-balance system for measuring the leg load distribution of dairy cows during milking in order to detect lameness was developed and set up at the University of Helsinki's Suitia research farm.
The leg weights of 73 cows were successfully recorded during almost 10,000 robotic milkings over a period of 5 months. The cows were locomotion scored weekly, and the lame cows were inspected clinically for hoof lesions. Unsuccessful measurements, caused by cows standing outside the balances, were removed from the data with a special algorithm, and the mean leg loads and the number of kicks during milking were calculated. In order to develop an expert system to automatically detect lameness cases, a model was needed. A probabilistic neural network (PNN) classifier model was chosen for the task. The data were divided into two parts, and 5,074 measurements from 37 cows were used to train the model. The operation of the model was evaluated for its ability to detect lameness in the validation dataset, which had 4,868 measurements from 36 cows. The model was able to classify 96% of the measurements correctly as sound or lame cows, and 100% of the lameness cases in the validation data were identified. The number of measurements causing false alarms was 1.1%. The developed model has the potential to be used for on-farm decision support and can be used in a real-time lameness monitoring system.
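For readers unfamiliar with the classifier type, a probabilistic neural network is essentially a Parzen-window estimator: each class is scored by the average Gaussian kernel between the input and that class's training examples, and the highest score wins. The sketch below shows that rule only. The two features, the smoothing width `sigma` and every data value are invented for illustration and are not taken from the thesis.

```python
import math

def pnn_classify(x, training, sigma=0.5):
    """Probabilistic neural network (Parzen-window) classification.

    training maps a class label to a list of feature vectors.
    Each class score is the mean Gaussian kernel between x and its examples.
    """
    scores = {}
    for label, examples in training.items():
        total = 0.0
        for example in examples:
            dist2 = sum((a - b) ** 2 for a, b in zip(x, example))
            total += math.exp(-dist2 / (2.0 * sigma ** 2))
        scores[label] = total / len(examples)
    return max(scores, key=scores.get), scores

# Toy feature vectors invented for illustration (not data from the thesis):
# (left/right leg-load ratio, kicks per milking).
training = {
    "sound": [(1.00, 0.0), (0.95, 1.0), (1.05, 0.0)],
    "lame": [(0.60, 3.0), (0.55, 4.0), (0.70, 2.0)],
}

label_a, _ = pnn_classify((0.98, 0.0), training)
label_b, _ = pnn_classify((0.62, 3.0), training)
print(label_a, label_b)
```

In practice the features would be scaled to comparable ranges and `sigma` tuned on held-out data, since a single kernel width is applied to every dimension.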