940 resultados para mid-level vision
Resumo:
This thesis addresses a series of topics related to the question of how people find the foreground objects from complex scenes. With both computer vision modeling, as well as psychophysical analyses, we explore the computational principles for low- and mid-level vision.
We first explore the computational methods of generating saliency maps from images and image sequences. We propose an extremely fast algorithm called Image Signature that detects the locations in the image that attract human eye gazes. With a series of experimental validations based on human behavioral data collected from various psychophysical experiments, we conclude that the Image Signature and its spatial-temporal extension, the Phase Discrepancy, are among the most accurate algorithms for saliency detection under various conditions.
In the second part, we bridge the gap between fixation prediction and salient object segmentation with two efforts. First, we propose a new dataset that contains both fixation and object segmentation information. By simultaneously presenting the two types of human data in the same dataset, we are able to analyze their intrinsic connection, as well as understanding the drawbacks of today’s “standard” but inappropriately labeled salient object segmentation dataset. Second, we also propose an algorithm of salient object segmentation. Based on our novel discoveries on the connections of fixation data and salient object segmentation data, our model significantly outperforms all existing models on all 3 datasets with large margins.
In the third part of the thesis, we discuss topics around the human factors of boundary analysis. Closely related to salient object segmentation, boundary analysis focuses on delimiting the local contours of an object. We identify the potential pitfalls of algorithm evaluation for the problem of boundary detection. Our analysis indicates that today’s popular boundary detection datasets contain significant level of noise, which may severely influence the benchmarking results. To give further insights on the labeling process, we propose a model to characterize the principles of the human factors during the labeling process.
The analyses reported in this thesis offer new perspectives to a series of interrelating issues in low- and mid-level vision. It gives warning signs to some of today’s “standard” procedures, while proposing new directions to encourage future research.
Resumo:
We address mid-level vision for the recognition of non-rigid objects. We align model and image using frame curves - which are object or "figure/ground" skeletons. Frame curves are computed, without discontinuities, using Curved Inertia Frames, a provably global scheme implemented on the Connection Machine, based on: non-cartisean networks; a definition of curved axis of inertia; and a ridge detector. I present evidence against frame alignment in human perception. This suggests: frame curves have a role in figure/ground segregation and in fuzzy boundaries; their outside/near/top/ incoming regions are more salient; and that perception begins by setting a reference frame (prior to early vision), and proceeds by processing convex structures.
Resumo:
To date, automatic recognition of semantic information such as salient objects and mid-level concepts from images is a challenging task. Since real-world objects tend to exist in a context within their environment, the computer vision researchers have increasingly incorporated contextual information for improving object recognition. In this paper, we present a method to build a visual contextual ontology from salient objects descriptions for image annotation. The ontologies include not only partOf/kindOf relations, but also spatial and co-occurrence relations. A two-step image annotation algorithm is also proposed based on ontology relations and probabilistic inference. Different from most of the existing work, we specially exploit how to combine representation of ontology, contextual knowledge and probabilistic inference. The experiments show that image annotation results are improved in the LabelMe dataset.
Resumo:
This paper presents an SIMD machine which has been tuned to execute low-level vision algorithms employing the relaxation labeling paradigm. Novel features of the design include: 1. (1) a communication scheme capable of window accessing under a single instruction. 2. (2) flexible I/O instructions to load overlapped data segments; and 3. (3) data-conditional instructions which can be nested to an arbitrary degree. A time analysis of the stereo correspondence problem, as implemented on a simulated version of the machine using the probabilistic relaxation technique, shows a speed up of almost N2 for an N × N array of PEs.
Resumo:
Early and intermediate vision algorithms, such as smoothing and discontinuity detection, are often implemented on general-purpose serial, and more recently, parallel computers. Special-purpose hardware implementations of low-level vision algorithms may be needed to achieve real-time processing. This memo reviews and analyzes some hardware implementations of low-level vision algorithms. Two types of hardware implementations are considered: the digital signal processing chips of Ruetz (and Broderson) and the analog VLSI circuits of Carver Mead. The advantages and disadvantages of these two approaches for producing a general, real-time vision system are considered.
Resumo:
Simultaneous observations of cloud microphysical properties were obtained by in-situ aircraft measurements and ground based Radar/Lidar. Widespread mid-level stratus cloud was present below a temperature inversion (~5 °C magnitude) at 3.6 km altitude. Localised convection (peak updraft 1.5 m s−1) was observed 20 km west of the Radar station. This was associated with convergence at 2.5 km altitude. The convection was unable to penetrate the inversion capping the mid-level stratus.
The mid-level stratus cloud was vertically thin (~400 m), horizontally extensive (covering 100 s of km) and persisted for more than 24 h. The cloud consisted of supercooled water droplets and small concentrations of large (~1 mm) stellar/plate like ice which slowly precipitated out. This ice was nucleated at temperatures greater than −12.2 °C and less than −10.0 °C, (cloud top and cloud base temperatures, respectively). No ice seeding from above the cloud layer was observed. This ice was formed by primary nucleation, either through the entrainment of efficient ice nuclei from above/below cloud, or by the slow stochastic activation of immersion freezing ice nuclei contained within the supercooled drops. Above cloud top significant concentrations of sub-micron aerosol were observed and consisted of a mixture of sulphate and carbonaceous material, a potential source of ice nuclei. Particle number concentrations (in the size range 0.1
Resumo:
The advance of the onset of the Indian monsoon is here explained in terms of a balance between the low-level monsoon flow and an over-running intrusion of mid-tropospheric dry air. The monsoon advances, over a period of about 6 weeks, from the south of the country to the northwest. Given that the low-level monsoon winds are westerly or southwesterly, and the midlevel winds northwesterly, the monsoon onset propagates upwind relative to midlevel flow, and perpendicular to the low-level flow, and is not directly caused by moisture flux toward the northwest. Lacking a conceptual model for the advance means that it has been hard to understand and correct known biases in weather and climate prediction models. The mid-level northwesterlies form a wedge of dry air that is deep in the far northwest of India and over-runs the monsoon flow. The dry layer is moistened from below by shallow cumulus and congestus clouds, so that the profile becomes much closer to moist adiabatic, and the dry layer is much shallower in the vertical, toward the southeast of India. The profiles associated with this dry air show how the most favourable environment for deep convection occurs in the south, and onset occurs here first. As the onset advances across India, the advection of moisture from the Arabian Sea becomes stronger, and the mid-level dry air is increasingly moistened from below. This increased moistening makes the wedge of dry air shallower throughout its horizontal extent, and forces the northern limit of moist convection to move toward the northwest. Wetting of the land surface by rainfall will further reinforce the north-westward progression, by sustaining the supply of boundary layer moisture and shallow cumulus. The local advance of the monsoon onset is coincident with weakening of the mid-level northwesterlies, and therefore weakened mid-level dry advection.
Resumo:
With the introduction of the mid-level ethanol blend gasoline fuel for commercial sale, the compatibility of different off-road engines is needed. This report details the test study of using one mid-level ethanol fuel in a two stroke hand held gasoline engine used to power line trimmers. The study sponsored by E3 is to test the effectiveness of an aftermarket spark plug from E3 Spark Plug when using a mid-level ethanol blend gasoline. A 15% ethanol by volume (E15) is the test mid-level ethanol used and the 10% ethanol by volume (E10) was used as the baseline fuel. The testing comprises running the engine at different load points and throttle positions to evaluate the cylinder head temperature, exhaust temperature and engine speed. Raw gas emissions were also measured to determine the impact of the performance spark plug. The low calorific value of the E15 fuel decreased the speed of the engine along with reduction in the fuel consumption and exhaust gas temperature. The HC emissions for E15 fuel and E3 spark plug increased when compared to the base line in most of the cases and NO formation was dependent on the cylinder head temperature. The E3 spark plug had a tendency to increase the temperature of the cylinder head irrespective of fuel type while reducing engine speed.
Resumo:
THE PURPOSE OF THIS ARTICLE is two-fold, first to provide a general overview of two of the main cognitive neuroscientific techniques available, specifically functional magnetic resonance imaging (fMRI) and transcranial magnetic stimulation (TMS); and secondly to apply these techniques to elaborate a discussion of an aspect of higher level vision, namely implied motion, that is the perception of movement from a static image.
Resumo:
We summarize the various strands of research on peripheral vision and relate them to theories of form perception. After a historical overview, we describe quantifications of the cortical magnification hypothesis, including an extension of Schwartz's cortical mapping function. The merits of this concept are considered across a wide range of psychophysical tasks, followed by a discussion of its limitations and the need for non-spatial scaling. We also review the eccentricity dependence of other low-level functions including reaction time, temporal resolution, and spatial summation, as well as perimetric methods. A central topic is then the recognition of characters in peripheral vision, both at low and high levels of contrast, and the impact of surrounding contours known as crowding. We demonstrate how Bouma's law, specifying the critical distance for the onset of crowding, can be stated in terms of the retinocortical mapping. The recognition of more complex stimuli, like textures, faces, and scenes, reveals a substantial impact of mid-level vision and cognitive factors. We further consider eccentricity-dependent limitations of learning, both at the level of perceptual learning and pattern category learning. Generic limitations of extrafoveal vision are observed for the latter in categorization tasks involving multiple stimulus classes. Finally, models of peripheral form vision are discussed. We report that peripheral vision is limited with regard to pattern categorization by a distinctly lower representational complexity and processing speed. Taken together, the limitations of cognitive processing in peripheral vision appear to be as significant as those imposed on low-level functions and by way of crowding.
Resumo:
We summarize the various strands of research on peripheral vision and relate them to theories of form perception. After a historical overview, we describe quantifications of the cortical magnification hypothesis, including an extension of Schwartz's cortical mapping function. The merits of this concept are considered across a wide range of psychophysical tasks, followed by a discussion of its limitations and the need for non-spatial scaling. We also review the eccentricity dependence of other low-level functions including reaction time, temporal resolution, and spatial summation, as well as perimetric methods. A central topic is then the recognition of characters in peripheral vision, both at low and high levels of contrast, and the impact of surrounding contours known as crowding. We demonstrate how Bouma's law, specifying the critical distance for the onset of crowding, can be stated in terms of the retinocortical mapping. The recognition of more complex stimuli, like textures, faces, and scenes, reveals a substantial impact of mid-level vision and cognitive factors. We further consider eccentricity-dependent limitations of learning, both at the level of perceptual learning and pattern category learning. Generic limitations of extrafoveal vision are observed for the latter in categorization tasks involving multiple stimulus classes. Finally, models of peripheral form vision are discussed. We report that peripheral vision is limited with regard to pattern categorization by a distinctly lower representational complexity and processing speed. Taken together, the limitations of cognitive processing in peripheral vision appear to be as significant as those imposed on low-level functions and by way of crowding.
Resumo:
Copyright © 2016 the authors 0270-6474/16/360714-16$15.00/0. This research was supported by National Science Foundation INSPIRE Grant 1248076, which was awarded to Y.L. and A.M.N.
Resumo:
The earliest stages of human cortical visual processing can be conceived as extraction of local stimulus features. However, more complex visual functions, such as object recognition, require integration of multiple features. Recently, neural processes underlying feature integration in the visual system have been under intensive study. A specialized mid-level stage preceding the object recognition stage has been proposed to account for the processing of contours, surfaces and shapes as well as configuration. This thesis consists of four experimental, psychophysical studies on human visual feature integration. In two studies, classification image a recently developed psychophysical reverse correlation method was used. In this method visual noise is added to near-threshold stimuli. By investigating the relationship between random features in the noise and observer s perceptual decision in each trial, it is possible to estimate what features of the stimuli are critical for the task. The method allows visualizing the critical features that are used in a psychophysical task directly as a spatial correlation map, yielding an effective "behavioral receptive field". Visual context is known to modulate the perception of stimulus features. Some of these interactions are quite complex, and it is not known whether they reflect early or late stages of perceptual processing. The first study investigated the mechanisms of collinear facilitation, where nearby collinear Gabor flankers increase the detectability of a central Gabor. The behavioral receptive field of the mechanism mediating the detection of the central Gabor stimulus was measured by the classification image method. The results show that collinear flankers increase the extent of the behavioral receptive field for the central Gabor, in the direction of the flankers. The increased sensitivity at the ends of the receptive field suggests a low-level explanation for the facilitation. The second study investigated how visual features are integrated into percepts of surface brightness. A novel variant of the classification image method with brightness matching task was used. Many theories assume that perceived brightness is based on the analysis of luminance border features. Here, for the first time this assumption was directly tested. The classification images show that the perceived brightness of both an illusory Craik-O Brien-Cornsweet stimulus and a real uniform step stimulus depends solely on the border. Moreover, the spatial tuning of the features remains almost constant when the stimulus size is changed, suggesting that brightness perception is based on the output of a single spatial frequency channel. The third and fourth studies investigated global form integration in random-dot Glass patterns. In these patterns, a global form can be immediately perceived, if even a small proportion of random dots are paired to dipoles according to a geometrical rule. In the third study the discrimination of orientation structure in highly coherent concentric and Cartesian (straight) Glass patterns was measured. The results showed that the global form was more efficiently discriminated in concentric patterns. The fourth study investigated how form detectability depends on the global regularity of the Glass pattern. The local structure was either Cartesian or curved. It was shown that randomizing the local orientation deteriorated the performance only with the curved pattern. The results give support for the idea that curved and Cartesian patterns are processed in at least partially separate neural systems.