20 results for Boundary objects

in Boston University Digital Common


Relevance:

30.00%

Publisher:

Abstract:

The recognition of 3-D objects from sequences of their 2-D views is modeled by a family of self-organizing neural architectures, called VIEWNET, that use View Information Encoded With NETworks. VIEWNET incorporates a preprocessor that generates a compressed but 2-D invariant representation of an image, a supervised incremental learning system that classifies the preprocessed representations into 2-D view categories whose outputs are combined into 3-D invariant object categories, and a working memory that makes a 3-D object prediction by accumulating evidence from 3-D object category nodes as multiple 2-D views are experienced. The simplest VIEWNET achieves high recognition scores without the need to explicitly code the temporal order of 2-D views in working memory. Working memories are also discussed that save memory resources by implicitly coding temporal order in terms of the relative activity of 2-D view category nodes, rather than as explicit 2-D view transitions. Variants of the VIEWNET architecture may also be used for scene understanding by using a preprocessor and classifier that can determine both What objects are in a scene and Where they are located. The present VIEWNET preprocessor includes the CORT-X 2 filter, which discounts the illuminant, regularizes and completes figural boundaries, and suppresses image noise. This boundary segmentation is rendered invariant under 2-D translation, rotation, and dilation by use of a log-polar transform. The invariant spectra undergo Gaussian coarse coding to further reduce noise and 3-D foreshortening effects, and to increase generalization. These compressed codes are input into the classifier, a supervised learning system based on the fuzzy ARTMAP algorithm. Fuzzy ARTMAP learns 2-D view categories that are invariant under 2-D image translation, rotation, and dilation as well as 3-D image transformations that do not cause a predictive error.
Evidence from sequences of 2-D view categories converges at 3-D object nodes that generate a response invariant under changes of 2-D view. These 3-D object nodes input to a working memory that accumulates evidence over time to improve object recognition. In the simplest working memory, each occurrence (nonoccurrence) of a 2-D view category increases (decreases) the corresponding node's activity in working memory. The maximally active node is used to predict the 3-D object. Recognition is studied with noisy and clean images using slow and fast learning. Slow learning at the fuzzy ARTMAP map field is adapted to learn the conditional probability of the 3-D object given the selected 2-D view category. VIEWNET is demonstrated on an MIT Lincoln Laboratory database of 128x128 2-D views of aircraft with and without additive noise. A recognition rate of up to 90% is achieved with one 2-D view and of up to 98.5% correct with three 2-D views. The properties of 2-D view and 3-D object category nodes are compared with those of cells in monkey inferotemporal cortex.
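
The simplest working memory described above can be illustrated with a minimal sketch. It is not the VIEWNET implementation: the function name `predict_object`, the `gain` and `decay` parameters, the floor at zero, and the object labels are all assumptions made for illustration. Each view in the sequence is taken to have already been classified into a vote for one 3-D object.

```python
def predict_object(view_votes, objects, gain=1.0, decay=0.5):
    """Accumulate evidence over a sequence of per-view object votes.

    view_votes: list of object labels, one per 2-D view (hypothetical input).
    objects:    list of all candidate object labels.
    gain/decay: assumed increments; the source gives no numeric values.
    """
    activity = {obj: 0.0 for obj in objects}
    for voted in view_votes:
        for obj in objects:
            if obj == voted:
                # Occurrence: increase the node's activity.
                activity[obj] += gain
            else:
                # Nonoccurrence: decrease it, floored at zero (assumption).
                activity[obj] = max(0.0, activity[obj] - decay)
    # The maximally active node gives the prediction.
    best = max(activity, key=activity.get)
    return best, activity


best, activity = predict_object(["A", "B", "A"], ["A", "B", "C"])
print(best, activity)
```

Because activity is carried across views, a single misclassified view (the middle "B" vote here) can be outvoted by the surrounding views, which is the qualitative effect the abstract attributes to multi-view evidence accumulation.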

Relevance:

20.00%

Publisher:

Abstract:

Sound propagation in shallow water is characterized by interaction with the ocean's surface, volume, and bottom. In many coastal margin regions, including the Eastern U.S. continental shelf and the coastal seas of China, the bottom is composed of a depositional sandy-silty top layer. Previous measurements of narrow and broadband sound transmission at frequencies from 100 Hz to 1 kHz in these regions are consistent with waveguide calculations based on depth and frequency dependent sound speed, attenuation and density profiles. Theoretical predictions for the frequency dependence of attenuation vary from quadratic for the porous media model of M.A. Biot to linear for various competing models. Results from experiments performed under known conditions with sandy bottoms, however, have agreed with attenuation proportional to f^1.84, which is slightly less than the theoretical value of f^2 [Zhou and Zhang, J. Acoust. Soc. Am. 117, 2494]. This dissertation presents a reexamination of the fundamental considerations in the Biot derivation and leads to a simplification of the theory that can be coupled with site-specific, depth dependent attenuation and sound speed profiles to explain the observed frequency dependence. Long-range sound transmission measurements in a known waveguide can be used to estimate the site-specific sediment attenuation properties, but the costs and time associated with such at-sea experiments using traditional measurement techniques can be prohibitive. Here a new measurement tool consisting of an autonomous underwater vehicle and a small, low noise, towed hydrophone array was developed and used to obtain accurate long-range sound transmission measurements efficiently and cost effectively. To demonstrate this capability and to determine the modal and intrinsic attenuation characteristics, experiments were conducted in a carefully surveyed area in Nantucket Sound.
A best-fit comparison between measured results and calculated results, while varying attenuation parameters, revealed the estimated power law exponent to be 1.87 between 220.5 and 1228 Hz. These results demonstrate the utility of this new cost effective and accurate measurement system. The sound transmission results, when compared with calculations based on the modified Biot theory, are shown to explain the observed frequency dependence.
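
To make the idea of a power-law exponent concrete, the sketch below fits attenuation = c * f^n by ordinary least squares in log-log space. This is a simpler, swapped-in technique, not the dissertation's procedure, which compared measured and calculated transmission while varying attenuation parameters. The function name `fit_power_law`, the coefficient 1e-6, and the intermediate frequencies are assumptions; the data are synthetic, generated with the reported exponent of 1.87 purely to check that the fit recovers it.

```python
import math


def fit_power_law(freqs_hz, attenuations):
    """Least-squares fit of attenuation = coeff * f**exponent in log-log space."""
    xs = [math.log(f) for f in freqs_hz]
    ys = [math.log(a) for a in attenuations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx                       # slope of the log-log line
    coeff = math.exp(mean_y - exponent * mean_x)  # intercept, back-transformed
    return exponent, coeff


# Synthetic, noise-free data (NOT measurements from the source).
freqs = [220.5, 400.0, 600.0, 800.0, 1000.0, 1228.0]
atten = [1e-6 * f ** 1.87 for f in freqs]
exponent, coeff = fit_power_law(freqs, atten)
print(round(exponent, 3))
```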

Relevance:

20.00%

Publisher:

Abstract:

The algorithm presented in this paper aims to segment the foreground objects in video (e.g., people) given time-varying, textured backgrounds. Examples of time-varying backgrounds include waves on water, clouds moving, trees waving in the wind, automobile traffic, moving crowds, escalators, etc. We have developed a novel foreground-background segmentation algorithm that explicitly accounts for the non-stationary nature and clutter-like appearance of many dynamic textures. The dynamic texture is modeled by an Autoregressive Moving Average Model (ARMA). A robust Kalman filter algorithm iteratively estimates the intrinsic appearance of the dynamic texture, as well as the regions of the foreground objects. Preliminary experiments with this method have demonstrated promising results.

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a tool called Gismo (Generator of Internet Streaming Media Objects and workloads). Gismo enables the specification of a number of streaming media access characteristics, including object popularity, temporal correlation of requests, seasonal access patterns, user session durations, user interactivity times, and variable bit-rate (VBR) self-similarity and marginal distributions. The embodiment of these characteristics in Gismo enables the generation of realistic and scalable request streams for use in the benchmarking and comparative evaluation of Internet streaming media delivery techniques. To demonstrate the usefulness of Gismo, we present a case study that shows the importance of various workload characteristics in determining the effectiveness of proxy caching and server patching techniques in reducing bandwidth requirements.
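
As an illustration of one of the listed characteristics, object popularity, the sketch below draws a synthetic request stream in which popularity follows a Zipf-like law. This is not Gismo's interface: the function names, the Zipf-like assumption, the `alpha` parameter, and the seed are all hypothetical choices, and the other characteristics (temporal correlation, seasonality, session behaviour, VBR) are not modelled.

```python
import random


def zipf_weights(n_objects, alpha=1.0):
    """Unnormalized Zipf-like weights: the object of rank r gets 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, n_objects + 1)]


def generate_requests(n_objects, n_requests, alpha=1.0, seed=0):
    """Return a list of object ids (0 = most popular) sampled by popularity."""
    rng = random.Random(seed)  # fixed seed so the stream is reproducible
    weights = zipf_weights(n_objects, alpha)
    return rng.choices(range(n_objects), weights=weights, k=n_requests)


requests = generate_requests(n_objects=50, n_requests=10000)
print(requests[:10])
```

A larger `alpha` concentrates requests on fewer objects, which is the kind of workload parameter whose effect on caching a benchmarking study would vary.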

Relevance:

20.00%

Publisher:

Abstract:

Partial occlusions are commonplace in a variety of real world computer vision applications: surveillance, intelligent environments, assistive robotics, autonomous navigation, etc. While occlusion handling methods have been proposed, most methods tend to break down when confronted with numerous occluders in a scene. In this paper, a layered image-plane representation for tracking people through substantial occlusions is proposed. An image-plane representation of motion around an object is associated with a pre-computed graphical model, which can be instantiated efficiently during online tracking. A global state and observation space is obtained by linking transitions between layers. A Reversible Jump Markov Chain Monte Carlo approach is used to infer the number of people and track them online. The method outperforms two state-of-the-art methods for tracking over extended occlusions, given videos of a parking lot with numerous vehicles and a laboratory with many desks and workstations.

Relevance:

20.00%

Publisher:

Abstract:

We propose a multi-object multi-camera framework for tracking large numbers of tightly-spaced objects that rapidly move in three dimensions. We formulate the problem of finding correspondences across multiple views as a multidimensional assignment problem and use a greedy randomized adaptive search procedure to solve this NP-hard problem efficiently. To account for occlusions, we relax the one-to-one constraint that one measurement corresponds to one object and iteratively solve the relaxed assignment problem. After correspondences are established, object trajectories are estimated by stereoscopic reconstruction using an epipolar-neighborhood search. We embedded our method into a tracker-to-tracker multi-view fusion system that not only obtains the three-dimensional trajectories of closely-moving objects but also accurately settles track uncertainties that could not be resolved from single views due to occlusion. We conducted experiments to validate our greedy assignment procedure and our technique to recover from occlusions. We successfully track hundreds of flying bats and provide an analysis of their group behavior based on 150 reconstructed 3D trajectories.

Relevance:

20.00%

Publisher:

Abstract:

A mechanism is proposed that integrates low-level (image processing), mid-level (recursive 3D trajectory estimation), and high-level (action recognition) processes. It is assumed that the system observes multiple moving objects via a single, uncalibrated video camera. A novel extended Kalman filter formulation is used in estimating the relative 3D motion trajectories up to a scale factor. The recursive estimation process provides a prediction and error measure that is exploited in higher-level stages of action recognition. Conversely, higher-level mechanisms provide feedback that allows the system to reliably segment and maintain the tracking of moving objects before, during, and after occlusion. The 3D trajectory, occlusion, and segmentation information are utilized in extracting stabilized views of the moving object. Trajectory-guided recognition (TGR) is proposed as a new and efficient method for adaptive classification of action. The TGR approach is demonstrated using "motion history images" that are then recognized via a mixture-of-Gaussians classifier. The system was tested in recognizing various dynamic human outdoor activities, e.g., running, walking, rollerblading, and cycling. Experiments with synthetic data sets are used to evaluate stability of the trajectory estimator with respect to noise.

Relevance:

20.00%

Publisher:

Abstract:

Much work on the performance of Web proxy caching has focused on high-level metrics such as hit rate and byte hit rate, but has ignored all the information related to the cachability of Web objects. Uncachable objects include those fetched by dynamic requests, objects with an uncachable HTTP status code, objects with an uncachable HTTP header, objects with an HTTP 1.0 cookie, and objects without a last-modified header. Although some researchers filter the Web traces before they use them for analysis or simulation, many do not have a comprehensive understanding of the cachability of Web objects. In this paper we evaluate all the reasons that a Web object might be uncachable. We use traces from NLANR. Since these traces do not contain HTTP header information, we replay them using a request generator to get the response header information. We find that between 15% and 40% of Web objects in our traces cannot be cached by a Web proxy server. We use an LRU simulator to show the performance gap when the cachability is either considered or not. We show the characteristics of the cachable data set and find that all its characteristics are fairly similar to those of the total data set. Finally, we present some additional results for the cachable and total data sets: (1) The main reasons for uncachability are: dynamic requests, responses without a last-modified header, responses with an HTTP "302 Moved Temporarily" status code, and responses with an HTTP/1.0 cookie. (2) The cachability of Web objects cannot be ignored in simulation because uncachable objects comprise a huge percentage of the total trace. Simulations without cachability consideration will be misleading.
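
The performance gap mentioned above can be illustrated with a minimal LRU sketch. It is not the authors' simulator: the function name `simulate_lru`, the `(url, cachable)` trace format, the capacity measured in object count rather than bytes, and the toy trace are all assumptions made for illustration. With `respect_cachability=True`, requests for uncachable objects always count as misses and are never stored.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity, respect_cachability=True):
    """Return the hit rate of an LRU cache over a trace of (url, cachable) pairs."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    total = 0
    for url, cachable in trace:
        total += 1
        if respect_cachability and not cachable:
            continue  # uncachable: always a miss, never stored
        if url in cache:
            hits += 1
            cache.move_to_end(url)  # mark as most recently used
        else:
            cache[url] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / total if total else 0.0


# Toy trace (hypothetical): "b" is an uncachable object requested repeatedly.
trace = [("a", True), ("b", False), ("a", True), ("b", False), ("b", False)]
print(simulate_lru(trace, capacity=2, respect_cachability=True))
print(simulate_lru(trace, capacity=2, respect_cachability=False))
```

On this toy trace, ignoring cachability reports a noticeably higher hit rate than respecting it, which is the kind of overestimate the abstract warns about.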

Relevance:

20.00%

Publisher:

Abstract:

A model of laminar visual cortical dynamics proposes how 3D boundary and surface representations of slanted and curved 3D objects and 2D images arise. The 3D boundary representations emerge from interactions between non-classical horizontal receptive field interactions and intracortical and intercortical feedback circuits. Such non-classical interactions contextually disambiguate classical receptive field responses to ambiguous visual cues using cells that are sensitive to angles and disparity gradients within cortical areas V1 and V2. These cells are all variants of bipole grouping cells. Model simulations show how horizontal connections can develop selectivity to angles, how slanted surfaces can activate 3D boundary representations that are sensitive to angles and disparity gradients, how 3D filling-in occurs across slanted surfaces, how a 2D Necker cube image can be represented in 3D, and how bistable Necker cube percepts occur. The model also explains data about slant aftereffects and 3D neon color spreading. It shows how habituative transmitters that help to control development also help to trigger bistable 3D percepts and slant aftereffects, and how attention can influence which of these percepts is perceived by propagating along some object boundaries.

Relevance:

20.00%

Publisher:

Abstract:

This article develops a neural model of how the visual system processes natural images under variable illumination conditions to generate surface lightness percepts. Previous models have clarified how the brain can compute the relative contrast of images from variably illuminated scenes. How the brain determines an absolute lightness scale that "anchors" percepts of surface lightness to use the full dynamic range of neurons remains an unsolved problem. Lightness anchoring properties include articulation, insulation, configuration, and area effects. The model quantitatively simulates these and other lightness data such as discounting the illuminant, the double brilliant illusion, lightness constancy and contrast, Mondrian contrast constancy, and the Craik-O'Brien-Cornsweet illusion. The model also clarifies the functional significance for lightness perception of anatomical and neurophysiological data, including gain control at retinal photoreceptors, and spatial contrast adaptation at the negative feedback circuit between the inner segment of photoreceptors and interacting horizontal cells. The model retina can hereby adjust its sensitivity to input intensities ranging from dim moonlight to dazzling sunlight. At later model cortical processing stages, boundary representations gate the filling-in of surface lightness via long-range horizontal connections. Variants of this filling-in mechanism run 100-1000 times faster than diffusion mechanisms of previous biological filling-in models, and show how filling-in can occur at realistic speeds. A new anchoring mechanism called the Blurred-Highest-Luminance-As-White (BHLAW) rule helps simulate how surface lightness becomes sensitive to the spatial scale of objects in a scene. The model is also able to process natural images under variable lighting conditions.

Relevance:

20.00%

Publisher:

Abstract:

This article applies a recent theory of 3-D biological vision, called FACADE Theory, to explain several percepts which Kanizsa pioneered. These include 3-D pop-out of an occluding form in front of an occluded form, leading to completion and recognition of the occluded form; 3-D transparent and opaque percepts of Kanizsa squares, with and without Varin wedges; and interactions between percepts of illusory contours, brightness, and depth in response to 2-D Kanizsa images. These explanations clarify how a partially occluded object representation can be completed for purposes of object recognition, without the completed part of the representation necessarily being seen. The theory traces these percepts to neural mechanisms that compensate for measurement uncertainty and complementarity at individual cortical processing stages by using parallel and hierarchical interactions among several cortical processing stages. These interactions are modelled by a Boundary Contour System (BCS) that generates emergent boundary segmentations and a complementary Feature Contour System (FCS) that fills-in surface representations of brightness, color, and depth. The BCS and FCS interact reciprocally with an Object Recognition System (ORS) that binds BCS boundary and FCS surface representations into attentive object representations. The BCS models the parvocellular LGN→Interblob→Interstripe→V4 cortical processing stream, the FCS models the parvocellular LGN→Blob→Thin Stripe→V4 cortical processing stream, and the ORS models inferotemporal cortex.

Relevance:

20.00%

Publisher:

Abstract:

An improved Boundary Contour System (BCS) and Feature Contour System (FCS) neural network model of preattentive vision is applied to large images containing range data gathered by a synthetic aperture radar (SAR) sensor. The goal of processing is to make structures such as motor vehicles, roads, or buildings more salient and more interpretable to human observers than they are in the original imagery. Early processing by shunting center-surround networks compresses signal dynamic range and performs local contrast enhancement. Subsequent processing by filters sensitive to oriented contrast, including short-range competition and long-range cooperation, segments the image into regions. The segmentation is performed by three "copies" of the BCS and FCS, of small, medium, and large scales, wherein the "short-range" and "long-range" interactions within each scale occur over smaller or larger distances, corresponding to the size of the early filters of each scale. A diffusive filling-in operation within the segmented regions at each scale produces coherent surface representations. The combination of BCS and FCS helps to locate and enhance structure over regions of many pixels, without the resulting blur characteristic of approaches based on low spatial frequency filtering alone.

Relevance:

20.00%

Publisher:

Abstract:

The Boundary Contour System neural vision model reproduces perceptual illusory boundary formation by a conjunctive boundary completion process within a large cellular receptive field. The conjunctive chain allows the same kind of conjunction to occur across multiple receptive fields, which allows for sharper, more flexible boundary completion.

Relevance:

20.00%

Publisher:

Abstract:

An extension to the Boundary Contour System model is proposed to account for boundary completion through vertices with arbitrary numbers of orientations, in a manner consistent with psychophysical observations, by way of harmonic resonance in a neural architecture.

Relevance:

20.00%

Publisher:

Abstract:

Visual search data are given a unified quantitative explanation by a model of how spatial maps in the parietal cortex and object recognition categories in the inferotemporal cortex deploy attentional resources as they reciprocally interact with visual representations in the prestriate cortex. The model visual representations are organized into multiple boundary and surface representations. Visual search in the model is initiated by organizing multiple items that lie within a given boundary or surface representation into a candidate search grouping. These items are compared with object recognition categories to test for matches or mismatches. Mismatches can trigger deeper searches and recursive selection of new groupings until a target object is identified. This search model is algorithmically specified to quantitatively simulate search data using a single set of parameters, as well as to qualitatively explain a still larger database, including data of Aks and Enns (1992), Bravo and Blake (1990), Chelazzi, Miller, Duncan, and Desimone (1993), Egeth, Virzi, and Garbart (1984), Cohen and Ivry (1991), Enns and Rensink (1990), He and Nakayama (1992), Humphreys, Quinlan, and Riddoch (1989), Mordkoff, Yantis, and Egeth (1990), Nakayama and Silverman (1986), Treisman and Gelade (1980), Treisman and Sato (1990), Wolfe, Cave, and Franzel (1989), and Wolfe and Friedman-Hill (1992). The model hereby provides an alternative to recent variations on the Feature Integration and Guided Search models, and grounds the analysis of visual search in neural models of preattentive vision, attentive object learning and categorization, and attentive spatial localization and orientation.