3 results for Spatiotemporal shaping

in Boston University Digital Common


Relevance:

20.00%

Abstract:

Spotting patterns of interest in an input signal is a very useful task in many different fields including medicine, bioinformatics, economics, speech recognition and computer vision. Example instances of this problem include spotting an object of interest in an image (e.g., a tumor), a pattern of interest in a time-varying signal (e.g., audio analysis), or an object of interest moving in a specific way (e.g., a human's body gesture). Traditional spotting methods, which are based on Dynamic Time Warping or hidden Markov models, use some variant of dynamic programming to register the pattern and the input while accounting for temporal variation between them. At the same time, those methods often suffer from several shortcomings: they may give meaningless solutions when input observations are unreliable or ambiguous, they require a high complexity search across the whole input signal, and they may give incorrect solutions if some patterns appear as smaller parts within other patterns. In this thesis, we develop a framework that addresses these three problems, and evaluate the framework's performance in spotting and recognizing hand gestures in video. The first contribution is a spatiotemporal matching algorithm that extends the dynamic programming formulation to accommodate multiple candidate hand detections in every video frame. The algorithm finds the best alignment between the gesture model and the input, and simultaneously locates the best candidate hand detection in every frame. This allows for a gesture to be recognized even when the hand location is highly ambiguous. The second contribution is a pruning method that uses model-specific classifiers to reject dynamic programming hypotheses with a poor match between the input and model. Pruning improves the efficiency of the spatiotemporal matching algorithm, and in some cases may improve the recognition accuracy. 
The pruning classifiers are learned from training data, and cross-validation is used to reduce the chance of overpruning. The third contribution is a subgesture reasoning process that models the fact that some gesture models can falsely match parts of other, longer gestures. By integrating subgesture reasoning, the spotting algorithm can avoid the premature detection of a subgesture when the longer gesture is actually being performed. Subgesture relations between pairs of gestures are automatically learned from training data. The performance of the approach is evaluated on two challenging video datasets: hand-signed digits gestured by users wearing short-sleeved shirts in front of a cluttered background, and American Sign Language (ASL) utterances gestured by native ASL signers. The experiments demonstrate that the proposed method is more accurate and efficient than competing approaches. The proposed approach can be generally applied to alignment or search problems with multiple input observations that use dynamic programming to find a solution.
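The idea of extending dynamic programming so that each frame contributes several candidate detections can be sketched as follows. This is only an illustrative toy, not the thesis's algorithm: the function name `spatiotemporal_match`, the `jump_weight` parameter, the use of 2D positions as features, and the Euclidean local cost are all assumptions made for the example. It also aligns a model to a whole input sequence and does not perform spotting, pruning, or subgesture reasoning.

```python
import math

def spatiotemporal_match(model, frames, jump_weight=0.0):
    """Align model states to frames while picking one candidate per frame.

    model  : list of (x, y) expected hand positions, one per model state
    frames : list of lists of (x, y) candidate detections, one list per frame
    jump_weight : hypothetical penalty on the distance between the candidates
                  chosen in consecutive frames (0.0 disables it)
    Returns (total_cost, path), where path[j] = (state_index, candidate_index).
    """
    m, n = len(model), len(frames)
    inf = math.inf
    # cost[j][i][k]: best cost of a path ending at frame j, state i, candidate k
    cost = [[[inf] * len(frames[j]) for _ in range(m)] for j in range(n)]
    back = [[[None] * len(frames[j]) for _ in range(m)] for j in range(n)]

    for j in range(n):
        for i in range(m):
            for k, cand in enumerate(frames[j]):
                local = math.dist(model[i], cand)
                if j == 0:
                    # every path starts in the first model state
                    if i == 0:
                        cost[j][i][k] = local
                    continue
                # predecessors: stay in state i or advance from state i - 1,
                # coming from any candidate of the previous frame
                for pi in (i, i - 1):
                    if pi < 0:
                        continue
                    for pk, prev in enumerate(frames[j - 1]):
                        c = (cost[j - 1][pi][pk]
                             + jump_weight * math.dist(prev, cand)
                             + local)
                        if c < cost[j][i][k]:
                            cost[j][i][k] = c
                            back[j][i][k] = (pi, pk)

    # every path must end in the last model state
    k_end = min(range(len(frames[-1])), key=lambda k: cost[n - 1][m - 1][k])
    total = cost[n - 1][m - 1][k_end]
    if total == inf:
        return inf, []  # no valid alignment, e.g. fewer frames than states

    # backtrack to recover the state and the chosen candidate for each frame
    path, i, k = [], m - 1, k_end
    for j in range(n - 1, -1, -1):
        path.append((i, k))
        if j > 0:
            i, k = back[j][i][k]
    path.reverse()
    return total, path


if __name__ == "__main__":
    model = [(0, 0), (1, 0), (2, 0)]
    frames = [[(5, 5), (0, 0)], [(1, 0), (9, 9)], [(7, 7), (1, 0)], [(2, 0), (3, 3)]]
    print(spatiotemporal_match(model, frames))
```

Because the candidate index is part of the dynamic programming state, the alignment and the per-frame choice of detection are resolved jointly, rather than committing to a single detection per frame before matching.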

Relevance:

20.00%

Abstract:

This article describes a nonlinear model of neural processing in the vertebrate retina, comprising model photoreceptors, model push-pull bipolar cells, and model ganglion cells. Previous analyses and simulations have shown that with a choice of parameters that mimics beta cells, the model exhibits X-like linear spatial summation (null response to contrast-reversed gratings) in spite of photoreceptor nonlinearities; on the other hand, a choice of parameters that mimics alpha cells leads to Y-like frequency doubling. This article extends the previous work by showing that the model can replicate qualitatively many of the original findings on X and Y cells with a fixed choice of parameters. The results generally support the hypothesis that X and Y cells can be seen as functional variants of a single neural circuit. The model also suggests that both depolarizing and hyperpolarizing bipolar cells converge onto both ON and OFF ganglion cell types. The push-pull connectivity enables ganglion cells to remain sensitive to deviations about the mean output level of nonlinear photoreceptors. These and other properties of the push-pull model are discussed in the general context of retinal processing of spatiotemporal luminance patterns.
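The two response signatures named above, a null response to a contrast-reversed grating and frequency doubling, can be illustrated with a toy calculation. The sketch below is not the article's push-pull model: it simply compares purely linear spatial summation with a sum of half-wave-rectified local signals, a common textbook stand-in for nonlinear pooling. The function name `grating_responses` and all parameter values are assumptions chosen for the example.

```python
import math

def grating_responses(n_points=201, spatial_freq=2.0, period=1.0, n_steps=40):
    """Responses to a contrast-reversing grating over one temporal period.

    Returns (linear, rectified): two lists with one value per time step.
    """
    # sample positions symmetric about the receptive-field centre
    xs = [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]
    # odd-symmetric spatial pattern, i.e. the grating sits at the "null" phase
    pattern = [math.sin(2 * math.pi * spatial_freq * x) for x in xs]

    linear, rectified = [], []
    for step in range(n_steps):
        t = period * step / n_steps
        # sinusoidal contrast reversal in time
        modulation = math.sin(2 * math.pi * t / period)
        stimulus = [p * modulation for p in pattern]
        # linear summation: positive and negative half-cycles cancel
        linear.append(sum(stimulus) / n_points)
        # rectify each local signal before summing: cancellation is lost
        rectified.append(sum(max(0.0, s) for s in stimulus) / n_points)
    return linear, rectified


if __name__ == "__main__":
    lin, rect = grating_responses()
    print(max(abs(v) for v in lin))
    print(rect[10], rect[30])
```

In this toy, the linear sum stays at zero for every time step, while the rectified sum follows the absolute value of the temporal modulation and therefore repeats twice per stimulus period.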

Relevance:

20.00%

Abstract:

A computational model of visual processing in the vertebrate retina provides a unified explanation of a range of data previously treated by disparate models. Three results are reported here: the model proposes a functional explanation for the primary feed-forward retinal circuit found in vertebrate retinae, it shows how this retinal circuit combines nonlinear adaptation with the desirable properties of linear processing, and it accounts for the origin of parallel transient (nonlinear) and sustained (linear) visual processing streams as simple variants of the same retinal circuit. The retina, owing to its accessibility and to its fundamental role in the initial transduction of light into neural signals, is among the most extensively studied neural structures in the nervous system. Since the pioneering anatomical work by Ramón y Cajal at the turn of the last century[1], technological advances have abetted detailed descriptions of the physiological, pharmacological, and functional properties of many types of retinal cells. However, the relationship between structure and function in the retina is still poorly understood. This article outlines a computational model developed to address fundamental constraints of biological visual systems. Neurons that process nonnegative input signals, such as retinal illuminance, are subject to an inescapable tradeoff between accurate processing in the spatial and temporal domains. Accurate processing in both domains can be achieved with a model that combines nonlinear mechanisms for temporal and spatial adaptation within three layers of feed-forward processing. The resulting architecture is structurally similar to the feed-forward retinal circuit connecting photoreceptors to retinal ganglion cells through bipolar cells. This similarity suggests that the three-layer structure observed in all vertebrate retinae[2] is a required minimal anatomy for accurate spatiotemporal visual processing.
This hypothesis is supported through computer simulations showing that the model's output layer accounts for many properties of retinal ganglion cells[3],[4],[5],[6]. Moreover, the model shows how the retina can extend its dynamic range through nonlinear adaptation while exhibiting seemingly linear behavior in response to a variety of spatiotemporal input stimuli. This property is the basis for the prediction that the same retinal circuit can account for both sustained (X) and transient (Y) cat ganglion cells[7] by simple morphological changes. The ability to generate distinct functional behaviors by simple changes in cell morphology suggests that different functional pathways originating in the retina may have evolved from a unified anatomy designed to cope with the constraints of low-level biological vision.