950 resultados para Computer Vision for Robotics and Automation
Resumo:
Partial occlusions are commonplace in a variety of real world computer vision applications: surveillance, intelligent environments, assistive robotics, autonomous navigation, etc. While occlusion handling methods have been proposed, most methods tend to break down when confronted with numerous occluders in a scene. In this paper, a layered image-plane representation for tracking people through substantial occlusions is proposed. An image-plane representation of motion around an object is associated with a pre-computed graphical model, which can be instantiated efficiently during online tracking. A global state and observation space is obtained by linking transitions between layers. A Reversible Jump Markov Chain Monte Carlo approach is used to infer the number of people and track them online. The method outperforms two state-of-the-art methods for tracking over extended occlusions, given videos of a parking lot with numerous vehicles and a laboratory with many desks and workstations.
Resumo:
Accurate head tilt detection has a large potential to aid people with disabilities in the use of human-computer interfaces and provide universal access to communication software. We show how it can be utilized to tab through links on a web page or control a video game with head motions. It may also be useful as a correction method for currently available video-based assistive technology that requires upright facial poses. Few of the existing computer vision methods that detect head rotations in and out of the image plane with reasonable accuracy can operate within the context of a real-time communication interface because the computational expense that they incur is too great. Our method uses a variety of metrics to obtain a robust head tilt estimate without incurring the computational cost of previous methods. Our system runs in real time on a computer with a 2.53 GHz processor, 256 MB of RAM and an inexpensive webcam, using only 55% of the processor cycles.
Resumo:
Many people suffer from conditions that lead to deterioration of motor control and makes access to the computer using traditional input devices difficult. In particular, they may loose control of hand movement to the extent that the standard mouse cannot be used as a pointing device. Most current alternatives use markers or specialized hardware to track and translate a user's movement to pointer movement. These approaches may be perceived as intrusive, for example, wearable devices. Camera-based assistive systems that use visual tracking of features on the user's body often require cumbersome manual adjustment. This paper introduces an enhanced computer vision based strategy where features, for example on a user's face, viewed through an inexpensive USB camera, are tracked and translated to pointer movement. The main contributions of this paper are (1) enhancing a video based interface with a mechanism for mapping feature movement to pointer movement, which allows users to navigate to all areas of the screen even with very limited physical movement, and (2) providing a customizable, hierarchical navigation framework for human computer interaction (HCI). This framework provides effective use of the vision-based interface system for accessing multiple applications in an autonomous setting. Experiments with several users show the effectiveness of the mapping strategy and its usage within the application framework as a practical tool for desktop users with disabilities.
Resumo:
Nearest neighbor retrieval is the task of identifying, given a database of objects and a query object, the objects in the database that are the most similar to the query. Retrieving nearest neighbors is a necessary component of many practical applications, in fields as diverse as computer vision, pattern recognition, multimedia databases, bioinformatics, and computer networks. At the same time, finding nearest neighbors accurately and efficiently can be challenging, especially when the database contains a large number of objects, and when the underlying distance measure is computationally expensive. This thesis proposes new methods for improving the efficiency and accuracy of nearest neighbor retrieval and classification in spaces with computationally expensive distance measures. The proposed methods are domain-independent, and can be applied in arbitrary spaces, including non-Euclidean and non-metric spaces. In this thesis particular emphasis is given to computer vision applications related to object and shape recognition, where expensive non-Euclidean distance measures are often needed to achieve high accuracy. The first contribution of this thesis is the BoostMap algorithm for embedding arbitrary spaces into a vector space with a computationally efficient distance measure. Using this approach, an approximate set of nearest neighbors can be retrieved efficiently - often orders of magnitude faster than retrieval using the exact distance measure in the original space. The BoostMap algorithm has two key distinguishing features with respect to existing embedding methods. First, embedding construction explicitly maximizes the amount of nearest neighbor information preserved by the embedding. Second, embedding construction is treated as a machine learning problem, in contrast to existing methods that are based on geometric considerations. The second contribution is a method for constructing query-sensitive distance measures for the purposes of nearest neighbor retrieval and classification. In high-dimensional spaces, query-sensitive distance measures allow for automatic selection of the dimensions that are the most informative for each specific query object. It is shown theoretically and experimentally that query-sensitivity increases the modeling power of embeddings, allowing embeddings to capture a larger amount of the nearest neighbor structure of the original space. The third contribution is a method for speeding up nearest neighbor classification by combining multiple embedding-based nearest neighbor classifiers in a cascade. In a cascade, computationally efficient classifiers are used to quickly classify easy cases, and classifiers that are more computationally expensive and also more accurate are only applied to objects that are harder to classify. An interesting property of the proposed cascade method is that, under certain conditions, classification time actually decreases as the size of the database increases, a behavior that is in stark contrast to the behavior of typical nearest neighbor classification systems. The proposed methods are evaluated experimentally in several different applications: hand shape recognition, off-line character recognition, online character recognition, and efficient retrieval of time series. In all datasets, the proposed methods lead to significant improvements in accuracy and efficiency compared to existing state-of-the-art methods. In some datasets, the general-purpose methods introduced in this thesis even outperform domain-specific methods that have been custom-designed for such datasets.
Resumo:
CONFIGR (CONtour FIgure GRound) is a computational model based on principles of biological vision that completes sparse and noisy image figures. Within an integrated vision/recognition system, CONFIGR posits an initial recognition stage which identifies figure pixels from spatially local input information. The resulting, and typically incomplete, figure is fed back to the “early vision” stage for long-range completion via filling-in. The reconstructed image is then re-presented to the recognition system for global functions such as object recognition. In the CONFIGR algorithm, the smallest independent image unit is the visible pixel, whose size defines a computational spatial scale. Once pixel size is fixed, the entire algorithm is fully determined, with no additional parameter choices. Multi-scale simulations illustrate the vision/recognition system. Open-source CONFIGR code is available online, but all examples can be derived analytically, and the design principles applied at each step are transparent. The model balances filling-in as figure against complementary filling-in as ground, which blocks spurious figure completions. Lobe computations occur on a subpixel spatial scale. Originally designed to fill-in missing contours in an incomplete image such as a dashed line, the same CONFIGR system connects and segments sparse dots, and unifies occluded objects from pieces locally identified as figure in the initial recognition stage. The model self-scales its completion distances, filling-in across gaps of any length, where unimpeded, while limiting connections among dense image-figure pixel groups that already have intrinsic form. Long-range image completion promises to play an important role in adaptive processors that reconstruct images from highly compressed video and still camera images.
Resumo:
A neural model is proposed of how laminar interactions in the visual cortex may learn and recognize object texture and form boundaries. The model brings together five interacting processes: region-based texture classification, contour-based boundary grouping, surface filling-in, spatial attention, and object attention. The model shows how form boundaries can determine regions in which surface filling-in occurs; how surface filling-in interacts with spatial attention to generate a form-fitting distribution of spatial attention, or attentional shroud; how the strongest shroud can inhibit weaker shrouds; and how the winning shroud regulates learning of texture categories, and thus the allocation of object attention. The model can discriminate abutted textures with blurred boundaries and is sensitive to texture boundary attributes like discontinuities in orientation and texture flow curvature as well as to relative orientations of texture elements. The model quantitatively fits a large set of human psychophysical data on orientation-based textures. Object boundar output of the model is compared to computer vision algorithms using a set of human segmented photographic images. The model classifies textures and suppresses noise using a multiple scale oriented filterbank and a distributed Adaptive Resonance Theory (dART) classifier. The matched signal between the bottom-up texture inputs and top-down learned texture categories is utilized by oriented competitive and cooperative grouping processes to generate texture boundaries that control surface filling-in and spatial attention. Topdown modulatory attentional feedback from boundary and surface representations to early filtering stages results in enhanced texture boundaries and more efficient learning of texture within attended surface regions. Surface-based attention also provides a self-supervising training signal for learning new textures. Importance of the surface-based attentional feedback in texture learning and classification is tested using a set of textured images from the Brodatz micro-texture album. Benchmark studies vary from 95.1% to 98.6% with attention, and from 90.6% to 93.2% without attention.
Resumo:
This dissertation proposes and demonstrates novel smart modules to solve challenging problems in the areas of imaging, communications, and displays. The smartness of the modules is due to their ability to be able to adapt to changes in operating environment and application using programmable devices, specifically, electronically variable focus lenses (ECVFLs) and digital micromirror devices (DMD). The proposed modules include imagers for laser characterization and general purpose imaging which smartly adapt to changes in irradiance, optical wireless communication systems which can adapt to the number of users and to changes in link length, and a smart laser projection display that smartly adjust the pixel size to achieve a high resolution projected image at each screen distance. The first part of the dissertation starts with the proposal of using an ECVFL to create a novel multimode laser beam characterizer for coherent light. This laser beam characterizer uses the ECVFL and a DMD so that no mechanical motion of optical components along the optical axis is required. This reduces the mechanical motion overhead that traditional laser beam characterizers have, making this laser beam characterizer more accurate and reliable. The smart laser beam characterizer is able to account for irradiance fluctuations in the source. Using image processing, the important parameters that describe multimode laser beam propagation have been successfully extracted for a multi-mode laser test source. Specifically, the laser beam analysis parameters measured are the M2 parameter, w0 the minimum beam waist, and zR the Rayleigh range. Next a general purpose incoherent light imager that has a high dynamic range (>100 dB) and automatically adjusts for variations in irradiance in the scene is proposed. Then a data efficient image sensor is demonstrated. The idea of this smart image sensor is to reduce the bandwidth needed for transmitting data from the sensor by only sending the information which is required for the specific application while discarding the unnecessary data. In this case, the imager demonstrated sends only information regarding the boundaries of objects in the image so that after transmission to a remote image viewing location, these boundaries can be used to map out objects in the original image. The second part of the dissertation proposes and demonstrates smart optical communications systems using ECVFLs. This starts with the proposal and demonstration of a zero propagation loss optical wireless link using visible light with experiments covering a 1 to 4 m range. By adjusting the focal length of the ECVFLs for this directed line-of-sight link (LOS) the laser beam propagation parameters are adjusted such that the maximum amount of transmitted optical power is captured by the receiver for each link length. This power budget saving enables a longer achievable link range, a better SNR/BER, or higher power efficiency since more received power means the transmitted power can be reduced. Afterwards, a smart dual mode optical wireless link is proposed and demonstrated using a laser and LED coupled to the ECVFL to provide for the first time features of high bandwidths and wide beam coverage. This optical wireless link combines the capabilities of smart directed LOS link from the previous section with a diffuse optical wireless link, thus achieving high data rates and robustness to blocking. The proposed smart system can switch from LOS mode to Diffuse mode when blocking occurs or operate in both modes simultaneously to accommodate multiple users and operate a high speed link if one of the users requires extra bandwidth. The last part of this section presents the design of fibre optic and free-space optical switches which use ECVFLs to deflect the beams to achieve switching operation. These switching modules can be used in the proposed optical wireless indoor network. The final section of the thesis presents a novel smart laser scanning display. The ECVFL is used to create the smallest beam spot size possible for the system designed at the distance of the screen. The smart laser scanning display increases the spatial resoluti on of the display for any given distance. A basic smart display operation has been tested for red light and a 4X improvement in pixel resolution for the image has been demonstrated.
Resumo:
Gemstone Team MICE (Modifying and Improving Computer Ergonomics)
Resumo:
This paper presents a framework to integrate requirements management and design knowledge reuse. The research approach begins with a literature review in design reuse and requirements management to identify appropriate methods within each domain. A framework is proposed based on the identified requirements. The framework is then demonstrated using a case study example: vacuum pump design. Requirements are presented as a component of the integrated design knowledge framework. The proposed framework enables the application of requirements management as a dynamic process, including capture, analysis and recording of requirements. It takes account of the evolving requirements and the dynamic nature of the interaction between requirements and product structure through the various stages of product development.
Resumo:
Quantitative examination of prostate histology offers clues in the diagnostic classification of lesions and in the prediction of response to treatment and prognosis. To facilitate the collection of quantitative data, the development of machine vision systems is necessary. This study explored the use of imaging for identifying tissue abnormalities in prostate histology. Medium-power histological scenes were recorded from whole-mount radical prostatectomy sections at × 40 objective magnification and assessed by a pathologist as exhibiting stroma, normal tissue (nonneoplastic epithelial component), or prostatic carcinoma (PCa). A machine vision system was developed that divided the scenes into subregions of 100 × 100 pixels and subjected each to image-processing techniques. Analysis of morphological characteristics allowed the identification of normal tissue. Analysis of image texture demonstrated that Haralick feature 4 was the most suitable for discriminating stroma from PCa. Using these morphological and texture measurements, it was possible to define a classification scheme for each subregion. The machine vision system is designed to integrate these classification rules and generate digital maps of tissue composition from the classification of subregions; 79.3% of subregions were correctly classified. Established classification rates have demonstrated the validity of the methodology on small scenes; a logical extension was to apply the methodology to whole slide images via scanning technology. The machine vision system is capable of classifying these images. The machine vision system developed in this project facilitates the exploration of morphological and texture characteristics in quantifying tissue composition. It also illustrates the potential of quantitative methods to provide highly discriminatory information in the automated identification of prostatic lesions using computer vision.
Resumo:
The grading of crushed aggregate is carried out usually by sieving. We describe a new image-based approach to the automatic grading of such materials. The operational problem addressed is where the camera is located directly over a conveyor belt. Our approach characterizes the information content of each image, taking into account relative variation in the pixel data, and resolution scale. In feature space, we find very good class separation using a multidimensional linear classifier. The innovation in this work includes (i) introducing an effective image-based approach into this application area, and (ii) our supervised classification using wavelet entropy-based features.
Resumo:
Feature selection and feature weighting are useful techniques for improving the classification accuracy of K-nearest-neighbor (K-NN) rule. The term feature selection refers to algorithms that select the best subset of the input feature set. In feature weighting, each feature is multiplied by a weight value proportional to the ability of the feature to distinguish pattern classes. In this paper, a novel hybrid approach is proposed for simultaneous feature selection and feature weighting of K-NN rule based on Tabu Search (TS) heuristic. The proposed TS heuristic in combination with K-NN classifier is compared with several classifiers on various available data sets. The results have indicated a significant improvement in the performance in classification accuracy. The proposed TS heuristic is also compared with various feature selection algorithms. Experiments performed revealed that the proposed hybrid TS heuristic is superior to both simple TS and sequential search algorithms. We also present results for the classification of prostate cancer using multispectral images, an important problem in biomedicine.
Resumo:
Grey Level Co-occurrence Matrix (GLCM), one of the best known tool for texture analysis, estimates image properties related to second-order statistics. These image properties commonly known as Haralick texture features can be used for image classification, image segmentation, and remote sensing applications. However, their computations are highly intensive especially for very large images such as medical ones. Therefore, methods to accelerate their computations are highly desired. This paper proposes the use of programmable hardware to accelerate the calculation of GLCM and Haralick texture features. Further, as an example of the speedup offered by programmable logic, a multispectral computer vision system for automatic diagnosis of prostatic cancer has been implemented. The performance is then compared against a microprocessor based solution.