944 resultados para 3D object detection


Relevância:

90.00% 90.00%

Publicador:

Resumo:

The world's largest fossil oyster reef, formed by the giant oyster Crassostrea gryphoides and located in Stetten (north of Vienna, Austria) is studied by Harzhauser et al., 2015, 2016; Djuricic et al., 2016. Digital documentation of the unique geological site is provided by terrestrial laser scanning (TLS) at the millimeter scale. Obtaining meaningful results is not merely a matter of data acquisition with a suitable device; it requires proper planning, data management, and postprocessing. Terrestrial laser scanning technology has a high potential for providing precise 3D mapping that serves as the basis for automatic object detection in different scenarios; however, it faces challenges in the presence of large amounts of data and the irregular geometry of an oyster reef. We provide a detailed description of the techniques and strategy used for data collection and processing in Djuricic et al., 2016. The use of laser scanning provided the ability to measure surface points of 46,840 (estimated) shells. They are up to 60-cm-long oyster specimens, and their surfaces are modeled with a high accuracy of 1 mm. In addition to laser scanning measurements, more than 300 photographs were captured, and an orthophoto mosaic was generated with a ground sampling distance (GSD) of 0.5 mm. This high-resolution 3D information and the photographic texture serve as the basis for ongoing and future geological and paleontological analyses. Moreover, they provide unprecedented documentation for conservation issues at a unique natural heritage site.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Extraction and reconstruction of rectal wall structures from an ultrasound image is helpful for surgeons in rectal clinical diagnosis and 3-D reconstruction of rectal structures from ultrasound images. The primary task is to extract the boundary of the muscular layers on the rectal wall. However, due to the low SNR from ultrasound imaging and the thin muscular layer structure of the rectum, this boundary detection task remains a challenge. An active contour model is an effective high-level model, which has been used successfully to aid the tasks of object representation and recognition in many image-processing applications. We present a novel multigradient field active contour algorithm with an extended ability for multiple-object detection, which overcomes some limitations of ordinary active contour models—"snakes." The core part in the algorithm is the proposal of multigradient vector fields, which are used to replace image forces in kinetic function for alternative constraints on the deformation of active contour, thereby partially solving the initialization limitation of active contour for rectal wall boundary detection. An adaptive expanding force is also added to the model to help the active contour go through the homogenous region in the image. The efficacy of the model is explained and tested on the boundary detection of a ring-shaped image, a synthetic image, and an ultrasound image. The experimental results show that the proposed multigradient field-active contour is feasible for multilayer boundary detection of rectal wall

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A real-time three-dimensional (3D) object sensing and reconstruction scheme is presented that can be applied on any arbitrary corporeal shape. Operation is demonstrated on several calibrated objects. The system uses curvature sensors based upon in-line fiber Bragg gratings encapsulated in a low-temperature curing synthetic silicone. New methods to quantitatively evaluate the performance of a 3D object-sensing scheme are developed and appraised. It is shown that the sensing scheme yields a volumetric error of 1% to 9%, depending on the object.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We present a video-based system which interactively captures the geometry of a 3D object in the form of a point cloud, then recognizes and registers known objects in this point cloud in a matter of seconds (fig. 1). In order to achieve interactive speed, we exploit both efficient inference algorithms and parallel computation, often on a GPU. The system can be broken down into two distinct phases: geometry capture, and object inference. We now discuss these in further detail. © 2011 IEEE.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper addresses the problem of automatically obtaining the object/background segmentation of a rigid 3D object observed in a set of images that have been calibrated for camera pose and intrinsics. Such segmentations can be used to obtain a shape representation of a potentially texture-less object by computing a visual hull. We propose an automatic approach where the object to be segmented is identified by the pose of the cameras instead of user input such as 2D bounding rectangles or brush-strokes. The key behind our method is a pairwise MRF framework that combines (a) foreground/background appearance models, (b) epipolar constraints and (c) weak stereo correspondence into a single segmentation cost function that can be efficiently solved by Graph-cuts. The segmentation thus obtained is further improved using silhouette coherency and then used to update the foreground/background appearance models which are fed into the next Graph-cut computation. These two steps are iterated until segmentation convergences. Our method can automatically provide a 3D surface representation even in texture-less scenes where MVS methods might fail. Furthermore, it confers improved performance in images where the object is not readily separable from the background in colour space, an area that previous segmentation approaches have found challenging. © 2011 IEEE.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This work explores the use of statistical methods in describing and estimating camera poses, as well as the information feedback loop between camera pose and object detection. Surging development in robotics and computer vision has pushed the need for algorithms that infer, understand, and utilize information about the position and orientation of the sensor platforms when observing and/or interacting with their environment.

The first contribution of this thesis is the development of a set of statistical tools for representing and estimating the uncertainty in object poses. A distribution for representing the joint uncertainty over multiple object positions and orientations is described, called the mirrored normal-Bingham distribution. This distribution generalizes both the normal distribution in Euclidean space, and the Bingham distribution on the unit hypersphere. It is shown to inherit many of the convenient properties of these special cases: it is the maximum-entropy distribution with fixed second moment, and there is a generalized Laplace approximation whose result is the mirrored normal-Bingham distribution. This distribution and approximation method are demonstrated by deriving the analytical approximation to the wrapped-normal distribution. Further, it is shown how these tools can be used to represent the uncertainty in the result of a bundle adjustment problem.

Another application of these methods is illustrated as part of a novel camera pose estimation algorithm based on object detections. The autocalibration task is formulated as a bundle adjustment problem using prior distributions over the 3D points to enforce the objects' structure and their relationship with the scene geometry. This framework is very flexible and enables the use of off-the-shelf computational tools to solve specialized autocalibration problems. Its performance is evaluated using a pedestrian detector to provide head and foot location observations, and it proves much faster and potentially more accurate than existing methods.

Finally, the information feedback loop between object detection and camera pose estimation is closed by utilizing camera pose information to improve object detection in scenarios with significant perspective warping. Methods are presented that allow the inverse perspective mapping traditionally applied to images to be applied instead to features computed from those images. For the special case of HOG-like features, which are used by many modern object detection systems, these methods are shown to provide substantial performance benefits over unadapted detectors while achieving real-time frame rates, orders of magnitude faster than comparable image warping methods.

The statistical tools and algorithms presented here are especially promising for mobile cameras, providing the ability to autocalibrate and adapt to the camera pose in real time. In addition, these methods have wide-ranging potential applications in diverse areas of computer vision, robotics, and imaging.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Current state of the art techniques for landmine detection in ground penetrating radar (GPR) utilize statistical methods to identify characteristics of a landmine response. This research makes use of 2-D slices of data in which subsurface landmine responses have hyperbolic shapes. Various methods from the field of visual image processing are adapted to the 2-D GPR data, producing superior landmine detection results. This research goes on to develop a physics-based GPR augmentation method motivated by current advances in visual object detection. This GPR specific augmentation is used to mitigate issues caused by insufficient training sets. This work shows that augmentation improves detection performance under training conditions that are normally very difficult. Finally, this work introduces the use of convolutional neural networks as a method to learn feature extraction parameters. These learned convolutional features outperform hand-designed features in GPR detection tasks. This work presents a number of methods, both borrowed from and motivated by the substantial work in visual image processing. The methods developed and presented in this work show an improvement in overall detection performance and introduce a method to improve the robustness of statistical classification.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Objective
Pedestrian detection under video surveillance systems has always been a hot topic in computer vision research. These systems are widely used in train stations, airports, large commercial plazas, and other public places. However, pedestrian detection remains difficult because of complex backgrounds. Given its development in recent years, the visual attention mechanism has attracted increasing attention in object detection and tracking research, and previous studies have achieved substantial progress and breakthroughs. We propose a novel pedestrian detection method based on the semantic features under the visual attention mechanism.
Method
The proposed semantic feature-based visual attention model is a spatial-temporal model that consists of two parts: the static visual attention model and the motion visual attention model. The static visual attention model in the spatial domain is constructed by combining bottom-up with top-down attention guidance. Based on the characteristics of pedestrians, the bottom-up visual attention model of Itti is improved by intensifying the orientation vectors of elementary visual features to make the visual saliency map suitable for pedestrian detection. In terms of pedestrian attributes, skin color is selected as a semantic feature for pedestrian detection. The regional and Gaussian models are adopted to construct the skin color model. Skin feature-based visual attention guidance is then proposed to complete the top-down process. The bottom-up and top-down visual attentions are linearly combined using the proper weights obtained from experiments to construct the static visual attention model in the spatial domain. The spatial-temporal visual attention model is then constructed via the motion features in the temporal domain. Based on the static visual attention model in the spatial domain, the frame difference method is combined with optical flowing to detect motion vectors. Filtering is applied to process the field of motion vectors. The saliency of motion vectors can be evaluated via motion entropy to make the selected motion feature more suitable for the spatial-temporal visual attention model.
Result
Standard datasets and practical videos are selected for the experiments. The experiments are performed on a MATLAB R2012a platform. The experimental results show that our spatial-temporal visual attention model demonstrates favorable robustness under various scenes, including indoor train station surveillance videos and outdoor scenes with swaying leaves. Our proposed model outperforms the visual attention model of Itti, the graph-based visual saliency model, the phase spectrum of quaternion Fourier transform model, and the motion channel model of Liu in terms of pedestrian detection. The proposed model achieves a 93% accuracy rate on the test video.
Conclusion
This paper proposes a novel pedestrian method based on the visual attention mechanism. A spatial-temporal visual attention model that uses low-level and semantic features is proposed to calculate the saliency map. Based on this model, the pedestrian targets can be detected through focus of attention shifts. The experimental results verify the effectiveness of the proposed attention model for detecting pedestrians.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Hand detection on images has important applications on person activities recognition. This thesis focuses on PASCAL Visual Object Classes (VOC) system for hand detection. VOC has become a popular system for object detection, based on twenty common objects, and has been released with a successful deformable parts model in VOC2007. A hand detection on an image is made when the system gets a bounding box which overlaps with at least 50% of any ground truth bounding box for a hand on the image. The initial average precision of this detector is around 0.215 compared with a state-of-art of 0.104; however, color and frequency features for detected bounding boxes contain important information for re-scoring, and the average precision can be improved to 0.218 with these features. Results show that these features help on getting higher precision for low recall, even though the average precision is similar.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The first mechanical Automaton concept was found in a Chinese text written in the 3rd century BC, while Computer Vision was born in the late 1960s. Therefore, visual perception applied to machines (i.e. the Machine Vision) is a young and exciting alliance. When robots came in, the new field of Robotic Vision was born, and these terms began to be erroneously interchanged. In short, we can say that Machine Vision is an engineering domain, which concern the industrial use of Vision. The Robotic Vision, instead, is a research field that tries to incorporate robotics aspects in computer vision algorithms. Visual Servoing, for example, is one of the problems that cannot be solved by computer vision only. Accordingly, a large part of this work deals with boosting popular Computer Vision techniques by exploiting robotics: e.g. the use of kinematics to localize a vision sensor, mounted as the robot end-effector. The remainder of this work is dedicated to the counterparty, i.e. the use of computer vision to solve real robotic problems like grasping objects or navigate avoiding obstacles. Will be presented a brief survey about mapping data structures most widely used in robotics along with SkiMap, a novel sparse data structure created both for robotic mapping and as a general purpose 3D spatial index. Thus, several approaches to implement Object Detection and Manipulation, by exploiting the aforementioned mapping strategies, will be proposed, along with a completely new Machine Teaching facility in order to simply the training procedure of modern Deep Learning networks.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Correctness of information gathered in production environments is an essential part of quality assurance processes in many industries, this task is often performed by human resources who visually take annotations in various steps of the production flow. Depending on the performed task the correlation between where exactly the information is gathered and what it represents is more than often lost in the process. The lack of labeled data places a great boundary on the application of deep neural networks aimed at object detection tasks, moreover supervised training of deep models requires a great amount of data to be available. Reaching an adequate large collection of labeled images through classic techniques of data annotations is an exhausting and costly task to perform, not always suitable for every scenario. A possible solution is to generate synthetic data that replicates the real one and use it to fine-tune a deep neural network trained on one or more source domains to a different target domain. The purpose of this thesis is to show a real case scenario where the provided data were both in great scarcity and missing the required annotations. Sequentially a possible approach is presented where synthetic data has been generated to address those issues while standing as a training base of deep neural networks for object detection, capable of working on images taken in production-like environments. Lastly, it compares performance on different types of synthetic data and convolutional neural networks used as backbones for the model.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

L'elaborato consiste in uno studio dello stato dell'arte della visual relationship detection. Partendo da una breve introduzione riguardante l'object detection e i privi lavori in cui sono state utilizzate le relazioni verranno affrontati i principali metodi utilizzati per risolvere il problema.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A instalação de sistemas de videovigilância, no interior ou exterior, em locais como aeroportos, centros comerciais, escritórios, edifícios estatais, bases militares ou casas privadas tem o intuito de auxiliar na tarefa de monitorização do local contra eventuais intrusos. Com estes sistemas é possível realizar a detecção e o seguimento das pessoas que se encontram no ambiente local, tornando a monitorização mais eficiente. Neste contexto, as imagens típicas (imagem natural e imagem infravermelha) são utilizadas para extrair informação dos objectos detectados e que irão ser seguidos. Contudo, as imagens convencionais são afectadas por condições ambientais adversas como o nível de luminosidade existente no local (luzes muito fortes ou escuridão total), a presença de chuva, de nevoeiro ou de fumo que dificultam a tarefa de monitorização das pessoas. Deste modo, tornou‐se necessário realizar estudos e apresentar soluções que aumentem a eficácia dos sistemas de videovigilância quando sujeitos a condições ambientais adversas, ou seja, em ambientes não controlados, sendo uma das soluções a utilização de imagens termográficas nos sistemas de videovigilância. Neste documento são apresentadas algumas das características das câmaras e imagens termográficas, assim como uma caracterização de cenários de vigilância. Em seguida, são apresentados resultados provenientes de um algoritmo que permite realizar a segmentação de pessoas utilizando imagens termográficas. O maior foco desta dissertação foi na análise dos modelos de descrição (Histograma de Cor, HOG, SIFT, SURF) para determinar o desempenho dos modelos em três casos: distinguir entre uma pessoa e um carro; distinguir entre duas pessoas distintas e determinar que é a mesma pessoa ao longo de uma sequência. De uma forma sucinta pretendeu‐se, com este estudo, contribuir para uma melhoria dos algoritmos de detecção e seguimento de objectos em sequências de vídeo de imagens termográficas. No final, através de uma análise dos resultados provenientes dos modelos de descrição, serão retiradas conclusões que servirão de indicação sobre qual o modelo que melhor permite discriminar entre objectos nas imagens termográficas.