855 resultados para video object segmentation
Resumo:
In this thesis, we propose to infer pixel-level labelling in video by utilising only object category information, exploiting the intrinsic structure of video data. Our motivation is the observation that image-level labels are much more easily to be acquired than pixel-level labels, and it is natural to find a link between the image level recognition and pixel level classification in video data, which would transfer learned recognition models from one domain to the other one. To this end, this thesis proposes two domain adaptation approaches to adapt the deep convolutional neural network (CNN) image recognition model trained from labelled image data to the target domain exploiting both semantic evidence learned from CNN, and the intrinsic structures of unlabelled video data. Our proposed approaches explicitly model and compensate for the domain adaptation from the source domain to the target domain which in turn underpins a robust semantic object segmentation method for natural videos. We demonstrate the superior performance of our methods by presenting extensive evaluations on challenging datasets comparing with the state-of-the-art methods.
Resumo:
Pedicle screw insertion technique has made revolution in the surgical treatment of spinal fractures and spinal disorders. Although X- ray fluoroscopy based navigation is popular, there is risk of prolonged exposure to X- ray radiation. Systems that have lower radiation risk are generally quite expensive. The position and orientation of the drill is clinically very important in pedicle screw fixation. In this paper, the position and orientation of the marker on the drill is determined using pattern recognition based methods, using geometric features, obtained from the input video sequence taken from CCD camera. A search is then performed on the video frames after preprocessing, to obtain the exact position and orientation of the drill. An animated graphics, showing the instantaneous position and orientation of the drill is then overlaid on the processed video for real time drill control and navigation
Resumo:
This paper addresses the problem of automatically obtaining the object/background segmentation of a rigid 3D object observed in a set of images that have been calibrated for camera pose and intrinsics. Such segmentations can be used to obtain a shape representation of a potentially texture-less object by computing a visual hull. We propose an automatic approach where the object to be segmented is identified by the pose of the cameras instead of user input such as 2D bounding rectangles or brush-strokes. The key behind our method is a pairwise MRF framework that combines (a) foreground/background appearance models, (b) epipolar constraints and (c) weak stereo correspondence into a single segmentation cost function that can be efficiently solved by Graph-cuts. The segmentation thus obtained is further improved using silhouette coherency and then used to update the foreground/background appearance models which are fed into the next Graph-cut computation. These two steps are iterated until segmentation convergences. Our method can automatically provide a 3D surface representation even in texture-less scenes where MVS methods might fail. Furthermore, it confers improved performance in images where the object is not readily separable from the background in colour space, an area that previous segmentation approaches have found challenging. © 2011 IEEE.
Resumo:
Our objective for this thesis work was the deployment of a Neural Network based approach for video object detection on board a nano-drone. Furthermore, we have studied some possible extensions to exploit the temporal nature of videos to improve the detection capabilities of our algorithm. For our project, we have utilized the Mobilenetv2/v3SSDLite due to their limited computational and memory requirements. We have trained our networks on the IMAGENET VID 2015 dataset and to deploy it onto the nano-drone we have used the NNtool and Autotiler tools by GreenWaves. To exploit the temporal nature of video data we have tried different approaches: the introduction of an LSTM based convolutional layer in our architecture, the introduction of a Kalman filter based tracker as a postprocessing step to augment the results of our base architecture. We have obtain a total improvement in our performances of about 2.5 mAP with the Kalman filter based method(BYTE). Our detector run on a microcontroller class processor on board the nano-drone at 1.63 fps.
Resumo:
Nel TCR - Termina container Ravenna, è importante che nel momento di scarico del container sul camion non siano presenti persone nell’area. In questo elaborato si descrive la realizzazione e il funzionamento di un sistema di allarme automatico, in grado di rilevare persone ed eventualmente interrompere la procedura di scarico del container. Tale sistema si basa sulla tecnica della object segmentation tramite rimozione dello sfondo, a cui viene affiancata una classificazione e rimozione delle eventuali ombre con un metodo cromatico. Inoltre viene identificata la possibile testa di una persona e avendo a disposizione due telecamere, si mette in atto una visione binoculare per calcolarne l’altezza. Infine, viene presa in considerazione anche la dinamica del sistema, per cui la classificazione di una persona si può basare sulla grandezza, altezza e velocità dell’oggetto individuato.
Resumo:
A instalação de sistemas de videovigilância, no interior ou exterior, em locais como aeroportos, centros comerciais, escritórios, edifícios estatais, bases militares ou casas privadas tem o intuito de auxiliar na tarefa de monitorização do local contra eventuais intrusos. Com estes sistemas é possível realizar a detecção e o seguimento das pessoas que se encontram no ambiente local, tornando a monitorização mais eficiente. Neste contexto, as imagens típicas (imagem natural e imagem infravermelha) são utilizadas para extrair informação dos objectos detectados e que irão ser seguidos. Contudo, as imagens convencionais são afectadas por condições ambientais adversas como o nível de luminosidade existente no local (luzes muito fortes ou escuridão total), a presença de chuva, de nevoeiro ou de fumo que dificultam a tarefa de monitorização das pessoas. Deste modo, tornou‐se necessário realizar estudos e apresentar soluções que aumentem a eficácia dos sistemas de videovigilância quando sujeitos a condições ambientais adversas, ou seja, em ambientes não controlados, sendo uma das soluções a utilização de imagens termográficas nos sistemas de videovigilância. Neste documento são apresentadas algumas das características das câmaras e imagens termográficas, assim como uma caracterização de cenários de vigilância. Em seguida, são apresentados resultados provenientes de um algoritmo que permite realizar a segmentação de pessoas utilizando imagens termográficas. O maior foco desta dissertação foi na análise dos modelos de descrição (Histograma de Cor, HOG, SIFT, SURF) para determinar o desempenho dos modelos em três casos: distinguir entre uma pessoa e um carro; distinguir entre duas pessoas distintas e determinar que é a mesma pessoa ao longo de uma sequência. De uma forma sucinta pretendeu‐se, com este estudo, contribuir para uma melhoria dos algoritmos de detecção e seguimento de objectos em sequências de vídeo de imagens termográficas. No final, através de uma análise dos resultados provenientes dos modelos de descrição, serão retiradas conclusões que servirão de indicação sobre qual o modelo que melhor permite discriminar entre objectos nas imagens termográficas.
Resumo:
Image segmentation is the process of subdiving an image into constituent regions or objects that have similar features. In video segmentation, more than subdividing the frames in object that have similar features, there is a consistency requirement among segmentations of successive frames of the video. Fuzzy segmentation is a region growing technique that assigns to each element in an image (which may have been corrupted by noise and/or shading) a grade of membership between 0 and 1 to an object. In this work we present an application that uses a fuzzy segmentation algorithm to identify and select particles in micrographs and an extension of the algorithm to perform video segmentation. Here, we treat a video shot is treated as a three-dimensional volume with different z slices being occupied by different frames of the video shot. The volume is interactively segmented based on selected seed elements, that will determine the affinity functions based on their motion and color properties. The color information can be extracted from a specific color space or from three channels of a set of color models that are selected based on the correlation of the information from all channels. The motion information is provided into the form of dense optical flows maps. Finally, segmentation of real and synthetic videos and their application in a non-photorealistic rendering (NPR) toll are presented
Resumo:
Single shortest path extraction algorithms have been used in a number of areas such as network flow and image analysis. In image analysis, shortest path techniques can be used for object boundary detection, crack detection, or stereo disparity estimation. Sometimes one needs to find multiple paths as opposed to a single path in a network or an image where the paths must satisfy certain constraints. In this paper, we propose a new algorithm to extract multiple paths simultaneously within an image using a constrained expanded trellis (CET) for feature extraction and object segmentation. We also give a number of application examples for our multiple paths extraction algorithm.
Resumo:
The project aims at advancing the state of the art in the use of context information for classification of image and video data. The use of context in the classification of images has been showed of great importance to improve the performance of actual object recognition systems. In our project we proposed the concept of Multi-scale Feature Labels as a general and compact method to exploit the local and global context. The feature extraction from the discriminative probability or classification confidence label field is of great novelty. Moreover the use of a multi-scale representation of the feature labels lead to a compact and efficient description of the context. The goal of the project has been also to provide a general-purpose method and prove its suitability in different image/video analysis problem. The two-year project generated 5 journal publications (plus 2 under submission), 10 conference publications (plus 2 under submission) and one patent (plus 1 pending). Of these publications, a relevant number make use of the main result of this project to improve the results in detection and/or segmentation of objects.
Resumo:
This paper presents methods for moving object detection in airborne video surveillance. The motion segmentation in the above scenario is usually difficult because of small size of the object, motion of camera, and inconsistency in detected object shape etc. Here we present a motion segmentation system for moving camera video, based on background subtraction. An adaptive background building is used to take advantage of creation of background based on most recent frame. Our proposed system suggests CPU efficient alternative for conventional batch processing based background subtraction systems. We further refine the segmented motion by meanshift based mode association.
Resumo:
Wooden railway sleeper inspections in Sweden are currently performed manually by a human operator; such inspections are based on visual analysis. Machine vision based approach has been done to emulate the visual abilities of human operator to enable automation of the process. Through this process bad sleepers are identified, and a spot is marked on it with specific color (blue in the current case) on the rail so that the maintenance operators are able to identify the spot and replace the sleeper. The motive of this thesis is to help the operators to identify those sleepers which are marked by color (spots), using an “Intelligent Vehicle” which is capable of running on the track. Capturing video while running on the track and segmenting the object of interest (spot) through this vehicle; we can automate this work and minimize the human intuitions. The video acquisition process depends on camera position and source light to obtain fine brightness in acquisition, we have tested 4 different types of combinations (camera position and source light) here to record the video and test the validity of proposed method. A sequence of real time rail frames are extracted from these videos and further processing (depending upon the data acquisition process) is done to identify the spots. After identification of spot each frame is divided in to 9 regions to know the particular region where the spot lies to avoid overlapping with noise, and so on. The proposed method will generate the information regarding in which region the spot lies, based on nine regions in each frame. From the generated results we have made some classification regarding data collection techniques, efficiency, time and speed. In this report, extensive experiments using image sequences from particular camera are reported and the experiments were done using intelligent vehicle as well as test vehicle and the results shows that we have achieved 95% success in identifying the spots when we use video as it is, in other method were we can skip some frames in pre-processing to increase the speed of video but the segmentation results we reduced to 85% and the time was very less compared to previous one. This shows the validity of proposed method in identification of spots lying on wooden railway sleepers where we can compromise between time and efficiency to get the desired result.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
Currently the world swiftly adapts to visual communication. Online services like YouTube and Vine show that video is no longer the domain of broadcast television only. Video is used for different purposes like entertainment, information, education or communication. The rapid growth of today’s video archives with sparsely available editorial data creates a big problem of its retrieval. The humans see a video like a complex interplay of cognitive concepts. As a result there is a need to build a bridge between numeric values and semantic concepts. This establishes a connection that will facilitate videos’ retrieval by humans. The critical aspect of this bridge is video annotation. The process could be done manually or automatically. Manual annotation is very tedious, subjective and expensive. Therefore automatic annotation is being actively studied. In this thesis we focus on the multimedia content automatic annotation. Namely the use of analysis techniques for information retrieval allowing to automatically extract metadata from video in a videomail system. Furthermore the identification of text, people, actions, spaces, objects, including animals and plants. Hence it will be possible to align multimedia content with the text presented in the email message and the creation of applications for semantic video database indexing and retrieving.
Resumo:
The usage of digital content, such as video clips and images, has increased dramatically during the last decade. Local image features have been applied increasingly in various image and video retrieval applications. This thesis evaluates local features and applies them to image and video processing tasks. The results of the study show that 1) the performance of different local feature detector and descriptor methods vary significantly in object class matching, 2) local features can be applied in image alignment with superior results against the state-of-the-art, 3) the local feature based shot boundary detection method produces promising results, and 4) the local feature based hierarchical video summarization method shows promising new new research direction. In conclusion, this thesis presents the local features as a powerful tool in many applications and the imminent future work should concentrate on improving the quality of the local features.