848 resultados para Content-Based Image Retrieval (CBIR)
Resumo:
This paper describes the participation of DAEDALUS at ImageCLEF 2011 Medical Retrieval task. We have focused on multimodal (or mixed) experiments that combine textual and visual retrieval. The main objective of our research has been to evaluate the effect on the medical retrieval process of the existence of an extended corpus that is annotated with the image type, associated to both the image itself and also to its textual description. For this purpose, an image classifier has been developed to tag each document with its class (1st level of the hierarchy: Radiology, Microscopy, Photograph, Graphic, Other) and subclass (2nd level: AN, CT, MR, etc.). For the textual-based experiments, several runs using different semantic expansion techniques have been performed. For the visual-based retrieval, different runs are defined by the corpus used in the retrieval process and the strategy for obtaining the class and/or subclass. The best results are achieved in runs that make use of the image subclass based on the classification of the sample images. Although different multimodal strategies have been submitted, none of them has shown to be able to provide results that are at least comparable to the ones achieved by the textual retrieval alone. We believe that we have been unable to find a metric for the assessment of the relevance of the results provided by the visual and textual processes
Resumo:
Determination of the soil coverage by crop residues after ploughing is a fundamental element of Conservation Agriculture. This paper presents the application of genetic algorithms employed during the fine tuning of the segmentation process of a digital image with the aim of automatically quantifying the residue coverage. In other words, the objective is to achieve a segmentation that would permit the discrimination of the texture of the residue so that the output of the segmentation process is a binary image in which residue zones are isolated from the rest. The RGB images used come from a sample of images in which sections of terrain were photographed with a conventional camera positioned in zenith orientation atop a tripod. The images were taken outdoors under uncontrolled lighting conditions. Up to 92% similarity was achieved between the images obtained by the segmentation process proposed in this paper and the templates made by an elaborate manual tracing process. In addition to the proposed segmentation procedure and the fine tuning procedure that was developed, a global quantification of the soil coverage by residues for the sampled area was achieved that differed by only 0.85% from the quantification obtained using template images. Moreover, the proposed method does not depend on the type of residue present in the image. The study was conducted at the experimental farm “El Encín” in Alcalá de Henares (Madrid, Spain).
Resumo:
These slides present several 3-D reconstruction methods to obtain the geometric structure of a scene that is viewed by multiple cameras. We focus on the combination of the geometric modeling in the image formation process with the use of standard optimization tools to estimate the characteristic parameters that describe the geometry of the 3-D scene. In particular, linear, non-linear and robust methods to estimate the monocular and epipolar geometry are introduced as cornerstones to generate 3-D reconstructions with multiple cameras. Some examples of systems that use this constructive strategy are Bundler, PhotoSynth, VideoSurfing, etc., which are able to obtain 3-D reconstructions with several hundreds or thousands of cameras. En esta presentación se tratan varios métodos de reconstrucción 3-D para la obtención de la estructura geométrica de una escena que es visualizada por varias cámaras. Se enfatiza la combinación de modelado geométrico del proceso de formación de la imagen con el uso de herramientas estándar de optimización para estimar los parámetros característicos que describen la geometría de la escena 3-D. En concreto, se presentan métodos de estimación lineales, no lineales y robustos de las geometrías monocular y epipolar como punto de partida para generar reconstrucciones con tres o más cámaras. Algunos ejemplos de sistemas que utilizan este enfoque constructivo son Bundler, PhotoSynth, VideoSurfing, etc., los cuales, en la práctica pueden llegar a reconstruir una escena con varios cientos o miles de cámaras.
Resumo:
Stereo video techniques are effective for estimating the space–time wave dynamics over an area of the ocean. Indeed, a stereo camera view allows retrieval of both spatial and temporal data whose statistical content is richer than that of time series data retrieved from point wave probes. We present an application of the Wave Acquisition Stereo System (WASS) for the analysis of offshore video measurements of gravity waves in the Northern Adriatic Sea and near the southern seashore of the Crimean peninsula, in the Black Sea. We use classical epipolar techniques to reconstruct the sea surface from the stereo pairs sequentially in time, viz. a sequence of spatial snapshots. We also present a variational approach that exploits the entire data image set providing a global space–time imaging of the sea surface, viz. simultaneous reconstruction of several spatial snapshots of the surface in order to guarantee continuity of the sea surface both in space and time. Analysis of the WASS measurements show that the sea surface can be accurately estimated in space and time together, yielding associated directional spectra and wave statistics at a point in time that agrees well with probabilistic models. In particular, WASS stereo imaging is able to capture typical features of the wave surface, especially the crest-to-trough asymmetry due to second order nonlinearities, and the observed shape of large waves are fairly described by theoretical models based on the theory of quasi-determinism (Boccotti, 2000). Further, we investigate space–time extremes of the observed stationary sea states, viz. the largest surface wave heights expected over a given area during the sea state duration. The WASS analysis provides the first experimental proof that a space–time extreme is generally larger than that observed in time via point measurements, in agreement with the predictions based on stochastic theories for global maxima of Gaussian fields.
Resumo:
ATM, SDH or satellite have been used in the last century as the contribution network of Broadcasters. However the attractive price of IP networks is changing the infrastructure of these networks in the last decade. Nowadays, IP networks are widely used, but their characteristics do not offer the level of performance required to carry high quality video under certain circumstances. Data transmission is always subject to errors on line. In the case of streaming, correction is attempted at destination, while on transfer of files, retransmissions of information are conducted and a reliable copy of the file is obtained. In the latter case, reception time is penalized because of the low priority this type of traffic on the networks usually has. While in streaming, image quality is adapted to line speed, and line errors result in a decrease of quality at destination, in the file copy the difference between coding speed vs line speed and errors in transmission are reflected in an increase of transmission time. The way news or audiovisual programs are transferred from a remote office to the production centre depends on the time window and the type of line available; in many cases, it must be done in real time (streaming), with the resulting image degradation. The main purpose of this work is the workflow optimization and the image quality maximization, for that reason a transmission model for multimedia files adapted to JPEG2000, is described based on the combination of advantages of file transmission and those of streaming transmission, putting aside the disadvantages that these models have. The method is based on two patents and consists of the safe transfer of the headers and data considered to be vital for reproduction. Aside, the rest of the data is sent by streaming, being able to carry out recuperation operations and error concealment. Using this model, image quality is maximized according to the time window. In this paper, we will first give a briefest overview of the broadcasters requirements and the solutions with IP networks. We will then focus on a different solution for video file transfer. We will take the example of a broadcast center with mobile units (unidirectional video link) and regional headends (bidirectional link), and we will also present a video file transfer file method that satisfies the broadcaster requirements.
Resumo:
A first study in order to construct a simple model of the mammalian retina is reported. The basic elements for this model are Optical Programmable Logic Cells, OPLCs, previously employed as a functional element for Optical Computing. The same type of circuit simulates the five types of neurons present in the retina. Different responses are obtained by modifying either internal or external connections. Two types of behaviors are reported: symmetrical and non-symmetrical with respect to light position. Some other higher functions, as the possibility to differentiate between symmetric and non-symmetric light images, are performed by another simulation of the first layers of the visual cortex. The possibility to apply these models to image processing is reported.
Resumo:
A proposal for a model of the primary visual cortex is reported. It is structured with the basis of a simple unit cell able to perform fourteen pairs of different boolean functions corresponding to the two possible inputs. As a first step, a model of the retina is presented. Different types of responses, according to the different possibilities of interconnecting the building blocks, have been obtained. These responses constitute the basis for an initial configuration of the mammalian primary visual cortex. Some qualitative functions, as symmetry or size of an optical input, have been obtained. A proposal to extend this model to some higher functions, concludes the paper.
Resumo:
Purpose: A fully three-dimensional (3D) massively parallelizable list-mode ordered-subsets expectation-maximization (LM-OSEM) reconstruction algorithm has been developed for high-resolution PET cameras. System response probabilities are calculated online from a set of parameters derived from Monte Carlo simulations. The shape of a system response for a given line of response (LOR) has been shown to be asymmetrical around the LOR. This work has been focused on the development of efficient region-search techniques to sample the system response probabilities, which are suitable for asymmetric kernel models, including elliptical Gaussian models that allow for high accuracy and high parallelization efficiency. The novel region-search scheme using variable kernel models is applied in the proposed PET reconstruction algorithm. Methods: A novel region-search technique has been used to sample the probability density function in correspondence with a small dynamic subset of the field of view that constitutes the region of response (ROR). The ROR is identified around the LOR by searching for any voxel within a dynamically calculated contour. The contour condition is currently defined as a fixed threshold over the posterior probability, and arbitrary kernel models can be applied using a numerical approach. The processing of the LORs is distributed in batches among the available computing devices, then, individual LORs are processed within different processing units. In this way, both multicore and multiple many-core processing units can be efficiently exploited. Tests have been conducted with probability models that take into account the noncolinearity, positron range, and crystal penetration effects, that produced tubes of response with varying elliptical sections whose axes were a function of the crystal's thickness and angle of incidence of the given LOR. The algorithm treats the probability model as a 3D scalar field defined within a reference system aligned with the ideal LOR. Results: This new technique provides superior image quality in terms of signal-to-noise ratio as compared with the histogram-mode method based on precomputed system matrices available for a commercial small animal scanner. Reconstruction times can be kept low with the use of multicore, many-core architectures, including multiple graphic processing units. Conclusions: A highly parallelizable LM reconstruction method has been proposed based on Monte Carlo simulations and new parallelization techniques aimed at improving the reconstruction speed and the image signal-to-noise of a given OSEM algorithm. The method has been validated using simulated and real phantoms. A special advantage of the new method is the possibility of defining dynamically the cut-off threshold over the calculated probabilities thus allowing for a direct control on the trade-off between speed and quality during the reconstruction.
Resumo:
Evolvable Hardware (EH) is a technique that consists of using reconfigurable hardware devices whose configuration is controlled by an Evolutionary Algorithm (EA). Our system consists of a fully-FPGA implemented scalable EH platform, where the Reconfigurable processing Core (RC) can adaptively increase or decrease in size. Figure 1 shows the architecture of the proposed System-on-Programmable-Chip (SoPC), consisting of a MicroBlaze processor responsible of controlling the whole system operation, a Reconfiguration Engine (RE), and a Reconfigurable processing Core which is able to change its size in both height and width. This system is used to implement image filters, which are generated autonomously thanks to the evolutionary process. The system is complemented with a camera that enables the usage of the platform for real time applications.
Resumo:
This paper presents a vision based autonomous landing control approach for unmanned aerial vehicles (UAV). The 3D position of an unmanned helicopter is estimated based on the homographies estimated of a known landmark. The translation and altitude estimation of the helicopter against the helipad position are the only information that is used to control the longitudinal, lateral and descend speeds of the vehicle. The control system approach consists in three Fuzzy controllers to manage the speeds of each 3D axis of the aircraft s coordinate system. The 3D position estimation was proven rst, comparing it with the GPS + IMU data with very good results. The robust of the vision algorithm against occlusions was also tested. The excellent behavior of the Fuzzy control approach using the 3D position estimation based in homographies was proved in an outdoors test using a real unmanned helicopter.
Resumo:
We present an adaptive unequal error protection (UEP) strategy built on the 1-D interleaved parity Application Layer Forward Error Correction (AL-FEC) code for protecting the transmission of stereoscopic 3D video content encoded with Multiview Video Coding (MVC) through IP-based networks. Our scheme targets the minimization of quality degradation produced by packet losses during video transmission in time-sensitive application scenarios. To that end, based on a novel packet-level distortion model, it selects in real time the most suitable packets within each Group of Pictures (GOP) to be protected and the most convenient FEC technique parameters, i.e., the size of the FEC generator matrix. In order to make these decisions, it considers the relevance of the packet, the behavior of the channel, and the available bitrate for protection purposes. Simulation results validate both the distortion model introduced to estimate the importance of packets and the optimization of the FEC technique parameter values.
Resumo:
This work proposes an optimization of a semi-supervised Change Detection methodology based on a combination of Change Indices (CI) derived from an image multitemporal data set. For this purpose, SPOT 5 Panchromatic images with 2.5 m spatial resolution have been used, from which three Change Indices have been calculated. Two of them are usually known indices; however the third one has been derived considering the Kullbak-Leibler divergence. Then, these three indices have been combined forming a multiband image that has been used in as input for a Support Vector Machine (SVM) classifier where four different discriminant functions have been tested in order to differentiate between change and no_change categories. The performance of the suggested procedure has been assessed applying different quality measures, reaching in each case highly satisfactory values. These results have demonstrated that the simultaneous combination of basic change indices with others more sophisticated like the Kullback-Leibler distance, and the application of non-parametric discriminant functions like those employees in the SVM method, allows solving efficiently a change detection problem.
Resumo:
Automatic 2D-to-3D conversion is an important application for filling the gap between the increasing number of 3D displays and the still scant 3D content. However, existing approaches have an excessive computational cost that complicates its practical application. In this paper, a fast automatic 2D-to-3D conversion technique is proposed, which uses a machine learning framework to infer the 3D structure of a query color image from a training database with color and depth images. Assuming that photometrically similar images have analogous 3D structures, a depth map is estimated by searching the most similar color images in the database, and fusing the corresponding depth maps. Large databases are desirable to achieve better results, but the computational cost also increases. A clustering-based hierarchical search using compact SURF descriptors to characterize images is proposed to drastically reduce search times. A significant computational time improvement has been obtained regarding other state-of-the-art approaches, maintaining the quality results.
Resumo:
Current fusion devices consist of multiple diagnostics and hundreds or even thousands of signals. This situation forces on multiple occasions to use distributed data acquisition systems as the best approach. In this type of distributed systems, one of the most important issues is the synchronization between signals, so that it is possible to have a temporal correlation as accurate as possible between the acquired samples of all channels. In last decades, many fusion devices use different types of video cameras to provide inside views of the vessel during operations and to monitor plasma behavior. The synchronization between each video frame and the rest of the different signals acquired from any other diagnostics is essential in order to know correctly the plasma evolution, since it is possible to analyze jointly all the information having accurate knowledge of their temporal correlation. The developed system described in this paper allows timestamping image frames in a real-time acquisition and processing system using 1588 clock distribution. The system has been implemented using FPGA based devices together with a 1588 synchronized timing card (see Fig.1). The solution is based on a previous system [1] that allows image acquisition and real-time image processing based on PXIe technology. This architecture is fully compatible with the ITER Fast Controllers [2] and offers integration with EPICS to control and monitor the entire system. However, this set-up is not able to timestamp the frames acquired since the frame grabber module does not present any type of timing input (IRIG-B, GPS, PTP). To solve this lack, an IEEE1588 PXI timing device its used to provide an accurate way to synchronize distributed data acquisition systems using the Precision Time Protocol (PTP) IEEE 1588 2008 standard. This local timing device can be connected to a master clock device for global synchronization. The timing device has a buffer timestamp for each PXI trigger line and requires tha- a software application assigns each frame the corresponding timestamp. The previous action is critical and cannot be achieved if the frame rate is high. To solve this problem, it has been designed a solution that distributes the clock from the IEEE 1588 timing card to all FlexRIO devices [3]. This solution uses two PXI trigger lines that provide the capacity to assign timestamps to every frame acquired and register events by hardware in a deterministic way. The system provides a solution for timestamping frames to synchronize them with the rest of the different signals.
Resumo:
Video analytics play a critical role in most recent traffic monitoring and driver assistance systems. In this context, the correct detection and classification of surrounding vehicles through image analysis has been the focus of extensive research in the last years. Most of the pieces of work reported for image-based vehicle verification make use of supervised classification approaches and resort to techniques, such as histograms of oriented gradients (HOG), principal component analysis (PCA), and Gabor filters, among others. Unfortunately, existing approaches are lacking in two respects: first, comparison between methods using a common body of work has not been addressed; second, no study of the combination potentiality of popular features for vehicle classification has been reported. In this study the performance of the different techniques is first reviewed and compared using a common public database. Then, the combination capabilities of these techniques are explored and a methodology is presented for the fusion of classifiers built upon them, taking into account also the vehicle pose. The study unveils the limitations of single-feature based classification and makes clear that fusion of classifiers is highly beneficial for vehicle verification.