826 resultados para Image and video indexing
Resumo:
The usage of digital content, such as video clips and images, has increased dramatically during the last decade. Local image features have been applied increasingly in various image and video retrieval applications. This thesis evaluates local features and applies them to image and video processing tasks. The results of the study show that 1) the performance of different local feature detector and descriptor methods vary significantly in object class matching, 2) local features can be applied in image alignment with superior results against the state-of-the-art, 3) the local feature based shot boundary detection method produces promising results, and 4) the local feature based hierarchical video summarization method shows promising new new research direction. In conclusion, this thesis presents the local features as a powerful tool in many applications and the imminent future work should concentrate on improving the quality of the local features.
Resumo:
Image (Video) retrieval is an interesting problem of retrieving images (videos) similar to the query. Images (Videos) are represented in an input (feature) space and similar images (videos) are obtained by finding nearest neighbors in the input representation space. Numerous input representations both in real valued and binary space have been proposed for conducting faster retrieval. In this thesis, we present techniques that obtain improved input representations for retrieval in both supervised and unsupervised settings for images and videos. Supervised retrieval is a well known problem of retrieving same class images of the query. We address the practical aspects of achieving faster retrieval with binary codes as input representations for the supervised setting in the first part, where binary codes are used as addresses into hash tables. In practice, using binary codes as addresses does not guarantee fast retrieval, as similar images are not mapped to the same binary code (address). We address this problem by presenting an efficient supervised hashing (binary encoding) method that aims to explicitly map all the images of the same class ideally to a unique binary code. We refer to the binary codes of the images as `Semantic Binary Codes' and the unique code for all same class images as `Class Binary Code'. We also propose a new class based Hamming metric that dramatically reduces the retrieval times for larger databases, where only hamming distance is computed to the class binary codes. We also propose a Deep semantic binary code model, by replacing the output layer of a popular convolutional Neural Network (AlexNet) with the class binary codes and show that the hashing functions learned in this way outperforms the state of the art, and at the same time provide fast retrieval times. In the second part, we also address the problem of supervised retrieval by taking into account the relationship between classes. For a given query image, we want to retrieve images that preserve the relative order i.e. we want to retrieve all same class images first and then, the related classes images before different class images. We learn such relationship aware binary codes by minimizing the similarity between inner product of the binary codes and the similarity between the classes. We calculate the similarity between classes using output embedding vectors, which are vector representations of classes. Our method deviates from the other supervised binary encoding schemes as it is the first to use output embeddings for learning hashing functions. We also introduce new performance metrics that take into account the related class retrieval results and show significant gains over the state of the art. High Dimensional descriptors like Fisher Vectors or Vector of Locally Aggregated Descriptors have shown to improve the performance of many computer vision applications including retrieval. In the third part, we will discuss an unsupervised technique for compressing high dimensional vectors into high dimensional binary codes, to reduce storage complexity. In this approach, we deviate from adopting traditional hyperplane hashing functions and instead learn hyperspherical hashing functions. The proposed method overcomes the computational challenges of directly applying the spherical hashing algorithm that is intractable for compressing high dimensional vectors. A practical hierarchical model that utilizes divide and conquer techniques using the Random Select and Adjust (RSA) procedure to compress such high dimensional vectors is presented. We show that our proposed high dimensional binary codes outperform the binary codes obtained using traditional hyperplane methods for higher compression ratios. In the last part of the thesis, we propose a retrieval based solution to the Zero shot event classification problem - a setting where no training videos are available for the event. To do this, we learn a generic set of concept detectors and represent both videos and query events in the concept space. We then compute similarity between the query event and the video in the concept space and videos similar to the query event are classified as the videos belonging to the event. We show that we significantly boost the performance using concept features from other modalities.
Resumo:
Inverse problems are at the core of many challenging applications. Variational and learning models provide estimated solutions of inverse problems as the outcome of specific reconstruction maps. In the variational approach, the result of the reconstruction map is the solution of a regularized minimization problem encoding information on the acquisition process and prior knowledge on the solution. In the learning approach, the reconstruction map is a parametric function whose parameters are identified by solving a minimization problem depending on a large set of data. In this thesis, we go beyond this apparent dichotomy between variational and learning models and we show they can be harmoniously merged in unified hybrid frameworks preserving their main advantages. We develop several highly efficient methods based on both these model-driven and data-driven strategies, for which we provide a detailed convergence analysis. The arising algorithms are applied to solve inverse problems involving images and time series. For each task, we show the proposed schemes improve the performances of many other existing methods in terms of both computational burden and quality of the solution. In the first part, we focus on gradient-based regularized variational models which are shown to be effective for segmentation purposes and thermal and medical image enhancement. We consider gradient sparsity-promoting regularized models for which we develop different strategies to estimate the regularization strength. Furthermore, we introduce a novel gradient-based Plug-and-Play convergent scheme considering a deep learning based denoiser trained on the gradient domain. In the second part, we address the tasks of natural image deblurring, image and video super resolution microscopy and positioning time series prediction, through deep learning based methods. We boost the performances of supervised, such as trained convolutional and recurrent networks, and unsupervised deep learning strategies, such as Deep Image Prior, by penalizing the losses with handcrafted regularization terms.
Resumo:
AMS Subj. Classification: H.3.7 Digital Libraries, K.6.5 Security and Protection
Resumo:
[ES]This paper describes an analysis performed for facial description in static images and video streams. The still image context is first analyzed in order to decide the optimal classifier configuration for each problem: gender recognition, race classification, and glasses and moustache presence. These results are later applied to significant samples which are automatically extracted in real-time from video streams achieving promising results in the facial description of 70 individuals by means of gender, race and the presence of glasses and moustache.
The Experience of the Religious through Silent Moving Image and the Silence of Bill Viola's Passions
Resumo:
With the creationof the moving image at the end of the 19th century a new way of representing and expressing the Religious was born. The cinema industry rapidly understood that film has a powerful way to attract new audiences and transformed the explicit religious message into an implicit theological discourse of the fictional film. Today, the concept of "cinema" needs to be rethought and expanded, as well as the notion of "tTranscendental" since the strong reality effect of the film can allow a true religious experience for the spectator.
Resumo:
A two terminal optically addressed image processing device based on two stacked sensing/switching p-i-n a-SiC:H diodes is presented. The charge packets are injected optically into the p-i-n sensing photodiode and confined at the illuminated regions changing locally the electrical field profile across the p-i-n switching diode. A red scanner is used for charge readout. The various design parameters and addressing architecture trade-offs are discussed. The influence on the transfer functions of an a-SiC:H sensing absorber optimized for red transmittance and blue collection or of a floating anode in between is analysed. Results show that the thin a-SiC:H sensing absorber confines the readout to the switching diode and filters the light allowing full colour detection at two appropriated voltages. When the floating anode is used the spectral response broadens, allowing B&W image recognition with improved light-to-dark sensitivity. A physical model supports the image and colour recognition process.
Resumo:
Large area hydrogenated amorphous silicon single and stacked p-i-n structures with low conductivity doped layers are proposed as monochrome and color image sensors. The layers of the structures are based on amorphous silicon alloys (a-Si(x)C(1-x):H). The current-voltage characteristics and the spectral sensitivity under different bias conditions are analyzed. The output characteristics are evaluated under different read-out voltages and scanner wavelengths. To extract information on image shape, intensity and color, a modulated light beam scans the sensor active area at three appropriate bias voltages and the photoresponse in each scanning position ("sub-pixel") is recorded. The investigation of the sensor output under different scanner wavelengths and varying electrical bias reveals that the response can be tuned, thus enabling color separation. The operation of the sensor is exemplified and supported by a numerical simulation.
Resumo:
he expansion of Digital Television and the convergence between conventional broadcasting and television over IP contributed to the gradual increase of the number of available channels and on demand video content. Moreover, the dissemination of the use of mobile devices like laptops, smartphones and tablets on everyday activities resulted in a shift of the traditional television viewing paradigm from the couch to everywhere, anytime from any device. Although this new scenario enables a great improvement in viewing experiences, it also brings new challenges given the overload of information that the viewer faces. Recommendation systems stand out as a possible solution to help a watcher on the selection of the content that best fits his/her preferences. This paper describes a web based system that helps the user navigating on broadcasted and online television content by implementing recommendations based on collaborative and content based filtering. The algorithms developed estimate the similarity between items and users and predict the rating that a user would assign to a particular item (television program, movie, etc.). To enable interoperability between different systems, programs characteristics (title, genre, actors, etc.) are stored according to the TV-Anytime standard. The set of recommendations produced are presented through a Web Application that allows the user to interact with the system based on the obtained recommendations.
Resumo:
Dissertação apresentada para obtenção do Grau de Mestre em Engenharia Informática pela Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia
Resumo:
A Work Project, presented as part of the requirements for the Award of a Masters Degree in Management from the NOVA – School of Business and Economics
Resumo:
A Work Project, presented as part of the requirements for the Award of a Masters Degree in Management from the NOVA – School of Business and Economics
Resumo:
This paper investigates the implications of individuals’ mistaken beliefs of their abilities on incentives in organizations using the principal-agent model of moral hazard. The paper shows that if effort is observable, then an agent’s mistaken beliefs about own ability are always favorable to the principal. However, if effort is unobservable, then an agent’s mistaken beliefs about own ability can be either favorable or unfavorable to the principal. The paper provides conditions under which an agent’s over estimation about own ability is favorable to the principal when effort is unobservable. Finally, the paper shows that workers’ mistaken beliefs about their coworkers’ abilities make interdependent incentive schemes more attractive to firms than individualistic incentive schemes.
Resumo:
PURPOSE: The goal of this study was to compare magnetic resonance enterography (MRE) and video capsule endoscopy (VCE) in suspected small bowel disease. MATERIALS AND METHODS: Nineteen patients with suspected small bowel disease participated in a prospective clinical comparison of MRE versus VCE. Both methods were evaluated separately and in conjunction with respect to a combined diagnostic endpoint based on clinical, laboratory, surgical, and histopathological findings. The Fisher's exact and j tests were used in comparing MRE and VCE. RESULTS: Small bowel pathologies were found in 15 out of 19 patients: Crohn's disease (n= 5), lymphoma (n= 4), lymphangioma (n= 1), adenocarcinoma (n= 1), postradiation enteropathy (n= 1), NSAID-induced enteropathy (n =1), angiodysplasia (n= 1), and small bowel adhesions (n= 1). VCE and MRE separately and in conjunction showed sensitivities of 92.9, 71.4, and 100% and specificities of 80, 60, and 80% (kappa= 0.73 vs. kappa = 0.29; P= 0.31/kappa = 0.85), respectively. In four patients, VCE depicted mucosal pathologies missed by MRE. MRE revealed 19 extraenteric findings in 11 patients as well as small bowel adhesions not detected on VCE (n= 1). CONCLUSION: VCE can readily depict and characterize subtle mucosal lesions missed at MRE, whereas MRE yields additional mural, perienteric, and extraenteric information. Thus, VCE and MRE appear to be complementary methods which, when used in conjunction, may better characterize suspected small bowel disease.
Resumo:
The project aims at advancing the state of the art in the use of context information for classification of image and video data. The use of context in the classification of images has been showed of great importance to improve the performance of actual object recognition systems. In our project we proposed the concept of Multi-scale Feature Labels as a general and compact method to exploit the local and global context. The feature extraction from the discriminative probability or classification confidence label field is of great novelty. Moreover the use of a multi-scale representation of the feature labels lead to a compact and efficient description of the context. The goal of the project has been also to provide a general-purpose method and prove its suitability in different image/video analysis problem. The two-year project generated 5 journal publications (plus 2 under submission), 10 conference publications (plus 2 under submission) and one patent (plus 1 pending). Of these publications, a relevant number make use of the main result of this project to improve the results in detection and/or segmentation of objects.