949 results for image representation
Abstract:
Image super-resolution is defined as a class of techniques that enhance the spatial resolution of images. Super-resolution methods can be subdivided into single-image and multi-image methods. This thesis focuses on developing algorithms, based on mathematical theories, for single-image super-resolution problems. Indeed, in order to estimate an output image, we adopt a mixed approach: i.e., we use both a dictionary of patches with sparsity constraints (typical of learning-based methods) and regularization terms (typical of reconstruction-based methods). Although existing methods already perform well, they do not take the geometry of the data into account when regularizing the solution, clustering data samples (samples are often clustered using algorithms with the Euclidean distance as a dissimilarity metric), or learning dictionaries (these are often learned using PCA or K-SVD). Thus, state-of-the-art methods still suffer from shortcomings. In this work, we propose three new methods to overcome these deficiencies. First, we develop SE-ASDS, a structure-tensor-based regularization term, in order to improve the sharpness of edges. SE-ASDS achieves much better results than many state-of-the-art algorithms. Then, we propose the AGNN and GOC algorithms for determining a local subset of training samples from which a good local model can be computed for reconstructing a given input test sample, taking into account the underlying geometry of the data. The AGNN and GOC methods outperform spectral clustering, soft clustering, and geodesic-distance-based subset selection in most settings. Next, we propose the aSOB strategy, which takes into account the geometry of the data and the dictionary size; it outperforms both PCA and PGA methods. Finally, we combine all our methods into a single algorithm, named G2SR. The proposed G2SR algorithm shows better visual and quantitative results than state-of-the-art methods.
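The abstract does not reproduce the thesis's formulas, but as a rough illustration of the mixed approach it describes (a patch dictionary with sparsity constraints plus a reconstruction-based regularizer), a generic objective of this family can be sketched as follows; all symbols are assumed notation rather than the thesis's own:

\[
\hat{\alpha} \;=\; \arg\min_{\alpha}\; \|y - DH\Phi\alpha\|_2^2 \;+\; \lambda\|\alpha\|_1 \;+\; \gamma\,R(\Phi\alpha),
\qquad \hat{x} \;=\; \Phi\hat{\alpha},
\]

where $y$ is the low-resolution input, $D$ and $H$ are downsampling and blur operators, $\Phi$ is the learned patch dictionary, $\|\alpha\|_1$ enforces the sparsity constraint, and $R(\cdot)$ is a regularization term (for example, a structure-tensor-based term in the spirit of SE-ASDS).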
Abstract:
This work presents the design of a real-time system for modelling visual objects with self-organising networks. The architecture of the system addresses multiple computer vision tasks, such as image segmentation, optimal parameter estimation and object representation. We first develop a framework for building non-rigid shapes using the growth mechanism of self-organising maps, and then define an optimal number of nodes, without overfitting or underfitting the network, based on information-theoretic considerations. We present experimental results for hands and faces, and we quantitatively evaluate the matching capabilities of the proposed method using the topographic product. The proposed method is easily extensible to 3D objects, as it offers similar features for efficient mesh reconstruction.
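As a purely illustrative sketch (the thesis's growth mechanism, parameter estimation and stopping criterion are not reproduced here), one adaptation pass of a growing self-organising network of the general kind described above might look as follows in Python; the insertion-by-accumulated-error rule and all names are assumptions borrowed from standard growing-network formulations:

```python
import numpy as np

def grow_step(nodes, errors, samples, lr=0.05, grow_every=100):
    """One adaptation pass of a toy growing self-organising network.

    nodes   : (K, d) float array of node positions
    errors  : (K,)   accumulated quantisation error per node
    samples : (N, d) points drawn from the object (e.g. a hand/face silhouette)
    """
    for t, x in enumerate(samples, start=1):
        # best-matching unit: move it towards the sample, accumulate its error
        d2 = np.sum((nodes - x) ** 2, axis=1)
        bmu = int(np.argmin(d2))
        errors[bmu] += d2[bmu]
        nodes[bmu] += lr * (x - nodes[bmu])

        # growth: periodically insert a node between the worst-quantised node
        # and its nearest neighbour (growing-neural-gas style heuristic)
        if t % grow_every == 0:
            worst = int(np.argmax(errors))
            others = np.delete(np.arange(len(nodes)), worst)
            dists = np.sum((nodes[others] - nodes[worst]) ** 2, axis=1)
            neigh = int(others[np.argmin(dists)])
            nodes = np.vstack([nodes, 0.5 * (nodes[worst] + nodes[neigh])])
            errors[worst] *= 0.5
            errors = np.append(errors, errors[worst])
    return nodes, errors
```

A real system would also maintain edges between nodes and check topology preservation; the abstract mentions the topographic product for exactly that kind of evaluation.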
Abstract:
Fifty years after the birth of the Nouvelle Vague, the inheritance that contemporary cinema receives from it is inevitable: figures and visual motifs; stories, themes, faces or commonplaces; aesthetic and language devices. The echoes appear in various ways; each affiliation involves a different relationship and therefore a dissimilar approach to its analysis. And yet, both the academy and film critics maintain their will to think of the Nouvelle Vague as a whole, a universe, a stream or an aesthetic trend. However, does a Nouvelle Vague aesthetic exist? And if so, why and how should its historical revision be addressed? Taking Deleuze's thesis on the time-image and Serge Daney's assertion according to which…
Abstract:
Today's man is socially absorbed by problematic body issues and by everything this means and involves. Literature, publicity, science, technology and medicine compound these issues, giving this theme a form never seen before. In the artistic framework, body image constantly undergoes modification. Body image in sculpture unfolds itself, assuming different messages and different forms. The body is a synonym of the subject, an infinite metaphorical history of our looks and desires, one that leads us to interrogate our image and our social and sexual relations. These are understood as a manifestation of individual desires freed from moral and social imposition; an attempted return to profound human nature before we are turned into a cloning industry. In this study it is important for us to understand in what form sculpture reflects body image as a sociocultural and psychological phenomenon within the coordinates of our time, and to understand how and what artists represent in sculpture within the multiple and complex structure of human sexuality. Today the sculptural body, expanding its representation, no longer a reproduction of corporal characteristics, presents the body in what it possesses of most intimate, unique, human and real: that which moves, reacts, feels, suffers and pulsates, a mirror of us all.
Abstract:
Image (video) retrieval is the problem of retrieving images (videos) similar to a query. Images (videos) are represented in an input (feature) space, and similar images (videos) are obtained by finding nearest neighbors in that representation space. Numerous input representations, in both real-valued and binary spaces, have been proposed for conducting faster retrieval. In this thesis, we present techniques that obtain improved input representations for retrieval in both supervised and unsupervised settings for images and videos.

Supervised retrieval is the well-known problem of retrieving images of the same class as the query. In the first part, we address the practical aspects of achieving faster retrieval with binary codes as input representations in the supervised setting, where binary codes are used as addresses into hash tables. In practice, using binary codes as addresses does not guarantee fast retrieval, as similar images are not mapped to the same binary code (address). We address this problem by presenting an efficient supervised hashing (binary encoding) method that aims to explicitly map all images of the same class to, ideally, a unique binary code. We refer to the binary codes of the images as 'Semantic Binary Codes' and to the unique code shared by all images of a class as the 'Class Binary Code'. We also propose a new class-based Hamming metric that dramatically reduces retrieval times for larger databases, since Hamming distances are computed only to the class binary codes. We further propose a deep semantic binary code model, obtained by replacing the output layer of a popular convolutional neural network (AlexNet) with the class binary codes, and show that the hashing functions learned in this way outperform the state of the art while providing fast retrieval times.

In the second part, we address the problem of supervised retrieval while taking into account the relationships between classes. For a given query image, we want to retrieve images that preserve the relative order, i.e., all images of the same class first, then images of related classes, and only then images of different classes. We learn such relationship-aware binary codes by minimizing the difference between the inner product of the binary codes and the similarity between the classes. We calculate the similarity between classes using output embedding vectors, which are vector representations of classes. Our method deviates from other supervised binary encoding schemes in that it is the first to use output embeddings for learning hashing functions. We also introduce new performance metrics that take related-class retrieval results into account and show significant gains over the state of the art.

High-dimensional descriptors such as Fisher Vectors or the Vector of Locally Aggregated Descriptors have been shown to improve the performance of many computer vision applications, including retrieval. In the third part, we discuss an unsupervised technique for compressing high-dimensional vectors into high-dimensional binary codes to reduce storage complexity. In this approach, we deviate from traditional hyperplane hashing functions and instead learn hyperspherical hashing functions. The proposed method overcomes the computational challenges of directly applying the spherical hashing algorithm, which is intractable for compressing high-dimensional vectors. A practical hierarchical model that compresses such high-dimensional vectors with divide-and-conquer techniques, using the Random Select and Adjust (RSA) procedure, is presented. We show that our proposed high-dimensional binary codes outperform the binary codes obtained using traditional hyperplane methods at higher compression ratios.

In the last part of the thesis, we propose a retrieval-based solution to the zero-shot event classification problem, a setting in which no training videos are available for the event. To do this, we learn a generic set of concept detectors and represent both videos and query events in the concept space. We then compute the similarity between the query event and each video in the concept space, and videos similar to the query event are classified as belonging to the event. We show that concept features from other modalities significantly boost performance.
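The class-based Hamming retrieval idea described above lends itself to a small illustration. The following is a minimal sketch under assumed names (`hamming`, `class_based_retrieval`); it is not the thesis's implementation, only the idea of comparing the query code against one code per class instead of against every database item:

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two equal-length binary (0/1) code vectors."""
    return int(np.count_nonzero(a != b))

def class_based_retrieval(query_code, class_codes, db_class_ids):
    """Rank database images by the Hamming distance from the query code to
    each *class* binary code, so only one distance per class is computed
    rather than one per database image.

    query_code   : (B,)   binary code of the query image
    class_codes  : (C, B) one 'class binary code' per class
    db_class_ids : (N,)   class id assigned to each database image
    """
    class_dists = [hamming(query_code, c) for c in class_codes]
    ranked_classes = np.argsort(class_dists)
    # database indices, grouped by increasing class distance
    order = []
    for c in ranked_classes:
        order.extend(np.flatnonzero(db_class_ids == c).tolist())
    return order
```

With C classes and N database images this computes C Hamming distances instead of N, which is where the reported speed-up for large databases would come from.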
Abstract:
Both ludic and macabre, the theatrical works of Samuel Beckett and Jean Genet are a paradox to behold. Indeed, as this thesis seeks to illustrate, despite their vastly differing aesthetics, at the core of each playwright’s stage productions is a tension between the characters’ yearning for silence and invisibility, and the continual creation of an often humorous, chaotic, exaggerated or theatrical image that depicts this very longing. Seeking an impossible intersection between their image and their death, they are trapped in a double bind that guarantees aesthetic failure. In order to grasp the close, yet delicate, relationship between the image of death and the death of the image, as presented in the plays of Beckett and Genet, we will explore how the characters’ creative processes deflate the very images — both visual and auditory — that they create. More specifically, we will examine how mimesis both liberates and confines the characters; while the symbolic realm provides the only means of self-representation, it is also a source of profound alienation and powerlessness, for it never adequately conveys meaning. Thus, body, gesture, language and voice are each the site of simultaneous and ceaseless reappearance and disappearance, for which death remains the only (aporetic) cure. Struggling against theatrical form, which demands the actors’ and the audience’s physical presence, both playwrights make shrewd use of metatheatre to slowly empty the stage and thereby suggest the impending, yet impossible, erasure of their characters.
Abstract:
Conceptual interpretation of languages has gathered great interest in the world of artificial intelligence. The challenge of modeling the various complications involved in a language is the main motivation behind our work. Our main focus in this work is to develop a conceptual graphical representation for image captions. We use discourse representation structures to obtain semantic information, which is further modeled into a graphical structure. The effectiveness of the model is evaluated with a caption-based image retrieval system, where retrieval is performed by computing subgraph-based similarity measures. The best retrievals were given an average rating of . ± . out of 4 by a group of 25 human judges. The experiments were performed on a subset of the SBU Captioned Photo Dataset. The purpose of this work is to establish the cognitive sensibility of the approach to caption representations.
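The subgraph-based similarity the abstract refers to can be illustrated with a deliberately simplified stand-in; the triple-set representation and Jaccard overlap below are assumptions, not the discourse-representation-structure graphs or the scoring actually used in this work:

```python
def caption_graph(triples):
    """Represent a caption as a set of (subject, relation, object) edges."""
    return set(triples)

def subgraph_similarity(g1, g2):
    """Jaccard overlap of shared edges -- a crude stand-in for richer
    common-subgraph similarity measures."""
    if not g1 and not g2:
        return 0.0
    return len(g1 & g2) / len(g1 | g2)

# toy usage: retrieve the stored caption graph most similar to the query
query = caption_graph([("dog", "on", "beach"), ("dog", "is", "running")])
database = {
    "img_001": caption_graph([("dog", "on", "beach")]),
    "img_002": caption_graph([("cat", "on", "sofa")]),
}
best = max(database, key=lambda k: subgraph_similarity(query, database[k]))
```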