972 results for Automatic Image Annotation
Abstract:
In recent years there has been an apparent shift in research from content-based image retrieval (CBIR) to automatic image annotation, in order to bridge the gap between the low-level features and the high-level semantics of images. Automatic Image Annotation (AIA) techniques use machine learning to extract high-level semantic concepts from images. Many AIA techniques use feature analysis as the first step in identifying the objects in an image; however, high-dimensional image features degrade system performance. This paper describes and evaluates an automatic image annotation framework which uses SURF descriptors to select the right number of features, and the right features, for annotation. The proposed framework uses a hybrid approach in which k-means clustering is used in the training phase and fuzzy k-NN classification in the annotation phase. The performance of the system is evaluated using standard metrics.
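The abstract gives no implementation details, but the pipeline it names (SURF features, a k-means visual vocabulary built during training, fuzzy k-NN at annotation time) can be sketched roughly as follows. Parameter values, the bag-of-visual-words representation and the fuzzy weighting scheme are illustrative assumptions, not the paper's code; SURF requires the non-free opencv-contrib module.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

# SURF lives in opencv-contrib (non-free); SIFT is a drop-in alternative.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)

def surf_descriptors(image_path):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = surf.detectAndCompute(img, None)
    return desc                                # (num_keypoints, 64) array

def build_vocabulary(all_descriptors, k=500):
    # Training phase: cluster the pooled SURF descriptors into k visual words.
    return KMeans(n_clusters=k, n_init=10).fit(np.vstack(all_descriptors))

def bovw_histogram(desc, kmeans):
    # Represent one image as a normalised histogram over the visual words.
    words = kmeans.predict(desc)
    hist, _ = np.histogram(words, bins=np.arange(kmeans.n_clusters + 1))
    return hist / max(hist.sum(), 1)

def fuzzy_knn_annotate(query_hist, train_hists, train_labels, k=5, m=2.0):
    # Annotation phase: fuzzy k-NN over histograms. Each neighbour votes for
    # its label with weight 1 / distance^(2/(m-1)); votes are then normalised.
    nn = NearestNeighbors(n_neighbors=k).fit(train_hists)
    dist, idx = nn.kneighbors([query_hist])
    weights = 1.0 / (dist[0] ** (2.0 / (m - 1)) + 1e-9)
    scores = {}
    for w, i in zip(weights, idx[0]):
        scores[train_labels[i]] = scores.get(train_labels[i], 0.0) + w
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}
```

In this sketch each training image is reduced to a bag-of-visual-words histogram over the k-means vocabulary, and a test image receives soft label memberships from its k nearest training histograms.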
Abstract:
Dissertation presented to obtain the degree of Master in Computer Engineering at the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia.
Abstract:
Photo annotation is a resource-intensive task, yet is increasingly essential as image archives and personal photo collections grow in size. There is an inherent conflict in the process of describing and archiving personal experiences, because casual users are generally unwilling to expend large amounts of effort on creating the annotations which are required to organise their collections so that they can make best use of them. This paper describes the Photocopain system, a semi-automatic image annotation system which combines information about the context in which a photograph was captured with information from other readily available sources in order to generate outline annotations for that photograph that the user may further extend or amend.
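Photocopain's actual context sources are richer than this, but the core idea of seeding outline annotations from capture context can be illustrated with plain EXIF metadata. The field names and the returned dictionary below are assumptions for illustration, not the system's API.

```python
from PIL import Image, ExifTags

def outline_annotations(photo_path):
    """Seed a draft annotation record from a photo's EXIF capture context."""
    img = Image.open(photo_path)
    exif = img._getexif() or {}        # legacy flat EXIF dict (JPEG); {} if absent
    named = {ExifTags.TAGS.get(tag, tag): value for tag, value in exif.items()}
    annotations = {}
    if "DateTimeOriginal" in named:
        annotations["captured_at"] = named["DateTimeOriginal"]   # when
    if "GPSInfo" in named:
        annotations["gps_raw"] = named["GPSInfo"]    # where (could be geocoded)
    if "Model" in named:
        annotations["camera"] = named["Model"]       # how
    return annotations    # the user may further extend or amend these
```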
Abstract:
With the rise of smartphones, lifelogging devices (e.g. Google Glass) and the popularity of image sharing websites (e.g. Flickr), users are capturing and sharing every aspect of their lives online, producing a wealth of visual content. Of these uploaded images, the majority are poorly annotated or exist in complete semantic isolation, making the process of building retrieval systems difficult, as one must first understand the meaning of an image in order to retrieve it. To alleviate this problem, many image sharing websites offer manual annotation tools which allow the user to “tag” their photos; however, these techniques are laborious and as a result have been poorly adopted; Sigurbjörnsson and van Zwol (2008) showed that 64% of images uploaded to Flickr are annotated with fewer than four tags. Due to this, an entire body of research has focused on the automatic annotation of images (Hanbury, 2008; Smeulders et al., 2000; Zhang et al., 2012a), where one attempts to bridge the semantic gap between an image’s appearance and its meaning, e.g. the objects present. Despite two decades of research, the semantic gap still largely exists and, as a result, automatic annotation models often offer unsatisfactory performance for industrial implementation. Further, these techniques can only annotate what they see, thus ignoring the “bigger picture” surrounding an image (e.g. its location, the event, the people present, etc.). Much work has therefore focused on building photo tag recommendation (PTR) methods which aid the user in the annotation process by suggesting tags related to those already present. These works have mainly focused on computing relationships between tags based on historical images, e.g. that NY and timessquare co-exist in many images and are therefore highly correlated. However, tags are inherently noisy, sparse and ill-defined, often resulting in poor PTR accuracy, e.g. does NY refer to New York or New Year? This thesis proposes the exploitation of an image’s context which, unlike textual evidence, is always present, in order to alleviate this ambiguity in the tag recommendation process. Specifically, we exploit the “what, who, where, when and how” of the image capture process in order to complement textual evidence in various photo tag recommendation and retrieval scenarios. In part II, we combine textual, content-based (e.g. the number of faces present) and contextual (e.g. the day of the week taken) signals for tag recommendation purposes, achieving up to a 75% improvement in precision@5 in comparison to a text-only TF-IDF baseline. We then consider external knowledge sources (i.e. Wikipedia & Twitter) as an alternative to the (slower moving) Flickr on which to build recommendation models, showing that similar accuracy can be achieved on these faster moving, yet entirely textual, datasets. In part II, we also highlight the merits of diversifying tag recommendation lists before discussing at length various problems with existing automatic image annotation and photo tag recommendation evaluation collections. In part III, we propose three new image retrieval scenarios, namely “visual event summarisation”, “image popularity prediction” and “lifelog summarisation”. In the first scenario, we attempt to produce a rank of relevant and diverse images for various news events by (i) removing irrelevant images such as memes and visual duplicates, before (ii) semantically clustering images based on the tweets in which they were originally posted.
Using this approach, we were able to achieve over 50% precision for images in the top 5 ranks. In the second retrieval scenario, we show that by combining contextual and content-based features from images, we are able to predict whether an image will become “popular” (or not) with 74% accuracy, using an SVM classifier. Finally, in chapter 9 we employ blur detection and perceptual-hash clustering to remove noisy images from lifelogs, before combining visual and geo-temporal signals to capture a user’s “key moments” within their day. We believe that the results of this thesis mark an important step towards building effective image retrieval models when sufficient textual content is lacking (i.e. a cold start).
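As a rough illustration of the tag co-occurrence baseline that the thesis builds on (not its actual models, which add content-based and contextual signals), tags can be suggested from how often they co-occurred with the user's existing tags in historical photos:

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(tagged_photos):
    """tagged_photos: iterable of tag sets from historical images."""
    co, freq = Counter(), Counter()
    for tags in tagged_photos:
        freq.update(tags)
        co.update(combinations(sorted(tags), 2))   # count unordered tag pairs
    return co, freq

def recommend(existing_tags, co, freq, top_n=5):
    # Score candidate tags by co-occurrence with tags already on the photo,
    # normalised by the frequency of the existing tag.
    scores = Counter()
    for (a, b), count in co.items():
        if a in existing_tags and b not in existing_tags:
            scores[b] += count / freq[a]
        elif b in existing_tags and a not in existing_tags:
            scores[a] += count / freq[b]
    return [tag for tag, _ in scores.most_common(top_n)]

# Example (hypothetical data): photos = [{"NY", "timessquare"}, {"NY", "skyline"}]
```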
Abstract:
Personal memories composed of digital pictures are very popular at the moment. To retrieve these media items, annotation is required. In recent years, several approaches have been proposed to address the image annotation problem. This paper presents our proposals to address this problem. Automatic and semi-automatic learning methods for semantic concepts are presented. The automatic method is based on semantic concepts estimated from visual content, context metadata and audio information. The semi-automatic method is based on results provided by a computer game. The paper describes our proposals and presents their evaluations.
Abstract:
The objective of PANACEA is to interconnect different advanced tools to build a Language Resource (LR) factory: a production line that automates the steps involved in the acquisition, production, updating and maintenance of the LRs that Machine Translation and other language technologies need.
Abstract:
The objective of PANACEA is to build a factory of LRs that automates the stages involved in the acquisition, production, updating and maintenance of LRs required by MT systems and by other applications based on language technologies, and that simplifies the handling of potential issues regarding intellectual property rights. This automation will cut down cost, time and human effort significantly. These reductions in cost and time are the only way to guarantee the continuous supply of LRs that MT and other language technologies will demand in a multilingual Europe.
Abstract:
The Saimaa ringed seal is one of the most endangered seals in the world. It is a symbol of Lake Saimaa, and a great deal of effort has been devoted to saving it. Traditional methods of seal monitoring include capturing the animals and installing sensors on their bodies. These invasive identification methods can be painful and affect the behavior of the animals. Automatic identification of seals using computer vision provides a more humane method of monitoring. This Master's thesis focuses on automatic image-based identification of Saimaa ringed seals, which consists of detecting and segmenting a seal in an image, analysing its ring patterns, and identifying the detected seal based on features of those ring patterns. The proposed algorithm is evaluated with a dataset of 131 individual seals. Based on experiments with 363 images, 81% of the images were successfully segmented automatically. Furthermore, a new approach for interactive identification of Saimaa ringed seals is proposed. The results of this research are a starting point for future research on seal photo-identification.
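The thesis' code is not reproduced here, but the final identification step it describes (matching the ring-pattern features of a detected seal against known individuals) can be sketched as a simple nearest-neighbour search; the descriptor format and the Euclidean distance are placeholder assumptions, not the thesis' actual choices:

```python
import numpy as np

def identify(query_descriptor, gallery):
    """gallery: dict mapping seal ID -> list of descriptors from past sightings."""
    best_id, best_dist = None, np.inf
    for seal_id, descriptors in gallery.items():
        # Distance to the closest previous sighting of this individual.
        dist = min(np.linalg.norm(query_descriptor - d) for d in descriptors)
        if dist < best_dist:
            best_id, best_dist = seal_id, dist
    return best_id, best_dist
```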
Abstract:
Affiliation: Centre Robert-Cedergren for bioinformatics and genomics, Université de Montréal & Department of Biochemistry, Université de Montréal
Abstract:
An entropy-based image segmentation approach is introduced and applied to color images obtained from Google Earth. Segmentation refers to the process of partitioning a digital image in order to locate different objects and regions of interest. The application to satellite images paves the way to automated monitoring of ecological catastrophes, urban growth, agricultural activity, maritime pollution, climate change and general surveillance. Regions representing aquatic, rural and urban areas are identified and the accuracy of the proposed segmentation methodology is evaluated. The comparison with gray-level images revealed that the color information is fundamental to obtaining an accurate segmentation.
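The paper's exact formulation is not given in the abstract; the following is a rough sketch of per-channel local-entropy segmentation in the same spirit, using scikit-image (the neighbourhood radius, the per-channel averaging and the Otsu threshold are arbitrary choices for illustration):

```python
import numpy as np
from skimage import io, img_as_ubyte
from skimage.filters.rank import entropy
from skimage.filters import threshold_otsu
from skimage.morphology import disk

def entropy_segment(path, radius=7):
    rgb = img_as_ubyte(io.imread(path))   # assumes an RGB image
    # Local Shannon entropy in a disk-shaped neighbourhood, per channel.
    ent = np.dstack([entropy(rgb[..., c], disk(radius)) for c in range(3)])
    combined = ent.mean(axis=2)
    # Otsu's threshold separates low-entropy (homogeneous, e.g. water) regions
    # from high-entropy (textured, e.g. urban) regions.
    return combined > threshold_otsu(combined)
```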
Abstract:
In order to address problems of information overload in digital imagery task domains, we have developed an interactive approach to the capture and reuse of image context information. Our framework models different aspects of the relationship between images and the domain tasks they support by monitoring the interactive manipulation and annotation of task-relevant imagery. The approach allows us to gauge a user's intentions as they complete goal-directed image tasks. As users analyze retrieved imagery, their interactions are captured and an expert task context is dynamically constructed. This human expertise, proficiency and knowledge can then be leveraged to support other users in carrying out similar domain tasks. We have applied our techniques to two multimedia retrieval applications in two different image domains, namely the geo-spatial and medical imagery domains.
Abstract:
The mismatch between human capacity and the acquisition of Big Data such as Earth imagery undermines commitments to the Convention on Biological Diversity (CBD) and the Aichi targets. Artificial intelligence (AI) solutions to Big Data issues are urgently needed, as these could prove to be faster, more accurate and cheaper. Reducing the costs of managing protected areas in remote deep waters and on the High Seas is of great importance, and this is a realm where autonomous technology will be transformative.
Abstract:
The project began by gathering information on the different approaches taken to image annotation over time: from manual annotation of images, through annotation based on low-level features such as colour and texture, to fully automatic annotation. Articles on the different algorithms used for automatic image annotation were then studied. Since automatic annotation is a fairly open field, there is a large number of approaches. Taking into account the characteristics of the particular images on which the project would focus, the less suitable algorithms were discarded, either because of their high computational cost or because they targeted a different type of image, among other reasons. Eventually, a shape-based algorithm (Active Shape Model) was found that was considered likely to work adequately. Essentially, the different objects in an image are identified from a base contour generated from sample images, which is then automatically deformed to cover the desired region. Since the images to be used are all very similar in composition, the algorithm was expected to work well. The starting point was a MATLAB implementation of the algorithm. To begin with, a set of already-annotated chest radiographs was obtained; the images included contour data for both lungs, both clavicles and the heart. The first step was the creation of a series of MATLAB scripts to: read and convert the RAW images, adapting them to the size and position of the annotated contours; read the text files containing the contour point data and convert them into MATLAB variables; and join the transformed image with the points and save the result in a format that the algorithm implementation could read.
After obtaining the necessary files, a model for each organ was created using a small part of the images for training. The trained model was tested on several of the remaining images. However, there was considerable variation in the results depending on the image and the organ being detected: lungs were generally detected quite accurately, whereas the clavicles and the heart gave more problems. To improve the method, a new model was trained using half of the available images; with this model, a significant improvement in the results can be seen.
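The preprocessing scripts described above were written in MATLAB; a rough Python equivalent of the first two steps (reading a headerless RAW radiograph and a landmark point file) might look like the following. The image dimensions, pixel type and file layout are assumptions for illustration:

```python
import numpy as np

def load_raw_image(path, width=2048, height=2048, dtype=np.uint16):
    # Headerless RAW frame; dimensions and dtype depend on the acquisition device.
    return np.fromfile(path, dtype=dtype).reshape(height, width)

def load_landmarks(path):
    # One "x y" pair per line, in the annotated contour's coordinate system.
    return np.loadtxt(path)                      # shape: (num_points, 2)

def to_annotation_frame(image, target_shape=(256, 256)):
    # Nearest-neighbour rescale so pixels line up with the contour points.
    ys = np.linspace(0, image.shape[0] - 1, target_shape[0]).astype(int)
    xs = np.linspace(0, image.shape[1] - 1, target_shape[1]).astype(int)
    return image[np.ix_(ys, xs)]
```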
Abstract:
Diabetes is a rapidly increasing worldwide problem which is characterised by defective metabolism of glucose that causes long-term dysfunction and failure of various organs. The most common complication of diabetes is diabetic retinopathy (DR), which is one of the primary causes of blindness and visual impairment in adults. The rapid increase of diabetes pushes the limits of current DR screening capabilities, for which digital imaging of the eye fundus (retinal imaging) and automatic or semi-automatic image analysis algorithms provide a potential solution. In this work, the use of colour in the detection of diabetic retinopathy is statistically studied using a supervised algorithm based on one-class classification and Gaussian mixture model estimation. The presented algorithm distinguishes a certain diabetic lesion type from all other possible objects in eye fundus images by estimating only the probability density function of that lesion type. For training and ground-truth estimation, the algorithm combines manual annotations from several experts, for which the best practices were experimentally selected. By assessing the algorithm’s performance in experiments with colour space selection, illuminance and colour correction, and background class information, the use of colour in the detection of diabetic retinopathy was quantitatively evaluated. Another contribution of this work is a benchmarking framework for eye fundus image analysis algorithms, needed for the development of automatic DR detection algorithms. The benchmarking framework provides guidelines on how to construct a benchmarking database that comprises true patient images, ground truth, and an evaluation protocol. The evaluation is based on standard receiver operating characteristic (ROC) analysis and follows medical decision-making practice, providing protocols for image- and pixel-based evaluations. During the work, two public medical image databases with ground truth were published: DIARETDB0 and DIARETDB1. The framework, the DR databases and the final algorithm are made publicly available on the web to set baseline results for the automatic detection of diabetic retinopathy. Although it deviates from the general context of the thesis, a simple and effective optic disc localisation method is also presented, since normal eye fundus structures are fundamental to the characterisation of DR.
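In the spirit of the one-class approach described above (but with placeholder features and thresholds rather than the thesis' actual choices), a Gaussian mixture can be fitted to colour samples of a single lesion type and used to score every pixel of a fundus image:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_lesion_model(lesion_rgb_samples, n_components=5):
    """lesion_rgb_samples: (N, 3) array of pixels annotated as the lesion type."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(lesion_rgb_samples)
    return gmm

def detect(image_rgb, gmm, log_likelihood_threshold=-10.0):
    # Flag pixels whose likelihood under the single-lesion density model is
    # high enough; the threshold here is an arbitrary illustrative value.
    pixels = image_rgb.reshape(-1, 3).astype(float)
    scores = gmm.score_samples(pixels)             # per-pixel log-likelihood
    return (scores > log_likelihood_threshold).reshape(image_rgb.shape[:2])
```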
Abstract:
Relevant past events can be remembered when visualizing related pictures. The main difficulty is how to find these photos in a large personal collection. Query definition and image annotation are key issues in overcoming this problem: the former because of the diversity of the clues our memory provides when recovering a past moment, and the latter because images need to be annotated with information regarding those clues in order to be retrieved. Consequently, tools to recover past memories should deal carefully with these two tasks. This paper describes a user interface designed to explore pictures from personal memories. Users can query the media collection in several ways, and for this reason an iconic visual language for defining queries is proposed. Automatic and semi-automatic annotation is also performed using the image content and the audio information obtained when users show their images to others. The paper also presents the user interface evaluation, based on tests with 58 participants.