883 results for Multimodal retrieval
Abstract:
The purpose of this memorandum is to document the benefit-cost analysis of the river crossing concept alternatives described in the "Concept Alternatives Technical Memo." Benefit-cost studies are designed to measure, in dollars, the potential positive or negative impacts of large-scale construction projects. The concept alternatives analyzed include improvements to the Union Pacific Railroad (UPRR) River Crossing and the U.S. Highway 30 (U.S. 30) River Crossing.
Abstract:
Abstract not available
Abstract:
This master's thesis investigates the impact of the Finland Brand (Marca Finlandia) on the use of Finnish arts in Focus – economía y tecnología, a magazine published by the Ministry for Foreign Affairs of Finland during the deepest years of the economic crisis, 2008-2013. In 2008 the then Minister for Foreign Affairs, Alexander Stubb, appointed a delegation to develop Finland's country brand. The process resulted in a document presented in 2010, Tehtävä Suomelle! (Mission for Finland!), whose aim was to define the design and functions of the Finland Brand. This thesis focuses on the use of Finnish art and artists in the material created to support the promotion of the country's image and, ultimately, its economic competitiveness. To account for the context of the material under study, we present the concept of the country brand and various theoretical approaches to its implementation. Since it is a controversial state strategy, we review the most important criticisms of this promotional practice. We also survey the history of national promotion before introducing the present project, the Finland Brand, and its objectives. The theoretical part presents the concepts relevant to the study: discourse and its critical analysis, emphasizing its ideological capacity to maintain and create power relations that are very often unequal. Because our empirical material consists of physical publications with textual, visual and haptic dimensions, our research method is to examine these relations critically from a multimodal perspective. In the empirical part of the thesis we analyze the composition of the multimodal communication of the printed material produced by the Ministry for Foreign Affairs of Finland and its cooperation partners. 
The annual Focus – Economía y tecnología magazines concentrate on news and articles about the economy and technology sectors, but they also include content on Finnish design, music, fashion, architecture and dance. In our analysis we examine the verbal and visual fields used in the magazines to investigate how the Finnish arts were presented in the Finland Brand operation. We identify textual representations, visual emphases and combinations of both, which served the predetermined goals and the image required by the delegation that designed the country brand. The magazines shared a common discourse that ran parallel to the objectives of the Finland Brand, which in turn can be seen as part of the hegemonic neoliberal discourse in general.
Abstract:
Published in Electronic Handling of Information: Testing and Evaluation, Kent, Taubee, Beltzer, and Goldstein (eds.), Academic Press, London (1967), pp. 123–147.
Abstract:
The goal of image retrieval and matching is to find and locate object instances in images from a large-scale image database. While visual features are abundant, combining them to improve on the performance of individual features remains a challenging task. In this work, we focus on leveraging multiple features for accurate and efficient image retrieval and matching. We first propose two graph-based approaches to rerank initially retrieved images for generic image retrieval. In the graph, vertices are images while edges are similarities between image pairs. Our first approach employs a mixture Markov model based on a random walk model on multiple graphs to fuse graphs. We introduce a probabilistic model to compute the importance of each feature for graph fusion under a naive Bayesian formulation, which requires statistics of similarities from a manually labeled dataset containing irrelevant images. To reduce human labeling, we further propose a fully unsupervised reranking algorithm based on a submodular objective function that can be efficiently optimized by a greedy algorithm. By maximizing an information gain term over the graph, our submodular function favors a subset of database images that are similar to query images and resemble each other. The function also exploits the rank relationships of images from multiple ranked lists obtained by different features. We then study a more well-defined application, person re-identification, where the database contains labeled images of human bodies captured by multiple cameras. Re-identifications from multiple cameras are regarded as related tasks to exploit shared information. We apply a novel multi-task learning algorithm using both low-level features and attributes. A low-rank attribute embedding is jointly learned within the multi-task learning formulation to embed original binary attributes into a continuous attribute space, where incorrect and incomplete attributes are rectified and recovered. 
To locate objects in images, we design an object detector based on object proposals and deep convolutional neural networks (CNNs), in view of the emergence of deep networks. We improve the Fast R-CNN framework and investigate two new strategies to detect objects accurately and efficiently: scale-dependent pooling (SDP) and cascaded rejection classifiers (CRC). SDP improves detection accuracy by exploiting appropriate convolutional features depending on the scale of input object proposals. CRC effectively utilizes convolutional features and eliminates most negative proposals in a cascaded manner, while maintaining a high recall for true objects. The two strategies together improve the detection accuracy and reduce the computational cost.
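As a rough illustration of the graph-fusion idea in the abstract above, the sketch below fuses per-feature similarity graphs into one transition matrix and reranks database images with a random walk with restart. It is not the thesis's mixture Markov model: the fixed mixing weights, the restart probability, the function names and the toy similarity values are all assumptions made for illustration.

```python
def row_normalize(sim):
    """Turn a non-negative similarity matrix into a row-stochastic matrix."""
    out = []
    for row in sim:
        total = sum(row)
        out.append([v / total if total > 0 else 1.0 / len(row) for v in row])
    return out


def fuse(graphs, weights):
    """Convex combination of per-feature transition matrices.

    `weights` are hypothetical feature importances that sum to 1.
    """
    mats = [row_normalize(g) for g in graphs]
    n = len(mats[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, mats)) for j in range(n)]
            for i in range(n)]


def rerank(graphs, weights, query, restart=0.15, iters=100):
    """Random walk with restart from `query` on the fused graph."""
    P = fuse(graphs, weights)
    n = len(P)
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            restart * (1.0 if j == query else 0.0)
            + (1.0 - restart) * sum(scores[i] * P[i][j] for i in range(n))
            for j in range(n)
        ]
    # Rank every image except the query by its visiting probability.
    order = sorted((j for j in range(n) if j != query), key=lambda j: -scores[j])
    return order, scores


# Toy data (assumed): image 0 is the query, two features give two graphs.
color = [[0, .9, .2, .1], [.9, 0, .3, .1], [.2, .3, 0, .1], [.1, .1, .1, 0]]
shape = [[0, .8, .7, .1], [.8, 0, .6, .1], [.7, .6, 0, .2], [.1, .1, .2, 0]]
order, scores = rerank([color, shape], [0.5, 0.5], query=0)
```

Here the weights are fixed by hand. The abstract instead describes estimating each feature's importance probabilistically, which this sketch does not attempt.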
Abstract:
Users need to be able to address in-air gesture systems, which means finding where to perform gestures and how to direct them towards the intended system. This is necessary for input to be sensed correctly and without unintentionally affecting other systems. This thesis investigates novel interaction techniques which allow users to address gesture systems properly, helping them find where and how to gesture. It also investigates audio, tactile and interactive light displays for multimodal gesture feedback; these can be used by gesture systems with limited output capabilities (like mobile phones and small household controls), allowing the interaction techniques to be used by a variety of device types. It investigates tactile and interactive light displays in greater detail, as these are not as well understood as audio displays. Experiments 1 and 2 explored tactile feedback for gesture systems, comparing an ultrasound haptic display to wearable tactile displays at different body locations and investigating feedback designs. These experiments found that tactile feedback improves the user experience of gesturing by reassuring users that their movements are being sensed. Experiment 3 investigated interactive light displays for gesture systems, finding this novel display type effective for giving feedback and presenting information. It also found that interactive light feedback is enhanced by audio and tactile feedback. These feedback modalities were then used alongside audio feedback in two interaction techniques for addressing gesture systems: sensor strength feedback and rhythmic gestures. Sensor strength feedback is multimodal feedback that tells users how well they can be sensed, encouraging them to find where to gesture through active exploration. Experiment 4 found that they can do this with 51mm accuracy, with combinations of audio and interactive light feedback leading to the best performance. 
Rhythmic gestures are continuously repeated gesture movements which can be used to direct input. Experiment 5 investigated the usability of this technique, finding that users can match rhythmic gestures well and with ease. Finally, these interaction techniques were combined, resulting in a new single interaction for addressing gesture systems. Using this interaction, users could direct their input with rhythmic gestures while using the sensor strength feedback to find a good location for addressing the system. Experiment 6 studied the effectiveness and usability of this technique, as well as the design space for combining the two types of feedback. It found that this interaction was successful, with users matching 99.9% of rhythmic gestures, with 80mm accuracy from target points. The findings show that gesture systems could successfully use this interaction technique to allow users to address them. Novel design recommendations for using rhythmic gestures and sensor strength feedback were created, informed by the experiment findings.
Abstract:
The structured representation of cases by attribute graphs in a Case-Based Reasoning (CBR) system for course timetabling has been the subject of previous research by the authors. In that system, the case base is organised as a decision tree and the retrieval process chooses those cases which are sub attribute graph isomorphic to the new case. The drawback of that approach is that it is not suitable for solving large problems. This paper presents a multiple-retrieval approach that partitions a large problem into small solvable sub-problems by recursively inputting the unsolved part of the graph into the decision tree for retrieval. The adaptation combines the retrieved partial solutions of all the partitioned sub-problems and employs a graph heuristic method to construct the whole solution for the new case. We present a methodology which does not depend on problem-specific information and which therefore supports the goal of building more general timetabling systems. We also explore the question of whether this multiple-retrieval CBR could be an effective initialisation method for local search methods such as Hill Climbing, Tabu Search and Simulated Annealing. Significant results are obtained from a wide range of experiments. An evaluation of the CBR system is presented and the impact of the approach on timetabling research is discussed. We see that the approach does indeed represent an effective initialisation method for these local search methods.
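To make the initialisation idea concrete, here is a minimal sketch of hill climbing started from a given timetable. It assumes a simplified model that the abstract does not specify: events are graph nodes, an edge means two events must not share a period, and the starting assignment stands in for whatever a CBR system would return. All names and data are hypothetical.

```python
from itertools import product


def conflicts(edges, assignment):
    """Count conflict edges whose two events share a period."""
    return sum(1 for a, b in edges if assignment[a] == assignment[b])


def hill_climb(edges, assignment, n_periods, max_iters=1000):
    """Steepest-descent hill climbing over single event-to-period moves."""
    current = dict(assignment)
    cost = conflicts(edges, current)
    for _ in range(max_iters):
        best_move, best_cost = None, cost
        for event, period in product(list(current), range(n_periods)):
            if period == current[event]:
                continue
            old = current[event]
            current[event] = period          # try the move
            trial = conflicts(edges, current)
            current[event] = old             # undo it
            if trial < best_cost:
                best_move, best_cost = (event, period), trial
        if best_move is None:
            break                            # local optimum reached
        current[best_move[0]] = best_move[1]
        cost = best_cost
    return current, cost


# Hypothetical conflict graph and an imperfect starting timetable.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
start = {"A": 0, "B": 0, "C": 1, "D": 1}
start_cost = conflicts(edges, start)
final, final_cost = hill_climb(edges, start, n_periods=3)
```

The better the starting assignment, the fewer moves the search needs, which is one plausible reason a retrieved partial solution could help local search. The sketch does not model the case base or the graph heuristic itself.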
Abstract:
The present thesis is a study of movie review entertainment (MRE), a contemporary Internet-based genre of texts. MRE are movie reviews in video form which are published online, usually as episodes of an MRE web show. Characteristic of MRE is the combination of humor and honest opinions in varying degrees, as well as the use of subject materials, i.e. clips of the movies, as part of the review. The study approached MRE from a linguistic perspective, aiming to discover 1) whether MRE is primarily text- or image-based and what the primary functions of the modes are, 2) how a reviewer linguistically combines subject footage with her/his commentary, 3) whether there is any internal variation in MRE regarding the aforementioned questions, and 4) how suitable the selected models and theories are for the analysis of this type of contemporary multimodal data. To answer these questions, the multimodal system of image–text relations by Martinec and Salway (2005), in combination with the categories of cohesion by Halliday and Hasan (1976), was applied to four full MRE videos, which were transcribed in their entirety for the study. The primary data represent varying types of MRE: a current movie review, an analytic essay, a riff review, and a humorous essay. The results demonstrated that image vs. text prioritization can vary between reviews and also within a review. The current movie review and the two essays were primarily commentary-focused, whereas the riff review was significantly more dependent on the use of imagery, as the clips are a major source of humor, which is a prominent value in that type of review. In addition to humor, clips are used to exemplify the commentary. A reviewer also relates new information to the imagery and uses both modes to present the information in a review. Linguistically, the most frequent case was that the reviewer names participants and processes lexically in the commentary. 
Grammatical relations (reference items such as pronouns and adverbs, and conjunctive items in the riff review) were also encountered. There was considerable internal variation. The methods chosen were deemed appropriate to answer the research questions. Further study could go beyond linguistics to include, for instance, genre and media studies.
Abstract:
International audience
Abstract:
In recent years the analysis of oral discourse has taken an important place in linguistics, above all in research focused on classroom discourse. Its spoken nature offers a wide range of elements to analyze, which sets it apart from the analysis of written discourse. On the other hand, in oral communication speakers attend to what they say in words but do not control their gestures in the same way. Aspects such as facial expression, body orientation, direction of gaze, etc., emphasize what is being expressed in the verbal discourse. Utterances are thus bimodal, since they use both the auditory-vocal and the visual-gestural modality. However, this study characterizes this type of communication with the term 'multimodal', since in the oral discourse of the corpus analyzed there is an additional channel of communication: the use of the board. This exploration gives value to gestures in the pedagogical setting, since non-verbal language can reinforce or replace verbal expression and can have a great influence on the construction of the meaning that students build around the communication conveyed by the teacher. Given that the lesson is a textual unit, this research seeks to observe and analyze how gestures contribute to the structuring of classroom discourse, that is, what the multimodal marking of the lesson structure would be. Two central themes are explored: the analysis of classroom discourse and the gestures that accompany speech. To this end, we used the oral discourse analysis model of Sinclair and Coulthard (1992), whose main elements are the lesson, transactions and exchanges, and the gesture analysis model of McNeill (1998, 2005), which proposes five dimensions of gesticulation: beat gestures, deictic gestures, iconic gestures, metaphoric gestures and cohesive gestures.
The aim was to determine the role of gestures as markers or reinforcers of the structure of the oral discourse of a Mathematics class taught in English by a native-speaker teacher to fifth graders in a bilingual educational context. The 45-minute class was recorded and then transcribed in a two-column format: Teacher (with the teacher's utterances) and Students (with the students' contributions). We observed that the study of oral discourse in the classroom is enriched when elements of gestural analysis are added to it. This research invites further inquiry into the role of these two components of interaction (oral discourse and gesture) both in the classroom and in other communicative situations.
Abstract:
Image (video) retrieval is the problem of retrieving images (videos) similar to a query. Images (videos) are represented in an input (feature) space, and similar images (videos) are obtained by finding nearest neighbors in that representation space. Numerous input representations, in both real-valued and binary spaces, have been proposed for faster retrieval. In this thesis, we present techniques that obtain improved input representations for retrieval in both supervised and unsupervised settings for images and videos. Supervised retrieval is the well-known problem of retrieving images of the same class as the query. In the first part, we address the practical aspects of achieving faster retrieval in the supervised setting with binary codes as input representations, where the binary codes are used as addresses into hash tables. In practice, using binary codes as addresses does not guarantee fast retrieval, as similar images are not mapped to the same binary code (address). We address this problem by presenting an efficient supervised hashing (binary encoding) method that aims to explicitly map all the images of the same class, ideally, to a unique binary code. We refer to the binary codes of the images as 'Semantic Binary Codes' and to the unique code for all same-class images as the 'Class Binary Code'. We also propose a new class-based Hamming metric that dramatically reduces retrieval times for larger databases, since the Hamming distance is computed only to the class binary codes. We also propose a deep semantic binary code model, obtained by replacing the output layer of a popular convolutional neural network (AlexNet) with the class binary codes, and show that the hashing functions learned in this way outperform the state of the art while providing fast retrieval times. In the second part, we also address the problem of supervised retrieval by taking into account the relationship between classes. 
For a given query image, we want to retrieve images that preserve the relative order, i.e. we want to retrieve all same-class images first and then the related-class images before different-class images. We learn such relationship-aware binary codes by minimizing the discrepancy between the inner product of the binary codes and the similarity between the classes. We calculate the similarity between classes using output embedding vectors, which are vector representations of classes. Our method deviates from other supervised binary encoding schemes, as it is the first to use output embeddings for learning hashing functions. We also introduce new performance metrics that take into account the related-class retrieval results and show significant gains over the state of the art. High-dimensional descriptors such as Fisher Vectors or the Vector of Locally Aggregated Descriptors have been shown to improve the performance of many computer vision applications, including retrieval. In the third part, we discuss an unsupervised technique for compressing high-dimensional vectors into high-dimensional binary codes to reduce storage complexity. In this approach, we deviate from traditional hyperplane hashing functions and instead learn hyperspherical hashing functions. The proposed method overcomes the computational challenges of directly applying the spherical hashing algorithm, which is intractable for compressing high-dimensional vectors. We present a practical hierarchical model that uses divide-and-conquer techniques based on the Random Select and Adjust (RSA) procedure to compress such high-dimensional vectors. We show that our proposed high-dimensional binary codes outperform the binary codes obtained using traditional hyperplane methods for higher compression ratios. In the last part of the thesis, we propose a retrieval-based solution to the zero-shot event classification problem, a setting where no training videos are available for the event. 
To do this, we learn a generic set of concept detectors and represent both videos and query events in the concept space. We then compute the similarity between the query event and each video in the concept space, and videos similar to the query event are classified as belonging to the event. We show that we significantly boost the performance using concept features from other modalities.
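The class-based Hamming lookup from the first part of the abstract above can be sketched as follows. This is a minimal illustration, not the thesis's method: it assumes each class already has one binary code, stores codes as Python integers, and uses hypothetical class names, codes and file names. The point it shows is that a query is compared against one code per class rather than one code per database image.

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


def nearest_classes(query_code, class_codes):
    """Rank class labels by Hamming distance to their class binary code.

    Ties are broken alphabetically by label so the order is deterministic.
    """
    return sorted(
        class_codes,
        key=lambda label: (hamming(query_code, class_codes[label]), label),
    )


def retrieve(query_code, class_codes, images_by_class, top_classes=1):
    """Return the images stored under the closest class code(s)."""
    results = []
    for label in nearest_classes(query_code, class_codes)[:top_classes]:
        results.extend(images_by_class.get(label, []))
    return results


# Hypothetical 8-bit class codes and a tiny database.
class_codes = {"cat": 0b10110010, "dog": 0b01001101, "car": 0b11110000}
images_by_class = {
    "cat": ["cat_01.jpg", "cat_02.jpg"],
    "dog": ["dog_01.jpg"],
    "car": ["car_01.jpg"],
}

query = 0b10110110  # one bit away from the "cat" class code
ranking = nearest_classes(query, class_codes)   # ['cat', 'car', 'dog']
hits = retrieve(query, class_codes, images_by_class)
```

With K classes and N images, the lookup costs K distance computations instead of N, which is where the reduction in retrieval time would come from when K is much smaller than N.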