5 results for Image and video acquisition
in DRUM (Digital Repository at the University of Maryland)
Abstract:
Image (video) retrieval is the problem of retrieving images (videos) similar to a query. Images (videos) are represented in an input (feature) space, and similar items are obtained by finding nearest neighbors in that representation space. Numerous input representations, in both real-valued and binary spaces, have been proposed for faster retrieval. In this thesis, we present techniques that obtain improved input representations for retrieval in both supervised and unsupervised settings, for images and videos. Supervised retrieval is the well-known problem of retrieving images of the same class as the query. In the first part, we address the practical aspects of achieving faster retrieval with binary codes as input representations for the supervised setting, where binary codes are used as addresses into hash tables. In practice, using binary codes as addresses does not guarantee fast retrieval, because similar images are not mapped to the same binary code (address). We address this problem with an efficient supervised hashing (binary encoding) method that explicitly aims to map all images of the same class, ideally, to a unique binary code. We refer to the binary codes of the images as 'Semantic Binary Codes' and to the unique code shared by all images of a class as the 'Class Binary Code'. We also propose a new class-based Hamming metric that dramatically reduces retrieval times for larger databases, since Hamming distances are computed only to the class binary codes. We further propose a deep semantic binary code model, obtained by replacing the output layer of a popular convolutional neural network (AlexNet) with the class binary codes, and show that the hashing functions learned in this way outperform the state of the art while also providing fast retrieval times. In the second part, we address the problem of supervised retrieval by taking into account the relationships between classes.
For a given query image, we want to retrieve images that preserve a relative order: all images of the query's class first, then images of related classes, before images of unrelated classes. We learn such relationship-aware binary codes by minimizing the discrepancy between the inner products of the binary codes and the similarities between the corresponding classes. Class similarities are computed from output embedding vectors, which are vector representations of the classes. Our method departs from other supervised binary encoding schemes in that it is the first to use output embeddings for learning hashing functions. We also introduce new performance metrics that account for related-class retrieval results, and we show significant gains over the state of the art. High-dimensional descriptors such as Fisher Vectors and Vectors of Locally Aggregated Descriptors have been shown to improve the performance of many computer vision applications, including retrieval. In the third part, we discuss an unsupervised technique for compressing high-dimensional vectors into high-dimensional binary codes to reduce storage complexity. In this approach, we deviate from traditional hyperplane hashing functions and instead learn hyperspherical hashing functions. The proposed method overcomes the computational challenges of directly applying the spherical hashing algorithm, which is intractable for compressing high-dimensional vectors. We present a practical hierarchical model that compresses such vectors with a divide-and-conquer strategy based on the Random Select and Adjust (RSA) procedure. We show that the resulting high-dimensional binary codes outperform those obtained with traditional hyperplane methods at higher compression ratios. In the last part of the thesis, we propose a retrieval-based solution to the zero-shot event classification problem, a setting in which no training videos are available for the event.
To do this, we learn a generic set of concept detectors and represent both videos and query events in the concept space. We then compute the similarity between the query event and each video in the concept space, and videos similar to the query event are classified as belonging to that event. We show that concept features from other modalities significantly boost performance.
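The class-based Hamming metric described in this abstract can be illustrated with a small sketch (this is not the thesis implementation; the codes, labels, and sizes below are toy stand-ins). In the ideal case the supervised hashing method targets, every image of a class hashes exactly to its class binary code, so a query needs Hamming comparisons only against the C class codes rather than all N database codes:

```python
# Toy sketch of class-based Hamming retrieval (illustrative only):
# distances are computed to the C class binary codes, not to all N
# database codes, and the nearest class's bucket is returned.
import numpy as np

# Four hand-picked, well-separated 16-bit class binary codes.
class_codes = np.array([
    [0] * 16,
    [1] * 16,
    [0] * 8 + [1] * 8,
    [1] * 8 + [0] * 8,
])

# Ideal case targeted by the supervised hashing method: every database
# image hashes exactly to its class code.
n_images = 100
labels = np.arange(n_images) % len(class_codes)
db_codes = class_codes[labels]

def hamming(code, codes):
    """Hamming distance from one code to each row of `codes`."""
    return np.count_nonzero(code != codes, axis=-1)

def retrieve(query_code):
    # O(C) comparisons instead of O(N): find the nearest class code,
    # then return the indices of that class's images.
    nearest = int(np.argmin(hamming(query_code, class_codes)))
    return np.flatnonzero(labels == nearest)

# A query whose code is class 2's code with one bit flipped still
# lands in class 2's bucket.
q = class_codes[2].copy()
q[0] ^= 1
```

The point of the metric is the complexity reduction: retrieval cost grows with the number of classes, not the database size.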
Abstract:
Deficits in social communication and interaction have been identified as distinguishing impairments for individuals with an autism spectrum disorder (ASD). Because these are pivotal skills, the successful development of social communication and interaction in individuals with ASD is a lifelong objective. Point-of-view video modeling has the potential to address these deficits. This type of video involves filming the completion of a targeted skill or behavior from a first-person perspective. By presenting only what a person might see from his or her own viewpoint, this format has been shown to be more effective at limiting irrelevant stimuli, providing a clear frame of reference that facilitates imitation. The current study investigated the use of point-of-view video modeling in teaching social initiations (e.g., greetings). Using a multiple-baseline-across-participants design, five kindergarten participants were taught social initiations using point-of-view video modeling and video priming. Immediately before and after viewing the entire point-of-view video model, the participants were evaluated on their social initiations with a trained, typically developing peer serving as a communication partner. Specifically, the social initiations involved participants' abilities to shift their attention toward the peer who entered the classroom, maintain attention toward the peer, and engage in an appropriate social initiation (e.g., "hi," "hello"). Both generalization and maintenance were tested. Overall, the data suggest that point-of-view video modeling is an effective intervention for increasing social initiations in young students with ASD. However, retraining was necessary for acquisition of the skills in the classroom environment. Generalization to novel environments, to a novel communication partner, and to other social initiation skills was limited. Additionally, maintenance of the gained social initiation skills occurred only in the intervention room.
Despite the limitations of the study and its variable results, there are a number of implications moving forward for both practitioners and future researchers examining point-of-view video modeling and its potential impact on the social initiation skills of individuals with ASD.
Abstract:
A computer vision system that has to interact in natural language needs to understand the visual appearance of interactions between objects, along with the appearance of the objects themselves. Relationships between objects are frequently mentioned in queries for tasks such as semantic image retrieval, image captioning, visual question answering, and natural language object detection. Hence, it is essential to model context between objects when solving these tasks. In the first part of this thesis, we present a technique for detecting an object mentioned in a natural language query. Specifically, we work with referring expressions, which are sentences that identify a particular object instance in an image. In many referring expressions, an object is described in relation to another object using prepositions, comparative adjectives, action verbs, etc. Our proposed technique can identify both the referred object and the context object mentioned in such expressions. Context is also useful for incrementally understanding scenes and videos. In the second part of this thesis, we propose techniques for searching for objects in an image and events in a video. Our proposed incremental algorithms use the context from previously explored regions to prioritize the regions to explore next. The advantage of incremental understanding is that it restricts the computation time and/or resources spent on various detection tasks. Our first proposed technique shows how to learn context in indoor scenes in an implicit manner and use it for searching for objects. The second technique shows how explicitly written context rules of one-on-one basketball can be used to sequentially detect events in a game.
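The context-driven incremental search described in this abstract can be sketched as a priority-queue exploration over regions. Everything below (the grid, the prior scores, the single "detection," the context bonus) is a hypothetical stand-in for the learned context model, shown only to illustrate the reprioritization loop:

```python
# Illustrative sketch of incremental, context-prioritized region search.
# A detection raises the exploration priority of its neighboring regions,
# so nearby regions are searched before distant ones.
import heapq

# 4x4 grid of image regions; scores are hypothetical detector priors.
regions = [(r, c) for r in range(4) for c in range(4)]
prior = {reg: 0.1 for reg in regions}
prior[(1, 1)] = 0.9            # the one region our toy "detector" fires on
CONTEXT_BONUS = 0.5            # priority boost for neighbors of a detection

def neighbors(reg):
    r, c = reg
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr or dc) and (r + dr, c + dc) in prior]

def incremental_search(budget):
    """Explore up to `budget` regions, reprioritizing using context."""
    score = dict(prior)                         # current exploration priority
    heap = [(-s, reg) for reg, s in score.items()]
    heapq.heapify(heap)                         # max-heap via negated scores
    explored, detections, seen = [], [], set()
    while heap and len(explored) < budget:
        neg_s, reg = heapq.heappop(heap)
        if reg in seen or -neg_s < score[reg]:  # already explored, or stale
            continue
        seen.add(reg)
        explored.append(reg)
        if prior[reg] > 0.5:                    # toy detector fires here
            detections.append(reg)
            for nb in neighbors(reg):           # context from this detection
                if nb not in seen:              # raises neighbors' priority
                    score[nb] += CONTEXT_BONUS
                    heapq.heappush(heap, (-score[nb], nb))
    return explored, detections

explored, detections = incremental_search(budget=4)
# Regions adjacent to the detection at (1, 1) are explored before
# distant regions such as (3, 3).
```

The design point is that exploration order is not fixed in advance: each new detection feeds back into the queue, which is what bounds the computation spent on unpromising regions.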
Abstract:
MOVE is a composition for string quartet, piano, percussion, and electronics, of approximately 15–16 minutes' duration, in three movements. The work incorporates electronic samples either synthesized electronically by the composer or recorded from acoustic instruments. The work aims to use electronic sounds as an expansion of the tonal palette of the chamber group (rather like an extended percussion setup) as opposed to a dominating sonic feature of the music. This is done by limiting the use of electronics to specific sections of the work, and by prioritizing blend and sonic coherence in the synthesized samples. The work uses fixed electronics in a way that allows for tempo variations in the music. A difficulty generally arises in that fixed "tape" parts do not allow tempo variations, while truly "live" software algorithms sacrifice rhythmic accuracy. Sample pads, such as the Roland SPD-SX, provide an elegant solution: the latency of such a device is close enough to zero that individual samples can be triggered in real time at a range of tempi. The percussion setup in this work (vibraphone and sample pad) allows one player to cover both parts, eliminating the need for an additional musician to trigger the electronics. Compositionally, momentum is used as a constructive principle. The first movement makes prominent use of ostinato and shifting meter. The second is a set of variations on a repeated harmonic pattern, with a polymetric middle section. The third is a type of passacaglia, wherein the bassline is not introduced right away but becomes more significant later in the movement. Given the importance of visual presentation in the Internet age, the final goal of the project was to shoot HD video of a studio performance of the work for publication online. The composer recorded audio and video in two separate sessions and edited the production using Logic X and Adobe Premiere Pro. The final video presentation can be seen at geoffsheil.com/move.
Abstract:
This qualitative case study explored three teacher candidates' learning and enactment of discourse-focused mathematics teaching practices. Using audio and video recordings of their teaching practice, this study aimed to identify shifts in the way the teacher candidates enacted the following discourse practices: eliciting and using evidence of student thinking, posing purposeful questions, and facilitating meaningful mathematical discourse. The teacher candidates' written reflections from their practice-based coursework, as well as interviews, were examined to see how two mathematics methods courses influenced their learning and enactment of the three discourse-focused mathematics teaching practices. These data sources were also used to identify tensions the teacher candidates encountered. All three candidates in the study were able to successfully enact and reflect on these discourse-focused mathematics teaching practices at various time points in their preparation programs. Consistency of use and areas of improvement differed, however, depending on the tensions experienced by each candidate. Access to quality curriculum materials, as well as time to formulate and enact thoughtful lesson plans that supported classroom discourse, were tensions for these teacher candidates. This study shows that teacher candidates are capable of enacting discourse-focused teaching practices early in their field placements, and that with the support of practice-based coursework they can analyze and reflect on their practice for improvement. This study also reveals the importance of assisting teacher candidates in accessing rich mathematical tasks and collaborating during lesson planning. More research is needed to identify how specific aspects of the learning cycle impact individual teachers and how this can be used to improve practice-based teacher education courses.