826 resultados para Image and video indexing


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper focuses on hand biometrics applied to images acquired from a mobile device. The system offers the possibility of identifying individuals based on features extracted from hand pictures obtained with a low-quality camera embedded on a mobile device. Furthermore, the acquisitions have been carried out regardless illumination control, orientation, distance to camera, and similar aspects. In addition, the whole system has been tested with an owned database. Finally, the results obtained (6.0% ± 0.2) and the algorithm structure are both promising in relation to a posterior mobile implementation

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Esta tesis trata sobre métodos de corrección que compensan la variación de las condiciones de iluminación en aplicaciones de imagen y video a color. Estas variaciones hacen que a menudo fallen aquellos algoritmos de visión artificial que utilizan características de color para describir los objetos. Se formulan tres preguntas de investigación que definen el marco de trabajo de esta tesis. La primera cuestión aborda las similitudes que se dan entre las imágenes de superficies adyacentes en relación a su comportamiento fotométrico. En base al análisis del modelo de formación de imágenes en situaciones dinámicas, esta tesis propone un modelo capaz de predecir las variaciones de color de la región de una determinada imagen a partir de las variaciones de las regiones colindantes. Dicho modelo se denomina Quotient Relational Model of Regions. Este modelo es válido cuando: las fuentes de luz iluminan todas las superficies incluídas en él; estas superficies están próximas entre sí y tienen orientaciones similares; y cuando son en su mayoría lambertianas. Bajo ciertas circunstancias, la respuesta fotométrica de una región se puede relacionar con el resto mediante una combinación lineal. No se ha podido encontrar en la literatura científica ningún trabajo previo que proponga este tipo de modelo relacional. La segunda cuestión va un paso más allá y se pregunta si estas similitudes se pueden utilizar para corregir variaciones fotométricas desconocidas en una región también desconocida, a partir de regiones conocidas adyacentes. Para ello, se propone un método llamado Linear Correction Mapping capaz de dar una respuesta afirmativa a esta cuestión bajo las circunstancias caracterizadas previamente. Para calcular los parámetros del modelo se requiere una etapa de entrenamiento previo. El método, que inicialmente funciona para una sola cámara, se amplía para funcionar en arquitecturas con varias cámaras sin solape entre sus campos visuales. Para ello, tan solo se necesitan varias muestras de imágenes del mismo objeto capturadas por todas las cámaras. Además, este método tiene en cuenta tanto las variaciones de iluminación, como los cambios en los parámetros de exposición de las cámaras. Todos los métodos de corrección de imagen fallan cuando la imagen del objeto que tiene que ser corregido está sobreexpuesta o cuando su relación señal a ruido es muy baja. Así, la tercera cuestión se refiere a si se puede establecer un proceso de control de la adquisición que permita obtener una exposición óptima cuando las condiciones de iluminación no están controladas. De este modo, se propone un método denominado Camera Exposure Control capaz de mantener una exposición adecuada siempre y cuando las variaciones de iluminación puedan recogerse dentro del margen dinámico de la cámara. Los métodos propuestos se evaluaron individualmente. La metodología llevada a cabo en los experimentos consistió en, primero, seleccionar algunos escenarios que cubrieran situaciones representativas donde los métodos fueran válidos teóricamente. El Linear Correction Mapping fue validado en tres aplicaciones de re-identificación de objetos (vehículos, caras y personas) que utilizaban como caracterísiticas la distribución de color de éstos. Por otra parte, el Camera Exposure Control se probó en un parking al aire libre. Además de esto, se definieron varios indicadores que permitieron comparar objetivamente los resultados de los métodos propuestos con otros métodos relevantes de corrección y auto exposición referidos en el estado del arte. Los resultados de la evaluación demostraron que los métodos propuestos mejoran los métodos comparados en la mayoría de las situaciones. Basándose en los resultados obtenidos, se puede decir que las respuestas a las preguntas de investigación planteadas son afirmativas, aunque en circunstancias limitadas. Esto quiere decir que, las hipótesis planteadas respecto a la predicción, la corrección basada en ésta y la auto exposición, son factibles en aquellas situaciones identificadas a lo largo de la tesis pero que, sin embargo, no se puede garantizar que se cumplan de manera general. Por otra parte, se señalan como trabajo de investigación futuro algunas cuestiones nuevas y retos científicos que aparecen a partir del trabajo presentado en esta tesis. ABSTRACT This thesis discusses the correction methods used to compensate the variation of lighting conditions in colour image and video applications. These variations are such that Computer Vision algorithms that use colour features to describe objects mostly fail. Three research questions are formulated that define the framework of the thesis. The first question addresses the similarities of the photometric behaviour between images of dissimilar adjacent surfaces. Based on the analysis of the image formation model in dynamic situations, this thesis proposes a model that predicts the colour variations of the region of an image from the variations of the surrounded regions. This proposed model is called the Quotient Relational Model of Regions. This model is valid when the light sources illuminate all of the surfaces included in the model; these surfaces are placed close each other, have similar orientations, and are primarily Lambertian. Under certain circumstances, a linear combination is established between the photometric responses of the regions. Previous work that proposed such a relational model was not found in the scientific literature. The second question examines whether those similarities could be used to correct the unknown photometric variations in an unknown region from the known adjacent regions. A method is proposed, called Linear Correction Mapping, which is capable of providing an affirmative answer under the circumstances previously characterised. A training stage is required to determine the parameters of the model. The method for single camera scenarios is extended to cover non-overlapping multi-camera architectures. To this extent, only several image samples of the same object acquired by all of the cameras are required. Furthermore, both the light variations and the changes in the camera exposure settings are covered by correction mapping. Every image correction method is unsuccessful when the image of the object to be corrected is overexposed or the signal-to-noise ratio is very low. Thus, the third question refers to the control of the acquisition process to obtain an optimal exposure in uncontrolled light conditions. A Camera Exposure Control method is proposed that is capable of holding a suitable exposure provided that the light variations can be collected within the dynamic range of the camera. Each one of the proposed methods was evaluated individually. The methodology of the experiments consisted of first selecting some scenarios that cover the representative situations for which the methods are theoretically valid. Linear Correction Mapping was validated using three object re-identification applications (vehicles, faces and persons) based on the object colour distributions. Camera Exposure Control was proved in an outdoor parking scenario. In addition, several performance indicators were defined to objectively compare the results with other relevant state of the art correction and auto-exposure methods. The results of the evaluation demonstrated that the proposed methods outperform the compared ones in the most situations. Based on the obtained results, the answers to the above-described research questions are affirmative in limited circumstances, that is, the hypothesis of the forecasting, the correction based on it, and the auto exposure are feasible in the situations identified in the thesis, although they cannot be guaranteed in general. Furthermore, the presented work raises new questions and scientific challenges, which are highlighted as future research work.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

O presente estudo denominado “Inclusão, Direitos Humanos e Igualdade: Educar para a diferença” direcionou-se para a temática da deficiência física evidente. Partindo do mestrado em Ação Humanitária, Cooperação e Desenvolvimento, incidimos o nosso estudo na temática dos direitos humanos associada à da inclusão. A diferença, usualmente, origina exclusão em virtude de que o que é diferente não é socialmente aceite. Vivemos numa sociedade formatada para o “normal” em que o “normal” apresenta-se sempre como um modo de supressão gradual da diferença e da uniformização da diversidade e onde dificilmente encaixa a “diferença”. Uma das áreas onde se verifica tratamento diferenciado é a nível da deficiência que, não sendo entendida pela sociedade, gera um ciclo vicioso que dificilmente se quebra. Para tal, a escola terá de desempenhar um papel de extrema importância, modificando mentalidades, promovendo a deficiência, para que no futuro esta não seja encarada como uma diferença, mas como uma mais-valia no processo de singularidade e de diversidade humanas. Os comportamentos e os movimentos de transformação não podem ser impostos, mas devem ser introduzidos, compreendidos e modificados, pois só desta forma podem servir como alavanca de suporte para uma sociedade mais justa, mais inclusiva, mais humana, pois está nas mãos de cada um de nós, educadores, formar futuros cidadãos conscientes, ativos e responsáveis. A sociedade atual vive momentos conturbados decorrentes de interesses geopolíticos e estratégicos que potenciam conflitos armados, em que cada vez mais a população é indiscriminadamente afetada. Uma das consequências destes conflitos é a deficiência física evidente, aqui explorada, e que é uma realidade inerente à nossa prática profissional. Todos os dias são colocados em questão os direitos humanos de quem tem que conviver diariamente com ambientes bélicos. Apesar da sociedade portuguesa estar pouco desperta para esta realidade, pareceu-nos pertinente levar esta temática para a escola tentando explorar as perceções de crianças face à deficiência física evidente, quando observada noutras crianças, em ambiente escolar. Fizemos a aplicação de diversos métodos (inquéritos, visualização de imagens e vídeo, atividades de simulação de deficiência), atividades práticas em ambiente escolar que nos permitissem atingir o nosso objetivo geral. Posteriormente, foi realizada a interpretação de dados decorrentes da aplicação da metodologia, quer quantitativamente, quer qualitativamente efetuada a sua análise e retiradas conclusões.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the rapid increase in both centralized video archives and distributed WWW video resources, content-based video retrieval is gaining its importance. To support such applications efficiently, content-based video indexing must be addressed. Typically, each video is represented by a sequence of frames. Due to the high dimensionality of frame representation and the large number of frames, video indexing introduces an additional degree of complexity. In this paper, we address the problem of content-based video indexing and propose an efficient solution, called the Ordered VA-File (OVA-File) based on the VA-file. OVA-File is a hierarchical structure and has two novel features: 1) partitioning the whole file into slices such that only a small number of slices are accessed and checked during k Nearest Neighbor (kNN) search and 2) efficient handling of insertions of new vectors into the OVA-File, such that the average distance between the new vectors and those approximations near that position is minimized. To facilitate a search, we present an efficient approximate kNN algorithm named Ordered VA-LOW (OVA-LOW) based on the proposed OVA-File. OVA-LOW first chooses possible OVA-Slices by ranking the distances between their corresponding centers and the query vector, and then visits all approximations in the selected OVA-Slices to work out approximate kNN. The number of possible OVA-Slices is controlled by a user-defined parameter delta. By adjusting delta, OVA-LOW provides a trade-off between the query cost and the result quality. Query by video clip consisting of multiple frames is also discussed. Extensive experimental studies using real video data sets were conducted and the results showed that our methods can yield a significant speed-up over an existing VA-file-based method and iDistance with high query result quality. Furthermore, by incorporating temporal correlation of video content, our methods achieved much more efficient performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The main challenges of multimedia data retrieval lie in the effective mapping between low-level features and high-level concepts, and in the individual users' subjective perceptions of multimedia content. ^ The objectives of this dissertation are to develop an integrated multimedia indexing and retrieval framework with the aim to bridge the gap between semantic concepts and low-level features. To achieve this goal, a set of core techniques have been developed, including image segmentation, content-based image retrieval, object tracking, video indexing, and video event detection. These core techniques are integrated in a systematic way to enable the semantic search for images/videos, and can be tailored to solve the problems in other multimedia related domains. In image retrieval, two new methods of bridging the semantic gap are proposed: (1) for general content-based image retrieval, a stochastic mechanism is utilized to enable the long-term learning of high-level concepts from a set of training data, such as user access frequencies and access patterns of images. (2) In addition to whole-image retrieval, a novel multiple instance learning framework is proposed for object-based image retrieval, by which a user is allowed to more effectively search for images that contain multiple objects of interest. An enhanced image segmentation algorithm is developed to extract the object information from images. This segmentation algorithm is further used in video indexing and retrieval, by which a robust video shot/scene segmentation method is developed based on low-level visual feature comparison, object tracking, and audio analysis. Based on shot boundaries, a novel data mining framework is further proposed to detect events in soccer videos, while fully utilizing the multi-modality features and object information obtained through video shot/scene detection. ^ Another contribution of this dissertation is the potential of the above techniques to be tailored and applied to other multimedia applications. This is demonstrated by their utilization in traffic video surveillance applications. The enhanced image segmentation algorithm, coupled with an adaptive background learning algorithm, improves the performance of vehicle identification. A sophisticated object tracking algorithm is proposed to track individual vehicles, while the spatial and temporal relationships of vehicle objects are modeled by an abstract semantic model. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Video coding technologies have played a major role in the explosion of large market digital video applications and services. In this context, the very popular MPEG-x and H-26x video coding standards adopted a predictive coding paradigm, where complex encoders exploit the data redundancy and irrelevancy to 'control' much simpler decoders. This codec paradigm fits well applications and services such as digital television and video storage where the decoder complexity is critical, but does not match well the requirements of emerging applications such as visual sensor networks where the encoder complexity is more critical. The Slepian Wolf and Wyner-Ziv theorems brought the possibility to develop the so-called Wyner-Ziv video codecs, following a different coding paradigm where it is the task of the decoder, and not anymore of the encoder, to (fully or partly) exploit the video redundancy. Theoretically, Wyner-Ziv video coding does not incur in any compression performance penalty regarding the more traditional predictive coding paradigm (at least for certain conditions). In the context of Wyner-Ziv video codecs, the so-called side information, which is a decoder estimate of the original frame to code, plays a critical role in the overall compression performance. For this reason, much research effort has been invested in the past decade to develop increasingly more efficient side information creation methods. This paper has the main objective to review and evaluate the available side information methods after proposing a classification taxonomy to guide this review, allowing to achieve more solid conclusions and better identify the next relevant research challenges. After classifying the side information creation methods into four classes, notably guess, try, hint and learn, the review of the most important techniques in each class and the evaluation of some of them leads to the important conclusion that the side information creation methods provide better rate-distortion (RD) performance depending on the amount of temporal correlation in each video sequence. It became also clear that the best available Wyner-Ziv video coding solutions are almost systematically based on the learn approach. The best solutions are already able to systematically outperform the H.264/AVC Intra, and also the H.264/AVC zero-motion standard solutions for specific types of content. (C) 2013 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A novel framework for multimodal semantic-associative collateral image labelling, aiming at associating image regions with textual keywords, is described. Both the primary image and collateral textual modalities are exploited in a cooperative and complementary fashion. The collateral content and context based knowledge is used to bias the mapping from the low-level region-based visual primitives to the high-level visual concepts defined in a visual vocabulary. We introduce the notion of collateral context, which is represented as a co-occurrence matrix, of the visual keywords, A collaborative mapping scheme is devised using statistical methods like Gaussian distribution or Euclidean distance together with collateral content and context-driven inference mechanism. Finally, we use Self Organising Maps to examine the classification and retrieval effectiveness of the proposed high-level image feature vector model which is constructed based on the image labelling results.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Individuals with schizophrenia, particularly those with passivity symptoms, may not feel in control of their actions, believing them to be controlled by external agents. Cognitive operations that contribute to these symptoms may include abnormal processing in agency as well as body representations that deal with body schema and body image. However, these operations in schizophrenia are not fully understood, and the questions of general versus specific deficits in individuals with different symptom profiles remain unanswered. Using the projected-hand illusion (a digital video version of the rubber-hand illusion) with synchronous and asynchronous stroking (500 ms delay), and a hand laterality judgment task, we assessed sense of agency, body image, and body schema in 53 people with clinically stable schizophrenia (with a current, past, and no history of passivity symptoms) and 48 healthy controls. The results revealed a stable trait in schizophrenia with no difference between clinical subgroups (sense of agency) and some quantitative (specific) differences depending on the passivity symptom profile (body image and body schema). Specifically, a reduced sense of self-agency was a common feature of all clinical subgroups. However, subgroup comparisons showed that individuals with passivity symptoms (both current and past) had significantly greater deficits on tasks assessing body image and body schema, relative to the other groups. In addition, patients with current passivity symptoms failed to demonstrate the normal reduction in body illusion typically seen with a 500 ms delay in visual feedback (asynchronous condition), suggesting internal timing problems. Altogether, the results underscore self-abnormalities in schizophrenia, provide evidence for both trait abnormalities and state changes specific to passivity symptoms, and point to a role for internal timing deficits as a mechanistic explanation for external cues becoming a possible source of self-body input.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This contribution discusses the effects of camera aperture correction in broadcast video on colour-based keying. The aperture correction is used to ’sharpen’ an image and is one element that distinguishes the ’TV-look’ from ’film-look’. ’If a very high level of sharpening is applied, as is the case in many TV productions then this significantly shifts the colours around object boundaries with hight contrast. This paper discusses these effects and their impact on keying and describes a simple low-pass filter to compensate for them. Tests with colour-based segmentation algorithms show that the proposed compensation is an effective way of decreasing the keying artefacts on object boundaries.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We describe a user assisted technique for 3D stereo conversion from 2D images. Our approach exploits the geometric structure of perspective images including vanishing points. We allow a user to indicate lines, planes, and vanishing points in the input image, and directly employ these as constraints in an image warping framework to produce a stereo pair. By sidestepping explicit construction of a depth map, our approach is applicable to more general scenes and avoids potential artifacts of depth-image-based rendering. Our method is most suitable for scenes with large scale structures such as buildings.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study compares interpreter-mediated face-to-face Magistrates Court hearings with those conducted through prison video link in which interpreters are located in court and non- English-speaking defendants in prison. It seeks to examine the impact that the presence of video link has on court actors in terms of interaction and behaviour. The data comprises 11 audio-recordings of face-to-face hearings, 10 recordings of prison video link hearings, semistructured interviews with 27 court actors, and ethnographic observation of hearings as viewed by defendants in Wormwood Scrubs prison in London. The over-arching theme is the pervasive influence of the ecology of the courtroom upon all court actors in interpretermediated hearings and thus on the communication process. Close analysis of the court transcripts shows that their relative proximity to one another can be a determinant of status, interpreting role, mode and volume. The very few legal protocols which apply to interpretermediated cases (acknowledging and ratifying the interpreter, for example), are often forgotten or dispensed with. Court interpreters lack proper training in the specific challenges of court interpreting, whether they are co-present with the defendant or not. Other court actors often misunderstand the interpreter’s role. This has probably come about because courts have adjusted their perceptions of what they think interpreters are supposed to do based on their own experiences of working with them, and have gradually come to accept poor practice (the inability to perform simultaneous interpreting, for example) as the norm. In video link courts, mismatches of sound and image due to court clerks’ failure to adequately track current speakers, poor image and sound quality and the fact that non-English-speaking defendants in pre-and post-court consultations can see and hear interpreters but not their defence advocates are just some of the additional layers of disadvantage and confusion already suffered by non- English-speaking defendants. These factors make it less likely that justice will be done.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the proliferation of multimedia data and ever-growing requests for multimedia applications, there is an increasing need for efficient and effective indexing, storage and retrieval of multimedia data, such as graphics, images, animation, video, audio and text. Due to the special characteristics of the multimedia data, the Multimedia Database management Systems (MMDBMSs) have emerged and attracted great research attention in recent years. Though much research effort has been devoted to this area, it is still far from maturity and there exist many open issues. In this dissertation, with the focus of addressing three of the essential challenges in developing the MMDBMS, namely, semantic gap, perception subjectivity and data organization, a systematic and integrated framework is proposed with video database and image database serving as the testbed. In particular, the framework addresses these challenges separately yet coherently from three main aspects of a MMDBMS: multimedia data representation, indexing and retrieval. In terms of multimedia data representation, the key to address the semantic gap issue is to intelligently and automatically model the mid-level representation and/or semi-semantic descriptors besides the extraction of the low-level media features. The data organization challenge is mainly addressed by the aspect of media indexing where various levels of indexing are required to support the diverse query requirements. In particular, the focus of this study is to facilitate the high-level video indexing by proposing a multimodal event mining framework associated with temporal knowledge discovery approaches. With respect to the perception subjectivity issue, advanced techniques are proposed to support users' interaction and to effectively model users' perception from the feedback at both the image-level and object-level.