999 resultados para Visual saliency


Relevância:

70.00% 70.00%

Publicador:

Resumo:

The large and growing number of digital images is making manual image search laborious. Only a fraction of the images contain metadata that can be used to search for a particular type of image. Thus, the main research question of this thesis is whether it is possible to learn visual object categories directly from images. Computers process images as long lists of pixels that do not have a clear connection to high-level semantics which could be used in the image search. There are various methods introduced in the literature to extract low-level image features and also approaches to connect these low-level features with high-level semantics. One of these approaches is called Bag-of-Features which is studied in the thesis. In the Bag-of-Features approach, the images are described using a visual codebook. The codebook is built from the descriptions of the image patches using clustering. The images are described by matching descriptions of image patches with the visual codebook and computing the number of matches for each code. In this thesis, unsupervised visual object categorisation using the Bag-of-Features approach is studied. The goal is to find groups of similar images, e.g., images that contain an object from the same category. The standard Bag-of-Features approach is improved by using spatial information and visual saliency. It was found that the performance of the visual object categorisation can be improved by using spatial information of local features to verify the matches. However, this process is computationally heavy, and thus, the number of images must be limited in the spatial matching, for example, by using the Bag-of-Features method as in this study. Different approaches for saliency detection are studied and a new method based on the Hessian-Affine local feature detector is proposed. The new method achieves comparable results with current state-of-the-art. The visual object categorisation performance was improved by using foreground segmentation based on saliency information, especially when the background could be considered as clutter.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Objective
Pedestrian detection under video surveillance systems has always been a hot topic in computer vision research. These systems are widely used in train stations, airports, large commercial plazas, and other public places. However, pedestrian detection remains difficult because of complex backgrounds. Given its development in recent years, the visual attention mechanism has attracted increasing attention in object detection and tracking research, and previous studies have achieved substantial progress and breakthroughs. We propose a novel pedestrian detection method based on the semantic features under the visual attention mechanism.
Method
The proposed semantic feature-based visual attention model is a spatial-temporal model that consists of two parts: the static visual attention model and the motion visual attention model. The static visual attention model in the spatial domain is constructed by combining bottom-up with top-down attention guidance. Based on the characteristics of pedestrians, the bottom-up visual attention model of Itti is improved by intensifying the orientation vectors of elementary visual features to make the visual saliency map suitable for pedestrian detection. In terms of pedestrian attributes, skin color is selected as a semantic feature for pedestrian detection. The regional and Gaussian models are adopted to construct the skin color model. Skin feature-based visual attention guidance is then proposed to complete the top-down process. The bottom-up and top-down visual attentions are linearly combined using the proper weights obtained from experiments to construct the static visual attention model in the spatial domain. The spatial-temporal visual attention model is then constructed via the motion features in the temporal domain. Based on the static visual attention model in the spatial domain, the frame difference method is combined with optical flowing to detect motion vectors. Filtering is applied to process the field of motion vectors. The saliency of motion vectors can be evaluated via motion entropy to make the selected motion feature more suitable for the spatial-temporal visual attention model.
Result
Standard datasets and practical videos are selected for the experiments. The experiments are performed on a MATLAB R2012a platform. The experimental results show that our spatial-temporal visual attention model demonstrates favorable robustness under various scenes, including indoor train station surveillance videos and outdoor scenes with swaying leaves. Our proposed model outperforms the visual attention model of Itti, the graph-based visual saliency model, the phase spectrum of quaternion Fourier transform model, and the motion channel model of Liu in terms of pedestrian detection. The proposed model achieves a 93% accuracy rate on the test video.
Conclusion
This paper proposes a novel pedestrian method based on the visual attention mechanism. A spatial-temporal visual attention model that uses low-level and semantic features is proposed to calculate the saliency map. Based on this model, the pedestrian targets can be detected through focus of attention shifts. The experimental results verify the effectiveness of the proposed attention model for detecting pedestrians.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Dissertação apresentada na Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa para obtenção do grau de Mestre em Engenharia Electrotécnica e de Computadores

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Electrotécnica e de Computadores

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper, we present a novel coarse-to-fine visual localization approach: contextual visual localization. This approach relies on three elements: (i) a minimal-complexity classifier for performing fast coarse localization (submap classification); (ii) an optimized saliency detector which exploits the visual statistics of the submap; and (iii) a fast view-matching algorithm which filters initial matchings with a structural criterion. The latter algorithm yields fine localization. Our experiments show that these elements have been successfully integrated for solving the global localization problem. Context, that is, the awareness of being in a particular submap, is defined by a supervised classifier tuned for a minimal set of features. Visual context is exploited both for tuning (optimizing) the saliency detection process, and to select potential matching views in the visual database, close enough to the query view.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

La medida de calidad de vídeo sigue siendo necesaria para definir los criterios que caracterizan una señal que cumpla los requisitos de visionado impuestos por el usuario. Las nuevas tecnologías, como el vídeo 3D estereoscópico o formatos más allá de la alta definición, imponen nuevos criterios que deben ser analizadas para obtener la mayor satisfacción posible del usuario. Entre los problemas detectados durante el desarrollo de esta tesis doctoral se han determinado fenómenos que afectan a distintas fases de la cadena de producción audiovisual y tipo de contenido variado. En primer lugar, el proceso de generación de contenidos debe encontrarse controlado mediante parámetros que eviten que se produzca el disconfort visual y, consecuentemente, fatiga visual, especialmente en lo relativo a contenidos de 3D estereoscópico, tanto de animación como de acción real. Por otro lado, la medida de calidad relativa a la fase de compresión de vídeo emplea métricas que en ocasiones no se encuentran adaptadas a la percepción del usuario. El empleo de modelos psicovisuales y diagramas de atención visual permitirían ponderar las áreas de la imagen de manera que se preste mayor importancia a los píxeles que el usuario enfocará con mayor probabilidad. Estos dos bloques se relacionan a través de la definición del término saliencia. Saliencia es la capacidad del sistema visual para caracterizar una imagen visualizada ponderando las áreas que más atractivas resultan al ojo humano. La saliencia en generación de contenidos estereoscópicos se refiere principalmente a la profundidad simulada mediante la ilusión óptica, medida en términos de distancia del objeto virtual al ojo humano. Sin embargo, en vídeo bidimensional, la saliencia no se basa en la profundidad, sino en otros elementos adicionales, como el movimiento, el nivel de detalle, la posición de los píxeles o la aparición de caras, que serán los factores básicos que compondrán el modelo de atención visual desarrollado. Con el objetivo de detectar las características de una secuencia de vídeo estereoscópico que, con mayor probabilidad, pueden generar disconfort visual, se consultó la extensa literatura relativa a este tema y se realizaron unas pruebas subjetivas preliminares con usuarios. De esta forma, se llegó a la conclusión de que se producía disconfort en los casos en que se producía un cambio abrupto en la distribución de profundidades simuladas de la imagen, aparte de otras degradaciones como la denominada “violación de ventana”. A través de nuevas pruebas subjetivas centradas en analizar estos efectos con diferentes distribuciones de profundidades, se trataron de concretar los parámetros que definían esta imagen. Los resultados de las pruebas demuestran que los cambios abruptos en imágenes se producen en entornos con movimientos y disparidades negativas elevadas que producen interferencias en los procesos de acomodación y vergencia del ojo humano, así como una necesidad en el aumento de los tiempos de enfoque del cristalino. En la mejora de las métricas de calidad a través de modelos que se adaptan al sistema visual humano, se realizaron también pruebas subjetivas que ayudaron a determinar la importancia de cada uno de los factores a la hora de enmascarar una determinada degradación. Los resultados demuestran una ligera mejora en los resultados obtenidos al aplicar máscaras de ponderación y atención visual, los cuales aproximan los parámetros de calidad objetiva a la respuesta del ojo humano. ABSTRACT Video quality assessment is still a necessary tool for defining the criteria to characterize a signal with the viewing requirements imposed by the final user. New technologies, such as 3D stereoscopic video and formats of HD and beyond HD oblige to develop new analysis of video features for obtaining the highest user’s satisfaction. Among the problems detected during the process of this doctoral thesis, it has been determined that some phenomena affect to different phases in the audiovisual production chain, apart from the type of content. On first instance, the generation of contents process should be enough controlled through parameters that avoid the occurrence of visual discomfort in observer’s eye, and consequently, visual fatigue. It is especially necessary controlling sequences of stereoscopic 3D, with both animation and live-action contents. On the other hand, video quality assessment, related to compression processes, should be improved because some objective metrics are adapted to user’s perception. The use of psychovisual models and visual attention diagrams allow the weighting of image regions of interest, giving more importance to the areas which the user will focus most probably. These two work fields are related together through the definition of the term saliency. Saliency is the capacity of human visual system for characterizing an image, highlighting the areas which result more attractive to the human eye. Saliency in generation of 3DTV contents refers mainly to the simulated depth of the optic illusion, i.e. the distance from the virtual object to the human eye. On the other hand, saliency is not based on virtual depth, but on other features, such as motion, level of detail, position of pixels in the frame or face detection, which are the basic features that are part of the developed visual attention model, as demonstrated with tests. Extensive literature involving visual comfort assessment was looked up, and the development of new preliminary subjective assessment with users was performed, in order to detect the features that increase the probability of discomfort to occur. With this methodology, the conclusions drawn confirmed that one common source of visual discomfort was when an abrupt change of disparity happened in video transitions, apart from other degradations, such as window violation. New quality assessment was performed to quantify the distribution of disparities over different sequences. The results confirmed that abrupt changes in negative parallax environment produce accommodation-vergence mismatches derived from the increasing time for human crystalline to focus the virtual objects. On the other side, for developing metrics that adapt to human visual system, additional subjective tests were developed to determine the importance of each factor, which masks a concrete distortion. Results demonstrated slight improvement after applying visual attention to objective metrics. This process of weighing pixels approximates the quality results to human eye’s response.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Diabetic Retinopathy (DR) is a complication of diabetes that can lead to blindness if not readily discovered. Automated screening algorithms have the potential to improve identification of patients who need further medical attention. However, the identification of lesions must be accurate to be useful for clinical application. The bag-of-visual-words (BoVW) algorithm employs a maximum-margin classifier in a flexible framework that is able to detect the most common DR-related lesions such as microaneurysms, cotton-wool spots and hard exudates. BoVW allows to bypass the need for pre- and post-processing of the retinographic images, as well as the need of specific ad hoc techniques for identification of each type of lesion. An extensive evaluation of the BoVW model, using three large retinograph datasets (DR1, DR2 and Messidor) with different resolution and collected by different healthcare personnel, was performed. The results demonstrate that the BoVW classification approach can identify different lesions within an image without having to utilize different algorithms for each lesion reducing processing time and providing a more flexible diagnostic system. Our BoVW scheme is based on sparse low-level feature detection with a Speeded-Up Robust Features (SURF) local descriptor, and mid-level features based on semi-soft coding with max pooling. The best BoVW representation for retinal image classification was an area under the receiver operating characteristic curve (AUC-ROC) of 97.8% (exudates) and 93.5% (red lesions), applying a cross-dataset validation protocol. To assess the accuracy for detecting cases that require referral within one year, the sparse extraction technique associated with semi-soft coding and max pooling obtained an AUC of 94.2 ± 2.0%, outperforming current methods. Those results indicate that, for retinal image classification tasks in clinical practice, BoVW is equal and, in some instances, surpasses results obtained using dense detection (widely believed to be the best choice in many vision problems) for the low-level descriptors.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The arboreal ant Odontomachus hastatus nests among roots of epiphytic bromeliads in the sandy forest at Cardoso Island (Brazil). Crepuscular and nocturnal foragers travel up to 8m to search for arthropod prey in the canopy, where silhouettes of leaves and branches potentially provide directional information. We investigated the relevance of visual cues (canopy, horizon patterns) during navigation in O. hastatus. Laboratory experiments using a captive ant colony and a round foraging arena revealed that an artificial canopy pattern above the ants and horizon visual marks are effective orientation cues for homing O. hastatus. On the other hand, foragers that were only given a tridimensional landmark (cylinder) or chemical marks were unable to home correctly. Navigation by visual cues in O. hastatus is in accordance with other diurnal arboreal ants. Nocturnal luminosity (moon, stars) is apparently sufficient to produce contrasting silhouettes from the canopy and surrounding vegetation, thus providing orientation cues. Contrary to the plain floor of the round arena, chemical cues may be important for marking bifurcated arboreal routes. This experimental demonstration of the use of visual cues by a predominantly nocturnal arboreal ant provides important information for comparative studies on the evolution of spatial orientation behavior in ants. This article is part of a Special Issue entitled: Neotropical Behaviour.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The goal of this cross-sectional observational study was to quantify the pattern-shift visual evoked potentials (VEP) and the thickness as well as the volume of retinal layers using optical coherence tomography (OCT) across a cohort of Parkinson's disease (PD) patients and age-matched controls. Forty-three PD patients and 38 controls were enrolled. All participants underwent a detailed neurological and ophthalmologic evaluation. Idiopathic PD cases were included. Cases with glaucoma or increased intra-ocular pressure were excluded. Patients were assessed by VEP and high-resolution Fourier-domain OCT, which quantified the inner and outer thicknesses of the retinal layers. VEP latencies and the thicknesses of the retinal layers were the main outcome measures. The mean age, with standard deviation (SD), of the PD patients and controls were 63.1 (7.5) and 62.4 (7.2) years, respectively. The patients were predominantly in the initial Hoehn-Yahr (HY) disease stages (34.8% in stage 1 or 1.5, and 55.8 % in stage 2). The VEP latencies and the thicknesses as well as the volumes of the retinal inner and outer layers of the groups were similar. A negative correlation between the retinal thickness and the age was noted in both groups. The thickness of the retinal nerve fibre layer (RNFL) was 102.7 μm in PD patients vs. 104.2 μm in controls. The thicknesses of retinal layers, VEP, and RNFL of PD patients were similar to those of the controls. Despite the use of a representative cohort of PD patients and high-resolution OCT in this study, further studies are required to establish the validity of using OCT and VEP measurements as the anatomic and functional biomarkers for the evaluation of retinal and visual pathways in PD patients.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The authors conducted a systematic literature review on physical activity interventions for children and youth with visual impairment (VI). Five databases were searched to identify studies involving the population of interest and physical activity practices. After evaluating 2,495 records, the authors found 18 original full-text studies published in English they considered eligible. They identified 8 structured exercise-training studies that yielded overall positive effect on physical-fitness and motor-skill outcomes. Five leisure-time-physical-activity and 5 instructional-strategy interventions were also found with promising proposals to engage and instruct children and youth with VI to lead an active lifestyle. However, the current research on physical activity interventions for children and youth with VI is still limited by an absence of high-quality research designs, low sample sizes, use of nonvalidated outcome measures, and lack of generalizability, which need to be addressed in future studies.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The purpose of this study is to introduce a method to evaluate visual functions in infants in the first three months of life. An adaptation of the Guide for the Assessment of Visual Ability in Infants (Gagliardo, 1997) was used. The instrument was a ring with string. It was implemented a pilot study with 33 infants, selected according to the following criteria: neonates well enough to go home within two days of birth; 1 to 3 months of chronological age; monthly evaluation with no absence; subjects living in Campinas/SP metropolitan area. In the first month we observed: visual fixation (93,9%); eye contact (90,9%); horizontal tracking (72,7%); inspects surroundings (97,0%). In the third month, we observed: inspects own hands (42,4%) and increased movements of arms (36,4%). This method allowed the evaluation of visual functions in infants, according to the chronological age. Alterations in this function will facilitate immediate referral to medical services for diagnoses.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The effects of ionic strength on ions in aqueous solutions are quite relevant, especially for biochemical systems, in which proteins and amino acids are involved. The teaching of this topic and more specifically, the Debye-Hückel limiting law, is central in chemistry undergraduate courses. In this work, we present a description of an experimental procedure based on the color change of aqueous solutions of bromocresol green (BCG), driven by addition of electrolyte. The contribution of charge product (z+|z-|) to the Debye-Hückel limiting law is demonstrated when the effects of NaCl and Na2SO4 on the color of BCG solutions are compared.