841 results for visual object detection
Abstract:
This thesis deals with Visual Servoing and its closely related disciplines, such as projective geometry, image processing, robotics and non-linear control. More specifically, the work addresses the problem of controlling a robotic manipulator through one of the most widely used Visual Servoing techniques: Image Based Visual Servoing (IBVS). In IBVS the robot is driven by an on-line feedback control loop that is closed directly in the 2D space of the camera sensor. The work considers the case of a monocular system with a single camera mounted on the robot end effector (eye-in-hand configuration). Through IBVS the system can be positioned with respect to a fixed 3D target by minimizing the differences between its initial view and its goal view, corresponding respectively to the initial and the goal system configurations: the robot Cartesian motion is thus generated only by means of visual information. However, executing a positioning task by IBVS is not straightforward, because singularity problems may occur and local minima may be reached, where the reached image is very close to the target one but the 3D positioning task is far from being fulfilled. This happens in particular for large camera displacements, when the initial and the goal target views are noticeably different. To overcome the singularity and local minima drawbacks while maintaining the good robustness of IBVS with respect to modeling and camera calibration errors, suitable image path planning can be exploited. This work deals with the problem of generating suitable image plane trajectories for the tracked points of the servoing control scheme (a trajectory is made of a path plus a time law). The generated image plane paths must be feasible, i.e. they must be compliant with the rigid body motion of the camera with respect to the object, so as to avoid image Jacobian singularities and local minima problems.
In addition, the planned image trajectories must generate camera velocity screws that are smooth and within the allowed bounds of the robot. We show that a scaled 3D motion planning algorithm can be devised to generate feasible image plane trajectories. Since the paths in the image are generated off-line, it is also possible to tune the planning parameters so as to keep the target inside the camera field of view, even in unfavourable cases where the target feature points would otherwise leave the camera images because of the 3D robot motion. To test the validity of the proposed approach, both experimental and simulation results are reported, also taking into account the influence of noise on the path planning strategy. The experiments were carried out with a 6-DOF anthropomorphic manipulator with a FireWire camera installed on its end effector: the results demonstrate the good performance and the feasibility of the proposed approach.
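For readers unfamiliar with IBVS, the feedback loop described above is usually written in the following textbook form for point features. This is the classical formulation from the visual servoing literature, given here only as orientation; it is not necessarily the exact scheme used in the thesis. The symbols are notation assumed for this sketch: $\mathbf{s}$ and $\mathbf{s}^{*}$ are the current and goal image features, $(x, y)$ are normalized image coordinates of a point at depth $Z$, $\lambda > 0$ is a gain, and $\widehat{L}^{+}$ is the pseudo-inverse of an estimate of the stacked interaction matrix (image Jacobian).

```latex
% Feature error between the current and the goal image points
\mathbf{e}(t) = \mathbf{s}(t) - \mathbf{s}^{*}

% Interaction matrix of one normalized image point (x, y) at depth Z
L_{\mathbf{x}} =
\begin{bmatrix}
-\dfrac{1}{Z} & 0 & \dfrac{x}{Z} & xy & -(1+x^{2}) & y \\[2mm]
0 & -\dfrac{1}{Z} & \dfrac{y}{Z} & 1+y^{2} & -xy & -x
\end{bmatrix}

% Camera velocity screw (linear and angular velocity) commanded to the robot
\mathbf{v}_{c} = -\lambda \, \widehat{L}^{+} \, \mathbf{e}
```

In this form the singularities mentioned in the abstract correspond to configurations where the stacked matrix $\widehat{L}$ loses rank, and local minima correspond to a non-zero error $\mathbf{e}$ that lies in the null space of $\widehat{L}^{+}$.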
Abstract:
Perceptual User Interfaces (PUIs) aim at facilitating human-computer interaction with the aid of human-like capacities (computer vision, speech recognition, etc.). In PUIs, the human face is a central element, since it conveys not only identity but also other important information, particularly with respect to the user’s mood or emotional state. This paper describes both a face detector and a smile detector for PUIs. Both are suitable for real-time interaction.
Abstract:
A single picture provides a largely incomplete representation of the scene one is looking at. Usually it reproduces only a limited spatial portion of the scene, according to the standpoint and the viewing angle; moreover, it contains only instantaneous information. Thus very little can be understood about the geometrical structure of the scene, and the position and orientation of the observer with respect to it also remain hard to guess. When multiple views, taken from different positions in space and time, observe the same scene, a much deeper knowledge is potentially achievable. Understanding inter-view relations enables the construction of a collective representation by fusing the information contained in every single image. Visual reconstruction methods confront the formidable, and still unanswered, challenge of delivering a comprehensive representation of the structure, motion and appearance of a scene from visual information. Multi-view visual reconstruction deals with the inference of relations among multiple views and the exploitation of the revealed connections to attain the best possible representation. This thesis investigates novel methods and applications in the field of visual reconstruction from multiple views. Three main threads of research have been pursued: dense geometric reconstruction, camera pose reconstruction, and sparse geometric reconstruction of deformable surfaces. Dense geometric reconstruction aims at delivering the appearance of a scene at every single point. The construction of a large panoramic image from a set of traditional pictures has been extensively studied in the context of image mosaicing techniques. An original algorithm for sequential registration suitable for real-time applications has been conceived. The integration of the algorithm into a visual surveillance system has led to robust and efficient motion detection with Pan-Tilt-Zoom cameras.
Moreover, an evaluation methodology for quantitatively assessing and comparing image mosaicing algorithms has been devised and made available to the community. Camera pose reconstruction deals with the recovery of the camera trajectory across an image sequence. A novel mosaic-based pose reconstruction algorithm has been conceived that exploits image mosaics and traditional pose estimation algorithms to deliver more accurate estimates. An innovative markerless vision-based human-machine interface has also been proposed, which allows a user to interact with a gaming application by moving a hand-held consumer-grade camera in unstructured environments. Finally, sparse geometric reconstruction refers to the computation of the coarse geometry of an object at a few preset points. In this thesis, an innovative shape reconstruction algorithm for deformable objects has been designed. A cooperation with the Solar Impulse project made it possible to deploy the algorithm in a very challenging real-world scenario, i.e. the accurate measurement of airplane wing deformations.
Abstract:
The detection of motion is one of the most fundamental abilities of visual perception. To clarify whether the motion perception system receives input from only one cone type or from a combination of different cone types, a rotating two-armed Archimedean spiral disc was used (real motion), in which the spiral and the background differed in colour. By varying the intensity of coloured fluorescent tubes, an illumination condition could be created in which the (radial) motion of the spiral could no longer be perceived, although the spiral and the background differed in colour. Determining the cone excitations in the 3D receptor space revealed a contribution of both the L- and the M-cone in normal trichromats (dominated by L), but a contribution of the M-cone alone in protanopes. Finally, determining the spectral sensitivity based on a vector analysis in the 3D receptor space showed that the neuronal motion detector is based on an additive contribution of the L- and M-cones, in agreement with the luminous efficiency function (Vλ). As a result, we attribute the detection of object motion to a colour-blind mechanism. It is very likely that the magnocellular channel represents the neuronal substrate of this motion detector.
Abstract:
Visual tracking is the problem of estimating some variables related to a target given a video sequence depicting the target. Visual tracking is key to the automation of many tasks, such as visual surveillance, autonomous robot or vehicle navigation, and automatic video indexing in multimedia databases. Despite many years of research, long-term tracking of generic targets in real-world scenarios is still an unsolved problem. The main contribution of this thesis is the definition of effective algorithms that can foster a general solution to visual tracking by letting the tracker adapt to changing working conditions. In particular, we propose to adapt two crucial components of visual trackers: the transition model and the appearance model. The less general but widespread case of tracking from a static camera is also considered, and a novel change detection algorithm robust to sudden illumination changes is proposed. Based on this, a principled adaptive framework to model the interaction between Bayesian change detection and recursive Bayesian trackers is introduced. Finally, the problem of automatic tracker initialization is considered. In particular, a novel solution for the categorization of 3D data is presented. The category recognition algorithm is based on a novel 3D descriptor that is shown to achieve state-of-the-art performance in several applications of surface matching.
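To make the terms "recursive Bayesian tracker" and "transition model" concrete, here is a minimal sketch of the simplest such tracker, a one-dimensional Kalman filter. It is a generic textbook example, not the algorithm proposed in the thesis. The names `Gaussian`, `predict`, `update` and `track`, the constant-velocity transition model and all numeric defaults are assumptions made for illustration. The transition model is fixed here, whereas the thesis proposes adapting it.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    """Belief about the 1D target position (hypothetical helper type)."""
    mean: float
    var: float


def predict(belief: Gaussian, velocity: float, process_var: float) -> Gaussian:
    """Transition model: shift the state and inflate the uncertainty."""
    return Gaussian(belief.mean + velocity, belief.var + process_var)


def update(belief: Gaussian, measurement: float, meas_var: float) -> Gaussian:
    """Measurement update: fuse the prediction with a noisy observation."""
    gain = belief.var / (belief.var + meas_var)  # Kalman gain, between 0 and 1
    mean = belief.mean + gain * (measurement - belief.mean)
    var = (1.0 - gain) * belief.var
    return Gaussian(mean, var)


def track(measurements, velocity=1.0, process_var=0.1, meas_var=1.0):
    """Run predict/update once per frame and return the belief after each frame."""
    belief = Gaussian(mean=0.0, var=100.0)  # deliberately vague prior (assumption)
    estimates = []
    for z in measurements:
        belief = update(predict(belief, velocity, process_var), z, meas_var)
        estimates.append(belief)
    return estimates
```

Calling `track([1.0, 2.0, 3.0, 4.0])` returns one belief per measurement. The posterior variance shrinks as observations accumulate, which is the behaviour an adaptive tracker would modulate by changing `velocity` or `process_var` on-line.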
Abstract:
Introduction and aims of the research. Nitric oxide (NO) and endocannabinoids (eCBs) are major retrograde messengers, involved in synaptic plasticity (long-term potentiation, LTP, and long-term depression, LTD) in many brain areas (including hippocampus and neocortex), as well as in learning and memory processes. NO is synthesized by NO synthase (NOS) in response to increased cytosolic Ca2+ and mainly exerts its functions through soluble guanylate cyclase (sGC) and cGMP production. The main target of cGMP is the cGMP-dependent protein kinase (PKG). Activity-dependent release of eCBs in the CNS leads to the activation of the Gαi/o-coupled cannabinoid receptor 1 (CB1) at both glutamatergic and inhibitory synapses. The perirhinal cortex (Prh) is a multimodal associative cortex of the temporal lobe, critically involved in visual recognition memory. LTD is proposed to be the cellular correlate underlying this form of memory. Cholinergic neurotransmission has been shown to play a critical role in both visual recognition memory and LTD in Prh. Moreover, visual recognition memory is one of the main cognitive functions impaired in the early stages of Alzheimer’s disease. The main aim of my research was to investigate the role of NO and eCBs in synaptic plasticity in rat Prh and in visual recognition memory. Part of this research was dedicated to the study of synaptic transmission and plasticity in a murine model (Tg2576) of Alzheimer’s disease. Methods. Field potential recordings. Extracellular field potential recordings were carried out in horizontal Prh slices from Sprague-Dawley or Dark Agouti juvenile (p21-35) rats. LTD was induced with a single train of 3000 pulses delivered at 5 Hz (10 min), or via bath application of carbachol (Cch; 50 μM) for 10 min. LTP was induced by theta-burst stimulation (TBS). In addition, input/output curves and 5Hz-LTD were carried out in Prh slices from 3-month-old Tg2576 mice and littermate controls. Behavioural experiments.
The spontaneous novel object exploration task was performed in intra-Prh bilaterally cannulated adult Dark Agouti rats. Drugs or vehicle (saline) were directly infused into the Prh 15 min before training to verify the role of nNOS and CB1 in visual recognition memory acquisition. Object recognition memory was tested at 20 min and 24 h after the end of the training phase. Results. Electrophysiological experiments in Prh slices from juvenile rats showed that 5Hz-LTD is due to the activation of the NOS/sGC/PKG pathway, whereas Cch-LTD relies on NOS/sGC but not PKG activation. By contrast, NO does not appear to be involved in LTP in this preparation. Furthermore, I found that eCBs are involved in LTP induction, but not in basal synaptic transmission, 5Hz-LTD and Cch-LTD. Behavioural experiments demonstrated that the blockade of nNOS impairs rat visual recognition memory tested at 24 hours, but not at 20 min; however, the blockade of CB1 did not affect visual recognition memory acquisition tested at either time point. In three-month-old Tg2576 mice, deficits in basal synaptic transmission and 5Hz-LTD were observed compared to littermate controls. Conclusions. The results obtained in Prh slices from juvenile rats indicate that NO and CB1 play a role in the induction of LTD and LTP, respectively. These results are confirmed by the observation that nNOS, but not CB1, is involved in visual recognition memory acquisition. The preliminary results obtained in the murine model of Alzheimer’s disease indicate that deficits in synaptic transmission and plasticity occur very early in Prh; further investigations are required to characterize the molecular mechanisms underlying these deficits.
Abstract:
The diagnosis, grading and classification of tumours have benefited considerably from the development of DCE-MRI, which is now essential to the adequate clinical management of many tumour types because of its ability to detect active angiogenesis. Several strategies have been proposed for DCE-MRI evaluation. Visual inspection of contrast agent concentration curves versus time is a very simple yet operator-dependent procedure; therefore, more objective approaches have been developed to facilitate comparison between studies. In so-called model-free approaches, descriptive or heuristic information extracted from the raw time series data is used for tissue classification. The main issue with these schemes is that they have no direct interpretation in terms of the physiological properties of the tissues. On the other hand, model-based investigations typically involve compartmental tracer kinetic modelling and pixel-by-pixel estimation of kinetic parameters via non-linear regression applied to regions of interest suitably selected by the physician. This approach has the advantage of providing parameters directly related to the pathophysiological properties of the tissue, such as vessel permeability, local regional blood flow, extraction fraction, and the concentration gradient between plasma and the extravascular-extracellular space. However, non-linear modelling is computationally demanding, and the accuracy of the estimates can be affected by the signal-to-noise ratio and by the initial solutions. The principal aim of this thesis is to investigate the use of semi-quantitative and quantitative parameters for the segmentation and classification of breast lesions.
The objectives can be subdivided as follows: to describe the principal techniques for evaluating time-intensity curves in DCE-MRI, with a focus on the kinetic models proposed in the literature; to evaluate the influence of the parametrization choice for a classic bi-compartmental kinetic model; to evaluate the performance of a method for simultaneous tracer kinetic modelling and pixel classification; and to evaluate the performance of machine learning techniques trained for the segmentation and classification of breast lesions.
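The abstract does not say which bi-compartmental model is used. As an illustration of compartmental tracer kinetic modelling, the sketch below assumes the standard Tofts model, in which the tissue concentration is the plasma concentration convolved with an exponential: C_t(t) = K_trans · ∫ C_p(τ) · exp(−k_ep (t − τ)) dτ. The function name `tofts_tissue_curve`, the rectangle-rule integration and all parameter values are assumptions for this example, not taken from the thesis.

```python
import math


def tofts_tissue_curve(plasma_conc, dt, k_trans, k_ep):
    """Forward-simulate the (assumed) standard Tofts model on a uniform time grid.

    plasma_conc: plasma concentration samples C_p, one every dt minutes
    k_trans:     transfer constant from plasma to tissue (1/min)
    k_ep:        rate constant from tissue back to plasma (1/min)
    Returns the tissue concentration C_t at the same time points.
    """
    tissue = []
    for i in range(len(plasma_conc)):
        # Rectangle-rule approximation of the convolution integral up to t_i
        total = 0.0
        for j in range(i + 1):
            total += plasma_conc[j] * math.exp(-k_ep * (i - j) * dt) * dt
        tissue.append(k_trans * total)
    return tissue


# Hypothetical input: constant plasma concentration of 1.0 for 20 minutes
curve = tofts_tissue_curve([1.0] * 400, dt=0.05, k_trans=0.25, k_ep=0.5)
```

With a constant plasma input the curve rises towards the plateau K_trans / k_ep. A pixel-by-pixel fit, as described in the abstract, would adjust `k_trans` and `k_ep` by non-linear regression until this simulated curve matches the measured one, which is why the result depends on noise and on the initial guess.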
Abstract:
During this thesis a new telemetric recording system was developed that allows ECoG/EEG recordings in freely behaving rodents (Lapray et al., 2008; Lapray et al., in press). This unit has been shown not to cause any discomfort in the implanted animals and to allow recordings in a wide range of environments. In the second part of this work, the developed technique was used to investigate which cortical activity is related to the process of novelty detection in the rat barrel cortex. We showed that the detection of a novel object is accompanied in the barrel cortex by a transient burst of activity in the γ frequency range (40-47 Hz) around 200 ms after the whiskers' contact with the object (Lapray et al., accepted). This activity was associated with a decrease in the lower range of γ frequencies (30-37 Hz). This network activity may represent the optimal oscillatory pattern for the propagation and storage of new information in memory-related structures. The frequency as well as the timing of appearance correspond well with other studies of novelty-detection-related bursts of activity in other sensory systems (Barcelo et al., 2006; Haenschel et al., 2000; Ranganath & Rainer, 2003). Here, the burst of activity is well suited to induce plastic and long-lasting modifications in neuronal circuits (Harris et al., 2003). The debate is still open whether synchronised activity in the brain is a part of information processing or an epiphenomenon (Shadlen & Movshon, 1999; Singer, 1999). The present work provides further evidence that neuronal network activity in the γ frequency range plays an important role in the neocortical processing of sensory stimuli and in higher cognitive functions.
Abstract:
Lesions to the primary geniculo-striate visual pathway cause blindness in the contralesional visual field. Nevertheless, previous studies have suggested that patients with visual field defects may still be able to implicitly process the affective valence of unseen emotional stimuli (affective blindsight) through alternative visual pathways bypassing the striate cortex. These alternative pathways may also allow exploitation of multisensory (audio-visual) integration mechanisms, such that auditory stimulation can enhance visual detection of stimuli which would otherwise be undetected when presented alone (crossmodal blindsight). The present dissertation investigated implicit emotional processing and multisensory integration when conscious visual processing is prevented by real or virtual lesions to the geniculo-striate pathway, in order to further clarify both the nature of these residual processes and the functional aspects of the underlying neural pathways. The present experimental evidence demonstrates that alternative subcortical visual pathways allow implicit processing of the emotional content of facial expressions in the absence of cortical processing. However, this residual ability is limited to fearful expressions. This finding suggests the existence of a subcortical system specialised in detecting danger signals based on coarse visual cues, therefore allowing the early recruitment of flight-or-fight behavioural responses even before conscious and detailed recognition of potential threats can take place. Moreover, the present dissertation extends the knowledge about crossmodal blindsight phenomena by showing that, unlike with visual detection, sound cannot crossmodally enhance visual orientation discrimination in the absence of functional striate cortex. 
This finding demonstrates, on the one hand, that the striate cortex plays a causative role in crossmodally enhancing visual orientation sensitivity and, on the other hand, that subcortical visual pathways bypassing the striate cortex, despite affording audio-visual integration processes leading to the improvement of simple visual abilities such as detection, cannot mediate multisensory enhancement of more complex visual functions, such as orientation discrimination.
Abstract:
We usually perform actions in a dynamic environment, and changes in the location of a target for an upcoming action require both covert shifts of attention and an update of motor planning. In this study we tested whether, similarly to oculomotor areas that provide signals for overt and covert attention shifts, covert attention shifts modulate activity in cortical area V6A, which provides a bridge between visual signals and arm-motor control. We performed single cell recordings in monkeys trained to fixate straight ahead while shifting attention outward to a peripheral cue and inward again to the fixation point. We found that neurons in V6A are influenced by spatial attention, demonstrating that visual, motor, and attentional responses can occur in combination in single neurons of V6A. This modulation in an area primarily involved in visuo-motor transformation for reaching suggests that reach-related regions could also directly contribute to the shifts of spatial attention necessary to plan and control goal-directed arm movements. Moreover, to test whether V6A is causally involved in these processes, we performed a human study using on-line repetitive transcranial magnetic stimulation over the putative human V6A (pV6A) during an attention task and a reaching task requiring covert shifts of attention and reaching movements towards cued targets in space. We demonstrate that the pV6A is causally involved in attention reorienting to target detection and that this process interferes with the execution of reaching movements towards unattended targets. The current findings suggest the direct involvement of the action-related dorso-medial visual stream in attentional processes, and a more specific role of V6A in attention reorienting. Therefore, we propose that attention signals are used by the V6A to rapidly update the current motor plan or the ongoing action when a behaviorally relevant object unexpectedly appears at an unattended location.
Abstract:
Generic object recognition is an important function of the human visual system, one that people rely on constantly in everyday life. For an artificial vision system it is a hard, complex and challenging task, because instances of the same object category can generate very different images depending on variables such as illumination conditions, the pose of the object, the viewpoint of the camera, partial occlusions, and unrelated background clutter. The purpose of this thesis is to develop a system that is able to classify objects in 2D images based on the context and to identify the category to which each object belongs. Given an image, the system classifies it and decides the correct category of the object. A further objective of this thesis is to test the performance and the precision of different supervised Machine Learning algorithms in this specific task of object image categorization. Across different experiments, the implemented application shows good categorization performance despite the difficulty of the problem. However, this project is open to future improvement: it is possible to implement new algorithms as they are developed, or to use other feature extraction techniques, to make the system more reliable. The application can be installed inside an embedded system once it has been trained (training is performed outside the system), so that it can classify objects in real time. The information provided by a 3D stereo camera, developed in the Department of Computer Engineering of the University of Bologna, can be used to improve the accuracy of the classification task. The idea is to segment a single object in a scene using the depth provided by the stereo camera, and in this way make the classification more accurate.
Abstract:
The research project described in this thesis focuses on the development of an advanced analytical system that combines an improved thin layer chromatography (TLC) plate with infrared (FTIR) and Raman microscopies for the detection of synthetic dyes. Indeed, the characterization of organic colorants, which are commonly present in mixtures with other components and in very limited amounts, still represents a challenging task in the scientific analysis of cultural heritage materials. The approach provides selective spectral fingerprints for each compound, drawing on the complementary information obtained by micro ATR-RAIRS-FTIR and SERS-Raman analyses, which can be performed on the same separated spot. In particular, silver iodide (AgI) applied on a gold-coated slide is proposed as an efficient stationary phase for the discrimination of complex analyte mixtures, such as dyes present in samples of art-historical interest. The gold-AgI-TLC plate shows high performance both in the chromatographic separation of analytes and in the spectroscopic detection of components. The use of a mid-IR transparent inorganic salt as the stationary phase avoids interference from background absorption in FTIR investigations. Moreover, in ATR microscopy measurements performed on the gold-AgI surface, a considerable enhancement in the intensity of the spectra is observed. Complementary information can be obtained by Raman analyses, given the expected SERS activity of the AgI substrate. The method has been tested for the characterization of a mixture of three synthetic organic colorants widely used in dyeing processes: Brilliant Green (BG1), Rhodamine B (BV10) and Methylene Blue (BB9).
Abstract:
This study evaluated the performance of the DIAGNOdent pen laser fluorescence device (LFpen) in comparison with visual examination (VE), bitewing radiographs (BW) and visual examination combined with bitewing radiographs (VEBW) in detecting secondary approximal caries associated with composite restorations. In total, 60 approximal surfaces from 43 permanent molars with composite restorations were assessed twice by two examiners using the LFpen, VE, BW and VEBW. After histological preparation and hardness measurements, the sample was assigned to either a crown or root caries group, depending on the location of the lesions as the gold standard. For crown caries at D1, the highest values of specificity and sensitivity were observed for the LFpen at a cutoff value of 18 (1.00) and for the VEBW (0.89). At D3 (cutoff of 30), the LFpen showed the highest values of sensitivity and specificity. For root caries, the LFpen and VEBW showed the highest values of specificity (0.54), sensitivity (0.81) and accuracy (0.69). The Spearman rank correlation coefficients for crown/root caries with histology were 0.54/0.37 (LFpen), 0.29/0.10 (BW), 0.29/0.18 (VE) and 0.23/0.37 (VEBW). For the LFpen, the ICC varied from 0.80 (interexaminer) to 0.97 (intraexaminer B); the kappa value was 0.19 for BW and 0.35 for VE (interexaminer). Intraexaminer kappa values for BW were 0.25 (A) and 0.29 (B), and those for VE were 0.31 (A) and 0.32 (B). The LFpen device exhibited a performance comparable to that of conventional methods but with higher interexaminer reproducibility. Therefore, the LFpen should be considered an auxiliary method for the detection of secondary approximal caries associated with composite restorations.
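The figures reported above (sensitivity, specificity, accuracy and kappa) follow from simple counts of agreement between a test and a reference standard. The sketch below shows the usual definitions. The function names and all example counts are hypothetical and are not data from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from a 2x2 table against a gold standard.

    tp/fn: diseased surfaces correctly/incorrectly classified by the test
    tn/fp: sound surfaces correctly/incorrectly classified by the test
    Assumes at least one diseased and one sound surface (no zero division guard).
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def cohen_kappa(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Cohen's kappa for two examiners rating the same surfaces as positive or negative."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    observed = (both_pos + both_neg) / n
    # Agreement expected by chance from each examiner's marginal totals
    a_pos = both_pos + a_pos_b_neg
    b_pos = both_pos + a_neg_b_pos
    expected = (a_pos * b_pos + (n - a_pos) * (n - b_pos)) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts, for illustration only
metrics = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)
kappa = cohen_kappa(both_pos=20, a_pos_b_neg=5, a_neg_b_pos=10, both_neg=15)
```

Kappa corrects raw agreement for the agreement expected by chance. This is why two examiners can agree on a fair share of surfaces and still obtain low interexaminer kappa values such as those reported for BW and VE.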
Abstract:
Dentinal cracks are occasionally observed at the cut root face after root-end resection in apical surgery. The objective of this ex vivo study was to evaluate and compare the efficiency of visual aids to identify root-end dentinal cracks.
Abstract:
Visual imagery – similar to visual perception – activates feature-specific and category-specific visual areas. This is frequently observed in experiments where the instruction is to imagine stimuli that have been shown immediately before the imagery task. Hence, feature-specific activation could be related to the short-term memory retrieval of previously presented sensory information. Here, we investigated mental imagery of stimuli that subjects had not seen before, eliminating the effects of short-term memory. We recorded brain activation using fMRI while subjects performed a behaviourally controlled guided imagery task in predefined retinotopic coordinates to optimize sensitivity in early visual areas. Whole brain analyses revealed activation in a parieto-frontal network and lateral–occipital cortex. Region of interest (ROI) based analyses showed activation in left hMT/V5+. Granger causality mapping taking left hMT/V5+ as source revealed an imagery-specific directed influence from the left inferior parietal lobule (IPL). Interestingly, we observed a negative BOLD response in V1–3 during imagery, modulated by the retinotopic location of the imagined motion trace. Our results indicate that rule-based motion imagery can activate higher-order visual areas involved in motion perception, with a role for top-down directed influences originating in IPL. Lower-order visual areas (V1, V2 and V3) were down-regulated during this type of imagery, possibly reflecting inhibition to avoid visual input from interfering with the imagery construction. This suggests that the activation in early visual areas observed in previous studies might be related to short- or long-term memory retrieval of specific sensory experiences.