907 results for audio-visual information
Abstract:
Robotic platforms have advanced greatly in terms of their remote sensing capabilities, including obtaining optical information using cameras. Alongside these advances, visual mapping has become a very active research area, which facilitates the mapping of areas inaccessible to humans. This requires the efficient processing of data to increase the final mosaic quality and computational efficiency. In this paper, we propose an efficient image mosaicing algorithm for large area visual mapping in underwater environments using multiple underwater robots. Our method identifies overlapping image pairs in the trajectories carried out by the different robots during the topology estimation process, which is a cornerstone for efficiently mapping large areas of the seafloor. We present comparative results based on challenging real underwater datasets, which simulated multi-robot mapping
Abstract:
The present thesis investigated the importance of semantics in generating inferences during discourse processing. Three aspects of semantics (gender stereotypes, implicit causality information and proto-role properties) were used to investigate whether semantics is activated elaboratively during discourse comprehension and what its relative importance is in backward inferencing compared to discourse/structural cues. Visual world eye-tracking studies revealed that semantics plays an important role in both backward and forward inferencing: gender stereotypes and implicit causality information are activated elaboratively during online discourse comprehension. Moreover, gender stereotypes, implicit causality and proto-role properties of verbs are all used in backward inferencing. Importantly, the studies demonstrated that semantic cues are weighed against discourse/structural cues. When the structural cues consist of a combination of cues that have been independently shown to be important in backward inferencing, semantic effects may be masked, whereas when the structural cues consist of a combination of fewer prominent cues, semantics can have an earlier effect than structural factors in pronoun resolution. In addition, the type of inference matters: during anaphoric inferencing semantics has a prominent role, while discourse/structural salience attains more prominence during non-anaphoric inferencing. Finally, semantics plays a strong role in inviting new inferences that revise earlier ones, even when the additional inference is not needed to establish coherence in discourse. The findings are generally in line with the Mental Model approaches. Two extended model versions are presented that incorporate the current findings into the earlier literature. 
These models allow both forward and backward inferencing to occur at any given moment during the course of processing; they also allow semantic and discourse/structural cues to contribute to both of these processes. However, while Mental Model 1 does not assume interactions between semantic and discourse/structural factors in forward inferencing, Mental Model 2 does assume such a link.
Abstract:
The large and growing number of digital images is making manual image search laborious. Only a fraction of the images contain metadata that can be used to search for a particular type of image. Thus, the main research question of this thesis is whether it is possible to learn visual object categories directly from images. Computers process images as long lists of pixels that have no clear connection to the high-level semantics that could be used in image search. The literature offers various methods for extracting low-level image features, as well as approaches for connecting these low-level features with high-level semantics. One of these approaches, Bag-of-Features, is studied in this thesis. In the Bag-of-Features approach, the images are described using a visual codebook. The codebook is built from the descriptions of the image patches using clustering. The images are described by matching descriptions of image patches with the visual codebook and counting the number of matches for each code. In this thesis, unsupervised visual object categorisation using the Bag-of-Features approach is studied. The goal is to find groups of similar images, e.g., images that contain an object from the same category. The standard Bag-of-Features approach is improved by using spatial information and visual saliency. It was found that the performance of visual object categorisation can be improved by using the spatial information of local features to verify the matches. However, this process is computationally heavy, and thus the number of images must be limited in the spatial matching, for example, by using the Bag-of-Features method as in this study. Different approaches for saliency detection are studied and a new method based on the Hessian-Affine local feature detector is proposed. The new method achieves results comparable with the current state of the art. 
The visual object categorisation performance was improved by using foreground segmentation based on saliency information, especially when the background could be considered as clutter.
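The matching-and-counting step of the Bag-of-Features description can be illustrated with a minimal sketch. It assumes the codebook has already been built (in the thesis this is done by clustering patch descriptions); the two-dimensional descriptors, the codebook entries and the function name `bof_histogram` are invented for illustration and are not taken from the thesis.

```python
import math

def bof_histogram(descriptors, codebook):
    """Count, for each code, how many descriptors have it as nearest code."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Index of the codebook entry closest to this descriptor (Euclidean).
        nearest = min(range(len(codebook)),
                      key=lambda i: math.dist(d, codebook[i]))
        counts[nearest] += 1
    return counts

# Hypothetical toy data: 3 codes and 4 patch descriptors in 2-D.
codebook = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
descriptors = [(1.0, 1.0), (0.0, 2.0), (9.0, 9.0), (1.0, 9.0)]

print(bof_histogram(descriptors, codebook))  # [2, 1, 1]
```

Two images can then be compared through their histograms; real descriptors would be much higher-dimensional and the codebook much larger.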
Abstract:
The growing spread of small but powerful mobile devices (such as PDAs, mobile phones and Internet tablets) opens up new scenarios in which users can interact with such devices in many environments to access information at different locations. In this thesis, a ubiquitous-computing-based system called Secure Bluetooth Audio Transmission System is introduced. The system is situated in a large public place (such as an airport or a festival venue), where voice messages are conveyed from the system to users' Bluetooth headsets to inform them of the latest flight schedule and other public information. The reliability of the message is secured by adopting an authorization strategy and ECDSA. In order to assess and evaluate the risks and potential weaknesses of the system, an easy-to-use prototype implementation was written and tested. Other possible uses and further research were also considered.
Abstract:
The inferior colliculus is a primary relay for the processing of auditory information in the brainstem. The inferior colliculus is also part of the so-called brain aversion system, as animals learn to switch off the electrical stimulation of this structure. The purpose of the present study was to determine whether associative learning occurs between aversion induced by electrical stimulation of the inferior colliculus and visual and auditory warning stimuli. Rats implanted with electrodes in the central nucleus of the inferior colliculus were placed inside an open field, and thresholds for the escape response to electrical stimulation of the inferior colliculus were determined. The rats were then placed inside a shuttle-box and subjected to a two-way avoidance paradigm. Electrical stimulation of the inferior colliculus at the escape threshold (98.12 ± 6.15 µA, peak-to-peak) was used as negative reinforcement and light or tone as the warning stimulus. Each session consisted of 50 trials and was divided into two segments of 25 trials in order to determine the learning rate of the animals during the sessions. The rats learned to avoid the inferior colliculus stimulation when light was used as the warning stimulus (13.25 ± 0.60 s and 8.63 ± 0.93 s for latencies and 12.5 ± 2.04 and 19.62 ± 1.65 for frequencies in the first and second halves of the sessions, respectively, P<0.01 in both cases). No significant changes in latencies (14.75 ± 1.63 and 12.75 ± 1.44 s) or frequencies of responses (8.75 ± 1.20 and 11.25 ± 1.13) were seen when tone was used as the warning stimulus (P>0.05 in both cases). Taken together, the present results suggest that rats learn to avoid the inferior colliculus stimulation when light is used as the warning stimulus. However, this learning process does not occur when the neutral stimulus used is an acoustic one. 
Electrical stimulation of the inferior colliculus may disturb the signal transmission of the stimulus to be conditioned from the inferior colliculus to higher brain structures such as the amygdala
Abstract:
Companies require information in order to gain an improved understanding of their customers. Data concerning customers, their interests and behavior are collected through different loyalty programs. The amount of data stored in company databases has increased exponentially over the years and has become difficult to handle. This research area is the subject of much current interest, not only in academia but also in practice, as is shown by several magazines and blogs covering topics such as how to get to know your customers, Big Data, information visualization, and data warehousing. In this Ph.D. thesis, the Self-Organizing Map and two extensions of it, the Weighted Self-Organizing Map (WSOM) and the Self-Organizing Time Map (SOTM), are used as data mining methods for extracting information from large amounts of customer data. The thesis focuses on how data mining methods can be used to model and analyze customer data in order to gain an overview of the customer base, as well as to analyze niche markets. The thesis uses real-world customer data to create models for customer profiling. Evaluation of the built models is performed by CRM experts from the retailing industry. The experts considered the information gained with the help of the models to be valuable and useful for decision making and for strategic planning.
Abstract:
Advancements in information technology have made it possible for organizations to gather and store vast amounts of data about their customers. Information stored in databases can be highly valuable for organizations. However, analyzing large databases has proven to be difficult in practice. For companies in the retail industry, customer intelligence can be used to identify profitable customers, their characteristics, and behavior. By clustering customers into homogeneous groups, companies can more effectively manage their customer base and target profitable customer segments. This thesis studies the use of the self-organizing map (SOM) as a method for analyzing large customer datasets, clustering customers, and discovering information about customer behavior. The aim of the thesis is to find out whether the SOM could be a practical tool for retail companies to analyze their customer data.
Abstract:
This bachelor's thesis was carried out as part of the PulpVision research project, which aims to develop image-based computation and classification methods for pulp quality control in papermaking. A method for finding curvilinear structures in images had previously been developed within the project and applied to detecting fibres in images. That method served as the starting point for this thesis. The aim of the work was to investigate whether the species of the fibres in an image can be identified from features computed from different fibre images. The fibre images contained fibres from four tree species and one plant: acacia, birch, pine, eucalyptus and wheat. For each species, 100 fibre images were selected and divided into two groups, the first used as a training set and the second as a test set. Using the training set, descriptive features were computed for each fibre species, and these features were then used to identify the fibre species in the test-set images. The images were produced by CEMIS-Oulu (Center for Measurement and Information Systems), a unit of the University of Oulu specializing in measurement technology. For each fibre image in the training set, the means and standard deviations of three features were computed: length, width and curvature. In addition, the number of fibres found in the image was counted. Using different combinations of these features, identification accuracy was tested on the test-set images with the k-nearest neighbours method and a naive Bayes classifier. The tests gave promising results: for example, using the means of length and width yielded an accuracy of up to about 98% with both algorithms. Mean fibre length appeared to be the feature that best characterized the fibre images. There was no large difference in accuracy between the algorithms used. Based on the test results, it can be concluded that identifying fibre images is possible. 
According to the tests, only two features need to be computed from the fibre images to identify the fibres accurately. The classification algorithms used were very simple, but they performed well in the tests.
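As a rough illustration of classifying a fibre image from two features (mean fibre length and mean fibre width) with the k-nearest neighbours method, the following sketch may help. All feature values below are invented for illustration and are not measurements from the thesis; the function name `knn_predict` is likewise hypothetical. In practice the features would probably be standardized first, since length and width are on different scales; that step is omitted here.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training items nearest to query.

    train is a list of (features, label) pairs; features are numeric tuples.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented training data: (mean length, mean width) per fibre image.
train = [
    ((2.4, 31.0), "pine"), ((2.6, 29.0), "pine"), ((2.5, 30.0), "pine"),
    ((1.0, 20.0), "birch"), ((1.1, 21.0), "birch"), ((0.9, 19.0), "birch"),
]

print(knn_predict(train, (1.05, 20.5)))  # birch
print(knn_predict(train, (2.5, 30.5)))   # pine
```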
Abstract:
This thesis explores the debate and issues regarding the status of visual inferences in the optical writings of René Descartes, George Berkeley and James J. Gibson. It gathers arguments from across their works and synthesizes an account of visual depth-perception that accurately reflects the larger, metaphysical implications of their philosophical theories. Chapters 1 and 2 address the Cartesian and Berkelean theories of depth-perception, respectively. For Descartes and Berkeley the debate can be put in the following way: How is it possible that we experience objects as appearing outside of us, at various distances, if objects appear inside of us, in the representations of the individual's mind? Thus, the Descartes-Berkeley component of the debate takes place exclusively within a representationalist setting. Representational theories of depth-perception are rooted in the scientific discovery that objects project a merely two-dimensional patchwork of forms on the retina. I call this the "flat image" problem. This poses the problem of depth in terms of a difference between two- and three-dimensional orders (i.e., a gap to be bridged by one inferential procedure or another). Chapter 3 addresses Gibson's ecological response to the debate. Gibson argues that the perceiver cannot be flattened out into a passive, two-dimensional sensory surface. Perception is possible precisely because the body and the environment already have depth. Accordingly, the problem cannot be reduced to a gap between two- and three-dimensional givens, a gap crossed with a projective geometry. The crucial difference is not one of a dimensional degree. Chapter 3 explores this theme and attempts to excavate the empirical and philosophical suppositions that lead Descartes and Berkeley to their respective theories of indirect perception. Gibson argues that the notion of visual inference, which is necessary to substantiate representational theories of indirect perception, is highly problematic. 
To elucidate this point, the thesis steps into the representationalist tradition, in order to show that problems that arise within it demand a turn toward Gibson's information-based doctrine of ecological specificity (which is to say, the theory of direct perception). Chapter 3 concludes with a careful examination of Gibsonian affordances as the sole objects of direct perceptual experience. The final section provides an account of affordances that locates the moving, perceiving body at the heart of the experience of depth; an experience which emerges in the dynamical structures that cross the body and the world.
Abstract:
Please consult the paper edition of this thesis to read. It is available on the 5th Floor of the Library at Call Number: Z 9999.5 E38 L64 2008
Abstract:
Memory is a multi-component cognitive ability to retain and retrieve information presented in different modalities. Research on memory development has shown that the memory capacity and the processes improve gradually from early childhood to adolescence. Findings related to the sex-differences in memory abilities in early childhood have been inconsistent. Although previous research has demonstrated the effects of the modality of stimulus presentation (auditory versus verbal) and the type of material to be remembered (visual/spatial versus auditory/verbal) on the memory processes and memory organization, the recent research with children is rather limited. The present study is a secondary analysis of data, originally collected from 530 typically developing Turkish children and adolescents. The purpose of the present study was to examine the age-related developments and sex differences in auditory-verbal and visual-spatial short-term memory (STM) in 177 typically developing male and female children, 5 to 8 years of age. Dot-Locations and Word-Lists from the Children's Memory Scale were used to measure visual-spatial and auditory-verbal STM performances, respectively. The findings of the present study suggest age-related differences in both visual-spatial and auditory-verbal STM. Sex-differences were observed only in one visual-spatial STM subtest performance. Modality comparisons revealed age- and task-related differences between auditory-verbal and visual-spatial STM performances. There were no sex-related effects in terms of modality specific performances. Overall, the results of this study provide evidence of STM development in early childhood, and these effects were mostly independent of sex and the modality of the task.
Abstract:
In the literature, persistent neural activity over frontal and parietal areas during the delay period of oculomotor delayed response (ODR) tasks has been interpreted as an active representation of task-relevant information and response preparation. Following a recent ERP study (Tekok-Kilic, Tays, & Tkach, 2011) that reported task-related slow wave differences over frontal and parietal sites during the delay periods of three ODR tasks, the present investigation explored developmental differences in young adults and adolescents during the same ODR tasks using 128-channel dense electrode array methodology and source localization. This exploratory study showed that neural functioning underlying visual-spatial WM differed between age groups in the Match condition. More specifically, this difference is localized anteriorly during the late delay period. Given the protracted maturation of the frontal lobes, the observed variation at the frontal site may indicate that adolescents and young adults recruit frontal-parietal resources differently.
Abstract:
Behavioral researchers commonly use single subject designs to evaluate the effects of a given treatment. Several different methods of data analysis are used, each with its own set of methodological strengths and limitations. Visual inspection is commonly used as a method of analyzing data which assesses the variability, level, and trend both within and between conditions (Cooper, Heron, & Heward, 2007). In an attempt to quantify treatment outcomes, researchers developed two methods for analysing data called Percentage of Non-overlapping Data Points (PND) and Percentage of Data Points Exceeding the Median (PEM). The purpose of the present study is to compare and contrast the use of Hierarchical Linear Modelling (HLM), PND and PEM in single subject research. The present study used 39 behaviours across 17 participants to compare treatment outcomes of a group cognitive behavioural therapy program, using PND, PEM, and HLM on three response classes of Obsessive Compulsive Behaviour in children with Autism Spectrum Disorder. Findings suggest that PEM and HLM complement each other and both add invaluable information to the overall treatment results. Future research should consider using both PEM and HLM when analysing single subject designs, specifically grouped data with variability.
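For readers unfamiliar with the two overlap measures, the sketch below computes PND and PEM as they are usually defined: PND compares each treatment-phase point with the most extreme baseline point, while PEM compares it with the baseline median. The session counts and the function names `pnd` and `pem` are invented for illustration and do not come from the study.

```python
from statistics import median

def pnd(baseline, treatment, increase_expected=True):
    """Percentage of treatment points beyond the most extreme baseline point."""
    if increase_expected:
        hits = sum(1 for x in treatment if x > max(baseline))
    else:
        hits = sum(1 for x in treatment if x < min(baseline))
    return 100.0 * hits / len(treatment)

def pem(baseline, treatment, increase_expected=True):
    """Percentage of treatment points beyond the baseline median."""
    med = median(baseline)
    if increase_expected:
        hits = sum(1 for x in treatment if x > med)
    else:
        hits = sum(1 for x in treatment if x < med)
    return 100.0 * hits / len(treatment)

# Invented counts of a target behaviour per session; a decrease is the goal.
baseline = [8, 7, 9, 8, 3]      # one unusually low baseline session
treatment = [6, 5, 4, 2, 2, 1]

print(pnd(baseline, treatment, increase_expected=False))  # 50.0
print(pem(baseline, treatment, increase_expected=False))  # 100.0
```

In this toy series a single atypical baseline session halves the PND while leaving the PEM unchanged, which is the kind of sensitivity to variability that motivates comparing the measures.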
Abstract:
We study the application of matrix decomposition algorithms, such as Non-negative Matrix Factorization (NMF), to frequency-domain representations of musical audio signals. These algorithms, driven by a reconstruction error function, learn a set of basis functions and a set of corresponding coefficients that approximate the input signal. We compare the use of three reconstruction error functions when NMF is applied to monophonic and harmonized scales: least squares, Kullback-Leibler divergence, and a recently introduced phase-dependent divergence measure. New methods for interpreting the resulting decompositions are presented and compared with previously used methods that require knowledge of the acoustic domain. Finally, we analyze the generalization capacity of the learned basis functions with respect to three musical parameters: amplitude, duration and instrument type. To do so, we introduce two algorithms for labeling the basis functions that outperform the previous approach in the majority of our tests, the instrument task with monophonic audio being the only notable exception.
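To make the idea of learning basis functions and coefficients under a reconstruction error concrete, here is a minimal pure-Python sketch of NMF with the widely used multiplicative updates for the least-squares error. It is an illustration under stated assumptions, not the implementation used in the thesis: the small matrix stands in for a magnitude spectrogram, and the names `nmf` and `frobenius_error`, the rank, the iteration count and the random initialization are all choices made here.

```python
import random

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative v by w @ h using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

def frobenius_error(v, w, h):
    """Sum of squared differences between v and its reconstruction w @ h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

# Toy stand-in for a spectrogram: rows are frequency bins, columns are frames.
spectrogram = [[1, 0, 2], [0, 1, 1], [2, 1, 5], [1, 1, 3]]
w, h = nmf(spectrogram, rank=2)
print(frobenius_error(spectrogram, w, h))
```

Here the columns of `w` play the role of the basis functions and the rows of `h` the corresponding coefficients over time. Using the Kullback-Leibler divergence instead would change the update rules but not the overall structure.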
Abstract:
The main role of the corpus callosum is to transfer information between the cerebral hemispheres. Empirical support for this function comes from studies investigating interhemispheric communication in split-brain individuals. Experimental paradigms requiring interhemispheric integration of information make it possible to document certain signs of callosal disconnection in these individuals. The present thesis investigated the information transfer underlying the phenomena of redundancy gain (RG), crossed-uncrossed difference (CUD) and bimanual asynchrony in split-brain and normal individuals, and thereby helped to clarify the role of the corpus callosum. A first study compared the RG of normal individuals with that of split-brain individuals who had undergone a partial or total section of the corpus callosum. In a detection task, the RG is the reduction in reaction times (RT) when two stimuli are presented rather than one. Typically, split-brain individuals show a much larger RG (supra-RG) than normal individuals (Reuter-Lorenz, Nozawa, Gazzaniga, & Hughes, 1995). To investigate the conditions under which the supra-RG occurs, we assessed the RG with interhemispheric, intrahemispheric and vertical-meridian presentation, as well as with stimuli requiring different cortical contributions (luminance, equiluminant colour or motion). The presence of a supra-RG in partial and total split-brain individuals, compared with the RG of normal individuals, was confirmed. This suggests that an anterior section of the corpus callosum, which disrupts the transfer of motor/decisional information, is sufficient to produce a supra-RG in split-brain individuals. Our data also show that, unlike the RG of normal individuals, that of total split-brain individuals is sensitive to sensory manipulations. 
We therefore conclude that the supra-RG of split-brain individuals is attributable to both sensory and motor/decisional contributions. A second study investigated the CUD and bimanual asynchrony in split-brain and normal individuals. The CUD is obtained by subtracting the RT for an "uncrossed" anatomical pathway from the RT for a "crossed" anatomical pathway, which provides an estimate of interhemispheric transfer time. In the context of our study, bimanual asynchrony refers to the difference in RT between the left hand and the right hand, regardless of the hemifield of presentation. The effects of sensory and attentional manipulations were assessed for both measures. This study established a dissociation between the CUD and bimanual asynchrony. Specifically, total split-brain individuals, but not partial ones, showed a significantly larger CUD than normal individuals, whereas both split-brain groups were more asynchronous than normal individuals. We therefore postulate that independent processes underlie the CUD and bimanual synchrony. Moreover, because of the parallel modulation of the RG and bimanual asynchrony across groups, we suggest that a joint process underlies these two measures.
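The two reaction-time measures defined in this abstract reduce to simple differences of means, as the following sketch shows. The reaction times (in milliseconds) and the function names are invented for illustration; the thesis may have used other summary statistics, such as medians, or additional corrections.

```python
from statistics import mean

def redundancy_gain(single_rts, redundant_rts):
    """Mean RT with one stimulus minus mean RT with two stimuli."""
    return mean(single_rts) - mean(redundant_rts)

def crossed_uncrossed_difference(crossed_rts, uncrossed_rts):
    """Mean RT on the crossed pathway minus mean RT on the uncrossed one."""
    return mean(crossed_rts) - mean(uncrossed_rts)

# Invented reaction times in milliseconds.
single = [320, 340]
redundant = [300, 310]
crossed = [352, 348]
uncrossed = [346, 348]

print(redundancy_gain(single, redundant))                # 25
print(crossed_uncrossed_difference(crossed, uncrossed))  # 3
```

A positive redundancy gain means that responses were faster with two stimuli, and a positive crossed-uncrossed difference is read as an estimate of interhemispheric transfer time.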