937 results for multi-modal speaker identification
Abstract:
Conventional feedforward neural networks have used the sum-of-squares cost function for training. A new cost function is presented here with a description length interpretation based on Rissanen's Minimum Description Length principle. It is a heuristic that can be roughly interpreted as the number of data points fitted by the model. Rather than seeking optimal descriptions, the cost function forms minimum descriptions in a naive way for computational convenience, and is therefore called the Naive Description Length cost function. Finding minimum description models is shown to be closely related to the identification of clusters in the data. As a consequence, the minimum of this cost function approximates the most probable mode of the data, whereas the minimum of the sum-of-squares cost function approximates the mean. The new cost function is shown to provide information about the structure of the data, obtained by inspecting how the error depends on the amount of regularisation. This structure provides a method of selecting regularisation parameters as an alternative or supplement to Bayesian methods. The new cost function is tested on a number of multi-valued problems, such as a simple inverse kinematics problem, as well as on a number of classification and regression problems. The mode-seeking property of this cost function is shown to improve prediction in time series problems. Description length principles are also used in a similar fashion to derive a regulariser that controls network complexity.
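As a rough illustration of the mode-versus-mean distinction described above (a minimal sketch only: the Gaussian-kernel "points fitted" score below is an assumed stand-in for the idea of counting data points fit by the model, and is not the thesis's actual Naive Description Length cost):

    import numpy as np

    # Bimodal targets: a multi-valued problem where most points sit near +1, some near -1.
    rng = np.random.default_rng(0)
    targets = np.concatenate([rng.normal(+1.0, 0.1, 70), rng.normal(-1.0, 0.1, 30)])

    candidates = np.linspace(-2, 2, 2001)

    # Sum-of-squares cost: its minimiser is the sample mean, which falls between the branches.
    sse = ((targets[None, :] - candidates[:, None]) ** 2).sum(axis=1)
    print("sum-of-squares fit:", candidates[sse.argmin()])   # close to the mean, about +0.4

    # Kernel "points fitted" score: counts data points within a tolerance sigma of the
    # prediction, so its maximiser sits on the most probable mode of the data.
    sigma = 0.1
    fitted = np.exp(-0.5 * ((targets[None, :] - candidates[:, None]) / sigma) ** 2).sum(axis=1)
    print("mode-seeking fit:", candidates[fitted.argmax()])  # close to the +1 branch

With a bimodal target distribution the squared-error fit lands between the two branches, while the kernel score selects the majority branch, which is the behaviour the abstract attributes to the mode-seeking cost.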
Abstract:
Advancements in retinal imaging technologies have drastically improved the quality of eye care over the past couple of decades. Scanning laser ophthalmoscopy (SLO) and optical coherence tomography (OCT) are two examples of critical imaging modalities for the diagnosis of retinal pathologies. However, current-generation SLO and OCT systems have limitations in diagnostic capability due to the following factors: the use of bulky tabletop systems, monochromatic imaging, and resolution degradation due to ocular aberrations and diffraction.
Bulky tabletop SLO and OCT systems are incapable of imaging patients who are supine, under anesthesia, or otherwise unable to maintain the required posture and fixation. Monochromatic SLO and OCT imaging prevents the identification of various color-specific diagnostic markers visible with color fundus photography, such as those of neovascular age-related macular degeneration. Resolution degradation due to ocular aberrations and diffraction has prevented the imaging of photoreceptors close to the fovea without the use of adaptive optics (AO), which requires bulky and expensive components that limit the potential for widespread clinical use.
In this dissertation, techniques for extending the diagnostic capability of SLO and OCT systems are developed. These techniques include design strategies for miniaturizing and combining SLO and OCT into multi-modal, lightweight handheld probes that extend high-quality retinal imaging to pediatric eye care. In addition, a method for extending true color retinal imaging to SLO, enabling high-contrast, depth-resolved, high-fidelity color fundus imaging, is demonstrated using a supercontinuum light source. Finally, the development and combination of SLO with a super-resolution confocal microscopy technique known as optical photon reassignment (OPRA) is demonstrated to enable high-resolution imaging of retinal photoreceptors without the use of adaptive optics.
Abstract:
In medicine, innovation depends on a better knowledge of the mechanisms of the human body, a complex system of multi-scale constituents. Unraveling the complexity underlying diseases proves to be challenging, and a deep understanding of the inner workings requires dealing with much heterogeneous information. Exploring the molecular status and the organization of genes, proteins, and metabolites provides insights into what drives a disease, from aggressiveness to curability. Molecular constituents, however, are only the building blocks of the human body and cannot currently tell the whole story of diseases. This is why attention is now growing towards the simultaneous exploitation of multi-scale information. Holistic methods are therefore drawing interest to address the problem of integrating heterogeneous data, where the heterogeneity may derive from the diversity across data types and from the diversity within diseases. Here, four studies conducted data integration using custom-designed workflows that implement novel methods and views to tackle the heterogeneous characterization of diseases. The first study was devoted to determining shared gene regulatory signatures in onco-hematology and showed partial co-regulation across blood-related diseases. The second study focused on Acute Myeloid Leukemia and refined the unsupervised integration of genomic alterations, which turned out to better resemble clinical practice. In the third study, network integration for atherosclerosis demonstrated, as a proof of concept, the impact of network intelligibility when modelling heterogeneous data, which was shown to accelerate the identification of new potential pharmaceutical targets. Lastly, the fourth study introduced a new method to integrate multiple data types into a single latent heterogeneous representation, which facilitated the selection of important data types for predicting the tumour stage of invasive ductal carcinoma. The results of these four studies lay the groundwork to ease the detection of new biomarkers, ultimately benefiting medical practice and the ever-growing field of Personalized Medicine.
Abstract:
How does the multi-sensory nature of stimuli influence information processing? Cognitive systems with limited selective attention can elucidate these processes. Six-year-olds, 11-year-olds and 20-year-olds engaged in a visual search task that required them to detect a pre-defined coloured shape under conditions of low or high visual perceptual load. On each trial, a peripheral distractor that could be either compatible or incompatible with the current target colour was presented either visually, auditorily or audiovisually. Unlike unimodal distractors, audiovisual distractors elicited reliable compatibility effects across the two levels of load in adults and in the older children, but high visual load significantly reduced distraction for all children, especially the youngest participants. This study provides the first demonstration that multi-sensory distraction has powerful effects on selective attention: Adults and older children alike allocate attention to potentially relevant information across multiple senses. However, poorer attentional resources can, paradoxically, shield the youngest children from the deleterious effects of multi-sensory distraction. Furthermore, we highlight how developmental research can enrich the understanding of distinct mechanisms controlling adult selective attention in multi-sensory environments.
Abstract:
The 2009-2010 Data Fusion Contest organized by the Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society was focused on the detection of flooded areas using multi-temporal and multi-modal images. Both high spatial resolution optical and synthetic aperture radar data were provided. The goal was not only to identify the best algorithms (in terms of accuracy), but also to investigate the further improvement derived from decision fusion. This paper presents the four awarded algorithms and the conclusions of the contest, investigating both supervised and unsupervised methods and the use of multi-modal data for flood detection. Interestingly, a simple unsupervised change detection method provided accuracy similar to that of the supervised approaches, and a digital elevation model-based predictive method yielded a comparable projected change detection map without using post-event data.
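For orientation, a generic unsupervised change detector of the kind alluded to above might look as follows (a minimal sketch under assumed inputs: co-registered pre- and post-event intensity images and a simple global threshold; this is not one of the awarded contest algorithms):

    import numpy as np

    def detect_change(pre_image, post_image, k=2.0):
        """Flag changed (e.g. flooded) pixels by thresholding the absolute difference
        of co-registered pre- and post-event images (generic sketch)."""
        diff = np.abs(post_image.astype(float) - pre_image.astype(float))
        threshold = diff.mean() + k * diff.std()   # simple global threshold
        return diff > threshold                    # boolean change map

    # Toy example: a bright "flooded" patch appears in the post-event image.
    pre = np.zeros((100, 100))
    post = pre.copy()
    post[40:60, 40:60] = 5.0
    change_map = detect_change(pre, post)
    print("changed pixels:", int(change_map.sum()))  # 400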
Abstract:
Erythropoietin (EPO) has been recognized as a neuroprotective agent. In animal models of neonatal brain injury, exogenous EPO has been shown to reduce lesion size and improve structure and function. Experimental studies have focused on short-course treatment after injury; the timing, dose and length of treatment in preterm brain damage remain to be defined. We have evaluated the effects of high-dose, long-term EPO treatment of hypoxic-ischemic (HI) injury in 3-day-old (P3) rat pups using histopathology, magnetic resonance imaging (MRI) and spectroscopy (MRS), as well as functional assessment with somatosensory-evoked potentials (SEP). After HI, rat pups were assessed by MRI for initial damage and were randomized to receive EPO or vehicle. At the end of the treatment period (P25), the size of the resulting cortical damage and the white matter (WM) microstructure integrity were assessed by MRI, and cortical metabolism by MRS. Whisker-elicited SEP were recorded to evaluate somatosensory function. Brains were collected for neuropathological assessment. The EPO-treated animals did not show a significant decrease in the HI-induced cortical loss at P25. However, WM microstructure measured by diffusion tensor imaging was improved, and the SEP response in the injured cortex was recovered in the EPO-treated animals compared to vehicle-treated animals. In addition, the metabolic profile was less altered in the EPO group. Long-term treatment with high-dose EPO after HI injury in the very immature rat brain thus induced recovery of WM microstructure and connectivity as well as somatosensory cortical function, despite having no effect on the volume of cortical damage. This indicates that long-term high-dose EPO induces recovery of structural and functional connectivity despite persisting gross anatomical cortical alteration resulting from HI.
Abstract:
Introduction: The field of connectomic research is growing rapidly, as a result of methodological advances in structural neuroimaging on many spatial scales. In particular, progress in diffusion MRI data acquisition and processing has made macroscopic structural connectivity maps available in vivo through Connectome Mapping Pipelines (Hagmann et al, 2008), yielding so-called Connectomes (Hagmann 2005, Sporns et al, 2005). They exhibit both spatial and topological information that constrain functional imaging studies and are relevant to their interpretation. The need has grown for a special-purpose software tool that supports both clinical researchers and neuroscientists in investigating such connectome data.
Methods: We developed the ConnectomeViewer, a powerful, extensible software tool for visualization and analysis in connectomic research. It uses the newly defined, container-like Connectome File Format, specifying networks (GraphML), surfaces (Gifti), volumes (Nifti), track data (TrackVis) and metadata. The use of Python as the programming language allows it to be cross-platform and to have access to a multitude of scientific libraries.
Results: Using a flexible plugin architecture, it is easy to enhance functionality for specific purposes. The following features are already implemented:
* Ready use of libraries, e.g. for complex network analysis (NetworkX) and data plotting (Matplotlib). More brain connectivity measures will be implemented in a future release (Rubinov et al, 2009).
* 3D view of networks with node positioning based on the corresponding ROI surface patch; other layouts are possible.
* Picking functionality to select nodes, select edges, retrieve more node information (ConnectomeWiki) and toggle surface representations.
* Interactive thresholding and modality selection of edge properties using filters.
* Arbitrary metadata can be stored for networks, thereby allowing e.g. group-based analysis or meta-analysis.
* Python shell for scripting. Application data is exposed and can be modified or used for further post-processing.
* Visualization pipelines using filters and modules can be composed with Mayavi (Ramachandran et al, 2008).
* Interface to TrackVis to visualize track data; selected nodes are converted to ROIs for fiber filtering.
The Connectome Mapping Pipeline (Hagmann et al, 2008) processed 20 healthy subjects into an average Connectome dataset. The figures show the ConnectomeViewer user interface using this dataset; connections are shown that occur in all 20 subjects. The dataset is freely available from the homepage (connectomeviewer.org).
Conclusions: The ConnectomeViewer is a cross-platform, open-source software tool that provides extensive visualization and analysis capabilities for connectomic research. It has a modular architecture, integrates the relevant data types and is completely scriptable. Visit www.connectomics.org to get involved as a user or developer.
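To give a flavour of the scripted network analysis the abstract describes (a generic sketch only: it uses the NetworkX and Matplotlib libraries mentioned above but not the actual ConnectomeViewer or Connectome File Format API; the random toy graph and the "weight" edge attribute are assumptions standing in for a connectome network that would normally be loaded from GraphML, e.g. with nx.read_graphml):

    import networkx as nx
    import matplotlib.pyplot as plt

    # Toy stand-in for a connectome network with fiber-count-like edge weights.
    G = nx.gnm_random_graph(80, 400, seed=1)
    for u, v in G.edges():
        G[u][v]["weight"] = abs(hash((u, v))) % 100

    # Interactive-style thresholding of an edge property: keep strong connections only.
    strong = [(u, v) for u, v, d in G.edges(data=True) if d["weight"] > 50]
    H = G.edge_subgraph(strong).copy()

    # A couple of complex-network measures over the thresholded graph.
    print("nodes:", H.number_of_nodes(), "edges:", H.number_of_edges())
    print("mean clustering:", nx.average_clustering(H))

    # Quick look at the degree distribution of the thresholded network.
    plt.hist([deg for _, deg in H.degree()], bins=15)
    plt.xlabel("node degree"); plt.ylabel("count")
    plt.show()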
Abstract:
High-frequency oscillations in the gamma band reflect rhythmic synchronization of spike timing in active neural networks. The modulation of gamma oscillations is a widely established mechanism in a variety of neurobiological processes, yet its neurochemical basis is not fully understood. Modeling, in-vitro and in-vivo animal studies suggest that gamma oscillation properties depend on GABAergic inhibition. In humans, the search for evidence linking total GABA concentration to gamma oscillations has led to promising, but also partly diverging, observations. Here, we provide the first evidence of a direct relationship between the density of GABAA receptors and gamma oscillatory responses in human primary visual cortex (V1). By combining Flumazenil-PET (to measure resting levels of GABAA receptor density) and MEG (to measure visually-induced gamma oscillations), we found that GABAA receptor densities correlated positively with the frequency and negatively with the amplitude of visually-induced gamma oscillations in V1. Our findings demonstrate that the gamma-band response profiles of primary visual cortex across healthy individuals are shaped by GABAA-receptor-mediated inhibitory neurotransmission. These results bridge the gap with in-vitro and animal studies and may have future clinical implications, given that altered GABAergic function, including dysregulation of GABAA receptors, has been related to psychiatric disorders including schizophrenia and depression.
Characterizing Dynamic Optimization Benchmarks for the Comparison of Multi-Modal Tracking Algorithms
Abstract:
Population-based metaheuristics, such as particle swarm optimization (PSO), have been employed to solve many real-world optimization problems. Although it is often sufficient to find a single solution to these problems, there exist cases where identifying multiple, diverse solutions can be beneficial or even required. Some of these problems are further complicated by a change in their objective function over time. This type of optimization is referred to as dynamic, multi-modal optimization. Algorithms which exploit multiple optima in a search space are identified as niching algorithms. Although numerous dynamic niching algorithms have been developed, their performance is often measured solely on their ability to find a single, global optimum. Furthermore, the comparisons often use synthetic benchmarks whose landscape characteristics are generally limited and unknown. This thesis provides a landscape analysis of the dynamic benchmark functions commonly developed for multi-modal optimization. The benchmark analysis results reveal that the mechanisms responsible for dynamism in the current dynamic benchmarks do not significantly affect landscape features, thus suggesting a lack of representation of problems whose landscape features vary over time. This analysis is used in a comparison of current niching algorithms to identify the effects that specific landscape features have on niching performance. Two performance metrics are proposed to measure both the scalability and the accuracy of the niching algorithms. The algorithm comparison results demonstrate which algorithms are best suited to a variety of dynamic environments. The comparison also examines each of the algorithms in terms of its niching behaviour and analyzes the range of, and trade-off between, scalability and accuracy when tuning each algorithm's parameters. These results contribute to the understanding of current niching techniques as well as of the problem features that ultimately dictate their success.
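As context for the kind of benchmark under analysis, a moving-peaks-style dynamic multi-modal function can be sketched as follows (an illustrative assumption only, not the specific benchmark suite examined in the thesis; peak counts, ranges and drift magnitudes are arbitrary):

    import numpy as np

    class MovingPeaks:
        """Minimal moving-peaks-style benchmark: several cone-shaped optima whose
        positions and heights drift when the environment changes (generic sketch)."""

        def __init__(self, n_peaks=5, dim=2, seed=0):
            self.rng = np.random.default_rng(seed)
            self.centres = self.rng.uniform(0, 100, size=(n_peaks, dim))
            self.heights = self.rng.uniform(30, 70, size=n_peaks)

        def evaluate(self, x):
            # Fitness at x is governed by the nearest dominating peak.
            dists = np.linalg.norm(self.centres - np.asarray(x), axis=1)
            return float(np.max(self.heights - dists))

        def change(self, shift=1.0):
            # Environment change: every peak drifts and its height is perturbed.
            self.centres += self.rng.normal(0, shift, size=self.centres.shape)
            self.heights += self.rng.normal(0, shift, size=self.heights.shape)

    env = MovingPeaks()
    print(env.evaluate([50.0, 50.0]))
    env.change()
    print(env.evaluate([50.0, 50.0]))  # same point, new fitness after the change

A niching algorithm evaluated on such a function must track several drifting optima at once, which is exactly the ability the proposed scalability and accuracy metrics are meant to capture.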
Abstract:
Motivation for speaker recognition work is presented in the first part of the thesis, together with an exhaustive survey of past work in this field. A low-cost system not involving complex computation has been chosen for implementation. Towards achieving this, a PC-based system is designed and developed. A front-end analog-to-digital converter (12 bit) is built and interfaced to a PC, and software is developed to control the ADC and to perform various analytical functions, including feature vector evaluation. It is shown that a fixed set of phrases incorporating evenly balanced phonemes is aptly suited for the speaker recognition work at hand, and a set of phrases is chosen for recognition. Two new methods are adopted for feature evaluation. Some new measurements, involving a symmetry-check method for pitch period detection and ACE, are used as features. Arguments are provided to show the need for a new model of speech production. Starting from heuristics, a knowledge-based (KB) speech production model is presented. In this model, a KB provides impulses to a voice-producing mechanism and constant correction is applied via a feedback path; it is this correction that differs from speaker to speaker. Methods of defining measurable parameters for use as features are described. Algorithms for speaker recognition are developed and implemented, and two methods are presented. The first is based on the model postulated: the entropy of the utterance of a phoneme is evaluated, and the transitions of voiced regions are used as speaker-dependent features. The second method uses features found in other works, but evaluated differently; a knock-out scheme is used to provide the weightage values for the selection of features. Results of implementation are presented which show, on average, 80% recognition. It is also shown that if there are long gaps between sessions, the performance deteriorates in a speaker-dependent way. Cross-recognition percentages are also presented; in the worst case these rise to 30%, while the best case is 0%. Suggestions for further work are given in the concluding chapter.
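As a point of reference for pitch period detection in general (a minimal autocorrelation-based sketch; the thesis uses its own symmetry-check method, which is not reproduced here, and the sampling rate and frequency bounds below are illustrative assumptions):

    import numpy as np

    def pitch_period(frame, fs, f_lo=60.0, f_hi=400.0):
        """Estimate the pitch period of a voiced speech frame by autocorrelation.
        (Generic illustration only; not the thesis's symmetry-check method.)"""
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        lo, hi = int(fs / f_hi), int(fs / f_lo)          # plausible lag range
        lag = lo + int(np.argmax(ac[lo:hi]))
        return lag / fs                                   # period in seconds

    # Toy check with a synthetic 120 Hz "voiced" frame sampled at 8 kHz.
    fs = 8000
    t = np.arange(0, 0.03, 1 / fs)
    frame = np.sin(2 * np.pi * 120 * t)
    print(1.0 / pitch_period(frame, fs))  # roughly 120 Hz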
Abstract:
Since the advent of the internet in everyday life in the 1990s, the barriers to producing, distributing and consuming multimedia data such as videos, music, ebooks, etc. have steadily been lowered for most computer users, so that almost everyone with internet access can join the online communities that produce, consume and, of course, also share media artefacts. Along with this trend, the violation of personal data privacy and copyright has increased, with illegal file sharing being rampant across many online communities, particularly for certain music genres and amongst the younger age groups. This has had a devastating effect on the traditional media distribution market, in most cases leaving the distribution companies and the content owners with huge financial losses. To prove that a copyright violation has occurred, one can deploy fingerprinting mechanisms to uniquely identify the property; however, this is currently based only on uni-modal approaches. In this paper we describe some of the design challenges and architectural approaches to multi-modal fingerprinting currently being examined for evaluation studies within a PhD research programme on the optimisation of multi-modal fingerprinting architectures. Accordingly, we outline the available modalities that are being integrated through this research programme, which aims to establish the optimal architecture for multi-modal media security protection over the internet as the online distribution environment for both legal and illegal distribution of media products.
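For contrast with the multi-modal architectures under study, a single uni-modal fingerprint can be as simple as the following (an illustrative average-hash-style sketch for images only; it is not the architecture proposed in the paper, and the block size and comparison rule are assumptions):

    import numpy as np

    def average_hash(grey_image, size=8):
        """Minimal uni-modal image fingerprint: downsample by block averaging,
        threshold against the mean, pack the bits (illustrative only)."""
        h, w = grey_image.shape
        cropped = grey_image[:h - h % size, :w - w % size]
        small = cropped.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
        bits = (small > small.mean()).astype(np.uint8).ravel()
        return "".join(map(str, bits))  # 64-bit fingerprint as a bit string

    def hamming(fp_a, fp_b):
        """Compare two fingerprints; a small distance suggests the same content."""
        return sum(a != b for a, b in zip(fp_a, fp_b))

    img = np.random.default_rng(0).random((64, 64))
    print(hamming(average_hash(img), average_hash(img + 0.01)))  # 0 for a near-duplicate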
Abstract:
Context-aware multimodal interactive systems aim to adapt to the needs and behavioural patterns of users, and offer a way forward for enhancing the efficacy and quality of experience (QoE) in human-computer interaction. The various modalities that contribute to such systems each provide a specific uni-modal response that is integratively presented as a multi-modal interface, capable of interpreting multi-modal user input and responding to it appropriately through dynamically adapted multi-modal interactive flow management. This paper presents an initial background study in the context of the first phase of a PhD research programme in the area of optimisation of data fusion techniques to serve multimodal interactive systems, their applications and requirements.
Abstract:
Fingerprinting is a well-known approach for identifying multimedia data without having the original data present, but rather what amounts to its essence or 'DNA'. Current approaches show insufficient deployment of three types of knowledge that could be brought to bear in providing a fingerprinting framework that remains effective and efficient and can accommodate both whole and elemental protection at appropriate levels of abstraction, to suit various Foci of Interest (FoI) in an image or cross-media artefact. Our proposed framework therefore aims to deliver selective composite fingerprinting that remains responsive to the requirements for protecting the whole or parts of an image which may be of particular interest and especially vulnerable to attempts at rights violation. This is powerfully aided by leveraging both multi-modal information and a rich spectrum of collateral context knowledge, including image-level collaterals as well as the inevitably needed market intelligence, such as the profiling of customers' social network interests, which we deploy as a crucial component of our Fingerprinting Collateral Knowledge. This knowledge is used in selecting the special FoIs within an image or other media content that have to be selectively and collaterally protected.
Abstract:
Fingerprinting is a well-known approach for identifying multimedia data without having the original data present, but instead what amounts to its essence or 'DNA'. Current approaches show insufficient deployment of the various types of knowledge that could be brought to bear in providing a fingerprinting framework that remains effective and efficient and can accommodate both whole and elemental protection at appropriate levels of abstraction, to suit various Zones of Interest (ZoI) in an image or cross-media artefact. The proposed framework aims to deliver selective composite fingerprinting that is powerfully aided by leveraging both multi-modal information and a rich spectrum of collateral context knowledge, including image-level collaterals as well as the inevitably needed market intelligence, such as the profiling of customers' social network interests, which we deploy as a crucial component of our fingerprinting collateral knowledge.