322 results for Speech Processing


Relevance:

20.00%

Publisher:

Abstract:

The mining environment presents a challenging prospect for stereo vision. Our objective is to produce a stereo vision sensor suited to close-range scenes consisting mostly of rocks. This sensor should produce a dense depth map within real-time constraints. Speed and robustness are of foremost importance for this application. This paper compares a number of stereo matching algorithms in terms of robustness and suitability to fast implementation. These include traditional area-based algorithms, and algorithms based on non-parametric transforms, notably the rank and census transforms. Our experimental results show that the rank and census transforms are robust with respect to radiometric distortion and introduce less computational complexity than conventional area-based matching techniques.
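As a rough illustration of why census and rank codes tolerate radiometric distortion, the sketch below computes a 3x3 census transform in plain Python. The window size, the bit ordering and the function names are assumptions made for this example, not details taken from the paper; a practical matcher would also aggregate Hamming costs over a support window.

```python
def census_transform(image):
    """Return a 3x3 census code (8-bit int) for every interior pixel.

    Each bit records whether a neighbour is darker than the centre pixel,
    so the code depends only on the local intensity ordering.
    """
    height, width = len(image), len(image[0])
    codes = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            code = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the centre pixel itself
                    darker = image[r + dr][c + dc] < image[r][c]
                    code = (code << 1) | int(darker)
            codes[r][c] = code
    return codes


def rank_from_census(code):
    """Rank transform value: the number of neighbours darker than the centre."""
    return bin(code).count("1")


def hamming(a, b):
    """Matching cost between two census codes."""
    return bin(a ^ b).count("1")


# Hypothetical 5x8 test image and a copy shifted right by 2 pixels.
right = [[(7 * r + 3 * c * c + r * c) % 17 for c in range(8)] for r in range(5)]
left = [[row[c - 2] if c >= 2 else 0 for c in range(8)] for row in right]

# A gain and offset change (strictly increasing) keeps the ordering intact.
brighter = [[2 * v + 10 for v in row] for row in right]

right_codes = census_transform(right)
left_codes = census_transform(left)
brighter_codes = census_transform(brighter)
```

Because the codes encode only intensity ordering, `brighter_codes` equals `right_codes`, and the Hamming cost between `left_codes[2][4]` and `right_codes[2][2]` (the true disparity of 2) is zero.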

Relevance:

20.00%

Publisher:

Abstract:

The Attentional Control Theory (ACT) proposes that high-anxious individuals maintain performance effectiveness (accuracy) at the expense of processing efficiency (response time), particularly for the two central executive functions of inhibition and shifting. However, research has generally failed to consider the third executive function, updating. In the current study, seventy-five participants completed the Parametric Go/No-Go and n-back tasks, as well as the State-Trait Anxiety Inventory, in order to explore the effects of anxiety on attention. Results indicated that anxiety led to a decay in processing efficiency, but not in performance effectiveness, across all three central executive functions (inhibition, set-shifting and updating). Interestingly, participants with high levels of trait anxiety also exhibited impaired performance effectiveness on the n-back task designed to measure the updating function. Findings are discussed in relation to developing a new model of ACT that also includes the role of preattentive processes and dual-task coordination when exploring the effects of anxiety on task performance.

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a novel technique for segmenting an audio stream into homogeneous regions according to speaker identities, background noise, music, environmental and channel conditions. Audio segmentation is useful in audio diarization systems, which aim to annotate an input audio stream with information that attributes temporal regions of the audio to their specific sources. The segmentation method introduced in this paper is performed using the Generalized Likelihood Ratio (GLR), computed between two adjacent sliding windows over preprocessed speech. This approach is inspired by the popular segmentation method proposed in the pioneering work of Chen and Gopalakrishnan, which uses the Bayesian Information Criterion (BIC) with an expanding search window. This paper aims to identify and address the shortcomings associated with such an approach. The proposed segmentation strategy is evaluated on the 2002 Rich Transcription (RT-02) Evaluation dataset, and a miss rate of 19.47% and a false alarm rate of 16.94% are achieved at the optimal threshold.
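As a rough illustration of a GLR distance computed between two adjacent sliding windows, the sketch below models each window with a single one-dimensional Gaussian fitted by maximum likelihood. The scalar feature, the window length, the single-Gaussian model and all function names are assumptions for this example; the paper's actual features and models are not given in the abstract.

```python
import math


def ml_variance(samples):
    """Maximum-likelihood (biased) variance of a list of floats."""
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** 2 for x in samples) / len(samples)


def glr_distance(left, right):
    """Negative log GLR for 'one Gaussian' versus 'two Gaussians'.

    With ML-fitted Gaussians the constant terms cancel, leaving
    0.5 * (N log var_all - N_l log var_l - N_r log var_r).
    Larger values suggest the two windows come from different sources.
    """
    both = left + right
    return 0.5 * (
        len(both) * math.log(ml_variance(both))
        - len(left) * math.log(ml_variance(left))
        - len(right) * math.log(ml_variance(right))
    )


def glr_curve(signal, win):
    """Slide two adjacent windows of length `win` and score each boundary."""
    return [
        glr_distance(signal[t - win:t], signal[t:t + win])
        for t in range(win, len(signal) - win + 1)
    ]


# Hypothetical signal whose statistics change at sample 20.
signal = [0.0, 1.0] * 10 + [5.0, 6.0] * 10
win = 6
curve = glr_curve(signal, win)
peak = curve.index(max(curve)) + win  # candidate change point
```

In a segmenter of this kind, peaks of the curve that exceed a threshold would be taken as candidate change points; moving the threshold trades misses against false alarms.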

Relevance:

20.00%

Publisher:

Abstract:

This paper proposes the use of Bayesian approaches with the cross likelihood ratio (CLR) as a criterion for speaker clustering within a speaker diarization system, using eigenvoice modeling techniques. The CLR has previously been shown to be an effective decision criterion for speaker clustering using Gaussian mixture models. Recently, eigenvoice modeling has become an increasingly popular technique, due to its ability to adequately represent a speaker based on sparse training data, as well as to provide an improved capture of differences in speaker characteristics. The integration of eigenvoice modeling into the CLR framework to capitalize on the advantage of both techniques has also been shown to be beneficial for the speaker clustering task. Building on that success, this paper proposes the use of Bayesian methods to compute the conditional probabilities in computing the CLR, thus effectively combining the eigenvoice-CLR framework with the advantages of a Bayesian approach to the diarization problem. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 33.5% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.

Relevance:

20.00%

Publisher:

Abstract:

Speaker diarization determines instances of the same speaker within a recording. Extending this task to a collection of recordings for linking together segments spoken by a unique speaker requires speaker linking. In this paper we propose a speaker linking system using linkage clustering and state-of-the-art speaker recognition techniques. We evaluate our approach against two baseline linking systems using agglomerative cluster merging (AC) and agglomerative clustering with model retraining (ACR). We demonstrate that our linking method, using complete-linkage clustering, provides a relative improvement of 20% and 29% in attribution error rate (AER), over the AC and ACR systems, respectively.

Relevance:

20.00%

Publisher:

Abstract:

In this paper we propose and evaluate a speaker attribution system using a complete-linkage clustering method. Speaker attribution refers to the annotation of a collection of spoken audio based on speaker identities. This can be achieved using diarization and speaker linking. The main challenge associated with attribution is achieving computational efficiency when dealing with large audio archives. Traditional agglomerative clustering methods with model merging and retraining are not feasible for this purpose. This has motivated the use of linkage clustering methods without retraining. We first propose a diarization system using complete-linkage clustering and show that it outperforms traditional agglomerative and single-linkage clustering based diarization systems with a relative improvement of 40% and 68%, respectively. We then propose a complete-linkage speaker linking system to achieve attribution and demonstrate a 26% relative improvement in attribution error rate (AER) over the single-linkage speaker linking approach.
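The following sketch shows complete-linkage agglomerative clustering over a precomputed pairwise distance matrix, which is why no model merging or retraining is needed between merges. The distance values, the stopping threshold and the function name are hypothetical; how the paper scores pairs of speaker segments is not stated in the abstract.

```python
def complete_linkage(dist, threshold):
    """Cluster items 0..n-1 given a symmetric distance matrix `dist`.

    The distance between two clusters is the largest pairwise distance
    between their members. Merging stops once the closest pair of
    clusters is further apart than `threshold`.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # remaining clusters are treated as distinct speakers
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical distances between four speech segments.
dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
speakers = complete_linkage(dist, threshold=0.5)
```

Pairwise scores are computed once and reused at every merge, so the cost of each step is a lookup over the matrix and not a model retraining pass.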

Relevance:

20.00%

Publisher:

Abstract:

Superconducting thick films of Bi2Sr2CaCu2Oy (Bi-2212) on single-crystalline (100) MgO substrates have been prepared using a doctor-blade technique and a partial-melt process. It is found that the phase composition and the amount of Ag addition to the paste affect the structure and superconducting properties of the partially melted thick films. The optimum heat treatment schedule for obtaining high Jc has been determined for each paste. The heat treatment ensures attainment of high purity for the crystalline Bi-2212 phase and high orientation of Bi-2212 crystals, in which the c-axis is perpendicular to the substrate. The highest Tc, obtained by resistivity measurement, is 92.2 K. The best value for Jct (transport) of these thick films, measured at 77 K in self-field, is 8 × 10³ A cm⁻².

Relevance:

20.00%

Publisher:

Abstract:

Automatic pain monitoring has the potential to greatly improve patient diagnosis and outcomes by providing a continuous objective measure. One of the most promising ways to do this is by automatically detecting facial expressions. However, current approaches have failed due to their inability to: 1) integrate the rigid and non-rigid head motion into a single feature representation, and 2) incorporate the salient temporal patterns into the classification stage. In this paper, we tackle the first problem by developing a “histogram of facial action units” representation using Active Appearance Model (AAM) face features, and then utilize a Hidden Conditional Random Field (HCRF) to overcome the second issue. We show that both of these methods improve performance on sequence-level pain detection compared to current state-of-the-art methods on the UNBC-McMaster Shoulder Pain Archive.

Relevance:

20.00%

Publisher:

Abstract:

Visual abnormalities, both at the sensory input and the higher interpretive levels, have been associated with many of the symptoms of schizophrenia. Individuals with schizophrenia typically experience distortions of sensory perception, resulting in perceptual hallucinations and delusions that are related to the observed visual deficits. Disorganised speech, thinking and behaviour are commonly experienced by sufferers of the disorder, and have also been attributed to perceptual disturbances associated with anomalies in visual processing. Compounding these issues are marked deficits in cognitive functioning that are observed in approximately 80% of those with schizophrenia. Cognitive impairments associated with schizophrenia include: difficulty with concentration and memory (i.e. working, visual and verbal), an impaired ability to process complex information, response inhibition and deficits in speed of processing, visual and verbal learning. Deficits in sustained attention or vigilance, poor executive functioning such as poor reasoning, problem solving, and social cognition, are all influenced by impaired visual processing. These symptoms impact on the internal perceptual world of those with schizophrenia, and hamper their ability to navigate their external environment. Visual processing abnormalities in schizophrenia are likely to worsen personal, social and occupational functioning. Binocular rivalry provides a unique opportunity to investigate the processes involved in visual awareness and visual perception. Binocular rivalry is the alternation of perceptual images that occurs when conflicting visual stimuli are presented to each eye in the same retinal location. The observer perceives the opposing images in an alternating fashion, despite the sensory input to each eye remaining constant. Binocular rivalry tasks have been developed to investigate specific parts of the visual system. 
The research presented in this Thesis provides an explorative investigation into binocular rivalry in schizophrenia, using the method of Pettigrew and Miller (1998) and comparing individuals with schizophrenia to healthy controls. This method allows manipulations to the spatial and temporal frequency, luminance contrast and chromaticity of the visual stimuli. Manipulations to the rival stimuli affect the rate of binocular rivalry alternations and the time spent perceiving each image (dominance duration). Binocular rivalry rate and dominance durations provide useful measures to investigate aspects of visual neural processing that lead to the perceptual disturbances and cognitive dysfunction attributed to schizophrenia. However, despite this promise the binocular rivalry phenomenon has not been extensively explored in schizophrenia to date. Following a review of the literature, the research in this Thesis examined individual variation in binocular rivalry. The initial study (Chapter 2) explored the effect of systematically altering the properties of the stimuli (i.e. spatial and temporal frequency, luminance contrast and chromaticity) on binocular rivalry rate and dominance durations in healthy individuals (n=20). The findings showed that altering the stimuli with respect to temporal frequency and luminance contrast significantly affected rate. This is significant as processing of temporal frequency and luminance contrast have consistently been demonstrated to be abnormal in schizophrenia. The current research then explored binocular rivalry in schizophrenia. The primary research question was, "Are binocular rivalry rates and dominance durations recorded in participants with schizophrenia different to those of the controls?" 
In this second study, binocular rivalry data that were collected using low- and high-strength binocular rivalry were compared to alternations recorded during a monocular rivalry task, the Necker Cube task, to replicate and advance the work of Miller et al. (2003). Participants with schizophrenia (n=20) recorded fewer alternations (i.e. slower alternation rates) than control participants (n=20) on both binocular rivalry tasks; however, no difference was observed between the groups on the Necker Cube task. Magnocellular and parvocellular visual pathways, thought to be abnormal in schizophrenia, were also investigated in binocular rivalry. The binocular rivalry stimuli used in this third study (Chapter 4) were altered to bias the task for one of these two pathways. Participants with schizophrenia recorded slower binocular rivalry rates than controls in both binocular rivalry tasks. Using a ‘within-subject’ design, binocular rivalry data were compared to data collected from a backward-masking task widely accepted to bias both these pathways. Based on these data, a model of binocular rivalry, based on the magnocellular and parvocellular pathways that contribute to the dorsal and ventral visual streams, was developed. Binocular rivalry rates were compared with performance on the Benton’s Judgment of Line Orientation task, in individuals with schizophrenia compared to healthy controls (Chapter 5). The Benton’s Judgment of Line Orientation task is widely accepted to be processed within the right cerebral hemisphere, making it an appropriate task to investigate the role of the cerebral hemispheres in binocular rivalry, and to investigate the inter-hemispheric switching hypothesis of binocular rivalry proposed by Pettigrew and Miller (1998, 2003). The data were suggestive of intra-hemispheric rather than inter-hemispheric visual processing in binocular rivalry.
Neurotransmitter involvement in binocular rivalry, backward masking and Judgment of Line Orientation in schizophrenia was investigated using a genetic indicator of dopamine receptor distribution and functioning: the presence of the Taq1 allele of the dopamine D2 receptor (DRD2) gene. This final study (Chapter 6) explored whether the presence of the Taq1 allele of the DRD2 gene, and thus, by inference, the distribution of dopamine receptors and dopamine function, accounted for the large individual variation in binocular rivalry. The presence of the Taq1 allele was associated with slower binocular rivalry rates or poorer performance in the backward masking and Judgment of Line Orientation tasks seen in the group with schizophrenia. This Thesis has contributed to what is known about binocular rivalry in schizophrenia. Consistently slower binocular rivalry rates were observed in participants with schizophrenia, indicating abnormally slow visual processing in this group. These data support previous studies reporting visual processing abnormalities in schizophrenia and suggest that a slow binocular rivalry rate is not a feature specific to bipolar disorder, but may be a feature of disorders with psychotic features generally. The contributions of the magnocellular or dorsal pathways and parvocellular or ventral pathways to binocular rivalry, and therefore to perceptual awareness, were investigated. The data presented supported the view that the magnocellular system initiates perceptual awareness of an image and the parvocellular system maintains the perception of the image, making it available to higher level processing occurring within the cortical hemispheres. Abnormal magnocellular and parvocellular processing may both contribute to perceptual disturbances that ultimately contribute to the cognitive dysfunction associated with schizophrenia. An alternative model of binocular rivalry based on these observations was proposed.

Relevance:

20.00%

Publisher:

Abstract:

Vision-based SLAM is mostly a solved problem, provided that clear, sharp images can be obtained. However, in outdoor environments a number of factors such as rough terrain, high speeds and hardware limitations can result in these conditions not being met. High speed transit on rough terrain can lead to image blur and under/over exposure, problems that cannot easily be dealt with using low cost hardware. Furthermore, there has recently been growing interest in lifelong autonomy for robots, which in outdoor environments brings the challenge of dealing with a moving sun and a lack of constant artificial lighting. In this paper, we present a lightweight approach to visual localization and visual odometry that addresses the challenges posed by perceptual change and low cost cameras. The approach combines low resolution imagery with the SLAM algorithm RatSLAM. We test the system using a cheap consumer camera mounted on a small vehicle in a mixed urban and vegetated environment, at times ranging from dawn to dusk and in conditions ranging from sunny weather to rain. We first show that the system is able to provide reliable mapping and recall over the course of the day and incrementally incorporate new visual scenes from different times into an existing map. We then restrict the system to only learning visual scenes at one time of day, and show that the system is still able to localize and map at other times of day. The results demonstrate the viability of the approach in situations where image quality is poor and environmental or hardware factors preclude the use of visual features.

Relevance:

20.00%

Publisher:

Abstract:

Texture enhancement is an important component of image processing, with extensive application in science and engineering. The quality of medical images, quantified using the texture of the images, plays a significant role in the routine diagnosis performed by medical practitioners. Previously, image texture enhancement was performed using classical integral order differential mask operators. Recently, first order fractional differential operators were implemented to enhance images. Experiments show that the use of the fractional differential not only maintains the low frequency contour features in the smooth areas of the image, but also nonlinearly enhances edges and textures corresponding to high-frequency image components. However, whilst these methods perform well in particular cases, they are not routinely useful across all applications. To this end, we applied the second order Riesz fractional differential operator to improve upon existing approaches of texture enhancement. Compared with the classical integral order differential mask operators and other fractional differential operators, our new algorithms provide higher signal to noise values, which leads to superior image quality.

Relevance:

20.00%

Publisher:

Abstract:

Balcony acoustic treatments can mitigate the effects of community road traffic noise. To investigate further, a theoretical study into the effects of balcony acoustic treatment combinations on speech interference and transmission is conducted for various street geometries. Nine different balcony types are investigated using a combined specular and diffuse reflection computer model. Diffusion in the model is calculated using the radiosity technique. The balcony types include a standard balcony with or without a ceiling and with various combinations of parapet, ceiling absorption and ceiling shield. A total of 70 balcony and street geometrical configurations are analyzed with each balcony type, resulting in 630 scenarios. In each scenario the reverberation time, speech interference level (SIL) and speech transmission index (STI) are calculated. These indicators are compared to determine trends based on the effects of propagation path, inclusion of opposite buildings and difference with a reference position outside the balcony. The results demonstrate trends in SIL and STI with different balcony types. It is found that an acoustically treated balcony reduces speech interference. A parapet provides the largest improvement, followed by absorption on the ceiling. The largest reductions in speech interference arise when a combination of balcony acoustic treatments is applied.
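For readers unfamiliar with the SIL indicator, the sketch below uses the common four-band definition: the arithmetic mean of the octave-band sound pressure levels centred at 500, 1000, 2000 and 4000 Hz. Whether the study used exactly this variant is an assumption, and the band levels shown are invented for illustration.

```python
# Octave bands used by the common four-band SIL definition (assumed here).
SIL_BANDS_HZ = (500, 1000, 2000, 4000)


def speech_interference_level(band_levels_db):
    """Arithmetic mean of the octave-band levels (dB) in the SIL bands."""
    return sum(band_levels_db[f] for f in SIL_BANDS_HZ) / len(SIL_BANDS_HZ)


# Hypothetical traffic-noise levels on a balcony, before and after treatment.
untreated = {500: 60.0, 1000: 58.0, 2000: 55.0, 4000: 51.0}
treated = {500: 55.0, 1000: 52.0, 2000: 49.0, 4000: 44.0}

sil_untreated = speech_interference_level(untreated)
sil_treated = speech_interference_level(treated)
sil_reduction = sil_untreated - sil_treated
```

A lower SIL means less masking of speech by the traffic noise, so a positive `sil_reduction` corresponds to the improvement the abstract attributes to treated balconies.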

Relevance:

20.00%

Publisher:

Abstract:

Organizations increasingly make use of social media in order to compete for customer awareness and improve the quality of their goods and services. Multiple techniques of social media analysis are already in use. Nevertheless, theoretical underpinnings and a sound research agenda are still unavailable in this field. In order to contribute to setting up such an agenda, we introduce digital social signal processing (DSSP) as a new research stream in IS that requires multi-faceted investigations. Our DSSP concept is founded upon a set of four sequential activities: sensing digital social signals that are emitted by individuals on social media; decoding online data of social media in order to reconstruct digital social signals; matching the signals with consumers’ life events; and configuring individualized goods and service offerings tailored to the individual needs of customers. We further contribute to tying loose ends of different research areas together, in order to frame DSSP as a field for further investigation. We conclude by developing a research agenda.

Relevance:

20.00%

Publisher:

Abstract:

Classifier selection is a problem encountered by multi-biometric systems that aim to improve performance through fusion of decisions. A particular decision fusion architecture that combines multiple instances (n classifiers) and multiple samples (m attempts at each classifier) has been proposed in previous work to achieve a controlled trade-off between false alarms and false rejects. Although analysis on text-dependent speaker verification has demonstrated better performance for fusion of decisions with favourable dependence compared to statistically independent decisions, the performance is not always optimal. Given a pool of instances, the best performance with this architecture is obtained for certain combinations of instances. Heuristic rules and diversity measures have been commonly used for classifier selection, but it is shown that optimal performance is achieved for the 'best combination performance' rule. As the search complexity for this rule increases exponentially with the addition of classifiers, a measure - the sequential error ratio (SER) - is proposed in this work that is specifically adapted to the characteristics of the sequential fusion architecture. The proposed measure can be used to select a classifier that is most likely to produce a correct decision at each stage. Error rates for fusion of text-dependent HMM-based speaker models using SER are compared with other classifier selection methodologies. SER is shown to achieve near-optimal performance for sequential fusion of multiple instances with or without the use of multiple samples. The methodology applies to multiple speech utterances for telephone or internet-based access control and to other systems such as multiple fingerprint and multiple handwriting sample based identity verification systems.
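To make the trade-off between false alarms and false rejects concrete, the sketch below works through the arithmetic for one plausible reading of the architecture: a claimant must be accepted by all n classifiers in sequence (AND), and each classifier accepts if any of up to m attempts succeeds (OR). This fusion rule, the assumption of statistically independent decisions with identical per-attempt error rates, and the function name are all assumptions for illustration; the abstract itself notes that real decisions are dependent, and it does not define the SER measure, which is not reproduced here.

```python
def fused_error_rates(far, frr, n, m):
    """Return (false_accept_rate, false_reject_rate) after fusion.

    far, frr: per-attempt error rates of a single classifier.
    n: number of classifiers that must all accept (AND across instances).
    m: attempts allowed at each classifier (OR across samples).
    Assumes every decision is statistically independent.
    """
    stage_far = 1 - (1 - far) ** m  # impostor passes if any attempt is accepted
    stage_frr = frr ** m            # genuine user fails only if all attempts fail
    fused_far = stage_far ** n              # impostor must pass every stage
    fused_frr = 1 - (1 - stage_frr) ** n    # genuine user fails if any stage fails
    return fused_far, fused_frr


# Hypothetical per-attempt error rates of 10% each.
single = fused_error_rates(0.10, 0.10, n=1, m=1)
more_instances = fused_error_rates(0.10, 0.10, n=3, m=1)
more_samples = fused_error_rates(0.10, 0.10, n=1, m=3)
```

Under these assumptions, adding instances lowers false accepts but raises false rejects, while adding samples does the opposite, which is the sense in which n and m give a controlled trade-off.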

Relevance:

20.00%

Publisher:

Abstract:

Background: Aphasia is an acquired language disorder that can present a significant barrier to patient involvement in healthcare decisions. Speech-language pathologists (SLPs) are viewed as experts in the field of communication. However, many SLP students do not receive practical training in techniques to communicate with people with aphasia (PWA) until they encounter PWA during clinical education placements. Methods: This study investigated the confidence and knowledge of SLP students in communicating with PWA prior to clinical placements using a customised questionnaire. Confidence in communicating with PWA was assessed using a 100-point visual analogue scale. Linear and logistic regressions were used to examine the association between confidence and age, and between confidence and course type (graduate-entry masters or undergraduate), respectively. Knowledge of strategies to assist communication with PWA was examined by asking respondents to list specific strategies that could assist communication with PWA. Results: SLP students were not confident with the prospect of communicating with PWA, reporting a median of 29 points (inter-quartile range 17–47) on the visual analogue confidence scale. Only four (8.2%) respondents rated their confidence greater than 55 (out of 100). Regression analyses indicated no relationship between confidence and students' age (p = 0.31, r-squared = 0.02), or between confidence and course type (p = 0.22, pseudo r-squared = 0.03). Students displayed limited knowledge about communication strategies. Thematic analysis of strategies revealed four overarching themes: Physical, Verbal Communication, Visual Information and Environmental Changes. While most students identified potential use of resources (such as images and written information), fewer students identified strategies to alter their verbal communication (such as reduced speech rate).
Conclusions: SLP students who had received aphasia-related theoretical coursework, but had not commenced clinical placements with PWA, were not confident in their ability to communicate with PWA. Students may benefit from an educational intervention or curriculum modification to incorporate practical training in effective strategies to communicate with PWA, before they encounter PWA in clinical settings. Ensuring students have confidence and knowledge of potential communication strategies to assist communication with PWA may allow them to focus their learning experiences in more specific clinical domains, such as clinical reasoning, rather than building foundation interpersonal communication skills.