Biblioteca Digital

89 results for Speech and voice functions

The effect of prior visual information on recognition of speech and sounds

Relevance: 100.00%

Abstract:

To identify and categorize complex stimuli such as familiar objects or speech, the human brain integrates information that is abstracted at multiple levels from its sensory inputs. Using cross-modal priming for spoken words and sounds, this functional magnetic resonance imaging study identified 3 distinct classes of visuoauditory incongruency effects, selective for 1) spoken words in the left superior temporal sulcus (STS), 2) environmental sounds in the left angular gyrus (AG), and 3) both words and sounds in the lateral and medial prefrontal cortices (IFS/mPFC). From a cognitive perspective, these incongruency effects suggest that prior visual information influences the neural processes underlying speech and sound recognition at multiple levels, with the STS being involved in phonological, AG in semantic, and mPFC/IFS in higher conceptual processing. In terms of neural mechanisms, effective connectivity analyses (dynamic causal modeling) suggest that these incongruency effects may emerge via greater bottom-up effects from early auditory regions to intermediate multisensory integration areas (i.e., STS and AG). This is consistent with a predictive coding perspective on hierarchical Bayesian inference in the cortex where the domain of the prediction error (phonological vs. semantic) determines its regional expression (middle temporal gyrus/STS vs. AG/intraparietal sulcus).

Exploring the post-genomic world: Differing explanatory and manipulatory functions of post-genomic sciences

Relevance: 100.00%

Abstract:

Richard Lewontin proposed that the ability of a scientific field to create a narrative for public understanding garners it social relevance. This article applies Lewontin's conceptual framework of the functions of science (manipulatory and explanatory) to compare and explain the current differences in perceived societal relevance of genetics/genomics and proteomics. We provide three examples to illustrate the social relevance and strong cultural narrative of genetics/genomics for which no counterpart exists for proteomics. We argue that the major difference between genetics/genomics and proteomics is that genomics has a strong explanatory function, due to the strong cultural narrative of heredity. Based on qualitative interviews and observations of proteomics conferences, we suggest that the nature of proteins, lack of public understanding, and theoretical complexity exacerbate this difference for proteomics. Lewontin's framework suggests that social scientists may find that omics sciences affect social relations in different ways than past analyses of genetics.

Complete-linkage clustering for voice activity detection in audio and visual speech

Relevance: 100.00%

Abstract:

We propose a novel technique for conducting robust voice activity detection (VAD) in high-noise recordings. We use Gaussian mixture modeling (GMM) to train two generic models: speech and non-speech. We then score smaller segments of a given (unseen) recording against each of these GMMs to obtain two respective likelihood scores for each segment. These scores are used to compute a dissimilarity measure between pairs of segments and to carry out complete-linkage clustering of the segments into speech and non-speech clusters. We compare the accuracy of our method against state-of-the-art and standardised VAD techniques to demonstrate an absolute improvement of 15% in half-total error rate (HTER) over the best performing baseline system and across the QUT-NOISE-TIMIT database. We then apply our approach to the Audio-Visual Database of American English (AVDBAE) to demonstrate the performance of our algorithm in using visual, audio-visual or a proposed fusion of these features.
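The clustering step described in this abstract can be illustrated with a minimal sketch. It assumes each segment has already been scored against the two GMMs, and it uses the Euclidean distance between score pairs as the dissimilarity measure; the abstract does not specify the measure, so that choice, the function names, and the toy scores are all hypothetical.

```python
from itertools import combinations
from math import dist


def complete_linkage(points, n_clusters=2):
    """Agglomerative clustering; cluster distance = largest pairwise point distance."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters whose farthest members are closest together.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: max(
                dist(points[i], points[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        clusters[a] += clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters


def label_segments(scores):
    """scores: one (speech_loglik, nonspeech_loglik) pair per segment."""
    clusters = complete_linkage(scores, n_clusters=2)

    def mean_llr(cluster):
        # Mean log-likelihood ratio in favour of the speech model.
        return sum(scores[i][0] - scores[i][1] for i in cluster) / len(cluster)

    speech = max(clusters, key=mean_llr)  # assumed rule for naming the clusters
    return [i in speech for i in range(len(scores))]


# Hypothetical per-segment scores (speech GMM, non-speech GMM).
scores = [(-10.0, -30.0), (-12.0, -28.0), (-31.0, -9.0), (-29.0, -11.0), (-11.0, -27.0)]
labels = label_segments(scores)
```

Here `labels` marks segments 0, 1 and 4 as speech. Complete linkage merges on the largest pairwise distance, which tends to keep clusters compact; whether that matches the paper's exact procedure cannot be confirmed from the abstract alone.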

Fictionalising oral history : narrative analysis, voice and identity

Relevance: 100.00%

Abstract:

Details of a project which fictionalises the oral history of the life of the author's polio-afflicted grandmother Beth Bevan and her experiences at a home for children with disabilities are presented. The speech and language patterns recognised in the first person narration are described, as is the sense of voice and identity communicated through the oral history.

The delta-phase spectrum with application to voice activity detection and speaker recognition

Relevance: 100.00%

Abstract:

For several reasons, the Fourier phase domain is less favored than the magnitude domain in signal processing and modeling of speech. To correctly analyze the phase, several factors must be considered and compensated, including the effect of the step size, windowing function and other processing parameters. Building on a review of these factors, this paper investigates a spectral representation based on the Instantaneous Frequency Deviation, but in which the step size between processing frames is used in calculating phase changes, rather than the traditional single sample interval. Reflecting these longer intervals, the term delta-phase spectrum is used to distinguish this from instantaneous derivatives. Experiments show that mel-frequency cepstral coefficient features derived from the delta-phase spectrum (termed Mel-Frequency delta-phase features) can produce broadly similar performance to equivalent magnitude domain features for both voice activity detection and speaker recognition tasks. Further, it is shown that the fusion of the magnitude and phase representations yields performance benefits over either in isolation.
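One plausible reading of the delta-phase idea, sketched under assumptions: take the phase of a windowed DFT bin in two frames separated by the step size, and subtract the phase advance a sinusoid at the bin centre would show over that step. The Hann window, the function names and the parameter values are illustrative choices, not taken from the paper.

```python
import cmath
import math


def dft_bin(frame, k):
    """Hann-windowed DFT of `frame` evaluated at bin k."""
    n_fft = len(frame)
    return sum(
        x * 0.5 * (1 - math.cos(2 * math.pi * n / n_fft))
        * cmath.exp(-2j * math.pi * k * n / n_fft)
        for n, x in enumerate(frame)
    )


def wrap(phi):
    """Wrap an angle to the principal range [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def delta_phase(signal, start, n_fft, hop, k):
    """Phase change at bin k across one hop, minus the advance expected
    for a sinusoid sitting exactly at the bin centre."""
    p0 = cmath.phase(dft_bin(signal[start:start + n_fft], k))
    p1 = cmath.phase(dft_bin(signal[start + hop:start + hop + n_fft], k))
    return wrap(p1 - p0 - 2 * math.pi * k * hop / n_fft)


# Toy tones: one exactly on bin 8, one a quarter of a bin above it.
N_FFT, HOP, K = 64, 16, 8
on_bin = [math.cos(2 * math.pi * 8.0 / N_FFT * n) for n in range(128)]
off_bin = [math.cos(2 * math.pi * 8.25 / N_FFT * n) for n in range(128)]
dev_on = delta_phase(on_bin, 0, N_FFT, HOP, K)
dev_off = delta_phase(off_bin, 0, N_FFT, HOP, K)
```

For the on-bin tone the deviation is close to zero; for the tone a quarter bin higher it is close to 2π · 0.25 · 16 / 64 = π/8, so the deviation scales with both the frequency offset and the step size.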

Noise robust voice activity detection using normal probability testing and time-domain histogram analysis

Relevance: 100.00%

Abstract:

This paper presents a method of voice activity detection (VAD) suitable for high noise scenarios, based on the fusion of two complementary systems. The first system uses a proposed non-Gaussianity score (NGS) feature based on normal probability testing. The second system employs a histogram distance score (HDS) feature that detects changes in the signal through conducting a template-based similarity measure between adjacent frames. The decision outputs of the two systems are then merged using an open-by-reconstruction fusion stage. Accuracy of the proposed method was compared to several baseline VAD methods on a database created using real recordings of a variety of high-noise environments.
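As a rough illustration of comparing time-domain histograms of adjacent frames, the sketch below builds a normalised amplitude histogram per frame and takes the L1 distance between neighbours. The abstract does not define the HDS feature in detail, so the bin count, the amplitude range, the L1 distance and all names here are assumptions for illustration only.

```python
def amplitude_histogram(frame, n_bins=8, lo=-1.0, hi=1.0):
    """Normalised histogram of sample amplitudes, clipped to [lo, hi]."""
    counts = [0] * n_bins
    for x in frame:
        x = min(max(x, lo), hi)
        idx = min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[idx] += 1
    return [c / len(frame) for c in counts]


def histogram_distance_scores(signal, frame_len):
    """L1 distance between the histograms of each pair of adjacent frames."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    hists = [amplitude_histogram(f) for f in frames]
    return [sum(abs(a - b) for a, b in zip(h0, h1))
            for h0, h1 in zip(hists, hists[1:])]


# Toy signal: two quiet frames followed by two louder frames.
quiet = [0.01, -0.01] * 50
loud = [0.8, -0.8] * 50
hds = histogram_distance_scores(quiet + quiet + loud + loud, frame_len=100)
```

The score is zero between frames with the same amplitude distribution and peaks at the quiet-to-loud boundary, which is the kind of change such a feature is meant to flag.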

Heart versus mind : the functions of emotional and cognitive loyalty

Relevance: 100.00%

Abstract:

While there is substantial research on attitudinal and behavioral loyalty, the deconstruction of attitudinal loyalty into its two key components – emotional and cognitive loyalty – has been largely ignored. Despite the existence of managerial strategies aimed at increasing each of these two components, there is little academic research to support these managerial efforts. This paper seeks to advance the understanding of emotional and cognitive brand loyalty by examining the psychological function that these dimensions of brand loyalty perform for the consumer. We employ Katz’s (1960) four functions of attitudes (utilitarian, knowledge, value-expression, ego-defence) to investigate this question. Surveys using a convenience sample were completed by 268 consumers in two metropolitan cities on a variety of goods, services and durable products. The relationships between the functions and dimensions of loyalty were examined using MANOVA. The results show that both the utilitarian and knowledge functions of loyalty are significantly positively related to cognitive loyalty while the ego-defensive function of loyalty is significantly positively related to emotional loyalty. The results for the value-expressive function of loyalty were nonsignificant.

Speech endpoint detection using gradient based edge detection techniques

Relevance: 100.00%

Open content creation : the issues of voice and the challenges of listening

Relevance: 100.00%

Abstract:

This paper examines the proposition that increased ability to have a voice and be listened to, through ‘open ICT4D’ and ‘open content creation’, can be an effective mechanism for development. The paper discusses empirical work that strongly indicates that this only happens when voice is appropriately valued in the development process. Having a voice in development processes is less effective when participation is limited. Open ICT allows for more and more voices to be heard, but it is open ICT4D that has the obligation to ensure voices are listened to. In the paper I first explore participatory development and the idea of open ICT4D before elaborating on issues of voice and thinking about voice as process, and voice as value. Findings are presented from research that experimented with participatory (or open) content creation, discussed in relation to notions of openness and voice. I then consider the challenges of listening, before drawing some conclusions about opening up ICT4D research.

The effect of prolonged reading on visual functions and reading performance in students with low vision

Relevance: 100.00%

Abstract:

This study is the first to investigate the effect of prolonged reading on reading performance and visual functions in students with low vision. The study focuses on one of the most common modes of achieving adequate magnification for reading by students with low vision, their close reading distance (proximal or relative distance magnification). Close reading distances impose high demands on near visual functions, such as accommodation and convergence. Previous research on accommodation in children with low vision shows that their accommodative responses are reduced compared to those of children with normal vision. In addition, there is an increased lag of accommodation for higher stimulus levels as may occur at close reading distance. Reduced accommodative responses in low vision and higher lag of accommodation at close reading distances together could impact on reading performance of students with low vision, especially during prolonged reading tasks. The presence of convergence anomalies could further affect reading performance. Therefore, the aims of the present study were 1) to investigate the effect of prolonged reading on reading performance in students with low vision and 2) to investigate the effect of prolonged reading on visual functions in students with low vision. This study was conducted as cross-sectional research on 42 students with low vision and a comparison group of 20 students with normal vision, aged 7 to 20 years. The students with low vision had vision impairments arising from a range of causes and represented a typical group of students with low vision, with no significant developmental delays, attending school in Brisbane, Australia. All participants underwent a battery of clinical tests before and after a prolonged reading task.
An initial reading-specific history and pre-task measurements that included Bailey-Lovie distance and near visual acuities, Pelli-Robson contrast sensitivity, ocular deviations, sensory fusion, ocular motility, near point of accommodation (pull-away method), accuracy of accommodation (Monocular Estimation Method (MEM) retinoscopy) and Near Point of Convergence (NPC) (push-up method) were recorded for all participants. Reading performance measures were Maximum Oral Reading Rates (MORR), Near Text Visual Acuity (NTVA) and acuity reserves using Bailey-Lovie text charts. Symptoms of visual fatigue were assessed using the Convergence Insufficiency Symptom Survey (CISS) for all participants. Pre-task measurements of reading performance and accuracy of accommodation and NPC were compared with post-task measurements, to test for any effects of prolonged reading. The prolonged reading task involved reading a storybook silently for at least 30 minutes. The task was controlled for print size, contrast, difficulty level and content of the reading material. Silent Reading Rate (SRR) was recorded every 2 minutes during prolonged reading. Symptom scores and visual fatigue scores were also obtained for all participants. A visual fatigue analogue scale (VAS) was used to assess visual fatigue during the task, once at the beginning, once at the middle and once at the end of the task. In addition to the subjective assessments of visual fatigue, tonic accommodation was monitored using a photorefractor (PlusoptiX CR03™) every 6 minutes during the task, as an objective assessment of visual fatigue. Reading measures were done at the habitual reading distance of students with low vision and at 25 cm for students with normal vision. The initial history showed that the students with low vision read for significantly shorter periods at home compared to the students with normal vision.
The working distances of participants with low vision ranged from 3 to 25 cm and half of them were not using any optical devices for magnification. Nearly half of the participants with low vision were able to resolve 8-point print (1M) at 25 cm. Half of the participants in the low vision group had ocular deviations and suppression at near. Reading rates were significantly reduced in students with low vision compared to those of students with normal vision. In addition, a significantly larger number of participants in the low vision group could not sustain the 30-minute task compared to the normal vision group. However, there were no significant changes in reading rates during or following prolonged reading in either the low vision or normal vision groups. Individual changes in reading rates were independent of their baseline reading rates, indicating that the changes in reading rates during prolonged reading cannot be predicted from a typical clinical assessment of reading using brief reading tasks. Contrary to previous reports, the silent reading rates of the students with low vision were significantly lower than their oral reading rates, although oral and silent reading was assessed using different methods. Although the visual acuity, contrast sensitivity, near point of convergence and accuracy of accommodation were significantly poorer for the low vision group compared to those of the normal vision group, there were no significant changes in any of these visual functions following prolonged reading in either group. Interestingly, a few students with low vision (n = 10) were found to be reading at a distance closer than their near point of accommodation. This suggests a decreased sensitivity to blur.
Further evaluation revealed that the equivalent intrinsic refractive errors (an estimate of the spherical dioptric defocus which would be expected to yield a patient’s visual acuity in normal subjects) were significantly larger for the low vision group compared to those of the normal vision group. As expected, accommodative responses were significantly reduced for the low vision group compared to the expected norms, which is consistent with their close reading distances, reduced visual acuity and contrast sensitivity. For those in the low vision group who had an accommodative error exceeding their equivalent intrinsic refractive errors, a significant decrease in MORR was found following prolonged reading. The silent reading rates, however, were not significantly affected by accommodative errors in the present study. Suppression also had a significant impact on the changes in reading rates during prolonged reading. The participants who did not have suppression at near showed significant decreases in silent reading rates during and following prolonged reading. This impact of binocular vision at near on prolonged reading was possibly due to the high demands on convergence. The significant predictors of MORR in the low vision group were age, NTVA, reading interest and reading comprehension, accounting for 61.7% of the variance in MORR. SRR was not significantly influenced by any factors, except for the duration of the reading task sustained; participants with higher reading rates were able to sustain a longer reading duration. In students with normal vision, age was the only predictor of MORR. Participants with low vision also reported significantly greater visual fatigue compared to the normal vision group. Measures of tonic accommodation, however, were little influenced by visual fatigue in the present study. Visual fatigue analogue scores were found to be significantly associated with reading rates in students with low vision and normal vision.
However, the patterns of association between visual fatigue and reading rates were different for SRR and MORR. The participants with low vision with higher symptom scores had lower SRRs and participants with higher visual fatigue had lower MORRs. As hypothesized, visual functions such as accuracy of accommodation and convergence did have an impact on prolonged reading in students with low vision, for students whose accommodative errors were greater than their equivalent intrinsic refractive errors, and for those who did not suppress one eye. Those students with low vision who have accommodative errors higher than their equivalent intrinsic refractive errors might significantly benefit from reading glasses. Similarly, considering prisms or occlusion for those without suppression might reduce the convergence demands in these students while using their close reading distances. The impact of these prescriptions on reading rates, reading interest and visual fatigue is an area of promising future research. Most importantly, it is evident from the present study that a combination of factors such as accommodative errors, near point of convergence and suppression should be considered when prescribing reading devices for students with low vision. Considering these factors would also assist rehabilitation specialists in identifying those students who are likely to experience difficulty in prolonged reading, which is otherwise not reflected during typical clinical reading assessments.

Coding speech signals at very low rates (below 1 kb/s) with high intelligibility

Relevance: 100.00%

Abstract:

This thesis presents an original approach to parametric speech coding at rates below 1 kbits/sec, primarily for speech storage applications. Essential processes considered in this research encompass efficient characterization of evolutionary configuration of vocal tract to follow phonemic features with high fidelity, representation of speech excitation using minimal parameters with minor degradation in naturalness of synthesized speech, and finally, quantization of resulting parameters at the nominated rates. For encoding speech spectral features, a new method relying on Temporal Decomposition (TD) is developed which efficiently compresses spectral information through interpolation between most steady points over time trajectories of spectral parameters using a new basis function. The compression ratio provided by the method is independent of the updating rate of the feature vectors, hence allows high resolution in tracking significant temporal variations of speech formants with no effect on the spectral data rate. Accordingly, regardless of the quantization technique employed, the method yields a high compression ratio without sacrificing speech intelligibility. Several new techniques for improving performance of the interpolation of spectral parameters through phonetically-based analysis are proposed and implemented in this research, comprising event approximated TD, near-optimal shaping event approximating functions, efficient speech parametrization for TD on the basis of an extensive investigation originally reported in this thesis, and a hierarchical error minimization algorithm for decomposition of feature parameters which significantly reduces the complexity of the interpolation process.
Speech excitation in this work is characterized based on a novel Multi-Band Excitation paradigm which accurately determines the harmonic structure in the LPC (linear predictive coding) residual spectra, within individual bands, using the concept of Instantaneous Frequency (IF) estimation in frequency domain. The model yields an effective two-band approximation to excitation and computes pitch and voicing with high accuracy as well. New methods for interpolative coding of pitch and gain contours are also developed in this thesis. For pitch, relying on the correlation between phonetic evolution and pitch variations during voiced speech segments, TD is employed to interpolate the pitch contour between critical points introduced by event centroids. This compresses pitch contour in the ratio of about 1/10 with negligible error. To approximate gain contour, a set of uniformly-distributed Gaussian event-like functions is used which reduces the amount of gain information to about 1/6 with acceptable accuracy. The thesis also addresses a new quantization method applied to spectral features on the basis of statistical properties and spectral sensitivity of spectral parameters extracted from TD-based analysis. The experimental results show that good quality speech, comparable to that of conventional coders at rates over 2 kbits/sec, can be achieved at rates 650-990 bits/sec.

The “voice” of VET teachers : teacher dilemmas and their implications for international students, teachers and VET institutions

Relevance: 100.00%

Exploring the validity and predictive power of an extended volunteer functions inventory within the context of episodic skilled volunteering by retirees

Relevance: 100.00%

Abstract:

The current study examined the structure of the volunteer functions inventory within a sample of older individuals (N = 187). The career items were replaced with items examining the concept of continuity of work, a potentially more useful and relevant concept for this population. Factor analysis supported a four-factor solution, with values, social and continuity emerging as single factors and enhancement and protective items loading together on a single factor. Understanding items did not load highly on any factor. The values and continuity functions were the only dimensions to emerge as predictors of intention to volunteer. This research has important implications for understanding the motivation of older adults to engage in contemporary volunteering settings.

Small footprint implementation of dual-microphone delay-and-sum beamforming for in-car speech enhancement

Relevance: 100.00%

Multilingualism and speech-language competence in early childhood: Impact on academic and social-emotional outcomes at school

Relevance: 100.00%

Abstract:

This large-scale longitudinal population study provided a rare opportunity to consider the interface between multilingualism and speech-language competence on children’s academic and social-emotional outcomes and to determine whether differences between groups at 4 to 5 years persist, deepen, or disappear with time and schooling. Four distinct groups were identified from the Kindergarten cohort of the Longitudinal Study of Australian Children (LSAC): (1) English-only + typical speech and language (n = 2,012); (2) multilingual + typical speech and language (n = 476); (3) English-only + speech and language concern (n = 643); and (4) multilingual + speech and language concern (n = 109). Two analytic approaches were used to compare these groups. First, a matched case-control design was used to randomly match multilingual children with speech and language concern (group 4, n = 109) to children in groups 1, 2, and 3 on gender, age, and family socio-economic position in a cross-sectional comparison of vocabulary, school readiness, and behavioral adjustment. Next, analyses were applied to the whole sample to determine longitudinal effects of group membership on teachers’ ratings of literacy, numeracy, and behavioral adjustment at ages 6 to 7 and 8 to 9 years. At 4 to 5 years, multilingual children with speech and language concern did as well as or better than English-only children (with or without speech and language concern) on school readiness tests but performed more poorly on measures of English vocabulary and behavior. At ages 6 to 7 and 8 to 9, the early gap between English-only and multilingual children had closed. Multilingualism was not found to contribute to differences in literacy and numeracy outcomes at school; instead, outcomes were more related to concerns about children’s speech and language in early childhood. There were no group differences for socio-emotional outcomes.
Early evidence for the combined risks of multilingualism plus speech and language concern was not upheld into the school years.
