998 results for Speech segmentation
Abstract:
Market segmentation has received relatively limited attention in social marketing, particularly within the context of changing children’s physical activity behaviour. This is an important area of investigation given growing concern over childhood obesity globally. The present research aims to extend current understanding of the applicability of market segmentation within this context. The results of a two-step cluster analysis on data from 512 respondents of an online survey show three distinct segments of caregivers, each with unique beliefs about their primary school children walking to/from school. The results demonstrate the validity of employing the process of market segmentation within this social context and provide further insights for targeting the identified segments through tailored social marketing programs.
Abstract:
Recent changes in the aviation industry and in the expectations of travellers have begun to alter the way we approach our understanding, and thus the segmentation, of airport passengers. The key to successful segmentation of any population lies in the selection of the criteria on which the partitions are based. Increasingly, the basic criteria used to segment passengers (purpose of trip and frequency of travel) no longer provide adequate insights into the passenger experience. In this paper, we propose a new model for passenger segmentation based on the passenger core value, time. The results are based on qualitative research conducted in-situ at Brisbane International Terminal during 2012-2013. Based on our research, a relationship between time sensitivity and degree of passenger engagement was identified. This relationship was used as the basis for a new passenger segmentation model, namely: Airport Enthusiast (engaged, non time sensitive); Time Filler (non engaged, non time sensitive); Efficiency Lover (non engaged, time sensitive) and Efficient Enthusiast (engaged, time sensitive). The outcomes of this research extend the theoretical knowledge about passenger experience in the terminal environment. These new insights can ultimately be used to optimise the allocation of space for future terminal planning and design.
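The four-segment model described above is a two-by-two grid over engagement and time sensitivity. A minimal sketch of that mapping follows; the function name and the boolean encoding of the two axes are assumptions made for illustration, and only the segment names come from the abstract.

```python
def passenger_segment(engaged: bool, time_sensitive: bool) -> str:
    """Map the two axes named in the abstract to one of its four segments.

    The boolean encoding is an illustrative assumption; the abstract
    describes the axes qualitatively.
    """
    segments = {
        (True, False): "Airport Enthusiast",
        (False, False): "Time Filler",
        (False, True): "Efficiency Lover",
        (True, True): "Efficient Enthusiast",
    }
    return segments[(engaged, time_sensitive)]
```

For example, an engaged passenger who is not time sensitive maps to "Airport Enthusiast".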
Abstract:
Objective: This study seeks to establish whether meaningful subgroups exist within a 14-16 year old adolescent population and whether these segments respond differently to the Game On: Know Alcohol (GOKA) intervention, a school-based alcohol social marketing program. Methodology: This study is part of a larger cluster randomized controlled evaluation of the GOKA program implemented in 14 schools in 2013/2014. TwoStep cluster analysis was conducted to segment 2114 high school adolescents (14-16 years old) on the basis of 22 demographic, behavioral and psychographic variables. Program effects on knowledge, attitudes, behavioral intentions, social norms, expectancies and refusal self-efficacy of the identified segments were subsequently examined. Results: Three segments were identified: (1) Abstainers, (2) Bingers and (3) Moderate Drinkers. Program effects varied significantly across segments. The strongest positive change effects post participation were observed for the Bingers, while mixed effects were evident for Moderate Drinkers and Abstainers. Conclusions: These findings provide preliminary empirical evidence supporting application of social marketing segmentation in alcohol education programs. Development of targeted programs that meet the unique needs of each of the three identified segments is indicated to extend the social marketing footprint in alcohol education.
Abstract:
We present a clustering-only approach to the problem of speaker diarization to eliminate the need for the commonly employed and computationally expensive Viterbi segmentation and realignment stage. We use multiple linear segmentations of a recording and carry out complete-linkage clustering within each segmentation scenario to obtain a set of clustering decisions for each case. We then collect all clustering decisions, across all cases, to compute a pairwise vote between the segments and conduct complete-linkage clustering to cluster them at a resolution equal to the minimum segment length used in the linear segmentations. We use our proposed cluster-voting approach to carry out speaker diarization and linking across the SAIVT-BNEWS corpus of Australian broadcast news data. We compare our technique to an equivalent baseline system with Viterbi realignment and show that our approach can outperform the baseline technique with respect to the diarization error rate (DER) and attribution error rate (AER).
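The cluster-voting idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' system: each clustering decision is assumed to be already expressed as one label per base segment (at the minimum segment length), the vote is taken as the fraction of decisions that place two segments together, the distance is assumed to be one minus that fraction, and the stopping threshold is a hypothetical parameter.

```python
from itertools import combinations


def pairwise_votes(clusterings):
    """Fraction of clusterings that put each pair of base segments together."""
    n = len(clusterings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(clusterings) for c in row] for row in counts]


def complete_linkage(dist, threshold):
    """Merge clusters while the smallest complete-linkage distance <= threshold."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: distance between clusters is the largest
            # distance between any pair of their members.
            d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return sorted(sorted(c) for c in clusters)


def cluster_by_votes(clusterings, threshold=0.5):
    """Turn votes into distances (1 - vote, an assumption) and cluster them."""
    votes = pairwise_votes(clusterings)
    dist = [[1.0 - v for v in row] for row in votes]
    return complete_linkage(dist, threshold)
```

With three hypothetical clusterings of six base segments, `[0, 0, 0, 1, 1, 1]`, `[0, 0, 1, 1, 2, 2]` and `[0, 0, 0, 0, 1, 1]`, segments that are grouped together in most decisions end up in the same final cluster.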
Abstract:
We propose a novel technique for conducting robust voice activity detection (VAD) in high-noise recordings. We use Gaussian mixture modeling (GMM) to train two generic models: speech and non-speech. We then score smaller segments of a given (unseen) recording against each of these GMMs to obtain two respective likelihood scores for each segment. These scores are used to compute a dissimilarity measure between pairs of segments and to carry out complete-linkage clustering of the segments into speech and non-speech clusters. We compare the accuracy of our method against state-of-the-art and standardised VAD techniques to demonstrate an absolute improvement of 15% in half-total error rate (HTER) over the best performing baseline system across the QUT-NOISE-TIMIT database. We then apply our approach to the Audio-Visual Database of American English (AVDBAE) to demonstrate the performance of our algorithm in using visual, audio-visual or a proposed fusion of these features.
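The half-total error rate mentioned above is commonly defined as the mean of the miss rate and the false alarm rate. The sketch below assumes that common definition and frame-level counts; the function and parameter names are hypothetical and the abstract does not state how its figures were computed.

```python
def half_total_error_rate(missed_speech, speech_frames,
                          false_alarms, nonspeech_frames):
    """HTER as the average of miss rate and false alarm rate.

    Assumes frame-level counts and non-empty speech and non-speech sets.
    """
    miss_rate = missed_speech / speech_frames
    false_alarm_rate = false_alarms / nonspeech_frames
    return 0.5 * (miss_rate + false_alarm_rate)
```

Under this definition, an "absolute improvement of 15%" means the HTER value itself drops by 0.15, for instance from 0.35 to 0.20, rather than by 15% of its previous value.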
Abstract:
Robust and automatic non-rigid registration depends on many parameters that have not yet been systematically explored. Here we determined how tissue classification influences non-linear fluid registration of brain MRI. Twin data is ideal for studying this question, as volumetric correlations between corresponding brain regions that are under genetic control should be higher in monozygotic twins (MZ) who share 100% of their genes when compared to dizygotic twins (DZ) who share half their genes on average. When these substructure volumes are quantified using tensor-based morphometry, improved registration can be defined based on which method gives higher MZ twin correlations when compared to DZs, as registration errors tend to deplete these correlations. In a study of 92 subjects, higher effect sizes were found in cumulative distribution functions derived from statistical maps when performing tissue classification before fluid registration, versus fluidly registering the raw images. This gives empirical evidence in favor of pre-segmenting images for tensor-based morphometry.
Abstract:
For most people, speech production is relatively effortless and error-free. Yet it has long been recognized that we need some type of control over what we are currently saying and what we plan to say. Precisely how we monitor our internal and external speech has been a topic of research interest for several decades. The predominant approach in psycholinguistics has assumed monitoring of both is accomplished via systems responsible for comprehending others' speech. This special topic aimed to broaden the field, firstly by examining proposals that speech production might also engage more general systems, such as those involved in action monitoring. A second aim was to examine proposals for a production-specific, internal monitor. Both aims require that we also specify the nature of the representations subject to monitoring.
Abstract:
In this paper we present a robust method to detect handwritten text from unconstrained drawings on normal whiteboards. Unlike printed text on documents, free-form handwritten text has no pattern in terms of size, orientation and font, and it is often mixed with other drawings such as lines and shapes. Unlike handwriting on paper, handwriting on a normal whiteboard cannot be scanned, so the detection has to be based on photos. Our work traces straight edges on photos of the whiteboard and builds a graph representation of connected components. We use geometric properties such as edge density, graph density, aspect ratio and neighborhood similarity to differentiate handwritten text from other drawings. The experimental results show that our method achieves satisfactory precision and recall. Furthermore, the method is robust and efficient enough to be deployed on a mobile device. This is an important enabler of business applications that support whiteboard-centric visual meetings in enterprise scenarios. © 2012 IEEE.
Abstract:
This large-scale longitudinal population study provided a rare opportunity to consider the interface between multilingualism and speech-language competence on children’s academic and social-emotional outcomes and to determine whether differences between groups at 4 to 5 years persist, deepen, or disappear with time and schooling. Four distinct groups were identified from the Kindergarten cohort of the Longitudinal Study of Australian Children (LSAC): (1) English-only + typical speech and language (n = 2,012); (2) multilingual + typical speech and language (n = 476); (3) English-only + speech and language concern (n = 643); and (4) multilingual + speech and language concern (n = 109). Two analytic approaches were used to compare these groups. First, a matched case-control design was used to randomly match multilingual children with speech and language concern (group 4, n = 109) to children in groups 1, 2, and 3 on gender, age, and family socio-economic position in a cross-sectional comparison of vocabulary, school readiness, and behavioral adjustment. Next, analyses were applied to the whole sample to determine longitudinal effects of group membership on teachers’ ratings of literacy, numeracy, and behavioral adjustment at ages 6 to 7 and 8 to 9 years. At 4 to 5 years, multilingual children with speech and language concern did as well as or better than English-only children (with or without speech and language concern) on school readiness tests but performed more poorly on measures of English vocabulary and behavior. At ages 6 to 7 and 8 to 9, the early gap between English-only and multilingual children had closed. Multilingualism was not found to contribute to differences in literacy and numeracy outcomes at school; instead, outcomes were more related to concerns about children’s speech and language in early childhood. There were no group differences for socio-emotional outcomes.
Early evidence for the combined risks of multilingualism plus speech and language concern was not upheld into the school years.
Abstract:
Automatic speech recognition from multiple distant microphones poses significant challenges because of noise and reverberation. The quality of speech acquisition may vary between microphones because of movements of speakers and channel distortions. This paper proposes a channel selection approach for selecting reliable channels based on a selection criterion operating in the short-term modulation spectrum domain. The proposed approach quantifies the relative strength of speech from each microphone and speech obtained from beamforming modulations. The new technique is compared experimentally in real reverberant conditions in terms of perceptual evaluation of speech quality (PESQ) measures and word error rate (WER). Overall improvement in recognition rate is observed using delay-sum and superdirective beamformers compared to the case when the channel is selected randomly using circular microphone arrays.
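Word error rate, one of the two evaluation measures named above, is conventionally the word-level edit distance between a reference transcript and the recogniser output, divided by the number of reference words. The sketch below assumes whitespace tokenisation and a non-empty reference; the function name is hypothetical and the paper's own scoring tool is not specified in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, dropping one word from a six-word reference gives a WER of 1/6, and a single substituted word in a four-word reference gives 0.25.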
Abstract:
The common focus of the studies brought together in this work is the prosodic segmentation of spontaneous speech. The theoretically most central aspect is the introduction and further development of the IJ-model of intonational chunking. The study consists of a general introduction and five detailed studies that approach prosodic chunking from different perspectives. The data consist of recordings of face-to-face interaction in several spoken varieties of Finnish and Finland Swedish; the methodology is usage-based and qualitative. The term “speech prosody” refers primarily to the melodic and rhythmic characteristics of speech. Both speaking and understanding speech require the ability to segment the flow of speech into suitably sized prosodic chunks. In order to be usage-based, a study of spontaneous speech consequently needs to be based on material that is segmented into prosodic chunks of various sizes. The segmentation is seen to form a hierarchy of chunking. The prosodic models that have so far been developed and employed in Finland have been based on sentences read aloud, which has made it difficult to apply these models in the analysis of spontaneous speech. The prosodic segmentation of spontaneous speech has not previously been studied in detail in Finland. This research focuses mainly on the following three questions: (1) What are the factors that need to be considered when developing a model of prosodic segmentation of speech, so that the model can be employed regardless of the language or dialect under analysis? (2) What are the characteristics of a prosodic chunk, and what are the similarities in the ways chunks of different languages and varieties manifest themselves that will make it possible to analyze different data according to the same criteria? (3) How does the IJ-model of intonational chunking introduced as a solution to question (1) function in practice in the study of different varieties of Finnish and Finland Swedish? 
The boundaries of the prosodic chunks were manually marked in the material according to context-specific acoustic and auditory criteria. On the basis of the data analyzed, the IJ-model was further elaborated and implemented, thus allowing comparisons between different language varieties. On the basis of the empirical comparisons, a prosodic typology is presented for the dialects of Swedish in Finland. The general contention is that the principles of the IJ-model can readily be used as a methodological tool for prosodic analysis irrespective of language varieties.
Abstract:
Speech rhythm is an essential part of speech processing. It is the outcome of the workings of a combination of linguistic and non-linguistic parameters, many of which also have other functions in speech. This study focusses on the acoustic and auditive realization of two linguistic parameters of rhythm: (1) sentence stress, and (2) speech rate and pausing. The aim was to find out how well Finnish comprehensive school pupils realize these two parameters in English and how native speakers of English react to Finnish pupils' English rhythm. The material was elicited by means of a story-telling task and questionnaires. Three female and three male pupils representing different levels of oral skills in English were selected as the experimental group. The control group consisted of two female and two male native speakers of English. The stories were analysed acoustically and auditorily with respect to interstress intervals, weak forms, fundamental frequency, pausing, and speech as well as articulation rate. In addition, 52 native speakers of English were asked to rate the intelligibility of the Finnish pupils' English with respect to speech rhythm and give their attitudes on what the pupils sounded like. Results showed that Finnish pupils can produce isochronous interstress intervals in English, but that too large a proportion of these intervals contain pauses. A closer analysis of the pauses revealed that Finnish pupils pause too frequently and in inappropriate places when they speak English. Frequent pausing was also found to cause slow speech rates. The findings of the fundamental frequency (F0) measurements indicate that Finnish pupils tend to make a slightly narrower F0 difference between stressed and unstressed syllables than the native speakers of English. Furthermore, Finnish pupils appear to know how to reduce the duration and quality of unstressed sounds, but they fail to do it frequently enough.
Native listeners gave lower intelligibility and attitude scores to pupils with more anomalous speech rhythm. Finnish pupils' rhythm anomalies seemed to derive from various learning- or learner-related factors rather than from the differences between English and Finnish. This study demonstrates that pausing may be a more important component of English speech rhythm than sentence stress as far as Finnish adolescents are concerned and that interlanguage development is affected by various factors and characterised by jumps or periods of stasis. Other theoretical, methodological and pedagogical implications of the results are also discussed.
Abstract:
This dissertation consists of four articles and an introduction. The five parts address the same topic, nonverbal predication in Erzya, from different perspectives. The work is at the same time linguistic typology and Uralic studies. The findings based on a large corpus of empirical Erzya data, which was collected using several different methods and included recordings of the spoken language, made it possible for the present study to apply, then test and finally discuss the previous theories based on cross-linguistic data. Erzya makes use of multiple predication patterns which vary from totally analytic to the morphologically very complex. Nonverbal predicate clause types are classified on the basis of propositional acts in clauses denoting class-membership, identity, property and location. The predicates of these clauses are nouns, adjectives and locational expressions, respectively. The following three predication strategies in Erzya nonverbal predication can be identified: i. the zero-copula construction, ii. the predicative suffix construction and iii. the copula construction. It has been suggested that verbs and nouns cannot be clearly distinguished on morphological grounds when functioning as predicates in Erzya. This study shows that even though predicativity must not be considered a sufficient tool for defining parts of speech in any language, the Erzya lexical classes of adjective, noun and verb can be distinguished from each other also in predicate position. The relative frequency and degree of obligation for using the predicative suffix construction decreases when moving left to right on the scale verb > adjective/locative > noun (> identificational statement). The predicative suffix is the main pattern in the present tense over the whole domain of nonverbal predication in Standard Erzya, but if it is replaced it is most likely to be with a zero-copula construction in a nominal predication.
This study exploits the theory of (a)symmetry for the first time in order to describe verbal vs. nonverbal predication. It is shown that the asymmetry of paradigms and constructions differentiates the lexical classes. Asymmetrical structures are motivated by functional level asymmetry. Variation in predication as such adds to the complexity of the grammar. When symmetric structures are employed, the functional complexity of grammar decreases, even though morphological complexity increases. The genre affects the employment of predication strategies in Erzya. There are differences in the relative frequency of the patterns, and some patterns are totally lacking from some of the data. The clearest difference is that the past tense predicative suffix construction occurs relatively frequently in Standard Erzya, while it occurs infrequently in the other data. Also, the predicative suffixes of the present tense are used more regularly in written Standard Erzya than in any other genre. The genre also affects the incidence of the translative in uľ(ń)ems copula constructions. In translations from Russian to Erzya the translative case is employed relatively frequently in comparison to other data. This study reveals differences between the two Mordvinic languages Erzya and Moksha. The predicative suffixes (bound person markers) of the present tense are used more regularly in Moksha in all kinds of nonverbal predicate clauses compared to Erzya. It should further be observed that identificational statements are encoded with a predicative suffix in Moksha, but seldom in Erzya. Erzya clauses are more frequently encoded using zero-constructions, displaying agreement in number only.
Abstract:
We address the problem of jointly using multiple noisy speech patterns for automatic speech recognition (ASR), given that they come from the same class. If the user utters a word K times, the ASR system should try to use the information content in all K patterns of the word simultaneously and improve its speech recognition accuracy compared to that of single-pattern speech recognition. To address this problem, we recently proposed a Multi Pattern Dynamic Time Warping (MPDTW) algorithm to align the K patterns by finding the least distortion path between them. A Constrained Multi Pattern Viterbi algorithm was used on this aligned path for isolated word recognition (IWR). In this paper, we explore the possibility of using only the MPDTW algorithm for IWR. We also study the properties of the MPDTW algorithm. We show that using only 2 noisy test patterns (10 percent burst noise at -5 dB SNR) reduces the noisy speech recognition error rate by 37.66 percent when compared to single-pattern recognition using the Dynamic Time Warping algorithm.
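For orientation, the single-pattern baseline named above, classic Dynamic Time Warping between one test pattern and one template, can be sketched as follows. This is the standard two-sequence algorithm, not the MPDTW algorithm of the paper. Scalar features with an absolute-difference local cost are an assumption made to keep the example short (real recognisers typically compare feature vectors), and the function names and templates are hypothetical.

```python
def dtw_distance(a, b):
    """Least accumulated distortion over all monotonic alignments of a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognise(test_pattern, templates):
    """Isolated word recognition: pick the template with the least distortion."""
    return min(templates, key=lambda w: dtw_distance(test_pattern, templates[w]))
```

A usage example with made-up templates: `recognise([1, 1, 2, 3], {"yes": [1, 2, 3], "no": [3, 2, 1]})` selects `"yes"`, because the repeated first frame is absorbed by the warping path at zero cost.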
Abstract:
Comprehension of a complex acoustic signal - speech - is vital for human communication, with numerous brain processes required to convert the acoustics into an intelligible message. In four studies in the present thesis, cortical correlates for different stages of speech processing in a mature linguistic system of adults were investigated. In two further studies, developmental aspects of cortical specialisation and its plasticity in adults were examined. In the present studies, electroencephalographic (EEG) and magnetoencephalographic (MEG) recordings of the mismatch negativity (MMN) response elicited by changes in repetitive unattended auditory events and the phonological mismatch negativity (PMN) response elicited by unexpected speech sounds in attended speech inputs served as the main indicators of cortical processes. Changes in speech sounds elicited the MMNm, the magnetic equivalent of the electric MMN, that differed in generator loci and strength from those elicited by comparable changes in non-speech sounds, suggesting intra- and interhemispheric specialisation in the processing of speech and non-speech sounds at an early automatic processing level. This neuronal specialisation for the mother tongue was also reflected in the more efficient formation of stimulus representations in auditory sensory memory for typical native-language speech sounds compared with those formed for unfamiliar, non-prototype speech sounds and simple tones. Further, adding a speech or non-speech sound context to syllable changes was found to modulate the MMNm strength differently in the left and right hemispheres. Following the acoustic-phonetic processing of speech input, phonological effort related to the selection of possible lexical (word) candidates was linked with distinct left-hemisphere neuronal populations. In summary, the results suggest functional specialisation in the neuronal substrates underlying different levels of speech processing. 
Subsequently, plasticity of the brain's mature linguistic system was investigated in adults, in whom representations for an aurally-mediated communication system, Morse code, were found to develop within the same hemisphere where representations for the native-language speech sounds were already located. Finally, recording and localization of the MMNm response to changes in speech sounds was successfully accomplished in newborn infants, encouraging future MEG investigations on, for example, the state of neuronal specialisation at birth.