966 resultados para Speech data


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background Recent initiatives within an Australia public healthcare service have seen a focus on increasing the research capacity of their workforce. One of the key initiatives involves encouraging clinicians to be research generators rather than solely research consumers. As a result, baseline data of current research capacity are essential to determine whether initiatives encouraging clinicians to undertake research have been effective. Speech pathologists have previously been shown to be interested in conducting research within their clinical role; therefore they are well positioned to benefit from such initiatives. The present study examined the current research interest, confidence and experience of speech language pathologists (SLPs) in a public healthcare workforce, as well as factors that predicted clinician research engagement. Methods Data were collected via an online survey emailed to an estimated 330 SLPs working within Queensland, Australia. The survey consisted of 30 questions relating to current levels of interest, confidence and experience performing specific research tasks, as well as how frequently SLPs had performed these tasks in the last 5 years. Results Although 158 SLPs responded to the survey, complete data were available for only 137. Respondents were more confident and experienced with basic research tasks (e.g., finding literature) and less confident and experienced with complex research tasks (e.g., analysing and interpreting results, publishing results). For most tasks, SLPs displayed higher levels of interest in the task than confidence and experience. Research engagement was predicted by highest qualification obtained, current job classification level and overall interest in research. Conclusions Respondents generally reported levels of interest in research higher than their confidence and experience, with many respondents reporting limited experience in most research tasks. Therefore SLPs have potential to benefit from research capacity building activities to increase their research skills in order to meet organisational research engagement objectives. However, these findings must be interpreted with the caveats that a relatively low response rate occurred and participants were recruited from a single state-wide health service, and therefore may not be representative of the wider SLP workforce.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Speaker attribution is the task of annotating a spoken audio archive based on speaker identities. This can be achieved using speaker diarization and speaker linking. In our previous work, we proposed an efficient attribution system, using complete-linkage clustering, for conducting attribution of large sets of two-speaker telephone data. In this paper, we build on our proposed approach to achieve a robust system, applicable to multiple recording domains. To do this, we first extend the diarization module of our system to accommodate multi-speaker (>2) recordings. We achieve this through using a robust cross-likelihood ratio (CLR) threshold stopping criterion for clustering, as opposed to the original stopping criterion of two speakers used for telephone data. We evaluate this baseline diarization module across a dataset of Australian broadcast news recordings, showing a significant lack of diarization accuracy without previous knowledge of the true number of speakers within a recording. We thus propose applying an additional pass of complete-linkage clustering to the diarization module, demonstrating an absolute improvement of 20% in diarization error rate (DER). We then evaluate our proposed multi-domain attribution system across the broadcast news data, demonstrating achievable attribution error rates (AER) as low as 17%.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper analyses the probabilistic linear discriminant analysis (PLDA) speaker verification approach with limited development data. This paper investigates the use of the median as the central tendency of a speaker’s i-vector representation, and the effectiveness of weighted discriminative techniques on the performance of state-of-the-art length-normalised Gaussian PLDA (GPLDA) speaker verification systems. The analysis within shows that the median (using a median fisher discriminator (MFD)) provides a better representation of a speaker when the number of representative i-vectors available during development is reduced, and that further, usage of the pair-wise weighting approach in weighted LDA and weighted MFD provides further improvement in limited development conditions. Best performance is obtained using a weighted MFD approach, which shows over 10% improvement in EER over the baseline GPLDA system on mismatched and interview-interview conditions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Design process phases of development, evaluation and implementation were used to create a garment to simultaneously collect reliable data of speech production and intensity of movement of toddlers (18-36 months). A series of prototypes were developed and evaluated that housed accelerometer-based motion sensors and a digital transmitter with microphone. The approved test garment was a top constructed from loop-faced fabric with interior pockets to house devices. Extended side panels allowed for sizing. In total, 56 toddlers (28 male; 28 female; 16-36 months of age) participated in the study providing pilot and baseline data. The test garment was effective in collecting data as evaluated for accuracy and reliability using ANOVA for accelerometer data, transcription of video for type of movement, and number and length of utterances for speech production. The data collection garment has been implemented in various studies across disciplines.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper analyzes the limitations upon the amount of in- domain (NIST SREs) data required for training a probabilistic linear discriminant analysis (PLDA) speaker verification system based on out-domain (Switchboard) total variability subspaces. By limiting the number of speakers, the number of sessions per speaker and the length of active speech per session available in the target domain for PLDA training, we investigated the relative effect of these three parameters on PLDA speaker verification performance in the NIST 2008 and NIST 2010 speaker recognition evaluation datasets. Experimental results indicate that while these parameters depend highly on each other, to beat out-domain PLDA training, more than 10 seconds of active speech should be available for at least 4 sessions/speaker for a minimum of 800 speakers. If further data is available, considerable improvement can be made over solely out-domain PLDA training.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper proposes the addition of a weighted median Fisher discriminator (WMFD) projection prior to length-normalised Gaussian probabilistic linear discriminant analysis (GPLDA) modelling in order to compensate the additional session variation. In limited microphone data conditions, a linear-weighted approach is introduced to increase the influence of microphone speech dataset. The linear-weighted WMFD-projected GPLDA system shows improvements in EER and DCF values over the pooled LDA- and WMFD-projected GPLDA systems in inter-view-interview condition as WMFD projection extracts more speaker discriminant information with limited number of sessions/ speaker data, and linear-weighted GPLDA approach estimates reliable model parameters with limited microphone data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, I look into a grammatical phenomenon found among speakers of the Cambridgeshire dialect of English. According to my hypothesis, the phenomenon is a new entry into the past BE verb paradigm in the English language. In my paper, I claim that the structure I have found complements the existing two verb forms, was and were, with a third verb form that I have labelled ‘intermediate past BE’. The paper is divided into two parts. In the first section, I introduce the theoretical ground for the study of variation, which is founded on empiricist principles. In variationist linguistics, the main claim is that heterogeneous language use is structured and ordered. In the last 50 years of history in modern linguistics, this claim is controversial. In the 1960s, the generativist movement spearheaded by Noam Chomsky diverted attention away from grammatical theories that are based on empirical observations. The generativists steered away from language diversity, variation and change in favour of generalisations, abstractions and universalist claims. The theoretical part of my paper goes through the main points of the variationist agenda and concludes that abandoning the concept of language variation in linguistics is harmful for both theory and methodology. In the method part of the paper, I present the Helsinki Archive of Regional English Speech (HARES) corpus. It is an audio archive that contains interviews conducted in England in the 1970s and 1980s. The interviews were done in accordance to methods used generally in traditional dialectology. The informants are mostly elderly male people who have lived in the same region throughout their lives and who have left school at an early age. The interviews are actually conversations: the interviewer allowed the informant to pick the topic of conversation to induce a maximally relaxed and comfortable atmosphere and thus allow the most natural dialect variant to emerge in the informant’s speech. In the paper, the corpus chapter introduces some of the transcription and annotation problems associated with spoken language corpora (especially those containing dialectal speech). Questions surrounding the concept of variation are present in this part of the paper too, as especially transcription work is troubled by the fundamental problem of having to describe the fluctuations of everyday speech in text. In the empirical section of the paper, I use HARES to analyse the speech of four informants, with special focus on the emergence of the intermediate past BE variant. My observations and the subsequent analysis permit me to claim that my hypothesis seems to hold. The intermediate variant occupies almost all contexts where one would expect was or were in the informants’ speech. This means that the new variant is integrated into the speakers’ grammars and exemplifies the kind of variation that is at the heart of this paper.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This dissertation consists of four articles and an introduction. The five parts address the same topic, nonverbal predication in Erzya, from different perspectives. The work is at the same time linguistic typology and Uralic studies. The findings based on a large corpus of empirical Erzya data, which was collected using several different methods and included recordings of the spoken language, made it possible for the present study to apply, then test and finally discuss the previous theories based on cross-linguistic data. Erzya makes use of multiple predication patterns which vary from totally analytic to the morphologically very complex. Nonverbal predicate clause types are classified on the basis of propositional acts in clauses denoting class-membership, identity, property and location. The predicates of these clauses are nouns, adjectives and locational expressions, respectively. The following three predication strategies in Erzya nonverbal predication can be identified: i. the zero-copula construction, ii. the predicative suffix construction and iii. the copula construction. It has been suggested that verbs and nouns cannot be clearly distinguished on morphological grounds when functioning as predicates in Erzya. This study shows that even though predicativity must not be considered a sufficient tool for defining parts of speech in any language, the Erzya lexical classes of adjective, noun and verb can be distinguished from each other also in predicate position. The relative frequency and degree of obligation for using the predicative suffix construction decreases when moving left to right on the scale verb adjective/locative noun ( identificational statement). The predicative suffix is the main pattern in the present tense over the whole domain of nonverbal predication in Standard Erzya, but if it is replaced it is most likely to be with a zero-copula construction in a nominal predication. This study exploits the theory of (a)symmetry for the first time in order to describe verbal vs. nonverbal predication. It is shown that the asymmetry of paradigms and constructions differentiates the lexical classes. Asymmetrical structures are motivated by functional level asymmetry. Variation in predication as such adds to the complexity of the grammar. When symmetric structures are employed, the functional complexity of grammar decreases, even though morphological complexity increases. The genre affects the employment of predication strategies in Erzya. There are differences in the relative frequency of the patterns, and some patterns are totally lacking from some of the data. The clearest difference is that the past tense predicative suffix construction occurs relatively frequently in Standard Erzya, while it occurs infrequently in the other data. Also, the predicative suffixes of the present tense are used more regularly in written Standard Erzya than in any other genre. The genre also affects the incidence of the translative in uľ(ń)ems copula constructions. In translations from Russian to Erzya the translative case is employed relatively frequently in comparison to other data. This study reveals differences between the two Mordvinic languages Erzya and Moksha. The predicative suffixes (bound person markers) of the present tense are used more regularly in Moksha in all kinds of nonverbal predicate clauses compared to Erzya. It should further be observed that identificational statements are encoded with a predicative suffix in Moksha, but seldom in Erzya. Erzya clauses are more frequently encoded using zero-constructions, displaying agreement in number only.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes a new flexible delexicalization method based on glottal excited parametric speech synthesis scheme. The system utilizes inverse filtered glottal flow and all-pole modelling of the vocal tract. The method provides a possibil- ity to retain and manipulate all relevant prosodic features of any kind of speech. Most importantly, the features include voice quality, which has not been properly modeled in earlier delex- icalization methods. The functionality of the new method was tested in a prosodic tagging experiment aimed at providing word prominence data for a text-to-speech synthesis system. The ex- periment confirmed the usefulness of the method and further corroborated earlier evidence that linguistic factors influence the perception of prosodic prominence.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes a method of automated segmentation of speech assuming the signal is continuously time varying rather than the traditional short time stationary model. It has been shown that this representation gives comparable if not marginally better results than the other techniques for automated segmentation. A formulation of the 'Bach' (music semitonal) frequency scale filter-bank is proposed. A comparative study has been made of the performances using Mel, Bark and Bach scale filter banks considering this model. The preliminary results show up to 80 % matches within 20 ms of the manually segmented data, without any information of the content of the text and without any language dependence. 'Bach' filters are seen to marginally outperform the other filters.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This correspondence describes a method for automated segmentation of speech. The method proposed in this paper uses a specially designed filter-bank called Bach filter-bank which makes use of 'music' related perception criteria. The speech signal is treated as continuously time varying signal as against a short time stationary model. A comparative study has been made of the performances using Mel, Bark and Bach scale filter banks. The preliminary results show up to 80 % matches within 20 ms of the manually segmented data, without any information of the content of the text and without any language dependence. The Bach filters are seen to marginally outperform the other filters.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper proposes an algorithm for joint data detection and tracking of the dominant singular mode of a time varying channel at the transmitter and receiver of a time division duplex multiple input multiple output beamforming system. The method proposed is a modified expectation maximization algorithm which utilizes an initial estimate to track the dominant modes of the channel at the transmitter and the receiver blindly; and simultaneously detects the un known data. Furthermore, the estimates are constrained to be within a confidence interval of the previous estimate in order to improve the tracking performance and mitigate the effect of error propagation. Monte-Carlo simulation results of the symbol error rate and the mean square inner product between the estimated and the true singular vector are plotted to show the performance benefits offered by the proposed method compared to existing techniques.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper considers the problem of weak signal detection in the presence of navigation data bits for Global Navigation Satellite System (GNSS) receivers. Typically, a set of partial coherent integration outputs are non-coherently accumulated to combat the effects of model uncertainties such as the presence of navigation data-bits and/or frequency uncertainty, resulting in a sub-optimal test statistic. In this work, the test-statistic for weak signal detection is derived in the presence of navigation data-bits from the likelihood ratio. It is highlighted that averaging the likelihood ratio based test-statistic over the prior distributions of the unknown data bits and the carrier phase uncertainty leads to the conventional Post Detection Integration (PDI) technique for detection. To improve the performance in the presence of model uncertainties, a novel cyclostationarity based sub-optimal PDI technique is proposed. The test statistic is analytically characterized, and shown to be robust to the presence of navigation data-bits, frequency, phase and noise uncertainties. Monte Carlo simulation results illustrate the validity of the theoretical results and the superior performance offered by the proposed detector in the presence of model uncertainties.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes a spatio-temporal registration approach for speech articulation data obtained from electromagnetic articulography (EMA) and real-time Magnetic Resonance Imaging (rtMRI). This is motivated by the potential for combining the complementary advantages of both types of data. The registration method is validated on EMA and rtMRI datasets obtained at different times, but using the same stimuli. The aligned corpus offers the advantages of high temporal resolution (from EMA) and a complete mid-sagittal view (from rtMRI). The co-registration also yields optimum placement of EMA sensors as articulatory landmarks on the magnetic resonance images, thus providing richer spatio-temporal information about articulatory dynamics. (C) 2014 Acoustical Society of America

Relevância:

30.00% 30.00%

Publicador:

Resumo:

USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English. Electromagnetic articulography data have also been presently collected from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460 sentence corpus used previously in the MOCHA-TIMIT database. In both cases the audio signal was recorded and synchronized with the articulatory data. The database and companion software are freely available to the research community. (C) 2014 Acoustical Society of America.