818 results for speaker diarization
Abstract:
This paper examines the issue of face, speaker and bi-modal authentication in mobile environments when there is significant condition mismatch. We introduce this mismatch by enrolling client models on high quality biometric samples obtained on a laptop computer and authenticating them on lower quality biometric samples acquired with a mobile phone. To perform these experiments we develop three novel authentication protocols for the large publicly available MOBIO database. We evaluate state-of-the-art face, speaker and bi-modal authentication techniques and show that inter-session variability modelling using Gaussian mixture models provides a consistently robust system for face, speaker and bi-modal authentication. It is also shown that multi-algorithm fusion provides a consistent performance improvement for face, speaker and bi-modal authentication. Using this bi-modal multi-algorithm system we derive a state-of-the-art authentication system that obtains a half total error rate of 6.3% and 1.9% for Female and Male trials, respectively.
Abstract:
This paper analyzes the limitations upon the amount of in-domain (NIST SREs) data required for training a probabilistic linear discriminant analysis (PLDA) speaker verification system based on out-domain (Switchboard) total variability subspaces. By limiting the number of speakers, the number of sessions per speaker and the length of active speech per session available in the target domain for PLDA training, we investigated the relative effect of these three parameters on PLDA speaker verification performance in the NIST 2008 and NIST 2010 speaker recognition evaluation datasets. Experimental results indicate that while these parameters depend highly on each other, to beat out-domain PLDA training, more than 10 seconds of active speech should be available for at least 4 sessions/speaker for a minimum of 800 speakers. If further data is available, considerable improvement can be made over solely out-domain PLDA training.
Abstract:
The increasing linguistic and cultural diversity of our contemporary world points to the salience of maintaining and developing the Heritage Languages of ethnic minority groups. The mutually constitutive effect between Heritage Language learning and ethnic identity construction has been well documented in the literature. Classical social psychological work often quantitatively structures this phenomenon in a predictable linear relationship. In contrast, poststructural scholarship draws on qualitative approaches to claim the malleable and multiple dynamics behind the phenomenon. The two schools oppose but complement each other. Nevertheless, both schools struggle to capture the detailed and nuanced construction of ethnic identity through Heritage Language learning. Unlike the extant research, we attempt to ethno-methodologically unearth the nuisances and predicaments embedded in the reflexive, subtle, and multi-layered identity constructions through nuanced, inter-nested language practices. Drawing on data from the qualitative phase of a large project, we highlight some small but powerful moments abstracted from the interview accounts of five Chinese Australian young people. Firstly, we zoom in on the life politics behind the ‘seen but unnoticed’ stereotype that looking Chinese means being able to speak Chinese. Secondly, we speculate on the power relations between the speaker and the listener through the momentary and inadvertent breaches of the taken-for-granted stereotype. Next, we unveil how learning Chinese has become an accountably rational priority for these young Chinese Australians. Finally, we argue that the normalised stereotype becomes visible and hence stable when it is breached – a practical accomplishment that we term ‘habitus realisation’.
Abstract:
Visual information in the form of lip movements of the speaker has been shown to improve the performance of speech recognition and search applications. In our previous work, we proposed cross database training of synchronous hidden Markov models (SHMMs) to make use of external large and publicly available audio databases in addition to the relatively small given audio visual database. In this work, the cross database training approach is improved by performing an additional audio adaptation step, which enables audio visual SHMMs to benefit from audio observations of the external audio models before adding visual modality to them. The proposed approach outperforms the baseline cross database training approach in clean and noisy environments in terms of phone recognition accuracy as well as spoken term detection (STD) accuracy.
Abstract:
Spoken term detection (STD) is the task of looking up a spoken term in a large volume of speech segments. To provide fast search, speech segments are first indexed into an intermediate representation using speech recognition engines, which provide multiple hypotheses for each speech segment. Approximate matching techniques are usually applied at the search stage to compensate for the poor performance of automatic speech recognition engines during indexing. Recently, using visual information in addition to audio information has been shown to improve phone recognition performance, particularly in noisy environments. In this paper, we make use of visual information in the form of lip movements of the speaker in the indexing stage and investigate its effect on STD performance. In particular, we investigate whether gains in phone recognition accuracy carry through the approximate matching stage to provide similar gains in the final audio-visual STD system over a traditional audio-only approach. We also investigate the effect of using visual information on STD performance in different noise environments.
Abstract:
Speech recognition can be improved by using visual information in the form of lip movements of the speaker in addition to audio information. To date, state-of-the-art techniques for audio-visual speech recognition continue to use audio and visual data of the same database for training their models. In this paper, we present a new approach to make use of one modality of an external dataset in addition to a given audio-visual dataset. By so doing, it is possible to create more powerful models from other extensive audio-only databases and adapt them on our comparatively smaller multi-stream databases. Results show that the presented approach outperforms the widely adopted synchronous hidden Markov models (HMM) trained jointly on audio and visual data of a given audio-visual database for phone recognition by 29% relative. It also outperforms the external audio models trained on extensive external audio datasets and also internal audio models by 5.5% and 46% relative respectively. We also show that the proposed approach is beneficial in noisy environments where the audio source is affected by the environmental noise.
Abstract:
This research has made contributions to the area of spoken term detection (STD), defined as the process of finding all occurrences of a specified search term in a large collection of speech segments. It proposes using visual information in the form of lip movements of the speaker in addition to audio, together with the topic of the speech segments and the expected frequency of words in the target speech domain. Using this complementary information improves STD performance, which enables efficient keyword search in large collections of multimedia documents.
Abstract:
The success of automatic speaker recognition in laboratory environments suggests applications in forensic science for establishing the identity of individuals on the basis of features extracted from speech. A theoretical model for such a verification scheme for continuous normally distributed features is developed. The three cases of using (a) a single feature, (b) multiple independent measurements of a single feature, and (c) multiple independent features are explored. The number of independent features needed for a reliable personal identification is computed based on the theoretical model and an exploratory study of some speech features.
Abstract:
The work integrates research in the language and terminology of various fields with lexicography, etymology, semantics, word formation, and pragmatics. Additionally, examination of German and Finnish provides the work with the perspective of contrastive linguistics and the translation of texts in specialized fields. The work is an attempt to chart the language, vocabulary, different textual types, and essential communication-connected features of this special field. The study is primarily concerned with internal communication within the field of ecology, but it also provides a comparison of the public discussion of environmental issues in Germany and Finland. The work attempts to use textual signs to provide a picture of the literary communication used on the different vertical levels in the central text types within the field. The dictionaries in the fields of environmental issues and ecology for the individual text types are examined primarily from the perspective of their quantity and diversity. One central point of the work is to clarify and collect all of the dictionaries in the field that have been compiled thus far in which German and/or Finnish are included. Ecology and environmental protection are closely linked not only to each other but also to many other scientific fields. Consequently, the language of the environmental field has acquired an abundance of influences and vocabulary from the language of the special fields close to it as well as from that of politics and various areas of public administration. The work also demonstrates how the popularization of environmental terminology often leads to semantic distortion. Traditionally, scientific texts have used the smallest number of expressions, the purpose of which is to appeal to or influence the behavior of the text recipient.
Particularly in Germany, those who support or oppose measures to protect the environment have long been making concerted efforts to represent their own views in the language that they use. When discussing controversial issues competing designations for the same referent or concept are used in accordance with the interest group to which the speaker belongs. One of the objectives of the study is to sensitize recipients of texts to notice the euphemistic expressions that occur in German and Finnish texts dealing with issues that are sensitive from the standpoint of environmental policy. One particular feature of the field is the wealth and large number of variants designating the same entry or concept. The terminological doublets formed by words of foreign origin and their German or Finnish language equivalents are quite typical of the field. Methods of corpus linguistics are used to determine the reasons for the large number of variant designations as well as their functionality.
Abstract:
The study investigates actions by recipients in spontaneous Russian conversations by focusing on DA, NU and TAK when they are used as responses to the main speaker's larger on-going turn. The database for the study consists of some 7 hours of spontaneous conversations. The participants of the conversations come from different parts of Russia. The use of DA, NU and TAK was analyzed by applying the method of ethnomethodological conversation analysis from the point of view of the type of context, the sequential placement of the response and its manner of production. The particles were analyzed both in contexts in which they responded to an informing and in affective contexts. The particles NU and TAK were used by the speakers almost exclusively in informing contexts, whereas DA was the central response type in affective contexts. DA was also the most common response to information with affective implications. The information to which the particle NU responded was often unspecific and projected a specification or explanation by its speaker as the next action. DA and TAK, by contrast, treated the information as one that could be followed and was sufficient in its local context. As a response to parenthetical information, NU responded to information that was only loosely connected with the main line of talk. The particle DA, by contrast, was used as a response to parenthetical information that was more crucial for the larger on-going activity. Only NU was used as a response that invited the main speaker to continue a turn that she or he had offered as possibly complete. NU was also used by the recipient after her or his own contribution as a continuer. In affective contexts, DA expressed, depending on its more specific context, not only agreement but also other functions, such as giving up arguing or prior knowledge of the topic being discussed.
In addition DA responses were used to display empathy and identification with the state of affairs expressed by the co-participant. NU, by contrast, was seldom used as a response to a turn that expressed affect. When it was used in affective contexts, it displayed agreement with the co-participant or just registered an assessment by her or him.
Abstract:
Expressing generalized-personal meaning in Russian. Based on data from Russian, this doctoral dissertation examines generalized-personal meaning, that is, generic expressions referring to all human beings, people in general, each or any person (e.g. S vozrastom načinaeš' cenit' prostye vešči ‘With age you start to appreciate simple things’). The study shares its basic theoretical orientation with functional approaches going from meaning to form. The objective of the thesis is to determine and describe the various linguistic means which can be used by the speaker to express generalized-personal meaning. The main material of the study consists of 2,000 examples collected from modern Russian literature, newspapers, and magazines. The linguistic means of expressing generalized-personal meaning are divided into three main classes. Morphological and lexico-grammatical means (22% of the material) include the use of personal pronouns and personal verbal endings. In Russian, all personal forms except the 3rd person singular can be used in a generalized-personal meaning. Lexical means (14% of the material) involve, above all, pronouns like vse ‘all’, každyj ‘everyone’, nikto ‘no one’, as well as the nouns čelovek ‘man’ and ljudi ‘people’. In emotional speech, generalized-personal meaning can also be conveyed lexically by using utterances like daže idiot znaet ‘even an idiot knows’. In rhetorical questions the pronoun kto ‘who’ can appear in this meaning (cf. Kto ne ljubit moroženoe?! ‘Who doesn't like ice cream?!’). The third main class, syntactic means (64% of the material), consists of constructions in which the generic person is not expressed at the surface level. This class mainly includes two-component structures in which the infinitive relates to a modal predicative adverb (e.g. možno ‘can, be allowed to’, nado ‘must’), modal verb (e.g. stoit ‘be worth(while)’, sleduet ‘must, be obliged to’), or predicative adverb ending in -o (e.g. trudno ‘it is hard to’, neprilično ‘is not appropriate’).
Other syntactic means are: one-component infinitive structures, so-called embedded structures, structures with a processual noun, passive constructions, and gerund constructions. The different forms of expression available in Russian are not interchangeable in all contexts. Even if a given context tolerates the substitution of one construction for another, the two expressions are never entirely synonymous. In addition to determining the range of forms which can express generalized-personal meaning, the study aims to compare these forms and to specify the conditions and possible restrictions (contextual, semantic, syntactic, stylistic, etc.) associated with the use of each construction. In Russian linguistics, the generalized-personal meaning has not been extensively studied from a functional perspective. The advantage of a meaning-based functional approach is that it gives a comprehensive picture of the diversity and distribution of the phenomenon.
Abstract:
In this study I look at what people want to express when they talk about time in Russian and Finnish, and why they use the means they use. The material consists of expressions of time: 1087 from Russian and 1141 from Finnish. They have been collected from dictionaries, usage guides, corpora, and the Internet. An expression means here an idiomatic set of words in a preset form, a collocation or construction. They are studied as lexical entities, without a context, and analysed and categorized according to various features. The theoretical background for the study includes two completely different approaches. Functional Syntax is used in order to find out what general meanings the speaker wishes to convey when talking about time and how these meanings are expressed in specific languages. Conceptual metaphor theory is used for explaining why the expressions are as they are, i.e. what kind of conceptual metaphors (transfers from one conceptual domain to another) they include. The study has resulted in a grammatically glossed list of time expressions in Russian and Finnish, a list of 56 general meanings involved in these time expressions and an account of the means (constructions) that these languages have for expressing the general meanings defined. It also includes an analysis of conceptual metaphors behind the expressions. The general meanings involved turned out to revolve around expressing duration, point in time, period of time, frequency, sequence, passing of time, suitable time and the right time, life as time, limitedness of time, and some other notions having less obvious semantic relations to the others. 
Conceptual metaphor analysis of the material has shown that time is conceptualized in Russian and Finnish according to the metaphors Time Is Space (Time Is Container, Time Has Direction, Time Is Cycle, and the Time Line Metaphor), Time Is Resource (and its submapping Time Is Substance), Time Is Actor; and some characteristics are added to these conceptualizations with the help of the secondary metaphors Time Is Nature and Time Is Life. The limits between different conceptual metaphors and the connections these metaphors have with one another are looked at with the help of the theory of conceptual integration (the blending theory) and its schemas. The results of the study show that although Russian and Finnish are typologically different, they are very similar both in the needs of expression their speakers have concerning time, and in the conceptualizations behind expressing time. This study introduces both theoretical and methodological novelties in the nature of material used, in developing empirical methodology for conceptual metaphor studies, in the exactness of defining the limits of different conceptual metaphors, and in seeking unity among the different facets of time. Keywords: time, metaphor, time expression, idiom, conceptual metaphor theory, functional syntax, blending theory
Abstract:
This paper focuses on the fundamental right to be heard, that is, the right to have one’s voice heard and listened to – to impose reception (Bourdieu, 1977). It focuses on the ways that non-mainstream English is heard and received in Australia, where despite public policy initiatives around equal opportunity, language continues to socially disadvantage people (Burridge & Mulder, 1998). English is the language of the mainstream and most people are monolingually English (Ozolins, 1993). English has no official status yet it remains dominant and its centrality is rarely challenged (Smolicz, 1995). This paper takes the position that the lack of language engagement in mainstream Australia leads to linguistic desensitisation. Writing in the US context where English is also the unofficial norm, Lippi-Green (1997) maintains that discrimination based on speech features or accent is commonly accepted and widely perceived as appropriate. In Australia, non-standard forms of English are often disparaged or devalued because they do not conform to the ‘standard’ (Burridge & Mulder, 1998). This paper argues that talk cannot be taken for granted: ‘spoken voices’ are critical tools for representing the self and negotiating and manifesting legitimacy within social groups (Miller, 2003). In multicultural, multilingual countries like Australia, the impact of the spoken voice, its message and how it is heard are critical tools for people seeking settlement, inclusion and access to facilities and services. Too often these rights are denied because of the way a person sounds. This paper reports a study conducted with a group that has been particularly vulnerable to ongoing ‘panics’ about language – international students. International education is the third largest revenue source for Australia (AEI, 2010) but has been beset by concerns from academics (Auditor-General, 2002) and the media about student language levels and falling work standards (e.g. Livingstone, 2004). 
Much of the focus has been on high-stakes writing, but with the ascendancy of project work in university assessment and the increasing emphasis on oracy, there is a call to recognise the salience of talk, especially among students using English as a second language (ESL) (Kettle & May, 2012). The study investigated the experiences of six international students in a Master of Education course at a large metropolitan university. It utilised data from student interviews, classroom observations, course materials, university policy documents and media reports to examine the ways that speaking and being heard impacted on the students’ learning and legitimacy in the course. The analysis drew on Fairclough’s (2003) model of dialectical-relational Critical Discourse Analysis (CDA) to analyse the linguistic, discursive and social relations between the data texts and their conditions of production and interpretation, including the wider socio-political discourses on English, language difference, and second language use. The interests of the study were if and how discourses of marginalisation and discrimination manifested, if and how students recognised and responded to them pragmatically, and how these discourses were juxtaposed with and/or contradicted the official rhetoric about diversity and inclusion. The underpinning rationale was that international students’ experiences can provide insights into the hidden politics and practices of being heard and afforded speaking rights as a second language speaker in Australia.
Abstract:
This is the fourth TAProViz workshop being run at the 13th International Conference on Business Process Management (BPM). The intention this year is to consolidate on the results of the previous successful workshops by further developing this important topic, identifying the key research topics of interest to the BPM visualization community. Towards this goal, the workshop topics were extended to human computer interaction and related domains. Submitted papers were evaluated by at least three program committee members, in a double blind manner, on the basis of significance, originality, technical quality and exposition. Three full and one position papers were accepted for presentation at the workshop. In addition, we invited a keynote speaker, Jakob Pinggera, a postdoctoral researcher at the Business Process Management Research Cluster at the University of Innsbruck, Austria.
Abstract:
"We have neither Eternal Friends nor Eternal Enemies. We have only Eternal Interests." Finland's Relations with China 1949-1989. The study focuses on the relations between Finland and the People's Republic of China from 1949-1989 and examines how a small country became embroiled in international politics, and how, at the same time, international politics affected Finnish-Chinese relations and Finland's China policy formulation. The study can be divided into three sections: relations during the early years, 1949-1960, before the Chinese and Soviet rift became public; the relations during the passive period during the 1960s and 1970s; and the impact of China's Open Door policy on Finland's China policy from 1978-1989. The diplomatically challenging events around Tiananmen Square and the reactions which followed in Finland bring the study to a close. Finland was among the first Western countries to recognise the People's Republic and to establish diplomatic relations with her, thereby giving Finland an excellent position from which to further develop good relations. Finland was also the first Western country to sign a trade agreement with China. These two factors meant that Finland was able to enjoy a special status with China during the 1950s. The special status was further strengthened by the systematic support of the government of Finland for China's UN membership. The solid reputation earned in the 1950s had to carry Finland all the way through to the 1980s. For the two decades in between, during the passive policy period of the 1960s and 1970s, relations between Finland and the Soviet Union also determined the state of foreign relations with China. Interestingly, however, it appeared that President Urho Kekkonen was encouraged by Ambassador Joel Toivola to envisage a more proactive policy towards China, but the Cultural Revolution cut short any such plan for nearly twenty years.
Because of the Soviet Union, Finland held on to her passive China policy, even though no such message was ever received from the Soviet Union. In fact, closer relationships between Finland and China were encouraged through diplomatic channels. It was not until the presidency of Mauno Koivisto that the first high-level ministerial visit was made to China when, in 1984, Foreign Minister Paavo Väyrynen visited the People's Republic. Finnish-Chinese relations were lifted to a new level. Foreign Minister Väyrynen, however, was forced to remove the prejudices of the Chinese. In 1985, when the Speaker of the Finnish Parliament, Erkki Pystynen, visited China, he also discovered that Finland's passive China policy had caused misunderstandings amongst the Chinese politicians. The number of exchanges escalated in the wake of the ground-breaking visit by Foreign Minister Väyrynen: Prime Minister Kalevi Sorsa visited China in 1986 and President Koivisto did so in 1988. President Koivisto stuck to practical, China-friendly policies: his correspondence with Li Peng, the attitude taken by the Finnish government after the Tiananmen Square events and the subsequent choices made by his administration all pointed to a new era in relations with China.