967 results for Speaker Recognition, Text-constrained, Multilingual, Speaker Verification, HMMs


Relevance:

40.00% 40.00%

Publisher:

Abstract:

The universal use of speech synthesis in different applications would require easy development of new voices with little manual intervention. Given the amount of multimedia data available on the Internet and in the media, an interesting goal is the development of tools and methods to automatically build multi-style voices from them. In previous work, a methodology for building this kind of tool was outlined, and preliminary experiments with a multi-style database were presented. In this article we investigate this task further and propose several improvements based on selecting the appropriate number of initial speakers, the use or not of noise-reduction filters, the use of F0, and the use of a music detection algorithm. We have shown that the best system, using a music detection algorithm, reduces the precision error by 22.36% relative for the development set and 39.64% relative for the test set compared with the baseline system, without degrading the merit factor. The mean precision for the test set is 90.62%, ranging from 76.18% for news reports to 99.93% for weather reports.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

In distinction to single-stranded anticodons built of G, C, A, and U bases, their presumable double-stranded precursors at the first three positions of the acceptor stem are composed almost invariably of G-C and C-G base pairs. Thus, the “second” operational RNA code responsible for correct aminoacylation seems to be a (G,C) code preceding the classic genetic code. Although historically rooted, the two codes were destined to diverge quite early. However, closer inspection revealed that two complementary catalytic domains of class I and class II aminoacyl-tRNA synthetases (aaRSs) multiplied by two, also complementary, G2-C71 and C2-G71 targets in tRNA acceptors, yield four (2 × 2) different modes of recognition. It appears therefore that the core four-column organization of the genetic code, associated with the most conservative central base of anticodons and codons, was in essence predetermined by these four recognition modes of the (G,C) operational code. The general conclusion follows that the genetic code per se looks like a “frozen accident” but only beyond the “2 × 2 = 4” scope. The four primordial modes of tRNA–aaRS recognition are amenable to direct experimental verification.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. System prototypes have recently been built that demonstrate speaker-independent real-time speech recognition, and understanding of naturally spoken utterances with vocabularies of 1000 to 2000 words, and larger. Already, computer manufacturers are building speech recognition subsystems into their new product lines. However, before this technology can be broadly useful, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argues for a more systematic and scientific approach to investigating spoken input and performance with future language technology.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

The deployment of systems for human-to-machine communication by voice requires overcoming a variety of obstacles that affect the speech-processing technologies. Problems encountered in the field might include variation in speaking style, acoustic noise, ambiguity of language, or confusion on the part of the speaker. The diversity of these practical problems encountered in the "real world" leads to the perceived gap between laboratory and "real-world" performance. To answer the question "What applications can speech technology support today?" the concept of the "degree of difficulty" of an application is introduced. The degree of difficulty depends not only on the demands placed on the speech recognition and speech synthesis technologies but also on the expectations of the user of the system. Experience has shown that deployment of effective speech communication systems requires an iterative process. This paper discusses general deployment principles, which are illustrated by several examples of human-machine communication systems.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

The goal of the project is to analyze, experiment with, and develop intelligent, interactive and multilingual Text Mining technologies as a key element of the next generation of search engines: systems with the capacity to find "the need behind the query". This new generation will provide specialized services and interfaces according to the search domain and the type of information needed. Moreover, it will integrate textual search (websites) and multimedia search (images, audio, video), and it will be able to find and organize information rather than generate ranked lists of websites.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Witnessing an ingroup member acting against his or her belief can lead individuals who identify with that group to change their own attitude in the direction of that counterattitudinal behavior. Two studies demonstrate this vicarious dissonance effect among high ingroup identifiers and show that this attitude change is not attributable to conformity to a perceived change in speaker attitude. Study 1 shows that the effect occurs, and indeed is stronger, even when it is clear that the speaker disagrees with the position espoused, and Study 2 shows that foreseeable aversive consequences bring about attitude change in the observer without any parallel impact on the perceived attitude of the speaker. Furthermore, the assumption that vicarious dissonance is at heart a group phenomenon is supported by results indicating that attitude change is affected neither by individual differences in dispositional empathy nor by measures of interpersonal affinity.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

This paper uses a feminist post-structuralist approach to examine the gendered identities of a sample of business leaders in Britain. While recent national surveys offer many material reasons why women are acutely under-represented as business leaders, the role of language is rarely addressed. This paper explores the ways in which ten senior women and men construct their sense of leadership identities through the medium of interview narratives. Drawing upon two poststructuralist models of analysis (Derrida’s 1987 theory of deconstruction and Bakhtin’s 1927/1981 concept of double-voiced discourse), the paper shows how both females and males are able to shift pragmatically between interwoven corporate discourses, which demand competing cultural allegiances from one moment to the next, allegiances constantly tested by the rapid change and uncertainty that characterise global business. While male leaders experience a relative freedom of movement between different cultural discourses, female leaders are circumscribed by negative and reductive representations of female speech and behaviour. In sum, senior women are required constantly to observe, review, police and repair their use of leadership language, which potentially undermines their confidence and authority as leaders.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

The present thesis investigates mode-related aspects in biology lecture discourse and attempts to identify the position of this variety along the spontaneous spoken versus planned written language continuum. Nine lectures (of 43,000 words) consisting of three sets of three lectures each, given by three lecturers at Aston University, make up the corpus. The indeterminacy of the results obtained from the investigation of grammatical complexity as measured in subordination motivates the need to take the analysis beyond sentence level to the study of mode-related aspects in the use of sentence-initial connectives, sub-topic shifting and paraphrase. It is found that biology lecture discourse combines features typical of speech and writing at sentence as well as discourse level: thus, subordination is used more than co-ordination, but the one-degree complexity sentence is favoured; some sentence-initial connectives are only found in uses typical of spoken language, but sub-topic shift signalling (generally introduced by a connective), typical of planned written language, is a major feature of the lectures; syntactic and lexical revision, repetition, and interrupted structures are found in the sub-topic shift signalling utterance and paraphrase, but the text is also amenable to analysis into sentence-like units. On the other hand, it is also found that: (1) while there are some differences in the use of a given feature, inter-speaker variation is on the whole not significant; (2) mode-related aspects are often motivated by the didactic function of the variety; and (3) the structuring of the text follows a sequencing whose boundaries are marked by sub-topic shifting and the summary paraphrase. This study enables us to draw four theoretical conclusions: (1) mode-related aspects cannot be approached as a simple dichotomy, since a combination of aspects of both speech and writing is found in a given feature, and it is necessary to go to the level of textual features to identify mode-related aspects; (2) homogeneity is dominant in this sample of lectures, which suggests that there is a high level of standardization in this variety; (3) the didactic function of the variety is manifested in some mode-related aspects; (4) the features studied play a role in the structuring of the text.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Master of Arts dissertation

Relevance:

40.00% 40.00%

Publisher:

Abstract:

This article investigates potential effects which (the recontextualisation of) interpreted discourse can have on the positioning of participants. The discursive events which form the basis of the analysis are international press conferences which bring politicians and journalists together. The dominant question addressed is: (How) do interpreter-mediated encounters influence the positioning of participants and thus the construction of interactional and social roles? The article illustrates that methods of (critical) discourse analysis can be used to identify positioning strategies which are employed by participants in such triadic exchanges. The data come from press conferences which involve English, German, and French as source and target languages.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

This dissertation introduces a new system for handwritten text recognition based on an improved neural network design. Most existing neural networks treat the mean square error function as the standard error function. The system proposed in this dissertation utilizes the mean quartic error function, where the third and fourth derivatives are non-zero. Consequently, many improvements on the training methods were achieved. The training results are carefully assessed before and after the update. To evaluate the performance of a training system, there are three essential factors to be considered, in descending order of importance: (1) the error rate on the testing set, (2) the processing time needed to recognize a segmented character and (3) the total training time and subsequently the total testing time. It is observed that bounded training methods accelerate the training process, while semi-third order training methods, next-minimal training methods, and preprocessing operations reduce the error rate on the testing set. Empirical observations suggest that two combinations of training methods are needed for different case character recognition. Since character segmentation is required for word and sentence recognition, this dissertation also provides an effective rule-based segmentation method, which is different from the conventional adaptive segmentation methods. Dictionary-based correction is utilized to correct mistakes resulting from the recognition and segmentation phases. The integration of the segmentation methods with the handwritten character recognition algorithm yielded an accuracy of 92% for lower case characters and 97% for upper case characters. In the testing phase, the database consists of 20,000 handwritten characters, with 10,000 for each case. Recognizing 10,000 handwritten characters in the testing phase required 8.5 seconds of processing time.
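
As a rough illustration of the contrast drawn in the abstract above between the mean square error and the mean quartic error, the following minimal Python sketch computes both losses and the derivatives of their per-sample terms with respect to the error e = output - target. All function names are hypothetical, and nothing here is taken from the dissertation itself; it only shows why the third and fourth derivatives vanish for the squared term but not for the quartic one.

```python
def mean_square_error(outputs, targets):
    """Average of (output - target)**2 over all samples."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def mean_quartic_error(outputs, targets):
    """Average of (output - target)**4 over all samples."""
    return sum((o - t) ** 4 for o, t in zip(outputs, targets)) / len(outputs)


def square_term_derivatives(e):
    """Derivatives of e**2 with respect to e, orders 1 to 4."""
    # The third and fourth derivatives are identically zero.
    return (2 * e, 2, 0, 0)


def quartic_term_derivatives(e):
    """Derivatives of e**4 with respect to e, orders 1 to 4."""
    # The third derivative (24*e) and fourth derivative (24) do not vanish.
    return (4 * e ** 3, 12 * e ** 2, 24 * e, 24)


if __name__ == "__main__":
    outputs, targets = [0.9, 0.2], [1.0, 0.0]
    print(mean_square_error(outputs, targets))
    print(mean_quartic_error(outputs, targets))
    print(quartic_term_derivatives(0.5))
```

Which higher-order training methods the dissertation builds on these derivatives is not stated in the abstract, so the sketch stops at the derivatives themselves.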

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Hardware/software (HW/SW) co-simulation integrates software simulation and hardware simulation simultaneously. Usually, an HW/SW co-simulation platform is used to ease debugging and verification for very large-scale integration (VLSI) design. To accelerate the computation of the gesture recognition technique, an HW/SW implementation using field programmable gate array (FPGA) technology is presented in this paper. The major contributions of this work are: (1) a novel design of a memory controller in the Verilog Hardware Description Language (Verilog HDL) to reduce memory consumption and load on the processor; (2) the testing part of the neural network algorithm is hardwired to improve speed and performance; (3) the design takes only a few milliseconds to recognize a hand gesture, which makes it computationally more efficient. American Sign Language gesture recognition is chosen to verify the performance of the approach. Several experiments were carried out on four databases of the gestures (alphabet signs A to Z).

Relevance:

40.00% 40.00%

Publisher:

Abstract:

General note: Title and date provided by Bettye Lane.