918 results for Automatic Speaker Recognition


Relevance:

80.00%

Abstract:

Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities.

Relevance:

80.00%

Abstract:

This work focuses on Machine Translation (MT) and Speech-to-Speech Translation, two emerging technologies that allow users to automatically translate written and spoken texts. The first part of this work provides a theoretical framework for the evaluation of Google Translate and Microsoft Translator, which is at the core of this study. Chapter one focuses on Machine Translation, providing a definition of this technology and glimpses of its history. In this chapter we will also learn how MT works, who uses it, for what purpose, what its pros and cons are, and how machine translation quality can be defined and assessed. Chapter two deals with Speech-to-Speech Translation by focusing on its history, characteristics and operation, potential uses and limits deriving from the intrinsic difficulty of translating spoken language. After describing the future prospects for SST, the final part of this chapter focuses on the quality assessment of Speech-to-Speech Translation applications. The last part of this dissertation describes the evaluation test carried out on Google Translate and Microsoft Translator, two mobile translation apps also providing a Speech-to-Speech Translation service. Chapter three illustrates the objectives, the research questions, the participants, the methodology and the elaboration of the questionnaires used to collect data. The collected data and the results of the evaluation of the automatic speech recognition subsystem and the language translation subsystem are presented in chapter four and finally analysed and compared in chapter five, which provides a general description of the performance of the evaluated apps and possible explanations for each set of results. In the final part of this work suggestions are made for future research and reflections on the usability and usefulness of the evaluated translation apps are provided.

Relevance:

80.00%

Abstract:

Both attentional difficulties and rapid processing deficits have recently been linked with dyslexia. We report two studies comparing the performance of dyslexic and control teenagers on attentional tasks. The two studies were based on two different conceptions of attention. Study 1 employed a design that allowed three key components of attention - focusing, switching, and sustaining - to be investigated separately. One hypothesis under investigation was that rapid processing problems - in particular impaired ability to switch attention rapidly - might be associated with dyslexia. However, although dyslexic participants were significantly less accurate than their controls in a condition where they had to switch attention between two target types, the nature of the deficit suggested that the problem was not in switching attention per se. Thus, in Study 2, we explored an alternative interpretation of the Study 1 results in terms of the classic capacity-limited models of "central" attention. We contrasted two hypotheses: (1) that dyslexic teenagers have reduced cognitive resources versus (2) that they suffer from a general impairment in the ability to automatise basic skills. To investigate the automaticity of the shape recognition component of the task a similar attention paradigm to that used in Study 1 was employed, but using degraded, as well as intact, stimuli. It was found that stimulus degradation led to relatively less impairment for dyslexic than for matched control groups. The results support the hypothesis that dyslexic people suffer from a general impairment in the ability to automatise skills - in this case the skill of automatic shape recognition.

Relevance:

80.00%

Abstract:

In recent years, the remarkably rapid advance of technology has led to the development and spread of portable electronic devices that are extremely small yet offer considerable computational power. More specifically, one category of devices that is currently growing strongly and has already appeared on the world market is that of Wearable devices. As the name suggests, these are designed to be literally worn and are intended to provide continuous support to their users in a variety of settings. If the user does not necessarily have to use their hands to interact with them, they are called Wearable Hands-Free devices. These are generally able to perceive and capture the user's input by means of various techniques and methodologies that are not based on touch. One of these models the user's input through their voice, relying on the discipline of ASR (Automatic Speech Recognition), which deals with translating spoken language into text by means of computerized devices. This leads to the objective of the thesis, which is to develop a framework, usable in the context of Wearable devices, that provides a speech recognition service by relying on an existing one, so that it offers a certain level of efficiency and ease of use. More generally, this document aims to provide an in-depth description of Wearable and Wearable Hands-Free devices, defining their characteristics, critical issues and areas of use. In addition, the intent is to illustrate the operating principles of Automatic Speech Recognition and then move on to the analysis, design and development of the aforementioned framework.

Relevance:

80.00%

Abstract:

This presentation summarizes experience with the automated speech recognition and translation approach realised in the context of the European project EMMA.

Relevance:

80.00%

Abstract:

This thesis examines the state of audiovisual translation (AVT) in the aftermath of the COVID-19 emergency, highlighting new trends with regard to the implementation of AI technologies as well as their strengths, constraints, and ethical implications. It starts with an overview of the current AVT landscape, focusing on future projections about its evolution and on its critical aspects, such as the worsening working conditions lamented by AVT professionals (especially freelancers) in recent years and how they might be affected by the advent of AI technologies in the industry. The second chapter delves into the history and development of three AI technologies that are used in combination with neural machine translation in automatic AVT tools: automatic speech recognition, speech synthesis and deepfakes (voice cloning and visual deepfakes for lip syncing), including real examples of start-up companies that use them, or plan to do so, to localize audiovisual content automatically or semi-automatically. The third chapter explores the many ethical concerns around these innovative technologies, which extend far beyond the field of translation; at the same time, it attempts to reassert their potential to bring about immense progress in terms of accessibility and international cooperation, provided that their use is properly regulated. Lastly, the fourth chapter describes two experiments, testing the efficacy of the currently available tools for automatic subtitling and automatic dubbing respectively, in order to take a closer look at their advantages and limitations compared to more traditional approaches. This analysis aims to help distinguish legitimate concerns from unfounded speculation with regard to the AI technologies that are entering the field of AVT; the intention behind it is to humbly suggest a constructive and optimistic view of the technological transformations that appear to be underway, whilst also acknowledging their potential risks.

Relevance:

80.00%

Abstract:

Throughout the years, technology has had an undeniable impact on the AVT field. It has revolutionized the way audiovisual content is consumed by allowing audiences to easily access it at any time and on any device. Especially after the introduction of OTT streaming platforms such as Netflix, Amazon Prime Video, Disney+, Apple TV+, and HBO Max, which offer a vast catalog of national and international products, the consumption of audiovisual products has risen steadily and, consequently, so has the demand for localized content. In turn, the AVT industry resorts to new technologies and practices to handle the ever-growing workload and the faster turnaround times. Due to its numerous implications for the industry, technological advancement can be considered an area of research of particular interest for AVT studies. However, in the case of dubbing, research and discussion on the topic are lagging behind because of the more limited impact that technology has had on the very conservative dubbing industry. Therefore, the aim of the dissertation is to offer an overview of some of the latest technological innovations and practices that have already been implemented (i.e. cloud dubbing and DeepDub technology) or that are still under development and research (i.e. automatic speech recognition and respeaking, machine translation and post-editing, audio-based and visual-based dubbing techniques, text-based editing of talking-head videos, and automatic dubbing), to discuss their reception by industry professionals, and to make assumptions about their future implementation in the dubbing field.

Relevance:

40.00%

Abstract:

Work submitted for the Master's programme in Informatics Engineering, in partial fulfilment of the requirements for the degree of Master in Informatics Engineering

Relevance:

40.00%

Abstract:

This project was funded under the Applied Research Grants Scheme administered by Enterprise Ireland. The project was a partnership between Galway-Mayo Institute of Technology and an industrial company, Tyco/Mallinckrodt Galway. The project aimed to develop a semi-automatic, self-learning pattern recognition system capable of detecting defects on printed circuit boards, such as component vacancy, component misalignment, component orientation, component error, and component weld. The research was conducted in three directions: image acquisition, image filtering/recognition and software development. Image acquisition studied the process of forming and digitizing images and some fundamental aspects of human visual perception. The importance of choosing the right camera and illumination system for a given type of problem has been highlighted. Probably the most important step towards image recognition is image filtering. The filters are used to correct and enhance images in order to prepare them for recognition. Convolution, histogram equalisation, filters based on Boolean mathematics, noise reduction, edge detection, geometrical filters, cross-correlation filters and image compression are some examples of the filters that have been studied and successfully implemented in the software application. The software application developed during the research is customized to meet the requirements of the industrial partner. The application is able to analyze pictures, perform the filtering, build libraries, process images and generate log files. It incorporates most of the filters studied and, together with the illumination system and the camera, it provides a fully integrated framework able to analyze defects on printed circuit boards.
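
As an illustration of one of the filters named in this abstract, histogram equalisation can be sketched in a few lines of Python. This is only a minimal sketch, not the project's implementation: the function name `equalize_histogram`, the list-of-rows image representation, the assumption of a non-empty image, and the default of 256 grey levels are all choices made here for the example.

```python
def equalize_histogram(image, levels=256):
    """Spread the grey levels of a non-empty `image` (rows of ints in [0, levels))."""
    pixels = [v for row in image for v in row]
    total = len(pixels)

    # Histogram: how many pixels have each grey level.
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    # Cumulative distribution function (running sum of the histogram).
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # CDF value at the darkest grey level actually present in the image.
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to spread, return an unchanged copy.
        return [list(row) for row in image]

    # Map each grey level so the darkest present level goes to 0
    # and the brightest present level goes to levels - 1.
    scale = (levels - 1) / (total - cdf_min)
    lookup = [int((c - cdf_min) * scale + 0.5) if c > 0 else 0 for c in cdf]
    return [[lookup[v] for v in row] for row in image]
```

Calling `equalize_histogram([[52, 52], [60, 200]])` stretches a low-contrast patch so that its darkest value maps to 0 and its brightest to 255.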

Relevance:

40.00%

Abstract:

Magdeburg, Univ., Faculty of Electrical Engineering and Information Technology, dissertation, 2013

Relevance:

40.00%

Abstract:

A significant part of daily energy expenditure may be attributed to non-exercise activity thermogenesis and exercise activity thermogenesis. Automatic recognition of postural allocations such as standing or sitting can be used in behavioral modification programs aimed at minimizing static postures. In this paper we propose a shoe-based device and related pattern recognition methodology for recognition of postural allocations. Inexpensive technology allows implementation of this methodology as a part of footwear. The experimental results suggest high efficiency and reliability of the proposed approach.

Relevance:

40.00%

Abstract:

In this work we present a simulation of a recognition process that uses the perimeter characterization of simple plant leaves as the sole discriminating parameter. Coding the data so that it is independent of leaf size and orientation may penalize recognition performance for some varieties. Border description sequences are then used, and Principal Component Analysis (PCA) is applied in order to study the best number of components for the classification task, which is implemented by means of a Support Vector Machine (SVM) system. The results obtained are satisfactory and, compared with [4], our system improves the recognition success rate while also reducing the variance.

Relevance:

40.00%

Abstract:

In this work we present a simulation of a recognition process that uses the perimeter characterization of simple plant leaves as the sole discriminating parameter. Coding the data so that it is independent of leaf size and orientation may penalize recognition performance for some varieties. Border description sequences are then used to characterize the leaves. Independent Component Analysis (ICA) is then applied in order to study the best number of components to be considered for the classification task, which is implemented by means of an Artificial Neural Network (ANN). The results obtained with ICA as a pre-processing tool are satisfactory and, compared with some references, our system improves the recognition success rate up to 80.8%, depending on the number of independent components considered.

Relevance:

40.00%

Abstract:

This Master's thesis discusses the problem of automatically recognizing fish in video sequences. This is a pressing issue for many organizations engaged in fish farming in Finland and Russia, because automating the control and counting of individual species is a turning point for the industry. The difficulties and specific features of the problem have been identified in order to find a solution and to propose recommendations for the components of an automated fish recognition system. Methods such as background subtraction, Kalman filtering and the Viola-Jones method were implemented during this work for the detection, tracking and estimation of fish parameters. Both the results of the experiments and the choice of appropriate methods depend strongly on the quality and type of the video used as input data. Practical experiments have demonstrated that not all methods produce good results on real data, whereas on synthetic data they operate satisfactorily.
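
To give a sense of the Kalman filtering step mentioned in this abstract, the following is a minimal Python sketch of a scalar Kalman filter that smooths one noisy coordinate of a tracked object. It is an assumption-laden simplification, not the thesis's tracker: the function name `kalman_1d`, the random-walk motion model (no velocity term), and the default noise variances are hypothetical choices made for illustration.

```python
def kalman_1d(measurements, process_var=1e-3, measurement_var=1.0):
    """Smooth a non-empty sequence of noisy scalars with a random-walk Kalman filter."""
    estimate = measurements[0]   # initial state: take the first measurement
    variance = measurement_var   # and assign it the measurement uncertainty
    estimates = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed unchanged, but uncertainty grows.
        variance += process_var
        # Update: blend the prediction with the new measurement.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= (1.0 - gain)
        estimates.append(estimate)
    return estimates
```

A real tracker would typically use a state vector with position and velocity in two dimensions; the scalar form is used here only because it shows the predict and update steps without matrix algebra.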

Relevance:

40.00%

Abstract:

Digit speech recognition is important in many applications such as automatic data entry, PIN entry, voice-dialing telephones, automated banking systems, etc. This paper presents a speaker-independent speech recognition system for Malayalam digits. The system employs Mel-frequency cepstral coefficients (MFCCs) as features for signal processing and a hidden Markov model (HMM) for recognition. The system was trained with 21 male and female voices in the age group of 20 to 40 years and achieved 98.5% word recognition accuracy (94.8% sentence recognition accuracy) on a continuous digit recognition test set.
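
The mel scale underlying MFCC features can be illustrated with a short Python sketch. The abstract does not say which mel formula or how many filters the authors used, so everything here is an assumption: the widely used conversion mel = 2595 log10(1 + f/700), and the hypothetical helper names `hz_to_mel`, `mel_to_hz` and `mel_center_frequencies`.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (common 2595 * log10 formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, low_hz, high_hz):
    """Centre frequencies (Hz) of filters spaced evenly on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]
```

Filters spaced evenly in mels are packed closely at low frequencies and spread out at high frequencies, which roughly mirrors how human hearing resolves pitch; a full MFCC pipeline would then apply these filters to a power spectrum, take logarithms and apply a discrete cosine transform.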