Digital Library

961 results for Automatic speech recognition (ASR)

Automatic Target Recognition in Synthetic Aperture Radar Imagery: A State-of-the-Art Review

Relevance:

100.00%

Abstract:

The purpose of this paper is to survey and assess the state-of-the-art in automatic target recognition for synthetic aperture radar imagery (SAR-ATR). The aim is not to develop an exhaustive survey of the voluminous literature, but rather to capture in one place the various approaches for implementing the SAR-ATR system. This paper is meant to be as self-contained as possible, and it approaches the SAR-ATR problem from a holistic end-to-end perspective. A brief overview for the breadth of the SAR-ATR challenges is conducted. This is couched in terms of a single-channel SAR, and it is extendable to multi-channel SAR systems. Stages pertinent to the basic SAR-ATR system structure are defined, and the motivations of the requirements and constraints on the system constituents are addressed. For each stage in the SAR-ATR processing chain, a taxonomization methodology for surveying the numerous methods published in the open literature is proposed. Carefully selected works from the literature are presented under the taxa proposed. Novel comparisons, discussions, and comments are pinpointed throughout this paper. A two-fold benchmarking scheme for evaluating existing SAR-ATR systems and motivating new system designs is proposed. The scheme is applied to the works surveyed in this paper. Finally, a discussion is presented in which various interrelated issues, such as standard operating conditions, extended operating conditions, and target-model design, are addressed. This paper is a contribution toward fulfilling an objective of end-to-end SAR-ATR system design.

Does cochlear implantation improve speech recognition in children with auditory neuropathy spectrum disorder? A systematic review

Relevance:

100.00%

Abstract:

OBJECTIVE: Cochlear implantation (CI) is a standard treatment for severe-profound sensorineural hearing loss (SNHL). However, consensus has yet to be reached on its effectiveness for hearing loss caused by auditory neuropathy spectrum disorder (ANSD). This review aims to summarize and synthesize current evidence of the effectiveness of CI in improving speech recognition in children with ANSD. DESIGN: Systematic review. STUDY SAMPLE: A total of 27 studies from an initial selection of 237. RESULTS: All selected studies were observational in design, including case studies, cohort studies, and comparisons between children with ANSD and SNHL. Most children with ANSD achieved open-set speech recognition with their CI. Speech recognition ability was found to be equivalent in CI users (who previously performed poorly with hearing aids) and hearing-aid users. Outcomes following CI generally appeared similar in children with ANSD and SNHL. Assessment of study quality, however, suggested substantial methodological concerns, particularly in relation to issues of bias and confounding, limiting the robustness of any conclusions around effectiveness. CONCLUSIONS: Currently available evidence is compatible with favourable outcomes from CI in children with ANSD. However, this evidence is weak. Stronger evidence is needed to support cost-effective clinical policy and practice in this area.

Speech recognition using units smaller than the word, with Wavelet Packet and SVM, in a new hierarchical decision structure

Relevance:

100.00%

Abstract:

Automatic speech recognition by machine has been a research target for the past five decades. This period has seen numerous advances, for example in the recognition of isolated words (commands), which now achieves very high recognition rates. However, we are still far from a system whose performance on continuous speech matches that of a human being. One of the great challenges in continuous speech recognition research is the large number of patterns: modern languages such as English, French, Spanish, and Portuguese have approximately 500,000 words, or patterns, to be identified. The purpose of this study is to use units smaller than the word, such as phonemes, syllables, and diphones, as the basis for speech recognition, aiming to recognize any word without necessarily using the words themselves as recognition units. The main goal is to reduce the restriction imposed by the excessive number of patterns. To validate this proposal, the system was tested on isolated word recognition in the speaker-dependent case. The phonemic characteristics of Brazilian Portuguese were used to develop the hierarchical decision system. The decisions are made by Support Vector Machines (SVMs). The main speech features were obtained from the Wavelet Packet Transform; MFCC (Mel-Frequency Cepstral Coefficient) descriptors are also used. It was concluded that the proposed method showed good results in the recognition of vowels, consonants (syllables), and words when compared with other existing methods in the literature.
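As a rough illustration of the kind of features a Wavelet Packet Transform can supply, the sketch below computes subband energies from a full packet tree. It is not the authors' method: the Haar wavelet, the tree depth, and the use of node energies as features are all assumptions made for the example, and the function names are hypothetical.

```python
import math


def haar_step(signal):
    """Split an even-length signal into (approximation, detail) halves."""
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_packet_energies(signal, depth=3):
    """Energy of every node at `depth` of a full Haar packet tree.

    Assumption: Haar filters and energy features are illustrative choices.
    Nodes are returned in natural (not frequency) order.
    """
    if depth < 0 or len(signal) % (2 ** depth) != 0:
        raise ValueError("signal length must be a multiple of 2**depth")
    nodes = [list(signal)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            # Unlike a plain wavelet transform, both halves are split again.
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return [sum(x * x for x in node) for node in nodes]
```

Because the Haar step is orthonormal, the node energies sum to the energy of the input frame, so the resulting vector describes how that energy is distributed across subbands. A classifier such as an SVM could then be trained on these vectors.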

Free tools and resources for speech recognition and synthesis in Brazilian Portuguese

Relevance:

100.00%

Abstract:

Speech recognition and synthesis systems consist of language-dependent modules and, while many public resources exist for some languages (e.g., English and Japanese), resources for Brazilian Portuguese (BP) are still scarce. Another issue is that, for a large number of tasks, the error rate of current speech recognition systems is still high compared with that of humans. Thus, despite the success of hidden Markov models (HMMs), research into new methods is needed. This work is motivated by these two facts and is divided into two parts. The first describes the development of free resources and tools for speech recognition and synthesis in BP, consisting of audio and text corpora, a phonetic dictionary, a grapheme-to-phone converter, a syllabifier, and acoustic and language models. All the resources built are publicly available and, together with a proposed programming interface, have been used to develop several new real-time applications, including a speech recognition module for the OpenOffice.org office suite. Performance tests of the developed systems are presented. The resources produced and made available here facilitate the adoption of speech technology for BP by other research groups, developers, and industry. The second part of the work presents a new method for rescoring the output of HMM-based recognition, which is organized in a lattice data structure. More specifically, the system uses discriminative classifiers that seek to reduce the confusion between pairs of phones. For each of these binary problems, automatic parameter selection techniques are used to choose the most suitable parametric representation for the problem at hand.

Advances in speech recognition for Brazilian Portuguese and applications: dictation in LibreOffice and an interactive voice response unit with Asterisk

Relevance:

100.00%

Abstract:

Automatic speech recognition is becoming increasingly useful and feasible. For languages such as English, excellent recognizers are available on the market. However, the situation is not the same for Brazilian Portuguese, for which the main desktop dictation recognizers that once existed have been discontinued. This dissertation is aligned with the objective of the Signal Processing Laboratory (LaPS) of the Universidade Federal do Pará, namely the development of an automatic speech recognizer for Brazilian Portuguese. More specifically, the main contributions of this dissertation are: the development of some of the resources needed to build a recognizer, such as transcribed audio corpora and an API for application development; and the development of two applications, one for dictation on desktop systems and another for automated service in a call center. Coruja is the system developed at LaPS for speech recognition in Brazilian Portuguese. Besides containing all the resources needed to provide speech recognition in Brazilian Portuguese, it has an API for application development. The application developed for dictation and text editing on the desktop is SpeechOO, which enables dictation into the Writer tool of the LibreOffice suite and also allows text to be edited and formatted with voice commands. Another contribution of this work is the use of automatic speech recognition in call centers: Coruja was integrated with the Asterisk software, and the main application developed was an interactive voice response unit with speech recognition for a national call center that handles more than 3,000 calls per day.

Free-software-based system for cloud speech recognition in Brazilian Portuguese with high availability

Relevance:

100.00%

Abstract:

This work proposes a solution comprising a cloud-based automatic speech recognition system. With it, there is no need for a recognizer running on the client machine itself, since the recognizer is available over the Internet. Besides cloud-based automatic speech recognition, the other strand of this work is high availability. This topic matters because the server environment where cloud recognition is to run cannot become unavailable to the user. Of the various aspects that require robustness, such as the Internet connection itself, the scope of this work was defined as the free software that allows companies to increase the availability of their services. Among the results achieved, and for the simulated conditions, the cloud speech recognizer developed by the group was shown to reach performance close to Google's.

Speech recognition for automation applications implemented on an FPGA

Relevance:

100.00%

Abstract:

In many science fiction movies, machines are capable of speaking with humans. However, mankind is still far from building such machines, like the famous character C3PO of Star Wars. Over the last six decades, automatic speech recognition systems have been the target of many studies. Throughout these years, many techniques were developed for use in both software and hardware applications. There are many types of automatic speech recognition system; the one used in this work is a speaker-independent, isolated-word system that uses Hidden Markov Models for recognition. The goal of this work is to design and synthesize the first two steps of the speech recognition system: speech signal acquisition and signal pre-processing. Both steps were developed on a reprogrammable component, an FPGA, using the VHDL hardware description language, owing to the high performance of this component and the flexibility of the language. This work presents the theory of digital signal processing, such as Fast Fourier Transforms and digital filters, as well as the theory of speech recognition using Hidden Markov Models and an LPC processor. It also presents the results obtained for each of the blocks synthesized and verified in hardware.

Documenting sound change with smartphone apps

Relevance:

100.00%

Abstract:

Crowdsourcing linguistic phenomena with smartphone applications is relatively new. Apps have been used to train acoustic models for automatic speech recognition (de Vries et al. 2014) and to archive endangered languages (Iwaidja Inyaman Team 2012). Leemann and Kolly (2013) developed a free app for iOS, Dialäkt Äpp (DÄ) (>78k downloads), to document language change in Swiss German. Here, we present results of sound change based on DÄ data. DÄ predicts the users' dialects: for 16 variables, users select their dialectal variant. DÄ then tells users which dialect they speak. Underlying this prediction are maps from the Linguistic Atlas of German-speaking Switzerland (SDS, 1962-2003), which documents the linguistic situation around 1950. If predicted wrongly, users indicate their actual dialect. With this information, the 16 variables can be assessed for language change. Results revealed robustness of phonetic variables; lexical and morphological variables were more prone to change. Phonetic variables like to lift (three variants) revealed SDS agreement scores of nearly 85%, i.e., little sound change. Not all phonetic variables are equally robust: ladle (five variants) exhibited significant sound change. We will illustrate the results using maps that show details of the sound changes at hand.
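The prediction and agreement scoring described in this abstract can be sketched as follows. The abstract does not say how the app combines the 16 variables, so simple match counting is an assumption, and the locality names, variable keys, and variant labels below are hypothetical placeholders rather than atlas data.

```python
def predict_dialect(user_choices, atlas):
    """Return (best_localities, scores), scoring each locality by the number
    of variables on which the user's variant matches the atlas variant.

    Assumption: plain match counting; the real app's scoring is not
    described in the abstract.
    """
    scores = {
        locality: sum(
            1
            for variable, choice in user_choices.items()
            if variants.get(variable) == choice
        )
        for locality, variants in atlas.items()
    }
    best = max(scores.values())
    winners = sorted(loc for loc, score in scores.items() if score == best)
    return winners, scores


def agreement_score(responses, atlas, variable):
    """Fraction of users whose variant matches the atlas entry for the
    locality they report as their actual dialect."""
    relevant = [r for r in responses if variable in r["choices"]]
    if not relevant:
        return 0.0
    matches = sum(
        1
        for r in relevant
        if atlas[r["locality"]].get(variable) == r["choices"][variable]
    )
    return matches / len(relevant)


# Hypothetical toy data: two localities, two variables, made-up variant labels.
ATLAS = {
    "locality_a": {"lift": "variant_1", "ladle": "variant_1"},
    "locality_b": {"lift": "variant_2", "ladle": "variant_1"},
}
```

Under this reading, a high `agreement_score` for a variable means present-day users still choose the variant the atlas recorded for their locality, which corresponds to little change for that variable.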

Techniques for the analysis, characterization, and detection of speech signals in adverse acoustic environments

Relevance:

100.00%

Abstract:

This thesis aims to improve the robustness of Voice Activity Detection in adverse acoustic environments in order to enhance the behavior of many speech applications, for example telephony applications based on automatic speech recognition, automatic transcription applications, and multichannel systems. In particular, although all types of noise have been taken into account, this research is especially interested in background voices, that is, utterances coming from far-field speakers, which are the main source of error in most activity detectors today. The starting point for the work is a Voice Activity Detector based on Hidden Markov Models whose feature vector contains two components: normalized energy and delta energy. The key contributions of this thesis are the following: 1) extension of the feature vector to provide spectral information, 2) adjustment of the Hidden Markov Models to the environment and study of different Hidden Markov Model topologies and, finally, 3) study and inclusion of new features, different from those of point 1, to reject utterance pulses coming from far-field speakers. Detection results, taking the above three points into account, clearly show the progress made and are significantly better than the results obtained under the same conditions by other well-known voice activity detectors.
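The two-component feature vector named in this abstract (normalized energy and delta energy) might be computed along the following lines. This is a minimal sketch under stated assumptions: the frame length, hop size, log-energy scale, peak normalization, and first-difference delta are illustrative choices, not the definitions used in the thesis, and the function names are hypothetical.

```python
import math


def frame_log_energies(samples, frame_len=160, hop=80):
    """Log energy of each frame (frame_len and hop are illustrative values)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        energies.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return energies


def energy_features(samples, frame_len=160, hop=80):
    """Return one (normalized_energy, delta_energy) pair per frame.

    Assumptions: energy is normalized by subtracting the utterance peak,
    and delta is the difference from the previous frame.
    """
    log_e = frame_log_energies(samples, frame_len, hop)
    if not log_e:
        return []
    peak = max(log_e)
    normalized = [e - peak for e in log_e]  # 0.0 for the loudest frame
    deltas = [0.0] + [
        normalized[i] - normalized[i - 1] for i in range(1, len(normalized))
    ]
    return list(zip(normalized, deltas))
```

Each pair would serve as one observation for the detector's Hidden Markov Models. The spectral and far-field rejection features the thesis adds would extend this vector with further components.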

Voice-processing technologies--their application in telecommunications.

Relevance:

100.00%

Abstract:

As the telecommunications industry evolves over the next decade to provide the products and services that people will desire, several key technologies will become commonplace. Two of these, automatic speech recognition and text-to-speech synthesis, will provide users with more freedom on when, where, and how they access information. While these technologies are currently in their infancy, their capabilities are rapidly increasing and their deployment in today's telephone network is expanding. The economic impact of just one application, the automation of operator services, is well over $100 million per year. Yet there still are many technical challenges that must be resolved before these technologies can be deployed ubiquitously in products and services throughout the worldwide telephone network. These challenges include: (i) High level of accuracy. The technology must be perceived by the user as highly accurate, robust, and reliable. (ii) Easy to use. Speech is only one of several possible input/output modalities for conveying information between a human and a machine, much like a computer terminal or Touch-Tone pad on a telephone. It is not the final product. Therefore, speech technologies must be hidden from the user. That is, the burden of using the technology must be on the technology itself. (iii) Quick prototyping and development of new products and services. The technology must support the creation of new products and services based on speech in an efficient and timely fashion. In this paper I present a vision of the voice-processing industry with a focus on the areas with the broadest base of user penetration: speech recognition, text-to-speech synthesis, natural language processing, and speaker recognition technologies. The current and future applications of these technologies in the telecommunications industry will be examined in terms of their strengths, limitations, and the degree to which user needs have been or have yet to be met. 
Although noteworthy gains have been made in areas with potentially small user bases and in the more mature speech-coding technologies, these subjects are outside the scope of this paper.

Google Translate and Microsoft Translator - Evaluation of two applications for automatic speech translation and analysis of an evolving technology.

Relevance:

100.00%

Abstract:

This work focuses on Machine Translation (MT) and Speech-to-Speech Translation, two emerging technologies that allow users to automatically translate written and spoken texts. The first part of this work provides a theoretical framework for the evaluation of Google Translate and Microsoft Translator, which is at the core of this study. Chapter one focuses on Machine Translation, providing a definition of this technology and glimpses of its history. In this chapter we will also learn how MT works, who uses it, for what purpose, what its pros and cons are, and how machine translation quality can be defined and assessed. Chapter two deals with Speech-to-Speech Translation by focusing on its history, characteristics and operation, potential uses and limits deriving from the intrinsic difficulty of translating spoken language. After describing the future prospects for SST, the final part of this chapter focuses on the quality assessment of Speech-to-Speech Translation applications. The last part of this dissertation describes the evaluation test carried out on Google Translate and Microsoft Translator, two mobile translation apps also providing a Speech-to-Speech Translation service. Chapter three illustrates the objectives, the research questions, the participants, the methodology and the elaboration of the questionnaires used to collect data. The collected data and the results of the evaluation of the automatic speech recognition subsystem and the language translation subsystem are presented in chapter four and finally analysed and compared in chapter five, which provides a general description of the performance of the evaluated apps and possible explanations for each set of results. In the final part of this work suggestions are made for future research and reflections on the usability and usefulness of the evaluated translation apps are provided.

Study and development of a framework for speech recognition in the context of Hands-Free systems

Relevance:

100.00%

Abstract:

In recent years, the incredibly rapid advance of technology has led to the development and spread of portable electronic devices that are extremely small and, at the same time, have considerable computational capabilities. More specifically, one category of devices currently undergoing strong development, and which has already appeared on the world market, is that of Wearable devices. As the name suggests, these are designed to be literally worn, and are intended to provide continuous support to their users in various settings. If the user does not necessarily have to use their hands to interact with them, they are called Wearable Hands-Free devices. These are generally able to perceive and capture the user's input using various techniques and methodologies not based on touch. One of these models the user's input through their voice, relying on the discipline of ASR (Automatic Speech Recognition), which deals with converting spoken language into text by means of computerized devices. This leads to the objective of the thesis, which is to develop a framework, usable in the context of Wearable devices, that provides a speech recognition service by relying on an existing one, so that it offers a certain level of efficiency and ease of use. More generally, this document aims to provide an in-depth description of Wearable and Wearable Hands-Free devices, defining their characteristics, critical issues, and areas of use. In addition, the intent is to illustrate the operating principles of Automatic Speech Recognition and then move on to the analysis, design, and development of the aforementioned framework.

(Semi-)automatic subtitling and translation of learning material

Relevance:

100.00%

Abstract:

This presentation summarizes experience with the automated speech recognition and translation approach realised in the context of the European project EMMA.

Augmented Audiovisual Translation: the Perks and Perils of the Implementation of Artificial Intelligence in Subtitling and Dubbing

Relevance:

100.00%

Abstract:

This thesis examines the state of audiovisual translation (AVT) in the aftermath of the COVID-19 emergency, highlighting new trends with regard to the implementation of AI technologies as well as their strengths, constraints, and ethical implications. It starts with an overview of the current AVT landscape, focusing on future projections about its evolution and its critical aspects such as the worsening working conditions lamented by AVT professionals, especially freelancers, in recent years and how they might be affected by the advent of AI technologies in the industry. The second chapter delves into the history and development of three AI technologies which are used in combination with neural machine translation in automatic AVT tools: automatic speech recognition, speech synthesis and deepfakes (voice cloning and visual deepfakes for lip syncing), including real examples of start-up companies that utilize them, or are planning to do so, to localize audiovisual content automatically or semi-automatically. The third chapter explores the many ethical concerns around these innovative technologies, which extend far beyond the field of translation; at the same time, it attempts to vindicate their potential to bring about immense progress in terms of accessibility and international cooperation, provided that their use is properly regulated. Lastly, the fourth chapter describes two experiments, testing the efficacy of the currently available tools for automatic subtitling and automatic dubbing respectively, in order to take a closer look at their perks and limitations compared to more traditional approaches. This analysis aims to help discern legitimate concerns from unfounded speculations with regard to the AI technologies which are entering the field of AVT; the intention behind it is to humbly suggest a constructive and optimistic view of the technological transformations that appear to be underway, whilst also acknowledging their potential risks.

The future of dubbing: an overview of the new technologies

Relevance:

100.00%

Abstract:

Throughout the years, technology has had an undeniable impact on the AVT field. It has revolutionized the way audiovisual content is consumed by allowing audiences to easily access it at any time and on any device. Especially after the introduction of OTT streaming platforms such as Netflix, Amazon Prime Video, Disney+, Apple TV+, and HBO Max, which offer a vast catalog of national and international products, the consumption of audiovisual products has been on a constant rise and, consequently, the demand for localized content too. In turn, the AVT industry resorts to new technologies and practices to handle the ever-growing workload and the faster turnaround times. Due to the numerous implications that it has on the industry, technological advancement can be considered an area of research of particular interest for the AVT studies. However, in the case of dubbing, research and discussion regarding the topic is lagging behind because of the more limited impact that technology has had on the very conservative dubbing industry. Therefore, the aim of the dissertation is to offer an overview of some of the latest technological innovations and practices that have already been implemented (i.e. cloud dubbing and DeepDub technology) or that are still under development and research (i.e. automatic speech recognition and respeaking, machine translation and post-editing, audio-based and visual-based dubbing techniques, text-based editing of talking-head videos, and automatic dubbing), and respectively discuss their reception by the industry professionals, and make assumptions about their future implementation in the dubbing field.
