951 results for Visual Speech Recognition, Multiple Views, Frontal View, Profile View


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Here we use two filtered speech tasks to investigate children’s processing of slow (<4 Hz) versus faster (∼33 Hz) temporal modulations in speech. We compare groups of children with either developmental dyslexia (Experiment 1) or speech and language impairments (SLIs, Experiment 2) to groups of typically-developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their modulation frequencies were either low-pass filtered (<4 Hz) or band-pass filtered (22–40 Hz). Recognition of the filtered nursery rhymes was tested in a picture recognition multiple choice paradigm. Children with dyslexia aged 10 years showed equivalent recognition overall to TD controls for both the low-pass and band-pass filtered stimuli, but showed significantly impaired acoustic learning during the experiment from low-pass filtered targets. Children with oral SLIs aged 9 years showed significantly poorer recognition of band-pass filtered targets compared to their TD controls, and showed comparable acoustic learning effects to TD children during the experiment. The SLI samples were also divided into children with and without phonological difficulties. The children with both SLI and phonological difficulties were impaired in recognizing both kinds of filtered speech. These data are suggestive of impaired temporal sampling of the speech signal at different modulation rates by children with different kinds of developmental language disorder. Both SLI and dyslexic samples showed impaired discrimination of amplitude rise times. Implications of these findings for a temporal sampling framework for understanding developmental language disorders are discussed.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Purpose: a) multiply handicapped children have a high incidence of disorders affecting the visual system; b) assessment and management of visual disorders in this group of children presents a complex challenge; c) this study describes the results of visual function assessment in two children with neurological disability over a one-year period.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We propose a study of the mathematical properties of the voice as an audio signal. This work includes signals in which the channel conditions are not ideal for emotion recognition. Multiresolution analysis (the discrete wavelet transform) was performed using the Daubechies wavelet family (Db1-Haar, Db6, Db8, Db10), allowing the initial audio signal to be decomposed into sets of coefficients from which a set of features was extracted and analyzed statistically in order to differentiate emotional states. Artificial neural networks (ANNs) proved to be a system that allows an appropriate classification of such states. This study shows that the features extracted using wavelet decomposition are sufficient to analyze and extract emotional content in audio signals, yielding a high accuracy rate in the classification of emotional states without the need for other kinds of classical frequency-time features. Accordingly, this paper seeks to characterize mathematically the six basic human emotions (boredom, disgust, happiness, anxiety, anger and sadness), together with neutrality, for a total of seven states to identify.
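The wavelet feature extraction described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it uses only the Haar (Db1) wavelet, and the function names `haar_dwt` and `wavelet_features`, the three-level depth, and the choice of energy, standard deviation and mean as statistics are all assumptions made for the example.

```python
import math
import statistics


def haar_dwt(signal):
    """One level of the Haar (Db1) DWT: returns (approximation, detail)."""
    samples = list(signal)
    if len(samples) % 2:
        samples.append(samples[-1])  # pad odd-length input by repeating the last sample
    s = math.sqrt(2.0)
    approx = [(samples[i] + samples[i + 1]) / s for i in range(0, len(samples), 2)]
    detail = [(samples[i] - samples[i + 1]) / s for i in range(0, len(samples), 2)]
    return approx, detail


def wavelet_features(signal, levels=3):
    """Decompose `signal` and return simple statistics for each sub-band.

    The statistics chosen here are hypothetical stand-ins for the
    feature set of the abstract, which is not specified in detail.
    """
    features = {}
    approx = list(signal)
    for level in range(1, levels + 1):
        approx, detail = haar_dwt(approx)
        features[f"d{level}_energy"] = sum(c * c for c in detail)
        features[f"d{level}_std"] = statistics.pstdev(detail)
    features[f"a{levels}_energy"] = sum(c * c for c in approx)
    features[f"a{levels}_mean"] = statistics.fmean(approx)
    return features


if __name__ == "__main__":
    # Toy input: a short sine burst standing in for a frame of speech.
    frame = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
    for name, value in wavelet_features(frame).items():
        print(f"{name}: {value:.4f}")
```

The resulting feature dictionary is the kind of fixed-length vector that could be passed to a classifier such as an ANN. Longer Daubechies filters (Db6, Db8, Db10) follow the same scheme with different filter coefficients.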

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis examines the state of audiovisual translation (AVT) in the aftermath of the COVID-19 emergency, highlighting new trends with regard to the implementation of AI technologies as well as their strengths, constraints, and ethical implications. It starts with an overview of the current AVT landscape, focusing on projections about its evolution and on critical aspects such as the worsening working conditions lamented in recent years by AVT professionals, especially freelancers, and how these might be affected by the advent of AI technologies in the industry. The second chapter delves into the history and development of three AI technologies that are used in combination with neural machine translation in automatic AVT tools: automatic speech recognition, speech synthesis and deepfakes (voice cloning and visual deepfakes for lip syncing), including real examples of start-up companies that use them, or plan to do so, to localize audiovisual content automatically or semi-automatically. The third chapter explores the many ethical concerns around these innovative technologies, which extend far beyond the field of translation; at the same time, it attempts to vindicate their potential to bring about immense progress in terms of accessibility and international cooperation, provided that their use is properly regulated. Lastly, the fourth chapter describes two experiments testing the efficacy of the currently available tools for automatic subtitling and automatic dubbing respectively, in order to take a closer look at their advantages and limitations compared to more traditional approaches. This analysis aims to help distinguish legitimate concerns from unfounded speculation with regard to the AI technologies that are entering the field of AVT; the intention behind it is to humbly suggest a constructive and optimistic view of the technological transformations that appear to be underway, whilst also acknowledging their potential risks.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Over the years, technology has had an undeniable impact on the AVT field. It has revolutionized the way audiovisual content is consumed by allowing audiences to access it easily at any time and on any device. Especially since the introduction of OTT streaming platforms such as Netflix, Amazon Prime Video, Disney+, Apple TV+, and HBO Max, which offer a vast catalog of national and international products, the consumption of audiovisual products has been constantly rising and, consequently, so has the demand for localized content. In turn, the AVT industry resorts to new technologies and practices to handle the ever-growing workload and the faster turnaround times. Because of its numerous implications for the industry, technological advancement can be considered an area of research of particular interest for AVT studies. However, in the case of dubbing, research and discussion on the topic are lagging behind because of the more limited impact that technology has had on the very conservative dubbing industry. Therefore, the aim of the dissertation is to offer an overview of some of the latest technological innovations and practices that have already been implemented (i.e. cloud dubbing and DeepDub technology) or that are still under development and research (i.e. automatic speech recognition and respeaking, machine translation and post-editing, audio-based and visual-based dubbing techniques, text-based editing of talking-head videos, and automatic dubbing), to discuss their reception by industry professionals, and to make assumptions about their future implementation in the dubbing field.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The differences in spectral shape resolution abilities among cochlear implant (CI) listeners, and between CI and normal-hearing (NH) listeners, when listening with the same number of channels (12), were investigated. In addition, the effect of the number of channels on spectral shape resolution was examined. The stimuli were rippled noise signals with various ripple frequency-spacings. An adaptive 4IFC procedure was used to determine the threshold for resolvable ripple spacing, which was the spacing at which an interchange in peak and valley positions could be discriminated. The results showed poorer spectral shape resolution in CI compared to NH listeners (average thresholds of approximately 3000 and 400 Hz, respectively), and wide variability among CI listeners (range of approximately 800 to 8000 Hz). There was a significant relationship between spectral shape resolution and vowel recognition. The spectral shape resolution thresholds of NH listeners increased as the number of channels increased from 1 to 16, while the CI listeners showed a performance plateau at 4–6 channels, which is consistent with previous results using speech recognition measures. These results indicate that this test may provide a measure of CI performance which is time efficient and non-linguistic, and therefore, if verified, may provide a useful contribution to the prediction of speech perception in adults and children who use CIs.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In Experiment 1, color-naming interference for target stimuli following associated primes was greater in a group making a lexical decision to the prime than in a group reading the prime silently. High-frequency targets were responded to more quickly than low-frequency targets. In Experiment 2, with subjects naming the prime, there was evidence of associative interference when the prime and the target were grouped temporally but not when the intertrial interval was comparable with the prime-target interval. Associative primes presented at a short (120-msec) prime-target stimulus onset asynchrony facilitated color naming in Experiment 3. Taken together, the results suggest that the effect of faster processing of the base word in a color-naming task is facilitatory and that color-naming priming interference arises when associative prime processing increases conflict between word and color responses by enhancing phonological or articulatory activation of the base word.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Audiometry is the main method by which hearing is evaluated, because it is a universal and standardized test. Speech tests are difficult to standardize because of the variables involved, but their performance in the presence of competitive noise is of great importance. Aim: To characterize speech intelligibility in silence and in competitive noise in individuals exposed to electronically amplified music. Material and Method: The study was performed with 20 university students who presented normal hearing thresholds. The speech recognition rate (SRR) was measured after fourteen hours of sound rest after the exposure to electronically amplified music and once again after sound rest, and was studied in three stages: without competitive noise, and in the presence of babble-type competitive noise in monotic listening at a signal/noise ratio of +5 dB and at a signal/noise ratio of −5 dB. Results: There was greater damage to the SRR after exposure to the music and with competitive noise, and as the signal/noise ratio decreased, the performance of individuals in the test also decreased. Conclusion: The inclusion of competitive noise in the speech tests of the audiological routine is important, because it represents the real disadvantage experienced by individuals in daily listening.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

University students spelled low-frequency words to dictation and subsequently made lexical decisions to them. In Experiment 1, lexical decisions were slower on words students had spelled incorrectly relative to words they had spelled correctly, and there was a larger repetition benefit for incorrectly spelled words. In Experiment 2, the latency advantage for items spelled correctly was replicated when words were presented for only 200 ms and also in a spelling recognition task. In Experiment 3, masked identity and form priming effects were similar for words that had been spelled correctly and incorrectly. Item spelling accuracy tracked word frequency effects in the way that it combined with repetition and priming effects. We infer that an individual's learning with a word's orthography underlies word frequency and item spelling accuracy effects and that a single orthographic lexicon serves visual word recognition and spelling. (C) 2000 Elsevier Science (USA).

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Biogenic amines and their receptors regulate and modulate many physiological and behavioural processes in animals. In vertebrates, octopamine is only found in trace amounts and its function as a true neurotransmitter is unclear. In protostomes, however, octopamine can act as neurotransmitter, neuromodulator and neurohormone. In the honeybee, octopamine acts as a neuromodulator and is involved in learning and memory formation. The identification of potential octopamine receptors is decisive for an understanding of the cellular pathways involved in mediating the effects of octopamine. Here we report the cloning and functional characterization of the first octopamine receptor from the honeybee, Apis mellifera. The gene was isolated from a brain-specific cDNA library. It encodes a protein most closely related to octopamine receptors from Drosophila melanogaster and Lymnaea stagnalis. Signalling properties of the cloned receptor were studied in transiently transfected human embryonic kidney (HEK) 293 cells. Nanomolar to micromolar concentrations of octopamine induced oscillatory increases in the intracellular Ca2+ concentration. In contrast to octopamine, tyramine only elicited Ca2+ responses at micromolar concentrations. The gene is abundantly expressed in many somata of the honeybee brain, suggesting that this octopamine receptor is involved in the processing of sensory inputs, antennal motor outputs and higher-order brain functions.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In the last twenty years genetic algorithms (GAs) have been applied in a plethora of fields such as control, system identification, robotics, planning and scheduling, image processing, and pattern and speech recognition (Bäck et al., 1997). In robotics, the problems of trajectory planning, collision avoidance and manipulator structure design considering a single criterion have been solved using several techniques (Alander, 2003). Most engineering applications, however, require the optimization of several criteria simultaneously. Often the problems are complex, include discrete and continuous variables, and there is no prior knowledge about the search space. These kinds of problems are considerably more complex, since they consider multiple design criteria simultaneously within the optimization procedure. This is known as multi-criteria (or multi-objective) optimization, which has been addressed successfully through GAs (Deb, 2001). The overall aim of multi-criteria evolutionary algorithms is to achieve a set of non-dominated optimal solutions known as the Pareto front. At the end of the optimization procedure, instead of a single optimal (or near optimal) solution, the decision maker can select a solution from the Pareto front. Some of the key issues in multi-criteria GAs are: i) the number of objectives, ii) obtaining a Pareto front that is as wide as possible and iii) achieving a uniformly spread Pareto front. Indeed, multi-objective techniques using GAs have been increasing in relevance as a research area. In 1989, Goldberg suggested the use of a GA to solve multi-objective problems, and since then other researchers have been developing new methods, such as the multi-objective genetic algorithm (MOGA) (Fonseca & Fleming, 1995), the non-dominated sorting genetic algorithm (NSGA) (Deb, 2001), and the niched Pareto genetic algorithm (NPGA) (Horn et al., 1994), among several other variants (Coello, 1998).
In this work the trajectory planning problem considers: i) robots with 2 and 3 degrees of freedom (dof), ii) the inclusion of obstacles in the workspace and iii) up to five criteria that are used to qualify the evolving trajectory, namely the joint traveling distance, joint velocity, end effector / Cartesian distance, end effector / Cartesian velocity and energy involved. These criteria are used to minimize the joint and end effector traveled distance, the trajectory ripple and the energy required by the manipulator to reach the destination point. Bearing these ideas in mind, the chapter addresses the planning of robot trajectories, meaning the development of an algorithm to find a continuous motion that takes the manipulator from a given starting configuration to a desired end position without colliding with any obstacle in the workspace. The chapter is organized as follows. Section 2 describes trajectory planning and several approaches proposed in the literature. Section 3 formulates the problem, namely the representation adopted to solve the trajectory planning and the objectives considered in the optimization. Section 4 studies the algorithm convergence. Section 5 studies a 2R manipulator (i.e., a robot with two rotational joints/links) when the trajectory optimization considers two and five objectives. Sections 6 and 7 present the results for the 3R redundant manipulator with five goals and describe other complementary experiments, respectively. Finally, section 8 draws the main conclusions.
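The notion of a non-dominated (Pareto) set used in this abstract can be made concrete with a short sketch. It assumes that all objectives are to be minimized, as with the distance, ripple and energy criteria mentioned above. The function names `dominates` and `pareto_front` and the toy candidate values are hypothetical, and this brute-force filter is not the selection scheme of MOGA, NSGA or NPGA.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates (O(n^2) filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Toy candidates: (traveled distance, energy) for five trajectories.
    candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.5)]
    print(pareto_front(candidates))
```

In this toy set, `(3.0, 4.0)` and `(2.5, 3.5)` are both dominated by `(2.0, 3.0)`, so the decision maker would choose among the remaining three trade-offs.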

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this work an adaptive modeling and spectral estimation scheme based on a dual Discrete Kalman Filtering (DKF) is proposed for speech enhancement. Both speech and noise signals are modeled by an autoregressive structure which provides an underlying time frame dependency and improves time-frequency resolution. The model parameters are arranged to obtain a combined state-space model and are also used to calculate instantaneous power spectral density estimates. The speech enhancement is performed by a dual discrete Kalman filter that simultaneously gives estimates for the models and the signals. This approach is particularly useful as a pre-processing module for parametric based speech recognition systems that rely on spectral time dependent models. The system performance has been evaluated by a set of human listeners and by spectral distances. In both cases the use of this pre-processing module has led to improved results.
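As a rough illustration of Kalman filtering of an autoregressive signal in noise, the sketch below runs a scalar Kalman filter on a first-order AR model. It is a deliberate simplification and not the scheme of the abstract: the AR coefficient `a` and the variances `q` and `r` are assumed to be known, whereas the dual filter described above estimates the model parameters and the signals simultaneously. All names (`simulate_ar1`, `kalman_ar1`, `mse`) and the parameter values are hypothetical.

```python
import math
import random


def simulate_ar1(n, a, q, r, seed=0):
    """Return (clean, noisy): an AR(1) signal and the same signal in white noise."""
    rng = random.Random(seed)
    clean, noisy = [], []
    s = 0.0
    for _ in range(n):
        s = a * s + rng.gauss(0.0, math.sqrt(q))         # AR(1) state recursion
        clean.append(s)
        noisy.append(s + rng.gauss(0.0, math.sqrt(r)))   # additive observation noise
    return clean, noisy


def kalman_ar1(observations, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for s[n] = a*s[n-1] + w[n], y[n] = s[n] + v[n],
    with Var(w) = q and Var(v) = r assumed known."""
    x, p = x0, p0
    estimates = []
    for y in observations:
        x_pred = a * x                     # predict the state
        p_pred = a * a * p + q             # predict the error variance
        k = p_pred / (p_pred + r)          # Kalman gain
        x = x_pred + k * (y - x_pred)      # correct with the new observation
        p = (1.0 - k) * p_pred             # update the error variance
        estimates.append(x)
    return estimates


def mse(u, v):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) / len(u)


if __name__ == "__main__":
    clean, noisy = simulate_ar1(2000, a=0.95, q=0.1, r=1.0, seed=1)
    enhanced = kalman_ar1(noisy, a=0.95, q=0.1, r=1.0)
    print("MSE of noisy signal:   ", round(mse(noisy, clean), 3))
    print("MSE of filtered signal:", round(mse(enhanced, clean), 3))
```

A higher-order AR model would replace the scalar state with a vector and the scalar products with matrix operations, but the predict and correct steps keep the same form.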

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Speech interfaces for Assistive Technologies are not common and are usually replaced by other interfaces. The market they target is not considered attractive, and speech technologies are still not widespread. Industry still thinks they present some performance risks, especially Speech Recognition systems. As speech is the most elemental and natural means of communication, it has strong potential for enhancing inclusion and quality of life for broader groups of users with special needs, such as people with cerebral palsy and elderly people living at home. This work is a position paper in which the authors argue for the need to make speech the basic interface in assistive technologies. The main arguments are the following: speech is the easiest way to interact with machines; there is a growing market for embedded speech in assistive technologies, since the number of disabled and elderly people is expanding; speech technology is already mature enough to be used but needs adaptation to people with special needs; and there is still a lot of R&D to be done in this area, especially with regard to the Portuguese market. The main challenges are presented and future directions are proposed.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The registration of full 3-D models is an important task in computer vision. Range finders only reconstruct a partial view of the object. Many authors have proposed techniques to register 3-D surfaces from multiple views, in which there are basically two aspects to consider. First, poor registration, in which some sort of correspondences are established. Second, accurate registration, in order to obtain a better solution. A survey of the most common techniques is presented, including experimental results for some of them.
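Once correspondences between two views are available, a rigid transform can be fitted to them in the least-squares sense. The sketch below shows this step for the 2-D case, where the optimal rotation angle has a closed form. This is a simplification chosen for brevity: the abstract concerns 3-D surfaces and does not name a specific method. The function names `fit_rigid_2d` and `apply_rigid_2d` and the sample points are hypothetical.

```python
import math


def apply_rigid_2d(points, theta, translation):
    """Rotate each point by `theta` (radians) about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    tx, ty = translation
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def fit_rigid_2d(source, target):
    """Least-squares rotation and translation mapping `source` onto `target`.

    `source[i]` is assumed to correspond to `target[i]`.
    Returns (theta, (tx, ty)).
    """
    n = len(source)
    csx = sum(p[0] for p in source) / n
    csy = sum(p[1] for p in source) / n
    ctx = sum(p[0] for p in target) / n
    cty = sum(p[1] for p in target) / n
    cross = dot = 0.0
    for (xs, ys), (xt, yt) in zip(source, target):
        xs, ys, xt, yt = xs - csx, ys - csy, xt - ctx, yt - cty  # center both sets
        cross += xs * yt - ys * xt
        dot += xs * xt + ys * yt
    theta = math.atan2(cross, dot)  # optimal rotation angle
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    return theta, (ctx - (c * csx - s * csy), cty - (s * csx + c * csy))


if __name__ == "__main__":
    view_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
    view_b = apply_rigid_2d(view_a, 0.3, (1.0, -2.0))
    print(fit_rigid_2d(view_a, view_b))
```

With exact correspondences the fit recovers the transform directly. With only approximate correspondences, an iterative scheme would alternate between re-estimating correspondences and repeating a fit of this kind.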