25 results for Digit speech recognition

at Instituto Politécnico do Porto, Portugal


Relevance:

90.00%

Publisher:

Abstract:

In this work, an adaptive modeling and spectral estimation scheme based on dual discrete Kalman filtering (DKF) is proposed for speech enhancement. Both the speech and noise signals are modeled by an autoregressive structure, which provides an underlying time-frame dependency and improves time-frequency resolution. The model parameters are arranged to obtain a combined state-space model and are also used to calculate instantaneous power spectral density estimates. Speech enhancement is performed by a dual discrete Kalman filter that simultaneously estimates the models and the signals. This approach is particularly useful as a pre-processing module for parametric speech recognition systems that rely on time-dependent spectral models. System performance was evaluated by a set of human listeners and by spectral distances. In both cases, the use of this pre-processing module led to improved results.
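The state-space idea in this abstract can be illustrated with a minimal sketch. It covers only the signal-estimation half: the autoregressive coefficients and both variances are assumed to be known in advance, whereas the abstract describes estimating models and signals together with a dual filter, which is not reproduced here. The function name `kalman_ar_denoise` and its parameters are hypothetical and chosen for illustration.

```python
def kalman_ar_denoise(noisy, ar_coeffs, process_var, noise_var):
    """Kalman-filter a noisy sequence, assuming the clean signal is AR(p).

    Assumptions (not from the abstract): ar_coeffs, process_var and
    noise_var are known; the observation is clean sample + white noise.
    """
    p = len(ar_coeffs)
    x = [0.0] * p  # state: last p clean samples, newest first
    # Initial state covariance: identity matrix
    P = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    out = []
    for y in noisy:
        # Predict state with the companion-form transition matrix F:
        # the new sample is the AR combination, the others shift down.
        x_pred = [sum(a * xi for a, xi in zip(ar_coeffs, x))] + x[:-1]
        # F @ P
        FP = [[sum(ar_coeffs[k] * P[k][j] for k in range(p)) for j in range(p)]]
        FP += [row[:] for row in P[:-1]]
        # (F @ P) @ F^T, then add process noise on the newest sample
        P_pred = [[sum(ar_coeffs[k] * FP[i][k] for k in range(p))] + FP[i][:-1]
                  for i in range(p)]
        P_pred[0][0] += process_var
        # Update with the scalar observation y = x[0] + measurement noise
        s = P_pred[0][0] + noise_var
        gain = [P_pred[i][0] / s for i in range(p)]
        innovation = y - x_pred[0]
        x = [x_pred[i] + gain[i] * innovation for i in range(p)]
        P = [[P_pred[i][j] - gain[i] * P_pred[0][j] for j in range(p)]
             for i in range(p)]
        out.append(x[0])  # filtered estimate of the current clean sample
    return out
```

Calling `kalman_ar_denoise(noisy_samples, [1.2, -0.5], 1.0, 1.0)` returns one filtered estimate per input sample. In a dual scheme, a second filter would update the coefficients from these estimates at each step.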

Relevance:

90.00%

Publisher:

Abstract:

Speech interfaces for assistive technologies are not common and are usually replaced by other interfaces. The market they target is not considered attractive, and speech technologies are not yet widespread. Industry still believes they carry some performance risks, especially speech recognition systems. As speech is the most elemental and natural means of communication, it has strong potential for enhancing inclusion and quality of life for broader groups of users with special needs, such as people with cerebral palsy and elderly people living at home. This work is a position paper in which the authors argue for making speech the basic interface in assistive technologies. The main arguments are: speech is the easiest way to interact with machines; there is a growing market for embedded speech in assistive technologies, since the number of disabled and elderly people is increasing; speech technology is already mature enough to be used but needs adaptation to people with special needs; and there is still a lot of R&D to be done in this area, especially for the Portuguese market. The main challenges are presented and future directions are proposed.

Relevance:

80.00%

Publisher:

Abstract:

In the last twenty years genetic algorithms (GAs) have been applied in a plethora of fields such as control, system identification, robotics, planning and scheduling, image processing, and pattern and speech recognition (Bäck et al., 1997). In robotics, the problems of trajectory planning, collision avoidance and manipulator structure design considering a single criterion have been solved using several techniques (Alander, 2003). Most engineering applications require the optimization of several criteria simultaneously. Often the problems are complex, include discrete and continuous variables, and there is no prior knowledge about the search space. These kinds of problems are much more complex, since they consider multiple design criteria simultaneously within the optimization procedure. This is known as multi-criteria (or multi-objective) optimization, which has been addressed successfully through GAs (Deb, 2001). The overall aim of multi-criteria evolutionary algorithms is to achieve a set of non-dominated optimal solutions known as the Pareto front. At the end of the optimization procedure, instead of a single optimal (or near-optimal) solution, the decision maker can select a solution from the Pareto front. Some of the key issues in multi-criteria GAs are: i) the number of objectives, ii) obtaining a Pareto front as wide as possible and iii) achieving a uniformly spread Pareto front. Indeed, multi-objective techniques using GAs have been increasing in relevance as a research area. In 1989, Goldberg suggested the use of a GA to solve multi-objective problems, and since then other researchers have been developing new methods, such as the multi-objective genetic algorithm (MOGA) (Fonseca & Fleming, 1995), the non-dominated sorted genetic algorithm (NSGA) (Deb, 2001), and the niched Pareto genetic algorithm (NPGA) (Horn et al., 1994), among several other variants (Coello, 1998).
In this work the trajectory planning problem considers: i) robots with 2 and 3 degrees of freedom (dof), ii) the inclusion of obstacles in the workspace and iii) up to five criteria that are used to qualify the evolving trajectory, namely the joint traveling distance, joint velocity, end effector / Cartesian distance, end effector / Cartesian velocity and energy involved. These criteria are used to minimize the joint and end effector traveled distance, the trajectory ripple and the energy required by the manipulator to reach the destination point. Bearing these ideas in mind, the paper addresses the planning of robot trajectories, meaning the development of an algorithm to find a continuous motion that takes the manipulator from a given starting configuration to a desired end position without colliding with any obstacle in the workspace. The chapter is organized as follows. Section 2 describes trajectory planning and several approaches proposed in the literature. Section 3 formulates the problem, namely the representation adopted to solve the trajectory planning and the objectives considered in the optimization. Section 4 studies the algorithm convergence. Section 5 studies a 2R manipulator (i.e., a robot with two rotational joints/links) when the trajectory optimization considers two and five objectives. Sections 6 and 7 present the results for the 3R redundant manipulator with five goals and describe other complementary experiments, respectively. Finally, section 8 draws the main conclusions.
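The notion of a non-dominated set mentioned in this abstract can be sketched in a few lines. The example assumes that every criterion is minimized, as the abstract states for its trajectory criteria. The function names and the objective tuples are hypothetical illustrations, not the authors' algorithm, which is a genetic algorithm whose selection and sorting details are not given here.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized):
    a is no worse than b everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (distance, energy) scores for five candidate trajectories
    candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 6)]
    print(pareto_front(candidates))  # prints [(1, 5), (2, 3), (4, 1)]
```

This brute-force filter compares every pair of points, which is adequate for small populations. A decision maker would then pick one trajectory from the returned set according to the trade-off preferred.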

Relevance:

80.00%

Publisher:

Abstract:

Master's degree in Informatics Engineering, Specialization in Knowledge and Decision Technologies

Relevance:

20.00%

Publisher:

Abstract:

The tongue is the most important and dynamic articulator for speech formation because of its anatomy (particularly the large volume of this muscular organ compared with the surrounding organs of the vocal tract) and because of the wide range of movements and flexibility involved. In speech communication research, a variety of techniques have been used to measure three-dimensional vocal tract shapes. More recently, magnetic resonance imaging (MRI) has become common, mainly because it allows the collection of a set of static and dynamic images that can represent the entire vocal tract along any orientation. Over the years, different anatomical organs of the vocal tract have been modelled, namely with 2D and 3D tongue models built using parametric or statistical modelling procedures. Our aim is to present and describe some 3D models reconstructed from MRI data for one subject uttering sustained articulations of some typical Portuguese sounds. Thus, we present a 3D database of the tongue obtained by stack combinations with the subject articulating Portuguese vowels. This 3D knowledge of the speech organs could be very important, especially for clinical purposes (for example, for the assessment of articulatory impairments following tongue surgery in speech rehabilitation), and also for a better understanding of acoustic theory in speech formation.

Relevance:

20.00%

Publisher:

Abstract:

The first and second authors would like to thank the support of the PhD grants with references SFRH/BD/28817/2006 and SFRH/PROTEC/49517/2009, respectively, from Fundação para a Ciência e Tecnologia (FCT). This work was partially done in the scope of the project “Methodologies to Analyze Organs from Complex Medical Images – Applications to Female Pelvic Cavity”, with reference PTDC/EEA-CRO/103320/2008, financially supported by FCT.

Relevance:

20.00%

Publisher:

Abstract:

The mechanisms of speech production are complex and have been attracting attention from researchers in both the medical and computer vision fields. In the speech production mechanism, the study of the articulators is a complex issue, since they have a high degree of freedom during this process, particularly the tongue, which makes it difficult to control and observe. In this work, the tongue's shape during the articulation of the oral vowels of European Portuguese is automatically characterized by using statistical modeling on MR images. A point distribution model is built from a set of images collected during artificially sustained articulations of European Portuguese sounds, which can extract the main characteristics of the motion of the tongue. The model built in this work allows a clearer understanding of the dynamic speech events involved during sustained articulations. The tongue shape model can also be useful for speech rehabilitation purposes, specifically to recognize the compensatory movements of the articulators during speech production.

Relevance:

20.00%

Publisher:

Abstract:

The indiscriminate use of antibiotics in food-producing animals has received increasing attention as a contributory factor in the international emergence of antibiotic-resistant bacteria (Woodward in Pesticide, veterinary and other residues in food, CRC Press, Boca Raton, 2004). Numerous analytical methods for quantifying antibacterial residues in edible animal products have been developed over the years (Woodward in Pesticide, veterinary and other residues in food, CRC Press, Boca Raton, 2004; Botsoglou and Fletouris in Handbook of food analysis, residues and other food component analysis, Marcel Dekker, Ghent, 2004). As amoxicillin (AMOX) is one of these critical veterinary drugs, efforts have been made to develop simple and expeditious methods for its control in food samples. In the literature, only one AMOX-selective electrode has been reported so far. In that work, a phosphotungstate:amoxycillinium ion exchanger was used as the electroactive material (Shoukry et al. in Electroanalysis 6:914–917, 1994). Designing new materials based on molecularly imprinted polymers (MIPs) that are complementary to the size and charge of AMOX could lead to very selective interactions, thus enhancing the selectivity of the sensing unit. The AMOX-selective electrodes used imprinted polymers as electroactive materials, with AMOX as the target molecule used to design a biomimetic imprinted cavity. Poly(vinyl chloride) sensors based on methacrylic acid displayed Nernstian slopes (60.7 mV/decade) and low detection limits (2.9×10−5 mol/L). The potentiometric responses were not affected by pH within 4–5 and showed good selectivity. The electrodes were applied successfully to the analysis of real samples.

Relevance:

20.00%

Publisher:

Abstract:

As a result of the stressful conditions in aquaculture facilities, there is a high risk of bacterial infections among cultured fish. Chlortetracycline (CTC) is one of the antimicrobials used to solve this problem. It is a broad-spectrum antibacterial active against a wide range of Gram-positive and Gram-negative bacteria. Numerous analytical methods for screening, identifying, and quantifying CTC in animal products have been developed over the years. An alternative and advantageous method should rely on expeditious and efficient procedures providing highly specific and sensitive measurements in food samples. Ion-selective electrodes (ISEs) could meet these criteria. The only ISE reported in the literature for this purpose used traditional electro-active materials. A selectivity enhancement could, however, be achieved by improving the analyte recognition with molecularly imprinted polymers (MIPs). Several MIP particles were synthesized and used as electro-active materials. ISEs based on methacrylic acid monomers showed the best analytical performance according to slope (62.5 and 68.6 mV/decade) and detection limit (4.1×10−5 and 5.5×10−5 mol L−1). The electrodes displayed good selectivity. The ISEs are not affected by pH changes ranging from 2.5 to 13. The sensors were successfully applied to the analysis of serum, urine and fish samples.

Relevance:

20.00%

Publisher:

Abstract:

Learning and teaching processes, like all human activities, can be mediated through the use of tools. Information and communication technologies are now widespread within education. Their use in the daily life of teachers and learners affords engagement with educational activities at any place and time and not necessarily linked to an institution or a certificate. In the absence of formal certification, learning under these circumstances is known as informal learning. Despite the lack of certification, learning with technology in this way presents opportunities to gather information about and present new ways of exploiting an individual’s learning. Cloud technologies provide ways to achieve this through new architectures, methodologies, and workflows that facilitate semantic tagging, recognition, and acknowledgment of informal learning activities. The transparency and accessibility of cloud services mean that institutions and learners can exploit existing knowledge to their mutual benefit. The TRAILER project facilitates this aim by providing a technological framework using cloud services, a workflow, and a methodology. The services facilitate the exchange of information and knowledge associated with informal learning activities ranging from the use of social software through widgets, computer gaming, and remote laboratory experiments. Data from these activities are shared among institutions, learners, and workers. The project demonstrates the possibility of gathering information related to informal learning activities independently of the context or tools used to carry them out.

Relevance:

20.00%

Publisher:

Abstract:

The application of information technologies (especially the Internet, Web 2.0 and social tools) makes informal learning more visible. This kind of learning is not linked to an institution or a period of time, but it is important enough to be taken into account. On the one hand, learners should be able to communicate to the institutions they are related to what skills they possess, whether they were acquired in a formal or informal way. On the other hand, companies and educational institutions need a deeper knowledge of the competencies of their staff. The TRAILER project provides a methodology supported by a technological framework to facilitate communication about informal learning between businesses, employees and learners. The paper presents the project and some of the work carried out, an exploratory analysis of how informal learning is considered, and the technological framework proposed. Whilst challenges remain in terms of establishing the meaningfulness of technological engagement for employees and businesses, the continuing transformation of the social, technological and educational environment is likely to lead to greater emphasis on the effective exploitation of informal learning.

Relevance:

20.00%

Publisher:

Abstract:

Background: In Portugal, the routine clinical practice of speech and language therapists (SLTs) in treating children with all types of speech sound disorder (SSD) continues to be articulation therapy (AT). There is limited use of phonological therapy (PT) or phonological awareness training in Portugal. Additionally, at an international level there is a focus on collecting information on and differentiating between the effectiveness of PT and AT for children with different types of phonologically based SSD, as well as on the role of phonological awareness in remediating SSD. It is important to collect more evidence for the most effective and efficient type of intervention approach for different SSDs and for these data to be collected from diverse linguistic and cultural perspectives. Aims: To evaluate the effectiveness of a PT and an AT approach for the treatment of 14 Portuguese children, aged 4.0–6.7 years, with a phonologically based SSD. Methods & Procedures: The children were randomly assigned to one of the two treatment approaches (seven children in each group). All children were treated by the same SLT, blind to the aims of the study, over three blocks of a total of 25 weekly sessions of intervention. Outcome measures of phonological ability (percentage of consonants correct (PCC), percentage occurrence of different phonological processes and phonetic inventory) were taken before and after intervention. A qualitative assessment of intervention effectiveness from the perspective of the parents of participants was included. Outcomes & Results: Both treatments were effective in improving the participants’ speech, with the children receiving PT showing a more significant improvement in PCC score than those receiving AT. Children in the PT group also showed greater generalization to untreated words than those receiving AT. Parents reported both intervention approaches to be equally effective in improving their children’s speech.
Conclusions & Implications: The PT (a combination of expressive phonological tasks, phonological awareness, listening and discrimination activities) proved to be an effective integrated method of improving phonological SSD in children. These findings provide some evidence for Portuguese SLTs to employ PT with children with phonologically based SSD.

Relevance:

20.00%

Publisher:

Abstract:

The relation between automatic auditory discrimination, measured with MMN, and the type of stimuli has not been well established in the literature, despite its importance as an electrophysiological measure of central sound representation. In this study, the MMN response was elicited by pure-tone and speech stimuli in a binaural passive auditory oddball paradigm in a group of 8 normal young adult subjects at the same intensity level (75 dB SPL). The frequency difference in the pure-tone oddball was 100 Hz (standard = 1 000 Hz; deviant = 1 100 Hz; same duration = 100 ms). In the speech oddball (standard /ba/; deviant /pa/; same duration = 175 ms), the Portuguese phonemes are both bilabial plosives in order to maintain a narrow frequency band. Differences were found across electrode locations between speech and pure-tone stimuli. Speech elicited larger MMN amplitude, longer duration and higher latency than pure tones at Cz and Fz, and significant differences in latency and amplitude were found between the mastoids. The results suggest that speech may be processed differently from non-speech; it may also be processed at a later stage because of overlapping processes, since more neural resources are required for speech processing.

Relevance:

20.00%

Publisher:

Abstract:

As a natural form of human-machine interaction, gesture recognition involves a strong research component in areas such as computer vision and machine learning. Gesture recognition has very diverse applications, giving users a simpler and more natural way to communicate with computer-based systems without the need for extra devices. The main goal of research on gesture recognition applied to human-machine interaction is therefore to create systems that can identify specific gestures and use them to convey information or to control devices. To that end, vision-based interfaces for gesture recognition need to detect the hand quickly and robustly and to be able to recognize gestures in real time. Today, vision-based gesture recognition systems work with specific solutions, built to solve a particular problem and configured to work in a particular way. This research project studied and implemented sufficiently generic solutions, using machine learning algorithms, that can be applied to a wide range of human-machine interface systems for real-time gesture recognition. The proposed solution, Gesture Learning Module Architecture (GeLMA), makes it simple to define a set of commands that can be based on static and dynamic gestures and that can be easily integrated and configured for use in a range of applications. It is a low-cost system that is easy to train and use, since it is built solely with code libraries.
The experiments showed that the system achieved an accuracy of 99.2% in static gesture recognition and an average accuracy of 93.7% in dynamic gesture recognition. To validate the proposed solution, two complete systems were implemented. The first is a real-time system capable of helping a referee officiate a robotic soccer game. The proposed solution, which we named the Referee Command Language Interface System (ReCLIS), combines a vision-based gesture recognition system with the definition of a formal language, CommLang Referee. The system identifies commands based on a set of static and dynamic gestures performed by the referee, and each command is then sent to a computer interface that transmits the corresponding information to the robots. The second is a real-time system capable of interpreting a subset of Portuguese Sign Language (Linguagem Gestual Portuguesa). The experiments showed that the system was able to recognize the vowels reliably in real time. Although the implemented solution was trained to recognize only the five vowels, the system can easily be extended to recognize the rest of the alphabet. The experiments also showed that the core of vision-based interaction systems can be the same for all applications, which makes them easier to implement. The proposed solution also has the advantage of being sufficiently generic and a solid foundation for developing gesture recognition systems that can be easily integrated with any human-machine interface application. The formal interface definition language can be redefined, and the system can be easily configured and trained with a different set of gestures for integration into the final solution.

Relevance:

20.00%

Publisher:

Abstract:

The evolution of new technology and its increasing use have for some years been making informal learning more and more visible, especially among young and older adults in both Higher Education and workplace contexts. However, the nature of formal and non-formal, course-based approaches to learning has made it hard to accommodate these informal processes satisfactorily, and although technology brings us closer to a solution, one has not yet been achieved. The TRAILER project aims to address this problem by developing a tool for the management of competences and skills acquired through informal learning experiences, both from the perspective of the user and from that of the institution or company. This paper describes the main lines of research and development of this project.