820 results for continuous speech
Abstract:
Digit speech recognition is important in many applications, such as automatic data entry, PIN entry, voice-dialing telephones, and automated banking systems. This paper presents a speaker-independent speech recognition system for Malayalam digits. The system employs Mel-frequency cepstral coefficients (MFCCs) as features for signal processing and a hidden Markov model (HMM) for recognition. The system was trained with 21 male and female voices in the age group of 20 to 40 years and achieved 98.5% word recognition accuracy (94.8% sentence recognition accuracy) on a test set for a continuous digit recognition task.
Abstract:
Connected-digit speech recognition is important in many applications, such as automated banking systems, catalogue dialing, and automatic data entry. This paper presents an optimum speaker-independent connected-digit recognizer for the Malayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficients for speech parameterization and a continuous-density Hidden Markov Model (HMM) in the recognition process. The Viterbi algorithm is used for decoding. The training database contains utterances from 21 speakers in the age group of 20 to 40 years, recorded in a normal office environment, where each speaker was asked to read 20 sets of continuous digits. The system obtained an accuracy of 99.5% on unseen data.
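The Viterbi decoding step named in this abstract can be illustrated with a minimal sketch. The abstract describes a continuous-density HMM over PLP features; the sketch below swaps in a discrete-observation HMM to stay short, and the two states, the probabilities, and the observation sequence are hypothetical toy values, not taken from the paper.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best state path, its log-probability) for a discrete HMM."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t-1][s]: predecessor of s on the best path at time t
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best final state back to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

def logs(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(v) for j, v in row.items()} for k, row in table.items()}

# Hypothetical toy model: two states, binary observations.
states = ["A", "B"]
log_start = {"A": math.log(0.6), "B": math.log(0.4)}
log_trans = logs({"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}})
log_emit = logs({"A": {0: 0.9, 1: 0.1}, "B": {0: 0.2, 1: 0.8}})

path, logp = viterbi([0, 0, 1, 1], states, log_start, log_trans, log_emit)
```

Working in log-probabilities avoids numerical underflow on long utterances; in a continuous-density system the `log_emit` lookup would be replaced by a Gaussian-mixture log-likelihood of each feature vector.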
Abstract:
We present a novel approach, using both sustained vowels and connected speech, to detect obstructive sleep apnea (OSA) cases within a homogeneous group of speakers. The proposed scheme is based on state-of-the-art GMM-based classifiers, and acknowledges specifically the way in which acoustic models are trained on standard databases, as well as the complexity of the resulting models and their adaptation to specific data. Our experimental database contains a suitable number of utterances and sustained speech from healthy (i.e., control) and OSA Spanish speakers. Finally, a 25.1% relative reduction in classification error is achieved when fusing continuous and sustained speech classifiers. Index Terms: obstructive sleep apnea (OSA), Gaussian mixture models (GMMs), background model (BM), classifier fusion.
Abstract:
The introduction of open-plan offices in the 1960s, with the intent of making the workplace more flexible, efficient, and team-oriented, resulted in a higher noise floor level, which not only made concentrated work more difficult but also caused physiological problems, such as increased stress, in addition to a loss of speech privacy. Irrelevant background human speech, in particular, has proven to be a major factor in disrupting concentration and lowering performance. Therefore, reducing the intelligibility of speech has been a goal of increasing importance in recent years. One method employed to do so is the use of masking noises, which consists of emitting a continuous noise signal over a loudspeaker system that conceals the perturbing speech. Studies have shown that, while effective, the maskers employed to date (normally filtered pink noise) are generally poorly accepted by users. The collaborative "Private Workspace" project, within the scope of which this thesis was carried out, attempts to develop a coupled, adaptive noise masking system along with a physical structure to be used for open-plan offices so as to combat these issues. There is evidence to suggest that nature sounds might be better accepted as maskers, in part because they can have a visual object that acts as the source for the sound. Direct audio recordings are not recommended for various reasons, and thus the nature sounds must be synthesized. This work consists of the synthesis of a sound texture to be used as a masker, as well as its evaluation. The sound texture is composed of two parts: a wind-like noise synthesized with subtractive synthesis, and a leaf-like noise synthesized through granular synthesis. 
Different combinations of these two noises produced five variations of the masker, which were evaluated at different levels along with white noise and pink noise, using a modified version of an Oldenburger Satztest to test for an effect on speech intelligibility and a questionnaire to assess subjective acceptance. The goal was to find which of the synthesized noises works best as a speech masker. This thesis first uses a theoretical introduction to establish the basics of sound perception, psychoacoustic masking, and sound texture synthesis. The design of each of the noises, as well as their respective implementations in MATLAB, is explained, followed by the procedures used to evaluate the maskers. The results obtained in the evaluation are analyzed. Lastly, conclusions are drawn, and future work and modifications to the masker are proposed.
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
Awareness of optimal behaviour states of children with profound intellectual disability has been reported in the literature as a potentially useful tool for planning intervention within this population. Some arguments have been raised, however, which question the reliability and validity of previously published work on behaviour state analysis. This article sheds light on the debate by presenting two stages of a study of behaviour state analysis for eight girls with Rett syndrome. The results support Mudford, Hogg, and Roberts' (1997, 1999) concerns with the pooling of participant data. The results of Stage 2 also suggest, however, that most categories of behaviour state can be reliably distinguished once the behaviours for each state are clearly defined.
Abstract:
Objective: To evaluate the effectiveness of continuous positive airway pressure (CPAP) therapy in the treatment of hypernasality following traumatic brain injury (TBI). Design: An A-B-A experimental research design. Assessments were conducted prior to commencement of the program, midway, immediately posttreatment, and 1 month after completion of the CPAP therapy program. Participants: Three adults with dysarthria and moderate to severe hypernasality subsequent to TBI. Outcome Measures: Perceptual evaluation using the Frenchay Dysarthria Assessment, the Assessment of Intelligibility of Dysarthric Speech, and a speech sample analysis, and instrumental evaluation using the Nasometer. Results: Between assessment periods, varying degrees of improvement in hypernasality and sentence intelligibility were noted. At the 1-month post-CPAP assessment, all 3 participants demonstrated reduced nasalance values, and 2 exhibited increased sentence intelligibility. Conclusions: CPAP may be a valuable treatment of impaired velopharyngeal function in the TBI population.
Abstract:
Common approaches to IP-traffic modelling have featured the use of stochastic models, based on the Markov property, which can be classified into black box and white box models based on the approach used for modelling traffic. White box models are simple to understand, transparent, and have a physical meaning attributed to each of the associated parameters. To exploit this key advantage, this thesis explores the use of simple classic continuous-time Markov models based on a white box approach to model not only the network traffic statistics but also the source behaviour with respect to the network and application. The thesis is divided into two parts. The first part focuses on the use of simple Markov and semi-Markov traffic models, starting from the simplest two-state model and moving upwards to n-state models with Poisson and non-Poisson statistics. The thesis then introduces the convenient-to-use, mathematically derived Gaussian Markov models, which are used to model the measured network IP traffic statistics. As one of the most significant contributions, the thesis establishes the significance of the second-order density statistics, as it reveals that, in contrast to first-order density, they carry much more unique information on traffic sources and behaviour. The thesis then exploits the use of Gaussian Markov models to model these unique features and finally shows how the use of simple classic Markov models, coupled with the use of second-order density statistics, provides an excellent tool for capturing maximum traffic detail, which in itself is the essence of good traffic modelling. The second part of the thesis studies the ON-OFF characteristics of VoIP traffic with reference to accurate measurements of the ON and OFF periods, made from a large multi-lingual database of over 100 hours' worth of VoIP call recordings. 
The impact of the language, prosodic structure and speech rate of the speaker on the statistics of the ON-OFF periods is analysed and relevant conclusions are presented. Finally, an ON-OFF VoIP source model with log-normal transitions is contributed as an ideal candidate to model VoIP traffic and the results of this model are compared with those of previously published work.
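As a rough illustration of what an ON-OFF source with log-normally distributed period lengths looks like, the following sketch simulates alternating talk-spurt (ON) and silence (OFF) periods. It is a minimal sketch under stated assumptions, not the thesis's model: the function name `simulate_on_off` and all parameter values are hypothetical, chosen only to produce plausible-looking durations in seconds.

```python
import random

def simulate_on_off(total_time, on_mu, on_sigma, off_mu, off_sigma, seed=0):
    """Simulate alternating ON/OFF periods with log-normal durations.

    Returns a list of (state, start, end) tuples covering [0, total_time].
    mu and sigma are the parameters of the underlying normal distribution,
    as expected by random.lognormvariate.
    """
    rng = random.Random(seed)
    t, state, periods = 0.0, "ON", []
    while t < total_time:
        mu, sigma = (on_mu, on_sigma) if state == "ON" else (off_mu, off_sigma)
        duration = rng.lognormvariate(mu, sigma)
        end = min(t + duration, total_time)  # clip the final period
        periods.append((state, t, end))
        t = end
        state = "OFF" if state == "ON" else "ON"
    return periods

# Hypothetical parameters (illustrative only, not measured values).
periods = simulate_on_off(total_time=600.0, on_mu=0.0, on_sigma=0.8,
                          off_mu=-0.5, off_sigma=1.0)

# Fraction of time the source is ON (the voice activity factor).
on_time = sum(end - start for state, start, end in periods if state == "ON")
activity = on_time / 600.0
```

Fitting `mu` and `sigma` separately per language or speech rate would be one way to explore the dependencies the abstract mentions, though how the thesis parameterizes its model is not stated here.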
Abstract:
This study explored the role of formant transitions and F0-contour continuity in binding together speech sounds into a coherent stream. Listening to a repeating recorded word produces verbal transformations to different forms; stream segregation contributes to this effect and so it can be used to measure changes in perceptual coherence. In experiment 1, monosyllables with strong formant transitions between the initial consonant and following vowel were monotonized; each monosyllable was paired with a weak-transitions counterpart. Further stimuli were derived by replacing the consonant-vowel transitions with samples from adjacent steady portions. Each stimulus was concatenated into a 3-min-long sequence. Listeners only reported more forms in the transitions-removed condition for strong-transitions words, for which formant-frequency discontinuities were substantial. In experiment 2, the F0 contour of all-voiced monosyllables was shaped to follow a rising or falling pattern, spanning one octave. Consecutive tokens either had the same contour, giving an abrupt F0 change between each token, or alternated, giving a continuous contour. Discontinuous sequences caused more transformations and forms, and shorter times to the first transformation. Overall, these findings support the notion that continuity cues provided by formant transitions and the F0 contour play an important role in maintaining the perceptual coherence of speech.
Abstract:
In this study, we aimed to evaluate the effects of exenatide (EXE) treatment on the exocrine pancreas of nonhuman primates. To this end, 52 baboons (Papio hamadryas) underwent partial pancreatectomy, followed by continuous infusion of EXE or saline (SAL) for 14 weeks. Histological analysis, immunohistochemistry, Computer Assisted Stereology Toolbox morphometry, and immunofluorescence staining were performed at baseline and after treatment. The EXE treatment did not induce pancreatitis, parenchymal or periductal inflammatory cell accumulation, ductal hyperplasia, or dysplastic lesions/pancreatic intraepithelial neoplasia. At study end, Ki-67-positive (proliferating) acinar cell number did not change, compared with baseline, in either group. Ki-67-positive ductal cells increased after EXE treatment (P = 0.04). However, the change in Ki-67-positive ductal cell number did not differ significantly between the EXE and SAL groups (P = 0.13). M-30-positive (apoptotic) acinar and ductal cell number did not change after SAL or EXE treatment. No changes in ductal density and volume were observed after EXE or SAL. Interestingly, by triple-immunofluorescence staining, we detected c-kit (a marker of cell transdifferentiation) positive ductal cells co-expressing insulin in ducts only in the EXE group at study end, suggesting that EXE may promote the differentiation of ductal cells toward a β-cell phenotype. In conclusion, 14 weeks of EXE treatment did not exert any negative effect on the exocrine pancreas, inducing neither pancreatic inflammation nor hyperplasia/dysplasia in nonhuman primates.
Abstract:
This paper deals with the long run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space and with compact action space depending on the state variable. The control variable acts on the jump rate and transition measure of the PDMP, and the running and boundary costs are assumed to be positive but not necessarily bounded. Our first main result is to obtain an optimality equation for the long run average cost in terms of a discrete-time optimality equation related to the embedded Markov chain given by the postjump location of the PDMP. Our second main result guarantees the existence of a feedback measurable selector for the discrete-time optimality equation by establishing a connection between this equation and an integro-differential equation. Our final main result is to obtain some sufficient conditions for the existence of a solution for a discrete-time optimality inequality and an ordinary optimal feedback control for the long run average cost using the so-called vanishing discount approach. Two examples are presented illustrating the possible applications of the results developed in the paper.
Abstract:
We show that scalable multipartite entanglement among light fields may be generated by optical parametric oscillators (OPOs). The tripartite entanglement existent among the three bright beams produced by a single OPO (pump, signal, and idler) is scalable to a system of many OPOs by pumping them in cascade with the same optical field. The latter serves as an entanglement distributor. The special case of two OPOs is studied, and it is shown that the resulting five bright beams share genuine multipartite entanglement. In addition, the structure of entanglement distribution among the fields can be manipulated to some degree by tuning the incident pump power. The scalability to many fields is straightforward, allowing an alternative implementation of a multipartite quantum information network with continuous variables.
Abstract:
We study the structural phase transitions in confined systems of strongly interacting particles. We consider infinite quasi-one-dimensional systems with different pairwise repulsive interactions in the presence of an external confinement following a power law. Within the framework of Landau's theory, we find the necessary conditions to observe continuous transitions and demonstrate that the only allowed continuous transition is between the single- and double-chain configurations and that it only takes place when the confinement is parabolic. We determine analytically the behavior of the system at the transition point and calculate the critical exponents. Furthermore, we perform Monte Carlo simulations and find perfect agreement between theory and numerics.
Abstract:
The uptake of ascorbate by neuroblastoma cells using a ruthenium oxide hexacyanoferrate (RuOHCF)-modified carbon fiber disc (CFD) microelectrode (r = 14.5 μm) was investigated. By use of the proposed electrochemical sensor the amperometric determination of ascorbate was performed at 0.0 V in minimum essential medium (MEM, pH = 7.2) with a limit of detection of 25 μmol L⁻¹. Under the optimum experimental conditions, no interference from MEM constituents and reduced glutathione (used to prevent the oxidation of ascorbate during the experiments) was noticed. The stability of the RuOHCF-modified electrode response was studied by measuring the sensitivity over an extended period of time (120 h), a decrease of around 10% being noticed at the end of the experiment. The rate of ascorbate uptake by control human neuroblastoma SH-SY5Y cells, and cells transfected with wild-type Cu,Zn-superoxide dismutase (SOD WT) or with a mutant typical of familial amyotrophic lateral sclerosis (SOD G93A), was in agreement with the level of oxidative stress in these cells. The usefulness of the RuOHCF-modified microelectrode for in vivo monitoring of ascorbate inside neuroblastoma cells was also demonstrated.
Abstract:
Here, I investigate the use of Bayesian updating rules applied to modeling how social agents change their minds in the case of continuous opinion models. Given another agent's statement about the continuous value of a variable, we will see that interesting dynamics emerge when an agent assigns a likelihood to that value that is a mixture of a Gaussian and a uniform distribution. This represents the idea that the other agent might have no idea about what is being talked about. The effect of updating only the first moments of the distribution will be studied, and we will see that this generates results similar to those of the bounded confidence models. On also updating the second moment, several different opinions always survive in the long run, as agents become more stubborn with time. However, depending on the probability of error and initial uncertainty, those opinions might be clustered around a central value.
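The kind of update this abstract describes can be sketched for a single pairwise interaction. The sketch assumes a Gaussian prior for the listening agent, a likelihood that is a Gaussian with probability `p` and a uniform density otherwise, and a uniform density of 1.0 (opinions on a unit interval, boundary effects ignored). These choices and all names below are assumptions for illustration; the paper's exact update rules may differ.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def update(mean_i, var_i, x_j, var_j, p, uniform_density=1.0):
    """Posterior mean and variance of agent i after hearing statement x_j.

    Prior: N(mean_i, var_i). Likelihood of x_j given the true value theta:
    p * N(x_j; theta, var_j) + (1 - p) * uniform_density.
    Returns (new_mean, new_var, w), where w is the posterior probability
    that the other agent was informative.
    """
    # Evidence for "informative" and "uninformative" explanations of x_j.
    ev_gauss = p * normal_pdf(x_j, mean_i, var_i + var_j)
    ev_unif = (1 - p) * uniform_density
    w = ev_gauss / (ev_gauss + ev_unif)
    # Gaussian-times-Gaussian posterior, used if the statement is informative.
    mean_g = (mean_i * var_j + x_j * var_i) / (var_i + var_j)
    var_g = var_i * var_j / (var_i + var_j)
    # Moments of the two-component mixture posterior.
    new_mean = w * mean_g + (1 - w) * mean_i
    second = w * (var_g + mean_g ** 2) + (1 - w) * (var_i + mean_i ** 2)
    return new_mean, second - new_mean ** 2, w

# A nearby statement is trusted and pulls the opinion; a distant one is
# mostly attributed to the uniform ("no idea") component.
near_mean, near_var, w_near = update(0.5, 0.01, 0.55, 0.01, p=0.7)
far_mean, far_var, w_far = update(0.5, 0.01, 0.95, 0.01, p=0.7)
```

Because the weight `w` falls off quickly with the distance between the two opinions, distant statements barely move the mean, which is consistent with the resemblance to bounded confidence models noted in the abstract. Keeping `var_i` fixed and using only `new_mean` corresponds to updating the first moment alone.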