922 results for Voice Digital Processing
Abstract:
In this paper we shall discuss the use of the TSIM simulation software for modelling large-scale industrial processes. The discussion draws on our recent experience of modelling a large plant in the food-processing industry. We shall focus on those features of software organization and software engineering that proved particularly necessary for the execution of this project, and illustrate the extent to which the use of TSIM facilitated the implementation of these features. We shall also make some general remarks about the 'life-cycle' of models resulting from projects of this kind.
Abstract:
A model of the auditory periphery assembled from analog network submodels of all the relevant anatomical structures is described. There is bidirectional coupling between networks representing the outer ear, middle ear and cochlea. A simple voltage source representation of the outer hair cells provides level-dependent basilar membrane curves. The networks are translated into efficient computational modules by means of wave digital filtering. A feedback unit regulates the average firing rate at the output of an inner hair cell module via a simplified modelling of the dynamics of the descending paths to the peripheral ear. This leads to a digital model of the entire auditory periphery with applications to both speech and hearing research.
Abstract:
A modular image capture system with close integration to CCD cameras has been developed. The aim is to produce a system capable of integrating CCD sensor, image capture and image processing into a single compact unit. This close integration provides a direct mapping between CCD pixels and digital image pixels. The system has been interfaced to a digital signal processor board for the development and control of image processing tasks. These have included characterization and enhancement of noisy images from an intensified camera and measurement to subpixel resolutions. A highly compact form of the image capture system is in an advanced stage of development. This consists of a single FPGA device and a single VRAM, providing a two-chip image capture system capable of being integrated into a CCD camera. A miniature compact PC has been developed using a novel modular interconnection technique, providing a processing unit in a three-dimensional format highly suited to integration into a CCD camera unit. Work is under way to interface the compact capture system to the PC using this interconnection technique, combining CCD sensor, image capture and image processing into a single compact unit. ©2005 Copyright SPIE - The International Society for Optical Engineering.
Abstract:
The University of Cambridge is unusual in that its Department of Engineering is a single department which covers virtually all branches of engineering under one roof. In their first two years of study, our undergrads study the full breadth of engineering topics and then have to choose a specialization area for the final two years of study. Here we describe part of a course, given towards the end of their second year, which is designed to entice these students to specialize in signal processing and information engineering topics for years 3 and 4. The course is based around a photo editor and an image search application, and it requires no prior knowledge of the z-transform or of 2-dimensional signal processing. It does assume some knowledge of 1-D convolution and basic Fourier methods and some prior exposure to Matlab. The subject of this paper, the photo editor, is written in standard Matlab m-files which are fully visible to the students and help them to see how specific algorithms are implemented in detail. © 2011 IEEE.
Abstract:
In current methods for voice transformation and speech synthesis, the vocal tract filter is usually assumed to be excited by a flat amplitude spectrum. In this article, we present a method using a mixed source model defined as a mixture of the Liljencrants-Fant (LF) model and Gaussian noise. Because it uses the LF model, the base approach of this work is close to a vocoder with exogenous input, such as ARX-based methods or the Glottal Spectral Separation (GSS) method. Such approaches are dedicated to voice processing and promise improved naturalness compared to generic signal models. To estimate the Vocal Tract Filter (VTF) using spectral division, as in GSS, we show that a glottal source model can be used with any envelope estimation method, in contrast to the ARX approach, where a least-squares AR solution is used. We therefore derive a VTF estimate which takes into account the amplitude spectra of both deterministic and random components of the glottal source. The proposed mixed source model is controlled by a small set of intuitive and independent parameters. The relevance of this voice production model is evaluated, through listening tests, in the context of resynthesis, HMM-based speech synthesis, breathiness modification and pitch transposition. © 2012 Elsevier B.V. All rights reserved.
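The spectral-division step mentioned in this abstract can be illustrated with a minimal sketch. It assumes that magnitude spectra of the speech frame and of the glottal source model are already available as lists of equal length; the function name `spectral_division`, the `floor` parameter and the sample values are hypothetical, and the sketch does not reproduce the paper's handling of the noise component.

```python
def spectral_division(speech_mag, source_mag, floor=1e-8):
    """Divide the speech magnitude spectrum by the source magnitude spectrum.

    Assumption: both inputs are magnitude spectra sampled on the same
    frequency bins. `floor` (hypothetical) guards against division by zero.
    """
    if len(speech_mag) != len(source_mag):
        raise ValueError("spectra must have the same number of bins")
    return [s / max(g, floor) for s, g in zip(speech_mag, source_mag)]


# Illustrative (made-up) spectra: speech = source * vocal tract response.
source_mag = [1.0, 0.5, 0.25, 0.125]
true_vtf = [2.0, 4.0, 1.0, 0.5]
speech_mag = [g * h for g, h in zip(source_mag, true_vtf)]

vtf_estimate = spectral_division(speech_mag, source_mag)
```

In this toy case the division recovers the vocal tract response used to build the speech spectrum; with real signals the result would still need an envelope estimation step.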
A Videogrammetric As-Built Data Collection Method for Digital Fabrication of Sheet Metal Roof Panels
Abstract:
A roofing contractor typically needs to acquire as-built dimensions of a roof structure several times over the course of its build to be able to digitally fabricate sheet metal roof panels. Obtaining these measurements using the existing roof surveying methods can be costly in terms of equipment, labor, and/or worker exposure to safety hazards. This paper presents a video-based surveying technology as an alternative method which is simple to use, automated, less expensive, and safe. When using this method, the contractor collects video streams with a calibrated stereo camera set. Unique visual characteristics of scenes from a roof structure are then used in the processing step to automatically extract as-built dimensions of roof planes. These dimensions are finally represented in an XML format to be loaded into sheet metal folding and cutting machines. The proposed method has been tested for a roofing project and the preliminary results indicate its capabilities.
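As a rough illustration of how a calibrated stereo pair can yield a dimension, the sketch below uses the standard rectified-stereo relation Z = f·B/d (depth from focal length, baseline and disparity) to place two image points in 3D and measure the distance between them. This is not the paper's pipeline: the function `triangulate`, the camera parameters and the pixel coordinates are all hypothetical.

```python
import math


def triangulate(u, v, disparity, focal_px, baseline_m, cx, cy):
    """Back-project a pixel from a rectified stereo pair to a 3D point in metres.

    Assumptions: an ideal rectified pinhole stereo rig, focal length in pixels,
    baseline in metres, principal point (cx, cy), disparity in pixels.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = focal_px * baseline_m / disparity  # depth from disparity
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)


# Two hypothetical roof-edge corners seen at the same disparity (same depth).
camera = dict(focal_px=800.0, baseline_m=0.5, cx=320.0, cy=240.0)
corner_a = triangulate(u=320.0, v=240.0, disparity=40.0, **camera)
corner_b = triangulate(u=720.0, v=240.0, disparity=40.0, **camera)

# As-built edge length is the Euclidean distance between the two corners.
edge_length_m = math.dist(corner_a, corner_b)
```

With these made-up values both corners lie 10 m from the camera and the edge measures 5 m.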
Abstract:
IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 5, pp. 1338-1343, 2003.
Abstract:
The application of inverse filtering techniques for high-quality singing voice analysis/synthesis is discussed. In the context of source-filter models, inverse filtering provides a noninvasive method to extract the voice source, and thus to study voice quality. Although this approach is widely used in speech synthesis, this is not the case in singing voice. Several studies have proved that inverse filtering techniques fail in the case of singing voice, the reasons being unclear. In order to shed light on this problem, we will consider here an additional feature of singing voice, not present in speech: the vibrato. Vibrato has been traditionally studied by sinusoidal modeling. As an alternative, we will introduce here a novel noninteractive source filter model that incorporates the mechanisms of vibrato generation. This model will also allow the comparison of the results produced by inverse filtering techniques and by sinusoidal modeling, as they apply to singing voice and not to speech. In this way, the limitations of these conventional techniques, described in previous literature, will be explained. Both synthetic signals and singer recordings are used to validate and compare the techniques presented in the paper.
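The idea of inverse filtering in a source-filter model can be sketched as follows. The example assumes an all-pole vocal tract filter whose coefficients are known exactly; in practice they would have to be estimated (for example by linear prediction), which is where the difficulties discussed in this abstract arise. The function names and coefficient values are hypothetical.

```python
def all_pole_filter(source, a):
    """Synthesis: s[n] = e[n] + sum_k a[k-1] * s[n-k] (vocal tract as all-pole filter)."""
    out = []
    for n, e in enumerate(source):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * out[n - k]
        out.append(acc)
    return out


def inverse_filter(signal, a):
    """Analysis: e[n] = s[n] - sum_k a[k-1] * s[n-k] (FIR inverse of the filter above)."""
    residual = []
    for n, s in enumerate(signal):
        acc = s
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * signal[n - k]
        residual.append(acc)
    return residual


# Hypothetical source: an impulse train standing in for glottal pulses.
source = [1.0 if n % 40 == 0 else 0.0 for n in range(200)]
a = [1.3, -0.8]  # assumed stable two-pole resonance (pole radius about 0.89)

voiced = all_pole_filter(source, a)
recovered = inverse_filter(voiced, a)
max_error = max(abs(r - e) for r, e in zip(recovered, source))
```

When the filter coefficients match, the residual equals the source up to rounding error; a mismatch between the assumed and the true filter leaves filter colouring in the recovered source.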
Abstract:
Great demand for power-optimized devices shows promising economic potential and draws considerable attention in industry and research. Due to the continuously shrinking CMOS process, not only dynamic power but also static power has emerged as a major concern in power reduction. Besides power optimization, average-case power estimation is important for power budget allocation but also challenging in terms of time and effort. In this thesis, we introduce a methodology to support modular quantitative analysis in order to estimate the average power of circuits, on the basis of two concepts named Random Bag Preserving and Linear Compositionality. It can shorten simulation time while sustaining high accuracy, increasing the feasibility of power estimation for large systems. For power saving, firstly, we take advantage of the low-power characteristics of adiabatic logic and asynchronous logic to achieve ultra-low dynamic and static power. We propose two memory cells, which can run in adiabatic and non-adiabatic mode. About 90% of dynamic power can be saved in adiabatic mode when compared to other up-to-date designs. About 90% of leakage power is saved. Secondly, a novel logic, named Asynchronous Charge Sharing Logic (ACSL), is introduced. The realization of completion detection is simplified considerably. Besides the power reduction, ACSL brings another promising feature for average power estimation, called data-independency, which makes power estimation effortless and is meaningful for modular quantitative average-case analysis. Finally, a new asynchronous Arithmetic Logic Unit (ALU) is presented, with a ripple carry adder implemented using the logically reversible/bidirectional characteristic, exhibiting ultra-low power dissipation with a sub-threshold operating point. The proposed adder is able to operate multi-functionally.
Abstract:
Existing work in Computer Science and Electronic Engineering demonstrates that Digital Signal Processing techniques can effectively identify the presence of stress in the speech signal. These techniques use datasets containing real or actual stress samples i.e. real-life stress such as 911 calls and so on. Studies that use simulated or laboratory-induced stress have been less successful and inconsistent. Pervasive, ubiquitous computing is increasingly moving towards voice-activated and voice-controlled systems and devices. Speech recognition and speaker identification algorithms will have to improve and take emotional speech into account. Modelling the influence of stress on speech and voice is of interest to researchers from many different disciplines including security, telecommunications, psychology, speech science, forensics and Human Computer Interaction (HCI). The aim of this work is to assess the impact of moderate stress on the speech signal. In order to do this, a dataset of laboratory-induced stress is required. While attempting to build this dataset it became apparent that reliably inducing measurable stress in a controlled environment, when speech is a requirement, is a challenging task. This work focuses on the use of a variety of stressors to elicit a stress response during tasks that involve speech content. Biosignal analysis (commercial Brain Computer Interfaces, eye tracking and skin resistance) is used to verify and quantify the stress response, if any. This thesis explains the basis of the author’s hypotheses on the elicitation of affectively-toned speech and presents the results of several studies carried out throughout the PhD research period. These results show that the elicitation of stress, particularly the induction of affectively-toned speech, is not a simple matter and that many modulating factors influence the stress response process. 
A model is proposed to reflect the author’s hypothesis on the emotional response pathways relating to the elicitation of stress with a required speech content. Finally the author provides guidelines and recommendations for future research on speech under stress. Further research paths are identified and a roadmap for future research in this area is defined.
Abstract:
A digital differentiator computes the derivative of an input signal. This work includes the presentation of first-degree and second-degree differentiators, which are designed as both infinite-impulse-response (IIR) filters and finite-impulse-response (FIR) filters. The proposed differentiators have low-pass magnitude response characteristics, thereby rejecting noise frequencies higher than the cut-off frequency. Both steady-state frequency-domain characteristics and time-domain analyses are given for the proposed differentiators. It is shown that the proposed differentiators perform well when compared to previously proposed filters. When considering the time-domain characteristics of the differentiators, the processing of quantized signals proved especially enlightening in terms of the filtering effects of the proposed differentiators. The coefficients of the proposed differentiators are obtained using an optimization algorithm, while the optimization objectives include magnitude and phase response. The low-pass characteristic of the proposed differentiators is achieved by minimizing the filter variance. The designed low-pass differentiators show a steep roll-off as well as a highly accurate magnitude response in the pass-band. While having a history of over three hundred years, the design of fractional differentiators has become a ‘hot topic’ in recent decades. One challenging problem in this area is that there are many different definitions to describe the fractional model, such as the Riemann-Liouville and Caputo definitions. Through use of a feedback structure, based on the Riemann-Liouville definition, it is shown that the performance of the fractional differentiator can be improved in both the frequency-domain and time-domain. Two applications based on the proposed differentiators are described in the thesis. Specifically, the first of these involves the application of second-degree differentiators in the estimation of the frequency components of a power system. The second concerns an image-processing application, namely edge detection.
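To make the link between a second-degree differentiator and frequency estimation concrete, here is a minimal sketch. It does not use the optimized low-pass designs of the thesis; it swaps in the plain 3-tap FIR kernel [1, -2, 1] and relies on the identity x[n+1] - 2x[n] + x[n-1] = 2(cos(ωT) - 1)·x[n], which holds for any sampled sinusoid. The function names and the 50 Hz test signal are assumptions chosen for illustration.

```python
import math


def second_difference(x):
    """Apply the FIR kernel [1, -2, 1], a basic (unscaled) second-degree differentiator."""
    return [x[n + 1] - 2.0 * x[n] + x[n - 1] for n in range(1, len(x) - 1)]


def estimate_frequency(x, fs):
    """Estimate the frequency in Hz of a single sinusoid sampled at fs Hz.

    A least-squares ratio between the second difference and the signal gives
    2*(cos(w*T) - 1), from which the angular frequency w follows.
    """
    d2 = second_difference(x)
    inner = x[1:-1]  # samples aligned with the second difference
    ratio = sum(a * b for a, b in zip(inner, d2)) / sum(a * a for a in inner)
    cos_wt = max(-1.0, min(1.0, 1.0 + 0.5 * ratio))
    return math.acos(cos_wt) * fs / (2.0 * math.pi)


# Hypothetical noise-free power-system waveform: 50 Hz sampled at 1 kHz.
fs = 1000.0
signal = [math.sin(2.0 * math.pi * 50.0 * n / fs + 0.3) for n in range(200)]
estimated_hz = estimate_frequency(signal, fs)
```

On noisy or quantized signals this plain kernel amplifies high-frequency noise, which is the motivation the abstract gives for differentiators with a low-pass magnitude response.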
Abstract:
Gabriel Urbain Fauré lived during one of the most exciting times in music history. Spanning a life of 79 years (1845-1924), he lived through the height of Romanticism and the experimental avant-garde techniques of the early 20th century. In Fauré's music, one can find traces of Chopin, Liszt, Mendelssohn, Debussy and Poulenc. One can even argue that Fauré presages Skryabin and Shostakovich. The late works of Gabriel Fauré, chiefly those composed after 1892, testify to the argument that Fauré holds an important position in the shift from tonal to atonal composition and should be counted among such transitional composers as Gustav Mahler, Claude Debussy, Erik Satie, Richard Strauss, and Ferruccio Busoni. Fauré's unique way of fashioning harmonic impetus from almost purely linear means, resulting in a synthesis of harmonic and melodic devices, led me to craft the term mélodoharmonique. This term refers to a contrapuntally motivated technique of composition, particularly in a secondary layer of musical texture, in which a component of harmonic progression (i.e. arpeggiation, broken chord, etc.) is fused with linear motivic or thematic development. This dissertation seeks to bring to public attention, through exploration in lecture and recital format, certain works of Gabriel Fauré written after 1892. The repertoire will be selected from works for solo piano and piano in collaboration with violin, violoncello, and voice, which support the notion of Fauré as a modernist deserving larger recognition for his influence in the transition to atonal music. The recital repertoire includes the following. Song Cycles: La bonne chanson, opus 61; La chanson d'Ève, opus 95; Le jardin clos, opus 106; Mirages, opus 113; L'horizon chimérique, opus 118; Piano Works: Prelude in G minor, opus 103, No. 3; Prelude in E minor, opus 103, No. 9; Eleventh Nocturne, opus 104, No. 1; Thirteenth Nocturne, opus 119; Chamber Works: Second Violin Sonata, opus 108; First Violoncello Sonata, opus 109; Second Violoncello Sonata, opus 117.
Abstract:
In 1937 Lisa Sergio, "The Golden Voice" of fascist broadcasting from Rome, fled Italy for the United States. Though her mother was American, Sergio was classified as an enemy alien once the United States entered World War II. Yet Sergio became a U.S. citizen in 1944 and built a successful career in radio, working first at NBC and then WQXR in New York City in the days when women's voices were not thought to be appropriate for news or "serious" programming. When she was blacklisted as a communist in the early 1950s, Sergio compensated for the loss of radio employment by becoming principally an author and lecturer in Washington, D.C., until her death in 1989. This dissertation, based on her personal papers, is the first study of Sergio's American mass communication career. It points out the personal, political and social obstacles she faced as a woman in her 52-year career as a commentator on varied aspects of world affairs, religion and feminism. This study includes an examination of the FBI investigations of Sergio and the anti-communist campaigns conducted against her. It concludes that Sergio's success as a public communicator was predicated on both her unusual talents and her ability to transform her public image to reflect ideal American values of womanhood in shifting political climates.
Abstract:
Paul Hindemith made numerous contributions to the viola, both as a composer and performer. As a composer, he wrote 7 sonatas for the viola, as well as a number of chamber and orchestral works which feature the viola as a solo instrument. As a violist, Hindemith was one of the few virtuoso soloists of his lifetime, and premiered virtually all of his solo compositions. Many of his pieces remain an integral part of the viola repertoire; Der Schwanendreher is one of the three major Twentieth-Century concertos for the viola. While some of his pieces are well-known, there are many others which are not performed with much frequency, due in part to the sheer output of this prolific composer. In this dissertation project, I performed Hindemith's compositions for the viola as a solo instrument. Consideration was given to exclusively performing his 4 solo sonatas and 3 sonatas for viola and piano. His only viola duet, his only non-sonata written for viola and piano, and 2 of his viola concertos (Der Schwanendreher and Trauermusik) were included in this dissertation project to provide contrast and supplement the three recital programs. Through this dissertation project I have been able to gain a deeper understanding of the complex language of Hindemith and interpret his music in an approach that is accessible to both the performer and the audience. All performances took place in the Gildenhorn Recital Hall and Ulrich Recital Hall at the University of Maryland. All collaborations with piano were performed with Eliza Ching. The Duett for Viola and Violoncello was performed with Daniel Shomper, and the assisting musicians performing in the Trauermusik were Joel Ciaccio, Daniel Sender, Daniel Shomper, Cassandra Stephenson and Dana Weiderhold.