88 results for Speech in Noise


Relevance:

90.00%

Publisher:

Abstract:

In this paper, a refined classic noise prediction method based on the VISSIM and FHWA noise prediction model is formulated to analyze the sound level contributed by traffic on the Nanjing Lukou airport connecting freeway before and after widening. The aim of this research is to (i) assess the traffic noise impact on the Nanjing University of Aeronautics and Astronautics (NUAA) campus before and after freeway widening, (ii) compare the prediction results with field data to test the accuracy of this method, and (iii) analyze the relationship between traffic characteristics and sound level. The results indicate that the mean difference between model predictions and field measurements is acceptable. The traffic composition impact study indicates that buses (including mid-sized trucks) and heavy goods vehicles contribute a significant proportion of total noise power despite their low traffic volume. In addition, speed analysis offers an explanation for the minor differences in noise level across time periods. Future work will aim at reducing model error by focusing on noise barrier analysis using the FEM/BEM method and by modifying the vehicle noise emission equation through field experimentation.
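The finding that a few heavy vehicles can dominate total noise power follows from how sound levels combine: decibel values add on an energy basis rather than arithmetically. The sketch below illustrates only that general principle. It is not the FHWA model or the paper's method, and the per-class levels are hypothetical values chosen for illustration.

```python
import math

def combine_levels(levels_db):
    """Energetically sum sound levels given in dB."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))

def power_shares(levels_db):
    """Fraction of total acoustic power contributed by each source."""
    powers = [10.0 ** (level / 10.0) for level in levels_db]
    total = sum(powers)
    return [p / total for p in powers]

# Hypothetical per-class equivalent levels at one receiver (illustrative only).
class_levels = {"cars": 65.0, "buses": 63.0, "heavy_goods": 66.0}

total_db = combine_levels(class_levels.values())
shares = dict(zip(class_levels, power_shares(class_levels.values())))
```

With these assumed inputs, the heavy goods class carries the largest share of acoustic power even though nothing in the calculation depends on how many vehicles produced each level.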

Relevance:

90.00%

Publisher:

Abstract:

The support for typically out-of-vocabulary query terms such as names, acronyms, and foreign words is an important requirement of many speech indexing applications. However, to date many unrestricted vocabulary indexing systems have struggled to provide a balance between good detection rate and fast query speeds. This paper presents a fast and accurate unrestricted vocabulary speech indexing technique named Dynamic Match Lattice Spotting (DMLS). The proposed method augments the conventional lattice spotting technique with dynamic sequence matching, together with a number of other novel algorithmic enhancements, to obtain a system that is capable of searching hours of speech in seconds while maintaining excellent detection performance.
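The abstract does not spell out how dynamic sequence matching works. One common way to match a query phone sequence approximately against candidate sequences is minimum edit distance, sketched below. The unit costs, the `spot` helper, and the phone symbols are assumptions made for illustration, not details taken from DMLS.

```python
def edit_distance(query, candidate):
    """Minimum number of insertions, deletions and substitutions (unit costs)."""
    prev = list(range(len(candidate) + 1))
    for i, q in enumerate(query, start=1):
        curr = [i]
        for j, c in enumerate(candidate, start=1):
            cost = 0 if q == c else 1
            curr.append(min(prev[j] + 1,          # delete q
                            curr[j - 1] + 1,      # insert c
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def spot(query, sequences, max_cost=1):
    """Return the candidate sequences within max_cost edits of the query."""
    return [s for s in sequences if edit_distance(query, s) <= max_cost]

# Hypothetical phone sequences, as might be read off a recognition lattice.
candidates = [["k", "ae", "t"], ["k", "ah", "t"], ["d", "ao", "g"]]
hits = spot(["k", "ae", "t"], candidates, max_cost=1)
```

Allowing a small non-zero cost lets a query tolerate recognition errors in the indexed phone sequences, at the price of more false alarms.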

Relevance:

90.00%

Publisher:

Abstract:

The nature of the transport system contributes to public health outcomes in a range of ways. The clearest contribution to public health is in the area of traffic crashes, because of their direct impact on individual death and disability and their direct costs to the health system. Other papers in this conference address these issues. This paper outlines some collaborative research between the Centre for Accident Research and Road Safety - Queensland (CARRS-Q) at QUT and Chinese researchers in areas that have indirect health impacts. Heavy vehicle dynamics: The integrity of the road surface influences crash risk, with ruts, pot-holes and other forms of road damage contributing to increased crash risks. The great majority of damage to the road surface from vehicles is caused by heavy trucks and buses, rather than cars or smaller vehicles. In some cases this damage is due to deliberate overloading, but in other cases it is due to vehicle suspension characteristics that lead to occasional high loads on particular wheels. Together with a visiting researcher and his colleagues, we have used both Queensland and Chinese data to model vehicle suspension systems that reduce the level of load, and hence the level of road damage and resulting crash risk (1-5). Toll worker exposure to vehicle emissions: The increasing construction of highways in China has also involved construction of a large number of toll roads. Tollbooth workers are potentially exposed to high levels of pollutants from vehicles; however, the extent of this exposure and how it relates to standards for exposure are not well known. In a study led by a visiting researcher, we modelled these levels of exposure for a tollbooth in China (6). Noise pollution: The increasing presence of high-speed roads in China has contributed to an increase in noise levels. In this collaborative study we modelled noise levels associated with a freeway widening near a university campus, and measures to reduce the noise (7). Along with these areas of research, there are many other areas of transport with health implications that are worthy of exploration. Traffic, noise and pollution contribute to a difficult environment for pedestrians, especially in an ageing society where there are health benefits to increasing physical activity. By building on collaborations such as those outlined, there is potential for a contribution to improved public health by addressing transport issues such as vehicle factors and pollution, and extending the research to other areas of travel activity.
1. Chen, Y., He, J., King, M., Chen, W. and Zhang, W. (2014). Stiffness-damping matching method of an ECAS system based on LQG control. Journal of Central South University, 21:439-446. DOI: 10.1007/s1177101419579
2. Chen, Y., He, J., King, M., Feng, Z. and Chang, W. (2013). Comparison of two suspension control strategies for multi-axle heavy truck. Journal of Central South University, 20(2):550-562.
3. Chen, Y., He, J., King, M., Chen, W. and Zhang, W. (2013). Effect of driving conditions and suspension parameters on dynamic load-sharing of longitudinal-connected air suspensions. Science China Technological Sciences, 56(3):666-676. DOI: 10.1007/s11431-012-5091-3
4. Chen, Y., He, J., King, M., Chen, W. and Zhang, W. (2013). Model development and dynamic load-sharing analysis of longitudinal-connected air suspensions. Strojniški Vestnik - Journal of Mechanical Engineering, 59(1):14-24.
5. Chen, Y., He, J., King, M., Liu, H. and Zhang, W. (2013). Dynamic load-sharing of longitudinal-connected air suspensions of a tri-axle semi-trailer. Proceedings of Transportation Research Board Annual Conference, Washington DC, 13-17 January 2013, paper no. 13-1117.
6. He, J., Qi, Z., Hang, W., King, M. and Zhao, C. (2011). Numerical evaluation of pollutant dispersion at a toll plaza based on system dynamics and Computational Fluid Dynamics models. Transportation Research Part C, 19(2011):510-520.
7. Zhang, C., He, J., Wang, Z., Yin, R. and King, M. (2013). Assessment of traffic noise level before and after freeway widening using traffic microsimulation and a refined classic noise prediction method. Proceedings of Transportation Research Board Annual Conference, Washington DC, 13-17 January 2013, paper no. 13-2016.

Relevance:

90.00%

Publisher:

Abstract:

In this paper, a refined classic noise prediction method based on the VISSIM and FHWA noise prediction model is formulated to analyze the sound level contributed by traffic on the Nanjing Lukou airport connecting freeway before and after widening. The aim of this research is to (i) assess the traffic noise impact on the Nanjing University of Aeronautics and Astronautics (NUAA) campus before and after freeway widening, (ii) compare the prediction results with field data to test the accuracy of this method, (iii) analyze the relationship between traffic characteristics and sound level. The results indicate that the mean difference between model predictions and field measurements is acceptable. The traffic composition impact study indicates that buses (including mid-sized trucks) and heavy goods vehicles contribute a significant proportion of total noise power despite their low traffic volume. In addition, speed analysis offers an explanation for the minor differences in noise level across time periods. Future work will aim at reducing model error, by focusing on noise barrier analysis using the FEM/BEM method and modifying the vehicle noise emission equation by conducting field experimentation.

Relevance:

80.00%

Publisher:

Abstract:

The performance of an adaptive filter may be studied through the behaviour of the optimal and adaptive coefficients in a given environment. This thesis investigates the performance of finite impulse response adaptive lattice filters for two classes of input signals: (a) frequency modulated signals with polynomial phases of order p in complex Gaussian white noise (as nonstationary signals), and (b) impulsive autoregressive processes with alpha-stable distributions (as non-Gaussian signals). Initially, an overview is given of linear prediction and adaptive filtering. The convergence and tracking properties of the stochastic gradient algorithms are discussed for stationary and nonstationary input signals. It is explained that the stochastic gradient lattice algorithm has many advantages over the least-mean square algorithm. These advantages include a modular structure, easily guaranteed stability, lower sensitivity to the eigenvalue spread of the input autocorrelation matrix, and easy quantization of filter coefficients (normally called reflection coefficients). We then characterize the performance of the stochastic gradient lattice algorithm for the frequency modulated signals through the optimal and adaptive lattice reflection coefficients. This is a difficult task due to the nonlinear dependence of the adaptive reflection coefficients on the preceding stages and the input signal. To ease the derivations, we assume that reflection coefficients of each stage are independent of the inputs to that stage. Then the optimal lattice filter is derived for the frequency modulated signals. This is performed by computing the optimal values of residual errors, reflection coefficients, and recovery errors. Next, we show the tracking behaviour of adaptive reflection coefficients for frequency modulated signals. This is carried out by computing the tracking model of these coefficients for the stochastic gradient lattice algorithm on average. 
The second-order convergence of the adaptive coefficients is investigated by modeling the theoretical asymptotic variance of the gradient noise at each stage. The accuracy of the analytical results is verified by computer simulations. Using the previous analytical results, we show a new property, the polynomial order reducing property of adaptive lattice filters. This property may be used to reduce the order of the polynomial phase of input frequency modulated signals. Considering two examples, we show how this property may be used in processing frequency modulated signals. In the first example, a detection procedure is carried out on a frequency modulated signal with a second-order polynomial phase in complex Gaussian white noise. We show that using this technique a better probability of detection is obtained for the reduced-order phase signals compared to that of the traditional energy detector. Also, it is empirically shown that the distribution of the gradient noise in the first adaptive reflection coefficients approximates the Gaussian law. In the second example, the instantaneous frequency of the same observed signal is estimated. We show that by using this technique a lower mean square error is achieved for the estimated frequencies at high signal-to-noise ratios in comparison to that of the adaptive line enhancer. The performance of adaptive lattice filters is then investigated for the second type of input signals, i.e., impulsive autoregressive processes with alpha-stable distributions. The concept of alpha-stable distributions is first introduced. We discuss how the stochastic gradient algorithm, which yields desirable results for finite variance input signals (like frequency modulated signals in noise), does not converge quickly for infinite variance stable processes (because it uses the minimum mean-square error criterion). 
To deal with such problems, the concepts of the minimum dispersion criterion, fractional lower order moments, and recently developed algorithms for stable processes are introduced. We then study the possibility of using the lattice structure for impulsive stable processes. Accordingly, two new algorithms, the least-mean P-norm lattice algorithm and its normalized version, are proposed for lattice filters based on the fractional lower order moments. Simulation results show that, using the proposed algorithms, faster convergence is achieved for parameter estimation of autoregressive stable processes with low to moderate degrees of impulsiveness in comparison to many other algorithms. Also, we discuss how the impulsiveness of stable processes generates some misalignment between the estimated parameters and the true values. Due to the infinite variance of stable processes, the performance of the proposed algorithms is only investigated using extensive computer simulations.

Relevance:

80.00%

Publisher:

Abstract:

Signal Processing (SP) is a subject of central importance in engineering and the applied sciences. Signals are information-bearing functions, and SP deals with the analysis and processing of signals (by dedicated systems) to extract or modify information. Signal processing is necessary because signals normally contain information that is not readily usable or understandable, or which might be disturbed by unwanted sources such as noise. Although many signals are non-electrical, it is common to convert them into electrical signals for processing. Most natural signals (such as acoustic and biomedical signals) are continuous functions of time, with these signals being referred to as analog signals. Prior to the advent of digital computers, Analog Signal Processing (ASP) and analog systems were the only tools to deal with analog signals. Although ASP and analog systems are still widely used, Digital Signal Processing (DSP) and digital systems are attracting more attention, due in large part to the significant advantages of digital systems over their analog counterparts. These advantages include superiority in performance, speed, reliability, efficiency of storage, size and cost. In addition, DSP can solve problems that cannot be solved using ASP, like the spectral analysis of multicomponent signals, adaptive filtering, and operations at very low frequencies. Following the developments in engineering which occurred in the 1980s and 1990s, DSP became one of the world's fastest growing industries. Since that time DSP has not only impacted on traditional areas of electrical engineering, but has had far reaching effects on other domains that deal with information such as economics, meteorology, seismology, bioengineering, oceanology, communications, astronomy, radar engineering, control engineering and various other applications. This book is based on the Lecture Notes of Associate Professor Zahir M. Hussain at RMIT University (Melbourne, 2001-2009), the research of Dr. 
Amin Z. Sadik (at QUT & RMIT, 2005-2008), and the Notes of Professor Peter O'Shea at Queensland University of Technology. Part I of the book addresses the representation of analog and digital signals and systems in the time domain and in the frequency domain. The core topics covered are convolution, transforms (Fourier, Laplace, Z, Discrete-time Fourier, and Discrete Fourier), filters, and random signal analysis. There is also a treatment of some important applications of DSP, including signal detection in noise, radar range estimation, banking and financial applications, and audio effects production. Design and implementation of digital systems (such as integrators, differentiators, resonators and oscillators) are also considered, along with the design of conventional digital filters. Part I is suitable for an elementary course in DSP. Part II (which is suitable for an advanced signal processing course) considers selected signal processing systems and techniques. Core topics covered are the Hilbert transformer, binary signal transmission, phase-locked loops, sigma-delta modulation, noise shaping, quantization, adaptive filters, and non-stationary signal analysis. Part III presents some selected advanced DSP topics. We hope that this book will contribute to the advancement of engineering education and that it will serve as a general reference book on digital signal processing.

Relevance:

80.00%

Publisher:

Abstract:

The "AIDS Vaccine 2008" Conference was held in Cape Town, South Africa (October 13 to 16, 2008) and organized, under the aegis of the Global HIV Vaccine Enterprise, by Dr. Lynn Morris (Chair of the Conference) from the National Institute of Communicable Diseases; Dr. Koleka Mlisana from CAPRISA, University KwaZulu-Natal, Durban; Dr. Glenda Gray from Perinatal HIV Research Unit, University Witwatersrand, Johannesburg; and Dr. Carolyn Williamson from Institute of Infectious Diseases and Molecular Medicine, UCT, Cape Town (Co-Chairs of the Conference). This was the first time since the first AIDS Vaccine conference, organized in Paris in 2000, that the meeting was held outside of the U.S. and Europe, and it involved nearly 1,000 participants. Besides three Plenary Sessions with ten state-of-the-art plenary lectures and one Keynote Lecture given by Dr. A.S. Fauci (Director of NIAID, NIH, USA), the Conference was organized in nine oral sessions and four poster discussion groups covering a wide spectrum of scientific information relating to HIV vaccine research and development. Moreover, three Symposia, two Special Sessions, one Roundtable as well as two Debates were held, the latter focusing on current controversial topics. The conference opening was memorable for a number of reasons: among these was the presence of South Africa's new Minister of Health, Barbara Hogan, who, in her first speech in a major forum as a senior member of the SA Government, affirmed that HIV causes AIDS, and that the search for a vaccine is of paramount importance to SA and the rest of the world. A scientific summary of the Conference is reported in the present article, divided into four major topics: (1) vaccine concepts and design; (2) T-cell immunology and innate immunity; (3) B-cell immunology, neutralizing antibodies and mucosal immunology; and (4) clinical trials. © 2009 Landes Bioscience.

Relevance:

80.00%

Publisher:

Abstract:

Speaker diarization is the process of annotating an input audio with information that attributes temporal regions of the audio signal to their respective sources, which may include both speech and non-speech events. For speech regions, the diarization system also specifies the locations of speaker boundaries and assigns relative speaker labels to each homogeneous segment of speech. In short, speaker diarization systems effectively answer the question of ‘who spoke when’. There are several important applications for speaker diarization technology, such as facilitating speaker indexing systems to allow users to directly access the relevant segments of interest within a given audio, and assisting with other downstream processes such as summarizing and parsing. When combined with automatic speech recognition (ASR) systems, the metadata extracted from a speaker diarization system can provide complementary information for ASR transcripts including the location of speaker turns and relative speaker segment labels, making the transcripts more readable. Speaker diarization output can also be used to localize the instances of specific speakers to pool data for model adaptation, which in turn boosts transcription accuracies. Speaker diarization therefore plays an important role as a preliminary step in automatic transcription of audio data. The aim of this work is to improve the usefulness and practicality of speaker diarization technology, through the reduction of diarization error rates. In particular, this research is focused on the segmentation and clustering stages within a diarization system. Although particular emphasis is placed on the broadcast news audio domain and systems developed throughout this work are also trained and tested on broadcast news data, the techniques proposed in this dissertation are also applicable to other domains including telephone conversations and meetings audio. 
Three main research themes were pursued: heuristic rules for speaker segmentation, modelling uncertainty in speaker model estimates, and modelling uncertainty in eigenvoice speaker modelling. The use of heuristic approaches for the speaker segmentation task was first investigated, with emphasis placed on minimizing missed boundary detections. A set of heuristic rules was proposed to govern the detection and heuristic selection of candidate speaker segment boundaries. A second pass, using the same heuristic algorithm with a smaller window, was also proposed with the aim of improving detection of boundaries around short speaker segments. Compared to single threshold based methods, the proposed heuristic approach was shown to provide improved segmentation performance, leading to a reduction in the overall diarization error rate. Methods to model the uncertainty in speaker model estimates were developed to address the difficulties associated with making segmentation and clustering decisions with limited data in the speaker segments. The Bayes factor, derived specifically for multivariate Gaussian speaker modelling, was introduced to account for the uncertainty of the speaker model estimates. The use of the Bayes factor also enabled the incorporation of prior information regarding the audio to aid segmentation and clustering decisions. The idea of modelling uncertainty in speaker model estimates was also extended to the eigenvoice speaker modelling framework for the speaker clustering task. Building on the application of Bayesian approaches to the speaker diarization problem, the proposed approach takes into account the uncertainty associated with the explicit estimation of the speaker factors. The proposed decision criteria, based on Bayesian theory, were shown to generally outperform their non-Bayesian counterparts.

Relevance:

80.00%

Publisher:

Abstract:

This paper demonstrates, following Vygotsky, that language and tool use have a critical role in the collaborative problem-solving behaviour of school-age children. It reports original ethnographic classroom research examining the convergence of speech and practical activity in children’s collaborative problem solving with robotics programming tasks. The researchers analysed children’s interactions during a series of problem solving experiments in which Lego Mindstorms toolsets were used by teachers to create robotics design challenges among 24 students in a Year 4 Australian classroom (students aged 8.5–9.5 years). The design challenges were incrementally difficult, beginning with basic programming of straight line movement, and progressing to more complex challenges involving programming of the robots to raise Lego figures from conduit pipes using robots as pulleys with string and recycled materials. Data collection involved micro-genetic analysis of students’ speech interactions with tools, peers, and other experts, teacher interviews, and student focus group data. Coding the repeated patterns in the transcripts, the authors outline the structure of the children’s social speech in joint problem solving, demonstrating the patterns of speech and interaction that play an important role in the socialisation of the school-age child’s practical intellect.

Relevance:

80.00%

Publisher:

Abstract:

In late 2010, the online nonprofit media organization WikiLeaks published classified documents detailing correspondence between the U.S. State Department and its diplomatic missions around the world, numbering around 250,000 cables. These diplomatic cables contained classified information with comments on world leaders, foreign states, and various international and domestic issues. Negative reactions to the publication of these cables came from both the U.S. political class (which was generally condemnatory of WikiLeaks, invoking national security concerns and the jeopardizing of U.S. interests abroad) and the corporate world, with various companies ceasing to provide services to WikiLeaks despite no legal measure (e.g., a court injunction) forcing them to do so. This article focuses on the legal remedies available to WikiLeaks against this corporate suppression of its speech in the U.S. and Europe, since these are the two principal arenas in which the actors concerned are operating. The transatlantic legal protection of free expression will be considered, yet, as will be explained in greater detail, the legal conception of this constitutional and fundamental right comes from a time when the state posed the greater threat to freedom. As a result, it is not generally enforceable against private, non-state entities interfering with speech and expression, which is the case here. Other areas of law, namely antitrust/competition, contract and tort, will then be examined to determine whether WikiLeaks and its partners can attempt to enforce their right indirectly through these other means. Finally, there will be some concluding thoughts about the implications of the corporate response to the WikiLeaks embassy cables leak for freedom of expression online.

Relevance:

60.00%

Publisher:

Abstract:

Visual noise insensitivity is important to audio visual speech recognition (AVSR). Visual noise can take on a number of forms such as varying frame rate, occlusion, lighting or speaker variabilities. The use of a high dimensional secondary classifier on the word likelihood scores from both the audio and video modalities is investigated for the purposes of adaptive fusion. Preliminary results are presented demonstrating performance above the catastrophic fusion boundary for our confidence measure irrespective of the type of visual noise presented to it. Our experiments were restricted to small vocabulary applications.

Relevance:

50.00%

Publisher:

Abstract:

This correspondence presents a microphone array shape calibration procedure for diffuse noise environments. The procedure estimates intermicrophone distances by fitting the measured noise coherence with its theoretical model and then estimates the array geometry using classical multidimensional scaling. The technique is validated on noise recordings from two office environments.
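The distance-fitting step can be illustrated with the standard coherence model for a spherically isotropic diffuse field between two omnidirectional microphones, sin(2πfd/c)/(2πfd/c). The sketch below fits the distance d by a simple grid search. The search method, the speed of sound, and the function names are assumptions made for illustration. The abstract does not state how the fit is performed, and the subsequent multidimensional scaling step is not shown.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for room-temperature air

def diffuse_coherence(freq_hz, distance_m, c=SPEED_OF_SOUND):
    """Theoretical diffuse-field coherence between two omnidirectional mics."""
    x = 2.0 * math.pi * freq_hz * distance_m / c
    return 1.0 if x == 0.0 else math.sin(x) / x

def fit_distance(freqs_hz, measured, d_min=0.01, d_max=1.0, step=0.001):
    """Grid-search the distance whose model coherence best fits the measurement."""
    best_d, best_err = d_min, float("inf")
    n_steps = int(round((d_max - d_min) / step))
    for k in range(n_steps + 1):
        d = d_min + k * step
        err = sum((diffuse_coherence(f, d) - m) ** 2
                  for f, m in zip(freqs_hz, measured))
        if err < best_err:
            best_d, best_err = d, err
    return best_d

# Synthetic check: "measure" the coherence for a known 0.2 m spacing.
freqs = [100.0 * i for i in range(1, 41)]
measured = [diffuse_coherence(f, 0.2) for f in freqs]
estimated = fit_distance(freqs, measured)
```

Repeating such a fit for every microphone pair yields a matrix of pairwise distances, which is the kind of input that classical multidimensional scaling turns into coordinates.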

Relevance:

50.00%

Publisher:

Abstract:

In an automotive environment, the performance of a speech recognition system is affected by environmental noise if the speech signal is acquired directly from a microphone. Speech enhancement techniques are therefore necessary to improve the speech recognition performance. In this paper, a field-programmable gate array (FPGA) implementation of dual-microphone delay-and-sum beamforming (DASB) for speech enhancement is presented. As the first step towards a cost-effective solution, the implementation described in this paper uses a relatively high-end FPGA device to facilitate the verification of various design strategies and parameters. Experimental results show that the proposed design can produce output waveforms close to those generated by a theoretical (floating-point) model with modest usage of FPGA resources. Speech recognition experiments are also conducted on enhanced in-car speech waveforms produced by the FPGA in order to compare recognition performance with the floating-point representation running on a PC.
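For readers unfamiliar with delay-and-sum beamforming, the two-microphone case can be sketched in a few lines: one channel is time-shifted so the target speech aligns across channels, and the channels are averaged so that aligned speech adds coherently while uncorrelated noise partially cancels. The version below is a floating-point illustration with an integer sample delay. The function name and the test signal are hypothetical, and the sketch does not model the paper's FPGA design.

```python
def delay_and_sum(mic_a, mic_b, delay_samples):
    """Align mic_b to mic_a by an integer sample delay and average the channels.

    delay_samples > 0 means the target reaches mic_b that many samples
    later than mic_a; samples shifted in from outside the buffer are zero.
    """
    n = min(len(mic_a), len(mic_b))
    out = []
    for i in range(n):
        j = i + delay_samples
        b = mic_b[j] if 0 <= j < len(mic_b) else 0.0
        out.append(0.5 * (mic_a[i] + b))
    return out

# Hypothetical example: the same source arrives at mic_b two samples late.
source = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
mic_a = source
mic_b = [0.0, 0.0] + source[:-2]
enhanced = delay_and_sum(mic_a, mic_b, delay_samples=2)
```

In practice the delay depends on the microphone spacing, the direction of the talker, and the sampling rate, and a fixed-point hardware version must also manage word lengths, which this sketch ignores.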

Relevance:

50.00%

Publisher:

Abstract:

Acoustically, car cabins are extremely noisy and as a consequence audio-only, in-car voice recognition systems perform poorly. As the visual modality is immune to acoustic noise, using the visual lip information from the driver is seen as a viable strategy for circumventing this problem by using audio visual automatic speech recognition (AVASR). However, implementing AVASR requires a system that is able to accurately locate and track the driver's face and lip area in real time. In this paper we present such an approach using the Viola-Jones algorithm. Using the AVICAR [1] in-car database, we show that the Viola-Jones approach is a suitable method of locating and tracking the driver’s lips for an audio-visual speech recognition system, despite the visual variability of illumination and head pose.