55 results for Speech Recognition System using LPC
Abstract:
Military decision makers need to understand and assess the benefits and consequences of their decisions in order to make cost-efficient, timely, and successful choices. Technology selection is one such critical decision, especially when considering the design or retrofit of a complex system, such as an aircraft. An integrated and systematic methodology that supports decision-making between technology alternatives and options while assessing the consequences of such decisions is a key enabler. This paper presents and demonstrates, through application to a notional medium-range short takeoff and landing (STOL) aircraft, one such enabler: the Technology Impact Forecasting (TIF) method. The goal of the TIF process is to explore both generic, undefined areas of technology and specific technologies, and to assess their potential impacts. This is achieved through the development and use of technology scenarios, which allow the designer to determine where to allocate resources for further technology definition and refinement and which provide useful design information. The paper discusses in particular the use of technology scenarios and demonstrates their use in the exploration of seven technologies of varying technology readiness levels.
Abstract:
Two case studies are presented to demonstrate the impact of different power system operating conditions on the power oscillation frequency modes in the Irish power system. A simplified two-area equivalent of the Irish power system is used, where area 1 represents the Republic of Ireland power system and area 2 represents the Northern Ireland power system.
The potential power oscillation frequency modes on the interconnector under different operating conditions are analysed. The main objective is to analyse the influence of operating conditions involving wind turbine generator (WTG) penetration on power oscillation frequency modes using phasor measurement unit (PMU) data.
Fast Fourier transform (FFT) analysis was performed to identify the frequency oscillation modes, while correlation coefficient analysis was used to determine the source of the frequency oscillation. The results show that WTGs, particularly fixed speed induction generation (FSIG), contribute significantly to inter-area power oscillation frequency modes during high WTG operation.
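The two analysis steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a naive discrete Fourier transform (which an FFT computes more efficiently) to find the dominant oscillation frequency, and a Pearson correlation coefficient to compare two signals. The function names, the sample rate, and the synthetic signals are assumptions made for illustration only.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the largest non-DC spectral component."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset first
    best_bin, best_mag = 0, 0.0
    # Naive O(n^2) DFT over the positive-frequency bins; an FFT gives the same values.
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate_hz / n


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: 20 s sampled at 10 Hz, with a 0.5 Hz oscillation.
rate = 10.0
frequency_signal = [50.0 + 0.02 * math.sin(2 * math.pi * 0.5 * t / rate)
                    for t in range(200)]
wind_output = [100.0 + 5.0 * math.sin(2 * math.pi * 0.5 * t / rate)
               for t in range(200)]

mode_hz = dominant_frequency(frequency_signal, rate)
similarity = pearson(frequency_signal, wind_output)
```

A correlation close to 1 between a generator's output and the measured oscillation would flag that generator as a candidate source; real PMU data would also need detrending and windowing, which this sketch omits.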
Abstract:
In existing WiFi-based localization methods, smart mobile devices consume considerable power because WiFi interfaces must be used for frequent AP scanning during the localization process. In this work, we design an energy-efficient indoor localization system called ZigBee assisted indoor localization (ZIL), based on WiFi fingerprints obtained via ZigBee interference signatures. ZIL uses ZigBee interfaces to collect mixed WiFi signals, which include non-periodic WiFi data and periodic beacon signals. However, ZigBee interfaces cannot identify WiFi APs directly from these signals. To address this issue, we propose a method for detecting WiFi APs to form WiFi fingerprints from the signals collected by ZigBee interfaces. We also propose a novel fingerprint matching algorithm to align a pair of fingerprints effectively. To improve the localization accuracy, we design the K-nearest neighbor (KNN) method with three different weighted distances and find that the KNN algorithm with the Manhattan distance performs best. Experiments show that ZIL achieves a localization accuracy of 87%, which is competitive with state-of-the-art WiFi fingerprint-based approaches, and reduces energy consumption by 68% on average compared to the approach based on the WiFi interface.
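As a rough illustration of fingerprint matching with KNN and the Manhattan distance, the sketch below estimates a position as the inverse-distance-weighted mean of the k nearest stored fingerprints. The weighting scheme, the fingerprint layout (one signal-strength value per AP), and all names and values are assumptions; the abstract does not specify the three weighted distances used in ZIL.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length fingerprints."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_locate(query, database, k=3):
    """Estimate (x, y) from the k stored fingerprints closest to `query`.

    `database` is a list of (fingerprint, (x, y)) pairs. Each neighbour is
    weighted by the inverse of its Manhattan distance to the query.
    """
    nearest = sorted(database, key=lambda entry: manhattan(query, entry[0]))[:k]
    eps = 1e-9  # avoids division by zero on an exact match
    weights = [1.0 / (manhattan(query, fp) + eps) for fp, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return x, y


# Hypothetical fingerprint database: signal strengths (dBm) for three APs.
fingerprints = [
    ((-40, -70, -80), (0.0, 0.0)),
    ((-70, -40, -80), (5.0, 0.0)),
    ((-80, -70, -40), (0.0, 5.0)),
    ((-60, -60, -60), (2.5, 2.5)),
]

estimate = knn_locate((-42, -69, -79), fingerprints, k=2)
```

With k=1 the estimate snaps to the single closest reference point; larger k interpolates between reference points at the cost of some blurring near room boundaries.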
Modelling of Evaporator in Waste Heat Recovery System using Finite Volume Method and Fuzzy Technique
Abstract:
The evaporator is an important component in the Organic Rankine Cycle (ORC)-based Waste Heat Recovery (WHR) system, since the heat transfer effectiveness of this device determines the efficiency of the system. When the WHR system operates under supercritical conditions, the heat transfer mechanism in the evaporator is unpredictable due to the change of thermo-physical properties of the fluid with temperature. Although the conventional finite volume model can successfully capture those changes in the evaporator of the WHR process, the computation time for this method is high. To reduce the computation time, this paper develops a new fuzzy-based evaporator model and compares its performance with the finite volume method. The results show that the fuzzy technique can be applied to predict the output of the supercritical evaporator in the waste heat recovery system and can significantly reduce the required computation time. The proposed model therefore has the potential to be used in real-time control applications.
Abstract:
For many applications of emotion recognition, such as virtual agents, the system must select responses while the user is speaking. This requires reliable on-line recognition of the user’s affect. However, most emotion recognition systems are based on turn-wise processing. We present a novel approach to on-line emotion recognition from speech using Long Short-Term Memory Recurrent Neural Networks. Emotion is recognised frame-wise in a two-dimensional valence-activation continuum. In contrast to current state-of-the-art approaches, recognition is performed on low-level signal frames, similar to those used for speech recognition. No statistical functionals are applied to low-level feature contours. Framing at a higher level is therefore unnecessary, and regression outputs can be produced in real time for every low-level input frame. We also investigate the benefits of including linguistic features, obtained by a keyword spotter, at the signal frame level.
Abstract:
This paper presents a new approach to speech enhancement from single-channel measurements involving both noise and channel distortion (i.e., convolutional noise), and demonstrates its applications for robust speech recognition and for improving noisy speech quality. The approach is based on finding longest matching segments (LMS) from a corpus of clean, wideband speech. The approach adds three novel developments to our previous LMS research. First, we address the problem of channel distortion as well as additive noise. Second, we present an improved method for modeling noise for speech estimation. Third, we present an iterative algorithm which updates the noise and channel estimates of the corpus data model. In experiments using speech recognition as a test with the Aurora 4 database, the use of our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system. In another comparison against conventional enhancement algorithms, both the PESQ and the segmental SNR ratings of the LMS algorithm were superior to the other methods for noisy speech enhancement.
Abstract:
This paper presents a new approach to single-channel speech enhancement involving both noise and channel distortion (i.e., convolutional noise). The approach is based on finding longest matching segments (LMS) from a corpus of clean, wideband speech. The approach adds three novel developments to our previous LMS research. First, we address the problem of channel distortion as well as additive noise. Second, we present an improved method for modeling noise. Third, we present an iterative algorithm for improved speech estimates. In experiments using speech recognition as a test with the Aurora 4 database, the use of our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system. In another comparison against conventional enhancement algorithms, both the PESQ and the segmental SNR ratings of the LMS algorithm were superior to the other methods for noisy speech enhancement. Index Terms: corpus-based speech model, longest matching segment, speech enhancement, speech recognition
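Segmental SNR, one of the two quality measures cited here, averages a per-frame signal-to-noise ratio in decibels. The sketch below follows a commonly used formulation; the frame length and the clamping limits of -10 dB and 35 dB are conventional choices assumed for illustration, not values taken from the paper.

```python
import math


def segmental_snr(clean, enhanced, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Mean per-frame SNR (dB) of `enhanced` measured against `clean`."""
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        est = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        error_energy = sum((x - y) ** 2 for x, y in zip(ref, est))
        if signal_energy == 0.0:
            continue  # skip silent reference frames
        if error_energy == 0.0:
            snr_db = ceil_db  # perfect reconstruction, clamp to the ceiling
        else:
            snr_db = 10.0 * math.log10(signal_energy / error_energy)
        # Clamp so that a few extreme frames do not dominate the average.
        frame_snrs.append(min(max(snr_db, floor_db), ceil_db))
    if not frame_snrs:
        raise ValueError("no non-silent frames to evaluate")
    return sum(frame_snrs) / len(frame_snrs)


# Hypothetical signals: the "enhanced" signal is the clean one scaled by 0.9.
clean_signal = [math.sin(0.1 * t) for t in range(320)]
enhanced_signal = [0.9 * x for x in clean_signal]
score_db = segmental_snr(clean_signal, enhanced_signal)
```

Unlike a single global SNR, the frame-wise average gives quiet speech segments the same weight as loud ones, which is why it is often reported alongside perceptual measures such as PESQ.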
Abstract:
Studies have been carried out to recognize individuals from a frontal view using their gait patterns. In previous work, gait sequences were captured using either single or stereo RGB camera systems or the Kinect 1.0 camera system. In this research, we present a new frontal-view gait recognition method using a laser-based Time of Flight (ToF) camera. In addition to the new gait data set, other contributions include enhancement of the silhouette segmentation, gait cycle estimation, and gait image representations. We propose four new gait image representations, namely Gait Depth Energy Image (GDE), Partial GDE (PGDE), Discrete Cosine Transform GDE (DGDE), and Partial DGDE (PDGDE). The experimental results show that all the proposed gait image representations produce better accuracy than the previous methods. In addition, we have developed Fusion GDEs (FGDEs), which achieve better overall accuracy and outperform the previous methods.
Abstract:
A novel methodology is proposed for the development of neural network models for complex engineering systems exhibiting nonlinearity. This method performs neural network modeling by first establishing some fundamental nonlinear functions from a priori engineering knowledge, which are then constructed and coded into appropriate chromosome representations. Given a suitable fitness function, using evolutionary approaches such as genetic algorithms, a population of chromosomes evolves for a certain number of generations to finally produce a neural network model best fitting the system data. The objective is to improve the transparency of the neural networks, i.e. to produce physically meaningful
Abstract:
In this paper, the authors present, for the first time, results showing the effect of out-of-plane speaker head pose variation on a lip-biometric-based speaker verification system. Using appearance DCT-based features, they adopt a Mutual Information analysis technique to highlight the class-discriminant DCT components most robust to changes in out-of-plane pose. Experiments are conducted using the initial phase of a new multi-view Audio-Visual database designed for research and development of pose-invariant speech and speaker recognition. They show that verification performance can be improved by substituting higher-order horizontal DCT components for vertical ones, particularly in the case of a train/test pose angle mismatch.
Abstract:
This paper studies single-channel speech separation, assuming unknown, arbitrary temporal dynamics for the speech signals to be separated. A data-driven approach is described, which matches each mixed speech segment against a composite training segment to separate the underlying clean speech segments. To advance the separation accuracy, the new approach seeks and separates the longest mixed speech segments with matching composite training segments. Lengthening the mixed speech segments to match reduces the uncertainty of the constituent training segments, and hence the error of separation. For convenience, we call the new approach Composition of Longest Segments, or CLOSE. The CLOSE method includes a data-driven approach to model long-range temporal dynamics of speech signals, and a statistical approach to identify the longest mixed speech segments with matching composite training segments. Experiments are conducted on the Wall Street Journal database, for separating mixtures of two simultaneous large-vocabulary speech utterances spoken by two different speakers. The results are evaluated using various objective and subjective measures, including the challenge of large-vocabulary continuous speech recognition. It is shown that the new separation approach leads to significant improvement in all these measures.
Abstract:
The subjective performance of the G.722 7-kHz wideband speech-coding recommendation using music signals is described. A number of audible distortions specific to music signals were found to be present in real-time evaluations of the coder. As a result, three modifications are proposed which are found to improve the performance for music signals. These modifications are compatible with the G.722 system configuration. The results obtained clearly demonstrate the very high coding efficiency of subband ADPCM (adaptive differential pulse-code modulation) in comparison with digital companding and ADM schemes when applied to music signals.
Abstract:
In the present study, we examined the possible utility of a three-dimensional culture system using a thermo-reversible gelation polymer to isolate and expand neural stem cells (NSCs). The polymer is a synthetic, biologically inert polymer that gels at temperatures higher than the gel-sol transition point (approximately 20 °C). When fetal mouse brain cells were inoculated into the gel, spherical colonies were formed (approximately 1% in primary culture and approximately 9% in passage cultures). The spheroid-forming cells were positive for expression of the NSC markers nestin and Musashi. Under conditions facilitating spontaneous neural differentiation, the spheroid-forming cells expressed genes characteristic of astrocytes, oligodendrocytes, and neurons. The cells could be successively propagated to at least 80 population doubling levels (PDLs) over a period of 20 weeks in the gel culture, with a growth rate higher than that observed in suspension culture. The spheroids formed by fetal mouse brain cells in the gel were shown to be of clonal origin. These results indicate that the spheroid culture system is a convenient and powerful tool for isolation and clonal expansion of NSCs in vitro.