972 results for Spectral resolution


Relevance:

20.00% 20.00%

Publisher:

Abstract:

This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these product-code vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy compression and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of low-rate spectral compression methods on the task of automatic speaker identification.
The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model when based on the use of compressed speech are put forward.
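
The coupling of lossy VQ with lossless coding of the codebook indices, as described in the abstract above, can be illustrated with a minimal sketch. It is not the thesis's PCVQ method: the codebook, the input vectors and the function names `vq_encode` and `index_entropy_bits` are hypothetical, and a plain full-search quantizer with a zeroth-order entropy estimate stands in for the statistical index model.

```python
import math
from collections import Counter


def vq_encode(vectors, codebook):
    """Full-search VQ: map each vector to the index of its nearest codeword."""
    indices = []
    for v in vectors:
        # Squared Euclidean distance to every codeword; keep the closest one.
        best = min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(v, codebook[i])),
        )
        indices.append(best)
    return indices


def index_entropy_bits(indices):
    """Empirical zeroth-order entropy of the index stream, in bits per index."""
    counts = Counter(indices)
    n = len(indices)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Hypothetical 2-D codebook and input vectors, for illustration only.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
vectors = [(0.1, 0.0), (0.9, 1.1), (0.0, 0.1), (0.2, -0.1)]

indices = vq_encode(vectors, codebook)
fixed_bits = math.log2(len(codebook))   # cost of sending raw indices
entropy = index_entropy_bits(indices)   # lower bound for an ideal entropy coder
```

When the indices are not uniformly distributed, the entropy falls below the fixed-length cost, which is the margin a lossless stage can exploit.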

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This thesis presents an original approach to parametric speech coding at rates below 1 kbits/sec, primarily for speech storage applications. Essential processes considered in this research encompass efficient characterization of the evolving configuration of the vocal tract to follow phonemic features with high fidelity, representation of speech excitation using minimal parameters with minor degradation in naturalness of synthesized speech, and finally, quantization of the resulting parameters at the nominated rates. For encoding speech spectral features, a new method relying on Temporal Decomposition (TD) is developed which efficiently compresses spectral information through interpolation between the most steady points over time trajectories of spectral parameters using a new basis function. The compression ratio provided by the method is independent of the updating rate of the feature vectors, and hence allows high resolution in tracking significant temporal variations of speech formants with no effect on the spectral data rate. Accordingly, regardless of the quantization technique employed, the method yields a high compression ratio without sacrificing speech intelligibility. Several new techniques for improving performance of the interpolation of spectral parameters through phonetically-based analysis are proposed and implemented in this research, comprising event approximated TD, near-optimal shaping event approximating functions, efficient speech parametrization for TD on the basis of an extensive investigation originally reported in this thesis, and a hierarchical error minimization algorithm for decomposition of feature parameters which significantly reduces the complexity of the interpolation process.
Speech excitation in this work is characterized based on a novel Multi-Band Excitation paradigm which accurately determines the harmonic structure in the LPC (linear predictive coding) residual spectra, within individual bands, using the concept of Instantaneous Frequency (IF) estimation in the frequency domain. The model yields an effective two-band approximation to excitation and computes pitch and voicing with high accuracy as well. New methods for interpolative coding of pitch and gain contours are also developed in this thesis. For pitch, relying on the correlation between phonetic evolution and pitch variations during voiced speech segments, TD is employed to interpolate the pitch contour between critical points introduced by event centroids. This compresses the pitch contour in a ratio of about 1/10 with negligible error. To approximate the gain contour, a set of uniformly-distributed Gaussian event-like functions is used which reduces the amount of gain information to about 1/6 with acceptable accuracy. The thesis also addresses a new quantization method applied to spectral features on the basis of statistical properties and spectral sensitivity of spectral parameters extracted from TD-based analysis. The experimental results show that good quality speech, comparable to that of conventional coders at rates over 2 kbits/sec, can be achieved at rates of 650-990 bits/sec.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This thesis deals with the problem of the instantaneous frequency (IF) estimation of sinusoidal signals. This topic plays a significant role in signal processing and communications. Depending on the type of the signal, two major approaches are considered. For IF estimation of single-tone or digitally-modulated sinusoidal signals (like frequency shift keying signals) the approach of digital phase-locked loops (DPLLs) is considered, and this is Part-I of this thesis. For FM signals the approach of time-frequency analysis is considered, and this is Part-II of the thesis. In Part-I we have utilized sinusoidal DPLLs with a non-uniform sampling scheme as this type is widely used in communication systems. The digital tanlock loop (DTL) has introduced significant advantages over other existing DPLLs. In the last 10 years many efforts have been made to improve DTL performance. However, this loop and all of its modifications utilize a Hilbert transformer (HT) to produce a signal-independent 90-degree phase-shifted version of the input signal. A Hilbert transformer can be realized approximately using a finite impulse response (FIR) digital filter. This realization introduces further complexity in the loop in addition to approximations and frequency limitations on the input signal. We have tried to avoid practical difficulties associated with the conventional tanlock scheme while keeping its advantages. A time-delay is utilized in the tanlock scheme of DTL to produce a signal-dependent phase shift. This gave rise to the time-delay digital tanlock loop (TDTL). Fixed point theorems are used to analyze the behavior of the new loop. As such, TDTL combines the two major approaches in DPLLs: the non-linear approach of sinusoidal DPLL based on fixed point analysis, and the linear tanlock approach based on the arctan phase detection. TDTL preserves the main advantages of the DTL despite its reduced structure. An application of TDTL in FSK demodulation is also considered.
This idea of replacing HT by a time-delay may be of interest in other signal processing systems. Hence we have analyzed and compared the behaviors of the HT and the time-delay in the presence of additive Gaussian noise. Based on the above analysis, the behavior of the first and second-order TDTLs has been analyzed in additive Gaussian noise. Since DPLLs need time for locking, they are normally not efficient in tracking the continuously changing frequencies of non-stationary signals, i.e. signals with time-varying spectra. Non-stationary signals are of importance in synthetic and real-life applications. Examples are the frequency-modulated (FM) signals widely used in communication systems. Part-II of this thesis is dedicated to the IF estimation of non-stationary signals. For such signals the classical spectral techniques break down, due to the time-varying nature of their spectra, and more advanced techniques should be utilized. For the purpose of instantaneous frequency estimation of non-stationary signals there are two major approaches: parametric and non-parametric. We chose the non-parametric approach which is based on time-frequency analysis. This approach is computationally less expensive and more effective in dealing with multicomponent signals, which are the main aim of this part of the thesis. A time-frequency distribution (TFD) of a signal is a two-dimensional transformation of the signal to the time-frequency domain. Multicomponent signals can be identified by multiple energy peaks in the time-frequency domain. Many real-life and synthetic signals are of multicomponent nature and there is little in the literature concerning IF estimation of such signals. This is why we have concentrated on multicomponent signals in Part-II. An adaptive algorithm for IF estimation using the quadratic time-frequency distributions has been analyzed. A class of time-frequency distributions that are more suitable for this purpose has been proposed.
The kernels of this class are time-only or one-dimensional, rather than the time-lag (two-dimensional) kernels. Hence this class has been named the T-class. If the parameters of these TFDs are properly chosen, they are more efficient than the existing fixed-kernel TFDs in terms of resolution (energy concentration around the IF) and artifact reduction. The T-distributions have been used in the IF adaptive algorithm and proved to be efficient in tracking rapidly changing frequencies. They also enable direct amplitude estimation for the components of a multicomponent

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The concept of radar was developed for the estimation of the distance (range) and velocity of a target from a receiver. The distance measurement is obtained by measuring the time taken for the transmitted signal to propagate to the target and return to the receiver. The target's velocity is determined by measuring the Doppler induced frequency shift of the returned signal caused by the rate of change of the time-delay from the target. As researchers further developed conventional radar systems it became apparent that additional information was contained in the backscattered signal and that this information could in fact be used to describe the shape of the target itself. This is due to the fact that a target can be considered to be a collection of individual point scatterers, each of which has its own velocity and time-delay. Delay-Doppler parameter estimation of each of these point scatterers thus corresponds to a mapping of the target's range and cross range, thus producing an image of the target. Much research has been done in this area since the early radar imaging work of the 1960s. At present there are two main categories into which radar imaging falls. The first of these is related to the case where the backscattered signal is considered to be deterministic. The second is related to the case where the backscattered signal is of a stochastic nature. In both cases the information which describes the target's scattering function is extracted by the use of the ambiguity function, a function which correlates the backscattered signal in time and frequency with the transmitted signal. In practical situations, it is often necessary to have the transmitter and the receiver of the radar system sited at different locations. The problem in these situations is that a reference signal must then be present in order to calculate the ambiguity function.
This causes an additional problem in that detailed phase information about the transmitted signal is then required at the receiver. It is this latter problem which has led to the investigation of radar imaging using time-frequency distributions. As will be shown in this thesis, the phase information about the transmitted signal can be extracted from the backscattered signal using time-frequency distributions. The principal aim of this thesis was the development, and subsequent discussion, of the theory of radar imaging using time-frequency distributions. Consideration is first given to the case where the target is diffuse, i.e. where the backscattered signal has temporal stationarity and a spatially white power spectral density. The complementary situation is also investigated, i.e. where the target is no longer diffuse, but some degree of correlation exists between the time-frequency points. Computer simulations are presented to demonstrate the concepts and theories developed in the thesis. For the proposed radar system to be practically realisable, both the time-frequency distributions and the associated algorithms developed must be able to be implemented in a timely manner. For this reason an optical architecture is proposed. This architecture is specifically designed to obtain the required time and frequency resolution when using laser radar imaging. The complex light amplitude distributions produced by this architecture have been computer simulated using an optical compiler.
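
The ambiguity function mentioned in the abstract above, which correlates the received signal with the transmitted one over delay and Doppler, can be sketched for a single point scatterer. This is an illustrative discrete version under stated assumptions, not the thesis's formulation: the function name `cross_ambiguity`, the Barker-13 test code, and the chosen delay and Doppler bin are all hypothetical.

```python
import cmath


def cross_ambiguity(received, reference, delays, doppler_bins):
    """Magnitude of the discrete cross-ambiguity surface.

    For each (delay, Doppler bin) pair, correlate the received signal with a
    delayed copy of the reference after removing the trial Doppler shift.
    """
    n_samples = len(reference)
    surface = {}
    for tau in delays:
        for k in doppler_bins:
            acc = 0j
            for n in range(n_samples):
                m = n - tau
                if 0 <= m < n_samples:
                    shift = cmath.exp(-2j * cmath.pi * k * n / n_samples)
                    acc += received[n] * reference[m].conjugate() * shift
            surface[(tau, k)] = abs(acc)
    return surface


# Hypothetical transmitted waveform: a Barker-13 binary phase code.
barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
reference = [complex(b) for b in barker13]
n_samples = len(reference)

# Simulated echo from one point scatterer: delayed by 2 samples, Doppler bin 3.
true_delay, true_bin = 2, 3
received = [
    reference[n - true_delay] * cmath.exp(2j * cmath.pi * true_bin * n / n_samples)
    if n >= true_delay else 0j
    for n in range(n_samples)
]

surface = cross_ambiguity(received, reference, range(4), range(n_samples))
peak = max(surface, key=surface.get)  # (delay, Doppler bin) of the strongest response
```

The location of the peak gives the scatterer's delay and Doppler estimate. Note that the sketch needs the reference waveform at the receiver, which is the requirement the abstract identifies as a problem for separated transmitter and receiver sites.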

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Fourier transform (FT) Raman, Raman microspectroscopy and Fourier transform infrared (FTIR) spectroscopy have been used for the structural analysis and characterisation of untreated and chemically treated wool fibres. For FT-Raman spectroscopy novel methods of sample presentation have been developed and optimised for the analysis of wool. No significant fluorescence was observed and the spectra could be obtained routinely. The stability of wool keratin to the laser source was investigated and the visual and spectroscopic signs of sample damage were established. Wool keratin was found to be extremely robust with no signs of sample degradation observed for laser powers of up to 600 mW and for exposure times of up to seven and a half hours. Due to improvements in band resolution and signal-to-noise ratio, several previously unobserved spectral features have become apparent. The assignment of the Raman active vibrational modes of wool has been reviewed and updated to include these features. The infrared spectroscopic techniques of attenuated total reflectance (ATR) and photoacoustic (PA) have been used to examine shrinkproofed and mothproofed wool samples. Shrinkproofing is an oxidative chemical treatment used to selectively modify the surface of a wool fibre. Mothproofing is a chemical treatment applied to wool for the prevention of insect attack. The ability of PAS and ATR to vary the penetration depth by varying certain instrumental parameters was used to obtain spectra of the near surface regions of these chemically treated samples. These spectra were compared with those taken with a greater penetration depth, which therefore represent more of the bulk wool sample. The PA and ATR spectra demonstrated that oxidation was restricted to the near-surface layer of wool. Extensive curve fitting of ATR spectra of untreated wool indicated that the cuticle was composed of a mixed protein conformation, but was predominantly that of an α-helix.
The cortex was proposed to be a mixture of both α-helical and β-pleated sheet protein conformations. These findings were supported by PAS depth profiling results. Raman microspectroscopy was used in an extensive investigation of the molecular structure of the wool fibre. This included determining the orientation of certain functional groups within the wool fibre and the symmetry of particular vibrations. The orientation of bonds within the wool fibre was investigated by orientating the wool fibre axis parallel and then perpendicular to the plane of polarisation of the electric vector of the incident radiation. It was experimentally determined that the majority of C=O and N-H bonds of the peptide bond of wool lie parallel to the fibre axis. Additionally, a number of the important vibrations associated with the α-helix were also found to lie parallel to the fibre axis. Further investigation into the molecular structure of wool involved determining what effect stretching the wool fibre had on bond orientation. Raman spectra of stretched and unstretched wool fibres indicated that extension altered the orientation of the aromatic rings and the CH2 and CH3 groups of the amino acids. Curve fitting results revealed that extension resulted in significant destruction of the α-helix structure and a substantial increase in the β-pleated sheet structure. Finally, depolarisation ratios were calculated for Raman spectra. The vibrations associated with the aromatic rings of amino acids had very low ratios which indicated that the vibrations were highly symmetrical.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper suggests an approach for finding an appropriate combination of various parameters for extracting texture features (e.g. choice of spectral band for extracting texture feature, size of the moving window, quantization level of the image, and choice of texture feature) to be used in the classification process. The gray level co-occurrence matrix (GLCM) method has been used for extracting texture from remotely sensed satellite images. Results of the classification of an Indian urban environment using spatial property (texture), derived from spectral and multi-resolution wavelet decomposed images, have also been reported. A multivariate data analysis technique called ‘conjoint analysis’ has been used in the study to analyze the relative importance of these parameters. Results indicate that the choice of texture feature and window size have higher relative importance in the classification process than quantization level or the choice of image band for extracting texture feature. In the case of texture features derived using wavelet decomposed images, the parameter ‘decomposition level’ has almost the same relative importance as the size of the moving window; decomposing the images up to level one is sufficient, and there is no need for further decomposition. It was also observed that the classification incorporating texture features improves the overall classification accuracy in a statistically significant manner in comparison to pure spectral classification.
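
As a rough illustration of the GLCM method named in the abstract above, the following sketch counts gray-level co-occurrences for one pixel offset and derives a single texture feature (contrast). The toy image, the offset, the number of quantization levels and the function names `glcm` and `contrast` are assumptions made for the example. They are not taken from the paper, which applies the method inside a moving window over satellite imagery.

```python
def glcm(image, dx, dy, levels):
    """Count how often gray level i occurs with gray level j at offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """Contrast feature: squared gray-level difference weighted by pair probability."""
    total = sum(sum(row) for row in matrix)
    return sum(
        (i - j) ** 2 * count / total
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )


# Toy 4x4 image already quantized to 4 gray levels (values 0..3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

m = glcm(image, dx=1, dy=0, levels=4)  # pairs of horizontally adjacent pixels
texture_contrast = contrast(m)
```

The parameters the paper studies map directly onto this sketch: the quantization level is `levels`, the window size is the extent of `image`, and the choice of texture feature is the function applied to the matrix.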

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Uncooperative iris identification systems at a distance and on the move often suffer from poor resolution and poor focus of the captured iris images. The lack of pixel resolution and well-focused images significantly degrades the iris recognition performance. This paper proposes a new approach to incorporate the focus score into a reconstruction-based super-resolution process to generate a high-resolution iris image from a low-resolution and focus-inconsistent video sequence of an eye. A reconstruction-based technique, which can incorporate middle and high frequency components from multiple low-resolution frames into one desired super-resolved frame without introducing false high frequency components, is used. A new focus assessment approach is proposed for uncooperative iris at a distance and on the move to improve performance for variations in lighting, size and occlusion. A novel fusion scheme is then proposed to incorporate the proposed focus score into the super-resolution process. The experiments conducted on the Multiple Biometric Grand Challenge portal database show that our proposed approach achieves an EER of 2.1%, outperforming the existing state-of-the-art averaging signal-level fusion approach by 19.2% and the robust mean super-resolution approach by 8.7%.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Open access reforms to railway regulations allow multiple train operators to provide rail services on a common infrastructure. As railway operations are now independently managed by different stakeholders, conflicts in operations may arise, and there have been attempts to derive an effective access charge regime so that these conflicts may be resolved. One approach is direct negotiation between the infrastructure manager and the train service providers. Despite the substantial literature on the topic, few studies consider the benefits of employing computer simulation as an evaluation tool for railway operational activities such as access pricing. This article proposes a multi-agent system (MAS) framework for the railway open market and demonstrates its feasibility by modelling the negotiation between an infrastructure provider and a train service operator. Empirical results show that the model is capable of resolving operational conflicts according to market demand.