410 resultados para PROCESSING TECHNIQUE


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The detection of voice activity is a challenging problem, especially when the level of acoustic noise is high. Most current approaches only utilise the audio signal, making them susceptible to acoustic noise. An obvious approach to overcome this is to use the visual modality. The current state-of-the-art visual feature extraction technique is one that uses a cascade of visual features (i.e. 2D-DCT, feature mean normalisation, interstep LDA). In this paper, we investigate the effectiveness of this technique for the task of visual voice activity detection (VAD), and analyse each stage of the cascade and quantify the relative improvement in performance gained by each successive stage. The experiments were conducted on the CUAVE database and our results highlight that the dynamics of the visual modality can be used to good effect to improve visual voice activity detection performance.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Parallel combinatory orthogonal frequency division multiplexing (PC-OFDM yields lower maximum peak-to-average power ratio (PAR), high bandwidth efficiency and lower bit error rate (BER) on Gaussian channels compared to OFDM systems. However, PC-OFDM does not improve the statistics of PAR significantly. In this chapter, the use of a set of fixed permutations to improve the statistics of the PAR of a PC-OFDM signal is presented. For this technique, interleavers are used to produce K-1 permuted sequences from the same information sequence. The sequence with the lowest PAR, among K sequences is chosen for the transmission. The PAR of a PC-OFDM signal can be further reduced by 3-4 dB by this technique. Mathematical expressions for the complementary cumulative density function (CCDF)of PAR of PC-OFDM signal and interleaved PC-OFDM signal are also presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper examines the fouling characteristics of four tubular ceramic membranes with pore sizes 300 kDa, 0.1 μm and 0.45 μm installed in a pilot plant at a sugar factory for processing clarified cane sugar juices. All the membranes, except the one with a pore size of 0.45 μm, generally gave reproducible results through the trials, were easy to clean and could handle operation at high volumetric concentration factors. Analysis of fouled and cleaned ceramic membranes revealed that polysaccharides, lipids and to a lesser extent, polyphenols, as well as other colloidal particles cause fouling of the membranes. Electrostatic and hydrophobic forces cause strong aggregation of the polymeric components with one another and with colloidal particles. To combat irreversible fouling of the membranes, treatment options that result in the removal of particles having a size range of 0.2–0.5 μm and in addition remove polymeric impurities, need to be identified. Chemical and microscopic evaluations of the juices and the structural characterisation of individual particles and aggregates identified options to mitigate the fouling of membranes. These include conditioning the feed prior to membrane filtration to break up the network structure formed between the polymers and particles in the feed and the use of surfactants to prevent the aggregation of polymers and particles.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper introduces a novel technique to directly optimise the Figure of Merit (FOM) for phonetic spoken term detection. The FOM is a popular measure of sTD accuracy, making it an ideal candiate for use as an objective function. A simple linear model is introduced to transform the phone log-posterior probabilities output by a phe classifier to produce enhanced log-posterior features that are more suitable for the STD task. Direct optimisation of the FOM is then performed by training the parameters of this model using a non-linear gradient descent algorithm. Substantial FOM improvements of 11% relative are achieved on held-out evaluation data, demonstrating the generalisability of the approach.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The ISSCT Process Section workshop held in Réunion 20–23 October 2008 was attended by 51 delegates from 10 countries. The theme was Green cane impact on sugar processing. The workshop provided a valuable and timely opportunity to review and discuss the impact on factory operations and performance from a green cane supply that could include significant levels of trash. It was particularly relevant to those mills that were considering options to boost their biomass intake for increased co-generation capacity. Several of the speakers related their experiences with processing ‘whole of crop’ cane supplies through the factory. Speakers detailed the problems and increased losses that were incurred when processing cane with high trash levels. The consensus of the delegates was that the best scenario would involve a cane-cleaning plant at the factory so that only clean cane would be processed through the factory. The forum recommended that more research was required to address the issues of increased impurities in the process streams associated with high trash levels. Site visits to the two factories and a cane-delivery station were arranged as part of the workshop.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background Malnutrition is common among dialysis patients and is associated with an adverse outcome. One cause of this is a persistent reduction in nutrient intake, suggesting an abnormality of appetite regulation. Methods We used a novel technique to describe the appetite profile in 46 haemodialysis (HD) patients and 40 healthy controls. The Electronic Appetite Rating System (EARS) employs a palmtop computer to collect hourly ratings of motivation to eat and mood. We collected data on hunger, desire to eat, fullness, and tiredness. HD subjects were monitored on the dialysis day and the interdialytic day. Controls were monitored for 1 or 2 days. Results Temporal profiles of motivation to eat for the controls were similar on both days. Temporal profiles of motivation to eat for the HD group were lower on the dialysis day. Mean HD scores were not significantly different from controls. Dietary records indicated that dialysis patients consumed less food than controls. Conclusions Our data indicate that the EARS can be used to monitor subjective appetite states continuously in a group of HD patients. A HD session reduces hunger and desire to eat. Patients feel more tired after dialysis. This does not correlate with their hunger score, but does correlate with their fullness rating. Nutrient intake is reduced, suggesting a resetting of appetite control for the HD group. The EARS may be useful for intervention studies.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

To date, studies have focused on the acquisition of alphabetic second languages (L2s) in alphabetic first language (L1) users, demonstrating significant transfer effects. The present study examined the process from a reverse perspective, comparing logographic (Mandarin-Chinese) and alphabetic (English) L1 users in the acquisition of an artificial logographic script, in order to determine whether similar language-specific advantageous transfer effects occurred. English monolinguals, English-French bilinguals and Chinese-English bilinguals learned a small set of symbols in an artificial logographic script and were subsequently tested on their ability to process this script in regard to three main perspectives: L2 reading, L2 working memory (WM), and inner processing strategies. In terms of L2 reading, a lexical decision task on the artificial symbols revealed markedly faster response times in the Chinese-English bilinguals, indicating a logographic transfer effect suggestive of a visual processing advantage. A syntactic decision task evaluated the degree to which the new language was mastered beyond the single word level. No L1-specific transfer effects were found for artificial language strings. In order to investigate visual processing of the artificial logographs further, a series of WM experiments were conducted. Artificial logographs were recalled under concurrent auditory and visuo-spatial suppression conditions to disrupt phonological and visual processing, respectively. No L1-specific transfer effects were found, indicating no visual processing advantage of the Chinese-English bilinguals. However, a bilingual processing advantage was found indicative of a superior ability to control executive functions. In terms of L1 WM, the Chinese-English bilinguals outperformed the alphabetic L1 users when processing L1 words, indicating a language experience-specific advantage. Questionnaire data on the cognitive strategies that were deployed during the acquisition and processing of the artificial logographic script revealed that the Chinese-English bilinguals rated their inner speech as lower than the alphabetic L1 users, suggesting that they were transferring their phonological processing skill set to the acquisition and use of an artificial script. Overall, evidence was found to indicate that language learners transfer specific L1 orthographic processing skills to L2 logographic processing. Additionally, evidence was also found indicating that a bilingual history enhances cognitive performance in L2.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

World economies increasingly demand reliable and economical power supply and distribution. To achieve this aim the majority of power systems are becoming interconnected, with several power utilities supplying the one large network. One problem that occurs in a large interconnected power system is the regular occurrence of system disturbances which can result in the creation of intra-area oscillating modes. These modes can be regarded as the transient responses of the power system to excitation, which are generally characterised as decaying sinusoids. For a power system operating ideally these transient responses would ideally would have a “ring-down” time of 10-15 seconds. Sometimes equipment failures disturb the ideal operation of power systems and oscillating modes with ring-down times greater than 15 seconds arise. The larger settling times associated with such “poorly damped” modes cause substantial power flows between generation nodes, resulting in significant physical stresses on the power distribution system. If these modes are not just poorly damped but “negatively damped”, catastrophic failures of the system can occur. To ensure system stability and security of large power systems, the potentially dangerous oscillating modes generated from disturbances (such as equipment failure) must be quickly identified. The power utility must then apply appropriate damping control strategies. In power system monitoring there exist two facets of critical interest. The first is the estimation of modal parameters for a power system in normal, stable, operation. The second is the rapid detection of any substantial changes to this normal, stable operation (because of equipment breakdown for example). Most work to date has concentrated on the first of these two facets, i.e. on modal parameter estimation. Numerous modal parameter estimation techniques have been proposed and implemented, but all have limitations [1-13]. One of the key limitations of all existing parameter estimation methods is the fact that they require very long data records to provide accurate parameter estimates. This is a particularly significant problem after a sudden detrimental change in damping. One simply cannot afford to wait long enough to collect the large amounts of data required for existing parameter estimators. Motivated by this gap in the current body of knowledge and practice, the research reported in this thesis focuses heavily on rapid detection of changes (i.e. on the second facet mentioned above). This thesis reports on a number of new algorithms which can rapidly flag whether or not there has been a detrimental change to a stable operating system. It will be seen that the new algorithms enable sudden modal changes to be detected within quite short time frames (typically about 1 minute), using data from power systems in normal operation. The new methods reported in this thesis are summarised below. The Energy Based Detector (EBD): The rationale for this method is that the modal disturbance energy is greater for lightly damped modes than it is for heavily damped modes (because the latter decay more rapidly). Sudden changes in modal energy, then, imply sudden changes in modal damping. Because the method relies on data from power systems in normal operation, the modal disturbances are random. Accordingly, the disturbance energy is modelled as a random process (with the parameters of the model being determined from the power system under consideration). A threshold is then set based on the statistical model. The energy method is very simple to implement and is computationally efficient. It is, however, only able to determine whether or not a sudden modal deterioration has occurred; it cannot identify which mode has deteriorated. For this reason the method is particularly well suited to smaller interconnected power systems that involve only a single mode. Optimal Individual Mode Detector (OIMD): As discussed in the previous paragraph, the energy detector can only determine whether or not a change has occurred; it cannot flag which mode is responsible for the deterioration. The OIMD seeks to address this shortcoming. It uses optimal detection theory to test for sudden changes in individual modes. In practice, one can have an OIMD operating for all modes within a system, so that changes in any of the modes can be detected. Like the energy detector, the OIMD is based on a statistical model and a subsequently derived threshold test. The Kalman Innovation Detector (KID): This detector is an alternative to the OIMD. Unlike the OIMD, however, it does not explicitly monitor individual modes. Rather it relies on a key property of a Kalman filter, namely that the Kalman innovation (the difference between the estimated and observed outputs) is white as long as the Kalman filter model is valid. A Kalman filter model is set to represent a particular power system. If some event in the power system (such as equipment failure) causes a sudden change to the power system, the Kalman model will no longer be valid and the innovation will no longer be white. Furthermore, if there is a detrimental system change, the innovation spectrum will display strong peaks in the spectrum at frequency locations associated with changes. Hence the innovation spectrum can be monitored to both set-off an “alarm” when a change occurs and to identify which modal frequency has given rise to the change. The threshold for alarming is based on the simple Chi-Squared PDF for a normalised white noise spectrum [14, 15]. While the method can identify the mode which has deteriorated, it does not necessarily indicate whether there has been a frequency or damping change. The PPM discussed next can monitor frequency changes and so can provide some discrimination in this regard. The Polynomial Phase Method (PPM): In [16] the cubic phase (CP) function was introduced as a tool for revealing frequency related spectral changes. This thesis extends the cubic phase function to a generalised class of polynomial phase functions which can reveal frequency related spectral changes in power systems. A statistical analysis of the technique is performed. When applied to power system analysis, the PPM can provide knowledge of sudden shifts in frequency through both the new frequency estimate and the polynomial phase coefficient information. This knowledge can be then cross-referenced with other detection methods to provide improved detection benchmarks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The performance of an adaptive filter may be studied through the behaviour of the optimal and adaptive coefficients in a given environment. This thesis investigates the performance of finite impulse response adaptive lattice filters for two classes of input signals: (a) frequency modulated signals with polynomial phases of order p in complex Gaussian white noise (as nonstationary signals), and (b) the impulsive autoregressive processes with alpha-stable distributions (as non-Gaussian signals). Initially, an overview is given for linear prediction and adaptive filtering. The convergence and tracking properties of the stochastic gradient algorithms are discussed for stationary and nonstationary input signals. It is explained that the stochastic gradient lattice algorithm has many advantages over the least-mean square algorithm. Some of these advantages are having a modular structure, easy-guaranteed stability, less sensitivity to the eigenvalue spread of the input autocorrelation matrix, and easy quantization of filter coefficients (normally called reflection coefficients). We then characterize the performance of the stochastic gradient lattice algorithm for the frequency modulated signals through the optimal and adaptive lattice reflection coefficients. This is a difficult task due to the nonlinear dependence of the adaptive reflection coefficients on the preceding stages and the input signal. To ease the derivations, we assume that reflection coefficients of each stage are independent of the inputs to that stage. Then the optimal lattice filter is derived for the frequency modulated signals. This is performed by computing the optimal values of residual errors, reflection coefficients, and recovery errors. Next, we show the tracking behaviour of adaptive reflection coefficients for frequency modulated signals. This is carried out by computing the tracking model of these coefficients for the stochastic gradient lattice algorithm in average. The second-order convergence of the adaptive coefficients is investigated by modeling the theoretical asymptotic variance of the gradient noise at each stage. The accuracy of the analytical results is verified by computer simulations. Using the previous analytical results, we show a new property, the polynomial order reducing property of adaptive lattice filters. This property may be used to reduce the order of the polynomial phase of input frequency modulated signals. Considering two examples, we show how this property may be used in processing frequency modulated signals. In the first example, a detection procedure in carried out on a frequency modulated signal with a second-order polynomial phase in complex Gaussian white noise. We showed that using this technique a better probability of detection is obtained for the reduced-order phase signals compared to that of the traditional energy detector. Also, it is empirically shown that the distribution of the gradient noise in the first adaptive reflection coefficients approximates the Gaussian law. In the second example, the instantaneous frequency of the same observed signal is estimated. We show that by using this technique a lower mean square error is achieved for the estimated frequencies at high signal-to-noise ratios in comparison to that of the adaptive line enhancer. The performance of adaptive lattice filters is then investigated for the second type of input signals, i.e., impulsive autoregressive processes with alpha-stable distributions . The concept of alpha-stable distributions is first introduced. We discuss that the stochastic gradient algorithm which performs desirable results for finite variance input signals (like frequency modulated signals in noise) does not perform a fast convergence for infinite variance stable processes (due to using the minimum mean-square error criterion). To deal with such problems, the concept of minimum dispersion criterion, fractional lower order moments, and recently-developed algorithms for stable processes are introduced. We then study the possibility of using the lattice structure for impulsive stable processes. Accordingly, two new algorithms including the least-mean P-norm lattice algorithm and its normalized version are proposed for lattice filters based on the fractional lower order moments. Simulation results show that using the proposed algorithms, faster convergence speeds are achieved for parameters estimation of autoregressive stable processes with low to moderate degrees of impulsiveness in comparison to many other algorithms. Also, we discuss the effect of impulsiveness of stable processes on generating some misalignment between the estimated parameters and the true values. Due to the infinite variance of stable processes, the performance of the proposed algorithms is only investigated using extensive computer simulations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The main goal of this research is to design an efficient compression al~ gorithm for fingerprint images. The wavelet transform technique is the principal tool used to reduce interpixel redundancies and to obtain a parsimonious representation for these images. A specific fixed decomposition structure is designed to be used by the wavelet packet in order to save on the computation, transmission, and storage costs. This decomposition structure is based on analysis of information packing performance of several decompositions, two-dimensional power spectral density, effect of each frequency band on the reconstructed image, and the human visual sensitivities. This fixed structure is found to provide the "most" suitable representation for fingerprints, according to the chosen criteria. Different compression techniques are used for different subbands, based on their observed statistics. The decision is based on the effect of each subband on the reconstructed image according to the mean square criteria as well as the sensitivities in human vision. To design an efficient quantization algorithm, a precise model for distribution of the wavelet coefficients is developed. The model is based on the generalized Gaussian distribution. A least squares algorithm on a nonlinear function of the distribution model shape parameter is formulated to estimate the model parameters. A noise shaping bit allocation procedure is then used to assign the bit rate among subbands. To obtain high compression ratios, vector quantization is used. In this work, the lattice vector quantization (LVQ) is chosen because of its superior performance over other types of vector quantizers. The structure of a lattice quantizer is determined by its parameters known as truncation level and scaling factor. In lattice-based compression algorithms reported in the literature the lattice structure is commonly predetermined leading to a nonoptimized quantization approach. In this research, a new technique for determining the lattice parameters is proposed. In the lattice structure design, no assumption about the lattice parameters is made and no training and multi-quantizing is required. The design is based on minimizing the quantization distortion by adapting to the statistical characteristics of the source in each subimage. 11 Abstract Abstract Since LVQ is a multidimensional generalization of uniform quantizers, it produces minimum distortion for inputs with uniform distributions. In order to take advantage of the properties of LVQ and its fast implementation, while considering the i.i.d. nonuniform distribution of wavelet coefficients, the piecewise-uniform pyramid LVQ algorithm is proposed. The proposed algorithm quantizes almost all of source vectors without the need to project these on the lattice outermost shell, while it properly maintains a small codebook size. It also resolves the wedge region problem commonly encountered with sharply distributed random sources. These represent some of the drawbacks of the algorithm proposed by Barlaud [26). The proposed algorithm handles all types of lattices, not only the cubic lattices, as opposed to the algorithms developed by Fischer [29) and Jeong [42). Furthermore, no training and multiquantizing (to determine lattice parameters) is required, as opposed to Powell's algorithm [78). For coefficients with high-frequency content, the positive-negative mean algorithm is proposed to improve the resolution of reconstructed images. For coefficients with low-frequency content, a lossless predictive compression scheme is used to preserve the quality of reconstructed images. A method to reduce bit requirements of necessary side information is also introduced. Lossless entropy coding techniques are subsequently used to remove coding redundancy. The algorithms result in high quality reconstructed images with better compression ratios than other available algorithms. To evaluate the proposed algorithms their objective and subjective performance comparisons with other available techniques are presented. The quality of the reconstructed images is important for a reliable identification. Enhancement and feature extraction on the reconstructed images are also investigated in this research. A structural-based feature extraction algorithm is proposed in which the unique properties of fingerprint textures are used to enhance the images and improve the fidelity of their characteristic features. The ridges are extracted from enhanced grey-level foreground areas based on the local ridge dominant directions. The proposed ridge extraction algorithm, properly preserves the natural shape of grey-level ridges as well as precise locations of the features, as opposed to the ridge extraction algorithm in [81). Furthermore, it is fast and operates only on foreground regions, as opposed to the adaptive floating average thresholding process in [68). Spurious features are subsequently eliminated using the proposed post-processing scheme.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these productcode vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy compression and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of lowrate spectral compression methods on the task of automatic speaker identification. The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model when based on the use of compressed speech are put forward.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis presents an original approach to parametric speech coding at rates below 1 kbitsjsec, primarily for speech storage applications. Essential processes considered in this research encompass efficient characterization of evolutionary configuration of vocal tract to follow phonemic features with high fidelity, representation of speech excitation using minimal parameters with minor degradation in naturalness of synthesized speech, and finally, quantization of resulting parameters at the nominated rates. For encoding speech spectral features, a new method relying on Temporal Decomposition (TD) is developed which efficiently compresses spectral information through interpolation between most steady points over time trajectories of spectral parameters using a new basis function. The compression ratio provided by the method is independent of the updating rate of the feature vectors, hence allows high resolution in tracking significant temporal variations of speech formants with no effect on the spectral data rate. Accordingly, regardless of the quantization technique employed, the method yields a high compression ratio without sacrificing speech intelligibility. Several new techniques for improving performance of the interpolation of spectral parameters through phonetically-based analysis are proposed and implemented in this research, comprising event approximated TD, near-optimal shaping event approximating functions, efficient speech parametrization for TD on the basis of an extensive investigation originally reported in this thesis, and a hierarchical error minimization algorithm for decomposition of feature parameters which significantly reduces the complexity of the interpolation process. Speech excitation in this work is characterized based on a novel Multi-Band Excitation paradigm which accurately determines the harmonic structure in the LPC (linear predictive coding) residual spectra, within individual bands, using the concept 11 of Instantaneous Frequency (IF) estimation in frequency domain. The model yields aneffective two-band approximation to excitation and computes pitch and voicing with high accuracy as well. New methods for interpolative coding of pitch and gain contours are also developed in this thesis. For pitch, relying on the correlation between phonetic evolution and pitch variations during voiced speech segments, TD is employed to interpolate the pitch contour between critical points introduced by event centroids. This compresses pitch contour in the ratio of about 1/10 with negligible error. To approximate gain contour, a set of uniformly-distributed Gaussian event-like functions is used which reduces the amount of gain information to about 1/6 with acceptable accuracy. The thesis also addresses a new quantization method applied to spectral features on the basis of statistical properties and spectral sensitivity of spectral parameters extracted from TD-based analysis. The experimental results show that good quality speech, comparable to that of conventional coders at rates over 2 kbits/sec, can be achieved at rates 650-990 bits/sec.