882 resultados para automatic music analysis
Resumo:
The effect of using a spatially smoothed forward-backward covariance matrix on the performance of weighted eigen-based state space methods/ESPRIT, and weighted MUSIC for direction-of-arrival (DOA) estimation is analyzed. Expressions for the mean-squared error in the estimates of the signal zeros and the DOA estimates, along with some general properties of the estimates and optimal weighting matrices, are derived. A key result is that optimally weighted MUSIC and weighted state-space methods/ESPRIT have identical asymptotic performance. Moreover, by properly choosing the number of subarrays, the performance of unweighted state space methods can be significantly improved. It is also shown that the mean-squared error in the DOA estimates is independent of the exact distribution of the source amplitudes. This results in a unified framework for dealing with DOA estimation using a uniformly spaced linear sensor array and the time series frequency estimation problems.
Resumo:
The efficacy of the multifractal spectrum as a tool for characterizing images has been studied. This spectrum has been computed for digitized images of the nucleus of human cervical cancer cells and it was observed that the entire spectrum is almost fully reproduced for a normal cell while only the right half (q<0) of the spectrum is reproduced for a cancerous cell. Cells in stages in between the two extremes show a shortening of the left half of the spectrum proportional to their condition. The extent of this shortening has been found to be sufficient to permit a classification between three classes of cells at varying distances from a basal cancerous layer-the superficial cells, the intermediate cells and the parabasal cells. This technique may be used for automatic screening of the population while also indicating the stage of malignancy
Resumo:
The statistical performance analysis of ESPRIT, root-MUSIC, minimum-norm methods for direction estimation, due to finite data perturbations, using the modified spatially smoothed covariance matrix, is developed. Expressions for the mean-squared error in the direction estimates are derived based on a common framework. Based on the analysis, the use of the modified smoothed covariance matrix improves the performance of the methods when the sources are fully correlated. Also, the performance is better even when the number of subarrays is large unlike in the case of the conventionally smoothed covariance matrix. However, the performance for uncorrelated sources deteriorates due to an artificial correlation introduced by the modified smoothing. The theoretical expressions are validated using extensive simulations. (C) 1999 Elsevier Science B.V. All rights reserved.
Resumo:
This paper makes an attempt to assess the benefits of replacing a conventional generator excitation system (AVR + PSS) with a nonlinear voltage regulator using the concepts of synchronizing and damping torque components in a single machine infinite bus (SMIB) system. In recent years, there has been considerable interest in designing nonlinear excitation controllers, which are expected to give better dynamic performance over a wider range of system and operating conditions. The performance of these controllers is often justified by simulation studies on few test cases which may not adequately represent the diverse operating conditions of a typical power system. The performance of two such nonlinear controllers which are designed based on feedback linearization and include automatic voltage regulation with good dynamic performance have been analyzed using an SMIB model. Linearizing the nonlinear control laws along with the SMIB system equations, a Heffron Phillip's type of a model has been derived. Concepts of synchronizing and damping torque components have been used to show that such controllers can impair the small signal stability under certain operating conditions. This paper shows the possibility of negative damping contribution due to nonlinear voltage regulators and gives a new insight on understanding the physical impact of complex nonlinear control laws on power system dynamics.
Resumo:
MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.
Resumo:
Wetlands are the most productive and biologically diverse but very fragile ecosystems. They are vulnerable to even small changes in their biotic and abiotic factors. In recent years, there has been concern over the continuous degradation of wetlands due to unplanned developmental activities. This necessitates inventorying, mapping, and monitoring of wetlands to implement sustainable management approaches. The principal objective of this work is to evolve a strategy to identify and monitor wetlands using temporal remote sensing (RS) data. Pattern classifiers were used to extract wetlands automatically from NIR bands of MODIS, Landsat MSS and Landsat TM remote sensing data. MODIS provided data for 2002 to 2007, while for 1973 and 1992 IR Bands of Landsat MSS and TM (79m and 30m spatial resolution) data were used. Principal components of IR bands of MODIS (250 m) were fused with IRS LISS-3 NIR (23.5 m). To extract wetlands, statistical unsupervised learning of IR bands for the respective temporal data was performed using Bayesian approach based on prior probability, mean and covariance. Temporal analysis of wetlands indicates a sharp decline of 58% in Greater Bangalore attributing to intense urbanization processes, evident from a 466% increase in built-up area from 1973 to 2007.
Resumo:
We propose an iterative algorithm to detect transient segments in audio signals. Short time Fourier transform(STFT) is used to detect rapid local changes in the audio signal. The algorithm has two steps that iteratively - (a) calculate a function of the STFT and (b) build a transient signal. A dynamic thresholding scheme is used to locate the potential positions of transients in the signal. The iterative procedure ensures that genuine transients are built up while the localised spectral noise are suppressed by using an energy criterion. The extracted transient signal is later compared to a ground truth dataset. The algorithm performed well on two databases. On the EBU-SQAM database of monophonic sounds, the algorithm achieved an F-measure of 90% while on our database of polyphonic audio an F-measure of 91% was achieved. This technique is being used as a preprocessing step for a tempo analysis algorithm and a TSR (Transients + Sines + Residue) decomposition scheme.
Resumo:
MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.
Resumo:
Analysis of high resolution satellite images has been an important research topic for urban analysis. One of the important features of urban areas in urban analysis is the automatic road network extraction. Two approaches for road extraction based on Level Set and Mean Shift methods are proposed. From an original image it is difficult and computationally expensive to extract roads due to presences of other road-like features with straight edges. The image is preprocessed to improve the tolerance by reducing the noise (the buildings, parking lots, vegetation regions and other open spaces) and roads are first extracted as elongated regions, nonlinear noise segments are removed using a median filter (based on the fact that road networks constitute large number of small linear structures). Then road extraction is performed using Level Set and Mean Shift method. Finally the accuracy for the road extracted images is evaluated based on quality measures. The 1m resolution IKONOS data has been used for the experiment.
Resumo:
Music signals comprise of atomic notes drawn from a musical scale. The creation of musical sequences often involves splicing the notes in a constrained way resulting in aesthetically appealing patterns. We develop an approach for music signal representation based on symbolic dynamics by translating the lexicographic rules over a musical scale to constraints on a Markov chain. This source representation is useful for machine based music synthesis, in a way, similar to a musician producing original music. In order to mathematically quantify user listening experience, we study the correlation between the max-entropic rate of a musical scale and the subjective aesthetic component. We present our analysis with examples from the south Indian classical music system.
Resumo:
Latent variable methods, such as PLCA (Probabilistic Latent Component Analysis) have been successfully used for analysis of non-negative signal representations. In this paper, we formulate PLCS (Probabilistic Latent Component Segmentation), which models each time frame of a spectrogram as a spectral distribution. Given the signal spectrogram, the segmentation boundaries are estimated using a maximum-likelihood approach. For an efficient solution, the algorithm imposes a hard constraint that each segment is modelled by a single latent component. The hard constraint facilitates the solution of ML boundary estimation using dynamic programming. The PLCS framework does not impose a parametric assumption unlike earlier ML segmentation techniques. PLCS can be naturally extended to model coarticulation between successive phones. Experiments on the TIMIT corpus show that the proposed technique is promising compared to most state of the art speech segmentation algorithms.
Resumo:
Exploiting the performance potential of GPUs requires managing the data transfers to and from them efficiently which is an error-prone and tedious task. In this paper, we develop a software coherence mechanism to fully automate all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to identify potential stale accesses and uses a runtime to initiate transfers as necessary. This allows us to avoid redundant transfers that are exhibited by all other existing automatic memory management proposals. We integrate our automatic memory manager into the X10 compiler and runtime, and find that it not only results in smaller and simpler programs, but also eliminates redundant memory transfers. Tested on eight programs ported from the Rodinia benchmark suite it achieves (i) a 1.06x speedup over hand-tuned manual memory management, and (ii) a 1.29x speedup over another recently proposed compiler--runtime automatic memory management system. Compared to other existing runtime-only and compiler-only proposals, it also transfers 2.2x to 13.3x less data on average.
Resumo:
This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modeling and model training, language and pronunciation modeling are presented. These include the use of conversation side based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion network based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class based language models. The transcription system developed for participation in the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, which was the best performance by a statistically significant margin. Further details on the derivation of faster systems with moderate performance degradation are discussed in the context of the 2002 CU-HTK 10 × RT conversational speech transcription system. © 2005 IEEE.
Resumo:
Signal processing techniques play important roles in the design of digital communication systems. These include information manipulation, transmitter signal processing, channel estimation, channel equalization and receiver signal processing. By interacting with communication theory and system implementing technologies, signal processing specialists develop efficient schemes for various communication problems by wisely exploiting various mathematical tools such as analysis, probability theory, matrix theory, optimization theory, and many others. In recent years, researchers realized that multiple-input multiple-output (MIMO) channel models are applicable to a wide range of different physical communications channels. Using the elegant matrix-vector notations, many MIMO transceiver (including the precoder and equalizer) design problems can be solved by matrix and optimization theory. Furthermore, the researchers showed that the majorization theory and matrix decompositions, such as singular value decomposition (SVD), geometric mean decomposition (GMD) and generalized triangular decomposition (GTD), provide unified frameworks for solving many of the point-to-point MIMO transceiver design problems.
In this thesis, we consider the transceiver design problems for linear time invariant (LTI) flat MIMO channels, linear time-varying narrowband MIMO channels, flat MIMO broadcast channels, and doubly selective scalar channels. Additionally, the channel estimation problem is also considered. The main contributions of this dissertation are the development of new matrix decompositions, and the uses of the matrix decompositions and majorization theory toward the practical transmit-receive scheme designs for transceiver optimization problems. Elegant solutions are obtained, novel transceiver structures are developed, ingenious algorithms are proposed, and performance analyses are derived.
The first part of the thesis focuses on transceiver design with LTI flat MIMO channels. We propose a novel matrix decomposition which decomposes a complex matrix as a product of several sets of semi-unitary matrices and upper triangular matrices in an iterative manner. The complexity of the new decomposition, generalized geometric mean decomposition (GGMD), is always less than or equal to that of geometric mean decomposition (GMD). The optimal GGMD parameters which yield the minimal complexity are derived. Based on the channel state information (CSI) at both the transmitter (CSIT) and receiver (CSIR), GGMD is used to design a butterfly structured decision feedback equalizer (DFE) MIMO transceiver which achieves the minimum average mean square error (MSE) under the total transmit power constraint. A novel iterative receiving detection algorithm for the specific receiver is also proposed. For the application to cyclic prefix (CP) systems in which the SVD of the equivalent channel matrix can be easily computed, the proposed GGMD transceiver has K/log_2(K) times complexity advantage over the GMD transceiver, where K is the number of data symbols per data block and is a power of 2. The performance analysis shows that the GGMD DFE transceiver can convert a MIMO channel into a set of parallel subchannels with the same bias and signal to interference plus noise ratios (SINRs). Hence, the average bit rate error (BER) is automatically minimized without the need for bit allocation. Moreover, the proposed transceiver can achieve the channel capacity simply by applying independent scalar Gaussian codes of the same rate at subchannels.
In the second part of the thesis, we focus on MIMO transceiver design for slowly time-varying MIMO channels with zero-forcing or MMSE criterion. Even though the GGMD/GMD DFE transceivers work for slowly time-varying MIMO channels by exploiting the instantaneous CSI at both ends, their performance is by no means optimal since the temporal diversity of the time-varying channels is not exploited. Based on the GTD, we develop space-time GTD (ST-GTD) for the decomposition of linear time-varying flat MIMO channels. Under the assumption that CSIT, CSIR and channel prediction are available, by using the proposed ST-GTD, we develop space-time geometric mean decomposition (ST-GMD) DFE transceivers under the zero-forcing or MMSE criterion. Under perfect channel prediction, the new system minimizes both the average MSE at the detector in each space-time (ST) block (which consists of several coherence blocks), and the average per ST-block BER in the moderate high SNR region. Moreover, the ST-GMD DFE transceiver designed under an MMSE criterion maximizes Gaussian mutual information over the equivalent channel seen by each ST-block. In general, the newly proposed transceivers perform better than the GGMD-based systems since the super-imposed temporal precoder is able to exploit the temporal diversity of time-varying channels. For practical applications, a novel ST-GTD based system which does not require channel prediction but shares the same asymptotic BER performance with the ST-GMD DFE transceiver is also proposed.
The third part of the thesis considers two quality of service (QoS) transceiver design problems for flat MIMO broadcast channels. The first one is the power minimization problem (min-power) with a total bitrate constraint and per-stream BER constraints. The second problem is the rate maximization problem (max-rate) with a total transmit power constraint and per-stream BER constraints. Exploiting a particular class of joint triangularization (JT), we are able to jointly optimize the bit allocation and the broadcast DFE transceiver for the min-power and max-rate problems. The resulting optimal designs are called the minimum power JT broadcast DFE transceiver (MPJT) and maximum rate JT broadcast DFE transceiver (MRJT), respectively. In addition to the optimal designs, two suboptimal designs based on QR decomposition are proposed. They are realizable for arbitrary number of users.
Finally, we investigate the design of a discrete Fourier transform (DFT) modulated filterbank transceiver (DFT-FBT) with LTV scalar channels. For both cases with known LTV channels and unknown wide sense stationary uncorrelated scattering (WSSUS) statistical channels, we show how to optimize the transmitting and receiving prototypes of a DFT-FBT such that the SINR at the receiver is maximized. Also, a novel pilot-aided subspace channel estimation algorithm is proposed for the orthogonal frequency division multiplexing (OFDM) systems with quasi-stationary multi-path Rayleigh fading channels. Using the concept of a difference co-array, the new technique can construct M^2 co-pilots from M physical pilot tones with alternating pilot placement. Subspace methods, such as MUSIC and ESPRIT, can be used to estimate the multipath delays and the number of identifiable paths is up to O(M^2), theoretically. With the delay information, a MMSE estimator for frequency response is derived. It is shown through simulations that the proposed method outperforms the conventional subspace channel estimator when the number of multipaths is greater than or equal to the number of physical pilots minus one.