15 results for Vocal music

at Indian Institute of Science - Bangalore - India


Relevance:

30.00%

Publisher:

Abstract:

We analyze the AlApana of a Carnatic music piece without prior knowledge of the singer or the rAga. The AlApana is a means of communicating the flavor, or bhAva, of the rAga to the audience through the permitted notes and their phrases. The input to our analysis is a recording of the vocal AlApana along with the accompanying instrument. The AdhAra shadja (base note) of the singer for that AlApana is estimated through a stochastic model of note frequencies. Based on the shadja, we identify the notes (swaras) used in the AlApana using a semi-continuous GMM, recognizing the swaras from the probabilities of each note interval. For sampurNa rAgas, we can then identify the possible rAga from the swaras. We achieve correct shadja identification, which is crucial to all further steps, in 88.8% of 55 AlApanas. Among these (48 AlApanas of 7 rAgas), we obtain 91.5% correct swara identification and 62.13% correct rAga identification.
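
The stochastic note-frequency model and the semi-continuous GMM from the abstract are not reproduced below; the following Python sketch only illustrates the two-step idea (pick a shadja, then quantize the pitch track to swara intervals) under simplifying assumptions: a 12-semitone grid, a user-supplied list of candidate tonic frequencies (`candidates_hz`), and a frame-level pitch track in Hz.

```python
# Minimal sketch, not the paper's model: shadja selection by grid fit,
# then nearest-semitone swara quantization relative to the shadja.
import numpy as np

# One common naming of the 12 semitone positions in Carnatic music.
SWARA_NAMES = ["S", "R1", "R2", "G2", "G3", "M1", "M2", "P", "D1", "D2", "N2", "N3"]

def cents_above(f, ref):
    return 1200.0 * np.log2(f / ref)

def estimate_shadja(pitch_hz, candidates_hz):
    """Score each candidate tonic by how well the pitch track locks onto
    the semitone grid relative to it; return the best candidate (Hz)."""
    voiced = pitch_hz[pitch_hz > 0]                  # drop unvoiced frames
    best, best_score = None, -np.inf
    for ref in candidates_hz:
        c = cents_above(voiced, ref) % 1200.0
        dev = np.minimum(c % 100.0, 100.0 - (c % 100.0))  # distance to nearest semitone
        score = -np.mean(dev)
        if score > best_score:
            best, best_score = ref, score
    return best

def identify_swaras(pitch_hz, shadja_hz):
    """Quantize voiced frames to swara labels relative to the shadja."""
    voiced = pitch_hz[pitch_hz > 0]
    idx = np.rint(cents_above(voiced, shadja_hz) / 100.0).astype(int) % 12
    return [SWARA_NAMES[i] for i in idx]
```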

Relevance:

20.00%

Publisher:

Abstract:

The problem of automatically identifying the melody line in a MIDI file plays an important role in taking query-by-humming (QBH) systems to the next level. We present a novel algorithm to identify the melody line in a polyphonic MIDI file, based on note pruning and track/channel ranking. We use results from musicology to derive simple heuristics for the note-pruning stage, which improves the robustness of the algorithm by discarding "spurious" notes. A ranking based on the melodic information in each track/channel then enables us to choose the melody line accurately. Our algorithm makes no assumptions about performer-specific MIDI parameters, is simple, and identifies the melody line correctly with 97% accuracy. The algorithm is currently used in a QBH system built in our lab.
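
The musicological heuristics used in the paper are not spelled out in the abstract, so the sketch below uses generic stand-ins: drop drum tracks and very short notes as the pruning step, and rank tracks by a toy score combining mean pitch and note count. The `pretty_midi` package is assumed for MIDI parsing.

```python
# Hedged sketch of a note-pruning + track-ranking melody picker for MIDI.
import pretty_midi

def melody_track(path, min_duration=0.05):
    midi = pretty_midi.PrettyMIDI(path)
    best, best_score = None, float("-inf")
    for inst in midi.instruments:
        if inst.is_drum:
            continue                                   # percussion cannot carry the melody
        notes = [n for n in inst.notes
                 if n.end - n.start >= min_duration]   # prune "spurious" very short notes
        if not notes:
            continue
        mean_pitch = sum(n.pitch for n in notes) / len(notes)
        score = mean_pitch + 0.1 * len(notes)          # toy melodic-information score
        if score > best_score:
            best, best_score = inst, score
    return best
```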

Relevance:

20.00%

Publisher:

Relevance:

20.00%

Publisher:

Abstract:

We propose a simple speech/music discriminator that uses features based on the HILN (Harmonics, Individual Lines and Noise) model. We tested the strength of the feature set on a standard database of 66 files and obtained an accuracy of around 97%. We also tested it on sung queries and polyphonic music, with very good results. The current algorithm is being used to discriminate between sung queries and played queries (using an instrument such as the flute) for a query-by-humming (QBH) system under development in our lab.
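
The HILN decomposition itself is not reproduced here. As a loose stand-in that captures a similar intuition (how much of the energy is carried by tonal/harmonic components versus noise), the sketch below uses librosa's harmonic-percussive separation and spectral flatness, with illustrative, untuned thresholds.

```python
# Not the HILN model: a rough harmonicity/flatness stand-in with a threshold rule.
import numpy as np
import librosa

def speech_music_features(path):
    y, sr = librosa.load(path, sr=None, mono=True)
    harmonic, _ = librosa.effects.hpss(y)
    harm_ratio = np.sum(harmonic**2) / (np.sum(y**2) + 1e-12)   # share of harmonic energy
    flatness = float(np.mean(librosa.feature.spectral_flatness(y=y)))
    return harm_ratio, flatness

def is_music(path, harm_thresh=0.6, flat_thresh=0.1):
    harm_ratio, flatness = speech_music_features(path)
    # Music tends to be more harmonic and less spectrally flat than speech;
    # these thresholds are illustrative, not tuned values from the paper.
    return harm_ratio > harm_thresh and flatness < flat_thresh
```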

Relevance:

20.00%

Publisher:

Abstract:

This paper presents comparative data on the vocal communication of two Asian leaf monkeys, the Nilgiri langur (Presbytis johnii) and the South Indian common langur (Presbytis entellus), based on sound recordings and behavioural observations of free-ranging groups. Spectrographic analyses revealed a repertoire of 18 basic patterns for Nilgiri langurs and 21 basic patterns for common langurs. The repertoires of the two langur species consist of both discretely structured vocal patterns, in which alterations of the physical parameters are restricted to intra-class variation, and patterns in which structural variations cause intergradation between different sections of the repertoire. Qualitative assessments of group scans indicate that in both species vocal behaviour is characterized by pronounced sex differences in the use of the different elements of the vocal repertoire. Comparison of data available from different populations of P. entellus suggests population-specific modifications on both structural and behavioural levels. Moreover, characteristic elements of the vocal systems of the two Asian species show striking similarities to those described for the African black-and-white colobus.

Relevance:

20.00%

Publisher:

Abstract:

Field observations and spectrographic analyses of sound recordings of South Indian bonnet macaques revealed a vocal repertoire of at least 25 basic patterns. The repertoire consists of well separated sound classes and acoustic categories connected by structural intergradation. Besides structural variations within and between different elements of the repertoire, the vocal system of Macaca radiata is characterized by regular combinations of particular basic patterns. These combinations occurred not only between calls of similar structure and function but also between calls usually emitted in entirely different social contexts. According to the qualitative analysis, sex-specific asymmetries of vocal behaviour were less pronounced than age-dependent characteristics. A comparison of clear call vocalizations of Macaca radiata and M. fuscata revealed significant species-specific differences on the structural and behavioural levels. Evaluations of the structural features of alarm calls of various macaque species imply marked differences between members of the fascicularis and sinica groups on the one hand and the silenus group and arctoides on the other.

Relevance:

20.00%

Publisher:

Abstract:

Sound recordings and behavioural data were collected from four primate species of two genera (Macaca, Presbytis). Comparative analyses of structural and behavioural aspects of vocal communication revealed a high degree of intrageneric similarity but striking intergeneric differences. In the two macaque species (Macaca silenus, Macaca radiata), males and females shared the major part of the repertoire. In contrast, in the two langurs (Presbytis johnii, Presbytis entellus), many calls were exclusive to adult males. Striking differences between both species groups occurred with respect to age-specific patterns of vocal behaviour. The diversity of vocal behaviour was assessed from the number of different calls used and the proportion of each call in relation to total vocal output for a given age/sex class. In Macaca, diversity decreases with the age of the vocalizer, whereas in Presbytis the age of the vocalizer and the diversity of vocal behaviour are positively correlated. A comparison of the data of the two genera does not suggest any causal relationship between group composition (e.g. multi-male vs. one-male group) and communication system. Within each genus, interspecific differences in vocal behaviour can be explained by differences in social behaviour (e.g. group cohesion, intergroup relation, mating behaviour) and functional disparities. Possible factors responsible for the pronounced intergeneric differences in vocal behaviour between Macaca and Presbytis are discussed.

Relevance:

20.00%

Publisher:

Abstract:

In the direction-of-arrival (DOA) estimation problem, we encounter both finite data and insufficient knowledge of the array characterization. It is therefore important to study how subspace-based methods perform under such conditions. We analyze the finite-data performance of the multiple signal classification (MUSIC) and minimum-norm (min-norm) methods in the presence of sensor gain and phase errors, and derive expressions for the mean square error (MSE) in the DOA estimates. These expressions are first derived for an arbitrary array and then simplified for the special case of a uniform linear array with isotropic sensors. When further simplified for the cases of finite data only and sensor errors only, they reduce to the recent results given in [9-12]. Computer simulations verify the closeness between the predicted and simulated values of the MSE.
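
The paper's MSE expressions are not reproduced here; the sketch below only sets up the MUSIC estimator for a uniform linear array with half-wavelength spacing, so that the quantities being analyzed (sample covariance, noise subspace, pseudospectrum) are concrete.

```python
# Minimal MUSIC DOA sketch for a ULA with isotropic sensors; finite-sample and
# gain/phase-error effects from the paper are not modeled here.
import numpy as np

def music_spectrum(X, num_sources, d_over_lambda=0.5,
                   grid=np.linspace(-90.0, 90.0, 721)):
    """X: (num_sensors, num_snapshots) complex snapshot matrix."""
    M, N = X.shape
    R = X @ X.conj().T / N                         # sample covariance
    eigvals, eigvecs = np.linalg.eigh(R)           # eigenvalues in ascending order
    En = eigvecs[:, : M - num_sources]             # noise subspace
    spectrum = []
    for theta in grid:
        a = np.exp(-2j * np.pi * d_over_lambda * np.arange(M)
                   * np.sin(np.deg2rad(theta)))    # ULA steering vector
        spectrum.append(1.0 / np.linalg.norm(En.conj().T @ a) ** 2)
    return grid, np.asarray(spectrum)              # peaks indicate DOA estimates
```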

Relevance:

20.00%

Publisher:

Abstract:

Compressive sensing (CS) is a new sensing paradigm which permits sampling of a signal at its intrinsic information rate, which can be much lower than the Nyquist rate, while guaranteeing good quality reconstruction for signals that are sparse in a linear transform domain. We explore the application of the CS formulation to music signals. Since music signals contain both tonal and transient components, we examine several transforms, such as the discrete cosine transform (DCT), the discrete wavelet transform (DWT), the Fourier basis and non-orthogonal warped transforms, to explore the effectiveness of CS theory and the reconstruction algorithms. We show that, for a given sparsity level, the DCT, overcomplete and warped Fourier dictionaries result in better reconstruction, with the warped Fourier dictionary giving perceptually better reconstruction. MUSHRA test results show that a moderate quality reconstruction is possible with about half the Nyquist sampling rate.
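
A minimal sketch of the recovery step for one audio frame, assuming sparsity in the DCT domain, Gaussian random measurements, and scikit-learn's orthogonal matching pursuit as the solver; the overcomplete and warped Fourier dictionaries compared in the paper are not shown.

```python
# CS-style recovery of one frame: y = Phi x, with x sparse in the DCT domain.
import numpy as np
from scipy.fft import idct
from sklearn.linear_model import OrthogonalMatchingPursuit

def cs_recover_frame(x, num_measurements, sparsity):
    n = len(x)
    Phi = np.random.randn(num_measurements, n) / np.sqrt(num_measurements)  # sensing matrix
    Psi = idct(np.eye(n), norm="ortho", axis=0)      # DCT synthesis dictionary (columns)
    y = Phi @ x                                      # compressive measurements
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=sparsity, fit_intercept=False)
    omp.fit(Phi @ Psi, y)                            # solve y ~ (Phi Psi) c with sparse c
    return Psi @ omp.coef_                           # reconstructed time-domain frame
```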

Relevance:

20.00%

Publisher:

Abstract:

Music signals are composed of atomic notes drawn from a musical scale. The creation of musical sequences often involves splicing the notes in a constrained way, resulting in aesthetically appealing patterns. We develop an approach to music signal representation based on symbolic dynamics by translating the lexicographic rules over a musical scale into constraints on a Markov chain. This source representation is useful for machine-based music synthesis, in a way similar to a musician producing original music. In order to mathematically quantify the user listening experience, we study the correlation between the max-entropic rate of a musical scale and the subjective aesthetic component. We present our analysis with examples from the South Indian classical music system.
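
A small sketch of this representation idea under stated assumptions: the lexicographic constraints of a scale are encoded as a 0/1 transition-permission matrix (assumed irreducible), the maximum achievable entropy rate is log2 of its Perron eigenvalue, and the Markov chain attaining that rate is given by the Parry measure.

```python
# Max-entropic Markov chain over allowed note transitions (Parry measure).
import numpy as np

def maxentropic_chain(A):
    """A: (n, n) 0/1 matrix, A[i, j] = 1 if note j may follow note i
    (assumed irreducible). Returns (transition matrix P, entropy rate in bits)."""
    eigvals, eigvecs = np.linalg.eig(A.astype(float))
    k = np.argmax(eigvals.real)                    # Perron eigenvalue and eigenvector
    lam = eigvals.real[k]
    v = np.abs(eigvecs[:, k].real)
    P = (A * np.outer(1.0 / v, v)) / lam           # Parry transition probabilities
    P = P / P.sum(axis=1, keepdims=True)           # guard against rounding error
    entropy_rate = np.log2(lam)                    # bits per note
    return P, entropy_rate
```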

Relevance:

20.00%

Publisher:

Abstract:

We address the problem of multi-instrument recognition in polyphonic music signals. Individual instruments are modeled within a stochastic framework using Student's-t Mixture Models (tMMs). We impose a mixture of these instrument models on the polyphonic signal model. No a priori knowledge is assumed about the number of instruments in the polyphony. The mixture weights are estimated in a latent variable framework from the polyphonic data using an Expectation Maximization (EM) algorithm derived for the proposed approach. The weights are shown to indicate instrument activity. The output of the algorithm is an Instrument Activity Graph (IAG), from which it is possible to find the instruments that are active at a given time. An average F-ratio of 0.75 is obtained for polyphonies containing 2-5 instruments, on an experimental test set of 8 instruments: clarinet, flute, guitar, harp, mandolin, piano, trombone and violin.
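
The Student's-t mixture models themselves are not reproduced; the sketch below shows only the latent-variable weight estimation, assuming each pre-trained instrument model k can supply a per-frame log-likelihood log p_k(x_t). With the component densities fixed, the EM update for the mixture weights reduces to a few lines of numpy.

```python
# EM for mixture weights with fixed, pre-trained per-instrument densities.
import numpy as np

def estimate_activity_weights(frame_loglik, num_iters=50):
    """frame_loglik: (num_frames, num_instruments) array of log p_k(x_t)
    under each fixed instrument model k; returns estimated mixture weights."""
    T, K = frame_loglik.shape
    w = np.full(K, 1.0 / K)
    for _ in range(num_iters):
        # E-step: posterior responsibility of each instrument for each frame.
        log_post = np.log(w) + frame_loglik
        log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: weights are the average responsibilities.
        w = post.mean(axis=0)
    return w                                              # indicates instrument activity
```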

Relevance:

20.00%

Publisher:

Abstract:

Objective identification and description of mimicked calls is a primary component of any study on avian vocal mimicry, but few studies have adopted a quantitative approach. We used spectral feature representations commonly used in human speech analysis, in combination with various distance metrics, to distinguish between mimicked and non-mimicked calls of the greater racket-tailed drongo, Dicrurus paradiseus, and cross-validated the results with human assessment of spectral similarity. We found that the automated method and human subjects performed similarly in terms of the overall number of correct matches of mimicked calls to putative model calls. However, the two methods misclassified different subsets of calls, and we achieved a maximum accuracy of 95% only when we combined the results of both methods. This study is the first to use Mel-frequency cepstral coefficients and Relative Spectral Amplitude-filtered linear predictive coding coefficients to quantify vocal mimicry. Our findings also suggest that, in spite of several advances in automated methods of song analysis, corresponding cross-validation by humans remains essential.
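
A reduced illustration of the automated-comparison side of such a study: MFCCs of two calls aligned with dynamic time warping, with the normalized alignment cost as the distance. The RASTA-filtered LPC features and the other distance metrics used in the paper are not reproduced; librosa is assumed for feature extraction and DTW.

```python
# MFCC + DTW distance between two recorded calls (a loose stand-in for the
# paper's full feature/metric comparison).
import numpy as np
import librosa

def call_distance(path_a, path_b, n_mfcc=13):
    ya, sra = librosa.load(path_a, sr=None)
    yb, srb = librosa.load(path_b, sr=None)
    mfcc_a = librosa.feature.mfcc(y=ya, sr=sra, n_mfcc=n_mfcc)
    mfcc_b = librosa.feature.mfcc(y=yb, sr=srb, n_mfcc=n_mfcc)
    D, wp = librosa.sequence.dtw(X=mfcc_a, Y=mfcc_b, metric="euclidean")
    return D[-1, -1] / len(wp)          # path-length-normalized alignment cost
```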

Relevance:

20.00%

Publisher:

Abstract:

The tonic is a fundamental concept in Indian art music. It is the base pitch, which an artist chooses in order to construct the melodies during a rāga rendition, and all accompanying instruments are tuned using the tonic pitch. Consequently, tonic identification is a fundamental task for most computational analyses of Indian art music, such as intonation analysis, melodic motif analysis and rāga recognition. In this paper we review existing approaches for tonic identification in Indian art music and evaluate them on six diverse datasets for a thorough comparison and analysis. We study the performance of each method in different contexts such as the presence/absence of additional metadata, the quality of audio data, the duration of audio data, music tradition (Hindustani/Carnatic) and the gender of the singer (male/female). We show that the approaches that combine multi-pitch analysis with machine learning provide the best performance in most cases (90% identification accuracy on average), and are robust across the aforementioned contexts compared to the approaches based on expert knowledge. In addition, we also show that the performance of the latter can be improved when additional metadata is available to further constrain the problem. Finally, we present a detailed error analysis of each method, providing further insights into the advantages and limitations of the methods.
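
A very reduced sketch of the histogram stage shared by several of the reviewed approaches: fold a (possibly multi-)pitch track to one octave, histogram it in cents, and return peak locations as tonic candidates. The classifier or expert rules that pick the final tonic from these candidates are omitted; `ref_hz` and the bin count are arbitrary illustrative choices.

```python
# Pitch-histogram tonic-candidate generation (candidate selection stage omitted).
import numpy as np

def tonic_candidates(pitch_hz, ref_hz=110.0, bins=120, num_candidates=5):
    voiced = pitch_hz[pitch_hz > 0]
    cents = (1200.0 * np.log2(voiced / ref_hz)) % 1200.0   # fold to one octave
    hist, edges = np.histogram(cents, bins=bins, range=(0.0, 1200.0))
    top = np.argsort(hist)[::-1][:num_candidates]           # most populated bins
    centers = (edges[top] + edges[top + 1]) / 2.0
    return ref_hz * 2.0 ** (centers / 1200.0)               # candidate tonic frequencies (Hz)
```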

Relevance:

20.00%

Publisher:

Abstract:

This paper presents speaker normalization approaches for the audio search task. The conventional state-of-the-art feature set, viz., Mel Frequency Cepstral Coefficients (MFCC), is known to contain speaker-specific and linguistic information implicitly, which can create problems for a speaker-independent audio search task. In this paper, a universal warping-based approach is used for vocal tract length normalization in audio search. In particular, features such as the scale transform and warped linear prediction are used to compensate for speaker variability in audio matching. The advantage of these features over the conventional feature set is that they apply a universal frequency warping to both of the templates to be matched during audio search. The performance of Scale Transform Cepstral Coefficients (STCC) and Warped Linear Prediction Cepstral Coefficients (WLPCC) is about 3% better than that of the state-of-the-art MFCC feature set on the TIMIT database.
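
The scale transform and warped-LP cepstra themselves are not reconstructed here. As a crude, assumption-laden stand-in for the universal-warping idea, the sketch below resamples each frame's magnitude spectrum onto a logarithmic frequency grid before taking a cepstrum-like transform, so that a uniform vocal-tract-length scaling becomes approximately a shift along that axis.

```python
# Crude stand-in for universal-warping features (not STCC/WLPCC):
# log-frequency resampling followed by a truncated cepstrum-like transform.
import numpy as np

def log_warped_cepstra(frames, sr, n_bins=64, n_ceps=13):
    """frames: (num_frames, frame_len) windowed time-domain frames."""
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    log_grid = np.geomspace(50.0, sr / 2.0, n_bins)            # logarithmic frequency grid
    warped = np.stack([np.interp(log_grid, freqs, s) for s in spec])
    log_mag = np.log(warped + 1e-10)
    return np.fft.irfft(log_mag, axis=1)[:, :n_ceps]           # truncated cepstral features
```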

Relevance:

20.00%

Publisher:

Abstract:

We formulate the problem of detecting the constituent instruments in a polyphonic music piece as a joint decoding problem. From monophonic data, parametric Gaussian Mixture Hidden Markov Models (GM-HMMs) are obtained for each instrument. We propose a method to use these models in a factorial framework, termed Factorial GM-HMM (F-GM-HMM). The states are jointly inferred to explain the evolution of each instrument in the mixture observation sequence, with the dependencies decoupled using a variational inference technique. We show that the joint time evolution of all instruments' states can be captured using the F-GM-HMM. We compare the performance of the proposed method with that of a Student's-t mixture model (tMM) and a GM-HMM in an existing latent variable framework. Experiments on two- to five-instrument polyphony, with 8 instrument models trained on the RWC dataset and tested on the RWC and TRIOS datasets, show that F-GM-HMM has an advantage over the other models considered in segments containing co-occurring instruments.
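
Only the monophonic training stage (one GM-HMM per instrument) is sketched here, using hmmlearn's GMMHMM as an off-the-shelf stand-in; the factorial coupling and the variational joint decoding described in the paper are not reproduced. The state and mixture counts are arbitrary choices.

```python
# Per-instrument GM-HMM training from monophonic feature sequences.
import numpy as np
from hmmlearn.hmm import GMMHMM

def train_instrument_model(feature_seqs, n_states=4, n_mix=3):
    """feature_seqs: list of (T_i, D) feature arrays from monophonic recordings."""
    X = np.vstack(feature_seqs)
    lengths = [len(seq) for seq in feature_seqs]
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

# A trained model can then score how well it explains a test feature sequence:
#   loglik = model.score(test_features)
```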