45 resultados para Spectrogram
Resumo:
Acoustic recordings play an increasingly important role in monitoring terrestrial and aquatic environments. However, rapid advances in technology make it possible to accumulate thousands of hours of recordings, more than ecologists can ever listen to. Our approach to this big-data challenge is to visualize the content of long-duration audio recordings on multiple scales, from minutes, hours, days to years. The visualization should facilitate navigation and yield ecologically meaningful information prior to listening to the audio. To construct images, we calculate acoustic indices, statistics that describe the distribution of acoustic energy and reflect content of ecological interest. We combine various indices to produce false-color spectrogram images that reveal acoustic content and facilitate navigation. The technical challenge we investigate in this work is how to navigate recordings that are days or even months in duration. We introduce a method of zooming through multiple temporal scales, analogous to Google Maps. However, the “landscape” to be navigated is not geographical and not therefore intrinsically visual, but rather a graphical representation of the underlying audio. We describe solutions to navigating spectrograms that range over three orders of magnitude of temporal scale. We make three sets of observations: 1. We determine that at least ten intermediate scale steps are required to zoom over three orders of magnitude of temporal scale; 2. We determine that three different visual representations are required to cover the range of temporal scales; 3. We present a solution to the problem of maintaining visual continuity when stepping between different visual representations. Finally, we demonstrate the utility of the approach with four case studies.
Resumo:
A new experimental technique is proposed to determine refractive indices of liquids and isotropic solids at different wavelengths. A Pellin-Broca hollow prism filled with a liquid sample produces the spectrum (of the liquid prism) on the photographic plate of the camera. A plane reflector, mounted at a small angle to the normal of the exit face of the prism, also forms a direct image of the collimator slit in the plane of the same photographic plate. All the necessary information for determining the refractive indices (for different wavelengths) is extracted directly from the spectrogram without using any goniometric system. Experiments are conducted with the liquid prisms of isopropyl alcohol, water, and benzene. The results of the experiments are compared with those obtained by a Pulfrich refractometer (critical angle method). The proposed new technique gives the refractive indices for visible and invisible spectral lines to an accuracy of 2x10(-5). (C) 1997 Society of Photo-Optical Instrumentation Engineers.
Resumo:
We address the problem of estimating instantaneous frequency (IF) of a real-valued constant amplitude time-varying sinusoid. Estimation of polynomial IF is formulated using the zero-crossings of the signal. We propose an algorithm to estimate nonpolynomial IF by local approximation using a low-order polynomial, over a short segment of the signal. This involves the choice of window length to minimize the mean square error (MSE). The optimal window length found by directly minimizing the MSE is a function of the higher-order derivatives of the IF which are not available a priori. However, an optimum solution is formulated using an adaptive window technique based on the concept of intersection of confidence intervals. The adaptive algorithm enables minimum MSE-IF (MMSE-IF) estimation without requiring a priori information about the IF. Simulation results show that the adaptive window zero-crossing-based IF estimation method is superior to fixed window methods and is also better than adaptive spectrogram and adaptive Wigner-Ville distribution (WVD)-based IF estimators for different signal-to-noise ratio (SNR).
Resumo:
In this paper we propose a postprocessing technique for a spectrogram diffusion based harmonic/percussion decom- position algorithm. The proposed technique removes har- monic instrument leakages in the percussion enhanced out- puts of the baseline algorithm. The technique uses median filtering and an adaptive detection of percussive segments in subbands followed by piecewise signal reconstruction using envelope properties to ensure that percussion is enhanced while harmonic leakages are suppressed. A new binary mask is created for the percussion signal which upon applying on the original signal improves harmonic versus percussion separation. We compare our algorithm with two recent techniques and show that on a database of polyphonic Indian music, the postprocessing algorithm improves the harmonic versus percussion decomposition significantly.
Resumo:
Latent variable methods, such as PLCA (Probabilistic Latent Component Analysis) have been successfully used for analysis of non-negative signal representations. In this paper, we formulate PLCS (Probabilistic Latent Component Segmentation), which models each time frame of a spectrogram as a spectral distribution. Given the signal spectrogram, the segmentation boundaries are estimated using a maximum-likelihood approach. For an efficient solution, the algorithm imposes a hard constraint that each segment is modelled by a single latent component. The hard constraint facilitates the solution of ML boundary estimation using dynamic programming. The PLCS framework does not impose a parametric assumption unlike earlier ML segmentation techniques. PLCS can be naturally extended to model coarticulation between successive phones. Experiments on the TIMIT corpus show that the proposed technique is promising compared to most state of the art speech segmentation algorithms.
Resumo:
Narrowband spectrograms of voiced speech can be modeled as an outcome of two-dimensional (2-D) modulation process. In this paper, we develop a demodulation algorithm to estimate the 2-D amplitude modulation (AM) and carrier of a given spectrogram patch. The demodulation algorithm is based on the Riesz transform, which is a unitary, shift-invariant operator and is obtained as a 2-D extension of the well known 1-D Hilbert transform operator. Existing methods for spectrogram demodulation rely on extension of sinusoidal demodulation method from the communications literature and require precise estimate of the 2-D carrier. On the other hand, the proposed method based on Riesz transform does not require a carrier estimate. The proposed method and the sinusoidal demodulation scheme are tested on real speech data. Experimental results show that the demodulated AM and carrier from Riesz demodulation represent the spectrogram patch more accurately compared with those obtained using the sinusoidal demodulation. The signal-to-reconstruction error ratio was found to be about 2 to 6 dB higher in case of the proposed demodulation approach.
Resumo:
Grating Compression Transform (GCT) is a two-dimensional analysis of speech signal which has been shown to be effective in multi-pitch tracking in speech mixtures. Multi-pitch tracking methods using GCT apply Kalman filter framework to obtain pitch tracks which requires training of the filter parameters using true pitch tracks. We propose an unsupervised method for obtaining multiple pitch tracks. In the proposed method, multiple pitch tracks are modeled using time-varying means of a Gaussian mixture model (GMM), referred to as TVGMM. The TVGMM parameters are estimated using multiple pitch values at each frame in a given utterance obtained from different patches of the spectrogram using GCT. We evaluate the performance of the proposed method on all voiced speech mixtures as well as random speech mixtures having well separated and close pitch tracks. TVGMM achieves multi-pitch tracking with 51% and 53% multi-pitch estimates having error <= 20% for random mixtures and all-voiced mixtures respectively. TVGMM also results in lower root mean squared error in pitch track estimation compared to that by Kalman filtering.
Resumo:
We propose a two-dimensional (2-D) multicomponent amplitude-modulation, frequency-modulation (AM-FM) model for a spectrogram patch corresponding to voiced speech, and develop a new demodulation algorithm to effectively separate the AM, which is related to the vocal tract response, and the carrier, which is related to the excitation. The demodulation algorithm is based on the Riesz transform and is developed along the lines of Hilbert-transform-based demodulation for 1-D AM-FM signals. We compare the performance of the Riesz transform technique with that of the sinusoidal demodulation technique on real speech data. Experimental results show that the Riesz-transform-based demodulation technique represents spectrogram patches accurately. The spectrograms reconstructed from the demodulated AM and carrier are inverted and the corresponding speech signal is synthesized. The signal-to-noise ratio (SNR) of the reconstructed speech signal, with respect to clean speech, was found to be 2 to 4 dB higher in case of the Riesz transform technique than the sinusoidal demodulation technique.
Resumo:
We propose a multiple initialization based spectral peak tracking (MISPT) technique for heart rate monitoring from photoplethysmography (PPG) signal. MISPT is applied on the PPG signal after removing the motion artifact using an adaptive noise cancellation filter. MISPT yields several estimates of the heart rate trajectory from the spectrogram of the denoised PPG signal which are finally combined using a novel measure called trajectory strength. Multiple initializations help in correcting erroneous heart rate trajectories unlike the typical SPT which uses only single initialization. Experiments on the PPG data from 12 subjects recorded during intensive physical exercise show that the MISPT based heart rate monitoring indeed yields a better heart rate estimate compared to the SPT with single initialization. On the 12 datasets MISPT results in an average absolute error of 1.11 BPM which is lower than 1.28 BPM obtained by the state-of-the-art online heart rate monitoring algorithm.
Resumo:
A novel test method for the characterisation of flexible forming processes is proposed and applied to four flexible forming processes: Incremental Sheet Forming (ISF), conventional spinning, the English wheel and power hammer. The proposed method is developed in analogy with time-domain control engineering, where a system is characterised by its impulse response. The spatial impulse response is used to characterise the change in workpiece deformation created by a process, but has also been applied with a strain spectrogram, as a novel way to characterise a process and the physical effect it has on the workpiece. Physical and numerical trials to study the effects of process and material parameters on spatial impulse response lead to three main conclusions. Incremental sheet forming is particularly sensitive to process parameters. The English wheel and power hammer are strongly similar and largely insensitive to both process and material parameters. Spinning develops in two stages and is sensitive to most process parameters, but insensitive to prior deformation. Finally, the proposed method could be applied to modelling, classification of existing and novel processes, product-process matching and closed-loop control of flexible forming processes. © 2012 Elsevier B.V.
Resumo:
This work addresses two related questions. The first question is what joint time-frequency energy representations are most appropriate for auditory signals, in particular, for speech signals in sonorant regions. The quadratic transforms of the signal are examined, a large class that includes, for example, the spectrograms and the Wigner distribution. Quasi-stationarity is not assumed, since this would neglect dynamic regions. A set of desired properties is proposed for the representation: (1) shift-invariance, (2) positivity, (3) superposition, (4) locality, and (5) smoothness. Several relations among these properties are proved: shift-invariance and positivity imply the transform is a superposition of spectrograms; positivity and superposition are equivalent conditions when the transform is real; positivity limits the simultaneous time and frequency resolution (locality) possible for the transform, defining an uncertainty relation for joint time-frequency energy representations; and locality and smoothness tradeoff by the 2-D generalization of the classical uncertainty relation. The transform that best meets these criteria is derived, which consists of two-dimensionally smoothed Wigner distributions with (possibly oriented) 2-D guassian kernels. These transforms are then related to time-frequency filtering, a method for estimating the time-varying 'transfer function' of the vocal tract, which is somewhat analogous to ceptstral filtering generalized to the time-varying case. Natural speech examples are provided. The second question addressed is how to obtain a rich, symbolic description of the phonetically relevant features in these time-frequency energy surfaces, the so-called schematic spectrogram. Time-frequency ridges, the 2-D analog of spectral peaks, are one feature that is proposed. If non-oriented kernels are used for the energy representation, then the ridge tops can be identified, with zero-crossings in the inner product of the gradient vector and the direction of greatest downward curvature. If oriented kernels are used, the method can be generalized to give better orientation selectivity (e.g., at intersecting ridges) at the cost of poorer time-frequency locality. Many speech examples are given showing the performance for some traditionally difficult cases: semi-vowels and glides, nasalized vowels, consonant-vowel transitions, female speech, and imperfect transmission channels.
Resumo:
This experimental study focuses on a detection system at the seismic station level that should have a similar role to the detection algorithms based on the ratio STA/LTA. We tested two types of neural network: Multi-Layer Perceptrons and Support Vector Machines, trained in supervised mode. The universe of data consisted of 2903 patterns extracted from records of the PVAQ station, of the seismography network of the Institute of Meteorology of Portugal. The spectral characteristics of the records and its variation in time were reflected in the input patterns, consisting in a set of values of power spectral density in selected frequencies, extracted from a spectro gram calculated over a segment of record of pre-determined duration. The universe of data was divided, with about 60% for the training and the remainder reserved for testing and validation. To ensure that all patterns in the universe of data were within the range of variation of the training set, we used an algorithm to separate the universe of data by hyper-convex polyhedrons, determining in this manner a set of patterns that have a mandatory part of the training set. Additionally, an active learning strategy was conducted, by iteratively incorporating poorly classified cases in the training set. The best results, in terms of sensitivity and selectivity in the whole data ranged between 98% and 100%. These results compare very favorably with the ones obtained by the existing detection system, 50%.
Resumo:
This study describes the on-line operation of a seismic detection system to act at the level of a seismic station providing similar role to that of a STA /LTA ratio-based detection algorithms. The intelligent detector is a Support Vector Machine (SVM), trained with data consisting of 2903 patterns extracted from records of the PVAQ station, one of the seismographic network's stations of the Institute of Meteorology of Portugal (IM). Records' spectral variations in time and characteristics were reflected in the SVM input patterns, as a set of values of power spectral density at selected frequencies. To ensure that all patterns of the sample data were within the range of variation of the training set, we used an algorithm to separate the universe of data by hyper-convex polyhedrons, determining in this manner a set of patterns that have a mandatory part of the training set. Additionally, an active learning strategy was conducted, by iteratively incorporating poorly classified cases in the training set. After having been trained, the proposed system was experimented in continuous operation for unseen (out of sample) data, and the SVM detector obtained 97.7% and 98.7% of sensitivity and selectivity, respectively. The same type of ANN presented 88.4 % and 99.4% of sensitivity and selectivity when applied to data of a different seismic station of IM. © 2013 Springer-Verlag Berlin Heidelberg.
Resumo:
Temporal processing is examined for sounds delivered to the intact ear of individuals with unilateral hearing, and delivered to one ear of individuals with normal, bilateral hearing. Two temporal processing skills are assessed: 1) the ability to detect sinusoidal amplitude modulation of a wide-band noise, for various modulation frequencies, and 2) the just-noticeable-difference for temporal complexity of random-spectrogram-sounds.
Resumo:
We have applied time series analytical techniques to the flux of lava from an extrusive eruption. Tilt data acting as a proxy for flux are used in a case study of the May–August 1997 period of the eruption at Soufrière Hills Volcano, Montserrat. We justify the use of such a proxy by simple calibratory arguments. Three techniques of time series analysis are employed: spectral, spectrogram and wavelet methods. In addition to the well-known ~9-hour periodicity shown by these data, a previously unknown periodic flux variability is revealed by the wavelet analysis as a 3-day cycle of frequency modulation during June–July 1997, though the physical mechanism responsible is not clear. Such time series analysis has potential for other lava flux proxies at other types of volcanoes.