43 results for Spectrograms


Relevance:

20.00%

Publisher:

Abstract:

The work described in this technical report is part of an ongoing project to build practical tools for the manipulation, analysis and visualisation of recordings of the natural environment. This report describes the methods we use to remove background noise from spectrograms. It updates techniques previously described in Towsey and Planitz (2011), Technical report: acoustic analysis of the natural environment, downloadable from http://eprints.qut.edu.au/41131/. It also describes noise removal from waveforms, a technique not covered in that 2011 technical report.
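
The abstract does not spell out the noise-removal algorithm itself; as a rough, hedged illustration of this kind of processing (a dB-scaled spectrogram and a percentile-based noise profile per frequency bin are assumptions, not the report's stated method), the sketch below subtracts an estimated background level from each frequency bin and clips what remains below a small threshold.

```python
import numpy as np

def remove_background_noise(spectrogram_db, percentile=20, threshold_db=3.0):
    """Subtract a per-frequency-bin noise profile from a dB spectrogram.

    spectrogram_db: 2-D array of shape (freq_bins, time_frames), in decibels.
    The noise level of each bin is estimated as a low percentile of its
    intensity over time, since quiet frames dominate long environmental
    recordings. Parameter values are illustrative only.
    """
    noise_profile = np.percentile(spectrogram_db, percentile, axis=1, keepdims=True)
    cleaned = spectrogram_db - noise_profile      # remove the bin-wise background estimate
    cleaned[cleaned < threshold_db] = 0.0         # suppress residual low-level noise
    return cleaned
```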

Relevance:

20.00%

Publisher:

Abstract:

Acoustic sensing is a promising approach to scaling faunal biodiversity monitoring. Scaling the analysis of audio collected by acoustic sensors is a big-data problem. Standard approaches for dealing with big acoustic data include automated recognition and crowd-based analysis. Automated methods are fast at processing but hard to design rigorously, whilst manual methods are accurate but slow. In particular, manual analysis of acoustic data is constrained by a 1:1 time relationship between the data and its analysts, an inherent consequence of the need to listen to the audio. This paper demonstrates how the efficiency of crowd-sourced sound analysis can be increased by an order of magnitude through visual inspection of audio rendered as spectrograms. Experimental data suggest that an analysis speedup of 12× is obtainable for suitable types of acoustic analysis when only spectrograms are shown.

Relevance:

20.00%

Publisher:

Abstract:

Acoustic recordings of the environment are an important aid to ecologists monitoring biodiversity and environmental health. However, rapid advances in recording technology, storage and computing make it possible to accumulate thousands of hours of recordings, of which ecologists can listen to only a small fraction. The big-data challenge is to visualize the content of long-duration audio recordings on multiple scales, from hours and days to months and years. The visualization should facilitate navigation and yield ecologically meaningful information. Our approach is to extract, at one-minute resolution, acoustic indices that reflect content of ecological interest. An acoustic index is a statistic that summarizes some aspect of the distribution of acoustic energy in a recording. We combine indices to produce false-colour images that reveal acoustic content and facilitate navigation through recordings that are months or even years in duration.
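
As a hedged illustration of the false-colour construction (the channel assignment and the percentile normalisation below are assumptions, not the authors' published recipe), three per-minute index matrices can be rescaled and mapped to the red, green and blue channels of one image:

```python
import numpy as np

def false_colour_image(index_red, index_green, index_blue):
    """Combine three acoustic-index matrices, each of shape (freq_bins, minutes),
    into a single RGB image for long-duration visualisation."""
    def rescale(index):
        lo, hi = np.nanpercentile(index, [2, 98])   # robust minimum and maximum
        return np.clip((index - lo) / (hi - lo + 1e-12), 0.0, 1.0)

    # One index per colour channel; which index goes to which channel is arbitrary here.
    return np.dstack([rescale(index_red), rescale(index_green), rescale(index_blue)])
```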

Relevance:

20.00%

Publisher:

Abstract:

Acoustic recordings play an increasingly important role in monitoring terrestrial and aquatic environments. However, rapid advances in technology make it possible to accumulate thousands of hours of recordings, more than ecologists can ever listen to. Our approach to this big-data challenge is to visualize the content of long-duration audio recordings on multiple scales, from minutes and hours to days and years. The visualization should facilitate navigation and yield ecologically meaningful information prior to listening to the audio. To construct images, we calculate acoustic indices, statistics that describe the distribution of acoustic energy and reflect content of ecological interest. We combine various indices to produce false-color spectrogram images that reveal acoustic content and facilitate navigation. The technical challenge we investigate in this work is how to navigate recordings that are days or even months in duration. We introduce a method of zooming through multiple temporal scales, analogous to Google Maps. However, the "landscape" to be navigated is not geographical, and therefore not intrinsically visual, but rather a graphical representation of the underlying audio. We describe solutions to navigating spectrograms that range over three orders of magnitude of temporal scale. We make three sets of observations: (1) at least ten intermediate scale steps are required to zoom over three orders of magnitude of temporal scale; (2) three different visual representations are required to cover the range of temporal scales; (3) we present a solution to the problem of maintaining visual continuity when stepping between different visual representations. Finally, we demonstrate the utility of the approach with four case studies.
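
A quick worked example of the scale arithmetic (the finest resolution of 0.02 s of audio per pixel is an assumed figure, not taken from the paper): covering three orders of magnitude in ten zoom steps implies a per-step zoom factor of roughly two.

```python
# Three orders of magnitude = a factor of 1000 between the finest and coarsest scales.
finest_scale = 0.02                      # seconds of audio per rendered pixel (assumed)
steps = 10                               # the minimum number of steps reported in the paper
factor = 1000 ** (1 / steps)             # ~1.995, i.e. roughly a doubling per step
scales = [finest_scale * factor ** i for i in range(steps + 1)]
print([round(s, 3) for s in scales])     # 0.02 ... 20.0 seconds per pixel
```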

Relevance:

20.00%

Publisher:

Abstract:

Narrowband spectrograms of voiced speech can be modeled as the outcome of a two-dimensional (2-D) modulation process. In this paper, we develop a demodulation algorithm to estimate the 2-D amplitude modulation (AM) and carrier of a given spectrogram patch. The demodulation algorithm is based on the Riesz transform, a unitary, shift-invariant operator obtained as a 2-D extension of the well-known 1-D Hilbert transform operator. Existing methods for spectrogram demodulation rely on extensions of the sinusoidal demodulation method from the communications literature and require a precise estimate of the 2-D carrier. The proposed Riesz-transform-based method, on the other hand, does not require a carrier estimate. The proposed method and the sinusoidal demodulation scheme are tested on real speech data. Experimental results show that the demodulated AM and carrier from Riesz demodulation represent the spectrogram patch more accurately than those obtained using sinusoidal demodulation, with a signal-to-reconstruction error ratio about 2 to 6 dB higher for the proposed approach.
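
For orientation, here is a minimal sketch of Riesz-based demodulation of an image patch. It implements the textbook monogenic-signal construction (frequency-domain multipliers -i*wx/|w| and -i*wy/|w|) rather than the paper's full pipeline; any band-pass filtering or carrier handling described in the paper is omitted.

```python
import numpy as np

def riesz_demodulate(patch):
    """Estimate a local amplitude (AM) and local phase for a real 2-D patch
    via the Riesz transform / monogenic signal. A generic sketch only."""
    rows, cols = patch.shape
    wy = np.fft.fftfreq(rows)[:, None]                 # vertical frequency grid
    wx = np.fft.fftfreq(cols)[None, :]                 # horizontal frequency grid
    mag = np.sqrt(wx ** 2 + wy ** 2)
    mag[0, 0] = 1.0                                    # avoid division by zero at DC

    F = np.fft.fft2(patch)
    r1 = np.real(np.fft.ifft2(F * (-1j * wx / mag)))   # Riesz component, x direction
    r2 = np.real(np.fft.ifft2(F * (-1j * wy / mag)))   # Riesz component, y direction

    amplitude = np.sqrt(patch ** 2 + r1 ** 2 + r2 ** 2)    # local AM envelope
    phase = np.arctan2(np.sqrt(r1 ** 2 + r2 ** 2), patch)  # local phase (carrier)
    return amplitude, phase
```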

Relevance:

20.00%

Publisher:

Abstract:

We propose a two-dimensional (2-D) multicomponent amplitude-modulation, frequency-modulation (AM-FM) model for a spectrogram patch corresponding to voiced speech, and develop a new demodulation algorithm to separate the AM, which is related to the vocal-tract response, from the carrier, which is related to the excitation. The demodulation algorithm is based on the Riesz transform and is developed along the lines of Hilbert-transform-based demodulation of 1-D AM-FM signals. We compare the performance of the Riesz transform technique with that of the sinusoidal demodulation technique on real speech data. Experimental results show that the Riesz-transform-based technique represents spectrogram patches accurately. The spectrograms reconstructed from the demodulated AM and carrier are inverted and the corresponding speech signal is synthesized. The signal-to-noise ratio (SNR) of the reconstructed speech signal, with respect to clean speech, was found to be 2 to 4 dB higher for the Riesz transform technique than for the sinusoidal demodulation technique.
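
Written out explicitly (the notation below is my own; the paper's symbols may differ), a multicomponent 2-D AM-FM model of a spectrogram patch takes the form:

```latex
% K-component AM-FM model of a spectrogram patch S(x, y):
% a_k is the slowly varying amplitude (vocal-tract envelope),
% \phi_k the phase whose local frequency defines the carrier (excitation).
S(x, y) \approx \sum_{k=1}^{K} a_k(x, y)\,\cos\bigl(\phi_k(x, y)\bigr)
```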

Relevance:

10.00%

Publisher:

Abstract:

This technical report is concerned with one aspect of environmental monitoring: the detection and analysis of acoustic events in sound recordings of the environment. Sound recordings offer ecologists the advantage of cheaper and increased sampling, but make available so much data that automated analysis becomes essential. The report describes a number of tools for automated analysis of recordings, including noise removal from spectrograms, acoustic event detection, event pattern recognition, spectral peak tracking, syntactic pattern recognition applied to call syllables, and oscillation detection. These algorithms are applied to a number of animal call recognition tasks, chosen because they illustrate quite different modes of analysis: (1) the detection of diffuse events caused by wind and rain, which are frequent contaminants of recordings of the terrestrial environment; (2) the detection of bird calls; and (3) the preparation of acoustic maps for whole-ecosystem analysis. This last task utilises the temporal distribution of events over a daily, monthly or yearly cycle.
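
As an illustration of one of the listed tools, threshold-based acoustic event detection on a noise-reduced spectrogram can be sketched as below; this is a generic connected-component approach, not necessarily the report's exact algorithm, and the threshold and minimum-size values are assumptions.

```python
import numpy as np
from scipy import ndimage

def detect_acoustic_events(cleaned_db, intensity_threshold=6.0, min_pixels=20):
    """Return bounding boxes of acoustic events, found as connected regions of
    above-threshold energy in a noise-reduced dB spectrogram."""
    mask = cleaned_db > intensity_threshold                # binarise the spectrogram
    labels, n_regions = ndimage.label(mask)                # label connected components
    events = []
    for idx, region in enumerate(ndimage.find_objects(labels), start=1):
        if np.sum(labels[region] == idx) >= min_pixels:    # discard tiny speckles
            events.append(region)                          # (freq_slice, time_slice)
    return events
```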

Relevance:

10.00%

Publisher:

Abstract:

The work described in this technical report is part of an ongoing project at QUT to build practical tools for the manipulation, analysis and visualisation of recordings of the natural environment. This report describes the algorithm we use to cluster the spectra in a spectrogram. The report begins with a brief description of the signal processing that prepares the spectrograms.
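
A minimal sketch of one way to cluster the spectra of a spectrogram, here using k-means over standardised time frames with scikit-learn; the report's own clustering algorithm and pre-processing may well differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_spectra(spectrogram_db, n_clusters=10):
    """Assign each time frame (column) of a spectrogram to one of n_clusters
    spectral clusters and return the label sequence plus cluster centres."""
    frames = spectrogram_db.T                                   # one row per time frame
    frames = (frames - frames.mean(axis=0)) / (frames.std(axis=0) + 1e-12)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(frames)
    return km.labels_, km.cluster_centers_
```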

Relevance:

10.00%

Publisher:

Abstract:

This dissertation seeks to define and classify potential forms of Nonlinear structure and explore the possibilities they afford for the creation of new musical works. It provides the first comprehensive framework for the discussion of Nonlinear structure in musical works and a detailed overview of the rise of nonlinearity in music during the 20th century. Nonlinear events are shown to emerge through significant parametrical discontinuity at the boundaries between regions of relatively strong internal cohesion. The dissertation situates Nonlinear structures in relation to linear structures and unstructured sonic phenomena, and provides a means of evaluating Nonlinearity in a musical structure by considering the degree to which the structure is integrated, contingent, compressible and determinate as a whole. It is proposed that Nonlinearity can be classified within a three-dimensional space described by three continua: the temporal continuum, encompassing sequential and multilinear forms of organization; the narrative continuum, encompassing processual, game-structure and developmental narrative forms; and the referential continuum, encompassing stylistic allusion, adaptation and quotation. The use of spectrograms of recorded musical works is proposed as a means of evaluating Nonlinearity in a musical work through the visual representation of parametrical divergence in pitch, duration, timbre and dynamics over time. Spectral and structural analysis of repertoire works is undertaken as part of an exploration of musical nonlinearity and the compositional and performative features that characterize it. The contribution of cultural, ideological, scientific and technological shifts to the emergence of Nonlinearity in music is discussed, and a range of compositional factors that contributed to the emergence of musical Nonlinearity is examined. The evolution of notational innovations from the mobile score to the screen score is plotted, and a novel framework for the discussion of these forms of musical transmission is proposed. A computer-coordinated performative model is discussed, in which a computer synchronises the screening of notational information, provides temporal coordination of the performers through click-tracks or similar methods, and synchronises the audio processing and synthesized elements of the work. It is proposed that such a model constitutes a highly effective means of realizing complex Nonlinear structures. A creative folio comprising 29 original works that explore nonlinearity is presented, discussed and categorised using the proposed classifications. Spectrograms of these works are employed where appropriate to illustrate the instantiation of parametrically divergent substructures and examples of structural openness through multiple versioning.

Relevance:

10.00%

Publisher:

Abstract:

Interpreting acoustic recordings of the natural environment is an increasingly important technique for ecologists wishing to monitor terrestrial ecosystems. Technological advances make it possible to accumulate many more recordings than can be listened to or interpreted, thereby necessitating automated assistance to identify elements in the soundscape. In this paper we examine the problem of estimating avian species richness by sampling from very long acoustic recordings. We work with data recorded under natural conditions, with all the attendant problems of undefined and unconstrained acoustic content (such as wind, rain and traffic) which can mask content of interest (in our case, bird calls). We describe 14 acoustic indices calculated at one-minute resolution for the duration of a 24-hour recording. An acoustic index is a statistic that summarizes some aspect of the structure and distribution of acoustic energy and information in a recording. Some of the indices we calculate are standard (e.g. signal-to-noise ratio), some have been reported to be useful for the detection of bioacoustic activity (e.g. temporal and spectral entropies) and some are directed to avian sources (spectral persistence of whistles). We rank the one-minute segments of a 24-hour recording in descending order according to an "acoustic richness" score derived from a single index or a weighted combination of two or more. We describe combinations of indices that lead to more efficient estimates of species richness than random sampling from the same recording, where efficiency is defined as the total number of species identified for a given listening effort. Using random sampling, we achieve a 53% increase in species recognized over traditional field surveys, and an increase of 87% when combinations of indices direct the sampling. We also demonstrate how combinations of the same indices can be used to detect long-duration acoustic events (such as heavy rain and cicada chorus) and to construct long-duration (24 h) spectrograms.
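
A sketch of the ranking step, assuming each index has already been reduced to one value per minute; the index names, the min-max normalisation and the weighted sum below are illustrative rather than the paper's exact scheme.

```python
import numpy as np

def rank_minutes_by_acoustic_richness(indices, weights):
    """Rank one-minute segments by a weighted combination of acoustic indices.

    indices: dict of index name -> 1-D array of per-minute values
             (e.g. {'snr': ..., 'temporal_entropy': ...}); names are illustrative.
    weights: dict of the same names -> numeric weight.
    Returns minute numbers ordered from most to least "acoustically rich".
    """
    def normalise(values):
        values = np.asarray(values, dtype=float)
        return (values - values.min()) / (values.max() - values.min() + 1e-12)

    score = sum(weights[name] * normalise(values) for name, values in indices.items())
    return np.argsort(score)[::-1]          # minute indices, descending richness
```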

Relevance:

10.00%

Publisher:

Abstract:

Echolocation calls of 119 bats belonging to 12 species in three families from the Antillean islands of Puerto Rico, Dominica, and St. Vincent were recorded using time-expansion methods. Spectrograms of calls and descriptive statistics of five temporal and frequency variables measured from the calls are presented. The echolocation calls of many of these species, particularly those in the family Phyllostomidae, have not been described previously. The wing morphology of each taxon is described and related to the structure of its echolocation calls and its foraging ecology. Of the slow aerial-hawking insectivores (families Mormoopidae and Natalidae), Mormoops blainvillii, Pteronotus davyi davyi, P. quadridens fuliginosus, and Natalus stramineus stramineus can forage with great manoeuvrability in background-cluttered space (close to vegetation) and are able to hover, while Pteronotus parnellii portoricensis is able to fly and echolocate in highly cluttered space (dense vegetation). Among the frugivores, nectarivores and omnivores in the family Phyllostomidae, Brachyphylla cavernarum intermedia is adapted to foraging at the edges of vegetation in background-cluttered space, while Erophylla bombifrons bombifrons, Glossophaga longirostris rostrata, Artibeus jamaicensis jamaicensis, A. jamaicensis schwartzi and Stenoderma rufum darioi are adapted to foraging under canopies in highly cluttered space and lack speed or efficiency in commuting flight. In contrast, Monophyllus plethodon luciae, Sturnira lilium angeli and S. lilium paulsoni are adapted to flying in highly cluttered space but can also fly fast and efficiently in open areas.

Relevance:

10.00%

Publisher:

Abstract:

Environmental sensors collect massive amounts of audio data. This thesis investigates computational methods to support human analysts in identifying faunal vocalisations from that audio. A series of experiments was conducted to trial the effectiveness of novel user interfaces. This research examines the rapid scanning of spectrograms, decision support tools for users, and cleaning methods for folksonomies. Together, these investigations demonstrate that providing computational support to human analysts increases their efficiency and accuracy; this allows bioacoustics projects to efficiently utilise their valuable human analysts.

Relevance:

10.00%

Publisher:

Abstract:

Bioacoustic monitoring has become a significant research topic for species diversity conservation. Owing to developments in sensing techniques, acoustic sensors are widely deployed in the field to record animal sounds over large spatial and temporal scales. With such large volumes of collected audio data, it is essential to develop semi-automatic or automatic techniques to analyse the data, which can help ecologists make decisions on how to protect and promote species diversity. This paper presents generic features to characterize a range of bird species for vocalisation retrieval. In our implementation, audio recordings are first converted to spectrograms using the short-time Fourier transform, and a ridge detection method is then applied to the spectrogram to detect points of interest. Based on the detected points, a new region representation is explored for describing various bird vocalisations, and a local descriptor comprising temporal entropy, frequency-bin entropy and a histogram of counts of four ridge directions is calculated for each sub-region. To speed up the retrieval process, indexing is carried out and the retrieved results are ranked according to similarity scores. The experimental results show that the proposed feature set achieves a retrieval success rate of 0.71, outperforming spectral ridge features alone (0.55) and Mel-frequency cepstral coefficients (0.36).
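
A hedged sketch of the per-sub-region descriptor mentioned above (temporal entropy, frequency-bin entropy and a four-bin histogram of ridge directions); the input conventions and the lack of normalisation are assumptions.

```python
import numpy as np

def subregion_descriptor(subregion_energy, ridge_directions):
    """Build a descriptor for one spectrogram sub-region.

    subregion_energy: 2-D array (freq_bins, frames) of non-negative spectral energy.
    ridge_directions: integer codes in {0, 1, 2, 3} (horizontal, vertical and the
                      two diagonals) for the ridge pixels found in this sub-region.
    """
    def entropy(distribution):
        p = distribution / (distribution.sum() + 1e-12)
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    temporal_entropy = entropy(subregion_energy.sum(axis=0))    # energy across frames
    frequency_entropy = entropy(subregion_energy.sum(axis=1))   # energy across bins
    direction_hist = np.bincount(np.asarray(ridge_directions, dtype=int),
                                 minlength=4).astype(float)
    return np.concatenate(([temporal_entropy, frequency_entropy], direction_hist))
```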

Relevance:

10.00%

Publisher:

Abstract:

Acoustic classification of anurans (frogs) has received increasing attention for its promising applications in biological and environmental studies. In this study, a novel feature extraction method for frog call classification is presented based on the analysis of spectrograms. The frog calls are first automatically segmented into syllables. Spectral peak tracks are then extracted to separate the desired signal (frog calls) from background noise. The spectral peak tracks are used to extract various syllable features, including syllable duration, dominant frequency, oscillation rate, frequency modulation, and energy modulation. Finally, a k-nearest neighbor classifier is used to classify frog calls based on the results of principal component analysis. The experimental results show that the syllable features achieve an average classification accuracy of 90.5%, which outperforms Mel-frequency cepstral coefficient features (79.0%).
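
A minimal sketch of the classification stage (principal component analysis followed by k-nearest neighbours), here with scikit-learn; the feature ordering, the number of retained components and the value of k are assumptions.

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def train_syllable_classifier(syllable_features, species_labels, n_components=3, k=5):
    """Fit a k-NN classifier on PCA-reduced syllable features.

    syllable_features: array of shape (n_syllables, n_features), e.g. duration,
    dominant frequency, oscillation rate, frequency modulation, energy modulation.
    """
    model = make_pipeline(StandardScaler(),
                          PCA(n_components=n_components),
                          KNeighborsClassifier(n_neighbors=k))
    return model.fit(syllable_features, species_labels)
```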

Relevance:

10.00%

Publisher:

Abstract:

Frogs have received increasing attention because of their effectiveness as indicators of environmental change; it is therefore important to monitor and assess frog populations. With the development of sensor techniques, large volumes of audio data (including frog calls) have been collected and need to be analysed. After the audio data are transformed into a spectrogram representation using the short-time Fourier transform, visual inspection of this representation motivates the use of image-processing techniques to analyse the audio. An acoustic event detection (AED) method is applied to the spectrograms to detect acoustic events, from which ridges are extracted. Three feature sets, Mel-frequency cepstral coefficients (MFCCs), the AED feature set and the ridge feature set, are then used for frog call classification with a support vector machine classifier. Fifteen frog species widely distributed in Queensland, Australia, are selected to evaluate the proposed method. The experimental results show that the ridge feature set achieves an average classification accuracy of 74.73%, outperforming the MFCCs (38.99%) and the AED feature set (67.78%).
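
For reference, a short sketch of the spectrogram conversion step using SciPy's short-time Fourier transform routine; the window length and overlap below are assumed values, not those used in the paper.

```python
import numpy as np
from scipy import signal
from scipy.io import wavfile

def audio_to_spectrogram(wav_path, nperseg=512, noverlap=256):
    """Convert a WAV recording into a dB power spectrogram, the representation
    on which the event detection and ridge extraction operate."""
    sample_rate, samples = wavfile.read(wav_path)
    if samples.ndim > 1:                               # mix down stereo recordings
        samples = samples.mean(axis=1)
    freqs, times, sxx = signal.spectrogram(samples, fs=sample_rate,
                                           nperseg=nperseg, noverlap=noverlap)
    return freqs, times, 10 * np.log10(sxx + 1e-12)    # power in decibels
```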