215 resultados para audio processing
Resumo:
Automated digital recordings are useful for large-scale temporal and spatial environmental monitoring. An important research effort has been the automated classification of calling bird species. In this paper we examine a related task, retrieval of birdcalls from a database of audio recordings, similar to a user supplied query call. Such a retrieval task can sometimes be more useful than an automated classifier. We compare three approaches to similarity-based birdcall retrieval using spectral ridge features and two kinds of gradient features, structure tensor and the histogram of oriented gradients. The retrieval accuracy of our spectral ridge method is 94% compared to 82% for the structure tensor method and 90% for the histogram of gradients method. Additionally, this approach potentially offers a more compact representation and is more computationally efficient.
Resumo:
Bioacoustic monitoring has become a significant research topic for species diversity conservation. Due to the development of sensing techniques, acoustic sensors are widely deployed in the field to record animal sounds over a large spatial and temporal scale. With large volumes of collected audio data, it is essential to develop semi-automatic or automatic techniques to analyse the data. This can help ecologists make decisions on how to protect and promote the species diversity. This paper presents generic features to characterize a range of bird species for vocalisation retrieval. In the implementation, audio recordings are first converted to spectrograms using short-time Fourier transform, then a ridge detection method is applied to the spectrogram for detecting points of interest. Based on the detected points, a new region representation are explored for describing various bird vocalisations and a local descriptor including temporal entropy, frequency bin entropy and histogram of counts of four ridge directions is calculated for each sub-region. To speed up the retrieval process, indexing is carried out and the retrieved results are ranked according to similarity scores. The experiment results show that our proposed feature set can achieve 0.71 in term of retrieval success rate which outperforms spectral ridge features alone (0.55) and Mel frequency cepstral coefficients (0.36).
Resumo:
Acoustic recordings of the environment provide an effective means to monitor bird species diversity. To facilitate exploration of acoustic recordings, we describe a content-based birdcall retrieval algorithm. A query birdcall is a region of spectrogram bounded by frequency and time. Retrieval depends on a similarity measure derived from the orientation and distribution of spectral ridges. The spectral ridge detection method caters for a broad range of birdcall structures. In this paper, we extend previous work by incorporating a spectrogram scaling step in order to improve the detection of spectral ridges. Compared to an existing approach based on MFCC features, our feature representation achieves better retrieval performance for multiple bird species in noisy recordings.
Resumo:
We have prepared p-n junction organic photovoltaic cells using an all solution processing method with poly(3-hexylthiophene) (P3HT) as the donor and phenyl-C 61-butyric acid methyl ester (PCBM) as the acceptor. Interdigitated donor/acceptor interface morphology was observed in the device processed with the lowest boiling point solvent for PCBM used in this study. The influences of different solvents on donor/acceptor morphology and respective device performance were investigated simultaneously. The best device obtained had characteristically rough interface morphology with a peak to valley value ∼15 nm. The device displayed a power conversion efficiency of 1.78%, an open circuit voltage (V oc) 0.44 V, a short circuit current density (J sc) 9.4 mA/cm 2 and a fill factor 43%.
Resumo:
Objective Self-report measures are typically used to assess the effectiveness of road safety advertisements. However, psychophysiological measures of persuasive processing (i.e., skin conductance response [SCR]) and objective driving measures of persuasive outcomes (i.e., in-vehicle GPS devices) may provide further insights into the effectiveness of these advertisements. This study aimed to explore the persuasive processing and outcomes of two anti-speeding advertisements by incorporating both self-report and objective measures of speeding behaviour. In addition, this study aimed to compare the findings derived from these different measurement approaches. Methods Young drivers (N = 20, Mage = 21.01 years) viewed either a positive or negative emotion-based anti-speeding television advertisement. Whilst viewing the advertisement, SCR activity was measured to assess ad-evoked arousal responses. The RoadScout® GPS device was then installed into participants’ vehicles for one week to measure on-road speed-related driving behaviour. Self-report measures assessed persuasive processing (emotional and arousal responses) and actual driving behaviour. Results There was general correspondence between the self-report measures of arousal and the SCR and between the self-report measure of actual driving behaviour and the objective driving data (as assessed via the GPS devices). Conclusions This study provides insights into how psychophysiological and GPS devices could be used as objective measures in conjunction with self-report measures to further understand the persuasive processes and outcomes of emotion-based anti-speeding advertisements.
Early mathematical learning: Number processing skills and executive function at 5 and 8 years of age
Resumo:
This research investigated differences and associations in performance in number processing and executive function for children attending primary school in a large Australian metropolitan city. In a cross-sectional study, performance of 25 children in the first full-time year of school, (Prep; mean age = 5.5 years) and 21 children in Year 3 (mean age = 8.5 years) completed three number processing tasks and three executive function tasks. Year 3 children consistently outperformed the Prep year children on measures of accuracy and reaction time, on the tasks of number comparison, calculation, shifting, and inhibition but not on number line estimation. The components of executive function (shifting, inhibition, and working memory) showed different patterns of correlation to performance on number processing tasks across the early years of school. Findings could be used to enhance teachers’ understanding about the role of the cognitive processes employed by children in numeracy learning, and so inform teachers’ classroom practices.
Resumo:
Acoustic classification of anurans (frogs) has received increasing attention for its promising application in biological and environment studies. In this study, a novel feature extraction method for frog call classification is presented based on the analysis of spectrograms. The frog calls are first automatically segmented into syllables. Then, spectral peak tracks are extracted to separate desired signal (frog calls) from background noise. The spectral peak tracks are used to extract various syllable features, including: syllable duration, dominant frequency, oscillation rate, frequency modulation, and energy modulation. Finally, a k-nearest neighbor classifier is used for classifying frog calls based on the results of principal component analysis. The experiment results show that syllable features can achieve an average classification accuracy of 90.5% which outperforms Mel-frequency cepstral coefficients features (79.0%).
Resumo:
The Rapid Visual Information Processing (RVIP) task, a serial discrimination task where task performance believed to reflect sustained attention capabilities, is widely used in behavioural research and increasingly in neuroimaging studies. To date, functional neuroimaging research into the RVIP has been undertaken using block analyses, reflecting the sustained processing involved in the task, but not necessarily the transient processes associated with individual trial performance. Furthermore, this research has been limited to young cohorts. This study assessed the behavioural and functional magnetic resonance imaging (fMRI) outcomes of the RVIP task using both block and event-related analyses in a healthy middle aged cohort (mean age = 53.56 years, n = 16). The results show that the version of the RVIP used here is sensitive to changes in attentional demand processes with participants achieving a 43% accuracy hit rate in the experimental task compared with 96% accuracy in the control task. As shown by previous research, the block analysis revealed an increase in activation in a network of frontal, parietal, occipital and cerebellar regions. The event related analysis showed a similar network of activation, seemingly omitting regions involved in the processing of the task (as shown in the block analysis), such as occipital areas and the thalamus, providing an indication of a network of regions involved in correct trial performance. Frontal (superior and inferior frontal gryi), parietal (precuenus, inferior parietal lobe) and cerebellar regions were shown to be active in both the block and event-related analyses, suggesting their importance in sustained attention/vigilance. These networks and the differences between them are discussed in detail, as well as implications for future research in middle aged cohorts.
Resumo:
Bioacoustic data can be used for monitoring animal species diversity. The deployment of acoustic sensors enables acoustic monitoring at large temporal and spatial scales. We describe a content-based birdcall retrieval algorithm for the exploration of large data bases of acoustic recordings. In the algorithm, an event-based searching scheme and compact features are developed. In detail, ridge events are detected from audio files using event detection on spectral ridges. Then event alignment is used to search through audio files to locate candidate instances. A similarity measure is then applied to dimension-reduced spectral ridge feature vectors. The event-based searching method processes a smaller list of instances for faster retrieval. The experimental results demonstrate that our features achieve better success rate than existing methods and the feature dimension is greatly reduced.
Resumo:
This paper investigates the effect that text pre-processing approaches have on the estimation of the readability of web pages. Readability has been highlighted as an important aspect of web search result personalisation in previous work. The most widely used text readability measures rely on surface level characteristics of text, such as the length of words and sentences. We demonstrate that different tools for extracting text from web pages lead to very different estimations of readability. This has an important implication for search engines because search result personalisation strategies that consider users reading ability may fail if incorrect text readability estimations are computed.
Resumo:
Acoustic recordings play an increasingly important role in monitoring terrestrial and aquatic environments. However, rapid advances in technology make it possible to accumulate thousands of hours of recordings, more than ecologists can ever listen to. Our approach to this big-data challenge is to visualize the content of long-duration audio recordings on multiple scales, from minutes, hours, days to years. The visualization should facilitate navigation and yield ecologically meaningful information prior to listening to the audio. To construct images, we calculate acoustic indices, statistics that describe the distribution of acoustic energy and reflect content of ecological interest. We combine various indices to produce false-color spectrogram images that reveal acoustic content and facilitate navigation. The technical challenge we investigate in this work is how to navigate recordings that are days or even months in duration. We introduce a method of zooming through multiple temporal scales, analogous to Google Maps. However, the “landscape” to be navigated is not geographical and not therefore intrinsically visual, but rather a graphical representation of the underlying audio. We describe solutions to navigating spectrograms that range over three orders of magnitude of temporal scale. We make three sets of observations: 1. We determine that at least ten intermediate scale steps are required to zoom over three orders of magnitude of temporal scale; 2. We determine that three different visual representations are required to cover the range of temporal scales; 3. We present a solution to the problem of maintaining visual continuity when stepping between different visual representations. Finally, we demonstrate the utility of the approach with four case studies.
Resumo:
Environmental changes have put great pressure on biological systems leading to the rapid decline of biodiversity. To monitor this change and protect biodiversity, animal vocalizations have been widely explored by the aid of deploying acoustic sensors in the field. Consequently, large volumes of acoustic data are collected. However, traditional manual methods that require ecologists to physically visit sites to collect biodiversity data are both costly and time consuming. Therefore it is essential to develop new semi-automated and automated methods to identify species in automated audio recordings. In this study, a novel feature extraction method based on wavelet packet decomposition is proposed for frog call classification. After syllable segmentation, the advertisement call of each frog syllable is represented by a spectral peak track, from which track duration, dominant frequency and oscillation rate are calculated. Then, a k-means clustering algorithm is applied to the dominant frequency, and the centroids of clustering results are used to generate the frequency scale for wavelet packet decomposition (WPD). Next, a new feature set named adaptive frequency scaled wavelet packet decomposition sub-band cepstral coefficients is extracted by performing WPD on the windowed frog calls. Furthermore, the statistics of all feature vectors over each windowed signal are calculated for producing the final feature set. Finally, two well-known classifiers, a k-nearest neighbour classifier and a support vector machine classifier, are used for classification. In our experiments, we use two different datasets from Queensland, Australia (18 frog species from commercial recordings and field recordings of 8 frog species from James Cook University recordings). The weighted classification accuracy with our proposed method is 99.5% and 97.4% for 18 frog species and 8 frog species respectively, which outperforms all other comparable methods.
Resumo:
Food processing industry generates substantial high organic wastes along with high energy uses. The recovery of food processing wastes as renewable energy sources represents a sustainable option for the substitution of fossil energy, contributing to the transition of food sector towards a low-carbon economy. This article reviews the latest research progress on biofuel production using food processing wastes. While extensive work on laboratory and pilot-scale biosystems for energy production has been reported, this work presents a review of advances in metabolic pathways, key technical issues and bioengineering outcomes in biofuel production from food processing wastes. Research challenges and further prospects associated with the knowledge advances and technology development of biofuel production are discussed.
Resumo:
Concept inventory tests are one method to evaluate conceptual understanding and identify possible misconceptions. The multiple-choice question format, offering a choice between a correct selection and common misconceptions, can provide an assessment of students' conceptual understanding in various dimensions. Misconceptions of some engineering concepts exist due to a lack of mental frameworks, or schemas, for these types of concepts or conceptual areas. This study incorporated an open textual response component in a multiple-choice concept inventory test to capture written explanations of students' selections. The study's goal was to identify, through text analysis of student responses, the types and categorizations of concepts in these explanations that had not been uncovered by the distractor selections. The analysis of the textual explanations of a subset of the discrete-time signals and systems concept inventory questions revealed that students have difficulty conceptually explaining several dimensions of signal processing. This contributed to their inability to provide a clear explanation of the underlying concepts, such as mathematical concepts. The methods used in this study evaluate students' understanding of signals and systems concepts through their ability to express understanding in written text. This may present a bias for students with strong written communication skills. This study presents a framework for extracting and identifying the types of concepts students use to express their reasoning when answering conceptual questions.