4 resultados para Audio signals

em Duke University


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This dissertation focuses on two vital challenges in relation to whale acoustic signals: detection and classification.

In detection, we evaluated the influence of the uncertain ocean environment on the spectrogram-based detector, and derived the likelihood ratio of the proposed Short Time Fourier Transform detector. Experimental results showed that the proposed detector outperforms detectors based on the spectrogram. The proposed detector is more sensitive to environmental changes because it includes phase information.

In classification, our focus is on finding a robust and sparse representation of whale vocalizations. Because whale vocalizations can be modeled as polynomial phase signals, we can represent the whale calls by their polynomial phase coefficients. In this dissertation, we used the Weyl transform to capture chirp rate information, and used a two dimensional feature set to represent whale vocalizations globally. Experimental results showed that our Weyl feature set outperforms chirplet coefficients and MFCC (Mel Frequency Cepstral Coefficients) when applied to our collected data.

Since whale vocalizations can be represented by polynomial phase coefficients, it is plausible that the signals lie on a manifold parameterized by these coefficients. We also studied the intrinsic structure of high dimensional whale data by exploiting its geometry. Experimental results showed that nonlinear mappings such as Laplacian Eigenmap and ISOMAP outperform linear mappings such as PCA and MDS, suggesting that the whale acoustic data is nonlinear.

We also explored deep learning algorithms on whale acoustic data. We built each layer as convolutions with either a PCA filter bank (PCANet) or a DCT filter bank (DCTNet). With the DCT filter bank, each layer has different a time-frequency scale representation, and from this, one can extract different physical information. Experimental results showed that our PCANet and DCTNet achieve high classification rate on the whale vocalization data set. The word error rate of the DCTNet feature is similar to the MFSC in speech recognition tasks, suggesting that the convolutional network is able to reveal acoustic content of speech signals.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Once thought to be predominantly the domain of cortex, multisensory integration has now been found at numerous sub-cortical locations in the auditory pathway. Prominent ascending and descending connection within the pathway suggest that the system may utilize non-auditory activity to help filter incoming sounds as they first enter the ear. Active mechanisms in the periphery, particularly the outer hair cells (OHCs) of the cochlea and middle ear muscles (MEMs), are capable of modulating the sensitivity of other peripheral mechanisms involved in the transduction of sound into the system. Through indirect mechanical coupling of the OHCs and MEMs to the eardrum, motion of these mechanisms can be recorded as acoustic signals in the ear canal. Here, we utilize this recording technique to describe three different experiments that demonstrate novel multisensory interactions occurring at the level of the eardrum. 1) In the first experiment, measurements in humans and monkeys performing a saccadic eye movement task to visual targets indicate that the eardrum oscillates in conjunction with eye movements. The amplitude and phase of the eardrum movement, which we dub the Oscillatory Saccadic Eardrum Associated Response or OSEAR, depended on the direction and horizontal amplitude of the saccade and occurred in the absence of any externally delivered sounds. 2) For the second experiment, we use an audiovisual cueing task to demonstrate a dynamic change to pressure levels in the ear when a sound is expected versus when one is not. Specifically, we observe a drop in frequency power and variability from 0.1 to 4kHz around the time when the sound is expected to occur in contract to a slight increase in power at both lower and higher frequencies. 3) For the third experiment, we show that seeing a speaker say a syllable that is incongruent with the accompanying audio can alter the response patterns of the auditory periphery, particularly during the most relevant moments in the speech stream. These visually influenced changes may contribute to the altered percept of the speech sound. Collectively, we presume that these findings represent the combined effect of OHCs and MEMs acting in tandem in response to various non-auditory signals in order to manipulate the receptive properties of the auditory system. These influences may have a profound, and previously unrecognized, impact on how the auditory system processes sounds from initial sensory transduction all the way to perception and behavior. Moreover, we demonstrate that the entire auditory system is, fundamentally, a multisensory system.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

HomeBank is introduced here. It is a public, permanent, extensible, online database of daylong audio recorded in naturalistic environments. HomeBank serves two primary purposes. First, it is a repository for raw audio and associated files: one database requires special permissions, and another redacted database allows unrestricted public access. Associated files include metadata such as participant demographics and clinical diagnostics, automated annotations, and human-generated transcriptions and annotations. Many recordings use the child-perspective LENA recorders (LENA Research Foundation, Boulder, Colorado, United States), but various recordings and metadata can be accommodated. The HomeBank database can have both vetted and unvetted recordings, with different levels of accessibility. Additionally, HomeBank is an open repository for processing and analysis tools for HomeBank or similar data sets. HomeBank is flexible for users and contributors, making primary data available to researchers, especially those in child development, linguistics, and audio engineering. HomeBank facilitates researchers' access to large-scale data and tools, linking the acoustic, auditory, and linguistic characteristics of children's environments with a variety of variables including socioeconomic status, family characteristics, language trajectories, and disorders. Automated processing applied to daylong home audio recordings is now becoming widely used in early intervention initiatives, helping parents to provide richer speech input to at-risk children.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Repetitive Ca2+ transients in dendritic spines induce various forms of synaptic plasticity by transmitting information encoded in their frequency and amplitude. CaMKII plays a critical role in decoding these Ca2+ signals to initiate long-lasting synaptic plasticity. However, the properties of CaMKII that mediate Ca2+ decoding in spines remain elusive. Here, I measured CaMKII activity in spines using fast-framing two-photon fluorescence lifetime imaging. Following each repetitive Ca2+ elevations, CaMKII activity increased in a stepwise manner. This signal integration, at the time scale of seconds, critically depended on Thr286 phosphorylation. In the absence of Thr286 phosphorylation, only by increasing the frequency of repetitive Ca2+ elevations could high peak CaMKII activity or plasticity be induced. In addition, I measured the association between CaMKII and Ca2+/CaM during spine plasticity induction. Unlike CaMKII activity, association of Ca2+/CaM to CaMKII plateaued at the first Ca2+ elevation event. This result indicated that integration of Ca2+ signals was initiated by the binding of Ca2+/CaM and amplified by the subsequent increases in Thr286-phosphorylated form of CaMKII. Together, these findings demonstrate that CaMKII functions as a leaky integrator of repetitive Ca2+ signals during the induction of synaptic plasticity, and that Thr286 phosphorylation is critical for defining the frequencies of such integration.