928 resultados para audio coding
Resumo:
I thank George Pandarakalam for research assistance; Hans-Jörg Rheinberger for hosting my stay at the Max Planck Institute for History of Science, Berlin; and Sahotra Sarkar and referees of this journal for offering detailed comments. Funded by the Wellcome Trust (WT098764MA).
Resumo:
This dissertation presents a study and experimental research on asymmetric coding of stereoscopic video. A review on 3D technologies, video formats and coding is rst presented and then particular emphasis is given to asymmetric coding of 3D content and performance evaluation methods, based on subjective measures, of methods using asymmetric coding. The research objective was de ned to be an extension of the current concept of asymmetric coding for stereo video. To achieve this objective the rst step consists in de ning regions in the spatial dimension of auxiliary view with di erent perceptual relevance within the stereo pair, which are identi ed by a binary mask. Then these regions are encoded with better quality (lower quantisation) for the most relevant ones and worse quality (higher quantisation) for the those with lower perceptual relevance. The actual estimation of the relevance of a given region is based on a measure of disparity according to the absolute di erence between views. To allow encoding of a stereo sequence using this method, a reference H.264/MVC encoder (JM) has been modi ed to allow additional con guration parameters and inputs. The nal encoder is still standard compliant. In order to show the viability of the method subjective assessment tests were performed over a wide range of objective qualities of the auxiliary view. The results of these tests allow us to prove 3 main goals. First, it is shown that the proposed method can be more e cient than traditional asymmetric coding when encoding stereo video at higher qualities/rates. The method can also be used to extend the threshold at which uniform asymmetric coding methods start to have an impact on the subjective quality perceived by the observers. Finally the issue of eye dominance is addressed. Results from stereo still images displayed over a short period of time showed it has little or no impact on the proposed method.
Resumo:
This paper will look at the benefits and limitations of content distribution using Forward Error Correction (FEC) in conjunction with the Transmission Control Protocol (TCP). FEC can be used to reduce the number of retransmissions which would usually result from a lost packet. The requirement for TCP to deal with any losses is then greatly reduced. There are however side-effects to using FEC as a countermeasure to packet loss: an additional requirement for bandwidth. When applications such as real-time video conferencing are needed, delay must be kept to a minimum, and retransmissions are certainly not desirable. A balance, therefore, between additional bandwidth and delay due to retransmissions must be struck. Our results show that the throughput of data can be significantly improved when packet loss occurs using a combination of FEC and TCP, compared to relying solely on TCP for retransmissions. Furthermore, a case study applies the result to demonstrate the achievable improvements in the quality of streaming video perceived by end users.
Resumo:
Erasure control coding has been exploited in communication networks with an aim to improve the end-to-end performance of data delivery across the network. To address the concerns over the strengths and constraints of erasure coding schemes in this application, we examine the performance limits of two erasure control coding strategies, forward erasure recovery and adaptive erasure recovery. Our investigation shows that the throughput of a network using an (n, k) forward erasure control code is capped by r =k/n when the packet loss rate p ≤ (te/n) and by k(l-p)/(n-te) when p > (t e/n), where te is the erasure control capability of the code. It also shows that the lower bound of the residual loss rate of such a network is (np-te)/(n-te) for (te/n) < p ≤ 1. Especially, if the code used is maximum distance separable, the Shannon capacity of the erasure channel, i.e. 1-p, can be achieved and the residual loss rate is lower bounded by (p+r-1)/r, for (1-r) < p ≤ 1. To address the requirements in real-time applications, we also investigate the service completion time of different schemes. It is revealed that the latency of the forward erasure recovery scheme is fractionally higher than that of the scheme without erasure control coding or retransmission mechanisms (using UDP), but much lower than that of the adaptive erasure scheme when the packet loss rate is high. Results on comparisons between the two erasure control schemes exhibit their advantages as well as disadvantages in the role of delivering end-to-end services. To show the impact of the bounds derived on the end-to-end performance of a TCP/IP network, a case study is provided to demonstrate how erasure control coding could be used to maximize the performance of practical systems. © 2010 IEEE.
Resumo:
We quantify the error statistics and patterning effects in a 5x 40 Gbit/s WDM RZ-DBPSK SMF/DCF fibre link using hybrid Raman/EDFA amplification. We propose an adaptive constrained coding for the suppression of errors due to patterning effects. It is established, that this coding technique can greatly reduce the bit error rate (BER) value even for large BER (BER > 101). The proposed approach can be used in the combination with the forward error correction schemes (FEC) to correct the errors even when real channel BER is outside the FEC workspace.
Resumo:
The roles of long non-coding RNAs (lncRNAs) in regulating cancer and stem cells are being increasingly appreciated. Its diverse mechanisms provide the regulatory network with a bigger repertoire to increase complexity. Here we report a novel LncRNA, Lnc34a, that is enriched in colon cancer stem cells (CCSCs) and initiates asymmetric division by directly targeting the microRNA miR-34a to cause its spatial imbalance. Lnc34a recruits Dnmt3a via PHB2 and HDAC1 to methylate and deacetylate the miR-34a promoter simultaneously, hence epigenetically silencing miR-34a expression independent of its upstream regulator, p53. Lnc34a levels affect CCSC self-renewal and colorectal cancer (CRC) growth in xenograft models. Lnc34a is upregulated in late-stage CRCs, contributing to epigenetic miR-34a silencing and CRC proliferation. The fact that lncRNA targets microRNA highlights the regulatory complexity of non-coding RNAs (ncRNAs), which occupy the bulk of the genome.
Resumo:
A large proportion of the variation in traits between individuals can be attributed to variation in the nucleotide sequence of the genome. The most commonly studied traits in human genetics are related to disease and disease susceptibility. Although scientists have identified genetic causes for over 4,000 monogenic diseases, the underlying mechanisms of many highly prevalent multifactorial inheritance disorders such as diabetes, obesity, and cardiovascular disease remain largely unknown. Identifying genetic mechanisms for complex traits has been challenging because most of the variants are located outside of protein-coding regions, and determining the effects of such non-coding variants remains difficult. In this dissertation, I evaluate the hypothesis that such non-coding variants contribute to human traits and diseases by altering the regulation of genes rather than the sequence of those genes. I will specifically focus on studies to determine the functional impacts of genetic variation associated with two related complex traits: gestational hyperglycemia and fetal adiposity. At the genomic locus associated with maternal hyperglycemia, we found that genetic variation in regulatory elements altered the expression of the HKDC1 gene. Furthermore, we demonstrated that HKDC1 phosphorylates glucose in vitro and in vivo, thus demonstrating that HKDC1 is a fifth human hexokinase gene. At the fetal-adiposity associated locus, we identified variants that likely alter VEPH1 expression in preadipocytes during differentiation. To make such studies of regulatory variation high-throughput and routine, we developed POP-STARR, a novel high throughput reporter assay that can empirically measure the effects of regulatory variants directly from patient DNA. By combining targeted genome capture technologies with STARR-seq, we assayed thousands of haplotypes from 760 individuals in a single experiment. We subsequently used POP-STARR to identify three key features of regulatory variants: that regulatory variants typically have weak effects on gene expression; that the effects of regulatory variants are often coordinated with respect to disease-risk, suggesting a general mechanism by which the weak effects can together have phenotypic impact; and that nucleotide transversions have larger impacts on enhancer activity than transitions. Together, the findings presented here demonstrate successful strategies for determining the regulatory mechanisms underlying genetic associations with human traits and diseases, and value of doing so for driving novel biological discovery.
Resumo:
'Image volumes' refer to realizations of images in other dimensions such as time, spectrum, and focus. Recent advances in scientific, medical, and consumer applications demand improvements in image volume capture. Though image volume acquisition continues to advance, it maintains the same sampling mechanisms that have been used for decades; every voxel must be scanned and is presumed independent of its neighbors. Under these conditions, improving performance comes at the cost of increased system complexity, data rates, and power consumption.
This dissertation explores systems and methods capable of efficiently improving sensitivity and performance for image volume cameras, and specifically proposes several sampling strategies that utilize temporal coding to improve imaging system performance and enhance our awareness for a variety of dynamic applications.
Video cameras and camcorders sample the video volume (x,y,t) at fixed intervals to gain understanding of the volume's temporal evolution. Conventionally, one must reduce the spatial resolution to increase the framerate of such cameras. Using temporal coding via physical translation of an optical element known as a coded aperture, the compressive temporal imaging (CACTI) camera emonstrates a method which which to embed the temporal dimension of the video volume into spatial (x,y) measurements, thereby greatly improving temporal resolution with minimal loss of spatial resolution. This technique, which is among a family of compressive sampling strategies developed at Duke University, temporally codes the exposure readout functions at the pixel level.
Since video cameras nominally integrate the remaining image volume dimensions (e.g. spectrum and focus) at capture time, spectral (x,y,t,\lambda) and focal (x,y,t,z) image volumes are traditionally captured via sequential changes to the spectral and focal state of the system, respectively. The CACTI camera's ability to embed video volumes into images leads to exploration of other information within that video; namely, focal and spectral information. The next part of the thesis demonstrates derivative works of CACTI: compressive extended depth of field and compressive spectral-temporal imaging. These works successfully show the technique's extension of temporal coding to improve sensing performance in these other dimensions.
Geometrical optics-related tradeoffs, such as the classic challenges of wide-field-of-view and high resolution photography, have motivated the development of mulitscale camera arrays. The advent of such designs less than a decade ago heralds a new era of research- and engineering-related challenges. One significant challenge is that of managing the focal volume (x,y,z) over wide fields of view and resolutions. The fourth chapter shows advances on focus and image quality assessment for a class of multiscale gigapixel cameras developed at Duke.
Along the same line of work, we have explored methods for dynamic and adaptive addressing of focus via point spread function engineering. We demonstrate another form of temporal coding in the form of physical translation of the image plane from its nominal focal position. We demonstrate this technique's capability to generate arbitrary point spread functions.
Resumo:
HomeBank is introduced here. It is a public, permanent, extensible, online database of daylong audio recorded in naturalistic environments. HomeBank serves two primary purposes. First, it is a repository for raw audio and associated files: one database requires special permissions, and another redacted database allows unrestricted public access. Associated files include metadata such as participant demographics and clinical diagnostics, automated annotations, and human-generated transcriptions and annotations. Many recordings use the child-perspective LENA recorders (LENA Research Foundation, Boulder, Colorado, United States), but various recordings and metadata can be accommodated. The HomeBank database can have both vetted and unvetted recordings, with different levels of accessibility. Additionally, HomeBank is an open repository for processing and analysis tools for HomeBank or similar data sets. HomeBank is flexible for users and contributors, making primary data available to researchers, especially those in child development, linguistics, and audio engineering. HomeBank facilitates researchers' access to large-scale data and tools, linking the acoustic, auditory, and linguistic characteristics of children's environments with a variety of variables including socioeconomic status, family characteristics, language trajectories, and disorders. Automated processing applied to daylong home audio recordings is now becoming widely used in early intervention initiatives, helping parents to provide richer speech input to at-risk children.
Resumo:
OBJECTIVES: In natural hearing, cochlear mechanical compression is dynamically adjusted via the efferent medial olivocochlear reflex (MOCR). These adjustments probably help understanding speech in noisy environments and are not available to the users of current cochlear implants (CIs). The aims of the present study are to: (1) present a binaural CI sound processing strategy inspired by the control of cochlear compression provided by the contralateral MOCR in natural hearing; and (2) assess the benefits of the new strategy for understanding speech presented in competition with steady noise with a speech-like spectrum in various spatial configurations of the speech and noise sources. DESIGN: Pairs of CI sound processors (one per ear) were constructed to mimic or not mimic the effects of the contralateral MOCR on compression. For the nonmimicking condition (standard strategy or STD), the two processors in a pair functioned similarly to standard clinical processors (i.e., with fixed back-end compression and independently of each other). When configured to mimic the effects of the MOCR (MOC strategy), the two processors communicated with each other and the amount of back-end compression in a given frequency channel of each processor in the pair decreased/increased dynamically (so that output levels dropped/increased) with increases/decreases in the output energy from the corresponding frequency channel in the contralateral processor. Speech reception thresholds in speech-shaped noise were measured for 3 bilateral CI users and 2 single-sided deaf unilateral CI users. Thresholds were compared for the STD and MOC strategies in unilateral and bilateral listening conditions and for three spatial configurations of the speech and noise sources in simulated free-field conditions: speech and noise sources colocated in front of the listener, speech on the left ear with noise in front of the listener, and speech on the left ear with noise on the right ear. In both bilateral and unilateral listening, the electrical stimulus delivered to the test ear(s) was always calculated as if the listeners were wearing bilateral processors. RESULTS: In both unilateral and bilateral listening conditions, mean speech reception thresholds were comparable with the two strategies for colocated speech and noise sources, but were at least 2 dB lower (better) with the MOC than with the STD strategy for spatially separated speech and noise sources. In unilateral listening conditions, mean thresholds improved with increasing the spatial separation between the speech and noise sources regardless of the strategy but the improvement was significantly greater with the MOC strategy. In bilateral listening conditions, thresholds improved significantly with increasing the speech-noise spatial separation only with the MOC strategy. CONCLUSIONS: The MOC strategy (1) significantly improved the intelligibility of speech presented in competition with a spatially separated noise source, both in unilateral and bilateral listening conditions; (2) produced significant spatial release from masking in bilateral listening conditions, something that did not occur with fixed compression; and (3) enhanced spatial release from masking in unilateral listening conditions. The MOC strategy as implemented here, or a modified version of it, may be usefully applied in CIs and in hearing aids.
Resumo:
Backscatter communication is an emerging wireless technology that recently has gained an increase in attention from both academic and industry circles. The key innovation of the technology is the ability of ultra-low power devices to utilize nearby existing radio signals to communicate. As there is no need to generate their own energetic radio signal, the devices can benefit from a simple design, are very inexpensive and are extremely energy efficient compared with traditional wireless communication. These benefits have made backscatter communication a desirable candidate for distributed wireless sensor network applications with energy constraints.
The backscatter channel presents a unique set of challenges. Unlike a conventional one-way communication (in which the information source is also the energy source), the backscatter channel experiences strong self-interference and spread Doppler clutter that mask the information-bearing (modulated) signal scattered from the device. Both of these sources of interference arise from the scattering of the transmitted signal off of objects, both stationary and moving, in the environment. Additionally, the measurement of the location of the backscatter device is negatively affected by both the clutter and the modulation of the signal return.
This work proposes a channel coding framework for the backscatter channel consisting of a bi-static transmitter/receiver pair and a quasi-cooperative transponder. It proposes to use run-length limited coding to mitigate the background self-interference and spread-Doppler clutter with only a small decrease in communication rate. The proposed method applies to both binary phase-shift keying (BPSK) and quadrature-amplitude modulation (QAM) scheme and provides an increase in rate by up to a factor of two compared with previous methods.
Additionally, this work analyzes the use of frequency modulation and bi-phase waveform coding for the transmitted (interrogating) waveform for high precision range estimation of the transponder location. Compared to previous methods, optimal lower range sidelobes are achieved. Moreover, since both the transmitted (interrogating) waveform coding and transponder communication coding result in instantaneous phase modulation of the signal, cross-interference between localization and communication tasks exists. Phase discriminating algorithm is proposed to make it possible to separate the waveform coding from the communication coding, upon reception, and achieve localization with increased signal energy by up to 3 dB compared with previous reported results.
The joint communication-localization framework also enables a low-complexity receiver design because the same radio is used both for localization and communication.
Simulations comparing the performance of different codes corroborate the theoretical results and offer possible trade-off between information rate and clutter mitigation as well as a trade-off between choice of waveform-channel coding pairs. Experimental results from a brass-board microwave system in an indoor environment are also presented and discussed.
Resumo:
This work focuses on the construction and application of coded apertures to compressive X-ray tomography. Coded apertures can be made in a number of ways, each method having an impact on system background and signal contrast. Methods of constructing coded apertures for structuring X-ray illumination and scatter are compared and analyzed. Apertures can create structured X-ray bundles that investigate specific sets of object voxels. The tailored bundles of rays form a code (or pattern) and are later estimated through computational inversion. Structured illumination can be used to subsample object voxels and make inversion feasible for low dose computed tomography (CT) systems, or it can be used to reduce background in limited angle CT systems.
On the detection side, coded apertures modulate X-ray scatter signals to determine the position and radiance of scatter points. By forming object dependent projections in measurement space, coded apertures multiplex modulated scatter signals onto a detector. The multiplexed signals can be inverted with knowledge of the code pattern and system geometry. This work shows two systems capable of determining object position and type in a 2D plane, by illuminating objects with an X-ray `fan beam,' using coded apertures and compressive measurements. Scatter tomography can help identify materials in security and medicine that may be ambiguous with transmission tomography alone.
Resumo:
The VRAG-R is designed to assess the likelihood of violent or sexual reoffending among male offenders. The data set comprises demographic, criminal history, psychological assessment, and psychiatric information about the offenders gathered from institutional files together with post-release recidivism information. The VRAG-R is a twelve-item actuarial instrument and the scores on these items form part of the data set. Because one of the goals of the VRAG-R development project was to compare the VRAG-R to the VRAG, subjects' VRAG scores are included in this data set. Access to the VRAG-R dataset is restricted. Contact Data Services, Queen's University Library (academic.services@queensu.ca) for access.
Resumo:
Situational awareness is achieved naturally by the human senses of sight and hearing in combination. Automatic scene understanding aims at replicating this human ability using microphones and cameras in cooperation. In this paper, audio and video signals are fused and integrated at different levels of semantic abstractions. We detect and track a speaker who is relatively unconstrained, i.e., free to move indoors within an area larger than the comparable reported work, which is usually limited to round table meetings. The system is relatively simple: consisting of just 4 microphone pairs and a single camera. Results show that the overall multimodal tracker is more reliable than single modality systems, tolerating large occlusions and cross-talk. System evaluation is performed on both single and multi-modality tracking. The performance improvement given by the audio–video integration and fusion is quantified in terms of tracking precision and accuracy as well as speaker diarisation error rate and precision–recall (recognition). Improvements vs. the closest works are evaluated: 56% sound source localisation computational cost over an audio only system, 8% speaker diarisation error rate over an audio only speaker recognition unit and 36% on the precision–recall metric over an audio–video dominant speaker recognition method.