28 results for codec
Abstract:
The growing heterogeneity of networks, devices and consumption conditions calls for flexible and adaptive video coding solutions. The compression power of the HEVC standard and the benefits of the distributed video coding paradigm allow the design of novel scalable coding solutions with improved error robustness and low encoding complexity while still achieving competitive compression efficiency. In this context, this paper proposes a novel scalable video coding scheme using an HEVC Intra compliant base layer and a distributed coding approach in the enhancement layers (EL). This design inherits the HEVC compression efficiency while providing low encoding complexity at the enhancement layers. The temporal correlation is exploited at the decoder to create the EL side information (SI) residue, an estimation of the original residue. The EL encoder sends only the data that cannot be inferred at the decoder, thus exploiting the correlation between the original and SI residues; however, this correlation must be characterized with an accurate correlation model to obtain coding efficiency improvements. Therefore, this paper proposes a correlation modeling solution to be used at both encoder and decoder, without requiring a feedback channel. Experimental results confirm that the proposed scalable coding scheme has lower encoding complexity and provides BD-Rate savings of up to 3.43% in comparison with the HEVC Intra scalable extension under development. © 2014 IEEE.
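The abstract does not detail the correlation model itself. Purely as an illustration of the kind of correlation-noise modelling commonly used in distributed video coding (a generic per-band Laplacian fit, not necessarily the model proposed in this paper), an estimate could look like the following Python sketch, where orig_residue and si_residue are placeholder arrays:

    import numpy as np

    def laplacian_alpha(orig_residue: np.ndarray, si_residue: np.ndarray) -> float:
        """Fit a zero-mean Laplacian to the correlation noise between an
        original residue band and its side-information estimate.
        For a Laplacian with parameter alpha, variance = 2 / alpha**2,
        so alpha = sqrt(2 / var(noise))."""
        noise = orig_residue.astype(np.float64) - si_residue.astype(np.float64)
        var = np.mean(noise ** 2)          # zero-mean assumption
        return np.sqrt(2.0 / var) if var > 0 else np.inf

    # Toy usage with synthetic data standing in for real transform bands.
    rng = np.random.default_rng(0)
    orig = rng.laplace(scale=4.0, size=(72, 88))
    si = orig + rng.laplace(scale=2.0, size=orig.shape)   # noisy estimate
    print(f"estimated alpha = {laplacian_alpha(orig, si):.3f}")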
Abstract:
This report presents and evaluates a novel idea for scalable lossy colour image coding with Matching Pursuit (MP) performed in a transform domain. The benefits of performing MP in the transform domain are analysed in detail. The main contribution of this work is the extension of MP with wavelets to colour coding and the proposal of a coding method. We exploit correlations between image subbands after wavelet transformation in the RGB colour space. Then, a new and simple quantisation and coding scheme for the colour MP decomposition, based on Run Length Encoding (RLE) and inspired by the idea of coding indexes in relational databases, is applied. As a final coding step, arithmetic coding is used, assuming uniform distributions of the MP atom parameters. The target application is compression at low and medium bit-rates. Coding performance is compared to JPEG 2000, showing the potential to outperform the latter when data models more sophisticated than the uniform one are used for the arithmetic coder. The results are presented for grayscale and colour coding of 12 standard test images.
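As a toy illustration of run-length coding applied to a sparse stream of MP atom indexes (the report's actual quantisation and coding scheme is more elaborate, and the data below is invented), consider:

    from itertools import groupby

    def rle_encode(symbols):
        """Run-length encode a sequence into (symbol, run_length) pairs."""
        return [(value, len(list(run))) for value, run in groupby(symbols)]

    def rle_decode(pairs):
        """Invert rle_encode."""
        return [value for value, count in pairs for _ in range(count)]

    # A sparse index map: most positions hold no atom (0), a few hold atom ids.
    index_map = [0, 0, 0, 7, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 12, 0, 0, 0]
    encoded = rle_encode(index_map)
    assert rle_decode(encoded) == index_map
    print(encoded)   # long zero runs collapse into single (0, length) pairs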
Abstract:
Today, most conventional surveillance networks are based on analog systems, which impose constraints such as heavy manpower and high-bandwidth requirements and have become a barrier to the development of today's surveillance networks. This dissertation describes a digital surveillance network architecture based on the H.264 coding/decoding (CODEC) System-on-a-Chip (SoC) platform. The proposed digital surveillance network architecture includes three major layers: the software layer, the hardware layer, and the network layer. The contributions to the proposed digital surveillance network architecture are as follows. (1) We implement an object recognition system and an object categorization system on the software layer by applying several Digital Image Processing (DIP) algorithms. (2) For a better compression ratio and higher video quality, we implement two new modules on the hardware layer of the H.264 CODEC core, namely the background elimination module and the Directional Discrete Cosine Transform (DDCT) module. (3) Furthermore, we introduce a Digital Signal Processor (DSP) sub-system on the main bus of the H.264 SoC platform as the major hardware support for our software architecture. We thus combine the software and hardware platforms into an intelligent surveillance node. Lab results show that the proposed surveillance node can dramatically save network resources such as bandwidth and storage capacity.
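The background elimination module above is a hardware block; a minimal software analogue, using a running-average background model with simple thresholding (purely illustrative, not the implemented module), might look like this:

    import numpy as np

    class BackgroundEliminator:
        """Running-average background model with a threshold-based foreground mask."""

        def __init__(self, first_frame: np.ndarray, alpha: float = 0.05, thresh: int = 25):
            self.background = first_frame.astype(np.float32)
            self.alpha = alpha        # background adaptation rate
            self.thresh = thresh      # per-pixel foreground threshold

        def process(self, frame: np.ndarray) -> np.ndarray:
            """Return the frame with background pixels zeroed out."""
            diff = np.abs(frame.astype(np.float32) - self.background)
            foreground = diff > self.thresh
            # Slowly absorb the new frame into the background model.
            self.background = (1 - self.alpha) * self.background + self.alpha * frame
            return np.where(foreground, frame, 0).astype(frame.dtype)

    # Toy usage on synthetic 8-bit grayscale frames.
    rng = np.random.default_rng(1)
    static = rng.integers(0, 256, size=(120, 160), dtype=np.uint8)
    moving = static.copy()
    moving[40:60, 50:80] = 255                 # a bright object appears
    be = BackgroundEliminator(static)
    kept = be.process(moving)
    print(int((kept > 0).sum()), "foreground pixels kept")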
Abstract:
A high-performance video codec is mandatory for multimedia applications such as video-on-demand and video conferencing. Recent research has proposed numerous video coding techniques to meet the requirements on bandwidth, delay, loss and Quality-of-Service (QoS). In this paper, we present our investigations on inter-subband self-similarity within wavelet-decomposed video frames using neural networks, and study the performance of applying the spatial network model to all video frames over time. The goal of our proposed method is to restore the highest perceptual quality for video transmitted over a highly congested network. Our contributions in this paper are: (1) a new coding model for wavelet-based video coding with neural-network-based inter-subband redundancy (ISR) prediction; and (2) an evaluation of the performance of 1D and 2D ISR prediction, including multiple levels of wavelet decomposition. Our results show that a short-term quality enhancement may be obtained using both 1D and 2D ISR prediction.
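As a simplified stand-in for inter-subband redundancy prediction (a linear least-squares predictor instead of the paper's neural network, and PyWavelets as an assumed tool), one detail subband can be predicted from the co-located approximation coefficients:

    import numpy as np
    import pywt   # PyWavelets, assumed available

    def predict_subband(cA: np.ndarray, target: np.ndarray):
        """Least-squares affine prediction of one subband from the co-located
        approximation coefficients (a crude stand-in for a neural predictor)."""
        X = np.column_stack([cA.ravel(), np.ones(cA.size)])
        coeffs, *_ = np.linalg.lstsq(X, target.ravel(), rcond=None)
        prediction = (X @ coeffs).reshape(target.shape)
        residual = target - prediction
        return prediction, residual

    rng = np.random.default_rng(2)
    frame = rng.random((128, 128))
    cA, (cH, cV, cD) = pywt.dwt2(frame, "haar")      # one-level 2D DWT
    pred_cH, resid = predict_subband(cA, cH)
    print("residual energy / subband energy =",
          float(np.sum(resid ** 2) / np.sum(cH ** 2)))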
Abstract:
Multi-track audio recording and editing programs are popular among musicians for developing their work. These programs offer recording and editing features, but they do not support collaborative work between musicians: to collaborate, the members of a band have to gather in the same physical location. This work aims to create a solution for collaboration in the context of audio recording and editing. The objective is to develop a distributed application that facilitates audio recording and editing while the members of a band are in different physical locations. The developed application provides audio manipulation features as well as mechanisms for synchronizing the work between the various band members. Audio manipulation consists of playback, recording, encoding and editing. Audio is handled in the Microsoft WAV format, resulting from digitization as Pulse Code Modulation (PCM), and is subsequently encoded as FLAC (Free Lossless Audio Codec) or MP3 (MPEG-1 Layer 3) to minimize file size, thereby reducing the disk space it occupies and the bandwidth required for its transmission over the Internet. Editing consists of applying operations such as amplification and echo, among others. Band members install the client application, with a graphical interface, on their computers, where they develop their work. This client application holds the synchronization logic of the collaborative work and acts as one of the peers in the distributed application's hybrid peer-to-peer architecture. These peers communicate with each other, exchanging information about the operations applied and the audio recorded by the band members.
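A minimal, self-contained sketch of the kind of editing operations mentioned above (amplification and a single echo applied to 16-bit mono PCM WAV data; this is not the dissertation's implementation, and the file names are hypothetical):

    import wave
    import numpy as np

    def amplify_and_echo(in_path: str, out_path: str, gain: float = 1.5,
                         delay_s: float = 0.25, decay: float = 0.4) -> None:
        """Apply gain and one echo to a 16-bit mono PCM WAV file."""
        with wave.open(in_path, "rb") as wf:
            params = wf.getparams()
            samples = np.frombuffer(wf.readframes(params.nframes), dtype=np.int16)

        x = samples.astype(np.float32) * gain                 # amplification
        delay = int(delay_s * params.framerate)               # echo delay in samples
        y = np.copy(x)
        y[delay:] += decay * x[:len(x) - delay]                # add delayed copy

        y = np.clip(y, -32768, 32767).astype(np.int16)         # avoid wrap-around
        with wave.open(out_path, "wb") as wf:
            wf.setparams(params)
            wf.writeframes(y.tobytes())

    # amplify_and_echo("take1.wav", "take1_edited.wav")   # hypothetical file names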
Abstract:
Wyner-Ziv (WZ) video coding is a particular case of distributed video coding, the recent video coding paradigm based on the Slepian-Wolf and Wyner-Ziv theorems that exploits the source correlation at the decoder and not at the encoder as in predictive video coding. Although many improvements have been made over the last years, the performance of state-of-the-art WZ video codecs has still not reached that of state-of-the-art predictive video codecs, especially for high and complex motion video content. This is also true in terms of subjective image quality, mainly because of the considerable amount of blocking artefacts present in the decoded WZ video frames. This paper proposes an adaptive deblocking filter to improve both the subjective and objective quality of the WZ frames in a transform domain WZ video codec. The proposed filter is an adaptation of the advanced deblocking filter defined in the H.264/AVC (advanced video coding) standard to a WZ video codec. The results obtained confirm the subjective quality improvement and objective quality gains of up to 0.63 dB overall for sequences with high motion content when large group of pictures (GOP) sizes are used.
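A heavily simplified, illustrative deblocking step is sketched below: it smooths 4x4 block boundaries only where the cross-edge step is small, loosely in the spirit of the H.264/AVC filter but not the adaptive filter actually proposed in the paper.

    import numpy as np

    def deblock(frame: np.ndarray, block: int = 4, beta: float = 10.0) -> np.ndarray:
        """Smooth vertical block boundaries whose cross-edge step is below beta.
        A real deblocking filter also handles horizontal edges, boundary
        strengths and clipping; this sketch keeps only the core idea."""
        out = frame.astype(np.float32)
        h, w = out.shape
        for x in range(block, w, block):          # each vertical block boundary
            p, q = out[:, x - 1], out[:, x]       # pixels on either side of the edge
            weak = np.abs(p - q) < beta           # filter only small discontinuities
            avg = 0.5 * (p + q)
            out[:, x - 1] = np.where(weak, 0.5 * (p + avg), p)
            out[:, x] = np.where(weak, 0.5 * (q + avg), q)
        return out.astype(frame.dtype)

    # Toy usage: a frame with an artificial blocking step at a 4x4 boundary.
    frame = np.zeros((8, 8), dtype=np.uint8)
    frame[:, 4:] = 6
    print(deblock(frame))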
Abstract:
Wyner-Ziv (WZ) video coding is a particular case of distributed video coding (DVC), the recent video coding paradigm based on the Slepian-Wolf and Wyner-Ziv theorems which exploits the source temporal correlation at the decoder and not at the encoder as in predictive video coding. Although some progress has been made in recent years, WZ video coding is still far from the compression performance of predictive video coding, especially for high and complex motion contents. The WZ video codec adopted in this study is based on a transform domain WZ video coding architecture with feedback channel-driven rate control, whose modules have been improved with some recent coding tools. This study proposes a novel motion learning approach to successively improve the rate-distortion (RD) performance of the WZ video codec as the decoding proceeds, making use of the already decoded transform bands to improve the decoding process for the remaining transform bands. The results obtained reveal gains of up to 2.3 dB in the RD curves against the performance of the same codec without the proposed motion learning approach, for high motion sequences and long group of pictures (GOP) sizes.
Abstract:
The recent IEEE 802.11n standard offers high throughput in wireless local area networks, and a massive adoption of this technology is therefore expected, progressively replacing 802.11b/g networks. Due to its high capacity, this recent generation of 802.11n wireless networks enables a sharp growth of audiovisual services. In this context, this dissertation studies the 802.11n network, characterizing the performance and quality associated with a video transmission service by means of an 802.11n network simulation architecture. The impact of the new MAC layer features introduced in the 802.11n standard, namely A-MSDU and A-MPDU aggregation, is characterized, as well as the impact of the new physical layer features such as MIMO; in both cases the parameterization is optimized. It is also shown that the main H.264/AVC video coding techniques for optimizing the video distribution process allow the overall performance of the transmission system to be improved. By combining the optimization and parameterization of the MAC layer, the physical layer and the coding process, it is possible to propose a set of configurations that yield the best quality-of-service performance for the transmission of video content over an 802.11n network. The simulation architecture built in this dissertation is specifically adapted to support the MAC layer aggregation techniques, as well as the encapsulation in network protocols that allow the transmission of H.264/AVC-coded RTP video packets.
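A back-of-the-envelope model of why A-MPDU aggregation improves MAC efficiency is given below; every overhead value is an illustrative assumption, not a figure from this dissertation's simulator.

    # Rough MAC-efficiency model for A-MPDU aggregation (illustrative numbers).
    PHY_RATE = 130e6          # assumed PHY rate, bits/s
    PREAMBLE = 40e-6          # assumed PHY preamble + header duration, s
    SIFS = 16e-6              # short interframe space, s
    ACK_TIME = 40e-6          # assumed (block) ACK airtime incl. preamble, s
    MPDU_BYTES = 1500         # payload per MPDU
    DELIMITER_BYTES = 4       # A-MPDU delimiter per subframe

    def airtime(n_mpdus: int, aggregated: bool) -> float:
        """Airtime to deliver n_mpdus, with or without A-MPDU aggregation."""
        if aggregated:
            payload_bits = n_mpdus * (MPDU_BYTES + DELIMITER_BYTES) * 8
            return PREAMBLE + payload_bits / PHY_RATE + SIFS + ACK_TIME
        # Without aggregation, every MPDU pays preamble + SIFS + ACK.
        per_frame = PREAMBLE + MPDU_BYTES * 8 / PHY_RATE + SIFS + ACK_TIME
        return n_mpdus * per_frame

    n = 16
    goodput_plain = n * MPDU_BYTES * 8 / airtime(n, aggregated=False)
    goodput_ampdu = n * MPDU_BYTES * 8 / airtime(n, aggregated=True)
    print(f"no aggregation : {goodput_plain / 1e6:.1f} Mbit/s")
    print(f"A-MPDU (n={n}) : {goodput_ampdu / 1e6:.1f} Mbit/s")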
Abstract:
Recently, several distributed video coding (DVC) solutions based on the distributed source coding (DSC) paradigm have appeared in the literature. Wyner-Ziv (WZ) video coding, a particular case of DVC where side information is made available at the decoder, enables a flexible distribution of the computational complexity between the encoder and decoder, promising to fulfill novel requirements from applications such as video surveillance, sensor networks and mobile camera phones. The quality of the side information at the decoder has a critical role in determining the WZ video coding rate-distortion (RD) performance, notably in raising it to a level as close as possible to the RD performance of standard predictive video coding schemes. Towards this target, efficient motion search algorithms for powerful frame interpolation are much needed at the decoder. In this paper, the RD performance of a Wyner-Ziv video codec is improved by using novel, advanced motion compensated frame interpolation techniques to generate the side information. The development of this type of side information estimator is a difficult problem in WZ video coding, especially because the decoder only has available some reference, decoded frames. Based on the regularization of the motion field, novel side information creation techniques are proposed in this paper along with a new frame interpolation framework able to generate higher quality side information at the decoder. To illustrate the RD performance improvements, this novel side information creation framework has been integrated into a transform domain turbo coding based Wyner-Ziv video codec. Experimental results show that the novel side information creation solution leads to better RD performance than available state-of-the-art side information estimators, with improvements of up to 2 dB; moreover, it allows outperforming H.264/AVC Intra by up to 3 dB with a lower encoding complexity.
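A bare-bones sketch of motion-compensated frame interpolation for side information creation is given below (symmetric block matching between the two decoded reference frames); it deliberately omits the motion-field regularization that is the actual contribution of the paper.

    import numpy as np

    def interpolate_si(f_prev: np.ndarray, f_next: np.ndarray,
                       block: int = 8, search: int = 4) -> np.ndarray:
        """Estimate the in-between frame by symmetric block matching: for each
        block position, find the displacement (dx, dy) minimizing the SAD
        between f_prev shifted by -(dx, dy) and f_next shifted by +(dx, dy),
        then average the two matched blocks."""
        h, w = f_prev.shape
        assert h % block == 0 and w % block == 0, "frame must be block-aligned"
        si = np.zeros_like(f_prev, dtype=np.float32)
        for by in range(0, h, block):
            for bx in range(0, w, block):
                best = (np.inf, None, None)
                for dy in range(-search, search + 1):
                    for dx in range(-search, search + 1):
                        y0, x0 = by - dy, bx - dx          # block in previous frame
                        y1, x1 = by + dy, bx + dx          # block in next frame
                        if not (0 <= y0 and y0 + block <= h and 0 <= x0 and x0 + block <= w
                                and 0 <= y1 and y1 + block <= h and 0 <= x1 and x1 + block <= w):
                            continue
                        b0 = f_prev[y0:y0 + block, x0:x0 + block].astype(np.float32)
                        b1 = f_next[y1:y1 + block, x1:x1 + block].astype(np.float32)
                        sad = np.abs(b0 - b1).sum()
                        if sad < best[0]:
                            best = (sad, b0, b1)
                si[by:by + block, bx:bx + block] = 0.5 * (best[1] + best[2])
        return si.astype(f_prev.dtype)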
Abstract:
Video coding technologies have played a major role in the explosion of large market digital video applications and services. In this context, the very popular MPEG-x and H.26x video coding standards adopted a predictive coding paradigm, where complex encoders exploit the data redundancy and irrelevancy to 'control' much simpler decoders. This codec paradigm fits well applications and services such as digital television and video storage where the decoder complexity is critical, but does not match well the requirements of emerging applications such as visual sensor networks where the encoder complexity is more critical. The Slepian-Wolf and Wyner-Ziv theorems brought the possibility to develop the so-called Wyner-Ziv video codecs, following a different coding paradigm where it is the task of the decoder, and not anymore of the encoder, to (fully or partly) exploit the video redundancy. Theoretically, Wyner-Ziv video coding does not incur any compression performance penalty with regard to the more traditional predictive coding paradigm (at least under certain conditions). In the context of Wyner-Ziv video codecs, the so-called side information, which is a decoder estimate of the original frame to code, plays a critical role in the overall compression performance. For this reason, much research effort has been invested in the past decade to develop increasingly more efficient side information creation methods. The main objective of this paper is to review and evaluate the available side information methods after proposing a classification taxonomy to guide this review, allowing more solid conclusions to be reached and the next relevant research challenges to be better identified. After classifying the side information creation methods into four classes, notably guess, try, hint and learn, the review of the most important techniques in each class and the evaluation of some of them leads to the important conclusion that which side information creation method provides the best rate-distortion (RD) performance depends on the amount of temporal correlation in each video sequence. It also became clear that the best available Wyner-Ziv video coding solutions are almost systematically based on the learn approach. The best solutions are already able to systematically outperform H.264/AVC Intra, and also the H.264/AVC zero-motion standard solution for specific types of content. (C) 2013 Elsevier B.V. All rights reserved.
Abstract:
In video communication systems, the video signals are typically compressed and sent to the decoder through an error-prone transmission channel that may corrupt the compressed signal, causing degradation of the final decoded video quality. In this context, it is possible to enhance the error resilience of typical predictive video coding schemes by drawing inspiration from principles and tools of an alternative video coding approach, the so-called Distributed Video Coding (DVC), based on the Distributed Source Coding (DSC) theory. Further improvements in the decoded video quality after error-prone transmission may also be obtained by considering the perceptual relevance of the video content, as distortions occurring in different regions of a picture have a different impact on the user's final experience. In this context, this paper proposes a Perceptually Driven Error Protection (PDEP) video coding solution that enhances the error resilience of a state-of-the-art H.264/AVC predictive video codec using DSC principles and perceptual considerations. To increase the H.264/AVC error resilience performance, the main technical novelties brought by the proposed video coding solution are: (i) the design of an improved compressed domain perceptual classification mechanism; (ii) the design of an improved transcoding tool for the DSC-based protection mechanism; and (iii) the integration of a perceptual classification mechanism into an H.264/AVC compliant codec with a DSC-based error protection mechanism. The performance results obtained show that the proposed PDEP video codec provides a better performing alternative to traditional error protection video coding schemes, notably Forward Error Correction (FEC)-based schemes. (C) 2013 Elsevier B.V. All rights reserved.
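A crude stand-in for the perceptual classification step is sketched below: macroblocks are ranked by local variance as a rough proxy for perceptual relevance. The paper's mechanism operates in the compressed domain and is more elaborate; this is only meant to make the idea of a per-macroblock relevance map concrete.

    import numpy as np

    def classify_macroblocks(frame: np.ndarray, mb: int = 16, quantile: float = 0.5):
        """Label each macroblock as perceptually 'high' (True) or 'low' (False)
        relevance, using local variance as a very rough proxy."""
        h, w = frame.shape
        rows, cols = h // mb, w // mb
        variances = np.empty((rows, cols))
        for r in range(rows):
            for c in range(cols):
                block = frame[r * mb:(r + 1) * mb, c * mb:(c + 1) * mb]
                variances[r, c] = block.var()
        threshold = np.quantile(variances, quantile)
        return variances > threshold     # boolean relevance map per macroblock

    # High-relevance macroblocks would then receive stronger DSC-based protection.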
Abstract:
Development of an application designed to be accessed as a web application that manages long-duration recordings of images and sound from a camera, using the H.264 codec for the video and storing it in an MP4 container.
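One possible way to implement such long-duration recordings (assuming a Linux host with ffmpeg installed, a V4L2 camera at /dev/video0 and ALSA audio; none of these choices are stated in the original project) is to segment the capture into hourly MP4 files:

    import subprocess

    # Illustrative ffmpeg invocation: H.264 video + AAC audio, split into
    # one-hour MP4 segments so long recordings stay manageable.
    cmd = [
        "ffmpeg",
        "-f", "v4l2", "-i", "/dev/video0",        # camera (assumed device)
        "-f", "alsa", "-i", "default",            # microphone (assumed device)
        "-c:v", "libx264", "-preset", "veryfast",
        "-c:a", "aac",
        "-f", "segment", "-segment_time", "3600", # new file every hour
        "-reset_timestamps", "1",
        "recording_%03d.mp4",
    ]
    subprocess.run(cmd, check=True)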
Abstract:
This dissertation describes the results of measurements taken in one of the laboratories of a telecommunications operator (LOP), where some QoS requirements in IP (Internet Protocol) packet networks were evaluated and analysed. These measurements were made within the scope of this dissertation's objective, which is to evaluate ways of providing VoIP (Voice over Internet Protocol) services over packet networks in accordance with the FRF.12 standard. The network under study is a 512 kbps link that also provides VoIP services shared concurrently with data and multimedia services. The items analysed include: codec analysis; DiffServ QoS (Quality of Service); RTP (Real Time Protocol) header compression (cRTP); fragmentation with interleaving (LFI); network behaviour under various conditions; and the suitability of the free Multi Generator (MGEN) software for traffic generation, measurement and data collection in networks. The analysis was carried out essentially on Frame Relay links at the CPE (Customer Premise Equipment), passing through the IP VPN / MPLS Multicast backbone, since Frame Relay Forum 12 (FRF.12) supports the interleaving of voice between data packets. FRF.12 is indispensable because this dissertation aims to carry out a set of tests and measurements evaluating the use of VoIP services on low-capacity links with shared data traffic. To offer this service with quality, it is necessary to fragment data packets and interleave voice frames between them using FRF.12. After the theoretical study of the recommendations, international standards and manufacturer documentation, tests were performed that provide practical validation of the theory previously analysed, through specific tests that definitively demonstrate the viability of VoIP applications over a low-speed link. From these tests it was concluded that, in certain cases, increasing the bandwidth is neither necessary nor a concern in order to provide certain services. In the sequence of tests, the performance, bandwidth occupation and effectiveness of the equipment and software were also evaluated. From the test and measurement bench, the following was shown: better bandwidth optimization is indeed achieved with cRTP header compression; fragmenting FTP (File Transfer Protocol) packets with interleaving of VoIP packets indeed reduces delay and jitter for real-time applications; enabling IntServ QoS indeed provides classification and differentiation of traffic; and the G.729 codec is the best suited for handling VoIP applications on the CISCO routers available at the Technology Reference Centre (CRT) of a LOP.
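The numbers behind two of the points above can be reproduced with simple arithmetic; the sketch below is approximate and ignores the Frame Relay/FRF.12 overhead that the actual measurements include.

    # Approximate per-call bandwidth for G.729 and the LFI fragment size
    # needed to keep serialization delay near 10 ms on a 512 kbps link.
    LINK_KBPS = 512
    G729_PAYLOAD_BYTES = 20        # 20 ms of G.729 speech per packet
    PACKETS_PER_SECOND = 50        # one packet every 20 ms
    RTP_UDP_IP_HEADER = 40         # uncompressed IP/UDP/RTP header, bytes
    CRTP_HEADER = 2                # typical compressed header size (2-4 bytes)

    def call_bandwidth_kbps(header_bytes: int) -> float:
        return (G729_PAYLOAD_BYTES + header_bytes) * 8 * PACKETS_PER_SECOND / 1000

    print(f"G.729 without cRTP: {call_bandwidth_kbps(RTP_UDP_IP_HEADER):.1f} kbps")
    print(f"G.729 with cRTP   : {call_bandwidth_kbps(CRTP_HEADER):.1f} kbps")

    # FRF.12 fragment size so a data fragment serializes in about 10 ms,
    # keeping a voice frame from waiting too long behind it.
    target_delay_s = 0.010
    fragment_bytes = LINK_KBPS * 1000 * target_delay_s / 8
    print(f"fragment size for ~10 ms on {LINK_KBPS} kbps: {fragment_bytes:.0f} bytes")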
Abstract:
The creation of OFDM-based Wireless Personal Area Networks (WPANs) has allowed the development of high bit-rate wireless communication devices suitable for streaming High Definition video between consumer products, as demonstrated by Wireless-USB and Wireless-HDMI. However, these devices need high clock rates, particularly for the OFDM, FFT and symbol processing sections, resulting in high silicon cost and high electrical power consumption. The high clock rates make hardware prototyping difficult, and verification is therefore very important but costly. Acknowledging that electrical power in wireless consumer devices is more critical than the number of implemented logic gates, this paper presents a Double Data Rate (DDR) architecture for implementation inside an OFDM baseband codec in order to reduce the high clock rates by a complete factor of 2. The presented architecture has been implemented and tested for ECMA-368 (Wireless-USB context), resulting in a maximum clock rate of 264 MHz instead of the expected 528 MHz clock rate anywhere on the baseband codec die.
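The headline clock figure follows directly from processing two samples per clock cycle; the only assumption in the sketch below is that the baseband sample rate equals the 528 MHz mentioned in the abstract.

    # Clock rate needed when the datapath consumes N samples per clock cycle.
    SAMPLE_RATE_HZ = 528e6        # assumed ECMA-368 baseband sample rate (528 MHz)
    for samples_per_cycle in (1, 2):
        clock = SAMPLE_RATE_HZ / samples_per_cycle
        print(f"{samples_per_cycle} sample(s)/cycle -> {clock / 1e6:.0f} MHz clock")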
Abstract:
This study investigates how a relatively experienced listener's assessment of audio quality is affected by a so-called open test, in which the object being assessed is known to the listener, compared with a blind test, in which the object is unknown. The question is applied to the quality assessment of digital coding techniques, i.e. how the listener is affected by whether or not the coding technique being auditioned is known. To investigate this, a listening test with nine participants was carried out. The participants graded perceptually coded audio files against a known reference, both as a blind test and as an open test. The results are ambiguous, and no general conclusions can be drawn about how the listener is affected by an open test compared with a blind test. The results do show, however, that the effect an open test has on the listener's assessment is highly individual. Listening tests in the form of blind tests should therefore be used to obtain the most reliable results.