26 results for CODECs
Abstract:
A new high-throughput and scalable architecture for unified transform coding in H.264/AVC is proposed in this paper. This flexible structure can compute all the 4x4 and 2x2 transforms for Ultra High Definition Video (UHDV) applications (4320x7680 @ 30 fps) in real time and at low hardware cost. These performance levels were demonstrated by implementing several different configurations of the proposed structure in both FPGA and 90 nm ASIC technologies. This experimental evaluation also demonstrated the high area efficiency of the proposed architecture, which in terms of Data Throughput per Unit of Area (DTUA) is at least 1.5 times more efficient than the most prominent related designs.
Abstract:
A new high-performance architecture for the computation of all the DCT operations adopted in the H.264/AVC and HEVC standards is proposed in this paper. In contrast to other dedicated transform cores, the presented multi-standard transform architecture is built on a completely configurable, scalable and unified structure that can compute not only the forward and inverse 8×8 and 4×4 integer DCTs and the 4×4 and 2×2 Hadamard transforms defined in the H.264/AVC standard, but also the 4×4, 8×8, 16×16 and 32×32 integer transforms adopted in HEVC. Experimental results obtained using a Xilinx Virtex-7 FPGA demonstrated the superior performance and hardware efficiency of the proposed structure, which outperforms the most prominent related designs by at least 1.8 times. When integrated in a multi-core embedded system, this architecture allows all the transforms mentioned above to be computed in real time for resolutions as high as 8k Ultra High Definition Television (UHDTV) (7680×4320 @ 30 fps).
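As a rough illustration of the kind of operation such a transform core computes, the sketch below applies the 4×4 forward integer core transform of H.264/AVC, Y = C·X·Cᵀ, in plain Python, and then estimates the raw sample rate of the 8k format quoted above. It is not the architecture described in the abstract: the function names are hypothetical, the scaling that H.264/AVC folds into quantization is left out, and the rate figure counts luma samples only.

```python
# Forward 4x4 core transform matrix of H.264/AVC (integer approximation of the DCT).
C4 = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_4x4(block):
    """Unscaled 2-D core transform Y = C4 * X * C4^T (hypothetical helper name)."""
    return matmul(matmul(C4, block), transpose(C4))


def luma_samples_per_second(width, height, fps):
    """Raw luma sample rate of a video format (chroma ignored for simplicity)."""
    return width * height * fps


# A flat block has no detail, so all its energy lands in the DC coefficient.
flat_block = [[10] * 4 for _ in range(4)]
coeffs = forward_4x4(flat_block)

rate_8k = luma_samples_per_second(7680, 4320, 30)
blocks_4x4_per_second = rate_8k // 16

if __name__ == "__main__":
    print(coeffs)
    print(rate_8k, blocks_4x4_per_second)
```

For the flat block, only the top-left (DC) coefficient is non-zero. The second figure gives a sense of scale: at 7680×4320 @ 30 fps, roughly 62 million 4×4 luma blocks have to be transformed every second.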
Abstract:
A unified architecture for fast and efficient computation of the set of two-dimensional (2-D) transforms adopted by the most recent state-of-the-art digital video standards is presented in this paper. In contrast to other designs with similar functionality, the presented architecture is built on a scalable, modular and completely configurable processing structure. This flexible structure not only makes it easy to reconfigure the architecture to support different transform kernels, but also allows it to be resized to efficiently support transforms of different orders (e.g. order-4, order-8, order-16 and order-32). Consequently, it is highly suitable for realizing high-performance multi-standard transform cores, and it also offers highly efficient implementations of specialized processing structures that address only the reduced subset of transforms used by a specific video standard. Experimental results obtained by prototyping several configurations of this processing structure in a Xilinx Virtex-7 FPGA show the superior performance and hardware efficiency of the proposed unified architecture for the implementation of transform cores for the Advanced Video Coding (AVC), Audio Video coding Standard (AVS), VC-1 and High Efficiency Video Coding (HEVC) standards. These results also demonstrate the ability of this processing structure to realize multi-standard transform cores that support all the standards mentioned above and are capable of processing the 8k Ultra High Definition Television (UHDTV) video format (7,680 x 4,320 at 30 fps) in real time.
Abstract:
The main objective is to analyze the performance of some of the codecs supported by Asterisk without and with encryption, using RTP and SRTP respectively, providing important data for decision-making when implementing a VoIP system with Asterisk. Both the codec and the protocol can then be chosen according to the application, that is, according to whether the system's main requirement is packet-switching speed, security level or a low tolerance for unsuccessful calls. To this end, tests were run on the codecs with and without encryption to draw conclusions about their use, with particular attention to the response time for the start of a call.
Abstract:
A highly parallel and scalable Deblocking Filter (DF) hardware architecture for H.264/AVC and SVC video codecs is presented in this paper. The proposed architecture mainly consists of a coarse-grain systolic array obtained by replicating a unique and homogeneous Functional Unit (FU), in which a whole Deblocking Filter unit is implemented. The proposal is also based on a novel macroblock-level parallelization strategy for the filtering algorithm, which improves the final performance by exploiting specific data dependences. In this way, communication overhead is reduced and more intensive parallelism is obtained than in existing state-of-the-art solutions. Furthermore, the architecture is completely flexible, since the level of parallelism can be changed according to the application requirements. The design has been implemented in a Virtex-5 FPGA, and it allows 4CIF (704 × 576 pixels @ 30 fps) video sequences to be filtered in real time at frequencies lower than 10.16 MHz.
Abstract:
Systems relying on fixed hardware components with a static level of parallelism can suffer from an underuse of logical resources, since they have to be designed for the worst-case scenario. This problem is especially important in video applications due to the emergence of new flexible standards, like Scalable Video Coding (SVC), which offer several levels of scalability. In this paper, Dynamic and Partial Reconfiguration (DPR) of modern FPGAs is used to achieve run-time variable parallelism, by using scalable architectures whose size can be adapted at run time. Based on this proposal, a scalable Deblocking Filter (DF) core compliant with the H.264/AVC and SVC standards has been designed. This scalable DF allows run-time addition or removal of computational units working in parallel. Scalability is offered together with a scalable parallelization strategy at the macroblock (MB) level, such that when the size of the architecture changes, the MB filtering order is modified accordingly.
Abstract:
One of the most efficient approaches to generating the side information (SI) in distributed video codecs is motion compensated frame interpolation, where the current frame is estimated based on past and future reference frames. However, this approach leads to significant spatial and temporal variations in the correlation noise between the source at the encoder and the SI at the decoder. In such a scenario, it would be useful to design an architecture where the SI can be generated more robustly at the block level, avoiding the creation of SI frame regions with lower correlation, which are largely responsible for some coding efficiency losses. In this paper, a flexible framework to generate SI at the block level in two modes is presented: the first mode corresponds to a motion compensated interpolation (MCI) technique, while the second corresponds to a motion compensated quality enhancement (MCQE) technique, where a low quality Intra block sent by the encoder is used to generate the SI by performing motion estimation with the help of the reference frames. For blocks where MCI produces SI with lower correlation, the novel MCQE mode can be advantageous overall from the rate-distortion point of view, even though some rate has to be invested in the low quality Intra coded blocks. The overall solution is evaluated in terms of RD performance, with improvements of up to 2 dB, especially for high motion video sequences and long Group of Pictures (GOP) sizes.
Abstract:
Wyner-Ziv (WZ) video coding is a particular case of distributed video coding, the recent video coding paradigm based on the Slepian-Wolf and Wyner-Ziv theorems that exploits the source correlation at the decoder and not at the encoder, as in predictive video coding. Although many improvements have been made in recent years, the performance of state-of-the-art WZ video codecs has still not reached that of state-of-the-art predictive video codecs, especially for high and complex motion video content. This is also true in terms of subjective image quality, mainly because of a considerable amount of blocking artefacts present in the decoded WZ video frames. This paper proposes an adaptive deblocking filter to improve both the subjective and objective quality of the WZ frames in a transform domain WZ video codec. The proposed filter is an adaptation of the advanced deblocking filter defined in the H.264/AVC (advanced video coding) standard to a WZ video codec. The results obtained confirm the subjective quality improvement and show overall objective quality gains of up to 0.63 dB for sequences with high motion content when large groups of pictures are used.
Abstract:
A novel high-throughput and scalable unified architecture for the computation of the transform operations in video codecs for advanced standards is presented in this paper. This structure can be used as a hardware accelerator in modern embedded systems to efficiently compute all the two-dimensional 4 x 4 and 2 x 2 transforms of the H.264/AVC standard. Moreover, its highly flexible design and hardware efficiency allow it to be easily scaled in terms of performance and hardware cost to meet the specific requirements of any given video coding application. Experimental results obtained using a Xilinx Virtex-5 FPGA demonstrated the superior performance and hardware efficiency of the proposed structure, whose throughput per unit of area is higher than that of other similar recently published designs targeting the H.264/AVC standard. These results also showed that, when integrated in a multi-core embedded system, this architecture provides speedup factors of about 120x over pure software implementations of the transform algorithms, therefore allowing all the above-mentioned transforms to be computed in real time for Ultra High Definition Video (UHDV) sequences (4,320 x 7,680 @ 30 fps).
Abstract:
Video coding technologies have played a major role in the explosion of large market digital video applications and services. In this context, the very popular MPEG-x and H.26x video coding standards adopted a predictive coding paradigm, where complex encoders exploit the data redundancy and irrelevancy to 'control' much simpler decoders. This codec paradigm fits applications and services such as digital television and video storage, where the decoder complexity is critical, but does not match the requirements of emerging applications such as visual sensor networks, where the encoder complexity is more critical. The Slepian-Wolf and Wyner-Ziv theorems made it possible to develop the so-called Wyner-Ziv video codecs, which follow a different coding paradigm where it is the task of the decoder, and no longer of the encoder, to (fully or partly) exploit the video redundancy. Theoretically, Wyner-Ziv video coding does not incur any compression performance penalty with respect to the more traditional predictive coding paradigm (at least under certain conditions). In the context of Wyner-Ziv video codecs, the so-called side information, which is a decoder estimate of the original frame to code, plays a critical role in the overall compression performance. For this reason, much research effort has been invested in the past decade in developing increasingly efficient side information creation methods. The main objective of this paper is to review and evaluate the available side information methods after proposing a classification taxonomy to guide this review, making it possible to reach more solid conclusions and better identify the next relevant research challenges.
After classifying the side information creation methods into four classes, notably guess, try, hint and learn, the review of the most important techniques in each class and the evaluation of some of them lead to the important conclusion that the side information creation methods provide better rate-distortion (RD) performance depending on the amount of temporal correlation in each video sequence. It also became clear that the best available Wyner-Ziv video coding solutions are almost systematically based on the learn approach. The best solutions are already able to systematically outperform H.264/AVC Intra, and also the H.264/AVC zero-motion standard solutions for specific types of content. (C) 2013 Elsevier B.V. All rights reserved.
Abstract:
As the wireless cellular market reaches unprecedented levels of competition, network operators need to make maintaining Quality of Service (QoS) a main priority if they wish to attract new subscribers while keeping existing customers satisfied. Speech quality as perceived by the end user is one major example of a characteristic in constant need of maintenance and improvement, and it is within this topic that this Master's thesis project fits. It makes use of an intrusive method of speech quality evaluation to further study and characterize the performance of speech codecs in second-generation (2G) and third-generation (3G) technologies, looking for further correlation between codecs with similar bit rates and exploring certain transmission parameters that may aid in the assessment of speech quality. Due to some limitations of the audio analyzer equipment that was to be employed, a different system for recording the test samples was sought. Although the newly designed system is not standard, after extensive testing and optimization of the system's parameters the final results were found to be reliable and satisfactory. The tests cover a set of high and low bit rate codecs for both 2G and 3G; comparing and analysing the values led to the conclusion that 3G speech codecs perform better than 2G codecs under approximately the same conditions. This reinforces the idea that 3G is, without doubt, the best choice if the customer is looking for the best possible listening speech quality. The transmission parameters chosen for the experiment, Receiver Quality (RxQual) and Received Energy per Chip to the Power Density Ratio (Ec/N0), were subjected to speech quality correlation tests. The final RxQual results were compared with those of prior studies by other researchers and are considered highly relevant, confirming RxQual as a reliable indicator of speech quality.
As for Ec/N0, it cannot be stated to be a speech quality indicator; however, it shows clear thresholds at which the MOS values decrease significantly. The studied transmission parameters can be used not only for network management purposes but also to give the communications engineer (or technician) an idea of the expected end-to-end speech quality consequences. The conclusion of this work suggests new ideas for future studies. Considering that fourth-generation (4G) cellular technologies are now beginning to take an important place in the global market, as the first all-IP network structure, it seems highly relevant that 4G speech quality be evaluated and compared with 3G, not only in narrowband but also in wideband scenarios, using the most recent standard objective method of speech quality assessment, POLQA. In addition, the new data found in the Ec/N0 tests justify further research to validate the assumptions made in this work.
Abstract:
Research project carried out by a secondary-school student and awarded a CIRIT Prize for fostering the scientific spirit of young people in 2009. The aim of this research project is to create a device that centralizes all of a household's multimedia needs and distributes this content to every terminal on the local network in a simple, automated way. The device is designed to be connected to a high-definition television, allowing all of the user's multimedia to be played and organized comfortably and easily. The media center manages the film library, photo library, music library and TV series transparently and automatically. In addition, the user can access all the multimedia stored on the media center from any device on the local network through protocols such as CIFS or UPnP, in an attempt to replicate cloud computing on a local scale. The device has been designed to support all kinds of formats and subtitles, ensuring full compatibility with DRM-free files. Its minimalist, silent design makes it ideal for replacing the living-room DVD player. All this without forgetting its low power consumption, around 75% lower than that of a conventional PC.
Abstract:
This dissertation describes the results of measurements carried out in one of the laboratories of a telecommunications operator (LOP), where several QoS requirements in IP (Internet Protocol) packet networks were evaluated and analyzed. These measurements were made in pursuit of the objective of this dissertation, which is to evaluate ways of providing VoIP (Voice over Internet Protocol) services in packet networks in accordance with the FRF.12 standard recommendation. The network under study has a 512 kbps link that also carries VoIP services, shared concurrently with data and multimedia services. The items analyzed include: codec analysis; DiffServ QoS (Quality of Service); RTP (Real-time Transport Protocol) header compression (cRTP); fragmentation with interleaving (LFI); network behavior in various situations; and the suitability of the free Multi-Generator (MGEN) software for generating, measuring and collecting network data. The analysis focused essentially on the Frame Relay link at the CPE (Customer Premises Equipment), passing through the IP VPN / MPLS Multicast backbone, since Frame Relay Forum v12 (FRF.12) supports interleaving voice between data packets. FRF.12 is indispensable, since this dissertation aims to carry out a set of tests and measurements that evaluate the use of VoIP services on low-capacity links with shared data traffic. To offer this service with quality, it is necessary to fragment and interleave voice frames between the data packets using FRF.12. After a theoretical study of the recommendations, international standards and manufacturers' documentation, tests were carried out that validate in practice the theory previously analyzed, through specific tests that conclusively demonstrate the feasibility of VoIP applications on a low-speed link.
Once these tests were done, the conclusion was that in certain cases increasing the bandwidth is neither necessary nor a concern in order to provide certain services. Following the tests, the performance, bandwidth occupancy and effectiveness of the equipment and software were also evaluated. The test bench and measurements demonstrated the following: that header compression with cRTP does improve bandwidth utilization; that fragmenting FTP (File Transfer Protocol) packets and interleaving VoIP packets does reduce delay and jitter for real-time applications; that enabling IntServ QoS does provide classification and differentiation of traffic; and that the G.729 codec is better suited to handling VoIP applications on the Cisco routers available at the CRT (Technological Reference Center) of a LOP.
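To make the bandwidth and delay arguments above concrete, the following back-of-the-envelope sketch estimates the per-call bandwidth of a G.729 stream with and without cRTP, and the serialization delay that motivates fragmenting large data packets on a 512 kbps link. None of these figures come from the dissertation. They assume G.729 at 8 kbit/s with 20 ms packetization, a 40-byte IPv4/UDP/RTP header, a 2-byte compressed cRTP header, a 1500-byte data packet and a 10 ms per-fragment delay target. Layer-2 overhead is ignored, and the function names are hypothetical.

```python
def call_bandwidth_bps(payload_bytes, header_bytes, packets_per_second):
    """Bandwidth of one voice stream in bit/s, excluding layer-2 overhead."""
    return (payload_bytes + header_bytes) * 8 * packets_per_second


def serialization_delay_ms(size_bytes, link_bps):
    """Time needed to clock a packet of the given size onto the link."""
    return size_bytes * 8 * 1000 / link_bps


def fragment_size_bytes(link_bps, target_delay_ms):
    """Largest fragment whose serialization delay stays within the target."""
    return link_bps * target_delay_ms // 8000


LINK_BPS = 512_000          # link speed quoted in the abstract
G729_PAYLOAD = 20           # assumed: 8 kbit/s * 20 ms = 160 bits = 20 bytes
PACKETS_PER_SECOND = 50     # assumed: one packet every 20 ms
FULL_HEADER = 40            # assumed: IPv4 (20) + UDP (8) + RTP (12) bytes
CRTP_HEADER = 2             # assumed: cRTP header without UDP checksum

plain_bps = call_bandwidth_bps(G729_PAYLOAD, FULL_HEADER, PACKETS_PER_SECOND)
crtp_bps = call_bandwidth_bps(G729_PAYLOAD, CRTP_HEADER, PACKETS_PER_SECOND)

# A voice packet queued behind an unfragmented 1500-byte data packet waits this long.
data_packet_delay = serialization_delay_ms(1500, LINK_BPS)

# Fragment size that keeps that wait at or below an assumed 10 ms target.
fragment = fragment_size_bytes(LINK_BPS, 10)

if __name__ == "__main__":
    print(plain_bps, crtp_bps)
    print(data_packet_delay, fragment)
```

Under these assumptions a call drops from 24,000 bit/s to 8,800 bit/s with header compression. A single 1500-byte data packet occupies the link for about 23.4 ms, and fragmenting data to 640 bytes bounds the wait of an interleaved voice packet to 10 ms.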
Abstract:
This work deals with the free software Asterisk (Elastix), an IP private branch exchange (IP PBX) that supports numerous voice-over-IP protocols and codecs, covering its installation, configuration and compatibility with telephony hardware. The growth of IP networks, advanced voice digitization techniques and the mechanisms that ensure quality of service have enabled the consolidation of IP telephony. IP telephony is growing strongly because, besides reducing the cost of telephone calls, it allows data and voice networks to be joined into a single infrastructure, simplifying installation, maintenance and management. The aim of this work is to study VoIP (Voice over IP) technology, propose a structure and implement a test environment: a VoIP telephone exchange using Asterisk. A VoIP exchange based on the free software Asterisk makes it possible to interconnect geographically distant sites over IP networks without paying the high prices charged by manufacturers of proprietary-hardware telephone exchanges for maintenance, equipment supply and licenses, while still achieving the same results; for example, all the employees and/or collaborators of a company can call one another without paying the high rates charged by public telephone operators. Based on a literature review of VoIP technology and a study of the free software Asterisk, the deployment of the technology will be proposed through the implementation of a VoIP exchange. This deployment will be verified through practical tests in an environment to be developed.
Abstract:
This paper proposes an improved voice activity detection (VAD) algorithm using wavelets and a support vector machine (SVM) for the European Telecommunications Standards Institute (ETSI) adaptive multi-rate (AMR) narrow-band (NB) and wide-band (WB) speech codecs. First, based on the wavelet transform, the original IIR filter bank and pitch/tone detector are implemented, respectively, via the wavelet filter bank and the wavelet-based pitch/tone detection algorithm. The wavelet filter bank divides the input speech signal into several frequency bands so that the signal power level in each sub-band can be calculated. In addition, the background noise level can be estimated in each sub-band by using the wavelet de-noising method. The wavelet filter bank is also used to detect correlated complex signals such as music. The proposed algorithm then applies the SVM to train an optimized non-linear VAD decision rule involving the sub-band power, noise level, pitch period, tone flag, and complex signal warning flag of the input speech signals. By using the trained SVM, the proposed VAD algorithm can produce more accurate detection results. Experimental results obtained on the Aurora speech database under different noise conditions show that the proposed algorithm delivers VAD performance considerably superior to AMR-NB VAD Options 1 and 2 and to AMR-WB VAD. (C) 2009 Elsevier Ltd. All rights reserved.