950 resultados para 3D Video Telecommunication Multimedia
Resumo:
Our research investigates the impact that hearing has on the perception of digital video clips, with and without captions, by discussing how hearing loss, captions and deafness type affects user QoP (Quality of Perception). QoP encompasses not only a user's satisfaction with the quality of a multimedia presentation, but also their ability to analyse, synthesise and assimilate informational content of multimedia . Results show that hearing has a significant effect on participants’ ability to assimilate information, independent of video type and use of captions. It is shown that captions do not necessarily provide deaf users with a ‘greater level of information’ from video, but cause a change in user QoP, depending on deafness type, which provides a ‘greater level of context of the video’. It is also shown that post-lingual mild and moderately deaf participants predict less accurately their level of information assimilation than post-lingual profoundly deaf participants, despite residual hearing. A positive correlation was identified between level of enjoyment (LOE) and self-predicted level of information assimilation (PIA), independent of hearing level or hearing type. When this is considered in a QoP quality framework, it puts into question how the user perceives certain factors, such as ‘informative’ and ‘quality’.
Resumo:
We investigate the impact of captions on deaf and hearing perception of multimedia video clips. We measure perception using a parameter called Quality of Perception (QoP), which encompasses not only a user's satisfaction with multimedia clips, but also his/her ability to perceive, synthesise and analyse the informational content of such presentations. By studying perceptual diversity, it is our aim to identify trends that will help future implementation of adaptive multimedia technologies. Results show that although hearing level has a significant affect on information assimilation, the effect of captions is not significant on the objective level of information assimilated. Deaf participants predict that captions significantly improve their level of information assimilation, although no significant objective improvement was measured. The level of enjoyment is unaffected by a participant’s level of hearing or use of captions.
Resumo:
Vídeos são dos principais meios de difusão de conhecimento, informação e entretenimento existentes. Todavia, apesar da boa qualidade e da boa aceitação do público, os vídeos atuais ainda restringem o espectador a um único ponto de vista. Atualmente, alguns estudos estão sendo desenvolvidos visando oferecer ao espectador maior liberdade para decidir de onde ele gostaria de assistir a cena. O tipo de vídeo a ser produzido por essas iniciativas tem sido chamado genericamente de vídeo 3D. Esse trabalho propõe uma arquitetura para captura e exibição de vídeos 3D em tempo real utilizando as informações de cor e profundidade da cena, capturadas para cada pixel de cada quadro do vídeo. A informação de profundidade pode ser obtida utilizando-se câmeras 3D, algoritmos de extração de disparidade a partir de estéreo, ou com auxílio de luz estruturada. A partir da informação de profundidade é possível calcular novos pontos de vista da cena utilizando um algoritmo de warping 3D. Devido a não disponibilidade de câmeras 3D durante a realização deste trabalho, a arquitetura proposta foi validada utilizando um ambiente sintético construído usando técnicas de computação gráfica. Este protótipo também foi utilizado para analisar diversos algoritmos de visão computacional que utilizam imagens estereoscópias para a extração da profundidade de cenas em tempo real. O uso de um ambiente controlado permitiu uma análise bastante criteriosa da qualidade dos mapas de profundidade produzidos por estes algoritmos, nos levando a concluir que eles ainda não são apropriados para uso de aplicações que necessitem da captura de vídeo 3D em tempo real.
Resumo:
[EN]Parliamentary websites have become one of the most important windows for citizens and media to follow the activities of their legislatures and to hold parliaments to account. Therefore, most parliamentary institutions aim to provide new multimedia solutions capable of displaying video fragments on demand on plenary activities. This paper presents a multimedia system for parliamentary institutions to produce video fragments on demand through a website with linked information and public feedback that helps to explain the content shown in these fragments. A prototype implementation has been developed for the Canary Islands Parliament (Spain) and shows how traditional parliamentary streaming systems can be enhanced by the use of semantics and computer vision for video analytics...
Resumo:
For smart applications, nodes in wireless multimedia sensor networks (MWSNs) have to take decisions based on sensed scalar physical measurements. A routing protocol must provide the multimedia delivery with quality level support and be energy-efficient for large-scale networks. With this goal in mind, this paper proposes a smart Multi-hop hierarchical routing protocol for Efficient VIdeo communication (MEVI). MEVI combines an opportunistic scheme to create clusters, a cross-layer solution to select routes based on network conditions, and a smart solution to trigger multimedia transmission according to sensed data. Simulations were conducted to show the benefits of MEVI compared with the well-known Low-Energy Adaptive Clustering Hierarchy (LEACH) protocol. This paper includes an analysis of the signaling overhead, energy-efficiency, and video quality.
Resumo:
Wireless Multimedia Sensor Networks (WMSNs) promise a wide scope of emerging potential applications in both civilian and military areas, which require visual and audio information to enhance the level of collected information. The transmission of multimedia content requires a minimal video quality level from the user’s perspective. However, links in WMSN communi- cations are typically unreliable, as they often experience fluctuations in quality and weak connectivity, and thus, the routing protocol must evaluate the routes by using end-to-end link quality information to increase the packet delivery ratio. Moreover, the use multiple paths together with key video metrics can enhance the video quality level. In this paper, we propose a video-aware multiple path hierarchical routing protocol for efficient multimedia transmission over WMSN, called video-aware MMtransmission. This protocol finds node-disjoint multiple paths, and implements an end-to-end link quality estimation with minimal over- head to score the paths. Thus, our protocol assures multimedia transmission with Quality of Experience (QoE) and energy-efficiency support. The simula- tion results show the benefits of video-aware MMtransmission for disseminating video content by means of energy-efficiency and QoE analysis.
Resumo:
The development and evaluation of new algorithms and protocols for Wireless Multimedia Sensor Networks (WMSNs) are usually supported by means of a discrete event network simulator, where OMNeT++ is one of the most important ones. However, experiments involving multimedia transmission, video flows with different characteristics, genres, group of pictures lengths, and coding techniques must be evaluated based also on Quality of Experience (QoE) metrics to reflect the user's perception. Such experiments require the evaluation of video-related information, i.e., frame type, received/lost, delay, jitter, decoding errors, as well as inter and intra-frame dependency of received/distorted videos. However, existing OMNeT++ frameworks for WMSNs do not support video transmissions with QoE-awareness, neither a large set of mobility traces to enable evaluations under different multimedia/mobile situations. In this paper, we propose a Mobile MultiMedia Wireless Sensor Network OMNeT++ framework (M3WSN) to support transmission, control and evaluation of real video sequences in mobile WMSNs.
Resumo:
Wireless Mesh Networks (WMNs) are increasingly deployed to enable thousands of users to share, create, and access live video streaming with different characteristics and content, such as video surveillance and football matches. In this context, there is a need for new mechanisms for assessing the quality level of videos because operators are seeking to control their delivery process and optimize their network resources, while increasing the user’s satisfaction. However, the development of in-service and non-intrusive Quality of Experience assessment schemes for real-time Internet videos with different complexity and motion levels, Group of Picture lengths, and characteristics, remains a significant challenge. To address this issue, this article proposes a non-intrusive parametric real-time video quality estimator, called MultiQoE that correlates wireless networks’ impairments, videos’ characteristics, and users’ perception into a predicted Mean Opinion Score. An instance of MultiQoE was implemented in WMNs and performance evaluation results demonstrate the efficiency and accuracy of MultiQoE in predicting the user’s perception of live video streaming services when compared to subjective, objective, and well-known parametric solutions.
Resumo:
User experience on watching live videos must be satisfactory even under the inuence of different network conditions and topology changes, such as happening in Flying Ad-Hoc Networks (FANETs). Routing services for video dissemination over FANETs must be able to adapt routing decisions at runtime to meet Quality of Experience (QoE) requirements. In this paper, we introduce an adaptive beaconless opportunistic routing protocol for video dissemination over FANETs with QoE support, by taking into account multiple types of context information, such as link quality, residual energy, buffer state, as well as geographic information and node mobility in a 3D space. The proposed protocol takes into account Bayesian networks to define weight vectors and Analytic Hierarchy Process (AHP) to adjust the degree of importance for the context information based on instantaneous values. It also includes a position prediction to monitor the distance between two nodes in order to detect possible route failure.
Resumo:
The emerging use of real-time 3D-based multimedia applications imposes strict quality of service (QoS) requirements on both access and core networks. These requirements and their impact to provide end-to-end 3D videoconferencing services have been studied within the Spanish-funded VISION project, where different scenarios were implemented showing an agile stereoscopic video call that might be offered to the general public in the near future. In view of the requirements, we designed an integrated access and core converged network architecture which provides the requested QoS to end-to-end IP sessions. Novel functional blocks are proposed to control core optical networks, the functionality of the standard ones is redefined, and the signaling improved to better meet the requirements of future multimedia services. An experimental test-bed to assess the feasibility of the solution was also deployed. In such test-bed, set-up and release of end-to-end sessions meeting specific QoS requirements are shown and the impact of QoS degradation in terms of the user perceived quality degradation is quantified. In addition, scalability results show that the proposed signaling architecture is able to cope with large number of requests introducing almost negligible delay.
Resumo:
Shading reduces the power output of a photovoltaic (PV) system. The design engineering of PV systems requires modeling and evaluating shading losses. Some PV systems are affected by complex shading scenes whose resulting PV energy losses are very difficult to evaluate with current modeling tools. Several specialized PV design and simulation software include the possibility to evaluate shading losses. They generally possess a Graphical User Interface (GUI) through which the user can draw a 3D shading scene, and then evaluate its corresponding PV energy losses. The complexity of the objects that these tools can handle is relatively limited. We have created a software solution, 3DPV, which allows evaluating the energy losses induced by complex 3D scenes on PV generators. The 3D objects can be imported from specialized 3D modeling software or from a 3D object library. The shadows cast by this 3D scene on the PV generator are then directly evaluated from the Graphics Processing Unit (GPU). Thanks to the recent development of GPUs for the video game industry, the shadows can be evaluated with a very high spatial resolution that reaches well beyond the PV cell level, in very short calculation times. A PV simulation model then translates the geometrical shading into PV energy output losses. 3DPV has been implemented using WebGL, which allows it to run directly from a Web browser, without requiring any local installation from the user. This also allows taken full benefits from the information already available from Internet, such as the 3D object libraries. This contribution describes, step by step, the method that allows 3DPV to evaluate the PV energy losses caused by complex shading. We then illustrate the results of this methodology to several application cases that are encountered in the world of PV systems design. Keywords: 3D, modeling, simulation, GPU, shading, losses, shadow mapping, solar, photovoltaic, PV, WebGL
Resumo:
Estudios recientes promueven la integración de estímulos multisensoriales en activos multimedia con el fin de mejorar la experiencia de usuario mediante la estimulación de nuevos sentidos, más allá de la tradicional experiencia audiovisual. Del mismo modo, varios trabajos proponen la introducción de componentes de interacción capaces de complementar con nuevas características, funcionalidades y/o información la experiencia multimedia. Efectos sensoriales basados en el uso de nuevas técnicas de audio, olores, viento, vibraciones y control de la iluminación, han demostrado tener un impacto favorable en la sensación de Presencia, en el disfrute de la experiencia multimedia y en la calidad, relevancia y realismo de la misma percibidos por el usuario. Asimismo, los servicios basados en dos pantallas y la manipulación directa de (elementos en) la escena de video tienen el potencial de mejorar la comprensión, la concentración y la implicación proactiva del usuario en la experiencia multimedia. El deporte se encuentra entre los géneros con mayor potencial para integrar y explotar éstas soluciones tecnológicas. Trabajos previos han demostrado asimismo la viabilidad técnica de integrar éstas tecnologías con los estándares actualmente adoptados a lo largo de toda la cadena de transmisión de televisión. De este modo, los sistemas multimedia enriquecidos con efectos sensoriales, los servicios interactivos multiplataforma y un mayor control del usuario sobre la escena de vídeo emergen como nuevas formas de llevar la multimedia immersiva e interactiva al mercado de consumo de forma no disruptiva. Sin embargo, existen numerosas interrogantes relativas a los efectos sensoriales y/o soluciones interactivas más adecuadas para complementar un contenido audiovisual determinado o a la mejor manera de de integrar y combinar dichos componentes para mejorar la experiencia de usuario de un segmento de audiencia objetivo. Además, la evidencia científica sobre el impacto de factores humanos en la experiencia de usuario con estas nuevas formas de immersión e interacción en el contexto multimedia es aún insuficiente y en ocasiones, contradictoria. Así, el papel de éstos factores en el potencial de adopción de éstas tecnologías ha sido amplia-mente ignorado. La presente tesis analiza el impacto del audio binaural, efectos sensoriales (de iluminación y olfativos), interacción con objetos 3D integrados en la escena de vídeo e interacción con contenido adicional utilizando una segunda pantalla en la experiencia de usuario con contenidos de deporte. La posible influencia de dichos componentes en las variables dependientes se explora tanto a nivel global (efecto promedio) como en función de las características de los usuarios (efectos heterogéneos). Para ello, se ha llevado a cabo un experimento con usuarios orientado a explorar la influencia de éstos componentes immersivos e interactivos en dos grandes dimensiones de la experiencia multimedia: calidad y Presencia. La calidad de la experiencia multimedia se analiza en términos de las posibles variaciones asociadas a la calidad global y a la calidad del contenido, la imagen, el audio, los efectos sensoriales, la interacción con objetos 3D y la interacción con la segunda pantalla. El posible impacto en la Presencia considera dos de las dimensiones definidas por el cuestionario ITC-SOPI: Presencia Espacial (Spatial Presence) e Implicación (Engagement). Por último, los individuos son caracterizados teniendo en cuenta los siguientes atributos afectivos, cognitivos y conductuales: preferencias y hábitos en relación con el contenido, grado de conocimiento de las tecnologías integradas en el sistema, tendencia a involucrarse emocionalmente, tendencia a concentrarse en una actividad bloqueando estímulos externos y los cinco grandes rasgos de la personalidad: extroversión, amabilidad, responsabilidad, inestabilidad emocional y apertura a nuevas experiencias. A nivel global, nuestro estudio revela que los participantes prefieren el audio binaural frente al sistema estéreo y que los efectos sensoriales generan un aumento significativo del nivel de Presencia Espacial percibido por los usuarios. Además, las manipulaciones experimentales realizadas permitieron identificar una gran variedad de efectos heterogéneos. Un resultado interesante es que dichos efectos no se encuentran distribuidos de forma equitativa entre las medidas de calidad y Presencia. Nuestros datos revelan un impacto generalizado del audio binaural en la mayoría de las medidas de calidad y Presencia analizadas. En cambio, la influencia de los efectos sensoriales y de la interacción con la segunda pantalla se concentran en las medidas de Presencia y calidad, respectivamente. La magnitud de los efectos heterogéneos identificados está modulada por las siguientes características personales: preferencias en relación con el contenido, frecuencia con la que el usuario suele ver contenido similar, conocimiento de las tecnologías integradas en el demostrador, sexo, tendencia a involucrarse emocionalmente, tendencia a a concentrarse en una actividad bloqueando estímulos externos y niveles de amabilidad, responsabilidad y apertura a nuevas experiencias. Las características personales consideradas en nuestro experimento explicaron la mayor parte de la variación en las variables dependientes, confirmando así el importante (y frecuentemente ignorado) papel de las diferencias individuales en la experiencia multimedia. Entre las características de los usuarios con un impacto más generalizado se encuentran las preferencias en relación con el contenido, el grado de conocimiento de las tecnologías integradas en el sistema y la tendencia a involucrarse emocionalmente. En particular, los primeros dos factores parecen generar un conflicto de atención hacia el contenido versus las características/elementos técnicos del sistema, respectivamente. Asimismo, la experiencia multimedia de los fans del fútbol parece estar modulada por procesos emociona-les, mientras que para los no-fans predominan los procesos cognitivos, en particular aquellos directamente relacionados con la percepción de calidad. Abstract Recent studies encourage the integration of multi-sensorial stimuli into multimedia assets to enhance the user experience by stimulating other senses beyond sight and hearing. Similarly, the introduction of multi-modal interaction components complementing with new features, functionalities and/or information the multimedia experience is promoted. Sensory effects as odor, wind, vibration and light effects, as well as an enhanced audio quality, have been found to favour media enjoyment and to have a positive influence on the sense of Presence and on the perceived quality, relevance and reality of a multimedia experience. Two-screen services and a direct manipulation of (elements in) the video scene have the potential to enhance user comprehension, engagement and proactive involvement of/in the media experience. Sports is among the genres that could benefit the most from these solutions. Previous works have demonstrated the technical feasibility of implementing and deploying end-to-end solutions integrating these technologies into legacy systems. Thus, sensorially-enhanced media, two-screen services and an increased user control over the displayed scene emerge as means to deliver a new form of immersive and interactive media experiences to the mass market in a non-disruptive manner. However, many questions remain concerning issues as the specific interactive solutions or sensory effects that can better complement a given audiovisual content or the best way in which to integrate and combine them to enhance the user experience of a target audience segment. Furthermore, scientific evidence on the impact of human factors on the user experience with these new forms of immersive and interactive media is still insufficient and sometimes, contradictory. Thus, the role of these factors on the potential adoption of these technologies has been widely ignored. This thesis analyzes the impact of binaural audio, sensory (light and olfactory) effects, interaction with 3D objects integrated into the video scene and interaction with additional content using a second screen on the sports media experience. The potential influence of these components on the dependent variables is explored both at the overall level (average effect) and as a function of users’ characteristics (heterogeneous effects). To these aims, we conducted an experimental study exploring the influence of these immersive and interactive elements on the quality and Presence dimensions of the media experience. Along the quality dimension, we look for possible variations on the quality scores as-signed to the overall media experience and to the media components content, image, audio, sensory effects, interaction with 3D objects and interaction using the tablet device. The potential impact on Presence is analyzed by looking at two of the four dimensions defined by the ITC-SOPI questionnaire, namely Spatial Presence and Engagement. The users’ characteristics considered encompass the following personal affective, cognitive and behavioral attributes: preferences and habits in relation to the content, knowledge of the involved technologies, tendency to get emotionally involved and tendency to get absorbed in an activity and block out external distractors and the big five personality traits extraversion, agreeableness, conscientiousness, neuroticism and openness to experience. At the overall level, we found that participants preferred binaural audio than standard stereo audio and that sensory effects increase significantly the level of Spatial Presence. Several heterogeneous effects were also revealed as a result of our experimental manipulations. Interestingly, these effects were not equally distributed across the quality and Presence measures analyzed. Whereas binaural audio was foud to have an influence on the majority of the quality and Presence measures considered, the effects of sensory effects and of interaction with additional content through the tablet device concentrate mainly on the dimensions of Presence and on quality measures, respectively. The magnitude of these effects was modulated by individual’s characteristics, such as: preferences in relation to the content, frequency of viewing similar content, knowledge of involved technologies, gender, tendency to get emotionally involved, tendency to absorption and levels of agreeableness, conscientiousness and openness to experience. The personal characteristics collected in our experiment explained most of the variation in the dependent variables, confirming the frequently neglected role of individual differences on the media experience. Preferences in relation to the content, knowledge of involved technologies and tendency to get emotionally involved were among the user variables with the most generalized influence. In particular, the former two features seem to present a conflict in the allocation of attentional resources towards the media content versus the technical features of the system, respectively. Additionally, football fans’ experience seems to be modulated by emotional processes whereas for not fans, cognitive processes (and in particular those related to quality judgment) prevail.
Resumo:
La medida de calidad de vídeo sigue siendo necesaria para definir los criterios que caracterizan una señal que cumpla los requisitos de visionado impuestos por el usuario. Las nuevas tecnologías, como el vídeo 3D estereoscópico o formatos más allá de la alta definición, imponen nuevos criterios que deben ser analizadas para obtener la mayor satisfacción posible del usuario. Entre los problemas detectados durante el desarrollo de esta tesis doctoral se han determinado fenómenos que afectan a distintas fases de la cadena de producción audiovisual y tipo de contenido variado. En primer lugar, el proceso de generación de contenidos debe encontrarse controlado mediante parámetros que eviten que se produzca el disconfort visual y, consecuentemente, fatiga visual, especialmente en lo relativo a contenidos de 3D estereoscópico, tanto de animación como de acción real. Por otro lado, la medida de calidad relativa a la fase de compresión de vídeo emplea métricas que en ocasiones no se encuentran adaptadas a la percepción del usuario. El empleo de modelos psicovisuales y diagramas de atención visual permitirían ponderar las áreas de la imagen de manera que se preste mayor importancia a los píxeles que el usuario enfocará con mayor probabilidad. Estos dos bloques se relacionan a través de la definición del término saliencia. Saliencia es la capacidad del sistema visual para caracterizar una imagen visualizada ponderando las áreas que más atractivas resultan al ojo humano. La saliencia en generación de contenidos estereoscópicos se refiere principalmente a la profundidad simulada mediante la ilusión óptica, medida en términos de distancia del objeto virtual al ojo humano. Sin embargo, en vídeo bidimensional, la saliencia no se basa en la profundidad, sino en otros elementos adicionales, como el movimiento, el nivel de detalle, la posición de los píxeles o la aparición de caras, que serán los factores básicos que compondrán el modelo de atención visual desarrollado. Con el objetivo de detectar las características de una secuencia de vídeo estereoscópico que, con mayor probabilidad, pueden generar disconfort visual, se consultó la extensa literatura relativa a este tema y se realizaron unas pruebas subjetivas preliminares con usuarios. De esta forma, se llegó a la conclusión de que se producía disconfort en los casos en que se producía un cambio abrupto en la distribución de profundidades simuladas de la imagen, aparte de otras degradaciones como la denominada “violación de ventana”. A través de nuevas pruebas subjetivas centradas en analizar estos efectos con diferentes distribuciones de profundidades, se trataron de concretar los parámetros que definían esta imagen. Los resultados de las pruebas demuestran que los cambios abruptos en imágenes se producen en entornos con movimientos y disparidades negativas elevadas que producen interferencias en los procesos de acomodación y vergencia del ojo humano, así como una necesidad en el aumento de los tiempos de enfoque del cristalino. En la mejora de las métricas de calidad a través de modelos que se adaptan al sistema visual humano, se realizaron también pruebas subjetivas que ayudaron a determinar la importancia de cada uno de los factores a la hora de enmascarar una determinada degradación. Los resultados demuestran una ligera mejora en los resultados obtenidos al aplicar máscaras de ponderación y atención visual, los cuales aproximan los parámetros de calidad objetiva a la respuesta del ojo humano. ABSTRACT Video quality assessment is still a necessary tool for defining the criteria to characterize a signal with the viewing requirements imposed by the final user. New technologies, such as 3D stereoscopic video and formats of HD and beyond HD oblige to develop new analysis of video features for obtaining the highest user’s satisfaction. Among the problems detected during the process of this doctoral thesis, it has been determined that some phenomena affect to different phases in the audiovisual production chain, apart from the type of content. On first instance, the generation of contents process should be enough controlled through parameters that avoid the occurrence of visual discomfort in observer’s eye, and consequently, visual fatigue. It is especially necessary controlling sequences of stereoscopic 3D, with both animation and live-action contents. On the other hand, video quality assessment, related to compression processes, should be improved because some objective metrics are adapted to user’s perception. The use of psychovisual models and visual attention diagrams allow the weighting of image regions of interest, giving more importance to the areas which the user will focus most probably. These two work fields are related together through the definition of the term saliency. Saliency is the capacity of human visual system for characterizing an image, highlighting the areas which result more attractive to the human eye. Saliency in generation of 3DTV contents refers mainly to the simulated depth of the optic illusion, i.e. the distance from the virtual object to the human eye. On the other hand, saliency is not based on virtual depth, but on other features, such as motion, level of detail, position of pixels in the frame or face detection, which are the basic features that are part of the developed visual attention model, as demonstrated with tests. Extensive literature involving visual comfort assessment was looked up, and the development of new preliminary subjective assessment with users was performed, in order to detect the features that increase the probability of discomfort to occur. With this methodology, the conclusions drawn confirmed that one common source of visual discomfort was when an abrupt change of disparity happened in video transitions, apart from other degradations, such as window violation. New quality assessment was performed to quantify the distribution of disparities over different sequences. The results confirmed that abrupt changes in negative parallax environment produce accommodation-vergence mismatches derived from the increasing time for human crystalline to focus the virtual objects. On the other side, for developing metrics that adapt to human visual system, additional subjective tests were developed to determine the importance of each factor, which masks a concrete distortion. Results demonstrated slight improvement after applying visual attention to objective metrics. This process of weighing pixels approximates the quality results to human eye’s response.
Resumo:
With the rapid development of Internet technologies, video and audio processing are among the most important parts due to the constant requirements of high quality media contents. Along with the improvement of network environment and the hardware equipment, this demand is becoming more and more imperious, people prefer high quality videos and audios as well as the net streaming media resources. FFmpeg is a set of open source program about the A/V decoding. Many commercial players use FFmpeg as their displaying cores. This paper designed a simple and easy-to-use video player based on FFmpeg. The first part is about the basic theories and related knowledge of video displaying, including some concepts like data formats, streaming media data, video coding and decoding. In a word, the realization of the video player depend on the a set of video decoding process. The general idea about the process is to get the video packets from the Internet, to read the related protocols and de-encapsulate the protocols, to de-encapsulate the packaging data and to get encoded formats data, to decode them to pixel data that can be displayed directly through graphics cards. During the coding and decoding process, there could be different degrees of data losing, which is called lossy compression, but it usually does not influence the quality of user experiences. The second part is about the principle of the FFmpeg decoding process, that is one of the key point of the paper. In this project, FFmpeg is used for the main decoding task, by call some main functions and structures from FFmpeg class libraries, packaging video formats could be transfer to pixel data, after getting the pixel data, SDL is used for the displaying process. The third part is about the SDL displaying flow. Similarly, it would invoke some important displaying functions from SDL class libraries to realize the function, though SDL is able to do not only displaying task, but also many other game playing process. After that, a independent video displayer is completed, it is provided with all the key function of a player. The fourth part make a simple users interface for the player based on the MFC program, it enable the player could be used by most people. At last, in consideration of the mobile Internet’s blossom, people nowadays can hardly ever drop their mobile phones, there is a brief introduction about how to transplant the video player to Android platform which is one of the most used mobile systems.