903 results for Audio-visual content classification
Abstract:
This paper presents an integrated system for vehicle classification. The system classifies vehicles using three approaches: 1) the height of the first axle and the number of axles; 2) volumetric measurements; and 3) features extracted from the captured image of the vehicle. The system uses a laser sensor for measurements and a set of image analysis algorithms to compute visual features. It is shown that combining the different classification methods improves the accuracy and robustness of the system, enabling its use in more difficult environments while satisfying the requirements established by the Portuguese motorway contractor BRISA.
Abstract:
In music genre classification, most approaches rely on statistical characteristics of low-level features computed on short audio frames. These methods implicitly assume that frames carry equally relevant information loads and that either individual frames, or distributions thereof, somehow capture the specificities of each genre. In this paper we study the representation space defined by short-term audio features with respect to class boundaries, and compare different processing techniques to partition this space. These partitions are evaluated in terms of accuracy on two genre classification tasks, with several types of classifiers. Experiments show that a randomized and unsupervised partition of the space, used in conjunction with a Markov Model classifier, leads to accuracies comparable to the state of the art. We also show that unsupervised partitions of the space tend to create fewer hubs.
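The idea of partitioning the frame space without supervision and then classifying with a Markov model can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how the partition is built or how the classifier is parameterised, so the sketch uses a codebook of randomly sampled frames, nearest-neighbour quantisation, and one first-order Markov chain per genre with add-one smoothing. All function names are hypothetical.

```python
import math
import random


def random_codebook(frames, k, seed=0):
    """Pick k frames at random as region centres (an assumed form of random partition)."""
    rng = random.Random(seed)
    return rng.sample(frames, k)


def quantize(frames, codebook):
    """Map each frame (a tuple of floats) to the index of its nearest centre."""
    def nearest(frame):
        return min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(frame, codebook[i])),
        )
    return [nearest(f) for f in frames]


def train_markov(sequences, k, alpha=1.0):
    """Estimate a k x k transition matrix from symbol sequences, with additive smoothing."""
    counts = [[alpha] * k for _ in range(k)]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]


def log_likelihood(seq, transitions):
    """Log-probability of the observed transitions under one genre model."""
    return sum(math.log(transitions[a][b]) for a, b in zip(seq, seq[1:]))


def classify(seq, models):
    """Return the label whose Markov chain gives the sequence the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(seq, models[label]))


# Toy symbol sequences standing in for quantised audio frames (not data from the paper).
models = {
    "steady": train_markov([[0, 0, 0, 0, 1, 1, 1, 1]], k=2),
    "alternating": train_markov([[0, 1, 0, 1, 0, 1, 0, 1]], k=2),
}
print(classify([0, 0, 0, 1, 1, 1], models))
print(classify([1, 0, 1, 0, 1, 0], models))
```

In this sketch the partition step needs no genre labels at all; labels are used only when the per-genre transition matrices are estimated.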
Abstract:
OBJECTIVE: To investigate the development of language and of auditory and visual functions in infants attending a daycare center, based on assessments carried out by educators. METHODS: A total of 115 infants attending a daycare center in the health area of a university in the State of São Paulo were assessed between 1998 and 2001. The "Protocol for the Observation of the Development of Language and of Auditory and Visual Functions", comprising 39 tasks in total, was used to assess infants from 3 to 12 months of age. The Protocol was administered by the daycare educators, who had been duly trained. The chi-square test or Fisher's exact test was used. The significance level adopted was 5%. RESULTS: The infants showed a different pattern of language development regarding the onset of babbling and of first words, as well as of visual function regarding imitation, the use of gestural games, and following commands accompanied by gestures. CONCLUSIONS: The daycare environment provides conditions for a different pattern of development of language and of auditory and visual functions. Preventive actions in daycare centers should integrate the areas of health and education toward a common goal.
Abstract:
Video coding technologies have played a major role in the explosion of large market digital video applications and services. In this context, the very popular MPEG-x and H.26x video coding standards adopted a predictive coding paradigm, where complex encoders exploit the data redundancy and irrelevancy to 'control' much simpler decoders. This codec paradigm suits applications and services such as digital television and video storage, where decoder complexity is critical, but does not match the requirements of emerging applications such as visual sensor networks, where encoder complexity is more critical. The Slepian-Wolf and Wyner-Ziv theorems made it possible to develop the so-called Wyner-Ziv video codecs, which follow a different coding paradigm where it is the task of the decoder, and no longer of the encoder, to (fully or partly) exploit the video redundancy. Theoretically, Wyner-Ziv video coding does not incur any compression performance penalty relative to the more traditional predictive coding paradigm (at least under certain conditions). In the context of Wyner-Ziv video codecs, the so-called side information, which is a decoder estimate of the original frame to code, plays a critical role in the overall compression performance. For this reason, much research effort has been invested in the past decade in developing increasingly efficient side information creation methods. The main objective of this paper is to review and evaluate the available side information methods after proposing a classification taxonomy to guide this review, making it possible to reach more solid conclusions and to better identify the next relevant research challenges.
After classifying the side information creation methods into four classes, notably guess, try, hint and learn, the review of the most important techniques in each class and the evaluation of some of them lead to the important conclusion that the side information creation methods provide better rate-distortion (RD) performance depending on the amount of temporal correlation in each video sequence. It also became clear that the best available Wyner-Ziv video coding solutions are almost systematically based on the learn approach. The best solutions are already able to systematically outperform the H.264/AVC Intra, and also the H.264/AVC zero-motion standard solutions for specific types of content. (C) 2013 Elsevier B.V. All rights reserved.
Expert opinion on best practice guidelines and competency framework for visual screening in children
Abstract:
PURPOSE: Screening programs to detect visual abnormalities in children vary among countries. The aim of this study is to describe experts' perception of best practice guidelines and competency framework for visual screening in children. METHODS: A qualitative focus group technique was applied during the Portuguese national orthoptic congress to obtain the perception of an expert panel of 5 orthoptists and 2 ophthalmologists with experience in visual screening for children (mean age 53.43 years, SD ± 9.40). The panel received in advance a script with the description of three tuning competencies dimensions (instrumental, systemic, and interpersonal) for visual screening. The session was recorded in video and audio. Qualitative data were analyzed using a categorical technique. RESULTS: According to experts' views, six tests (35.29%) have to be included in a visual screening: distance visual acuity test, cover test, bi-prism or 4/6(Δ) prism, fusion, ocular movements, and refraction. Screening should be performed according to the child's age, before and after 3 years of age (17.65%). The expert panel highlighted the influence of professional experience on the application of a screening protocol (23.53%). They also showed concern about the control of false negatives (23.53%). Instrumental competencies were the most cited (54.09%), followed by interpersonal (29.51%) and systemic (16.4%). CONCLUSIONS: Orthoptists should have professional experience before starting to apply a screening protocol. False negative results are a concern that has to be more thoroughly investigated. The proposed framework focuses on core competencies highlighted by the expert panel. Competency programs could be important to develop better screening programs.
Abstract:
PURPOSE: Fatty liver disease (FLD) is an increasingly prevalent disease that can be reversed if detected early. Ultrasound is the safest and most ubiquitous method for identifying FLD. Since expert sonographers are required to accurately interpret liver ultrasound images, a shortage of such experts results in interobserver variability. For more objective interpretation, high accuracy, and quick second opinions, computer aided diagnostic (CAD) techniques may be exploited. The purpose of this work is to develop one such CAD technique for accurate classification of normal livers and abnormal livers affected by FLD. METHODS: In this paper, the authors present a CAD technique (called Symtosis) that uses a novel combination of significant features based on the texture, wavelet transform, and higher order spectra of the liver ultrasound images in various supervised learning-based classifiers in order to determine parameters that classify normal and FLD-affected abnormal livers. RESULTS: On evaluating the proposed technique on a database of 58 abnormal and 42 normal liver ultrasound images, the authors were able to achieve a high classification accuracy of 93.3% using the decision tree classifier. CONCLUSIONS: This high accuracy, added to the completely automated classification procedure, makes the authors' proposed technique highly suitable for clinical deployment and usage.
Abstract:
Steatosis, also known as fatty liver, corresponds to an abnormal retention of lipids within the hepatic cells and reflects an impairment of the normal processes of synthesis and elimination of fat. Several causes may lead to this condition, namely obesity, diabetes, or alcoholism. In this paper an automatic classification algorithm is proposed for the diagnosis of liver steatosis from ultrasound images. The features are selected to capture the same characteristics that physicians use when diagnosing the disease by visual inspection of the ultrasound images. The algorithm, designed in a Bayesian framework, computes two images: i) a despeckled one, containing the anatomic and echogenic information of the liver, and ii) an image containing only the speckle, used to compute the textural features. These images are computed from the estimated RF signal generated by the ultrasound probe, taking into account the dynamic range compression performed by the equipment. A Bayes classifier, trained with data manually classified by expert clinicians and used as ground truth, reaches an overall accuracy of 95% and a sensitivity of 100%. The main novelties of the method are the estimations of the RF and speckle images, which make it possible to accurately compute textural features of the liver parenchyma relevant for the diagnosis.
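Overall accuracy and sensitivity measure different things, which is why a classifier can report 100% sensitivity with an accuracy below 100%. The sketch below computes both from paired labels. The labels are toy values invented for illustration and are not the study's data; the function name is hypothetical.

```python
def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity) for paired ground-truth and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)                   # share of all cases labelled correctly
    sensitivity = true_pos / (true_pos + false_neg)   # share of diseased cases detected
    return accuracy, sensitivity


# Toy labels (1 = steatosis, 0 = normal); one normal liver is misclassified.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
acc, sens = accuracy_and_sensitivity(y_true, y_pred)
print(f"accuracy={acc:.3f}, sensitivity={sens:.3f}")
```

Here every positive case is detected, so sensitivity is perfect, while the single false positive lowers only the accuracy.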
Abstract:
Master's in Radiation Applied to Health Technologies - Specialization: Magnetic Resonance Imaging
Abstract:
BACKGROUND: Examining changes in brain activation linked with emotion-inducing stimuli is essential to the study of emotions. Given the ecological potential of techniques such as virtual reality (VR), it is important to inspect whether brain activation in response to emotional stimuli can be modulated by the three-dimensional (3D) properties of the images. OBJECTIVE: The current study sought to test whether the activation of brain areas involved in the emotional processing of scenarios of different valences can be modulated by 3D. Therefore, the focus was on the interaction effect between emotion-inducing stimuli of different emotional valences (pleasant, unpleasant and neutral valences) and visualization types (2D, 3D). However, main effects were also analyzed. METHODS: The effect of emotional valence and visualization types and their interaction were analyzed through a 3x2 repeated measures ANOVA. Post-hoc t-tests were performed under a ROI-analysis approach. RESULTS: The results show increased brain activation for the 3D affect-inducing stimuli in comparison with the same stimuli in 2D scenarios, mostly in cortical and subcortical regions that are related to emotional processing, in addition to visual processing regions. CONCLUSIONS: This study has the potential to clarify brain mechanisms involved in the processing of emotional stimuli (scenarios' valence) and their interaction with three-dimensionality.
Abstract:
Dissertation for a Master's Degree in Computer and Electronic Engineering
Abstract:
Background - Medical image perception research relies on visual data to study the diagnostic relationship between observers and medical images. A consistent method to assess visual function for participants in medical imaging research has not been developed and represents a significant gap in existing research. Methods - Three visual assessment factors appropriate to observer studies were identified: visual acuity, contrast sensitivity, and stereopsis. A test was designed for each, and 30 radiography observers (mean age 31.6 years) participated in each test. Results - Mean binocular visual acuity for distance was 20/14 for all observers. The difference between observers who did and did not use corrective lenses was not statistically significant (P = .12). All subjects had a normal value for near visual acuity and stereoacuity. Contrast sensitivity was better than population norms. Conclusion - All observers had normal visual function and could participate in medical imaging visual analysis studies. Evaluation protocols and population norms are provided. Further studies are necessary to understand fully the relationship between visual performance on tests and diagnostic accuracy in practice.
Abstract:
The physiological aging process is marked by a decline in motor abilities and a reduction in strength, flexibility, and visual function, among others. In addition, pathological changes of the visual system that affect visual function can impair balance and increase the risk of falls. Individuals aged ≥ 50 years account for 65% to 82% of cases of low vision and blindness. The risk of falls is higher in women with visual acuity ≤ 0.5. Aim of the study: to assess the relationship between visual function and the risk of falls.
Abstract:
Vishnu is a tool for XSLT visual programming in Eclipse - a popular and extensible integrated development environment. Rather than writing the XSLT transformations, the programmer loads or edits two document instances, a source document and its corresponding target document, and pairs texts between them by drawing lines over the documents. This form of XSLT programming is intended for simple transformations between related document types, such as HTML formatting or conversion among similar formats. Complex XSLT programs involving, for instance, recursive templates or second order transformations are out of the scope of Vishnu. We present the architecture of Vishnu, composed of a graphical editor and a programming engine. The editor is an Eclipse plug-in where the programmer loads and edits document examples and pairs their content using graphical primitives. The programming engine receives the data collected by the editor and produces an XSLT program. The design of the engine and the process of creating an XSLT program from examples are also detailed. The process starts with the generation of an initial transformation that maps the source document to the target document. This transformation is fed to a rewrite process where each step produces a refined version of the transformation. Finally, the transformation is simplified before being presented to the programmer for further editing.
Abstract:
In the last decade, local image features have been widely used in robot visual localization. In order to assess image similarity, a strategy exploiting these features compares raw descriptors extracted from the current image with those in the models of places. This paper addresses the ensuing step in this process, where a combining function must be used to aggregate results and assign each place a score. Casting the problem in the multiple classifier systems framework, in this paper we compare several candidate combiners with respect to their performance in the visual localization task. For this evaluation, we selected the most popular methods in the class of non-trained combiners, namely the sum rule and product rule. A deeper insight into the potential of these combiners is provided through a discriminativity analysis involving the algebraic rules and two extensions of these methods: the threshold, as well as the weighted modifications. In addition, a voting method, previously used in robot visual localization, is assessed. Furthermore, we address the process of constructing a model of the environment by describing how the model granularity impacts upon performance. All combiners are tested on a visual localization task, carried out on a public dataset. It is experimentally demonstrated that the sum rule extensions globally achieve the best performance, confirming the general agreement on the robustness of this rule in other classification problems. The voting method, whilst competitive with the product rule in its standard form, is shown to be outperformed by its modified versions.
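The sum rule and product rule named above are standard non-trained combiners and can be sketched in a few lines. The input layout (one dictionary of per-place scores for each matched descriptor), the place names, and the score values are assumptions made for illustration; they are not taken from the paper, and the thresholded and weighted extensions it studies are not shown.

```python
from math import prod


def combine(scores_per_descriptor, rule="sum"):
    """Aggregate per-descriptor place scores into one score per place.

    scores_per_descriptor: list of dicts mapping place name -> score in [0, 1].
    """
    places = scores_per_descriptor[0].keys()
    if rule == "sum":
        return {p: sum(s[p] for s in scores_per_descriptor) for p in places}
    if rule == "product":
        return {p: prod(s[p] for s in scores_per_descriptor) for p in places}
    raise ValueError(f"unknown rule: {rule}")


def best_place(combined):
    """Return the place with the highest combined score."""
    return max(combined, key=combined.get)


# Hypothetical scores: two descriptors favour the kitchen, one nearly vetoes it.
scores = [
    {"kitchen": 0.9, "hall": 0.1},
    {"kitchen": 0.8, "hall": 0.2},
    {"kitchen": 0.01, "hall": 0.5},
]
print(best_place(combine(scores, "sum")))
print(best_place(combine(scores, "product")))
```

With these toy scores the two rules disagree: a single near-zero score dominates the product but only slightly reduces the sum, which is one common explanation for the robustness usually attributed to the sum rule.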
Abstract:
Dissertation submitted for the degree of Master in Civil Engineering, Specialization in Buildings