10 results for Audio-visual content classification
at Universidad de Alicante
Abstract:
This thesis explores the role of multimodality in language learners’ comprehension, and more specifically, the effects on students’ audio-visual comprehension when different orchestrations of modes appear in the visualization of vodcasts. Firstly, I describe the state of the art of its three main areas of concern, namely the evolution of meaning-making, Information and Communication Technology (ICT), and audio-visual comprehension. One of the most important contributions in the theoretical overview is the suggested integrative model of audio-visual comprehension, which attempts to explain how students process information received from different inputs. Secondly, I present a study based on the following research questions: ‘Which modes are orchestrated throughout the vodcasts?’, ‘Are there any multimodal ensembles that are more beneficial for students’ audio-visual comprehension?’, and ‘What are the students’ attitudes towards audio-visual (e.g., vodcasts) compared to traditional audio (e.g., audio tracks) comprehension activities?’. Along with these research questions, I have formulated two hypotheses: Audio-visual comprehension improves when there is a greater number of orchestrated modes, and students have a more positive attitude towards vodcasts than traditional audios when carrying out comprehension activities. The study includes a multimodal discourse analysis, audio-visual comprehension tests, and students’ questionnaires. The multimodal discourse analysis of two British Council’s language learning vodcasts, entitled English is GREAT and Camden Fashion, using ELAN as the multimodal annotation tool, shows that there are a variety of multimodal ensembles of two, three and four modes. The audio-visual comprehension tests were given to 40 Spanish students, learning English as a foreign language, after the visualization of vodcasts. These comprehension tests contain questions related to specific orchestrations of modes appearing in the vodcasts. 
The statistical analysis of the test results, using repeated-measures ANOVA, reveals that students obtain better audio-visual comprehension results when the multimodal ensembles comprise a greater number of orchestrated modes. Finally, the data compiled from the questionnaires indicate that students have a more positive attitude towards vodcasts than towards traditional audio listening activities. The results from the audio-visual comprehension tests and questionnaires confirm the two hypotheses of this study.
Abstract:
This article describes the Robot Vision challenge, a competition that evaluates solutions for the visual place classification problem. Since its origin, this challenge has been proposed as a common benchmark where proposals from around the world are measured using a common overall score. Each new edition of the competition has introduced novelties, both in the type of input data and in the subobjectives of the challenge. All the techniques used by the participants have been collected and published to make them accessible for future developments. The legacy of the Robot Vision challenge includes data sets, benchmarking techniques, and extensive experience in place classification research, all of which is reflected in this article.
Abstract:
Cognitive theories have shown that human thought is embodied; that is, we access reality through our senses and cannot escape them. To understand and handle abstract concepts, we use metaphorical projections based on bodily sensations. Hence the ubiquity of metaphor in everyday language. Although this claim has been widely tested through the analysis of verbal corpora in different languages, there is hardly any research on audio-visual corpora. If primary metaphors form part of our cognitive unconscious, are inherent to human beings, and are a consequence of the nature of the brain, they must also generate visual metaphors. This article analyses and discusses a series of examples to test this.
Abstract:
From a gender perspective, policy measures and publicity campaigns on work-family balance should promote the sharing of responsibilities between the sexes. Alongside political action and specific measures, the equal opportunities project needs a long-term strategy based on education in equality. This article sets out the methodology of a study based on these premises. It presents and explains the protocol used to analyse the audio-visual advertising campaigns on work-family reconciliation broadcast by the Woman’s Institute. The evaluation of these actions focuses on their effectiveness from the point of view of mass media. The article provides some data that illustrate the proposed study. Finally, it considers the limitations of the available sources of information.
Abstract:
In this paper, we present a novel coarse-to-fine visual localization approach: contextual visual localization. This approach relies on three elements: (i) a minimal-complexity classifier for performing fast coarse localization (submap classification); (ii) an optimized saliency detector which exploits the visual statistics of the submap; and (iii) a fast view-matching algorithm which filters initial matchings with a structural criterion. The latter algorithm yields fine localization. Our experiments show that these elements have been successfully integrated for solving the global localization problem. Context, that is, the awareness of being in a particular submap, is defined by a supervised classifier tuned for a minimal set of features. Visual context is exploited both for tuning (optimizing) the saliency detection process and for selecting potential matching views in the visual database that are close enough to the query view.
Abstract:
This article presents a new classifier fusion algorithm based on each classifier's confusion matrix, from which its precision and recall values are extracted. The only data required to apply this new fusion method are the classes or labels assigned by each system and the reference classes in the development partition of the database. The proposed algorithm is described, and results are reported for the combination of the outputs of two systems that participated in the Albayzin 2012 audio segmentation evaluation campaign. The robustness of the algorithm was verified: it achieved a 6.28% relative reduction in segmentation error when fusing the systems with the lowest and highest error rates among those submitted to the evaluation.
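As a rough illustration of the quantities this abstract mentions, the sketch below derives per-class precision and recall from a confusion matrix. The abstract does not specify the fusion rule itself, so the `fuse` function is a hypothetical stand-in (keep the label from whichever system is more precise for the label it proposes) and should not be read as the authors' algorithm. The function names and the matrix layout (rows are reference classes, columns are assigned classes) are assumptions made for this example.

```python
def precision_recall(confusion):
    """Per-class precision and recall from a square confusion matrix.

    Assumed layout: confusion[i][j] counts items of reference class i
    that the system labeled as class j.
    """
    n = len(confusion)
    precision, recall = [], []
    for k in range(n):
        true_positives = confusion[k][k]
        labeled_k = sum(confusion[i][k] for i in range(n))  # column total
        reference_k = sum(confusion[k])                     # row total
        precision.append(true_positives / labeled_k if labeled_k else 0.0)
        recall.append(true_positives / reference_k if reference_k else 0.0)
    return precision, recall


def fuse(label_a, label_b, precision_a, precision_b):
    """Hypothetical fusion rule, NOT the published algorithm.

    When the two systems disagree, keep the label proposed by the
    system with the higher development-set precision for that label.
    """
    if label_a == label_b:
        return label_a
    if precision_a[label_a] >= precision_b[label_b]:
        return label_a
    return label_b
```

In this sketch the precision lists would be computed once on the development partition and then reused to fuse the labels that the two systems assign to each segment of the evaluation data.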
Abstract:
In this paper, a multimodal and interactive prototype to perform music genre classification is presented. The system is oriented to multi-part files in symbolic format, but it can be adapted by using a transcription system to transform audio content into music scores. This prototype uses different sources of information to give a possible answer to the user. It has been developed to allow a human expert to interact with the system to improve its results. In its current implementation, it offers a limited range of interaction and multimodality. Further development aimed at full interactivity and multimodal interactions is discussed.
Abstract:
Background: The harmonization of European health systems brings with it a need for tools to allow the standardized collection of information about medical care. A common coding system and standards for the description of services are needed to allow local data to be incorporated into evidence-informed policy, and to permit equity and mobility to be assessed. The aim of this project has been to design such a classification and a related tool for the coding of services for Long Term Care (DESDE-LTC), based on the European Service Mapping Schedule (ESMS). Methods: The development of DESDE-LTC followed an iterative process using nominal groups in 6 European countries. 54 researchers and stakeholders in health and social services contributed to this process. In order to classify services, we use the minimal organization unit or “Basic Stable Input of Care” (BSIC), coded by its principal function or “Main Type of Care” (MTC). The evaluation of the tool included an analysis of feasibility, consistency, ontology, inter-rater reliability, Boolean Factor Analysis, and a preliminary impact analysis (screening, scoping and appraisal). Results: DESDE-LTC includes an alpha-numerical coding system, a glossary and an assessment instrument for mapping and counting LTC. It shows high feasibility, consistency, inter-rater reliability and face, content and construct validity. DESDE-LTC is ontologically consistent. It is regarded by experts as useful and relevant for evidence-informed decision making. Conclusion: DESDE-LTC contributes to establishing a common terminology, taxonomy and coding of LTC services in a European context, and a standard procedure for data collection and international comparison.
Abstract:
Objectives: To design and validate a questionnaire to measure visual symptoms related to exposure to computers in the workplace. Study Design and Setting: Our computer vision syndrome questionnaire (CVS-Q) was based on a literature review and validated through discussion with experts and performance of a pretest, pilot test, and retest. Content validity was evaluated by occupational health, optometry, and ophthalmology experts. Rasch analysis was used in the psychometric evaluation of the questionnaire. Criterion validity was determined by calculating the sensitivity and specificity, receiver operator characteristic curve, and cutoff point. Test-retest repeatability was tested using the intraclass correlation coefficient (ICC) and concordance by Cohen’s kappa (κ). Results: The CVS-Q was developed with wide consensus among experts and was well accepted by the target group. It assesses the frequency and intensity of 16 symptoms using a single rating scale (symptom severity) that fits the Rasch rating scale model well. The questionnaire has sensitivity and specificity over 70% and achieved good test-retest repeatability both for the scores obtained [ICC = 0.802; 95% confidence interval (CI): 0.673, 0.884] and CVS classification (κ = 0.612; 95% CI: 0.384, 0.839). Conclusion: The CVS-Q has acceptable psychometric properties, making it a valid and reliable tool to control the visual health of computer workers, and can potentially be used in clinical trials and outcome research.
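For readers unfamiliar with the statistics named in this abstract, the following minimal sketch shows the standard textbook definitions of sensitivity, specificity, and Cohen's kappa. The function names and the toy inputs in the tests are assumptions for illustration only; nothing here reproduces the study's data or its analysis pipeline.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(first, second):
    """Cohen's kappa for two equally long sequences of category labels,
    for example the classifications given at test and at retest."""
    n = len(first)
    labels = set(first) | set(second)
    # Proportion of cases on which the two ratings agree.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from the marginal label frequencies.
    expected = sum(
        (first.count(label) / n) * (second.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        # Both ratings use a single identical label; kappa is undefined,
        # so this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually reported instead of simple percentage agreement for a repeated classification.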
Abstract:
In this article we describe a semantic localization dataset for indoor environments named ViDRILO. The dataset provides five sequences of frames acquired with a mobile robot in two similar office buildings under different lighting conditions. Each frame consists of a point cloud representation of the scene and a perspective image. The frames in the dataset are annotated with the semantic category of the scene and with the presence or absence of each object in a predefined list. In addition to the frames and annotations, the dataset is distributed with a set of tools for its use in both place classification and object recognition tasks. The large number of labeled frames, in conjunction with the annotation scheme, makes this dataset different from existing ones. The ViDRILO dataset is released for use as a benchmark for different problems such as multimodal place classification and object recognition, 3D reconstruction, or point cloud data compression.