903 resultados para Audio-visual content classification
Resumo:
There are still major challenges in the area of automatic indexing and retrieval of multimedia content data for very large multimedia content corpora. Current indexing and retrieval applications still use keywords to index multimedia content and those keywords usually do not provide any knowledge about the semantic content of the data. With the increasing amount of multimedia content, it is inefficient to continue with this approach. In this paper, we describe the project DREAM, which addresses such challenges by proposing a new framework for semi-automatic annotation and retrieval of multimedia based on the semantic content. The framework uses the Topic Map Technology, as a tool to model the knowledge automatically extracted from the multimedia content using an Automatic Labelling Engine. We describe how we acquire knowledge from the content and represent this knowledge using the support of NLP to automatically generate Topic Maps. The framework is described in the context of film post-production.
Resumo:
There are still major challenges in the area of automatic indexing and retrieval of digital data. The main problem arises from the ever increasing mass of digital media and the lack of efficient methods for indexing and retrieval of such data based on the semantic content rather than keywords. To enable intelligent web interactions or even web filtering, we need to be capable of interpreting the information base in an intelligent manner. Research has been ongoing for a few years in the field of ontological engineering with the aim of using ontologies to add knowledge to information. In this paper we describe the architecture of a system designed to automatically and intelligently index huge repositories of special effects video clips, based on their semantic content, using a network of scalable ontologies to enable intelligent retrieval.
Resumo:
The encoding of goal-oriented motion events varies across different languages. Speakers of languages without grammatical aspect (e.g., Swedish) tend to mention motion endpoints when describing events, e.g., “two nuns walk to a house,”, and attach importance to event endpoints when matching scenes from memory. Speakers of aspect languages (e.g., English), on the other hand, are more prone to direct attention to the ongoingness of motion events, which is reflected both in their event descriptions, e.g., “two nuns are walking.”, and in their non-verbal similarity judgements. This study examines to what extent native speakers of Swedish (n = 82) with English as a foreign language (FL) restructure their categorisation of goal-oriented motion as a function of their English proficiency and experience with the English language (e.g., exposure, learning). Seventeen monolingual native English speakers from the United Kingdom (UK) were engaged for comparison purposes. Data on motion event cognition were collected through a memory-based triads matching task, in which a target scene with an intermediate degree of endpoint orientation was matched with two alternative scenes with low and high degrees of endpoint orientation, respectively. Results showed that the preference among the Swedish speakers of L2 English to base their similarity judgements on ongoingness rather than event endpoints was correlated with their use of English in their everyday lives, such that those who often watched television in English approximated the ongoingness preference of the English native speakers. These findings suggest that event cognition patterns may be restructured through the exposure to FL audio-visual media. The results thus add to the emerging picture that learning a new language entails learning new ways of observing and reasoning about reality.
Resumo:
Parkinson's disease (PD) is a degenerative illness whose cardinal symptoms include rigidity, tremor, and slowness of movement. In addition to its widely recognized effects PD can have a profound effect on speech and voice.The speech symptoms most commonly demonstrated by patients with PD are reduced vocal loudness, monopitch, disruptions of voice quality, and abnormally fast rate of speech. This cluster of speech symptoms is often termed Hypokinetic Dysarthria.The disease can be difficult to diagnose accurately, especially in its early stages, due to this reason, automatic techniques based on Artificial Intelligence should increase the diagnosing accuracy and to help the doctors make better decisions. The aim of the thesis work is to predict the PD based on the audio files collected from various patients.Audio files are preprocessed in order to attain the features.The preprocessed data contains 23 attributes and 195 instances. On an average there are six voice recordings per person, By using data compression technique such as Discrete Cosine Transform (DCT) number of instances can be minimized, after data compression, attribute selection is done using several WEKA build in methods such as ChiSquared, GainRatio, Infogain after identifying the important attributes, we evaluate attributes one by one by using stepwise regression.Based on the selected attributes we process in WEKA by using cost sensitive classifier with various algorithms like MultiPass LVQ, Logistic Model Tree(LMT), K-Star.The classified results shows on an average 80%.By using this features 95% approximate classification of PD is acheived.This shows that using the audio dataset, PD could be predicted with a higher level of accuracy.
Resumo:
Contiene informacion sobre el Proyecto III del programa de trabajo del CCCT: preparacion e intercambio de material audiovisual para la educacion sobre los beneficios, limitaciones y peligros del desarrollo en ciencia y tecnologia.
Resumo:
Includes bibliography
Resumo:
Primate multisensory object perception involves distributed brain regions. To investigate the network character of these regions of the human brain, we applied data-driven group spatial independent component analysis (ICA) to a functional magnetic resonance imaging (fMRI) data set acquired during a passive audio-visual (AV) experiment with common object stimuli. We labeled three group-level independent component (IC) maps as auditory (A), visual (V), and AV, based on their spatial layouts and activation time courses. The overlap between these IC maps served as definition of a distributed network of multisensory candidate regions including superior temporal, ventral occipito-temporal, posterior parietal and prefrontal regions. During an independent second fMRI experiment, we explicitly tested their involvement in AV integration. Activations in nine out of these twelve regions met the max-criterion (A < AV > V) for multisensory integration. Comparison of this approach with a general linear model-based region-of-interest definition revealed its complementary value for multisensory neuroimaging. In conclusion, we estimated functional networks of uni- and multisensory functional connectivity from one dataset and validated their functional roles in an independent dataset. These findings demonstrate the particular value of ICA for multisensory neuroimaging research and using independent datasets to test hypotheses generated from a data-driven analysis.
Resumo:
This thesis explores the role of multimodality in language learners’ comprehension, and more specifically, the effects on students’ audio-visual comprehension when different orchestrations of modes appear in the visualization of vodcasts. Firstly, I describe the state of the art of its three main areas of concern, namely the evolution of meaning-making, Information and Communication Technology (ICT), and audio-visual comprehension. One of the most important contributions in the theoretical overview is the suggested integrative model of audio-visual comprehension, which attempts to explain how students process information received from different inputs. Secondly, I present a study based on the following research questions: ‘Which modes are orchestrated throughout the vodcasts?’, ‘Are there any multimodal ensembles that are more beneficial for students’ audio-visual comprehension?’, and ‘What are the students’ attitudes towards audio-visual (e.g., vodcasts) compared to traditional audio (e.g., audio tracks) comprehension activities?’. Along with these research questions, I have formulated two hypotheses: Audio-visual comprehension improves when there is a greater number of orchestrated modes, and students have a more positive attitude towards vodcasts than traditional audios when carrying out comprehension activities. The study includes a multimodal discourse analysis, audio-visual comprehension tests, and students’ questionnaires. The multimodal discourse analysis of two British Council’s language learning vodcasts, entitled English is GREAT and Camden Fashion, using ELAN as the multimodal annotation tool, shows that there are a variety of multimodal ensembles of two, three and four modes. The audio-visual comprehension tests were given to 40 Spanish students, learning English as a foreign language, after the visualization of vodcasts. These comprehension tests contain questions related to specific orchestrations of modes appearing in the vodcasts. The statistical analysis of the test results, using repeated-measures ANOVA, reveal that students obtain better audio-visual comprehension results when the multimodal ensembles are constituted by a greater number of orchestrated modes. Finally, the data compiled from the questionnaires, conclude that students have a more positive attitude towards vodcasts in comparison to traditional audio listenings. Results from the audio-visual comprehension tests and questionnaires prove the two hypotheses of this study.