Biblioteca Digital

999 resultados para movie audio tracks

Complete-linkage clustering for voice activity detection in audio and visual speech

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose a novel technique for conducting robust voice activity detection (VAD) in high-noise recordings. We use Gaussian mixture modeling (GMM) to train two generic models; speech and non-speech. We then score smaller segments of a given (unseen) recording against each of these GMMs to obtain two respective likelihood scores for each segment. These scores are used to compute a dissimilarity measure between pairs of segments and to carry out complete-linkage clustering of the segments into speech and non-speech clusters. We compare the accuracy of our method against state-of-the-art and standardised VAD techniques to demonstrate an absolute improvement of 15% in half-total error rate (HTER) over the best performing baseline system and across the QUT-NOISE-TIMIT database. We then apply our approach to the Audio-Visual Database of American English (AVDBAE) to demonstrate the performance of our algorithm in using visual, audio-visual or a proposed fusion of these features.

Acoustic adaptation in cross database audio visual SHMM training for phonetic spoken term detection

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Visual information in the form of lip movements of the speaker has been shown to improve the performance of speech recognition and search applications. In our previous work, we proposed cross database training of synchronous hidden Markov models (SHMMs) to make use of external large and publicly available audio databases in addition to the relatively small given audio visual database. In this work, the cross database training approach is improved by performing an additional audio adaptation step, which enables audio visual SHMMs to benefit from audio observations of the external audio models before adding visual modality to them. The proposed approach outperforms the baseline cross database training approach in clean and noisy environments in terms of phone recognition accuracy as well as spoken term detection (STD) accuracy.

Cross database training of audio-visual hidden Markov models for phone recognition

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Speech recognition can be improved by using visual information in the form of lip movements of the speaker in addition to audio information. To date, state-of-the-art techniques for audio-visual speech recognition continue to use audio and visual data of the same database for training their models. In this paper, we present a new approach to make use of one modality of an external dataset in addition to a given audio-visual dataset. By so doing, it is possible to create more powerful models from other extensive audio-only databases and adapt them on our comparatively smaller multi-stream databases. Results show that the presented approach outperforms the widely adopted synchronous hidden Markov models (HMM) trained jointly on audio and visual data of a given audio-visual database for phone recognition by 29% relative. It also outperforms the external audio models trained on extensive external audio datasets and also internal audio models by 5.5% and 46% relative respectively. We also show that the proposed approach is beneficial in noisy environments where the audio source is affected by the environmental noise.

RNASeqBrowser: A genome browser for simultaneous visualization of raw strand specific RNAseq reads and UCSC genome browser custom tracks

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background Strand specific RNAseq data is now more common in RNAseq projects. Visualizing RNAseq data has become an important matter in Analysis of sequencing data. The most widely used visualization tool is the UCSC genome browser that introduced the custom track concept that enabled researchers to simultaneously visualize gene expression at a particular locus from multiple experiments. Our objective of the software tool is to provide friendly interface for visualization of RNAseq datasets. Results This paper introduces a visualization tool (RNASeqBrowser) that incorporates and extends the functionality of the UCSC genome browser. For example, RNASeqBrowser simultaneously displays read coverage, SNPs, InDels and raw read tracks with other BED and wiggle tracks -- all being dynamically built from the BAM file. Paired reads are also connected in the browser to enable easier identification of novel exon/intron borders and chimaeric transcripts. Strand specific RNAseq data is also supported by RNASeqBrowser that displays reads above (positive strand transcript) or below (negative strand transcripts) a central line. Finally, RNASeqBrowser was designed for ease of use for users with few bioinformatic skills, and incorporates the features of many genome browsers into one platform. Conclusions The features of RNASeqBrowser: (1) RNASeqBrowser integrates UCSC genome browser and NGS visualization tools such as IGV. It extends the functionality of the UCSC genome browser by adding several new types of tracks to show NGS data such as individual raw reads, SNPs and InDels. (2) RNASeqBrowser can dynamically generate RNA secondary structure. It is useful for identifying non-coding RNA such as miRNA. (3) Overlaying NGS wiggle data is helpful in displaying differential expression and is simple to implement in RNASeqBrowser. (4) NGS data accumulates a lot of raw reads. Thus, RNASeqBrowser collapses exact duplicate reads to reduce visualization space. Normal PC’s can show many windows of NGS individual raw reads without much delay. (5) Multiple popup windows of individual raw reads provide users with more viewing space. This avoids existing approaches (such as IGV) which squeeze all raw reads into one window. This will be helpful for visualizing multiple datasets simultaneously. RNASeqBrowser and its manual are freely available at http://www.australianprostatecentre.org/research/software/rnaseqbrowser webcite or http://sourceforge.net/projects/rnaseqbrowser/ webcite

Image processing and classification procedure for the analysis of Australian frog vocalisations

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Frog species have been declining worldwide at unprecedented rates in the past decades. There are many reasons for this decline including pollution, habitat loss, and invasive species [1]. To preserve, protect, and restore frog biodiversity, it is important to monitor and assess frog species. In this paper, a novel method using image processing techniques for analyzing Australian frog vocalisations is proposed. An FFT is applied to audio data to produce a spectrogram. Then, acoustic events are detected and isolated into corresponding segments through image processing techniques applied to the spectrogram. For each segment, spectral peak tracks are extracted with selected seeds and a region growing technique is utilised to obtain the contour of each frog vocalisation. Based on spectral peak tracks and the contour of each frog vocalisation, six feature sets are extracted. Principal component analysis reduces each feature set down to six principal components which are tested for classification performance with a k-nearest neighbor classifier. This experiment tests the proposed method of classification on fourteen frog species which are geographically well distributed throughout Queensland, Australia. The experimental results show that the best average classification accuracy for the fourteen frog species can be up to 87%.

Acoustic classification of Australian anurans using syllable features

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Acoustic classification of anurans (frogs) has received increasing attention for its promising application in biological and environment studies. In this study, a novel feature extraction method for frog call classification is presented based on the analysis of spectrograms. The frog calls are first automatically segmented into syllables. Then, spectral peak tracks are extracted to separate desired signal (frog calls) from background noise. The spectral peak tracks are used to extract various syllable features, including: syllable duration, dominant frequency, oscillation rate, frequency modulation, and energy modulation. Finally, a k-nearest neighbor classifier is used for classifying frog calls based on the results of principal component analysis. The experiment results show that syllable features can achieve an average classification accuracy of 90.5% which outperforms Mel-frequency cepstral coefficients features (79.0%).

Undead yakuza: The Japanese zombie movie, cultural resonance and generic conventions

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The zombie has long been regarded as a “fundamentally American creation” (Bishop 2010) and a western monster representing the fears and anxieties of Western society. Since the renaissance of the zombie movie in the early 2000s, a subsequent surge in international production has seen the release of movies from Norway, Cuba, Pakistan and Thailand to name a few. Although Japanese zombie movies have been far more visible for Western cult audiences than in mainstream markets, Japanese cinema has emerged as one of the more prolific producers of zombie films outside of Anglophone or Western European countries in recent years. Films such as Helldriver (2010), Zombie TV (2013), Versus (2000), Tokyo Zombie (2005), Happiness of the Katakuris (2001) and anime television series High School of the Dead (2010) have generated varying degrees of popularity and critical attention internationally. At first glance Japanese zombie films, with musical zombie interludes, undead yakuza henchmen and revenge-seeking yūrei zombies, appear fundamentally different to their Western counterparts. Yet, on closer examination, the Japanese zombie movie could be regarded as a hybrid and intertextual generic form drawing on syntactic conventions at the core of a universal zombie sub-genre established by Western filmmaking traditions, while also distilling culturally specific tropes unique to various Japanese horror cinema sub-genres. Most importantly, the Japanese zombie film extracts, emphasises and revises particular conventions and motifs common within Western zombie films that are particularly relevant to Japanese audiences. This chapter investigates the cultural resonance of key generic motifs identifiable in the Japanese zombie film. It establishes a production context and the influence of Japanese horror cinema on style and thematic concerns. It then examines the function of prominent narrative conventions, namely: the source, outbreak and spread of infection; mutation and the representation of the monster; and the inclusion of supernatural and religious motifs.

Alfred Lichtenstein making a movie; Berlin.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Digital Image

The navigation and visualisation of environmental audio using zooming spectrograms

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Acoustic recordings play an increasingly important role in monitoring terrestrial and aquatic environments. However, rapid advances in technology make it possible to accumulate thousands of hours of recordings, more than ecologists can ever listen to. Our approach to this big-data challenge is to visualize the content of long-duration audio recordings on multiple scales, from minutes, hours, days to years. The visualization should facilitate navigation and yield ecologically meaningful information prior to listening to the audio. To construct images, we calculate acoustic indices, statistics that describe the distribution of acoustic energy and reflect content of ecological interest. We combine various indices to produce false-color spectrogram images that reveal acoustic content and facilitate navigation. The technical challenge we investigate in this work is how to navigate recordings that are days or even months in duration. We introduce a method of zooming through multiple temporal scales, analogous to Google Maps. However, the “landscape” to be navigated is not geographical and not therefore intrinsically visual, but rather a graphical representation of the underlying audio. We describe solutions to navigating spectrograms that range over three orders of magnitude of temporal scale. We make three sets of observations: 1. We determine that at least ten intermediate scale steps are required to zoom over three orders of magnitude of temporal scale; 2. We determine that three different visual representations are required to cover the range of temporal scales; 3. We present a solution to the problem of maintaining visual continuity when stepping between different visual representations. Finally, we demonstrate the utility of the approach with four case studies.

Alfred Lichtenstein making a movie; Berlin.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Digital Image

Efficacy of nano-silver in alleviating bacteria-related blockage in cut rose cv. Movie Star stems

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Postharvest treatments with nano-silver (NS) significantly improve water relations and therefore prolong the vase life of several cut flowers, including rose (Rosa hybrida cv. Movie Star). The efficacy of NS in alleviating bacterial related blockage in the stem-ends of cut cv. Movie Star was further investigated. Four dominant bacteria strains Pseudomonas fluorescens, Aeromonas sp., Comamonas acidovorans and Chryseomonas luteola were isolated from the stem-ends of cut roses. High numbers of the isolated bacteria at 10 8colony forming unitsmL -1 vase solution led to a sharp reduction in vase life, flower fresh weight, and water uptake. In vitro assessments of the antibacterial activity of NS against the four bacterial strains was >80% at 5mgL -1 and nearly 100% at 50mgL -1. Bacterial blockage in the stem-ends of cut cv. Movie Star roses with and without NS pulse treatments was assessed during the vase period using scanning electron microscopy. Following a 50mgL -1 NS pulse treatment, there were few bacterial cells on the cut surface of the stems even on day 7. Moreover, no obvious bacterial blockage was observed inside the xylem vessels. In contrast, the cut surface of control stems was covered with bacteria and associated amorphous substances, and numerous bacteria were found in the xylem vessels. © 2012 Elsevier B.V.

Perceptions of audio feedback in higher education assessment

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The aim of this paper is to present results of research investigating the effectiveness of audio feedback in a third year undergraduate unit. While there is a large and growing body of literature about providing assessment feedback, there is little focussing on the use of audio media. This study employs a mixed method approach, involving semi-structured interviews with academic staff and a survey of students. Analysis of the interview data suggests that there are a number of issues surrounding acceptance of using audio feedback by lecturers. The next stage of the study is to examine the extent to which lecturers change their perceptions as they use audio feedback and to analyse the perceptions of the students (n=120), including the perceived importance of feedback, the ways in which they used the audio feedback and the extent to which they believe they control events that affect them. Ultimately, this study seeks to provide recommendations appropriate to the implementation of audio feedback in higher education.

'Take away from the dry sixties style marking': Lecturer and student perceptions and experiences of audio feedback

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Providing audio feedback to assessment is relatively uncommon in higher education. However, published research suggests that it is preferred over written feedback by students but lecturers were less convinced. The aim of this paper is to examine further these findings in the context of a third year business ethics unit. Data was collected from two sources. The first is a series of in-depth, semi-structured interviews conducted with three lecturers providing audio feeback for the first time in Semester One 2011. The second source of data was drawn from the university student evaluation system. A total of 363 responses were used providing 'before' and 'after' perspectives about the effectiveness of audio feedback versus written feedback. Between 2005 and 2009 the survey data provided information about student attitudes to written assessment feedback (n=261). From 2010 onwards the data relates to audio (mp3) feedback (n=102). The analysis of he interview data indicated that introducing audio feedback should be done with care. The perception of the participating lecturers was mixed, ranging from sceptism to outright enthusiasm, but over time the overall approach became positive. It was found that particular attention needs to be paid to small (but important) technical details, and lecturers need to be convinced of its effectieness, especially that it is not necessarily more time consuming than providing written feedback. For students, the analysis revealed a clear preference for audio feedback. It is concluded that there is cause for concern and reason for optimism. It is a cause for concern because there is a possibility that scepticism on the part of academic staff seems to be based on assumptions about what students prefer and a concern about using the technology. There is reason for optimism because the evidence points towards students preferring audio feedback and as academic staff become more familiar with the technology the scepticism tends to evaporate. While this study is limited in scope, questions are raised about tackling negative staff perceptions of audio feedback that are worthy of further research.

Content-based birdcall retrieval from environmental audio

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This research investigates techniques to analyse long duration acoustic recordings to help ecologists monitor birdcall activities. It designs a generalized algorithm to identify a broad range of bird species. It allows ecologists to search for arbitrary birdcalls of interest, rather than restricting them to just a very limited number of species on which the recogniser is trained. The algorithm can help ecologists find sounds of interest more efficiently by filtering out large volumes of unwanted sounds and only focusing on birdcalls.

East Asian audio-visual collaboration and the global expansion of Chinese media

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In recent years, many of the world’s leading media producers, screenwriters, technicians and investors, particularly those in the Asia-Pacific region, have been drawn to work in the People's Republic of China (hereafter China or Mainland China). Media projects with a lighter commercial entertainment feel – compared with the heavy propaganda-oriented content of the past – have multiplied, thanks to the Chinese state’s newfound willingness to consider collaboration with foreign partners. This is no more evident than in film. Despite their long-standing reputation for rigorous censorship, state policymakers are now encouraging Chinese media entrepreneurs to generate fresh ideas and to develop products that will revitalise the stagnant domestic production sector. It is hoped that an increase in both the quality and quantity of domestic feature films, stimulated by an infusion of creativity and cutting-edge technology from outside the country, will help reverse China’s ‘cultural trade deficit’ (wenhua maoyi chizi) (Keane 2007).

«
1
2
3
4
5
6
7
8
...
66
67
»