805 results for audio data classification
Abstract:
Speaker diarization is the process of annotating an input audio recording with information that attributes temporal regions of the audio signal to their respective sources, which may include both speech and non-speech events. For speech regions, the diarization system also specifies the locations of speaker boundaries and assigns relative speaker labels to each homogeneous segment of speech. In short, speaker diarization systems effectively answer the question of ‘who spoke when’. Speaker diarization technology has several important applications, such as facilitating speaker indexing systems that let users directly access the segments of interest within a given recording, and assisting other downstream processes such as summarization and parsing. When combined with automatic speech recognition (ASR) systems, the metadata extracted by a speaker diarization system can complement ASR transcripts with information such as the location of speaker turns and relative speaker segment labels, making the transcripts more readable. Speaker diarization output can also be used to localize the instances of specific speakers and pool their data for model adaptation, which in turn improves transcription accuracy. Speaker diarization therefore plays an important role as a preliminary step in the automatic transcription of audio data. The aim of this work is to improve the usefulness and practicality of speaker diarization technology by reducing diarization error rates. In particular, this research focuses on the segmentation and clustering stages of a diarization system. Although particular emphasis is placed on the broadcast news domain, and the systems developed throughout this work are trained and tested on broadcast news data, the techniques proposed in this dissertation are also applicable to other domains, including telephone conversations and meeting audio.
Three main research themes were pursued: heuristic rules for speaker segmentation, modelling uncertainty in speaker model estimates, and modelling uncertainty in eigenvoice speaker modelling. The use of heuristic approaches for the speaker segmentation task was investigated first, with emphasis on minimizing missed boundary detections. A set of heuristic rules was proposed to govern the detection and selection of candidate speaker segment boundaries. A second pass, using the same heuristic algorithm with a smaller window, was also proposed to improve the detection of boundaries around short speaker segments. Compared to single-threshold methods, the proposed heuristic approach was shown to improve segmentation performance, leading to a reduction in the overall diarization error rate. Methods to model the uncertainty in speaker model estimates were developed to address the difficulty of making segmentation and clustering decisions when the speaker segments contain limited data. The Bayes factor, derived specifically for multivariate Gaussian speaker modelling, was introduced to account for the uncertainty of the speaker model estimates. The use of the Bayes factor also enabled prior information about the audio to be incorporated into segmentation and clustering decisions. The idea of modelling uncertainty in speaker model estimates was also extended to the eigenvoice speaker modelling framework for the speaker clustering task. Building on the application of Bayesian approaches to the speaker diarization problem, the proposed approach takes into account the uncertainty associated with the explicit estimation of the speaker factors. The proposed decision criteria, based on Bayesian theory, were shown to generally outperform their non-Bayesian counterparts.
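A common single-threshold baseline for speaker change detection is the ΔBIC criterion, which compares one full-covariance Gaussian fitted to a whole window against two Gaussians fitted to the halves on either side of a candidate boundary. The sketch below illustrates that baseline only; it is not the thesis's Bayes factor or its heuristic rules, and the function names, the penalty weight `lam`, the scan margin and the synthetic two-dimensional "features" are assumptions made for illustration.

```python
import numpy as np

def _logdet_cov(segment: np.ndarray) -> float:
    # Maximum-likelihood (biased) full covariance of the frames, then its log-determinant.
    cov = np.atleast_2d(np.cov(segment, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return float(logdet)

def delta_bic(frames: np.ndarray, t: int, lam: float = 1.0) -> float:
    """Delta-BIC for a candidate speaker change after frame t.

    frames: (N, d) array of acoustic feature vectors (e.g. MFCCs).
    Positive values favour two speakers (a boundary); negative values favour one.
    """
    n, d = frames.shape
    gain = 0.5 * (n * _logdet_cov(frames)
                  - t * _logdet_cov(frames[:t])
                  - (n - t) * _logdet_cov(frames[t:]))
    # Penalty for the extra mean vector and full covariance of the second model.
    n_params = d + d * (d + 1) / 2
    return float(gain - lam * 0.5 * n_params * np.log(n))

def best_boundary(frames: np.ndarray, margin: int = 20, lam: float = 1.0):
    """Scan interior split points; return (t, score) of the highest Delta-BIC."""
    scores = {t: delta_bic(frames, t, lam) for t in range(margin, len(frames) - margin)}
    t = max(scores, key=scores.get)
    return t, scores[t]

# Synthetic data: one "speaker" versus two "speakers" with different means.
rng = np.random.default_rng(0)
one_speaker = rng.normal(0.0, 1.0, size=(400, 2))
two_speakers = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
                          rng.normal(5.0, 1.0, size=(200, 2))])
print(delta_bic(one_speaker, 200), delta_bic(two_speakers, 200))
print(best_boundary(two_speakers))
```

A single threshold (here zero, scaled by `lam`) decides every boundary. Because ΔBIC relies on point estimates of the segment covariances, its decisions become unreliable for short segments, which is the limited-data problem the Bayes factor described above is meant to address.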
Abstract:
This paper investigates how experienced birders can be engaged, as volunteer citizen scientists, to analyze large recorded audio datasets gathered through environmental acoustic monitoring. Although audio data is straightforward to gather, automated analysis remains a challenging task; the existing expertise, local knowledge and motivation of the birder community can complement computational approaches and provide distinct benefits. We explored both the culture and practice of birders, and paradigms for interacting with recorded audio data. A variety of candidate design elements were tested with birders. This study contributes an understanding of how virtual interactions and practices can be developed to complement the existing practices of experienced birders in the physical world. In so doing, this study contributes a new approach to engagement in e-science. Whereas most citizen science projects task lay participants with discrete real-world or artificial activities, sometimes using extrinsic motivators, this approach builds on existing, intrinsically satisfying practices.
Abstract:
The role that specific emotions, such as pride and triumph, play during instruction in science education is under-researched. Emotions are recognized as central to learning, yet little is known about how they are produced in naturalistic settings, how emotions relate to classroom learning during interactions, and what antecedent factors are associated with emotional experiences during instruction. Data sources for the study include emotion diaries, student written artifacts, video recordings of class interactions, and interviews. Emotions produced in the moment during classroom interactions are analyzed from video and audio data through a novel theoretical framework related to the sociology of human emotions. These direct observations are compared with students’ recollected emotional experiences reported through emotion diaries and interviews. The study establishes links between pride and triumph within classroom interactions and instructional tasks during learning episodes in a naturalistic setting. We discuss particular classroom activities that are associated with justified feelings of pride and triumph. More specifically, classroom events associated with these emotions were related to understanding science concepts, social interactions, and achieving success on challenging tasks.
Abstract:
Environmental sensors collect massive amounts of audio data. This thesis investigates computational methods to support human analysts in identifying faunal vocalisations from that audio. A series of experiments was conducted to trial the effectiveness of novel user interfaces. This research examines the rapid scanning of spectrograms, decision support tools for users, and cleaning methods for folksonomies. Together, these investigations demonstrate that providing computational support to human analysts increases their efficiency and accuracy; this allows bioacoustics projects to efficiently utilise their valuable human analysts.
Abstract:
Bioacoustic monitoring has become a significant research topic for species diversity conservation. Owing to advances in sensing techniques, acoustic sensors are widely deployed in the field to record animal sounds over large spatial and temporal scales. With large volumes of collected audio data, it is essential to develop semi-automatic or automatic techniques to analyse the data, which can help ecologists decide how to protect and promote species diversity. This paper presents generic features that characterize a range of bird species for vocalisation retrieval. In the implementation, audio recordings are first converted to spectrograms using the short-time Fourier transform, and a ridge detection method is then applied to each spectrogram to detect points of interest. Based on the detected points, a new region representation is explored for describing various bird vocalisations, and a local descriptor comprising temporal entropy, frequency bin entropy and a histogram of counts of four ridge directions is calculated for each sub-region. To speed up retrieval, indexing is carried out, and the retrieved results are ranked according to similarity scores. Experimental results show that the proposed feature set achieves a retrieval success rate of 0.71, outperforming spectral ridge features alone (0.55) and Mel-frequency cepstral coefficients (0.36).
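The two entropy terms of the local descriptor can be sketched as follows, assuming that "temporal entropy" and "frequency bin entropy" mean the normalized Shannon entropy of a sub-region's magnitude summed over frequency and over time, respectively. The paper's exact definitions, its ridge detector and the four-direction ridge histogram are not reproduced; the window size, hop, helper names and test tone are illustrative choices.

```python
import numpy as np

def spectrogram(signal: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude STFT with a Hann window; returns an array of shape (freq_bins, frames)."""
    window = np.hanning(n_fft)
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.array([signal[s:s + n_fft] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def normalized_entropy(energy: np.ndarray) -> float:
    """Shannon entropy of a non-negative profile, scaled to [0, 1] by log2(len)."""
    if len(energy) < 2 or energy.sum() == 0:
        return 0.0
    p = energy / energy.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum() / np.log2(len(energy)))

def region_entropies(region: np.ndarray) -> tuple:
    """(temporal entropy, frequency-bin entropy) of one spectrogram sub-region."""
    temporal = normalized_entropy(region.sum(axis=0))   # energy profile over time frames
    frequency = normalized_entropy(region.sum(axis=1))  # energy profile over frequency bins
    return temporal, frequency

fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 2000 * t)  # steady 2 kHz whistle-like tone, 1 s long
temporal_h, frequency_h = region_entropies(spectrogram(tone))
print(round(temporal_h, 3), round(frequency_h, 3))
```

A steady tone spreads its energy evenly over time but concentrates it in a few frequency bins, so it should score high on temporal entropy and low on frequency-bin entropy; a broadband click would show the opposite pattern.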
Abstract:
The usual task in music information retrieval (MIR) is to find occurrences of a monophonic query pattern within a music database, which can contain both monophonic and polyphonic content. So-called query-by-humming systems are a well-known instance of content-based MIR. In such a system, the user's hummed query is converted into symbolic form to perform search operations in a similarly encoded database. The symbolic representation (e.g., textual, MIDI or vector data) is typically a quantized and simplified version of the sampled audio data, yielding faster search algorithms and space requirements that can be met in real-life situations. In this thesis, we investigate geometric approaches to MIR. We first study some musicological properties often needed in MIR algorithms, and then review the literature on traditional (e.g., string-matching-based) MIR algorithms and novel geometry-based techniques. We also introduce some concepts from digital image processing, namely mathematical morphology, which we use to develop and implement four algorithms for geometric music retrieval. In our algorithms, the symbolic representation is a binary 2-D image. We apply various morphological pre- and post-processing operations to the query and database images to perform template matching / pattern recognition. The algorithms are essentially extensions of the classic image correlation and hit-or-miss transform techniques widely used in template matching applications. They are intended as a future extension of the retrieval engine of C-BRAHMS, a research project of the Department of Computer Science at the University of Helsinki.
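To illustrate the geometric view, a piano roll can be stored as a binary image (rows are semitones, columns are time steps); sliding the query over it and counting coinciding notes is plain binary correlation. Positions where every query note is found are exactly the points kept by a binary erosion of the database by the query, the "hit" half of the hit-or-miss transform. This is a minimal sketch with hypothetical helper names and toy data, not one of the thesis's four algorithms or the C-BRAHMS engine.

```python
import numpy as np

def match_scores(db: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Binary correlation: entry [p, t] counts query notes that land on database
    notes when the query is shifted up p semitones and right t time steps."""
    n_pitch, n_time = db.shape
    q_pitch, q_time = query.shape
    scores = np.zeros((n_pitch - q_pitch + 1, n_time - q_time + 1), dtype=int)
    for p in range(scores.shape[0]):
        for t in range(scores.shape[1]):
            scores[p, t] = np.count_nonzero(db[p:p + q_pitch, t:t + q_time] & query)
    return scores

def exact_occurrences(db: np.ndarray, query: np.ndarray) -> list:
    """Shifts where every query note is present (equivalent to binary erosion)."""
    scores = match_scores(db, query)
    return [(int(p), int(t)) for p, t in np.argwhere(scores == np.count_nonzero(query))]

query = np.zeros((5, 3), dtype=bool)
query[0, 0] = query[2, 1] = query[4, 2] = True   # rising motif: +2, +2 semitones

db = np.zeros((12, 8), dtype=bool)
for p, t in [(0, 0), (2, 1), (4, 2),          # motif at its original pitch
             (4, 3), (6, 4), (8, 5),          # same motif transposed up 4 semitones
             (0, 4), (11, 4)]:                # extra chord tones (polyphony)
    db[p, t] = True

print(exact_occurrences(db, query))
```

Shifting along the pitch axis makes the search transposition-invariant, and extra notes in the database window do not prevent a match, which is what makes this formulation suitable for polyphonic content. Partial matches can be ranked by their correlation score.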
Abstract:
In this paper, the pattern classification problem in tool wear monitoring is solved using nature-inspired techniques such as Genetic Programming (GP) and Ant-Miner (AM). The main advantage of GP and AM is their ability to learn the underlying data relationships and express them as mathematical equations or simple rules. The knowledge extracted from the training data set takes the form of a Genetic Programming Classifier Expression (GPCE) for GP and of rules for AM. The GPCE and the AM-extracted rules are then applied to the data in the testing/validation set to obtain the classification accuracy. A major attraction of GP-evolved GPCEs and AM-based classification is the possibility of obtaining expert-system-like rules that the user can subsequently apply directly in his/her application. The classification accuracy obtained using GP and AM is as good as that obtained in the earlier study.
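Once rules have been extracted, applying them to the testing/validation set and measuring accuracy is straightforward. The sketch below assumes the rules form an ordered decision list with a default class (the first rule whose conditions all hold assigns the class), which is how Ant-Miner rule lists are applied; the feature names, thresholds, class labels and samples are hypothetical, not taken from the study.

```python
def predict(rules, default, sample):
    """Apply an ordered rule list: the first rule whose conditions all hold wins."""
    for conditions, label in rules:
        if all(sample[feature] <= threshold if op == "<=" else sample[feature] > threshold
               for feature, op, threshold in conditions):
            return label
    return default

def accuracy(rules, default, data):
    """Fraction of (sample, true_label) pairs classified correctly."""
    correct = sum(predict(rules, default, x) == y for x, y in data)
    return correct / len(data)

# Hypothetical rule list of the kind GP or Ant-Miner might produce.
rules = [
    ([("flank_force", ">", 0.8)], "worn"),
    ([("flank_force", "<=", 0.4), ("vibration", "<=", 0.3)], "fresh"),
]
# Hypothetical testing/validation samples (normalized sensor features).
test_set = [
    ({"flank_force": 0.9, "vibration": 0.5}, "worn"),
    ({"flank_force": 0.2, "vibration": 0.1}, "fresh"),
    ({"flank_force": 0.6, "vibration": 0.2}, "medium"),
    ({"flank_force": 0.3, "vibration": 0.6}, "fresh"),
]
print(accuracy(rules, "medium", test_set))
```

Because each rule is a readable conjunction of threshold tests, a user can check or apply it by hand, which is the expert-system-like property the abstract highlights.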
Abstract:
In Finland, there is a desperate need for flexible, reliable and functional multi-e-learning settings for pupils aged 11-13. Southern Finland has several ongoing e-learning projects, but none that develop a multiple setting, with learning and teaching taking place across more than two schools. In 2006, internet connections were not broadband and data transfer was mainly audio data. Connection and technical problems occurred, which were an obstacle to multi-e-learning. Internet connections today enable web-based learning in major parts of Lapland, and by 2015 broadband will reach even the remotest villages up north. Therefore, it is important to research the possibilities of multi-e-learning and to build collaborative, learner-centred, versatile network models for primary school-aged pupils. The resulting model will facilitate distance learning to extend education to rural, sparsely populated areas, and it will provide a model for using mobile devices in language portfolios. This will promote regional equality and prevent exclusion. Working with portfolios provides the opportunity to develop mobility from a pedagogical point of view. It is important to study the pros and cons of mobile devices in producing artefacts for portfolios in e-learning and language learning settings.
The current study represents a design-based research approach. The design research approach includes two important aspects concerning the current research: a ‘teacher as researcher’ aspect, which means the researcher can be strongly involved in the development processes, and an ‘obstacle’ aspect, which means that problems arising during development are seen as drivers of the evolving designed model, as opposed to negative results.
Abstract:
In this paper we show the applicability of Ant Colony Optimisation (ACO) techniques to the pattern classification problem that arises in tool wear monitoring. In an earlier study, artificial neural networks and genetic programming were successfully applied to the tool wear monitoring problem. ACO is a recent addition to evolutionary computation techniques and has gained attention for its ability to extract the underlying data relationships and express them in the form of simple rules. Rules for data classification are extracted using a training set of data points. These rules are then applied to the data in the testing/validation set to obtain the classification accuracy. A major attraction of ACO-based classification is the possibility of obtaining expert-system-like rules that the user can subsequently apply directly in his/her application. The classification accuracy obtained with the ACO-based approach is as good as that obtained with other biologically inspired techniques.
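The rule-quality measure and pheromone update at the core of the standard Ant-Miner algorithm can be sketched as below: a rule's quality is the product of its sensitivity and specificity, pheromone on the terms used by the rule is increased in proportion to that quality, and all pheromone values are then renormalized, which acts as evaporation for unused terms. Whether the cited study used exactly this variant is an assumption; the attribute terms and confusion counts are hypothetical.

```python
def rule_quality(tp, fp, tn, fn):
    """Ant-Miner rule quality: sensitivity * specificity."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity * specificity

def update_pheromone(tau, rule_terms, quality):
    """Reinforce the terms used by a rule, then renormalize all terms to sum to 1."""
    tau = dict(tau)  # leave the caller's table untouched
    for term in rule_terms:
        tau[term] += tau[term] * quality
    total = sum(tau.values())
    return {term: value / total for term, value in tau.items()}

# Hypothetical terms: (attribute, discretized value) pairs. Initial pheromone is
# 1 / (total number of attribute values), here 1 / 4.
tau0 = {("force", "high"): 0.25, ("force", "low"): 0.25,
        ("vibration", "high"): 0.25, ("vibration", "low"): 0.25}
q = rule_quality(tp=8, fp=2, tn=18, fn=2)   # 0.8 * 0.9
tau1 = update_pheromone(tau0, [("force", "high")], q)
print(q, tau1)
```

Over many ants, terms that appear in high-quality rules accumulate pheromone and become more likely to be chosen when later rules are constructed.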
Abstract:
This paper presents an improved hierarchical clustering algorithm for the land cover mapping problem using a quasi-random distribution. Initially, Niche Particle Swarm Optimization (NPSO) with a pseudo-random or quasi-random distribution is used to split the data into a number of cluster centers by satisfying the Bayesian Information Criterion (BIC). The main objective is to search for and locate the best possible number of clusters and their centers. NPSO, which depends strongly on the initial distribution of particles in the search space, has not been exploited to its full potential. In this study, we compare the more uniformly distributed quasi-random distribution with the pseudo-random distribution in NPSO for splitting the data set. The Faure method is used to generate the quasi-random distribution. The performance of previously proposed methods, namely K-means, Mean Shift Clustering (MSC) and NPSO with a pseudo-random distribution, is compared with that of the proposed approach, NPSO with a quasi-random (Faure) distribution. These algorithms are applied to a synthetic data set and a multi-spectral satellite image (Landsat 7 Thematic Mapper). From the results obtained, we conclude that using a quasi-random sequence with NPSO in the hierarchical clustering algorithm results in more accurate data classification.
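The difference between the two initializations is easy to see in code. The sketch below generates a Faure sequence (base = smallest prime not less than the dimension; coordinate k applies the k-th power of the Pascal matrix, modulo the base, to the base-b digits of the point index) and scales it to the search bounds as a hypothetical particle-initialization step for NPSO. Function names and bounds are illustrative; the paper's NPSO and BIC-based splitting are not reproduced.

```python
from math import comb

def smallest_prime_at_least(n):
    candidate = max(2, n)
    while any(candidate % k == 0 for k in range(2, int(candidate ** 0.5) + 1)):
        candidate += 1
    return candidate

def faure(n_points, dim):
    """First n_points of the Faure sequence in [0, 1)^dim (no points skipped)."""
    base = smallest_prime_at_least(dim)
    points = []
    for n in range(n_points):
        digits = []  # base-b digits of n, least significant first
        m = n
        while m:
            digits.append(m % base)
            m //= base
        point = []
        for k in range(dim):
            # Entry (i, j) of the k-th power of the Pascal matrix is C(j, i) * k^(j - i);
            # k = 0 gives the identity, i.e. the van der Corput sequence in base b.
            y = [sum(comb(j, i) * k ** (j - i) * digits[j] for j in range(i, len(digits))) % base
                 for i in range(len(digits))]
            point.append(sum(d * base ** -(i + 1) for i, d in enumerate(y)))
        points.append(tuple(point))
    return points

def init_swarm(n_particles, lower, upper):
    """Map Faure points onto the search box [lower, upper] (hypothetical NPSO init)."""
    unit = faure(n_particles, len(lower))
    return [tuple(lo + u * (hi - lo) for u, lo, hi in zip(p, lower, upper)) for p in unit]

print(faure(4, 2))
print(init_swarm(4, [0.0, 0.0], [10.0, 20.0]))
```

Replacing `faure` with uniform pseudo-random draws (for example `random.random()` per coordinate) gives the baseline initialization; for small swarms the Faure points cover the search box more evenly, leaving fewer empty regions.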
Abstract:
The tonic is a fundamental concept in Indian art music. It is the base pitch that an artist chooses in order to construct the melodies during a rāg(a) rendition, and all accompanying instruments are tuned to the tonic pitch. Consequently, tonic identification is a fundamental task for most computational analyses of Indian art music, such as intonation analysis, melodic motif analysis and rāg recognition. In this paper we review existing approaches for tonic identification in Indian art music and evaluate them on six diverse datasets for a thorough comparison and analysis. We study the performance of each method in different contexts, such as the presence or absence of additional metadata, the quality of the audio data, the duration of the audio data, the music tradition (Hindustani/Carnatic) and the gender of the singer (male/female). We show that the approaches that combine multi-pitch analysis with machine learning provide the best performance in most cases (90% identification accuracy on average) and are more robust across these contexts than the approaches based on expert knowledge. We also show that the performance of the latter can be improved when additional metadata is available to further constrain the problem. Finally, we present a detailed error analysis of each method, providing further insights into the advantages and limitations of the methods.
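The general idea of histogram-based tonic estimation can be illustrated with a deliberately simplified heuristic: convert a predominant-pitch (f0) track to cents, build a histogram over a plausible tonic range, and take the most frequent bin. This is not any of the reviewed methods; the 55 Hz reference, the 10-cent bins and the 100 to 260 Hz search range (which would need adjusting for the singer's gender) are assumptions, and the most frequent pitch is not always the tonic, so this is only a starting point.

```python
import math
from collections import Counter

REF_HZ = 55.0  # arbitrary reference for the cent scale (assumption)

def to_cents(f_hz):
    return 1200.0 * math.log2(f_hz / REF_HZ)

def tonic_candidate(f0_track, low_hz=100.0, high_hz=260.0, bin_cents=10):
    """Centre of the most frequent pitch bin within [low_hz, high_hz], in Hz (or None)."""
    hist = Counter()
    for f in f0_track:
        if low_hz <= f <= high_hz:  # also drops unvoiced frames reported as 0 Hz
            hist[round(to_cents(f) / bin_cents)] += 1
    if not hist:
        return None
    best_bin, _ = hist.most_common(1)[0]
    return REF_HZ * 2 ** (best_bin * bin_cents / 1200.0)

# Hypothetical f0 track (Hz): unvoiced frames, a long-held note, then other notes.
track = [0.0] * 5 + [146.8] * 10 + [164.8] * 4 + [220.0] * 6
print(tonic_candidate(track))
```

Restricting the search range is one simple way that metadata (such as the singer's gender) can constrain the problem, in the spirit of the metadata effect reported above.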
Abstract:
This socio-historical study focuses on the entry of nurses as officers of the Brazilian Air Force (Força Aérea Brasileira, FAB) through the pioneering Women's Officer Corps (Quadro Feminino de Oficiais, QFO). The starting point of the study is the beginning of the Military Adaptation Training (Estágio de Adaptação) on 2 August 1982 at the Air Force Specialized Training Center (Centro de Instrução Especializada da Aeronáutica, CIEAR), located in the city of Rio de Janeiro. The end point of the study is the completion of these nurses' initial mandatory two-year period of active service, which culminated in their promotion to the rank of First Lieutenant (1984). The objectives of the study are: to describe the circumstances of the nurses' entry into the QFO selection process; to analyze the process of incorporating the military habitus during the Adaptation Training; and to discuss the strategies the military nurses used in their struggle to occupy their rightful places in FAB hospitals. Data were collected through interviews conducted from April to May 2009 in FAB hospitals in the city of Rio de Janeiro. Five military nurses from the first QFO class were interviewed. The study was registered in SISNEP and approved by the FAB Ethics Committee. All participants signed the free and informed consent form and the oral testimony donation form. The method used was thematic oral history, and the theoretical framework was based on the thought of the French sociologist Pierre Bourdieu, whose concepts of symbolic power, habitus, field, social space and symbolic violence underpinned this dissertation. For data analysis and interpretation, we followed the data-ordering steps proposed by Maria Cecília Minayo, which comprised full transcription of the testimonies, chronological and thematic classification of the written documents, classification of the data, and final analysis.
The study found that several motives led the nurses to seek entry into the FAB, such as good pay, financial stability, career progression, opening up a new field of work, a distinctive clientele, retirement on full salary, and pioneering a role in the FAB. The aim of the Military Adaptation Training was to instill the military habitus in the candidates through teachings based on hierarchy, discipline, ethics, duty and military commitment. Upon entering FAB hospitals, the nurses were given various posts and duties, gaining symbolic power over the nursing team. The inevitable symbolic struggles of these nurses took place with military physicians, the nursing team, civilian nurses and the hospital administration itself, and revealed characteristic aspects of symbolic violence triggered by gender struggles and by the maintenance of power, since the nurses, holding the status of both supervisor and military officer, had entered an eminently male field.
Abstract:
With increasingly intense urban and industrial development, a key challenge today is to eliminate or reduce the impact of pollutant emissions into the atmosphere. In 2012, Rio de Janeiro hosted Rio+20, the United Nations Conference on Sustainable Development, attended by representatives from around the world. Among other topics, the green economy and sustainable development were discussed. Tropospheric O3 is an extremely important variable because of its strong environmental impact, and understanding the behavior of the parameters that affect a region's air quality is useful for forecasting scenarios. Atmospheric chemistry and meteorology are highly nonlinear, so air quality parameters are difficult to forecast. Air quality depends on emissions, meteorology and topography. The observed data were nitrogen dioxide (NO2), nitrogen monoxide (NO), nitrogen oxides (NOx), carbon monoxide (CO), ozone (O3), scalar wind speed (VEV), global solar radiation (RSG), temperature (TEM) and relative humidity (UR); they were collected in 2011 and 2012 by the mobile monitoring station of the Rio de Janeiro Environment Secretariat (SMAC) at two sites in the metropolitan area: the Pontifical Catholic University (PUC-Rio) and the Rio de Janeiro State University (UERJ).
This study had three objectives: (1) to analyze the behavior of the variables using principal component analysis (PCA) as an exploratory method; (2) to forecast O3 levels from primary pollutants and meteorological factors, comparing the effectiveness of nonlinear methods such as artificial neural networks (ANN) and support vector machine regression (SVM-R); and (3) to perform data classification using support vector machine classification (SVM-C). PCA showed that, for the PUC data set, the variables NO, NOx and VEV had the greatest impact on O3 concentration, whereas for the UERJ data set TEM and RSG were the most important variables. The results of the nonlinear regression techniques ANN and SVM were very close and acceptable for the UERJ data set, with validation coefficients of determination (R2) of 0.9122 and 0.9152 and root mean square errors (RMECV) of 7.66 and 7.85, respectively. For the PUC and PUC+UERJ data sets, both techniques gave less satisfactory results. For these data sets, SVM performed slightly better, and PCA, SVM and ANN proved robust, showing themselves to be useful tools for understanding, classifying and forecasting air quality scenarios.
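The validation figures quoted above (R2 and root mean square error) follow standard definitions, sketched below in plain Python; the same formulas apply whether the predictions come from a held-out validation split or from cross-validation. The observed and predicted O3 values are made up for illustration and are not from the PUC or UERJ data sets.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target (e.g. O3 concentration)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

observed = [10.0, 20.0, 30.0, 40.0]   # hypothetical O3 values
predicted = [12.0, 18.0, 33.0, 37.0]  # hypothetical model output
print(r2_score(observed, predicted), rmse(observed, predicted))
```

R2 is unitless and compares the model against simply predicting the mean, while RMSE keeps the units of the target, which is why the two are usually reported together.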
Abstract:
The Guardian newspaper (21st October 2005) informed its readers that: "Stanford University in California is to make its course content available on iTunes...The service, Stanford on iTunes, will provide…downloads of faculty lectures, campus events, performances, book readings, music recorded by Stanford students and even podcasts of Stanford football games". The emergence of podcasting as a means of delivering audio data to users has clearly excited educational technologists around the world. This paper explores the technologies behind podcasting and how they could be used to develop and deliver new e-learning material. The paper draws on the work done to create podcasts of lectures for University of Greenwich students.
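At its core, a podcast is an RSS 2.0 feed whose items carry an `<enclosure>` element pointing at an audio file (with its `url`, `length` in bytes and MIME `type`); podcast clients poll the feed and download new episodes. The sketch below builds such a feed with Python's standard `xml.etree.ElementTree`; the course title, URLs and file size are hypothetical, and the abstract does not describe the actual Greenwich setup.

```python
import xml.etree.ElementTree as ET

def build_podcast_feed(title, link, episodes):
    """Return an RSS 2.0 feed (as a string) with one <item> per lecture recording."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Recorded lectures for {title}"
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        # The enclosure is what turns an ordinary RSS item into a podcast episode.
        ET.SubElement(item, "enclosure", url=ep["url"],
                      length=str(ep["bytes"]), type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")

feed = build_podcast_feed(
    "Example Course 101",  # hypothetical course
    "https://example.org/course101",
    [{"title": "Lecture 1",
      "url": "https://example.org/course101/lecture1.mp3",
      "bytes": 12345678}],
)
print(feed)
```

Publishing a new lecture then only requires uploading the MP3 file and appending one `<item>` to the feed; subscribed students' clients pick it up automatically.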