926 resultados para audio programming
Resumo:
An approach to spatialization is described in which the pixels of an image determine both spatial and other attributes of individual elements in a multi-channel musical texture. The application of this technique in the author’s composition Spaced Images with Noise and Lines is discussed in detail. The relationship of this technique to existing image-to-sound mappings is discussed. The particular advantage of modifying spatial properties with image filters is considered.
Resumo:
The movement of graphics and audio programming towards three dimensions is to better simulate the way we experience our world. In this project I looked to use methods for coming closer to such simulation via realistic graphics and sound combined with a natural interface. I did most of my work on a Dell OptiPlex with an 800 MHz Pentium III processor and an NVIDlA GeForce 256 AGP Plus graphics accelerator -high end products in the consumer market as of April 2000. For graphics, I used OpenGL [1], an open·source, multi-platform set of graphics libraries that is relatively easy to use, coded in C . The basic engine I first put together was a system to place objects in a scene and to navigate around the scene in real time. Once I accomplished this, I was able to investigate specific techniques for making parts of a scene more appealing.
Resumo:
This paper proposes a new method for local key and chord estimation from audio signals. This method relies primarily on principles from music theory, and does not require any training on a corpus of labelled audio files. A harmonic content of the musical piece is first extracted by computing a set of chroma vectors. A set of chord/key pairs is selected for every frame by correlation with fixed chord and key templates. An acyclic harmonic graph is constructed with these pairs as vertices, using a musical distance to weigh its edges. Finally, the sequences of chords and keys are obtained by finding the best path in the graph using dynamic programming. The proposed method allows a mutual chord and key estimation. It is evaluated on a corpus composed of Beatles songs for both the local key estimation and chord recognition tasks, as well as a larger corpus composed of songs taken from the Billboard dataset.
Resumo:
The technical challenges in the design and programming of signal processors for multimedia communication are discussed. The development of terminal equipment to meet such demand presents a significant technical challenge, considering that it is highly desirable that the equipment be cost effective, power efficient, versatile, and extensible for future upgrades. The main challenges in the design and programming of signal processors for multimedia communication are, general-purpose signal processor design, application-specific signal processor design, operating systems and programming support and application programming. The size of FFT is programmable so that it can be used for various OFDM-based communication systems, such as digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T) and digital video broadcasting-handheld (DVB-H). The clustered architecture design and distributed ping-pong register files in the PAC DSP raise new challenges of code generation.
Resumo:
Introduction
The use of video capture of lectures in Higher Education is not a recent occurrence with web based learning technologies including digital recording of live lectures becoming increasing commonly offered by universities throughout the world (Holliman and Scanlon, 2004). However in the past decade the increase in technical infrastructural provision including the availability of high speed broadband has increased the potential and use of videoed lecture capture. This had led to a variety of lecture capture formats including pod casting, live streaming or delayed broadcasting of whole or part of lectures.
Additionally in the past five years there has been a significant increase in the popularity of online learning, specifically via Massive Open Online Courses (MOOCs) (Vardi, 2014). One of the key aspects of MOOCs is the simulated recording of lecture like activities. There has been and continues to be much debate on the consequences of the popularity of MOOCs, especially in relation to its potential uses within established University programmes.
There have been a number of studies dedicated to the effects of videoing lectures.
The clustered areas of research in video lecture capture have the following main themes:
• Staff perceptions including attendance, performance of students and staff workload
• Reinforcement versus replacement of lectures
• Improved flexibility of learning
• Facilitating engaging and effective learning experiences
• Student usage, perception and satisfaction
• Facilitating students learning at their own pace
Most of the body of the research has concentrated on student and faculty perceptions, including academic achievement, student attendance and engagement (Johnston et al, 2012).
Generally the research has been positive in review of the benefits of lecture capture for both students and faculty. This perception coupled with technical infrastructure improvements and student demand may well mean that the use of video lecture capture will continue to increase in frequency in the next number of years in tertiary education. However there is a relatively limited amount of research in the effects of lecture capture specifically in the area of computer programming with Watkins 2007 being one of few studies . Video delivery of programming solutions is particularly useful for enabling a lecturer to illustrate the complex decision making processes and iterative nature of the actual code development process (Watkins et al 2007). As such research in this area would appear to be particularly appropriate to help inform debate and future decisions made by policy makers.
Research questions and objectives
The purpose of the research was to investigate how a series of lecture captures (in which the audio of lectures and video of on-screen projected content were recorded) impacted on the delivery and learning of a programme of study in an MSc Software Development course in Queen’s University, Belfast, Northern Ireland. The MSc is conversion programme, intended to take graduates from non-computing primary degrees and upskill them in this area. The research specifically targeted the Java programming module within the course. It also analyses and reports on the empirical data from attendances and various video viewing statistics. In addition, qualitative data was collected from staff and student feedback to help contextualise the quantitative results.
Methodology, Methods and Research Instruments Used
The study was conducted with a cohort of 85 post graduate students taking a compulsory module in Java programming in the first semester of a one year MSc in Software Development. A pre-course survey of students found that 58% preferred to have available videos of “key moments” of lectures rather than whole lectures. A large scale study carried out by Guo concluded that “shorter videos are much more engaging” (Guo 2013). Of concern was the potential for low audience retention for videos of whole lectures.
The lecturers recorded snippets of the lecture directly before or after the actual physical delivery of the lecture, in a quiet environment and then upload the video directly to a closed YouTube channel. These snippets generally concentrated on significant parts of the theory followed by theory related coding demonstration activities and were faithful in replication of the face to face lecture. Generally each lecture was supported by two to three videos of durations ranging from 20 – 30 minutes.
Attendance
The MSc programme has several attendance based modules of which Java Programming was one element. In order to assess the consequence on attendance for the Programming module a control was established. The control used was a Database module which is taken by the same students and runs in the same semester.
Access engagement
The videos were hosted on a closed YouTube channel made available only to the students in the class. The channel had enabled analytics which reported on the following areas for all and for each individual video; views (hits), audience retention, viewing devices / operating systems used and minutes watched.
Student attitudes
Three surveys were taken in regard to investigating student attitudes towards the videoing of lectures. The first was before the start of the programming module, then at the mid-point and subsequently after the programme was complete.
The questions in the first survey were targeted at eliciting student attitudes towards lecture capture before they had experienced it in the programme. The midpoint survey gathered data in relation to how the students were individually using the system up to that point. This included feedback on how many videos an individual had watched, viewing duration, primary reasons for watching and the result on attendance, in addition to probing for comments or suggestions. The final survey on course completion contained questions similar to the midpoint survey but in summative view of the whole video programme.
Conclusions and Outcomes
The study confirmed findings of other such investigations illustrating that there is little or no effect on attendance at lectures. The use of the videos appears to help promote continual learning but they are particularly accessed by students at assessment periods. Students respond positively to the ability to access lectures digitally, as a means of reinforcing learning experiences rather than replacing them. Feedback from students was overwhelmingly positive indicating that the videos benefited their learning. Also there are significant benefits to part recording of lectures rather than recording whole lectures. The behaviour viewing trends analytics suggest that despite the increase in the popularity of online learning via MOOCs and the promotion of video learning on mobile devices in fact in this study the vast majority of students accessed the online videos at home on laptops or desktops However, in part, this is likely due to the nature of the taught subject, that being programming.
The research involved prerecording the lecture in smaller timed units and then uploading for distribution to counteract existing quality issues with recording entire live lectures. However the advancement and consequential improvement in quality of in situ lecture capture equipment may well help negate the need to record elsewhere. The research has also highlighted an area of potentially very significant use for performance analysis and improvement that could have major implications for the quality of teaching. A study of the analytics of the viewings of the videos could well provide a quick response formative feedback mechanism for the lecturer. If a videoed lecture either recorded live or later is a true reflection of the face to face lecture an analysis of the viewing patterns for the video may well reveal trends that correspond with the live delivery.
Resumo:
Tese de doutoramento, Ciências Biomédicas (Biologia do Desenvolvimento), Universidade de Lisboa, Faculdade de Medicina, 2014
Resumo:
Cette thèse étudie des modèles de séquences de haute dimension basés sur des réseaux de neurones récurrents (RNN) et leur application à la musique et à la parole. Bien qu'en principe les RNN puissent représenter les dépendances à long terme et la dynamique temporelle complexe propres aux séquences d'intérêt comme la vidéo, l'audio et la langue naturelle, ceux-ci n'ont pas été utilisés à leur plein potentiel depuis leur introduction par Rumelhart et al. (1986a) en raison de la difficulté de les entraîner efficacement par descente de gradient. Récemment, l'application fructueuse de l'optimisation Hessian-free et d'autres techniques d'entraînement avancées ont entraîné la recrudescence de leur utilisation dans plusieurs systèmes de l'état de l'art. Le travail de cette thèse prend part à ce développement. L'idée centrale consiste à exploiter la flexibilité des RNN pour apprendre une description probabiliste de séquences de symboles, c'est-à-dire une information de haut niveau associée aux signaux observés, qui en retour pourra servir d'à priori pour améliorer la précision de la recherche d'information. Par exemple, en modélisant l'évolution de groupes de notes dans la musique polyphonique, d'accords dans une progression harmonique, de phonèmes dans un énoncé oral ou encore de sources individuelles dans un mélange audio, nous pouvons améliorer significativement les méthodes de transcription polyphonique, de reconnaissance d'accords, de reconnaissance de la parole et de séparation de sources audio respectivement. L'application pratique de nos modèles à ces tâches est détaillée dans les quatre derniers articles présentés dans cette thèse. Dans le premier article, nous remplaçons la couche de sortie d'un RNN par des machines de Boltzmann restreintes conditionnelles pour décrire des distributions de sortie multimodales beaucoup plus riches. Dans le deuxième article, nous évaluons et proposons des méthodes avancées pour entraîner les RNN. Dans les quatre derniers articles, nous examinons différentes façons de combiner nos modèles symboliques à des réseaux profonds et à la factorisation matricielle non-négative, notamment par des produits d'experts, des architectures entrée/sortie et des cadres génératifs généralisant les modèles de Markov cachés. Nous proposons et analysons également des méthodes d'inférence efficaces pour ces modèles, telles la recherche vorace chronologique, la recherche en faisceau à haute dimension, la recherche en faisceau élagué et la descente de gradient. Finalement, nous abordons les questions de l'étiquette biaisée, du maître imposant, du lissage temporel, de la régularisation et du pré-entraînement.
Resumo:
Este proyecto pretende mostrar los desfases existentes entre señales de audio obtenidas de la misma fuente en distintos puntos distanciados entre sí. Para ello nos basamos en el análisis de la correlación de las señales de audio multi-microfónicas, para determinar los retrasos entre dichas señales. Durante las de tres partes diferentes que conforman este proyecto, explicaremos el dónde, cómo y por qué se produce este efecto en este tipo de señales. En la primera se presentan algunos de los conceptos teóricos necesarios para entender el desarrollo posterior, tales como la coherencia y correlación entre señales, los retardos de fase y la importancia del micro-tiempo. Además se explican diversas técnicas microfónicas que se utilizarán en la tercera parte. A lo largo de la segunda, se presenta el software desarrollado para determinar y corregir el retraso entre las señales que se deseen analizar. Para ello se ha escogido la herramienta de programación Matlab, ya que ha sido la más utilizada en la mayoría de las asignaturas que componen la titulación y por ello se posee el suficiente dominio de la misma. Además de presentar el propio software, al final de esta parte hay un manual de usuario del mismo, en el que se explica el manejo para posibles usos futuros por parte de otras personas interesadas. En la última parte se demuestra en varios casos reales, el estudio de la alineación de tomas multi-microfónicas en las cuales se produce en efecto que se intenta detectar y corregir. Aquí se realizan tres estudios de dicho fenómeno. En el primero se emplean señales digitales internas, concretamente ruido blanco, retrasando algunas muestras dichas señales unas de otras, para luego analizarlas con el software desarrollado y comprobar la eficacia del mismo. En el segundo se analizan la señales de audio obtenidas en el estudio de grabación de varios grupos de música moderna, mostrando los resultados del empleo del software en algunas de ellas, tales como las tomas de batería, bajo y guitarra. En el tercero se analizan las señales de audio obtenidas fuera del estudio de grabación, en donde no se dispone de las supuestas condiciones ideales que se tienen en el entorno que rodea a un estudio de grabación (acústicamente hablando). Se utilizan algunas de las técnicas microfónicas explicadas en el último apartado de la parte dedicada a los conceptos teóricos, para la grabación de una orquesta sinfónica, para luego analizar el efecto buscado mediante nuestro software, presentando los resultados obtenidos. De igual manera se realiza en el estudio con una agrupación coral de cuatro voces dentro de una Iglesia. ABSTRACT This project aims to show delays between audio signals obtained from the same source at diferent points spaced apart. To do this we rely on the analysis of the correlation of multi-microphonic audio signals, to determine the delay between these signals. During three diferent parts that make up this project, we will explain where, how and why this effect occurs in this type of signals. At the first part we present some of the theoretical concepts necessary to understand the subsequent development, such as coherence and correlation between signals, phase delays and the importance of micro-time. Also explains several microphone techniques to be used in the third part. During the second, it presents the software developed to determine and correct the delay between the signals that are desired to analyze. For this we have chosen the programming software Matlab , as it has been the most used in the majority of the subjects in the degree and therefore has suficient command of it. Besides presenting the software at the end of this part there is a user manual of it , which explains the handling for future use by other interested people. The last part is shown in several real cases, the study of aligning multi- microphonic sockets in which it is produced in effect trying to detect and correct. This includes three studies of this phenomenon. In the first internal digital signals are used, basically white noise, delaying some samples the signals from each other, then with software developed analyzing and verifying its efectiveness. In the second analyzes the audio signals obtained in the recording studio several contemporary bands, showing the results of using the software in some of them, such as the taking of drums, bass and guitar. In the third analyzes audio signals obtained outside the recording studio, where there are no ideal conditions alleged to have on the environment surrounding a recording studio (acoustically speaking). We use some of the microphone techniques explained in the last paragraph of the section on theoretical concepts, for the recording of a symphony orchestra, and then analyze the effect sought by our software, presenting the results. Similarly, in the study performed with a four-voice choir in a church.
Resumo:
La tecnología moderna de computación ha permitido cambiar radicalmente la investigación tecnológica en todos los ámbitos. El proceso general utilizado previamente consistía en el desarrollo de prototipos analógicos, creando múltiples versiones del mismo hasta llegar al resultado adecuado. Este es un proceso costoso a nivel económico y de carga de trabajo. Es por ello por lo que el proceso de investigación actual aprovecha las nuevas tecnologías para lograr el objetivo final mediante la simulación. Gracias al desarrollo de software para la simulación de distintas áreas se ha incrementado el ritmo de crecimiento de los avances tecnológicos y reducido el coste de los proyectos en investigación y desarrollo. La simulación, por tanto, permite desarrollar previamente prototipos simulados con un coste mucho menor para así lograr un producto final, el cual será llevado a cabo en su ámbito correspondiente. Este proceso no sólo se aplica en el caso de productos con circuitería, si bien es utilizado también en productos programados. Muchos de los programas actuales trabajan con algoritmos concretos cuyo funcionamiento debe ser comprobado previamente, para después centrarse en la codificación del mismo. Es en este punto donde se encuentra el objetivo de este proyecto, simular algoritmos de procesado digital de la señal antes de la codificación del programa final. Los sistemas de audio están basados en su totalidad en algoritmos de procesado de la señal, tanto analógicos como digitales, siendo estos últimos los que están sustituyendo al mundo analógico mediante los procesadores y los ordenadores. Estos algoritmos son la parte más compleja del sistema, y es la creación de nuevos algoritmos la base para lograr sistemas de audio novedosos y funcionales. Se debe destacar que los grupos de desarrollo de sistemas de audio presentan un amplio número de miembros con cometidos diferentes, separando las funciones de programadores e ingenieros de la señal de audio. Es por ello por lo que la simulación de estos algoritmos es fundamental a la hora de desarrollar nuevos y más potentes sistemas de audio. Matlab es una de las herramientas fundamentales para la simulación por ordenador, la cual presenta utilidades para desarrollar proyectos en distintos ámbitos. Sin embargo, en creciente uso actualmente se encuentra el software Simulink, herramienta especializada en la simulación de alto nivel que simplifica la dificultad de la programación en Matlab y permite desarrollar modelos de forma más rápida. Simulink presenta una completa funcionalidad para el desarrollo de algoritmos de procesado digital de audio. Por ello, el objetivo de este proyecto es el estudio de las capacidades de Simulink para generar sistemas de audio funcionales. A su vez, este proyecto pretende profundizar en los métodos de procesado digital de la señal de audio, logrando al final un paquete de sistemas de audio compatible con los programas de edición de audio actuales. ABSTRACT. Modern computer technology has dramatically changed the technological research in multiple areas. The overall process previously used consisted of the development of analog prototypes, creating multiple versions to reach the proper result. This is an expensive process in terms of an economically level and workload. For this reason actual investigation process take advantage of the new technologies to achieve the final objective through simulation. Thanks to the software development for simulation in different areas the growth rate of technological progress has been increased and the cost of research and development projects has been decreased. Hence, simulation allows previously the development of simulated protoypes with a much lower cost to obtain a final product, which will be held in its respective field. This process is not only applied in the case of circuitry products, but is also used in programmed products. Many current programs work with specific algorithms whose performance should be tested beforehand, which allows focusing on the codification of the program. This is the main point of this project, to simulate digital signal processing algorithms before the codification of the final program. Audio systems are entirely based on signal processing, both analog and digital systems, being the digital systems which are replacing the analog world thanks to the processors and computers. This algorithms are the most complex part of every system, and the creation of new algorithms is the most important step to achieve innovative and functional new audio systems. It should be noted that development groups of audio systems have a large number of members with different roles, separating them into programmers and audio signal engineers. For this reason, the simulation of this algorithms is essential when developing new and more powerful audio systems. Matlab is one of the most important tools for computer simulation, which has utilities to develop projects in different areas. However, the use of the Simulink software is constantly growing. It is a simulation tool specialized in high-level simulations which simplifies the difficulty of programming in Matlab and allows the developing of models faster. Simulink presents a full functionality for the development of algorithms for digital audio processing. Therefore, the objective of this project is to study the posibilities of Simulink to generate funcional audio systems. In turn, this projects aims to get deeper into the methods of digital audio signal processing, making at the end a software package of audio systems compatible with the current audio editing software.
Resumo:
Este proyecto consiste en el diseño e implementación de un procesador digital de efectos de audio en tiempo real orientado a instrumentos eléctricos tales como guitarras, bajos, teclados, etc. El procesador está basado en la tarjeta Raspberry Pi B+, ordenador de placa reducida de bajo coste, desarrollado en Reino unido y cuyo lanzamiento tuvo lugar en el año 2012. En primer lugar, ha sido necesario lograr que la tarjeta asuma la funcionalidad de un procesador de audio en tiempo real. Para ello se ha instalado un sistema operativo Linux orientado a Raspberry (Raspbian) y se ha hecho uso de Pure Data (Pd): lenguaje de programación gráfico que fue desarrollado en los años 90 por Miller Puckette con intención de ser enfocado a la creación de eventos multimedia y de música por computador. El papel que desempeña Pd es de capa intermedia entre el hardware y el software ya que se encarga de tomar bloques de N muestras del convertidor analógico/digital y encaminarlas a través del flujo de señal diseñado gráficamente. En segundo lugar, se han implementado diferentes efectos de audio de distintas características. Así pues, se encuentran efectos basados en retardos, filtros digitales y procesadores de dinámica. Concretamente, los efectos implementados son los siguientes: delay, flanger, vibrato, reverberador de Schroeder, filtros (paso bajo, paso alto y paso banda), ecualizador paramétrico y compresor y expansor de dinámica. Estos efectos han sido implementados en lenguaje C de acuerdo con la API de Pd. Con esto se ha conseguido obtener un objeto por cada efecto, el cual es “instanciado” en Pd pudiendo ejecutarlo en tiempo real. En este proyecto se expone la problemática que supone cada paso del diseño proponiendo soluciones válidas. Además se incluye una guía paso a paso para configurar la tarjeta y lograr realizar un bypass de señal y un efecto simple partiendo desde cero. ABSTRACT. This project involves the design and implementation of a digital real-time audio processor for electrical instruments (guitars, basses, keyboards, etc.). The processor is based on the Raspberry Pi B + card: low cost computer, developed in UK in 2012. First, it was necessary to make the cards assume the functionality of a real time audio processor. A Linux operating system called Raspberry (Raspbian) was installed. In this Project is used Pure Data (Pd): a graphical programming language developed in the 90s by Miller Puckette intending to be focused on creating multimedia and computer music events. The role of Pd is an intermediate layer between the hardware and the software. It is responsible for taking blocks of N samples of the analog/digital converter and route it through the signal flow. Secondly, it is necessary to implemented the different audio effects. There are delays based effects, digital filter and dynamics effects. Specifically, the implemented effects are: delay, flanger, vibrato, Schroeder reverb, filters (lowpass, highpass and bandpass), parametric equalizer and compressor and expander dynamics. These effects have been implemented in C language according to the Pd API. As a result, it has been obtained an object for each effect, which is instantiated in Pd. In this Project, the problems of every step are exposed with his corresponding solution. It is inlcuded a step-by-step guide to configure the card and achieve perform a bypass signal process and a simple effect.
Resumo:
Pitch Estimation, also known as Fundamental Frequency (F0) estimation, has been a popular research topic for many years, and is still investigated nowadays. The goal of Pitch Estimation is to find the pitch or fundamental frequency of a digital recording of a speech or musical notes. It plays an important role, because it is the key to identify which notes are being played and at what time. Pitch Estimation of real instruments is a very hard task to address. Each instrument has its own physical characteristics, which reflects in different spectral characteristics. Furthermore, the recording conditions can vary from studio to studio and background noises must be considered. This dissertation presents a novel approach to the problem of Pitch Estimation, using Cartesian Genetic Programming (CGP).We take advantage of evolutionary algorithms, in particular CGP, to explore and evolve complex mathematical functions that act as classifiers. These classifiers are used to identify piano notes pitches in an audio signal. To help us with the codification of the problem, we built a highly flexible CGP Toolbox, generic enough to encode different kind of programs. The encoded evolutionary algorithm is the one known as 1 + , and we can choose the value for . The toolbox is very simple to use. Settings such as the mutation probability, number of runs and generations are configurable. The cartesian representation of CGP can take multiple forms and it is able to encode function parameters. It is prepared to handle with different type of fitness functions: minimization of f(x) and maximization of f(x) and has a useful system of callbacks. We trained 61 classifiers corresponding to 61 piano notes. A training set of audio signals was used for each of the classifiers: half were signals with the same pitch as the classifier (true positive signals) and the other half were signals with different pitches (true negative signals). F-measure was used for the fitness function. Signals with the same pitch of the classifier that were correctly identified by the classifier, count as a true positives. Signals with the same pitch of the classifier that were not correctly identified by the classifier, count as a false negatives. Signals with different pitch of the classifier that were not identified by the classifier, count as a true negatives. Signals with different pitch of the classifier that were identified by the classifier, count as a false positives. Our first approach was to evolve classifiers for identifying artifical signals, created by mathematical functions: sine, sawtooth and square waves. Our function set is basically composed by filtering operations on vectors and by arithmetic operations with constants and vectors. All the classifiers correctly identified true positive signals and did not identify true negative signals. We then moved to real audio recordings. For testing the classifiers, we picked different audio signals from the ones used during the training phase. For a first approach, the obtained results were very promising, but could be improved. We have made slight changes to our approach and the number of false positives reduced 33%, compared to the first approach. We then applied the evolved classifiers to polyphonic audio signals, and the results indicate that our approach is a good starting point for addressing the problem of Pitch Estimation.