999 resultados para Pitch detection
Resumo:
This paper discusses two pitch detection algorithms (PDA) for simple audio signals which are based on zero-cross rate (ZCR) and autocorrelation function (ACF). As it is well known, pitch detection methods based on ZCR and ACF are widely used in signal processing. This work shows some features and problems in using these methods, as well as some improvements developed to increase their performance. © 2008 IEEE.
Resumo:
On étudie l’application des algorithmes de décomposition matricielles tel que la Factorisation Matricielle Non-négative (FMN), aux représentations fréquentielles de signaux audio musicaux. Ces algorithmes, dirigés par une fonction d’erreur de reconstruction, apprennent un ensemble de fonctions de base et un ensemble de coef- ficients correspondants qui approximent le signal d’entrée. On compare l’utilisation de trois fonctions d’erreur de reconstruction quand la FMN est appliquée à des gammes monophoniques et harmonisées: moindre carré, divergence Kullback-Leibler, et une mesure de divergence dépendente de la phase, introduite récemment. Des nouvelles méthodes pour interpréter les décompositions résultantes sont présentées et sont comparées aux méthodes utilisées précédemment qui nécessitent des connaissances du domaine acoustique. Finalement, on analyse la capacité de généralisation des fonctions de bases apprises par rapport à trois paramètres musicaux: l’amplitude, la durée et le type d’instrument. Pour ce faire, on introduit deux algorithmes d’étiquetage des fonctions de bases qui performent mieux que l’approche précédente dans la majorité de nos tests, la tâche d’instrument avec audio monophonique étant la seule exception importante.
Resumo:
L’amusie congénitale est un trouble neuro-développemental se définissant par des difficultés à percevoir la musique, et ce malgré une ouïe et une intelligence normales. Un déficit de discrimination fine des hauteurs serait à l’origine de ce trouble, qui se traduit notamment par une incapacité à détecter les fausses notes. afin de mieux comprendre les facteurs génétiques contribuant à la manifestation de l’amusie congénitale, la présente étude avait pour objectif: (a) de déterminer si la performance sur diverses tâches musicales et auditives était plus similaire chez les jumeaux identiques (monozygotes ; MZ) que chez les jumeaux non-identiques (dizygotes ; DZ) et (b) d’explorer les variables relatives à l’environnement musical des jumeaux, afin de mieux comprendre les contributions de l’environnement et de la génétique dans les différences sous-tendant les habiletés musicales. De plus, le profil des sujets amusiques a été analysé afin de vérifier s’il correspondait à celui décrit dans la littérature, faisant état de difficultés tonales, mais non rythmiques. Huit paires de jumeaux MZ et six paires de jumeaux DZ, parmi lesquelles au moins un des co-jumeaux était potentiellement amusique, ont pris part à cette étude. Les tâches consistaient en un test en ligne de perception mélodique et rythmique, un test de détection des différences de hauteurs, ainsi qu’un test de chant. L’analyse de la performance et de l’environnement musical des jumeaux MZ et DZ ne révèle aucune distinction comportementale entre ces deux groupes en ce qui concerne les habiletés musicales. Cela suggère que celles-ci puissent être davantage influencées par l’environnement partagé que par les facteurs génétiques. Enfin, les jumeaux amusiques ont le profil habituel d’habiletés musicales. En effet, ils commettent des erreurs de perception et de production musicale au niveau mélodique, mais ont une perception rythmique préservée. D’autres études, notamment avec de plus grands échantillons de jumeaux, seront nécessaires afin d’élucider la possible étiologie génétique sous-tendant l’amusie congénitale.
Resumo:
El presente proyecto tiene el objetivo de facilitar la composición de canciones mediante la creación de las distintas pistas MIDI que la forman. Se implementan dos controladores. El primero, con objeto de transcribir la parte melódica, convierte la voz cantada o tarareada a eventos MIDI. Para ello, y tras el estudio de las distintas técnicas del cálculo del tono (pitch), se implementará una técnica con ciertas variaciones basada en la autocorrelación. También se profundiza en el segmentado de eventos, en particular, una técnica basada en el análisis de la derivada de la envolvente. El segundo, dedicado a la base rítmica de la canción, permite la creación de la percusión mediante el golpe rítmico de objetos que disponga el usuario, que serán asignados a los distintos elementos de percusión elegidos. Los resultados de la grabación de estos impactos serán señales de corta duración, no lineales y no armónicas, dificultando su discriminación. La herramienta elegida para la clasificación de los distintos patrones serán las redes neuronales artificiales (RNA). Se realizara un estudio de la metodología de diseño de redes neuronales especifico para este tipo de señales, evaluando la importancia de las variables de diseño como son el número de capas ocultas y neuronas en cada una de ellas, algoritmo de entrenamiento y funciones de activación. El estudio concluirá con la implementación de dos redes de diferente naturaleza. Una red de Elman, cuyas propiedades de memoria permiten la clasificación de patrones temporales, procesará las cualidades temporales analizando el ataque de su forma de onda. Una red de propagación hacia adelante feed-forward, que necesitará de robustas características espectrales y temporales para su clasificación. Se proponen 26 descriptores como los derivados de los momentos del espectro: centroide, curtosis y simetría, los coeficientes cepstrales de la escala de Mel (MFCCs), y algunos temporales como son la tasa de cruces por cero y el centroide de la envolvente temporal. Las capacidades de discriminación inter e intra clase de estas características serán evaluadas mediante un algoritmo de selección, habiéndose elegido RELIEF, un método basado en el algoritmo de los k vecinos mas próximos (KNN). Ambos controladores tendrán función de trabajar en tiempo real y offline, permitiendo tanto la composición de canciones, como su utilización como un instrumento más junto con mas músicos. ABSTRACT. The aim of this project is to make song composition easier by creating each MIDI track that builds it. Two controllers are implemented. In order to transcribe the melody, the first controler converts singing voice or humming into MIDI files. To do this a technique based on autocorrelation is implemented after having studied different pitch detection methods. Event segmentation has also been dealt with, to be more precise a technique based on the analysis of the signal's envelope and it's derivative have been used. The second one, can be used to make the song's rhythm . It allows the user, to create percussive patterns by hitting different objects of his environment. These recordings results in short duration, non-linear and non-harmonic signals. Which makes the classification process more complicated in the traditional way. The tools to used are the artificial neural networks (ANN). We will study the neural network design to deal with this kind of signals. The goal is to get a design methodology, paying attention to the variables involved, as the number of hidden layers and neurons in each, transfer functions and training algorithm. The study will end implementing two neural networks with different nature. Elman network, which has memory properties, is capable to recognize sequences of data and analyse the impact's waveform, precisely, the attack portion. A feed-forward network, needs strong spectral and temporal features extracted from the hit. Some descriptors are proposed as the derivates from the spectrum moment as centroid, kurtosis and skewness, the Mel-frequency cepstral coefficients, and some temporal features as the zero crossing rate (zcr) and the temporal envelope's centroid. Intra and inter class discrimination abilities of those descriptors will be weighted using the selection algorithm RELIEF, a Knn (K-nearest neighbor) based algorithm. Both MIDI controllers can be used to compose, or play with other musicians as it works on real-time and offline.
Resumo:
Este proyecto consiste en crear una serie de tres pequeños videojuegos incluidos en una sola aplicación, para plataformas móviles Android, que permitan en cualquier lugar entrenar la estética de la voz del paciente con problemas de fonación. Dependiendo de los aspectos de la voz (sonidos sonoros y sordos, el pitch y la intensidad) a trabajar se le asignará un ejercicio u otro. En primer lugar se introduce el concepto de rehabilitación de la voz y en qué casos es necesario. Seguidamente se realiza un trabajo de búsqueda en el que se identifican las distintas plataformas de desarrollo de videojuegos que son compatibles con los sistemas Android, así como para la captura de audio y las librerías de procesado de señal. A continuación se eligen las herramientas que presentan las mejores capacidades y con las que se va a trabajar. Estas son el motor de juego Andengine, para la parte gráfica, el entorno Java específico de Android, para la captura de muestras de audio y la librería JTransforms que realiza transformadas de Fourier permitiendo procesar el audio para la detección de pitch. Al desarrollar y ensamblar los distintos bloques se prioriza el funcionamiento en tiempo real de la aplicación. Las líneas de mejora y conclusiones se comentan en el último capítulo del trabajo así como el manual de usuario para mayor comprensión. ABSTRACT. The main aim of this project is to create an application for mobile devices which includes three small speech therapy videogames for the Android OS. These videogames allow patients to train certain voice parameters (such as voice and unvoiced sounds, pitch and intensity) wherever they want and need to. First, an overview of the concept of voice rehabilitation and its uses for patients with speech disorders is given. Secondly a study has been made to identify the most suitable video game engine for the Android OS, the best possible way to capture audio from the device and the audio processing library which will combine with the latter. Therefore, the chosen tools are exposed. Andengine has been selected regarding the game engine, Android’s Java framework for audio capture and the fast Fourier transform library, JTransforms, for pitch detection. Real time processing is vital for the proper functioning of the application. Lines of improvement and other conclusions are discussed in the last part of this dissertation paper.
Resumo:
Purpose: To evaluate the diagnostic value and image quality of CT with filtered back projection (FBP) compared with adaptive statistical iterative reconstructed images (ASIR) in body stuffers with ingested cocaine-filled packets.Methods and Materials: Twenty-nine body stuffers (mean age 31.9 years, 3 women) suspected for ingestion of cocaine-filled packets underwent routine-dose 64-row multidetector CT with FBP (120kV, pitch 1.375, 100-300 mA and automatic tube current modulation (auto mA), rotation time 0.7sec, collimation 2.5mm), secondarily reconstructed with 30 % and 60 % ASIR. In 13 (44.83%) out of the body stuffers cocaine-filled packets were detected, confirmed by exact analysis of the faecal content including verification of the number (range 1-25). Three radiologists independently and blindly evaluated anonymous CT examinations (29 FBP-CT and 68 ASIR-CT) for the presence and number of cocaine-filled packets indicating observers' confidence, and graded them for diagnostic quality, image noise, and sharpness. Sensitivity, specificity, area under the receiver operating curve (ROC) Az and interobserver agreement between the 3 radiologists for FBP-CT and ASIR-CT were calculated.Results: The increase of the percentage of ASIR significantly diminished the objective image noise (p<0.001). Overall sensitivity and specificity for the detection of the cocaine-filled packets were 87.72% and 76.15%, respectively. The difference of ROC area Az between the different reconstruction techniques was significant (p= 0.0101), that is 0.938 for FBP-CT, 0.916 for 30 % ASIR-CT, and 0.894 for 60 % ASIR-CT.Conclusion: Despite the evident image noise reduction obtained by ASIR, the diagnostic value for detecting cocaine-filled packets decreases, depending on the applied ASIR percentage.
Resumo:
This paper proposes an improved voice activity detection (VAD) algorithm using wavelet and support vector machine (SVM) for European Telecommunication Standards Institution (ETS1) adaptive multi-rate (AMR) narrow-band (NB) and wide-band (WB) speech codecs. First, based on the wavelet transform, the original IIR filter bank and pitch/tone detector are implemented, respectively, via the wavelet filter bank and the wavelet-based pitch/tone detection algorithm. The wavelet filter bank can divide input speech signal into several frequency bands so that the signal power level at each sub-band can be calculated. In addition, the background noise level can be estimated in each sub-band by using the wavelet de-noising method. The wavelet filter bank is also derived to detect correlated complex signals like music. Then the proposed algorithm can apply SVM to train an optimized non-linear VAD decision rule involving the sub-band power, noise level, pitch period, tone flag, and complex signals warning flag of input speech signals. By the use of the trained SVM, the proposed VAD algorithm can produce more accurate detection results. Various experimental results carried out from the Aurora speech database with different noise conditions show that the proposed algorithm gives considerable VAD performances superior to the AMR-NB VAD Options 1 and 2, and AMR-WB VAD. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
In this study, we compared direction detection thresholds of passive self-motion in the dark between artistic gymnasts and controls. Twenty-four professional female artistic gymnasts (ranging from 7 to 20 years) and age-matched controls were seated on a motion platform and asked to discriminate the direction of angular (yaw, pitch, roll) and linear (leftward–rightward) motion. Gymnasts showed lower thresholds for the linear leftward–rightward motion. Interestingly, there was no difference for the angular motions. These results show that the outstanding self-motion abilities in artistic gymnasts are not related to an overall higher sensitivity in self-motion perception. With respect to vestibular processing, our results suggest that gymnastic expertise is exclusively linked to superior interpretation of otolith signals when no change in canal signals is present. In addition, thresholds were overall lower for the older (14–20 years) than for the younger (7–13 years) participants, indicating the maturation of vestibular sensitivity from childhood to adolescence.
Resumo:
OBJECTIVE The purpose of this study was to investigate the feasibility of microdose CT using a comparable dose as for conventional chest radiographs in two planes including dual-energy subtraction for lung nodule assessment. MATERIALS AND METHODS We investigated 65 chest phantoms with 141 lung nodules, using an anthropomorphic chest phantom with artificial lung nodules. Microdose CT parameters were 80 kV and 6 mAs, with pitch of 2.2. Iterative reconstruction algorithms and an integrated circuit detector system (Stellar, Siemens Healthcare) were applied for maximum dose reduction. Maximum intensity projections (MIPs) were reconstructed. Chest radiographs were acquired in two projections with bone suppression. Four blinded radiologists interpreted the images in random order. RESULTS A soft-tissue CT kernel (I30f) delivered better sensitivities in a pilot study than a hard kernel (I70f), with respective mean (SD) sensitivities of 91.1% ± 2.2% versus 85.6% ± 5.6% (p = 0.041). Nodule size was measured accurately for all kernels. Mean clustered nodule sensitivity with chest radiography was 45.7% ± 8.1% (with bone suppression, 46.1% ± 8%; p = 0.94); for microdose CT, nodule sensitivity was 83.6% ± 9% without MIP (with additional MIP, 92.5% ± 6%; p < 10(-3)). Individual sensitivities of microdose CT for readers 1, 2, 3, and 4 were 84.3%, 90.7%, 68.6%, and 45.0%, respectively. Sensitivities with chest radiography for readers 1, 2, 3, and 4 were 42.9%, 58.6%, 36.4%, and 90.7%, respectively. In the per-phantom analysis, respective sensitivities of microdose CT versus chest radiography were 96.2% and 75% (p < 10(-6)). The effective dose for chest radiography including dual-energy subtraction was 0.242 mSv; for microdose CT, the applied dose was 0.1323 mSv. CONCLUSION Microdose CT is better than the combination of chest radiography and dual-energy subtraction for the detection of solid nodules between 5 and 12 mm at a lower dose level of 0.13 mSv. Soft-tissue kernels allow better sensitivities. These preliminary results indicate that microdose CT has the potential to replace conventional chest radiography for lung nodule detection.
Resumo:
Introduction: In team sports the ability to use peripheral vision is essential to track a number of players and the ball. By using eye-tracking devices it was found that players either use fixations and saccades to process information on the pitch or use smooth pursuit eye movements (SPEM) to keep track of single objects (Schütz, Braun, & Gegenfurtner, 2011). However, it is assumed that peripheral vision can be used best when the gaze is stable while it is unknown whether motion changes can be equally well detected when SPEM are used especially because contrast sensitivity is reduced during SPEM (Schütz, Delipetkose, Braun, Kerzel, & Gegenfurtner, 2007). Therefore, peripheral motion change detection will be examined by contrasting a fixation condition with a SPEM condition. Methods: 13 participants (7 male, 6 female) were presented with a visual display consisting of 15 white and 1 red square. Participants were instructed to follow the red square with their eyes and press a button as soon as a white square begins to move. White square movements occurred either when the red square was still (fixation condition) or moving in a circular manner with 6 °/s (pursuit condition). The to-be-detected white square movements varied in eccentricity (4 °, 8 °, 16 °) and speed (1 °/s, 2 °/s, 4 °/s) while movement time of white squares was constant at 500 ms. 180 events should be detected in total. A Vicon-integrated eye-tracking system and a button press (1000 Hz) was used to control for eye-movements and measure detection rates and response times. Response times (ms) and missed detections (%) were measured as dependent variables and analysed with a 2 (manipulation) x 3 (eccentricity) x 3 (speed) ANOVA with repeated measures on all factors. Results: Significant response time effects were found for manipulation, F(1,12) = 224.31, p < .01, ηp2 = .95, eccentricity, F(2,24) = 56.43; p < .01, ηp2 = .83, and the interaction between the two factors, F(2,24) = 64.43; p < .01, ηp2 = .84. Response times increased as a function of eccentricity for SPEM only and were overall higher than in the fixation condition. Results further showed missed events effects for manipulation, F(1,12) = 37.14; p < .01, ηp2 = .76, eccentricity, F(2,24) = 44.90; p < .01, ηp2 = .79, the interaction between the two factors, F(2,24) = 39.52; p < .01, ηp2 = .77 and the three-way interaction manipulation x eccentricity x speed, F(2,24) = 3.01; p = .03, ηp2 = .20. While less than 2% of events were missed on average in the fixation condition as well as at 4° and 8° eccentricity in the SPEM condition, missed events increased for SPEM at 16 ° eccentricity with significantly more missed events in the 4 °/s speed condition (1 °/s: M = 34.69, SD = 20.52; 2 °/s: M = 33.34, SD = 19.40; 4 °/s: M = 39.67, SD = 19.40). Discussion: It could be shown that using SPEM impairs the ability to detect peripheral motion changes at the far periphery and that fixations not only help to detect these motion changes but also to respond faster. Due to high temporal constraints especially in team sports like soccer or basketball, fast reaction are necessary for successful anticipation and decision making. Thus, it is advised to anchor gaze at a specific location if peripheral changes (e.g. movements of other players) that require a motor response have to be detected. In contrast, SPEM should only be used if a single object, like the ball in cricket or baseball, is necessary for a successful motor response. References: Schütz, A. C., Braun, D. I., & Gegenfurtner, K. R. (2011). Eye movements and perception: A selective review. Journal of Vision, 11, 1-30. Schütz, A. C., Delipetkose, E., Braun, D. I., Kerzel, D., & Gegenfurtner, K. R. (2007). Temporal contrast sensitivity during smooth pursuit eye movements. Journal of Vision, 7, 1-15.
Resumo:
Obstructive sleep apnea (OSA) is a highly prevalent disease in which upper airways are collapsed during sleep, leading to serious consequences. The gold standard of diagnosis, called polysomnography (PSG), requires a full-night hospital stay connected to over ten channels of measurements requiring physical contact with sensors. PSG is inconvenient, expensive and unsuited for community screening. Snoring is the earliest symptom of OSA, but its potential in clinical diagnosis is not fully recognized yet. Diagnostic systems intent on using snore-related sounds (SRS) face the tough problem of how to define a snore. In this paper, we present a working definition of a snore, and propose algorithms to segment SRS into classes of pure breathing, silence and voiced/unvoiced snores. We propose a novel feature termed the 'intra-snore-pitch-jump' (ISPJ) to diagnose OSA. Working on clinical data, we show that ISPJ delivers OSA detection sensitivities of 86-100% while holding specificity at 50-80%. These numbers indicate that snore sounds and the ISPJ have the potential to be good candidates for a take-home device for OSA screening. Snore sounds have the significant advantage in that they can be conveniently acquired with low-cost non-contact equipment. The segmentation results presented in this paper have been derived using data from eight patients as the training set and another eight patients as the testing set. ISPJ-based OSA detection results have been derived using training data from 16 subjects and testing data from 29 subjects.
Resumo:
To assess binocular detection grating acuity using the LEA GRATINGS test to establish age-related norms in healthy infants during their first 3 months of life. In this prospective, longitudinal study of healthy infants with clear red reflex at birth, responses to gratings were measured at 1, 2, and 3 months of age using LEA gratings at a distance of 28 cm. The results were recorded as detection grating acuity values, which were arranged in frequency tables and converted to a one-octave scale for statistical analysis. For the repeated measurements, analysis of variance (ANOVA) was used to compare the detection grating acuity results between ages. A total of 133 infants were included. The binocular responses to gratings showed development toward higher mean values and spatial frequencies, ranging from 0.55 ± 0.70 cycles per degree (cpd), or 1.74 ± 0.21 logMAR, in month 1 to 3.11 ± 0.54 cpd, or 0.98 ± 0.16 logMAR, in month 3. Repeated ANOVA indicated differences among grating acuity values in the three age groups. The LEA GRATINGS test allowed assessment of detection grating acuity and its development in a cohort of healthy infants during their first 3 months of life.
Resumo:
A novel capillary electrophoresis method using capacitively coupled contactless conductivity detection is proposed for the determination of the biocide tetrakis(hydroxymethyl)phosphonium sulfate. The feasibility of the electrophoretic separation of this biocide was attributed to the formation of an anionic complex between the biocide and borate ions in the background electrolyte. Evidence of this complex formation was provided by (11) B NMR spectroscopy. A linear relationship (R(2) = 0.9990) between the peak area of the complex and the biocide concentration (50-900 μmol/L) was found. The limit of detection and limit of quantification were 15.0 and 50.1 μmol/L, respectively. The proposed method was applied to the determination of tetrakis(hydroxymethyl)phosphonium sulfate in commercial formulations, and the results were in good agreement with those obtained by the standard iodometric titration method. The method was also evaluated for the analysis of tap water and cooling water samples treated with the biocide. The results of the recovery tests at three concentration levels (300, 400, and 600 μmol/L) varied from 75 to 99%, with a relative standard deviation no higher than 9%.
Resumo:
Infections of the central nervous systems (CNS) present a diagnostic problem for which an accurate laboratory diagnosis is essential. Invasive practices, such as cerebral biopsy, have been replaced by obtaining a polymerase chain reaction (PCR) diagnosis using cerebral spinal fluid (CSF) as a reference method. Tests on DNA extracted from plasma are noninvasive, thus avoiding all of the collateral effects and patient risks associated with CSF collection. This study aimed to determine whether plasma can replace CSF in nested PCR analysis for the detection of CNS human herpesvirus (HHV) diseases by analysing the proportion of patients whose CSF nested PCR results were positive for CNS HHV who also had the same organism identified by plasma nested PCR. In this study, CSF DNA was used as the gold standard, and nested PCR was performed on both types of samples. Fifty-two patients with symptoms of nervous system infection were submitted to CSF and blood collection. For the eight HHV, one positive DNA result-in plasma and/or CSF nested PCR-was considered an active HHV infection, whereas the occurrence of two or more HHVs in the same sample was considered a coinfection. HHV infections were positively detected in 27/52 (51.9%) of the CSF and in 32/52 (61.5%) of the plasma, difference not significant, thus nested PCR can be performed on plasma instead of CSF. In conclusion, this findings suggest that plasma as a useful material for the diagnosis of cases where there is any difficulty to perform a CSF puncture.