967 resultados para linear predictive coding (LPC)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background - When a moving stimulus and a briefly flashed static stimulus are physically aligned in space the static stimulus is perceived as lagging behind the moving stimulus. This vastly replicated phenomenon is known as the Flash-Lag Effect (FLE). For the first time we employed biological motion as the moving stimulus, which is important for two reasons. Firstly, biological motion is processed by visual as well as somatosensory brain areas, which makes it a prime candidate for elucidating the interplay between the two systems with respect to the FLE. Secondly, discussions about the mechanisms of the FLE tend to recur to evolutionary arguments, while most studies employ highly artificial stimuli with constant velocities. Methodology/Principal Finding - Since biological motion is ecologically valid it follows complex patterns with changing velocity. We therefore compared biological to symbolic motion with the same acceleration profile. Our results with 16 observers revealed a qualitatively different pattern for biological compared to symbolic motion and this pattern was predicted by the characteristics of motor resonance: The amount of anticipatory processing of perceived actions based on the induced perspective and agency modulated the FLE. Conclusions/Significance - Our study provides first evidence for an FLE with non-linear motion in general and with biological motion in particular. Our results suggest that predictive coding within the sensorimotor system alone cannot explain the FLE. Our findings are compatible with visual prediction (Nijhawan, 2008) which assumes that extrapolated motion representations within the visual system generate the FLE. These representations are modulated by sudden visual input (e.g. offset signals) or by input from other systems (e.g. sensorimotor) that can boost or attenuate overshooting representations in accordance with biased neural competition (Desimone & Duncan, 1995).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Chaque année, le piratage mondial de la musique coûte plusieurs milliards de dollars en pertes économiques, pertes d’emplois et pertes de gains des travailleurs ainsi que la perte de millions de dollars en recettes fiscales. La plupart du piratage de la musique est dû à la croissance rapide et à la facilité des technologies actuelles pour la copie, le partage, la manipulation et la distribution de données musicales [Domingo, 2015], [Siwek, 2007]. Le tatouage des signaux sonores a été proposé pour protéger les droit des auteurs et pour permettre la localisation des instants où le signal sonore a été falsifié. Dans cette thèse, nous proposons d’utiliser la représentation parcimonieuse bio-inspirée par graphe de décharges (spikegramme), pour concevoir une nouvelle méthode permettant la localisation de la falsification dans les signaux sonores. Aussi, une nouvelle méthode de protection du droit d’auteur. Finalement, une nouvelle attaque perceptuelle, en utilisant le spikegramme, pour attaquer des systèmes de tatouage sonore. Nous proposons tout d’abord une technique de localisation des falsifications (‘tampering’) des signaux sonores. Pour cela nous combinons une méthode à spectre étendu modifié (‘modified spread spectrum’, MSS) avec une représentation parcimonieuse. Nous utilisons une technique de poursuite perceptive adaptée (perceptual marching pursuit, PMP [Hossein Najaf-Zadeh, 2008]) pour générer une représentation parcimonieuse (spikegramme) du signal sonore d’entrée qui est invariante au décalage temporel [E. C. Smith, 2006] et qui prend en compte les phénomènes de masquage tels qu’ils sont observés en audition. Un code d’authentification est inséré à l’intérieur des coefficients de la représentation en spikegramme. Puis ceux-ci sont combinés aux seuils de masquage. Le signal tatoué est resynthétisé à partir des coefficients modifiés, et le signal ainsi obtenu est transmis au décodeur. Au décodeur, pour identifier un segment falsifié du signal sonore, les codes d’authentification de tous les segments intacts sont analysés. Si les codes ne peuvent être détectés correctement, on sait qu’alors le segment aura été falsifié. Nous proposons de tatouer selon le principe à spectre étendu (appelé MSS) afin d’obtenir une grande capacité en nombre de bits de tatouage introduits. Dans les situations où il y a désynchronisation entre le codeur et le décodeur, notre méthode permet quand même de détecter des pièces falsifiées. Par rapport à l’état de l’art, notre approche a le taux d’erreur le plus bas pour ce qui est de détecter les pièces falsifiées. Nous avons utilisé le test de l’opinion moyenne (‘MOS’) pour mesurer la qualité des systèmes tatoués. Nous évaluons la méthode de tatouage semi-fragile par le taux d’erreur (nombre de bits erronés divisé par tous les bits soumis) suite à plusieurs attaques. Les résultats confirment la supériorité de notre approche pour la localisation des pièces falsifiées dans les signaux sonores tout en préservant la qualité des signaux. Ensuite nous proposons une nouvelle technique pour la protection des signaux sonores. Cette technique est basée sur la représentation par spikegrammes des signaux sonores et utilise deux dictionnaires (TDA pour Two-Dictionary Approach). Le spikegramme est utilisé pour coder le signal hôte en utilisant un dictionnaire de filtres gammatones. Pour le tatouage, nous utilisons deux dictionnaires différents qui sont sélectionnés en fonction du bit d’entrée à tatouer et du contenu du signal. Notre approche trouve les gammatones appropriés (appelés noyaux de tatouage) sur la base de la valeur du bit à tatouer, et incorpore les bits de tatouage dans la phase des gammatones du tatouage. De plus, il est montré que la TDA est libre d’erreur dans le cas d’aucune situation d’attaque. Il est démontré que la décorrélation des noyaux de tatouage permet la conception d’une méthode de tatouage sonore très robuste. Les expériences ont montré la meilleure robustesse pour la méthode proposée lorsque le signal tatoué est corrompu par une compression MP3 à 32 kbits par seconde avec une charge utile de 56.5 bps par rapport à plusieurs techniques récentes. De plus nous avons étudié la robustesse du tatouage lorsque les nouveaux codec USAC (Unified Audion and Speech Coding) à 24kbps sont utilisés. La charge utile est alors comprise entre 5 et 15 bps. Finalement, nous utilisons les spikegrammes pour proposer trois nouvelles méthodes d’attaques. Nous les comparons aux méthodes récentes d’attaques telles que 32 kbps MP3 et 24 kbps USAC. Ces attaques comprennent l’attaque par PMP, l’attaque par bruit inaudible et l’attaque de remplacement parcimonieuse. Dans le cas de l’attaque par PMP, le signal de tatouage est représenté et resynthétisé avec un spikegramme. Dans le cas de l’attaque par bruit inaudible, celui-ci est généré et ajouté aux coefficients du spikegramme. Dans le cas de l’attaque de remplacement parcimonieuse, dans chaque segment du signal, les caractéristiques spectro-temporelles du signal (les décharges temporelles ;‘time spikes’) se trouvent en utilisant le spikegramme et les spikes temporelles et similaires sont remplacés par une autre. Pour comparer l’efficacité des attaques proposées, nous les comparons au décodeur du tatouage à spectre étendu. Il est démontré que l’attaque par remplacement parcimonieux réduit la corrélation normalisée du décodeur de spectre étendu avec un plus grand facteur par rapport à la situation où le décodeur de spectre étendu est attaqué par la transformation MP3 (32 kbps) et 24 kbps USAC.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The cascading appearance-based (CAB) feature extraction technique has established itself as the state-of-the-art in extracting dynamic visual speech features for speech recognition. In this paper, we will focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade we will demonstrate that the same steps taken to reduce static speaker and environmental information for the visual speech recognition application also provide similar improvements for visual speaker recognition. A further study is conducted comparing synchronous HMM (SHMM) based fusion of CAB visual features and traditional perceptual linear predictive (PLP) acoustic features to show that higher complexity inherit in the SHMM approach does not appear to provide any improvement in the final audio-visual speaker verification system over simpler utterance level score fusion.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

To identify and categorize complex stimuli such as familiar objects or speech, the human brain integrates information that is abstracted at multiple levels from its sensory inputs. Using cross-modal priming for spoken words and sounds, this functional magnetic resonance imaging study identified 3 distinct classes of visuoauditory incongruency effects: visuoauditory incongruency effects were selective for 1) spoken words in the left superior temporal sulcus (STS), 2) environmental sounds in the left angular gyrus (AG), and 3) both words and sounds in the lateral and medial prefrontal cortices (IFS/mPFC). From a cognitive perspective, these incongruency effects suggest that prior visual information influences the neural processes underlying speech and sound recognition at multiple levels, with the STS being involved in phonological, AG in semantic, and mPFC/IFS in higher conceptual processing. In terms of neural mechanisms, effective connectivity analyses (dynamic causal modeling) suggest that these incongruency effects may emerge via greater bottom-up effects from early auditory regions to intermediate multisensory integration areas (i.e., STS and AG). This is consistent with a predictive coding perspective on hierarchical Bayesian inference in the cortex where the domain of the prediction error (phonological vs. semantic) determines its regional expression (middle temporal gyrus/STS vs. AG/intraparietal sulcus).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We formulate a two-stage Iterative Wiener filtering (IWF) approach to speech enhancement, bettering the performance of constrained IWF, reported in literature. The codebook constrained IWF (CCIWF) has been shown to be effective in achieving convergence of IWF in the presence of both stationary and non-stationary noise. To this, we include a second stage of unconstrained IWF and show that the speech enhancement performance can be improved in terms of average segmental SNR (SSNR), Itakura-Saito (IS) distance and Linear Prediction Coefficients (LPC) parameter coincidence. We also explore the tradeoff between the number of CCIWF iterations and the second stage IWF iterations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Matroidal networks were introduced by Dougherty et al. and have been well studied in the recent past. It was shown that a network has a scalar linear network coding solution if and only if it is matroidal associated with a representable matroid. The current work attempts to establish a connection between matroid theory and network-error correcting codes. In a similar vein to the theory connecting matroids and network coding, we abstract the essential aspects of network-error correcting codes to arrive at the definition of a matroidal error correcting network. An acyclic network (with arbitrary sink demands) is then shown to possess a scalar linear error correcting network code if and only if it is a matroidal error correcting network associated with a representable matroid. Therefore, constructing such network-error correcting codes implies the construction of certain representable matroids that satisfy some special conditions, and vice versa.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The algebraic formulation for linear network coding in acyclic networks with the links having integer delay is well known. Based on this formulation, for a given set of connections over an arbitrary acyclic network with integer delay assumed for the links, the output symbols at the sink nodes, at any given time instant, is a F(p)m-linear combination of the input symbols across different generations, where F(p)m denotes the field over which the network operates (p is prime and m is a positive integer). We use finite-field discrete Fourier transform to convert the output symbols at the sink nodes, at any given time instant, into a F(p)m-linear combination of the input symbols generated during the same generation without making use of memory at the intermediate nodes. We call this as transforming the acyclic network with delay into n-instantaneous networks (n is sufficiently large). We show that under certain conditions, there exists a network code satisfying sink demands in the usual (nontransform) approach if and only if there exists a network code satisfying sink demands in the transform approach. When the zero-interference conditions are not satisfied, we propose three precoding-based network alignment (PBNA) schemes for three-source three-destination multiple unicast network with delays (3-S 3-D MUN-D) termed as PBNA using transform approach and time-invariant local encoding coefficients (LECs), PBNA using time-varying LECs, and PBNA using transform approach and block time-varying LECs. We derive sets of necessary and sufficient conditions under which throughputs close to n' + 1/2n' + 1, n'/2n' + 1, and n'/2n' + 1 are achieved for the three source-destination pairs in a 3-S 3-D MUN-D employing PBNA using transform approach and time-invariant LECs, and PBNA using transform approach and block time-varying LECs, where n' is a positive integer. For PBNA using time-varying LECs, we obtain a sufficient condition under which a throughput demand of n(1)/n, n(2)/n, and n(3)/n can be met for the three source-destination pairs in a 3-S 3-D MUN-D, where n(1), n(2), and n(3) are positive integers less than or equal to the positive integer n. This condition is also necessary when n(1) + n(3) = n(1) + n(2) = n where n(1) >= n(2) >= n(3).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Matroidal networks were introduced by Dougherty et al. and have been well studied in the recent past. It was shown that a network has a scalar linear network coding solution if and only if it is matroidal associated with a representable matroid. A particularly interesting feature of this development is the ability to construct (scalar and vector) linearly solvable networks using certain classes of matroids. Furthermore, it was shown through the connection between network coding and matroid theory that linear network coding is not always sufficient for general network coding scenarios. The current work attempts to establish a connection between matroid theory and network-error correcting and detecting codes. In a similar vein to the theory connecting matroids and network coding, we abstract the essential aspects of linear network-error detecting codes to arrive at the definition of a matroidal error detecting network (and similarly, a matroidal error correcting network abstracting from network-error correcting codes). An acyclic network (with arbitrary sink demands) is then shown to possess a scalar linear error detecting (correcting) network code if and only if it is a matroidal error detecting (correcting) network associated with a representable matroid. Therefore, constructing such network-error correcting and detecting codes implies the construction of certain representable matroids that satisfy some special conditions, and vice versa. We then present algorithms that enable the construction of matroidal error detecting and correcting networks with a specified capability of network-error correction. Using these construction algorithms, a large class of hitherto unknown scalar linearly solvable networks with multisource, multicast, and multiple-unicast network-error correcting codes is made available for theoretical use and practical implementation, with parameters, such as number of information symbols, number of sinks, number of coding nodes, error correcting capability, and so on, being arbitrary but for computing power (for the execution of the algorithms). The complexity of the construction of these networks is shown to be comparable with the complexity of existing algorithms that design multicast scalar linear network-error correcting codes. Finally, we also show that linear network coding is not sufficient for the general network-error correction (detection) problem with arbitrary demands. In particular, for the same number of network errors, we show a network for which there is a nonlinear network-error detecting code satisfying the demands at the sinks, whereas there are no linear network-error detecting codes that do the same.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In Mobile Ad hoc NETworks (MANETs), where cooperative behaviour is mandatory, there is a high probability for some nodes to become overloaded with packet forwarding operations in order to support neighbor data exchange. This altruistic behaviour leads to an unbalanced load in the network in terms of traffic and energy consumption. In such scenarios, mobile nodes can benefit from the use of energy efficient and traffic fitting routing protocol that better suits the limited battery capacity and throughput limitation of the network. This PhD work focuses on proposing energy efficient and load balanced routing protocols for ad hoc networks. Where most of the existing routing protocols simply consider the path length metric when choosing the best route between a source and a destination node, in our proposed mechanism, nodes are able to find several routes for each pair of source and destination nodes and select the best route according to energy and traffic parameters, effectively extending the lifespan of the network. Our results show that by applying this novel mechanism, current flat ad hoc routing protocols can achieve higher energy efficiency and load balancing. Also, due to the broadcast nature of the wireless channels in ad hoc networks, other technique such as Network Coding (NC) looks promising for energy efficiency. NC can reduce the number of transmissions, number of re-transmissions, and increase the data transfer rate that directly translates to energy efficiency. However, due to the need to access foreign nodes for coding and forwarding packets, NC needs a mitigation technique against unauthorized accesses and packet corruption. Therefore, we proposed different mechanisms for handling these security attacks by, in particular by serially concatenating codes to support reliability in ad hoc network. As a solution to this problem, we explored a new security framework that proposes an additional degree of protection against eavesdropping attackers based on using concatenated encoding. Therefore, malicious intermediate nodes will find it computationally intractable to decode the transitive packets. We also adopted another code that uses Luby Transform (LT) as a pre-coding code for NC. Primarily being designed for security applications, this code enables the sink nodes to recover corrupted packets even in the presence of byzantine attacks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Development of Malayalam speech recognition system is in its infancy stage; although many works have been done in other Indian languages. In this paper we present the first work on speaker independent Malayalam isolated speech recognizer based on PLP (Perceptual Linear Predictive) Cepstral Coefficient and Hidden Markov Model (HMM). The performance of the developed system has been evaluated with different number of states of HMM (Hidden Markov Model). The system is trained with 21 male and female speakers in the age group ranging from 19 to 41 years. The system obtained an accuracy of 99.5% with the unseen data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A connected digit speech recognition is important in many applications such as automated banking system, catalogue-dialing, automatic data entry, automated banking system, etc. This paper presents an optimum speaker-independent connected digit recognizer forMalayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficient for speech parameterization and continuous density Hidden Markov Model (HMM) in the recognition process. Viterbi algorithm is used for decoding. The training data base has the utterance of 21 speakers from the age group of 20 to 40 years and the sound is recorded in the normal office environment where each speaker is asked to read 20 set of continuous digits. The system obtained an accuracy of 99.5 % with the unseen data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The combination of model predictive control based on linear models (MPC) with feedback linearization (FL) has attracted interest for a number of years, giving rise to MPC+FL control schemes. An important advantage of such schemes is that feedback linearizable plants can be controlled with a linear predictive controller with a fixed model. Handling input constraints within such schemes is difficult since simple bound contraints on the input become state dependent because of the nonlinear transformation introduced by feedback linearization. This paper introduces a technique for handling input constraints within a real time MPC/FL scheme, where the plant model employed is a class of dynamic neural networks. The technique is based on a simple affine transformation of the feasible area. A simulated case study is presented to illustrate the use and benefits of the technique.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of this study was to estimate the spatial distribution of work accident risk in the informal work market in the urban zone of an industrialized city in southeast Brazil and to examine concomitant effects of age, gender, and type of occupation after controlling for spatial risk variation. The basic methodology adopted was that of a population-based case-control study with particular interest focused on the spatial location of work. Cases were all casual workers in the city suffering work accidents during a one-year period; controls were selected from the source population of casual laborers by systematic random sampling of urban homes. The spatial distribution of work accidents was estimated via a semiparametric generalized additive model with a nonparametric bidimensional spline of the geographical coordinates of cases and controls as the nonlinear spatial component, and including age, gender, and occupation as linear predictive variables in the parametric component. We analyzed 1,918 cases and 2,245 controls between 1/11/2003 and 31/10/2004 in Piracicaba, Brazil. Areas of significantly high and low accident risk were identified in relation to mean risk in the study region (p < 0.01). Work accident risk for informal workers varied significantly in the study area. Significant age, gender, and occupational group effects on accident risk were identified after correcting for this spatial variation. A good understanding of high-risk groups and high-risk regions underpins the formulation of hypotheses concerning accident causality and the development of effective public accident prevention policies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The human voice is an important communication tool and any disorder of the voice can have profound implications for social and professional life of an individual. Techniques of digital signal processing have been used by acoustic analysis of vocal disorders caused by pathologies in the larynx, due to its simplicity and noninvasive nature. This work deals with the acoustic analysis of voice signals affected by pathologies in the larynx, specifically, edema, and nodules on the vocal folds. The purpose of this work is to develop a classification system of voices to help pre-diagnosis of pathologies in the larynx, as well as monitoring pharmacological treatments and after surgery. Linear Prediction Coefficients (LPC), Mel Frequency cepstral coefficients (MFCC) and the coefficients obtained through the Wavelet Packet Transform (WPT) are applied to extract relevant characteristics of the voice signal. For the classification task is used the Support Vector Machine (SVM), which aims to build optimal hyperplanes that maximize the margin of separation between the classes involved. The hyperplane generated is determined by the support vectors, which are subsets of points in these classes. According to the database used in this work, the results showed a good performance, with a hit rate of 98.46% for classification of normal and pathological voices in general, and 98.75% in the classification of diseases together: edema and nodules