969 resultados para Speech perception
Resumo:
Increased media exposure to layoffs and corporate quarterly financial reporting have created arguable a common perception – especially favored by the media itself – that the companies have been forced to improve their financial performance from quarter to quarter. Academically the relevant question is whether companies themselves feel that they are exposed to short-term pressure to perform even if it means that they have to compromise company’s long-term future. This paper studies this issue using results from a survey conducted among the 500 largest companies in Finland. The results show that companies in general feel moderate short-term pressure, with reasonable dispersion across firms. There seems to be a link between the degree of pressure felt, and the firm’s ownership structure, i.e. we find support for the existence of short-term versus long-term owners. We also find significant ownership related differences, in line with expectations, in how such short-term pressure is reflected in actual decision variables such as the investment criteria used.
Resumo:
We propose a simple speech music discriminator that uses features based on HILN(Harmonics, Individual Lines and Noise) model. We have been able to test the strength of the feature set on a standard database of 66 files and get an accuracy of around 97%. We also have tested on sung queries and polyphonic music and have got very good results. The current algorithm is being used to discriminate between sung queries and played (using an instrument like flute) queries for a Query by Humming(QBH) system currently under development in the lab.
Resumo:
Non-uniform sampling of a signal is formulated as an optimization problem which minimizes the reconstruction signal error. Dynamic programming (DP) has been used to solve this problem efficiently for a finite duration signal. Further, the optimum samples are quantized to realize a speech coder. The quantizer and the DP based optimum search for non-uniform samples (DP-NUS) can be combined in a closed-loop manner, which provides distinct advantage over the open-loop formulation. The DP-NUS formulation provides a useful control over the trade-off between bitrate and performance (reconstruction error). It is shown that 5-10 dB SNR improvement is possible using DP-NUS compared to extrema sampling approach. In addition, the close-loop DP-NUS gives a 4-5 dB improvement in reconstruction error.
Resumo:
This paper describes a method of automated segmentation of speech assuming the signal is continuously time varying rather than the traditional short time stationary model. It has been shown that this representation gives comparable if not marginally better results than the other techniques for automated segmentation. A formulation of the 'Bach' (music semitonal) frequency scale filter-bank is proposed. A comparative study has been made of the performances using Mel, Bark and Bach scale filter banks considering this model. The preliminary results show up to 80 % matches within 20 ms of the manually segmented data, without any information of the content of the text and without any language dependence. 'Bach' filters are seen to marginally outperform the other filters.
Resumo:
Joint decoding of multiple speech patterns so as to improve speech recognition performance is important, especially in the presence of noise. In this paper, we propose a Multi-Pattern Viterbi algorithm (MPVA) to jointly decode and recognize multiple speech patterns for automatic speech recognition (ASR). The MPVA is a generalization of the Viterbi Algorithm to jointly decode multiple patterns given a Hidden Markov Model (HMM). Unlike the previously proposed two stage Constrained Multi-Pattern Viterbi Algorithm (CMPVA),the MPVA is a single stage algorithm. MPVA has the advantage that it cart be extended to connected word recognition (CWR) and continuous speech recognition (CSR) problems. MPVA is shown to provide better speech recognition performance than the earlier techniques: using only two repetitions of noisy speech patterns (-5 dB SNR, 10% burst noise), the word error rate using MPVA decreased by 28.5%, when compared to using individual decoding. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
Background When we are viewing natural scenes, every saccade abruptly changes both the mean luminance and the contrast structure falling on any given retinal location. Thus it would be useful if the two were independently encoded by the visual system, even when they change simultaneously. Recordings from single neurons in the cat visual system have suggested that contrast information may be quite independently represented in neural responses to simultaneous changes in contrast and luminance. Here we test to what extent this is true in human perception. Methodology/Principal Findings Small contrast stimuli were presented together with a 7-fold upward or downward step of mean luminance (between 185 and 1295 Td, corresponding to 14 and 98 cd/m2), either simultaneously or with various delays (50–800 ms). The perceived contrast of the target under the different conditions was measured with an adaptive staircase method. Over the contrast range 0.1–0.45, mainly subtractive attenuation was found. Perceived contrast decreased by 0.052±0.021 (N = 3) when target onset was simultaneous with the luminance increase. The attenuation subsided within 400 ms, and even faster after luminance decreases, where the effect was also smaller. The main results were robust against differences in target types and the size of the field over which luminance changed. Conclusions/Significance Perceived contrast is attenuated mainly by a subtractive term when coincident with a luminance change. The effect is of ecologically relevant magnitude and duration; in other words, strict contrast constancy must often fail during normal human visual behaviour. Still, the relative robustness of the contrast signal is remarkable in view of the limited dynamic response range of retinal cones. We propose a conceptual model for how early retinal signalling may allow this.
Resumo:
The study analyses the ambivalent relationship republicanism, as a form of self-government free from domination, had with the ideal of participatory oratory and non-dominated speech on the one hand, and with the danger of unhindered demagogy and its possibly fatal consequences to that form of government on the other. Although previous scholarship has delved deeply into republicanism as well as into rhetoric and public speech, the interplay between those aspects has only gathered scattered interest, and there has been no systematic study considering the variety of republican approaches to rhetoric and public speech in 17th-century England. The rare attempts to do so have been studies in English literature, and they have not analysed the political philosophy of republicanism, as the focus has been on republicanism as a literary culture. This study connects the fields of political theory, political history as well as literature in order to make a multidisciplinary contribution to intellectual history. The study shows that, within the tradition of classical republicanism, individual authors could make different choices when addressing the problematic topics of public speech and rhetoric, and the variety of their conclusions often set the authors against each other, resulting in the development of their theories through internal debates within the republican tradition. The authors under study were chosen to reflect this variety and the connections between them: the similarities between James Harrington and John Streater, and between John Milton and John Hall of Durham are shown, as well the controversies between Harrington and Milton, and Streater and Hall, respectively. In addition, by analysing the writings of Marchamont Nedham the study will show that the choices were not limited to more, or less, democratic brands of republicanism. Most significantly, the study provides a thorough analysis of the political philosophies behind the various brands of republicanism, in addition to describing them. By means of this analysis, the study shows that previous attempts to assess the role of free speech and public debate, through the lenses of modern, rights-based liberal political theory have resulted in an inappropriate framework for understanding early modern English republicanism. By approaching the topics through concepts used by the republicans legitimate authority, leadership by oratory, and republican freedom and through the frames of reference available and familiar to them roles of education and institutions the study presents a thorough and systematic analysis of the role and function of rhetoric and public speech in English republicanism. The findings of this analysis have significant consequences to our current understanding of the history and development of republican political theory, and, more generally, of the connections between democratic theory and free speech.
Resumo:
We use parallel weighted finite-state transducers to implement a part-of-speech tagger, which obtains state-of-the-art accuracy when used to tag the Europarl corpora for Finnish, Swedish and English. Our system consists of a weighted lexicon and a guesser combined with a bigram model factored into two weighted transducers. We use both lemmas and tag sequences in the bigram model, which guarantees reliable bigram estimates.
Resumo:
Maurice Merleau-Ponty (1908-1961) has been known as the philosopher of painting. His interest in the theory of perception intertwined with the questions concerning the artist s perception, the experience of an artwork and the possible interpretations of the artwork. For him, aesthetics was not a sub-field of philosophy, and art was not simply a subject matter for the aesthetic experience, but a form of thinking. This study proposes an opening for a dialogue between Merleau-Pontian phenomenology and contemporary art. The thesis examines his phenomenology through certain works of contemporary art and presents readings of these artworks through his phenomenology. The thesis both shows the potentiality of a method, but also engages in the critical task of finding the possible limitations of his approach. The first part lays out the methodological and conceptual points of departure of Merleau-Ponty s phenomenological approach to perception as well as the features that determined his discussion on encountering art. Merleau-Ponty referred to the experience of perceiving art using the notion of seeing with (voir selon). He stressed a correlative reciprocity described in Eye and Mind (1961) as the switching of the roles of the visible and the painter. The choice of artworks is motivated by certain restrictions in the phenomenological readings of visual arts. The examined works include paintings by Tiina Mielonen, a photographic work by Christian Mayer, a film by Douglas Gordon and Philippe Parreno, and an installation by Monika Sosnowska. These works resonate with, and challenge, his phenomenological approach. The chapters with case studies take up different themes that are central to Merleau-Ponty s phenomenology: space, movement, time, and touch. All of the themes are interlinked with the examined artworks. There are also topics that reappear in the thesis, such as the notion of écart and the question of encountering the other. As Merleau-Ponty argued, the sphere of art has a particular capability to address our being in the world. The thesis presents an interpretation that emphasises the notion of écart, which refers to an experience of divergence or dispossession. The sudden dissociation, surprise or rupture that is needed in order for a meeting between the spectator and the artwork, or between two persons, to be possible. Further, the thesis suggests that through artworks it is possible to take into consideration the écart, the divergence, that defines our subjectivity.
Resumo:
A new method based on unit continuity metric (UCM) is proposed for optimal unit selection in text-to-speech (TTS) synthesis. UCM employs two features, namely, pitch continuity metric and spectral continuity metric. The methods have been implemented and tested on our test bed called MILE-TTS and it is available as web demo. After verification by a self selection test, the algorithms are evaluated on 8 paragraphs each for Kannada and Tamil by native users of the languages. Mean-opinion-score (MOS) shows that naturalness and comprehension are better with UCM based algorithm than the non-UCM based ones. The naturalness of the TTS output is further enhanced by a new rule based algorithm for pause prediction for Tamil language. The pauses between the words are predicted based on parts-of-speech information obtained from the input text.
Resumo:
A comprehensive scheme has been developed for the prediction of radiation from engine exhaust and its incidence on an arbitrarily located sensor. Existing codes have been modified for the simulation of flows inside nozzles and jets. A novel view factor computation scheme has been applied for the determination of the radiosities of the discrete panels of a diffuse and gray nozzle surface. The narrowband model has been used to model the radiation from the gas inside the nozzle and the nonhomogeneous jet. The gas radiation from the nozzle inclusive of nozzle surface radiosities have been used as boundary conditions on the jet radiation. Geometric modeling techniques have been developed to identify and isolate nozzle surface panels and gas columns of the nozzle and jet to determine the radiation signals incident on the sensor. The scheme has been validated for intensity and heat flux predictions, and some useful results of practical importance have been generated to establish its viability for infrared signature analysis of jets.
Resumo:
Traditional subspace based speech enhancement (SSE)methods use linear minimum mean square error (LMMSE) estimation that is optimal if the Karhunen Loeve transform (KLT) coefficients of speech and noise are Gaussian distributed. In this paper, we investigate the use of Gaussian mixture (GM) density for modeling the non-Gaussian statistics of the clean speech KLT coefficients. Using Gaussian mixture model (GMM), the optimum minimum mean square error (MMSE) estimator is found to be nonlinear and the traditional LMMSE estimator is shown to be a special case. Experimental results show that the proposed method provides better enhancement performance than the traditional subspace based methods.Index Terms: Subspace based speech enhancement, Gaussian mixture density, MMSE estimation.
Resumo:
We formulate a two-stage Iterative Wiener filtering (IWF) approach to speech enhancement, bettering the performance of constrained IWF, reported in literature. The codebook constrained IWF (CCIWF) has been shown to be effective in achieving convergence of IWF in the presence of both stationary and non-stationary noise. To this, we include a second stage of unconstrained IWF and show that the speech enhancement performance can be improved in terms of average segmental SNR (SSNR), Itakura-Saito (IS) distance and Linear Prediction Coefficients (LPC) parameter coincidence. We also explore the tradeoff between the number of CCIWF iterations and the second stage IWF iterations.
Resumo:
Effective feature extraction for robust speech recognition is a widely addressed topic and currently there is much effort to invoke non-stationary signal models instead of quasi-stationary signal models leading to standard features such as LPC or MFCC. Joint amplitude modulation and frequency modulation (AM-FM) is a classical non-parametric approach to non-stationary signal modeling and recently new feature sets for automatic speech recognition (ASR) have been derived based on a multi-band AM-FM representation of the signal. We consider several of these representations and compare their performances for robust speech recognition in noise, using the AURORA-2 database. We show that FEPSTRUM representation proposed is more effective than others. We also propose an improvement to FEPSTRUM based on the Teager energy operator (TEO) and show that it can selectively outperform even FEPSTRUM