991 results for speech features
Abstract:
Memories in Adaptive Resonance Theory (ART) networks are based on matched patterns that focus attention on those portions of bottom-up inputs that match active top-down expectations. While this learning strategy has proved successful for both brain models and applications, computational examples show that attention to early critical features may later distort memory representations during online fast learning. For supervised learning, biased ARTMAP (bARTMAP) solves the problem of over-emphasis on early critical features by directing attention away from previously attended features after the system makes a predictive error. Small-scale, hand-computed analog and binary examples illustrate key model dynamics. Two-dimensional simulation examples demonstrate the evolution of bARTMAP memories as they are learned online. Benchmark simulations show that featural biasing also improves performance on large-scale examples. One example, which predicts movie genres and is based, in part, on the Netflix Prize database, was developed for this project. Both first principles and consistent performance improvements on all simulation studies suggest that featural biasing should be incorporated by default in all ARTMAP systems. Benchmark datasets and bARTMAP code are available from the CNS Technology Lab Website: http://techlab.bu.edu/bART/.
Abstract:
Speech can be understood at widely varying production rates. A working memory is described for short-term storage of temporal lists of input items. The working memory is a cooperative-competitive neural network that automatically adjusts its integration rate, or gain, to generate a short-term memory code for a list that is independent of item presentation rate. Such an invariant working memory model is used to simulate data of Repp (1980) concerning the changes of phonetic category boundaries as a function of their presentation rate. Thus the variability of categorical boundaries can be traced to the temporal invariance of the working memory code.
Abstract:
A neural model of peripheral auditory processing is described and used to separate features of coarticulated vowels and consonants. After preprocessing of speech via a filterbank, the model splits into two parallel channels, a sustained channel and a transient channel. The sustained channel is sensitive to relatively stable parts of the speech waveform, notably synchronous properties of the vocalic portion of the stimulus. It extends the dynamic range of eighth nerve filters using coincidence detectors that combine operations of raising to a power, rectification, delay, multiplication, time averaging, and preemphasis. The transient channel is sensitive to critical features at the onsets and offsets of speech segments. It is built up from fast excitatory neurons that are modulated by slow inhibitory interneurons. These units are combined over high frequency and low frequency ranges using operations of rectification, normalization, multiplicative gating, and opponent processing. Detectors sensitive to frication and to onset or offset of stop consonants and vowels are described. Model properties are characterized by mathematical analysis and computer simulations. Neural analogs of model cells in the cochlear nucleus and inferior colliculus are noted, as are psychophysical data about perception of CV syllables that may be explained by the sustained-transient channel hypothesis. The proposed sustained and transient processing seems to be an auditory analog of the sustained and transient processing that is known to occur in vision.
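The idea of a transient channel built from fast excitation modulated by slow inhibition can be illustrated with a toy onset detector. The sketch below is not the model from the abstract: it assumes two first-order leaky integrators and a half-wave rectified difference, and the function name `onset_response` and the time constants `tau_fast` and `tau_slow` are hypothetical choices made only for illustration.

```python
def onset_response(signal, dt=0.001, tau_fast=0.005, tau_slow=0.05):
    """Toy transient detector: fast excitation minus slow inhibition, rectified.

    All parameter values are illustrative assumptions, not taken from the model.
    """
    fast = 0.0  # fast excitatory trace
    slow = 0.0  # slow inhibitory trace
    out = []
    for x in signal:
        # Each trace relaxes toward the input at its own rate (Euler step).
        fast += dt / tau_fast * (x - fast)
        slow += dt / tau_slow * (x - slow)
        # Half-wave rectification keeps only the excitatory surplus.
        out.append(max(0.0, fast - slow))
    return out


# A step input: silence for 100 samples, then a sustained constant level.
step = [0.0] * 100 + [1.0] * 900
resp = onset_response(step)
peak_index = max(range(len(resp)), key=resp.__getitem__)
```

With these assumed settings the output rises shortly after the step onset and then decays back toward zero while the input stays on, which is the qualitative behavior expected of a channel tuned to onsets rather than to sustained portions of a signal.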
Abstract:
This article describes a neural network model that addresses the acquisition of speaking skills by infants and subsequent motor equivalent production of speech sounds. The model learns two mappings during a babbling phase. A phonetic-to-orosensory mapping specifies a vocal tract target for each speech sound; these targets take the form of convex regions in orosensory coordinates defining the shape of the vocal tract. The babbling process wherein these convex region targets are formed explains how an infant can learn phoneme-specific and language-specific limits on acceptable variability of articulator movements. The model also learns an orosensory-to-articulatory mapping wherein cells coding desired movement directions in orosensory space learn articulator movements that achieve these orosensory movement directions. The resulting mapping provides a natural explanation for the formation of coordinative structures. This mapping also makes efficient use of redundancy in the articulator system, thereby providing the model with motor equivalent capabilities. Simulations verify the model's ability to compensate for constraints or perturbations applied to the articulators automatically and without new learning and to explain contextual variability seen in human speech production.
Abstract:
Existing work in Computer Science and Electronic Engineering demonstrates that Digital Signal Processing techniques can effectively identify the presence of stress in the speech signal. These techniques use datasets containing real or actual stress samples, i.e., real-life stress such as that recorded in 911 calls. Studies that use simulated or laboratory-induced stress have been less successful and inconsistent. Pervasive, ubiquitous computing is increasingly moving towards voice-activated and voice-controlled systems and devices. Speech recognition and speaker identification algorithms will have to improve and take emotional speech into account. Modelling the influence of stress on speech and voice is of interest to researchers from many different disciplines including security, telecommunications, psychology, speech science, forensics and Human Computer Interaction (HCI). The aim of this work is to assess the impact of moderate stress on the speech signal. In order to do this, a dataset of laboratory-induced stress is required. While attempting to build this dataset it became apparent that reliably inducing measurable stress in a controlled environment, when speech is a requirement, is a challenging task. This work focuses on the use of a variety of stressors to elicit a stress response during tasks that involve speech content. Biosignal analysis (commercial Brain Computer Interfaces, eye tracking and skin resistance) is used to verify and quantify the stress response, if any. This thesis explains the basis of the author’s hypotheses on the elicitation of affectively-toned speech and presents the results of several studies carried out throughout the PhD research period. These results show that the elicitation of stress, particularly the induction of affectively-toned speech, is not a simple matter and that many modulating factors influence the stress response process.
A model is proposed to reflect the author’s hypothesis on the emotional response pathways relating to the elicitation of stress with a required speech content. Finally, the author provides guidelines and recommendations for future research on speech under stress. Further research paths are identified and a roadmap for future research in this area is defined.
Abstract:
The authors analyzed several cytomorphonuclear parameters related to chromatin distribution and DNA ploidy in typical and atypical carcinoids and in small cell lung cancers. Nuclear measurements and analysis were performed with a SAMBA 200 (TITN, Grenoble, France) cell image processor with software allowing the discrimination of parameters computed on cytospin preparations of Feulgen-stained nuclei extracted from deparaffinized tumor tissues. The authors' results indicate a significant increase in DNA content--assessed by integrated optical density (IOD)--from typical carcinoids to small cell lung carcinomas, with atypical carcinoids showing an intermediate value. Parameters related to hyperchromatism (short and long run length and variance of optical density) also characterize the atypical carcinoids as being intermediate between typical carcinoids and small cell lung cancers. The systematic measurement of these cytomorphonuclear parameters seems to define an objective, reproducible "scale" of differentiation that helps to define the atypical carcinoid and may be of value in establishing cytologic criteria for differential diagnosis.
Abstract:
We present measurements of morphological features in a thick turbid sample using light-scattering spectroscopy (LSS) and Fourier-domain low-coherence interferometry (fLCI) by processing with the dual-window (DW) method. A parallel frequency domain optical coherence tomography (OCT) system with a white-light source is used to image a two-layer phantom containing polystyrene beads of diameters 4.00 and 6.98 μm on the top and bottom layers, respectively. The DW method decomposes each OCT A-scan into a time-frequency distribution with simultaneously high spectral and spatial resolution. The spectral information from localized regions in the sample is used to determine scatterer structure. The results show that the two scatterer populations can be differentiated using LSS and fLCI.
Abstract:
The affective impact of music arises from a variety of factors, including intensity, tempo, rhythm, and tonal relationships. The emotional coloring evoked by intensity, tempo, and rhythm appears to arise from association with the characteristics of human behavior in the corresponding condition; however, how and why particular tonal relationships in music convey distinct emotional effects are not clear. The hypothesis examined here is that major and minor tone collections elicit different affective reactions because their spectra are similar to the spectra of voiced speech uttered in different emotional states. To evaluate this possibility the spectra of the intervals that distinguish major and minor music were compared to the spectra of voiced segments in excited and subdued speech using fundamental frequency and frequency ratios as measures. Consistent with the hypothesis, the spectra of major intervals are more similar to spectra found in excited speech, whereas the spectra of particular minor intervals are more similar to the spectra of subdued speech. These results suggest that the characteristic affective impact of major and minor tone collections arises from associations routinely made between particular musical intervals and voiced speech.
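As a small worked illustration of frequency ratios as a measure, the sketch below converts an interval's frequency ratio into semitones. The abstract does not list the specific intervals or tuning used, so the just-intonation major third (5:4) and minor third (6:5) here are assumed examples, and the names `ratio_to_semitones` and `THIRDS` are hypothetical.

```python
import math


def ratio_to_semitones(ratio):
    """Size of an interval in equal-tempered semitones, given its frequency ratio."""
    return 12 * math.log2(ratio)


# Assumed example intervals (just intonation); not taken from the abstract.
THIRDS = {"major third": 5 / 4, "minor third": 6 / 5}

sizes = {name: ratio_to_semitones(r) for name, r in THIRDS.items()}
for name, semitones in sizes.items():
    print(f"{name}: ratio {THIRDS[name]:.3f}, {semitones:.2f} semitones")
```

Under these assumptions the two thirds differ by well under one semitone, which gives a sense of how fine the spectral comparison between interval types and speech spectra would need to be.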
Abstract:
BACKGROUND: Microsporidia are obligate intracellular, eukaryotic pathogens that infect a wide range of animals from nematodes to humans, and in some cases, protists. The preponderance of evidence as to the origin of the microsporidia reveals a close relationship with the fungi, either within the kingdom or as a sister group to it. Recent phylogenetic studies and gene order analysis suggest that microsporidia share a particularly close evolutionary relationship with the zygomycetes. METHODOLOGY/PRINCIPAL FINDINGS: Here we expanded this analysis and also examined a putative sex-locus for variability between microsporidian populations. Whole genome inspection reveals a unique syntenic gene pair (RPS9-RPL21) present in the vast majority of fungi and the microsporidians but not in other eukaryotic lineages. Two other unique gene fusions (glutamyl-prolyl tRNA synthetase and ubiquitin-ribosomal subunit S30) that are present in metazoans, choanoflagellates, and filasterean opisthokonts are unfused in the fungi and microsporidians. One locus previously found to be conserved in many microsporidian genomes is similar to the sex locus of zygomycetes in gene order and architecture. Both sex-related and sex loci harbor TPT, HMG, and RNA helicase genes forming a syntenic gene cluster. We sequenced and analyzed the sex-related locus in 11 different Encephalitozoon cuniculi isolates and the sibling species E. intestinalis (3 isolates) and E. hellem (1 isolate). There was no evidence for an idiomorphic sex-related locus in this Encephalitozoon species sample. According to sequence-based phylogenetic analyses, the TPT and RNA helicase genes flanking the HMG genes are paralogous rather than orthologous between zygomycetes and microsporidians. 
CONCLUSION/SIGNIFICANCE: The unique genomic hallmarks between microsporidia and fungi are independent of sequence-based phylogenetic comparisons and further contribute to define the borders of the fungal kingdom and support the classification of microsporidia as unusual derived fungi. In addition, the sex/sex-related loci appear to have been subject to frequent gene conversion and translocations in microsporidia and zygomycetes.
Abstract:
Factors influencing apoptosis of vertebrate eggs and early embryos have been studied in cell-free systems and in intact embryos by analyzing individual apoptotic regulators or caspase activation in static samples. A novel method for monitoring caspase activity in living Xenopus oocytes and early embryos is described here. The approach, using microinjection of a near-infrared caspase substrate that emits fluorescence only after its proteolytic cleavage by active effector caspases, has enabled the elucidation of otherwise cryptic aspects of apoptotic regulation. In particular, we show that brief caspase activity (10 min) is sufficient to cause apoptotic death in this system. We illustrate a cytochrome c dose threshold in the oocyte, which is lowered by Smac, a protein that binds and thereby neutralizes the inhibitor of apoptosis proteins. We show that meiotic oocytes develop resistance to cytochrome c, and that the eventual death of oocytes arrested in meiosis is caspase-independent. Finally, data acquired through imaging caspase activity in the Xenopus embryo suggest that apoptosis in very early development is not cell-autonomous. These studies both validate this assay as a useful tool for apoptosis research and reveal subtleties in the cell death program during early development. Moreover, this method offers a potentially valuable screening modality for identifying novel apoptotic regulators.
Abstract:
Perceiving or producing complex vocalizations such as speech and birdsongs requires the coordinated activity of neuronal populations, and these activity patterns can vary over space and time. How learned communication signals are represented by populations of sensorimotor neurons essential to vocal perception and production remains poorly understood. Using a combination of two-photon calcium imaging, intracellular electrophysiological recording and retrograde tracing methods in anesthetized adult male zebra finches (
Abstract:
Humans and song-learning birds communicate acoustically using learned vocalizations. The characteristic features of this social communication behavior include vocal control by forebrain motor areas, a direct cortical projection to brainstem vocal motor neurons, and dependence on auditory feedback to develop and maintain learned vocalizations. These features have so far not been found in closely related primate and avian species that do not learn vocalizations. Male mice produce courtship ultrasonic vocalizations with acoustic features similar to songs of song-learning birds. However, it is assumed that mice lack a forebrain system for vocal modification and that their ultrasonic vocalizations are innate. Here we investigated the mouse song system and discovered that it includes a motor cortex region active during singing, that projects directly to brainstem vocal motor neurons and is necessary for keeping song more stereotyped and on pitch. We also discovered that male mice depend on auditory feedback to maintain some ultrasonic song features, and that sub-strains with differences in their songs can match each other's pitch when cross-housed under competitive social conditions. We conclude that male mice have some limited vocal modification abilities with at least some neuroanatomical features thought to be unique to humans and song-learning birds. To explain our findings, we propose a continuum hypothesis of vocal learning.
Abstract:
Thirty years after fleeing from Poland to Denmark, 20 immigrants were enlisted in a study of bilingual autobiographical memory. Ten "early immigrators" averaged 24 years old at the time of immigration, and ten "late immigrators" averaged 34 years old at immigration. Although all 20 had spent 30 years in Denmark, early immigrators reported more current inner speech behaviours in Danish, whereas late immigrators showed more use of Polish. Both groups displayed proportionally more numerous autobiographical retrievals that were reported as coming to them internally in Polish (vs Danish) for the decades prior to immigration and more in Danish (vs Polish) after immigration. We propose a culture- and language-specific shaping of semantic and conceptual stores that underpins autobiographical and world knowledge.
Abstract:
X-ray crystallography is the predominant method for obtaining atomic-scale information about biological macromolecules. Despite the success of the technique, obtaining well diffracting crystals still critically limits going from protein to structure. In practice, the crystallization process proceeds through knowledge-informed empiricism. Better physico-chemical understanding remains elusive because of the large number of variables involved, hence little guidance is available to systematically identify solution conditions that promote crystallization. To help determine relationships between macromolecular properties and their crystallization propensity, we have trained statistical models on samples for 182 proteins supplied by the Northeast Structural Genomics consortium. Gaussian processes, which capture trends beyond the reach of linear statistical models, distinguish between two main physico-chemical mechanisms driving crystallization. One is characterized by low levels of side chain entropy and has been extensively reported in the literature. The other identifies specific electrostatic interactions not previously described in the crystallization context. Because evidence for two distinct mechanisms can be gleaned both from crystal contacts and from solution conditions leading to successful crystallization, the model offers future avenues for optimizing crystallization screens based on partial structural information. The availability of crystallization data coupled with structural outcomes analyzed through state-of-the-art statistical models may thus guide macromolecular crystallization toward a more rational basis.