901 results for temporal speech information
Abstract:
The last decade has witnessed major advances in speech recognition technology. Today’s commercial systems are able to recognize continuous speech from numerous speakers, with acceptable levels of error and without the need for an explicit adaptation procedure. Despite this progress, speech recognition is far from being a solved problem.
Most of these systems are tuned to a particular domain, and their efficacy depends significantly, among many other aspects, on the similarity between the language model used and the task being addressed. This dependence is even more important in scenarios where the statistical properties of the language fluctuate over time, for example in application domains involving spontaneous and multitopic speech. In recent years there has been an increasing effort to enhance speech recognition systems for such domains. This has been done, among other approaches, by means of automatic adaptation techniques. These techniques are applied to existing systems, especially since porting a system to a new task or domain may be both time-consuming and expensive. Adaptation techniques require additional sources of information, and spoken language can provide some of them. Speech not only conveys a message; it also provides information about the context in which the spoken communication takes place (e.g. the topic being discussed). Therefore, when we communicate through speech, it is feasible to identify the elements of the language that characterize the context and, at the same time, to track the changes that occur in those elements over time. This information can be extracted and exploited through information retrieval and machine learning techniques. This allows us, within the development of more robust speech recognition systems, to enhance the adaptation of language models to the conditions of the context, thus strengthening the recognition system for domains under changing conditions (such as potential variations in vocabulary, style and topic).
In this sense, the main contribution of this Thesis is the proposal and evaluation of a topic-motivated contextualization framework based on the dynamic and unsupervised adaptation of language models to enhance an automatic speech recognition system. This adaptation is based on a combined approach (drawing on both information retrieval and machine learning) whereby we identify the topics that are being discussed in an audio recording. The topic identification therefore enables the system to adapt the language model to the contextual conditions. The proposed framework can be divided into two major systems: a topic identification system and a dynamic language model adaptation system. This Thesis can be outlined from the perspective of the particular contributions made in each of the fields that compose the proposed framework:
- Regarding the topic identification system, we have focused on enhancing the document preprocessing techniques and on defining more robust criteria for the selection of index-terms.
  - In both information retrieval and machine learning based approaches, the efficiency of topic identification systems depends, to a large extent, on the preprocessing mechanisms applied to the documents. Among the many operations that make up the preprocessing procedures, an adequate selection of index-terms is critical to establish conceptual and semantic relationships between terms and documents. This process can also be weakened by a poor choice of stopwords or by imprecise stemming rules. In this regard we compare and evaluate different criteria for preprocessing the documents, as well as for improving the selection of the index-terms. This allows us not only to reduce the size of the indexing structure but also to strengthen the topic identification process.
  - One of the most crucial aspects for the performance of topic identification systems is the assignment of different weights to different terms depending on their contribution to the content of the document. In this sense we evaluate and propose alternative approaches to traditional weighting schemes (such as tf-idf) that allow us to improve the specificity of terms and to better identify the topics that are related to documents.
- Regarding the dynamic language model adaptation, we divide the contextualization process into different steps.
  - We propose supervised and unsupervised approaches for the generation of topic-based language models. The first generates topic-based language models by grouping the documents in the training set according to the original topic labels of the corpus. Nevertheless, a goal of this Thesis is to evaluate whether the use of these labels to generate language models is optimal in terms of recognition accuracy. For this reason, we propose a second, unsupervised approach, in which the objective is to group the data in the training set into automatic topic clusters based on the semantic similarity between the documents. By means of clustering approaches we expect to obtain a more cohesive association of the documents that are related by similar concepts, thus improving the coverage of the topic-based language models and enhancing the performance of the recognition system.
  - We develop various strategies to create a context-dependent language model. Our aim is for this model to reflect the semantic context of the current utterance, i.e. the most relevant topics that are being discussed. This model is generated by means of a linear interpolation between the topic-based language models related to the most relevant topics. The estimation of the interpolation weights is based mainly on the outcome of the topic identification process.
  - Finally, we propose a methodology for the dynamic adaptation of a background language model. The adaptation process takes into account the context-dependent model as well as the information provided by the topic identification process. The scheme used for the adaptation is a linear interpolation between the background model and the context-dependent one. We also study different approaches to determine the interpolation weights used in this adaptation scheme.
Having defined the basis of our topic-motivated contextualization framework, we propose its application to an automatic speech recognition system. We focus on two aspects: the contextualization of the language models used by the system, and the incorporation of semantic information into a topic-based adaptation process. To achieve this, we propose an experimental framework based on a ‘two-stage’ recognition architecture. In the first stage of the architecture, information retrieval and machine learning techniques are used to identify the topics in a transcription of an audio segment. This transcription is generated by the recognition system using a background language model. According to the confidence in the topics that have been identified, the dynamic language model adaptation is carried out. In the second stage of the recognition architecture, the adapted language model is used to re-decode the utterance. To test the benefits of the proposed framework, we evaluate each of the aforementioned major systems. The evaluation is conducted on speeches in the political domain using the EPPS (European Parliamentary Plenary Sessions) database from the European TC-STAR project. We analyse several performance metrics that allow us to compare the improvements of the proposed systems against the baseline ones.
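The two-stage adaptation described in this abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the thesis implementation: it uses unigram models, scores topics with smoothed tf-idf and cosine similarity, turns the scores into interpolation weights by simple normalization, and fixes the background weight `lam` at 0.5. All function names and the tiny corpus are hypothetical.

```python
import math
from collections import Counter

def idf_table(topic_docs):
    """Smoothed inverse document frequency computed over the topic documents."""
    n = len(topic_docs)
    df = Counter(t for doc in topic_docs.values() for t in set(doc))
    return lambda term: math.log((1 + n) / (1 + df[term])) + 1.0

def tfidf(tokens, idf):
    """Term frequency times idf for one tokenized text."""
    return {t: c * idf(t) for t, c in Counter(tokens).items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def unigram_lm(tokens, vocab):
    """Add-one smoothed unigram model restricted to a fixed vocabulary."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def adapt(background, topic_docs, transcript, lam=0.5):
    """Stage 1: score topics on the first-pass transcript.
    Stage 2 input: background model interpolated with a context model."""
    vocab = set(background)
    idf = idf_table(topic_docs)
    query = tfidf(transcript, idf)
    sims = {k: cosine(query, tfidf(doc, idf)) for k, doc in topic_docs.items()}
    total = sum(sims.values())
    # Assumption: weights are normalized similarities (uniform if all are zero).
    weights = {k: (s / total if total else 1 / len(sims)) for k, s in sims.items()}
    topic_lms = {k: unigram_lm(doc, vocab) for k, doc in topic_docs.items()}
    # Context-dependent model: linear interpolation of topic-based models.
    context = {w: sum(weights[k] * topic_lms[k][w] for k in topic_lms) for w in vocab}
    # Adapted model: linear interpolation of background and context models.
    adapted = {w: lam * background[w] + (1 - lam) * context[w] for w in vocab}
    return weights, adapted

# Hypothetical toy corpus with two topics.
topic_docs = {
    "fisheries": "fishing quota fleet sea fishing".split(),
    "budget": "budget deficit tax budget spending".split(),
}
all_tokens = [t for doc in topic_docs.values() for t in doc]
background = unigram_lm(all_tokens, set(all_tokens))
transcript = "the fishing fleet quota".split()  # stands in for first-pass output

weights, adapted = adapt(background, topic_docs, transcript)
print(weights)
print(adapted["fishing"], background["fishing"])
```

In a real recognizer the adapted model would then drive a second decoding pass over the same audio segment; here it is just a dictionary of word probabilities.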
Abstract:
The building sector has experienced a significant decline in recent years in Spain and Europe as a result of the financial crisis that began in 2007. This drop is accompanied by a low penetration of information and communication technologies in inter-organizationally oriented business processes. The market decrease is causing a slowdown in the building sector, where only flexible small and medium enterprises (SMEs) survive thanks to specialization and innovation in services, which allow them to face new market demands. Inter-organizational information systems (IOISs) support innovation in services, and are thus a strategic tool for SMEs to obtain competitive advantage. Because of the inherent complexity of IOIS adoption, this research extends Kurnia and Johnston's (2000) theoretical model of IOIS adoption with an empirical model of IOIS characterization. The resultant model identifies the factors influencing IOIS adoption in SMEs in the building sector, to promote further service innovation for competitive and collaborative advantages. An empirical longitudinal study over six consecutive years using data from Spanish SMEs in the building sector validates the model, using the partial least squares technique and analyzing temporal stability. The main findings of this research are four ways in which an IOIS might contribute to service innovation in the building sector, namely: a) improving client interfaces and the link between service providers and end users; b) defining a specific market where SMEs can develop new service concepts; c) enhancing the service delivery system in traditional customer-supplier relationships; and d) introducing information and communication technologies and tools to improve information management.
Abstract:
The purpose of this thesis is to address a (perhaps) pending issue: to define monumentality critically within the context of modern architecture. The idea of the monumental during modernity forms part of the "typical modern denial", based on the rejection of any link to tradition and history.
From the anachronistic idea of the static monument to the orchestration of architecture as a symbolic tool, the most important transformative process in 20th-century architecture showed a few signs that allow us to imagine a reality shaped by fundamental nuances and disagreements. The aim of this research is not to tell a new story about the modern period, although it inevitably draws on its record to present the discussion. Therefore, the critical idea we put forward has to do with the structural and objective possibilities of architectural discourse. This discourse is analyzed in terms of three differentiated domains, designated here as the written, the projected and the built during the period of study. In this way, we believe the dimensional possibilities of criticism are favored and the narrative sense of the historical process is expanded. For this investigation, monumentality constitutes a subject of study that reveals the contradictions, oversights and nuances needed to articulate a more complex vision of events. We are convinced of the efficacy of a dialectical model of analysis that defines the monumental condition both as a positive value (propitious, useful, truthful, etc.) and as a negative one (false, ostentatious, pompous, etc.); the differences around the concept lead, respectively, to the meanings of monumentality and monumentalism. Contrasting and opposing ideas in this light supports the dimensional ambition of the critique. From Sigfried Giedion's writings and the New Monumentality to Le Corbusier and the construction of Chandigarh, and from Karel Teige's anti-monumental criticism to Ivan Leonidov's constructivist project, the episodes covered in this work are chosen with purpose rather than through an arbitrary or confused selection of themes.
To lend methodological and even geometrical rigor, the proposed structure divides the "great modern period" into two historical blocks: first modernity (circa 1910-1935) and late modernity (circa 1935-1960). The first part analyzes a mainly reactive stance towards the expressions of this hypothetical monumental condition, whereas in the second block the stance is transformed and outlines a new scenario that ideologically anticipates part of the postmodern fracture. In turn, each of the three registers comprises two chapters, one for each of the time frames described. Each chapter is organized in three parts: the first explains preliminary ideas for discussion, the second presents the central points, and the third offers a partial recount. The research is complemented by an introductory part that sets out specific definitions of monument, monumentalism and monumentality and establishes the orientation of the critique. A final section, by way of conclusion, reflects on the temporal, ideological and aesthetic shift that postmodernity represented for the subject of this investigation.
Abstract:
This paper proposes an emotion transplantation method capable of modifying a synthetic speech model through the use of CSMAPLR adaptation in order to incorporate emotional information learned from a different speaker model while maintaining the identity of the original speaker as much as possible. The proposed method relies on learning both emotional and speaker identity information by means of their adaptation functions from an average voice model, and combining them into a single cascade transform capable of imbuing the target speaker with the desired emotion. This method is then applied to the task of transplanting four emotions (anger, happiness, sadness and surprise) into three male and three female speakers and evaluated in a number of perceptual tests. The evaluation results show that the perceived naturalness for emotional text significantly favors the proposed transplanted emotional speech synthesis over traditional neutral speech synthesis, as evidenced by a large increase in the perceived emotional strength of the synthesized utterances at a slight cost in speech quality. A final evaluation with a robotic laboratory assistant application shows that emotional speech significantly increases the students’ satisfaction with the dialog system, demonstrating that the proposed emotion transplantation system provides benefits in real applications.
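The idea of combining two adaptation functions into a single cascade transform can be illustrated with plain affine maps on a mean vector. The sketch below is only an assumption-laden toy: it treats each adaptation as one global affine transform `(A, b)`, applies the speaker transform first and the emotion transform second, and uses made-up 2-D values. The paper's actual transform structure, ordering and dimensionality are not given in the abstract.

```python
def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply_affine(transform, x):
    """Apply y = A x + b."""
    a, b = transform
    return [y + bi for y, bi in zip(matvec(a, x), b)]

def cascade(first, second):
    """Single transform equivalent to applying `first`, then `second`:
    A = A2 A1,  b = A2 b1 + b2."""
    a1, b1 = first
    a2, b2 = second
    return matmul(a2, a1), [y + bi for y, bi in zip(matvec(a2, b1), b2)]

# Hypothetical transforms estimated from an average voice model.
speaker = ([[1.0, 0.5], [0.0, 2.0]], [1.0, -1.0])   # average voice -> target speaker
emotion = ([[2.0, 0.0], [0.5, 1.0]], [0.5, 0.25])   # neutral -> emotional

average_mean = [1.0, 2.0]  # toy mean vector of one average-voice state

step_by_step = apply_affine(emotion, apply_affine(speaker, average_mean))
combined = apply_affine(cascade(speaker, emotion), average_mean)
print(step_by_step, combined)
```

Collapsing the two steps into one transform means each model parameter is adapted once, which is the practical appeal of a cascade; whether the real system composes them in this order is an assumption here.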
Abstract:
Purpose: The demand for rice driven by population growth in many countries has intensified the application of pesticides and the use of poor-quality water to irrigate fields. The terrestrial environment is one compartment affected by these practices, with soil acting as a reservoir that retains organic pollutants. Therefore, it is necessary to develop methods to determine insecticides in soil, to monitor areas susceptible to contamination, and to apply adequate techniques to remediate them. Materials and methods: This study investigates the occurrence of ten pyrethroid insecticides (PYs) and their spatio-temporal variance in soil at two different depths collected in two periods (before plow and during rice production), in a paddy field area located on the Mediterranean coast. Pyrethroids were quantified using gas chromatography-mass spectrometry (GC-MS) after ultrasound-assisted extraction with ethyl acetate. The results were assessed statistically using non-parametric methods, and statistically significant differences (p < 0.05) in pyrethroid content with soil depth and proximity to wastewater treatment plants were evaluated. Moreover, a geographic information system (GIS) was used to monitor the occurrence of PYs in paddy fields and detect risk areas. Results and discussion: Pyrethroids were detected at concentrations ≤57.0 ng g⁻¹ before plow and ≤62.3 ng g⁻¹ during rice production, with resmethrin and cyfluthrin being the compounds found at the highest concentrations in soil. Pyrethroids were detected mainly in the topsoil, and a GIS program was used to depict the results, showing that effluents from wastewater treatment plants (WWTPs) were the main sources of soil contamination. No toxic effects on soil organisms were expected, but it is of concern that PYs may affect aquatic organisms, which represents the worst-case scenario. Conclusions: A methodology to determine pyrethroids in soil was developed to monitor a paddy field area.
The use of water from WWTPs to irrigate rice fields is one of the main pollution sources of pyrethroids. It is a matter of concern that PYs may have toxic effects on aquatic organisms, as they can be desorbed from soil. Phytoremediation may play an important role in this area, reducing the possible risk associated with PY levels in soil.
Abstract:
Hippocampal slices are used to show that, as a temporal input pattern of activity flows through a neuronal layer, a temporal-to-spatial transformation takes place. That is, neurons can respond selectively to the first or second of a pair of input pulses, thus transforming different temporal patterns of activity into the activity of different neurons. This is demonstrated using associative long-term potentiation of polysynaptic CA1 responses as an activity-dependent marker: by depolarizing a postsynaptic CA1 neuron exclusively with the first or second of a pair of pulses from the dentate gyrus, it is possible to “tag” different subpopulations of CA3 neurons. This technique allows sampling of a population of neurons without recording simultaneously from multiple neurons. Furthermore, it reflects a biologically plausible mechanism by which single neurons may develop selective responses to time-varying stimuli and permits the induction of context-sensitive synaptic plasticity. These experimental results support the view that networks of neurons are intrinsically able to process temporal information and that it is not necessary to invoke the existence of internal clocks or delay lines for temporal processing on the time scale of tens to hundreds of milliseconds.
Abstract:
Participation of two medial temporal lobe structures, the hippocampal region and the amygdala, in long-term declarative memory encoding was examined by using positron emission tomography of regional cerebral glucose. Positron emission tomography scanning was performed in eight healthy subjects listening passively to a repeated sequence of unrelated words. Memory for the words was assessed 24 hr later with an incidental free recall test. The percentage of words freely recalled then was correlated with glucose activity during encoding. The results revealed a striking correlation (r = 0.91, P < 0.001) between activity of the left hippocampal region (centered on the dorsal parahippocampal gyrus) and word recall. No correlation was found between activity of either the left or right amygdala and recall. The findings provide evidence for hippocampal involvement in long-term declarative memory encoding and for the view that the amygdala is not involved with declarative memory formation for nonemotional material.
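The reported r value is a correlation coefficient between per-subject recall and regional glucose activity. As a reminder of how such a coefficient is computed, here is a minimal sketch assuming the standard Pearson product-moment formula; the abstract does not name the exact statistic, and the eight value pairs below are invented placeholders, not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)

# Placeholder values for eight hypothetical subjects (NOT the study's data).
glucose_activity = [6.1, 6.4, 6.8, 7.0, 7.3, 7.7, 8.0, 8.4]
percent_recalled = [10.0, 14.0, 13.0, 20.0, 22.0, 21.0, 28.0, 30.0]

r = pearson_r(glucose_activity, percent_recalled)
print(f"r = {r:.2f}")
```

The sketch does not compute a significance level; the P value quoted in the abstract would require a separate test on r given the sample size.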
Abstract:
Little is known about the specific functional contribution of the human orbitofrontal cortex with regard to memory processing, although there is strong evidence from lesion studies in monkeys that it may play an important role. The present investigation measured changes in regional cerebral blood flow with positron emission tomography in normal human subjects who were instructed to commit to memory abstract visual patterns. The results indicated that the rostral orbitofrontal region (area 11), which is primarily linked with the anterior medial temporal limbic region and lateral prefrontal cortical areas, is involved in the process of encoding of new information.
Resumo:
We describe experiments on behaving rats with electrodes implanted on the cornea, in the optic chiasm, and on the visual cortex; in addition, two red light-emitting diodes (LED) are permanently attached to the skull over the left eye. Recordings time-locked to the LED flashes reveal both the local events at each electrode site and the orderly transfer of visual information from retina to cortex. The major finding is that every stimulus, regardless of its luminance, duration, or the state of retinal light adaptation, elicits an optic nerve volley with a latency of about 10 ms and a duration of about 300 ms. This phenomenon has not been reported previously, so far as we are aware. We conclude that the retina, which originates from the forebrain of the developing embryo, behaves like a typical brain structure: it translates, within a few hundred milliseconds, the chemical information in each pattern of bleached photoreceptors into a corresponding pattern of ganglion cell neuronal information that leaves via the optic nerve. The attributes of each rat ganglion cell appear to include whether the retinal neuropile calls on it to leave after a stimulus and, if so, when within a 300-ms poststimulus epoch. The resulting retinal analysis of the scene, on arrival at the cortical level, is presumed to participate importantly in the creation of visual perceptual experiences.
Resumo:
Knowledge of the stage composition and the temporal dynamics of human cognitive operations is critical for building theories of higher mental activity. This information has been difficult to acquire, even with different combinations of techniques such as refined behavioral testing, electrical recording/interference, and metabolic imaging studies. Verbal object comprehension was studied herein in a single individual, by using three tasks (object naming, auditory word comprehension, and visual word comprehension), two languages (English and Farsi), and four techniques (stimulus manipulation, direct cortical electrical interference, electrocorticography, and a variation of the technique of direct cortical electrical interference to produce time-delimited effects, called timeslicing), in a subject in whom indwelling subdural electrode arrays had been placed for clinical purposes. Electrical interference at a pair of electrodes on the left lateral occipitotemporal gyrus interfered with naming in both languages and with comprehension in the language tested (English). The naming and comprehension deficit resulted from interference with processing of verbal object meaning. Electrocorticography indices of cortical activation at this site during naming started 250–300 msec after visual stimulus presentation. By using the timeslicing technique, which varies the onset of electrical interference relative to the behavioral task, we found that completion of processing for verbal object meaning varied from 450 to 750 msec after current onset. This variability was found to be a function of the subject’s familiarity with the objects.
Resumo:
To investigate the types of memory traces recovered by the medial temporal lobe (MTL), neural activity during veridical and illusory recognition was measured with the use of functional MRI (fMRI). Twelve healthy young adults watched a videotape segment in which two speakers alternately presented lists of associated words, and then the subjects performed a recognition test including words presented in the study lists (True items), new words closely related to studied words (False items), and new unrelated words (New items). The main finding was a dissociation between two MTL regions: whereas the hippocampus was similarly activated for True and False items, suggesting the recovery of semantic information, the parahippocampal gyrus was more activated for True than for False items, suggesting the recovery of perceptual information. The study also yielded a dissociation between two prefrontal cortex (PFC) regions: whereas bilateral dorsolateral PFC was more activated for True and False items than for New items, possibly reflecting monitoring of retrieved information, left ventrolateral PFC was more activated for New than for True and False items, possibly reflecting semantic processing. Precuneus and lateral parietal regions were more activated for True and False than for New items. Orbitofrontal cortex and cerebellar regions were more activated for False than for True items. In conclusion, the results suggest that activity in anterior MTL regions does not distinguish True from False, whereas activity in posterior MTL regions does.
Resumo:
Spoken language is one of the most compact and structured ways to convey information. The linguistic ability to structure individual words into larger sentence units permits speakers to express a nearly unlimited range of meanings. This ability is rooted in speakers' knowledge of syntax and in the corresponding process of syntactic encoding. Syntactic encoding is highly automatized, operates largely outside of conscious awareness, and overlaps closely in time with several other processes of language production. With the use of positron emission tomography we investigated the cortical activations during spoken language production that are related to the syntactic encoding process. In the paradigm of restrictive scene description, utterances varying in complexity of syntactic encoding were elicited. Results provided evidence that the left Rolandic operculum, caudally adjacent to Broca's area, is involved in both sentence-level and local (phrase-level) syntactic encoding during speaking.
Resumo:
Rapid progress in effective methods to image brain functions has revolutionized neuroscience. It is now possible to study noninvasively in humans neural processes that were previously only accessible in experimental animals and in brain-injured patients. In this endeavor, positron emission tomography has been the leader, but the superconducting quantum interference device-based magnetoencephalography (MEG) is gaining a firm role, too. With the advent of instruments covering the whole scalp, MEG, typically with 5-mm spatial and 1-ms temporal resolution, allows neuroscientists to track cortical functions accurately in time and space. We present five representative examples of recent MEG studies in our laboratory that demonstrate the usefulness of whole-head magnetoencephalography in investigations of spatiotemporal dynamics of cortical signal processing.
Resumo:
Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker.
Resumo:
The conversion of text to speech is seen as an analysis of the input text to obtain a common underlying linguistic description, followed by a synthesis of the output speech waveform from this fundamental specification. Hence, the comprehensive linguistic structure serving as the substrate for an utterance must be discovered by analysis from the text. The pronunciation of individual words in unrestricted text is determined by morphological analysis or letter-to-sound conversion, followed by specification of the word-level stress contour. In addition, many text character strings, such as titles, numbers, and acronyms, are abbreviations for normal words, which must be derived. To further refine these pronunciations and to discover the prosodic structure of the utterance, word part of speech must be computed, followed by a phrase-level parsing. From this structure the prosodic structure of the utterance can be determined, which is needed in order to specify the durational framework and fundamental frequency contour of the utterance. In discourse contexts, several factors such as the specification of new and old information, contrast, and pronominal reference can be used to further modify the prosodic specification. When the prosodic correlates have been computed and the segmental sequence is assembled, a complete input suitable for speech synthesis has been determined. Lastly, multilingual systems utilizing rule frameworks are mentioned, and future directions are characterized.
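As a rough illustration of the text-analysis stage described above, the following Python sketch expands abbreviations, digit strings, and acronyms into ordinary words before any pronunciation lookup. It is a toy under stated assumptions, not the method of the paper: the `ABBREVIATIONS` table, the digit-by-digit reading of numbers, and the function name `normalize` are all hypothetical choices made for this example.

```python
import re

# Hypothetical toy table; a real system would need a much larger,
# context-sensitive resource (e.g. "St." can mean "street" or "saint").
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister", "st.": "street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def normalize(text):
    """Expand abbreviations, digit strings, and acronyms into plain words."""
    words = []
    for token in text.split():
        lower = token.lower()
        if lower in ABBREVIATIONS:
            # Check abbreviations first, while the trailing period is intact.
            words.append(ABBREVIATIONS[lower])
            continue
        core = token.strip(".,;:!?")
        if core.isascii() and core.isdigit():
            # Simplification: read numbers digit by digit.
            words.extend(DIGITS[int(ch)] for ch in core)
        elif len(core) > 1 and core.isalpha() and core.isupper():
            # Treat all-caps tokens as acronyms and spell them out.
            words.extend(core.lower())
        else:
            # Keep only letters and apostrophes of an ordinary word.
            words.append(re.sub(r"[^a-z']", "", core.lower()))
    return [w for w in words if w]


print(normalize("Dr. Smith joined IBM in 1998."))
```

The later stages named in the abstract (letter-to-sound conversion, stress assignment, part-of-speech tagging, phrase-level parsing, and prosody) would consume this word list; they are deliberately left out of the sketch.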