964 results for Human Language Technologies
Abstract:
This article analyzes the suitability of a text summarization system, COMPENDIUM, for generating abstracts of biomedical papers. Two approaches are proposed: an extractive one (COMPENDIUM E), which only selects and extracts the most relevant sentences of the documents, and an abstractive-oriented one (COMPENDIUM E–A), which also takes on the challenge of abstractive summarization. This novel strategy combines extracted information with pieces of information from the article that have previously been compressed or fused. Specifically, we study: i) whether COMPENDIUM produces good summaries in the biomedical domain; ii) which summarization approach is more suitable; and iii) the opinion of real users towards automatic summaries. To this end, two types of evaluation were performed, quantitative and qualitative, to assess both the information contained in the summaries and user satisfaction. Results show that extractive and abstractive-oriented summaries perform similarly in terms of the information they contain, so both approaches are able to keep the relevant information of the source documents, but the latter is more appropriate from a human perspective when user satisfaction is assessed. This also confirms the suitability of our suggested approach for generating summaries following an abstractive-oriented paradigm.
Abstract:
Autism Spectrum Disorder (ASD) is a disorder that impairs the proper development of cognitive functions and of social and communication skills. A significant percentage of people with autism also have difficulties with reading comprehension. The European project FIRST aims to develop a multilingual tool called Open Book, which uses Human Language Technologies to identify obstacles that hinder the reading comprehension of a document. The tool helps carers and people with autism by transforming written documents into a simpler format, removing the obstacles identified in the text. This article presents the FIRST project and the Open Book tool developed within it.
Abstract:
The ATTOS project focuses on the study and development of opinion analysis techniques, with the aim of providing all the information a company or institution needs to make strategic decisions based on the image society has of that company, product or service. The ultimate goal of the project is the automatic interpretation of these opinions, thereby enabling their subsequent exploitation. To this end, parameters such as opinion intensity, geographic location and user profile, among other factors, are studied to support decision making. The general objective of the project is the study, development and testing of techniques, resources and systems based on Human Language Technologies (HLT) in order to build a Web 2.0 monitoring platform that generates information on opinion trends related to a given topic.
Abstract:
The vast amount of information available on the Internet is making it increasingly difficult for users to digest it all, a task that is now almost unthinkable without the help of tools based on Human Language Technologies (HLT), such as information retrieval systems or automatic summarizers. The interest of this emerging project (and therefore its main objective) is motivated precisely by the need to define and create an HLT-based technological framework capable of processing and semantically annotating information, as well as generating information automatically, making the type of information presented more flexible and adapting it to users' needs. This article gives an overview of the project, focusing on the proposed architecture and its current status.
Abstract:
The joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required for JST model learning is a set of domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors through the topic-word Dirichlet priors. We study the polarity-bearing topics extracted by JST and show that, by augmenting the original feature space with polarity-bearing topics, in-domain supervised classifiers learned from the augmented feature representation achieve state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset. Furthermore, using feature augmentation and selection according to the information gain criterion for cross-domain sentiment classification, our proposed approach performs better than or comparably to previous approaches. Moreover, our approach is much simpler and does not require difficult parameter tuning.
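One way to picture how word polarity priors can enter a topic-word Dirichlet prior is sketched below. The abstract does not give the exact formulation, so this is only an illustrative assumption: the function name `build_polarity_priors`, the base value `base_beta`, the toy vocabulary and the toy lexicon are all hypothetical. Here a lexicon word keeps its prior mass under its own sentiment label and gets zero mass under the others, while words outside the lexicon keep a symmetric prior.

```python
import numpy as np

def build_polarity_priors(vocab, lexicon, n_sentiments, base_beta=0.01):
    """Return a (n_sentiments, vocab_size) matrix of Dirichlet prior values.

    Assumption for illustration: a lexicon word keeps base_beta only in the
    row of its own sentiment label; every other word stays symmetric.
    """
    beta = np.full((n_sentiments, len(vocab)), base_beta)
    for j, word in enumerate(vocab):
        label = lexicon.get(word)
        if label is None:
            continue  # word carries no prior polarity, leave it symmetric
        mask = np.zeros(n_sentiments)
        mask[label] = 1.0
        beta[:, j] *= mask  # zero the prior under non-matching sentiments
    return beta

# Hypothetical toy data: label 0 = positive, label 1 = negative.
vocab = ["good", "bad", "movie", "plot"]
lexicon = {"good": 0, "bad": 1}
beta = build_polarity_priors(vocab, lexicon, n_sentiments=2)
```

In a full sampler this matrix would replace the symmetric topic-word prior, so that "good" can effectively only be drawn under the positive sentiment label.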
Abstract:
Numerous linguistic operations have been assigned to cortical brain areas, but the contributions of subcortical structures to human language processing are still being discussed. Using simultaneous EEG recordings directly from deep brain structures and the scalp, we show that the human thalamus systematically reacts to syntactic and semantic parameters of auditorily presented language in a temporally interleaved manner in coordination with cortical regions. In contrast, two key structures of the basal ganglia, the globus pallidus internus and the subthalamic nucleus, were not found to be engaged in these processes. We therefore propose that syntactic and semantic language analysis is primarily realized within cortico-thalamic networks, whereas a cohesive basal ganglia network is not involved in these essential operations of language analysis.
Abstract:
Quantitative linguistics has provided us with a number of empirical laws that characterise the evolution of languages and competition amongst them. In terms of language usage, one of the most influential results is Zipf’s law of word frequencies. Zipf’s law appears to be universal, and may not even be unique to human language. However, there is ongoing controversy over whether Zipf’s law is a good indicator of complexity. Here we present an alternative approach that puts Zipf’s law in the context of critical phenomena (the cornerstone of complexity in physics) and establishes the presence of a large-scale “attraction” between successive repetitions of words. Moreover, this phenomenon is scale-invariant and universal – the pattern is independent of word frequency and is observed in texts by different authors and written in different languages. There is evidence, however, that the shape of the scaling relation changes for words that play a key role in the text, implying the existence of different “universality classes” in the repetition of words. These behaviours exhibit striking parallels with complex catastrophic phenomena.
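The two quantities this abstract revolves around, word frequency by rank and the spacing between successive repetitions of a word, can be computed in a few lines. The sketch below is only an illustration on a toy sentence; the function names and the sample text are assumptions, and it does not reproduce the scaling analysis of the paper.

```python
from collections import Counter, defaultdict

def rank_frequency(tokens):
    """Return (rank, word, frequency) tuples, most frequent word first."""
    counts = Counter(tokens)
    # Sort by descending frequency, breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ranked, start=1)]

def inter_occurrence_gaps(tokens):
    """Map each repeated word to the distances between its successive uses."""
    last_seen = {}
    gaps = defaultdict(list)
    for pos, word in enumerate(tokens):
        if word in last_seen:
            gaps[word].append(pos - last_seen[word])
        last_seen[word] = pos
    return dict(gaps)

# Hypothetical toy text.
tokens = "the cat saw the dog and the dog saw the cat".split()
table = rank_frequency(tokens)
gaps = inter_occurrence_gaps(tokens)
```

Under Zipf's law the frequency in `table` would fall off roughly as a power of the rank for a large corpus; the lists in `gaps` are the raw material for studying how repetitions of a word cluster together.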
Abstract:
This paper presents the platform developed in the PANACEA project, a distributed factory that automates the stages involved in the acquisition, production, updating and maintenance of the Language Resources required by Machine Translation and other Language Technologies. We adopt a set of tools that have been successfully used in the Bioinformatics field, adapt them to the needs of our field, and use them to deploy web services, which can be combined to build more complex processing chains (workflows). This paper describes the platform and its different components (web services, registry, workflows, social network and interoperability). We demonstrate the scalability of the platform by carrying out a set of massive data experiments. Finally, a validation of the platform against a set of required criteria proves its usability for different types of users (non-technical users and providers).
Abstract:
According to the young Benjamin's theory of language, the primary task of language is not to communicate contents but to express itself as a "spiritual essence" in which human beings also take part. The conception according to which language is a medium for signifying something outside itself necessarily diminishes its original strength, and Benjamin therefore calls it bürgerlich. The names of human language are remnants of an archaic state in which things were not yet mute and had their own language. Benjamin also suggests that all the arts recall the original language of things, since they make objects "speak" in the form of sounds, colors, shapes, etc. This relationship between the arts as reminders of the "language of things" and the possible reconciliation of mankind with itself and with nature was developed by Theodor Adorno in several of his writings, especially in the Aesthetic Theory, where the artwork is ultimately conceived as a construct pervaded by "language" in the widest sense, not in the "bourgeois" sense.
Abstract:
The human language-learning ability persists throughout life, indicating considerable flexibility at the cognitive and neural level. This ability spans from expanding the vocabulary in the mother tongue to the acquisition of a new language with its lexicon and grammar. The present thesis consists of five studies that tap both of these aspects of adult language learning by using magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI) during language processing and language learning tasks. The thesis shows that learning novel phonological word forms, either in the native tongue or when exposed to a foreign phonology, activates the brain in similar ways. The results also show that novel native words readily become integrated in the mental lexicon. Several studies in the thesis highlight the left temporal cortex as an important brain region in learning and accessing phonological forms. Incidental learning of foreign phonological word forms was reflected in functionally distinct temporal lobe areas that, respectively, reflected short-term memory processes and more stable learning that persisted to the next day. In a study where explicitly trained items were tracked for ten months, it was found that enhanced naming-related temporal and frontal activation one week after learning was predictive of good long-term memory. The results suggest that memory maintenance is an active process that depends on mechanisms of reconsolidation, and that these processes vary considerably between individuals. The thesis puts special emphasis on studying language learning in the context of language production. The neural foundation of language production has been studied considerably less than that of language perception, especially at the sentence level. A well-known paradigm in language production studies is picture naming, which is also used as a clinical tool in neuropsychology.
This thesis shows that accessing the meaning and accessing the phonological form of a depicted object are subserved by different neural implementations. Moreover, a comparison between action and object naming from identical images indicated that the grammatical class of the retrieved word (verb, noun) is less important than the visual content of the image. In the present thesis, the picture naming paradigm was further modified into a novel paradigm in order to probe sentence-level speech production in a newly learned miniature language. Neural activity related to grammatical processing did not differ between the novel language and the mother tongue, but stronger neural activation for the novel language was observed during the planning of the upcoming output, likely related to more demanding lexical retrieval and short-term memory. In sum, the thesis aimed at examining language learning by combining different linguistic domains, such as phonology, semantics, and grammar, in a dynamic description of language processing in the human brain.
Abstract:
We suggest there is somewhat more potential than Christiansen & Chater (C&C) allow for genetic adaptations specific to language. Our uniquely cooperative social system requires sophisticated language skills. Learning and performance of some culturally transmitted elements in animals is genetically based, and we give examples of features of human language that evolve slowly enough that genetic adaptations to them may arise.
Abstract:
We present two approaches to clustering dialogue-based information obtained by the speech understanding module and the dialogue manager of a spoken dialogue system. The purpose is to estimate a language model (LM) for each cluster and to use these models to dynamically modify the model of the speech recognizer at each dialogue turn. In the first approach we build the cluster tree using local decisions based on a Maximum Normalized Mutual Information criterion. In the second we take global decisions, based on optimizing the global perplexity of the combination of the cluster-related LMs. Our experiments show a relative reduction in word error rate of 15.17%, which helps to improve the performance of the understanding and dialogue manager modules.
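To make the idea of measuring the perplexity of a combination of cluster-related language models concrete, the sketch below linearly interpolates two toy unigram models and scores a short token sequence. Everything in it is an assumption for illustration: the abstract does not specify the model order, the combination scheme or the weights, and the names `interpolate`, `perplexity`, `lm_booking` and `lm_confirm` are hypothetical.

```python
import math

def interpolate(models, weights):
    """Linearly interpolate unigram models given as {word: probability} dicts."""
    vocab = set().union(*models)
    return {
        w: sum(lam * m.get(w, 0.0) for lam, m in zip(weights, models))
        for w in vocab
    }

def perplexity(model, tokens):
    """Perplexity of a unigram model on a token sequence."""
    log_prob = sum(math.log(model[t]) for t in tokens)
    return math.exp(-log_prob / len(tokens))

# Hypothetical cluster-specific models for two kinds of dialogue turn.
lm_booking = {"ticket": 0.5, "train": 0.3, "yes": 0.2}
lm_confirm = {"ticket": 0.1, "train": 0.1, "yes": 0.8}

# Equal weights are an arbitrary choice for the example.
mixed = interpolate([lm_booking, lm_confirm], [0.5, 0.5])
ppl = perplexity(mixed, ["yes", "ticket", "yes"])
```

A global criterion of the kind the abstract mentions would compare values such as `ppl` across candidate clusterings and weight settings and keep the combination with the lowest perplexity on held-out dialogue turns.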
Abstract:
This paper addresses the problem of the automatic recognition and classification of temporal expressions and events in human language. Efficacy in these tasks is crucial if the broader task of temporal information processing is to be performed successfully. We analyze whether applying semantic knowledge to these tasks improves the performance of current approaches. To this end, we present and evaluate a data-driven approach as part of a system, TIPSem. Our approach uses lexical semantics and semantic roles as additional information to extend classical approaches, which are principally based on morphosyntax. The results obtained for English show that semantic knowledge aids temporal expression and event recognition, achieving error reductions of 59% and 21%, respectively, while in classification the contribution is limited. From the analysis of the results it may be concluded that the application of semantic knowledge leads to more general models and aids the recognition of temporal entities that are ambiguous at shallower levels of language analysis. We also found that lexical semantics and semantic roles have complementary advantages, and that it is useful to combine them. Finally, we carried out the same analysis for Spanish. The results obtained show comparable advantages. This supports the hypothesis that applying the proposed semantic knowledge may be useful for different languages.
Abstract:
Central to animal studies is the question of words and how they are used in relation to wordless beings such as non-human animals. This issue is addressed by the writer D.H. Lawrence, and the focus of this thesis is the linguistic vulnerability of humans and non-humans in his novel Women in Love, a subject that will be explored with the help of the philosopher Jacques Derrida’s text The Animal That Therefore I Am. The argument is that Women in Love illustrates the human subjection to and constitution in language, which both enables human thinking and restricts the human ability to think without words. This linguistic vulnerability causes a similar vulnerability in non-human animals in two ways. First, humans tend to imagine others, including non-verbal animals, through words, a medium they exist outside of and therefore cannot be defined through. Second, humans are often unperceptive of non-linguistic means of expression and they therefore do not discern what non-human animals may be trying to communicate to them, which often enables humans to justify abuse against non-humans. In addition, the novel shows how this shared but unequal vulnerability can sometimes be dissolved through the likewise shared but equal physical vulnerability of all animals if a human is able to imagine the experiences of a non-human animal through their shared embodiment rather than through human language. Hence the essay shows the importance of recognizing the limitations of language and of being aware of how the symbolizing effect of words influences the human treatment of its others.
Abstract:
AKT is a major research project applying a variety of technologies to knowledge management. Knowledge is a dynamic, ubiquitous resource, which is to be found equally in an expert's head, under terabytes of data, or explicitly stated in manuals. AKT will extend knowledge management technologies to exploit the potential of the semantic web, covering the use of knowledge over its entire lifecycle, from acquisition to maintenance and deletion. In this paper we discuss how Human Language Technologies (HLT) will be used in AKT and how their use will affect different areas of knowledge management (KM), such as knowledge acquisition, retrieval and publishing.