979 resultados para Natural language


Relevância:

70.00% 70.00%

Publicador:

Resumo:

El campo de procesamiento de lenguaje natural (PLN), ha tenido un gran crecimiento en los últimos años; sus áreas de investigación incluyen: recuperación y extracción de información, minería de datos, traducción automática, sistemas de búsquedas de respuestas, generación de resúmenes automáticos, análisis de sentimientos, entre otras. En este artículo se presentan conceptos y algunas herramientas con el fin de contribuir al entendimiento del procesamiento de texto con técnicas de PLN, con el propósito de extraer información relevante que pueda ser usada en un gran rango de aplicaciones. Se pueden desarrollar clasificadores automáticos que permitan categorizar documentos y recomendar etiquetas; estos clasificadores deben ser independientes de la plataforma, fácilmente personalizables para poder ser integrados en diferentes proyectos y que sean capaces de aprender a partir de ejemplos. En el presente artículo se introducen estos algoritmos de clasificación, se analizan algunas herramientas de código abierto disponibles actualmente para llevar a cabo estas tareas y se comparan diversas implementaciones utilizando la métrica F en la evaluación de los clasificadores.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Este artículo presenta la aplicación y resultados obtenidos de la investigación en técnicas de procesamiento de lenguaje natural y tecnología semántica en Brand Rain y Anpro21. Se exponen todos los proyectos relacionados con las temáticas antes mencionadas y se presenta la aplicación y ventajas de la transferencia de la investigación y nuevas tecnologías desarrolladas a la herramienta de monitorización y cálculo de reputación Brand Rain.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

One of the main challenges to be addressed in text summarization concerns the detection of redundant information. This paper presents a detailed analysis of three methods for achieving such goal. The proposed methods rely on different levels of language analysis: lexical, syntactic and semantic. Moreover, they are also analyzed for detecting relevance in texts. The results show that semantic-based methods are able to detect up to 90% of redundancy, compared to only the 19% of lexical-based ones. This is also reflected in the quality of the generated summaries, obtaining better summaries when employing syntactic- or semantic-based approaches to remove redundancy.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

It is possible to view the relations between mathematics and natural language from different aspects. This relation between mathematics and language is not based on just one aspect. In this article, the authors address the role of the Subject facing Reality through language. Perception is defined and a mathematical theory of the perceptual field is proposed. The distinction between purely expressive language and purely informative language is considered false, because the subject is expressed in the communication of a message, and conversely, in purely expressive language, as in an exclamation, there is some information. To study the relation between language and reality, the function of ostensibility is defined and propositions are divided into ostensives and estimatives.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Information can be expressed in many ways according to the different capacities of humans to perceive it. Current systems deals with multimedia, multiformat and multiplatform systems but another « multi » is still pending to guarantee global access to information, that is, multilinguality. Different languages imply different replications of the systems according to the language in question. No solutions appear to represent the bridge between the human representation (natural language) and a system-oriented representation. The United Nations University defined in 1997 a language to be the support of effective multilinguism in Internet. In this paper, we describe this language and its possible applications beyond multilingual services as the possible future standard for different language independent applications.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

* This paper was made according to the program of fundamental scientific research of the Presidium of the Russian Academy of Sciences «Mathematical simulation and intellectual systems», the project "Theoretical foundation of the intellectual systems based on ontologies for intellectual support of scientific researches".

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Applied problems of functional homonymy resolution for Russian language are investigated in the work. The results obtained while using the method of functional homonymy resolution based on contextual rules are presented. Structural characteristics of minimal contextual rules for different types of functional homonymy are researched. Particular attention is paid to studying the control structure of the rules, which allows for the homonymy resolution accuracy not less than 95%. The contextual rules constructed have been realized in the system of technical text analysis.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This article briefly reviews multilingual language resources for Bulgarian, developed in the frame of some international projects: the first-ever annotated Bulgarian MTE digital lexical resources, Bulgarian-Polish corpus, Bulgarian-Slovak parallel and aligned corpus, and Bulgarian-Polish-Lithuanian corpus. These resources are valuable multilingual dataset for language engineering research and development for Bulgarian language. The multilingual corpora are large repositories of language data with an important role in preserving and supporting the world's cultural heritage, because the natural language is an outstanding part of the human cultural values and collective memory, and a bridge between cultures.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The article briefly reviews bilingual Slovak-Bulgarian/Bulgarian-Slovak parallel and aligned corpus. The corpus is collected and developed as results of the collaboration in the frameworks of the joint research project between Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, and Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences. The multilingual corpora are large repositories of language data with an important role in preserving and supporting the world's cultural heritage, because the natural language is an outstanding part of the human cultural values and collective memory, and a bridge between cultures. This bilingual corpus will be widely applicable to the contrastive studies of the both Slavic languages, will also be useful resource for language engineering research and development, especially in machine translation.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The current study is a post-hoc analysis of data from the original randomized control trial of the Play and Language for Autistic Youngsters (PLAY) Home Consultation program, a parent-mediated, DIR/Floortime based early intervention program for children with ASD (Solomon, Van Egeren, Mahone, Huber, & Zimmerman, 2014). We examined 22 children from the original RCT who received the PLAY program. Children were split into two groups (high and lower functioning) based on the ADOS module administered prior to intervention. Fifteen-minute parent-child video sessions were coded through the use of CHILDES transcription software. Child and maternal language, communicative behaviors, and communicative functions were assessed in the natural language samples both pre- and post-intervention. Results demonstrated significant improvements in both child and maternal behaviors following intervention. There was a significant increase in child verbal and non-verbal initiations and verbal responses in whole group analysis. Total number of utterances, word production, and grammatical complexity all significantly improved when viewed across the whole group of participants; however, lexical growth did not reach significance. Changes in child communicative function were especially noteworthy, and demonstrated a significant increase in social interaction and a significant decrease in non-interactive behaviors. Further, mothers demonstrated an increase in responsiveness to the child’s conversational bids, increased ability to follow the child’s lead, and a decrease in directiveness. When separated for analyses within groups, trends emerged for child and maternal variables, suggesting greater gains in use of communicative function in both high and low groups over changes in linguistic structure. Additional analysis also revealed a significant inverse relationship between maternal responsiveness and child non-interactive behaviors; as mothers became more responsive, children’s non-engagement was decreased. Such changes further suggest that changes in learned skills following PLAY parent training may result in improvements in child social interaction and language abilities.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

O processamento de linguagem natural e as ontologias são ferramentas cuja interação permite uma melhor compreensão dos dados armazenados. Este trabalho, ao associar estas duas áreas aos elementos disponíveis numa base de dados prosopográfica, tornou possível identificar e classificar relacionamentos entre setores de ocupação na forma como eram designados na época, setores de atividade num formato mais próximo do de hoje e o estatuto social que essas incumbências tinham na sociedade coeva. Os dados utilizados são sobretudo de membros do Santo Ofício – do século XVI ao século XVIII. Para atingir este objetivo utilizaram-se algumas descrições textuais de ocorrências da época e outras pouco estruturadas, disponíveis no repositório SPARES. A aplicação de processamento de linguagem natural (remoção de stopwords e aplicação de stemming), conjugada com a construção de duas ontologias, tornou possível classificar esses dados, permitindo consultas mais eficazes. Ao contribuir para a classificação automática de dados históricos, propõem-se metodologias que podem ser aplicadas em dados de qualquer outra área do conhecimento, especialmente as que lidam com as variáveis de tempo e espaço de forma mais intensa; Abstract: OntoSPARES: from natural language to ontologies Contributions to the automatic classification of historical data (16th-18th centuries) The interaction between the natural language processing and ontologies are tools allowing a better understanding of the data stored. This work, by combining these two areas to the elements available in a prosopographic database, has made possible to identify and classify relationships between occupations of many individuals (in general Holy Office members of the 16th-18th centuries). To achieve this goal the data used was gathered in SPARES repository, including some textual descriptions of the time occurrences. They are all few structured. The application of natural language processing (stopwords removal and stemming application), combined with the construction of two ontologies, made possible to classify those data, allowing a more effective search. By contributing to the automatic classification of historical data, this thesis proposes methodologies that can be applied to data from any other field of knowledge, specially data dealing with time and space variables.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Nonostante lo scetticismo di molti studiosi circa la possibilità di prevedere l'andamento della borsa valori, esistono svariate teorie ipotizzanti la possibilità di utilizzare le informazioni conosciute per predirne i movimenti futuri. L’avvento dell’intelligenza artificiale nella seconda parte dello scorso secolo ha permesso di ottenere risultati rivoluzionari in svariati ambiti, tanto che oggi tale disciplina trova ampio impiego nella nostra vita quotidiana in molteplici forme. In particolare, grazie al machine learning, è stato possibile sviluppare sistemi intelligenti che apprendono grazie ai dati, riuscendo a modellare problemi complessi. Visto il successo di questi sistemi, essi sono stati applicati anche all’arduo compito di predire la borsa valori, dapprima utilizzando i dati storici finanziari della borsa come fonte di conoscenza, e poi, con la messa a punto di tecniche di elaborazione del linguaggio naturale umano (NLP), anche utilizzando dati in linguaggio naturale, come il testo di notizie finanziarie o l’opinione degli investitori. Questo elaborato ha l’obiettivo di fornire una panoramica sull’utilizzo delle tecniche di machine learning nel campo della predizione del mercato azionario, partendo dalle tecniche più elementari per arrivare ai complessi modelli neurali che oggi rappresentano lo stato dell’arte. Vengono inoltre formalizzati il funzionamento e le tecniche che si utilizzano per addestrare e valutare i modelli di machine learning, per poi effettuare un esperimento in cui a partire da dati finanziari e soprattutto testuali si tenterà di predire correttamente la variazione del valore dell’indice di borsa S&P 500 utilizzando un language model basato su una rete neurale.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Natural Language Processing (NLP) has seen tremendous improvements over the last few years. Transformer architectures achieved impressive results in almost any NLP task, such as Text Classification, Machine Translation, and Language Generation. As time went by, transformers continued to improve thanks to larger corpora and bigger networks, reaching hundreds of billions of parameters. Training and deploying such large models has become prohibitively expensive, such that only big high tech companies can afford to train those models. Therefore, a lot of research has been dedicated to reducing a model’s size. In this thesis, we investigate the effects of Vocabulary Transfer and Knowledge Distillation for compressing large Language Models. The goal is to combine these two methodologies to further compress models without significant loss of performance. In particular, we designed different combination strategies and conducted a series of experiments on different vertical domains (medical, legal, news) and downstream tasks (Text Classification and Named Entity Recognition). Four different methods involving Vocabulary Transfer (VIPI) with and without a Masked Language Modelling (MLM) step and with and without Knowledge Distillation are compared against a baseline that assigns random vectors to new elements of the vocabulary. Results indicate that VIPI effectively transfers information of the original vocabulary and that MLM is beneficial. It is also noted that both vocabulary transfer and knowledge distillation are orthogonal to one another and may be applied jointly. The application of knowledge distillation first before subsequently applying vocabulary transfer is recommended. Finally, model performance due to vocabulary transfer does not always show a consistent trend as the vocabulary size is reduced. Hence, the choice of vocabulary size should be empirically selected by evaluation on the downstream task similar to hyperparameter tuning.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Nowadays the idea of injecting world or domain-specific structured knowledge into pre-trained language models (PLMs) is becoming an increasingly popular approach for solving problems such as biases, hallucinations, huge architectural sizes, and explainability lack—critical for real-world natural language processing applications in sensitive fields like bioinformatics. One recent work that has garnered much attention in Neuro-symbolic AI is QA-GNN, an end-to-end model for multiple-choice open-domain question answering (MCOQA) tasks via interpretable text-graph reasoning. Unlike previous publications, QA-GNN mutually informs PLMs and graph neural networks (GNNs) on top of relevant facts retrieved from knowledge graphs (KGs). However, taking a more holistic view, existing PLM+KG contributions mainly consider commonsense benchmarks and ignore or shallowly analyze performances on biomedical datasets. This thesis start from a propose of a deep investigation of QA-GNN for biomedicine, comparing existing or brand-new PLMs, KGs, edge-aware GNNs, preprocessing techniques, and initialization strategies. By combining the insights emerged in DISI's research, we introduce Bio-QA-GNN that include a KG. Working with this part has led to an improvement in state-of-the-art of MCOQA model on biomedical/clinical text, largely outperforming the original one (+3.63\% accuracy on MedQA). Our findings also contribute to a better understanding of the explanation degree allowed by joint text-graph reasoning architectures and their effectiveness on different medical subjects and reasoning types. Codes, models, datasets, and demos to reproduce the results are freely available at: \url{https://github.com/disi-unibo-nlp/bio-qagnn}.