960 results for Word Sense Disambiguation, WSD, Natural Language Processing
Abstract:
[EU] In language processing, automatically detecting and distinguishing the relations of the causal group (CAUSE, RESULT and PURPOSE) in coherent texts is useful when building automatic question-answering systems. To that end we use Rhetorical Structure Theory (hereafter RST) and its relations, taking as our corpus the RST Treebank (Iruskieta et al., 2013), a corpus made up of scientific abstracts. We download this corpus in XML format and extract the most important information from it with XPath. This work has three main goals: first, to distinguish the causal-group relations from one another; second, to distinguish these causal-group relations from all other relations; and finally, to distinguish the EVALUATION and INTERPRETATION relations so that they can be applied in sentiment analysis. To carry out these tasks, we used the most significant patterns obtained with the RhetDB tool and developed two applications. The first is a search tool that takes the patterns we want to find and searches any kind of text that has a relational structure; the second is a tagger that labels relations given the most significant patterns. Both applications were designed to be as parameterizable as possible, so that anyone can use them for similar tasks without changing the code. After evaluating the taggers, we found that PURPOSE is the easiest relation to identify, and concluded that distinguishing CAUSE from RESULT is more problematic. Likewise, we found that EVALUATION and INTERPRETATION can also be distinguished from each other.
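The extraction step described above can be sketched roughly as follows. This is only an illustration, not the authors' tool: the XML layout, the `segment` element, the `relname` attribute and the English relation names are all assumptions made for the example, and the real corpus schema may differ. It uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified annotation file; the real corpus schema may differ.
SAMPLE_XML = """<rst>
  <body>
    <segment id="1" parent="2" relname="cause">Because it rained,</segment>
    <segment id="2">the street is wet.</segment>
    <segment id="3" parent="2" relname="purpose">To stay dry,</segment>
    <segment id="4" parent="2" relname="evaluation">That was a good idea.</segment>
    <segment id="5" parent="2" relname="result">so traffic slowed down.</segment>
  </body>
</rst>"""

# Relation names of the causal group (assumed spellings).
CAUSAL_GROUP = {"cause", "result", "purpose"}


def causal_segments(xml_text):
    """Return (id, relation, text) for every segment in the causal group."""
    root = ET.fromstring(xml_text)
    # XPath predicate [@relname] keeps only segments that carry a relation.
    annotated = root.findall(".//segment[@relname]")
    return [
        (seg.get("id"), seg.get("relname"), seg.text)
        for seg in annotated
        if seg.get("relname") in CAUSAL_GROUP
    ]


# Count how often each causal-group relation occurs in the sample.
relation_counts = Counter(rel for _, rel, _ in causal_segments(SAMPLE_XML))
```

Filtering on a set of relation names keeps the query parameterizable: a different relation group can be searched by passing a different set rather than editing the XPath expression.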
Abstract:
The primary objective of this Project Work is to analyse the translation, from Portuguese into English, of economic and financial texts using the ISTRION Machine Translation (MT) platform. Selected content from the economic and financial newsletter Maximus Report is translated with this platform, complemented by other language-processing support tools considered relevant. The Project Work also aims to analyse the platform's potential and to measure the translation results. Finally, it seeks to frame, test, study and measure the criteria under which the translation of these texts could be made more efficient.
Abstract:
In the first decades of the 20th century, the newly instituted Brazilian Republic faced the challenge of modernizing the country. Because progress was associated with the exhaustion of forest reserves and with climatic changes, two big issues were seen as fundamental: To Fight the Droughts and To Defend the Forests, causes headed by professionals dedicated to these ideals. This research starts from the premise that these were the main challenges imposed by nature on Brazilian development. Its general objective is to understand the meaning and conception of the natural world held by this group of professionals, who faced the clash between modernizing the country and conserving its natural resources. Aiming to contribute to the construction of Brazilian environmental history and to bring historical elements to the debate about the environment in the country, the author concentrates on the analyses, discussions and actions that preceded the regulation of the use of natural resources and the implementation of environmental legislation in Brazil, which occurred in 1934. The investigation takes the theoretical directions of environmental history as its methodological basis and uses data sources that are still little explored and valued. Its starting point is a set of papers on this subject published between 1889 and 1934 in two technical magazines, the Revista Brazil Ferro-Carril and the Revista do Club de Engenharia. National engineering played a basic role in this process by debating, designing and constructing development. The proposals formulated, once publicized, fostered exchange with other professionals and favored the advance of environmental questions in Brazil, in the sense of preserving natural resources, building more harmonious relations between society and nature, and balancing development with environmental preservation.
Abstract:
The need to insert the capital of Rio Grande do Norte into the worldwide commercial scene and to assert it as the seat of political power, at the end of the nineteenth and beginning of the twentieth century, determined the direction of the urban interventions undertaken by the government to restructure the city. To that end, there were several improvement and embellishment works in Natal, which had as their starting point the adaptation works of the port, located in the Ribeira quarter, with the aim of ending the physical isolation that reinforced the city's economic stagnation. Besides the problems faced at the bar at the mouth of the Potengi River, other obstacles, whose removal would complement the required improvements, demonstrate the tension between the physical-geographic environment and man: the flooded area and the slope that connected Cidade Alta and Ribeira, the first two quarters of the city. The execution of these works demanded knowledge whose mastery and application belonged to engineering. But how did the actions of the engineers, in transforming natural areas into built spaces, make possible the intentional shaping of the Ribeira quarter into a commercial and political-administrative center between the mid-nineteenth century and the beginning of the twentieth? To understand the effects of the use of technology on the physical-geographical environment of Ribeira is therefore the objective of this work, which uses the theoretical and methodological procedures of Urban Environmental History to analyze the relationship between the environment and man, mediated by the knowledge and use of technologies. The documentary research used, as primary sources, the Messages of the Provincial Assembly Government (which later became the Legislative Assembly of Rio Grande do Norte), reports and articles in specialized publications, and local newspapers. The work is structured in five chapters.
First come some comments on Urban Environmental History (Chapter 1), supplemented by an analysis of the conceptual construction of nature in the Contemporary Era and its application to the city (Chapter 2). The following chapters (3 and 4) deal with the rise of engineers as an active group within the Brazilian government and their vision of nature inside the urban environment, and study how these technical professionals dealt with the harbor improvement works and with the clash with natural forces. Other works that would complement this "project" of modernization and that had natural obstacles to remove, the Ribeira flooded area and slope, constitute the subject of the fifth chapter. Finally, some concluding considerations return to the initial discussions, aiming to associate technique and nature as joint elements in the process of constituting a Modern Natal.
Abstract:
Ecological models written in a mathematical language L(M), or model language, with a given style or methodology can be considered as a text. It is possible to apply statistical linguistic laws to them, and the experimental results demonstrate that the behaviour of a mathematical model is the same as that of any literary text in any natural language. A text has the following characteristics: (a) the variables, their transformed functions and parameters are the lexical units, or LUN, of ecological models; (b) the syllables are constituted by a LUN, or a chain of them, separated by operating or ordering LUNs; (c) the flow equations are words; and (d) the distribution of words (LUM and CLUN) according to their lengths follows a Poisson distribution, Chebanov's law. It is based on Vakar's formula, which is calculated in the same way as the linguistic entropy for L(M). We apply these ideas to practical examples using the MARIOLA model. This paper studies the problem of the lengths of the simple lexical units, composed lexical units and words of text models, expressing these lengths in numbers of primitive symbols and syllables. The use of these linguistic laws makes it possible to indicate the degree of information given by an ecological model.
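As a rough illustration of the kind of check this involves, the sketch below compares observed word lengths with a Poisson model and computes an entropy of the length distribution. Everything specific here is an assumption: the sample lengths are invented, the Poisson is shifted by one (so that a word has at least one syllable), and the entropy is plain Shannon entropy, not Vakar's formula, which the abstract does not reproduce.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Probability of k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def fit_shifted_poisson(lengths):
    """Estimate lam assuming length = 1 + Poisson(lam) (an assumption)."""
    return sum(lengths) / len(lengths) - 1


def expected_share(length, lam):
    """Expected share of words with the given length under the shifted model."""
    return poisson_pmf(length - 1, lam)


def shannon_entropy(lengths):
    """Shannon entropy (bits) of the empirical length distribution."""
    counts = Counter(lengths)
    n = len(lengths)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Hypothetical word lengths (in syllables) for the flow equations of a model.
lengths = [1, 2, 2, 3, 1, 2, 4, 2, 3, 2]
lam = fit_shifted_poisson(lengths)
observed = {k: c / len(lengths) for k, c in sorted(Counter(lengths).items())}
expected = {k: expected_share(k, lam) for k in observed}
entropy_bits = shannon_entropy(lengths)
```

Comparing `observed` with `expected` length by length gives a first impression of how well the Poisson assumption holds; a formal goodness-of-fit test would be needed for a real claim.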
Abstract:
There is a growing societal need to address the increasing prevalence of behavioral health issues, such as obesity, alcohol or drug use, and general lack of treatment adherence for a variety of health problems. The statistics, worldwide and in the USA, are daunting. Excessive alcohol use is the third leading preventable cause of death in the United States (with 79,000 deaths annually), and is responsible for a wide range of health and social problems. On the positive side, though, these behavioral health issues (and possible associated diseases) can often be prevented with relatively simple lifestyle changes, such as losing weight through diet and/or physical exercise, or learning how to reduce alcohol consumption. Medicine has therefore started to move toward finding ways of preventively promoting wellness, rather than solely treating already established illness. Evidence-based, patient-centered Brief Motivational Interviewing (BMI) interventions have been found particularly effective in helping people find intrinsic motivation to change problem behaviors after short counseling sessions, and to maintain healthy lifestyles over the long term. Lack of locally available personnel well trained in BMI, however, often limits access to successful interventions for people in need. To fill this accessibility gap, Computer-Based Interventions (CBIs) have started to emerge. The success of CBIs, however, critically relies on ensuring engagement and retention of CBI users so that they remain motivated to use these systems and come back to use them over the long term as necessary. Because of their text-only interfaces, current CBIs can express only limited empathy and rapport, which are the most important factors of health interventions. Fortunately, in the last decade, computer science research has progressed in the design of simulated human characters with anthropomorphic communicative abilities.
Virtual characters interact using humans’ innate communication modalities, such as facial expressions, body language, speech, and natural language understanding. By advancing research in Artificial Intelligence (AI), we can improve the ability of artificial agents to help us solve CBI problems. To facilitate successful communication and social interaction between artificial agents and human partners, it is essential that aspects of human social behavior, especially empathy and rapport, be considered when designing human-computer interfaces. Hence, the goal of the present dissertation is to provide a computational model of rapport to enhance an artificial agent’s social behavior, and to provide an experimental tool for the psychological theories shaping the model. Parts of this thesis were already published in [LYL+12, AYL12, AL13, ALYR13, LAYR13, YALR13, ALY14].
Abstract:
This paper presents an application that composes formal poetry in Spanish in a semiautomatic, interactive fashion. JASPER is a forward-reasoning rule-based system that obtains from the user an intended message, the desired metric, a choice of vocabulary, and a corpus of verses and, by intelligent adaptation of selected examples from this corpus using the given words, carries out a prose-to-poetry translation of the given message. In the composition process, JASPER combines natural language generation and a set of construction heuristics obtained from formal literature on Spanish poetry.
Abstract:
Neuroimaging research involves analyses of huge amounts of biological data that might or might not be related to cognition. This relationship is usually approached using univariate methods, and, therefore, correction methods are mandatory for reducing false positives. Nevertheless, the probability of false negatives is also increased. Multivariate frameworks have been proposed to help improve this balance. Here we apply multivariate distance matrix regression for the simultaneous analysis of biological and cognitive data, namely, structural connections among 82 brain regions and several latent factors estimating cognitive performance. We tested whether cognitive differences predict distances among individuals regarding their connectivity pattern. Beginning with 3,321 connections among regions, the 36 edges best predicted by the individuals' cognitive scores were selected. Cognitive scores were related to connectivity distances in both the full (3,321) and reduced (36) connectivity patterns. The selected edges connect regions distributed across the entire brain, and the network defined by these edges supports high-order cognitive processes such as (a) (fluid) executive control, (b) (crystallized) recognition, learning, and language processing, and (c) visuospatial processing. This multivariate study suggests that a widespread but limited number of regions in the human brain supports high-level cognitive ability differences. Hum Brain Mapp, 2016. © 2016 Wiley Periodicals, Inc.
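The figure of 3,321 connections follows from counting the unordered pairs among 82 regions, 82 × 81 / 2. The short sketch below checks that count and shows one way to build a between-individual distance matrix from edge vectors, which is the input such a regression operates on. The Euclidean metric and the toy profiles are assumptions made for illustration; the abstract does not state which distance was used.

```python
import math
from itertools import combinations

N_REGIONS = 82
# Number of undirected region pairs: 82 choose 2.
n_edges = math.comb(N_REGIONS, 2)


def pairwise_distances(profiles):
    """Symmetric matrix of Euclidean distances between individuals.

    Each profile is one individual's vector of edge weights
    (Euclidean distance is an assumption, not the paper's stated metric).
    """
    n = len(profiles)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = math.dist(profiles[i], profiles[j])
        dist[i][j] = dist[j][i] = d
    return dist


# Toy example: three individuals described by only two edges each.
toy_profiles = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
toy_distances = pairwise_distances(toy_profiles)
```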
Abstract:
This thesis is about young students’ writing in school mathematics and the ways in which this writing is designed, interpreted and understood. Students’ communication can act as a source from which teachers can make inferences regarding students’ mathematical knowledge and understanding. Previous research in mathematics education indicates that teachers assume that the process of interpreting and judging students’ writing is unproblematic. The relationship between what students write, and what they know or understand, is theoretical as well as empirical. In an era of increased focus on assessment and measurement in education it is necessary for teachers to know more about the relationship between communication and achievement. To add to this knowledge, the thesis has adopted a broad approach and consists of four studies. The aim of these studies is to reach a deep understanding of writing in school mathematics. Such an understanding is dependent on examining different aspects of writing. The four studies together examine how the concept of communication is described in authoritative texts, how students’ writing is viewed by teachers and how students make use of different communicational resources in their writing. The results of the four studies indicate that students’ writing is more complex than is acknowledged by teachers and authoritative texts in mathematics education. Results point to a sophistication in students’ approach to the merging of the two functions of writing, writing for oneself and writing for others. Results also suggest that students attend, to various extents, to questions regarding how, what and for whom they are writing in school mathematics. The relationship between writing and achievement is dependent on students’ ability to have their writing reflect their knowledge and on teachers’ thorough knowledge of the different features of writing and their awareness of its complexity.
From a communicational perspective the ability to communicate [in writing] in mathematics can and should be distinguished from other mathematical abilities. By acknowledging that mathematical communication integrates mathematical language and natural language, teachers have an opportunity to turn writing in mathematics into an object of learning. This offers teachers the potential to add to their assessment literacy and offers students the potential to develop their communicational ability in order to write in a way that better reflects their mathematical knowledge.
Abstract:
Named Entity Recognition aims to identify and classify entities contained in texts written in natural language, according to certain categories or labels. The Named Entity Recognition system implemented for this dissertation identifies localities in informal texts and assigns to each identified locality one of the labels "aldeia", "vila" or "cidade" in a first approach to the problem. In a second approach the labels "freguesia", "concelho" and "distrito" were taken into consideration. To obtain the classifications for the entities, a statistical analysis was performed on the number of results returned by a Google Search query for the entity preceded by a label.
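A minimal sketch of the labelling idea, assuming the hit counts have already been obtained by some means: no real search API is called here, the query template `"<label> de <entity>"` is a guess at what "an entity preceded by a label" looks like, and the counts are invented. The dissertation's actual statistical analysis may be more elaborate than picking the largest share.

```python
LABELS = ("aldeia", "vila", "cidade")


def build_queries(entity, labels=LABELS):
    """Build one quoted search phrase per label (template is an assumption)."""
    return {label: f'"{label} de {entity}"' for label in labels}


def classify_by_hits(hit_counts):
    """Pick the label whose query returned the largest share of results.

    Returns (label, shares); label is None when no query returned results.
    """
    total = sum(hit_counts.values())
    if total == 0:
        return None, {}
    shares = {label: n / total for label, n in hit_counts.items()}
    best = max(shares, key=shares.get)
    return best, shares


# Hypothetical hit counts for one locality name.
queries = build_queries("Évora")
label, shares = classify_by_hits({"aldeia": 120, "vila": 300, "cidade": 4500})
```

Working with shares rather than raw counts makes localities with very different web visibility comparable, and leaves room for a confidence threshold below which no label is assigned.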
Abstract:
Descriptions of tourism products, such as hotels, aviation, rent-a-car and holiday packages, rely mainly on natural language text that is highly heterogeneous in style, presentation and content. Because the tourism sector is very dynamic and its products and offers change constantly, manual normalization of all this information is not a reliable or scalable solution. Offer descriptions, numbering in the thousands, are structured in different ways, may be written in different languages, and may complement or overlap one another. This work builds a prototype for the automatic classification and extraction of information from tourism product descriptions. The information is first classified by type; the relevant elements of each type are then extracted and turned into easily computable objects. From the extracted objects, the prototype uses text and image templates to automatically generate normalized descriptions targeted at a given market. This versatility enables a new set of services for promoting and selling tourism products that would be impossible to implement with the raw information. Although it could be applied to other domains, the prototype was evaluated on the normalization of hotel descriptions. Hotel descriptive sentences are classified according to their type (Location, Services and/or Equipment) by a machine learning algorithm that obtains an average recall of 96% and precision of 72%. Recall was considered the most important measure because maximizing it ensures that sentences are not lost to later processing. The work also built and populated a database of hotels that can be searched by hotel characteristics, which would not be possible using the original contents.
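To make the two reported measures concrete, here is a small sketch of how recall and precision are computed from classification counts. The counts are invented so that a single hypothetical class reproduces the reported 96% and 72%; the published figures are averages and the underlying counts are not given in the abstract.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical counts for one sentence type (e.g. Location).
precision, recall = precision_recall(tp=72, fp=28, fn=3)
```

With these counts only 3 of 75 relevant sentences are missed, at the price of 28 wrongly included ones, which matches the stated priority of not losing sentences for later processing.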
Abstract:
Rare diseases pose several obstacles for patients, their families and healthcare workers. One of these is the lack of information that stems from the absence of reliable, easy-to-consult sources on aspects of the patient's experience. The work presented here aims to generate, from sets of semantically related terms, sentences that can explain the link between them and add useful and truthful information in simple, understandable language. The problem addressed is currently not well documented in the literature and is an interesting challenge both for its complexity and for the lack of training datasets. This kind of task, like others in NLP, can only be tackled with increasingly powerful models that require ever greater resources. For this reason the recently published Performer mechanism was used, showing that it can maintain the same level of accuracy and quality in the generated sentences while reducing the resources used. This opens the way to using the most recent neural networks even without the computing centers of multinational companies. The proposed model is therefore able to generate sentences that illustrate the semantic relations between terms extracted from a large body of textual documents, making it possible to produce summaries of the information and knowledge extracted from them and to make these easily accessible and understandable to patients or non-expert readers.
Abstract:
Twitter is a highly popular social media platform which on the one hand allows information transmission in real time and on the other hand represents a source of open-access homogeneous text data. We propose an analysis of the most common self-reported COVID symptoms from a dataset of Italian tweets to investigate the evolution of the pandemic in Italy from the end of September 2020 to the end of January 2021. After manually selecting, from a database of tweets containing words related to fever, cough and sore throat, those that actually describe COVID symptoms, we discuss the usefulness of such filtering. We then compare our time series with the daily data on new hospitalisations in Italy, with the aim of building a simple linear regression model that accounts for the delay observed between tweets mentioning individual symptoms and new hospitalisations. We discuss both the results and the limitations of linear regression, given that our data suggest that the relationship between the time series of symptom tweets and of new hospitalisations changes towards the end of the acquisition period.
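A lagged linear regression of this kind can be sketched as follows. The two series are synthetic, and choosing the delay by maximising the Pearson correlation is an assumption made for illustration; the abstract does not say how the delay was estimated.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def align(tweets, hosp, lag):
    """Pair tweets on day t with hospitalisations on day t + lag."""
    return tweets[: len(tweets) - lag], hosp[lag:]


def best_lag(tweets, hosp, max_lag):
    """Lag (in days) with the highest correlation between the two series."""
    return max(range(max_lag + 1), key=lambda lag: pearson(*align(tweets, hosp, lag)))


def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


# Synthetic daily counts: hospitalisations follow tweets with a 3-day delay.
tweets = [5, 8, 12, 20, 30, 25, 18, 10, 7, 4, 3, 2]
hosp = [12, 14, 16] + [10 + 2 * t for t in tweets[:-3]]

lag = best_lag(tweets, hosp, max_lag=5)
slope, intercept = fit_line(*align(tweets, hosp, lag))
```

A single fixed lag and slope is exactly the simplification the abstract flags as a limitation: if the relationship between the series drifts over time, one global fit will not capture it.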
Abstract:
This thesis is a collection of works written in the period 2019-2022 whose aim is to find Artificial Intelligence (AI) and Machine Learning methodologies for detecting and classifying patterns and rules in argumentative and legal texts. We call our approach “hybrid” because we aimed at designing hybrid combinations of symbolic and sub-symbolic AI, involving both “top-down” structured knowledge and “bottom-up” data-driven knowledge. A first group of works is dedicated to the classification of argumentative patterns. Following the Waltonian model of argument and the related theory of Argumentation Schemes, these works focused on the detection of argumentative support and opposition, showing that argumentative evidence can be classified at fine-grained levels without resorting to highly engineered features. To show this, our methods involved not only traditional approaches such as TF-IDF, but also some novel methods based on Tree Kernel algorithms. After the encouraging results of this first phase, we explored the use of some emerging methodologies promoted by actors like Google, namely Transfer Learning and language models, which have deeply changed NLP since 2018-19. These new methodologies markedly improved our previous results, providing us with best-performing NLP tools. Using Transfer Learning, we also performed a Sequence Labelling task to recognize the exact span of argumentative components (i.e., claims and premises), thus connecting portions of natural language to portions of arguments (i.e., to the logical-inferential dimension). The last part of our work was dedicated to the employment of Transfer Learning methods for the detection of rules and deontic modalities. In this case, we explored a hybrid approach which combines structured knowledge coming from two LegalXML formats (i.e., Akoma Ntoso and LegalRuleML) with sub-symbolic knowledge coming from pre-trained (and then fine-tuned) neural architectures.
Abstract:
In the framework of industrial problems, Constrained Optimization is known to offer very good overall modeling capability and performance, and stands as one of the most powerful, explored, and exploited tools for addressing prescriptive tasks. The number of applications is huge, ranging from logistics to transportation, packing, production, telecommunication, scheduling, and much more. The main reason behind this success is the remarkable effort made in recent decades by the OR community to develop realistic models and devise exact or approximate methods for solving the widest variety of constrained or combinatorial optimization problems, together with the spread of computational power and easily accessible OR software and resources. On the other hand, technological advancements have led to a wealth of data never seen before and increasingly push towards methods able to extract useful knowledge from it. Among data-driven methods, Machine Learning techniques appear to be some of the most promising, thanks to their successes in domains like Image Recognition, Natural Language Processing and game playing, but also to the amount of research involved. The purpose of the present research is to study how Machine Learning and Constrained Optimization can be used together to build systems able to leverage the strengths of both methods: this would open the way to exploiting decades of research on resolution techniques for COPs and constructing models able to adapt and learn from available data. In the first part of this work, we survey the existing techniques and classify them according to the type, method, or scope of the integration; subsequently, we introduce Moving Target, a novel and general algorithm devised to inject knowledge into learning models through constraints. In the last part of the thesis, two applications stemming from real-world projects and carried out in collaboration with Optit are presented.