954 results for Natural language processing systems
Abstract:
Wydział Anglistyki (Faculty of English)
Abstract:
A computer vision system that has to interact in natural language needs to understand the visual appearance of interactions between objects along with the appearance of objects themselves. Relationships between objects are frequently mentioned in queries of tasks like semantic image retrieval, image captioning, visual question answering and natural language object detection. Hence, it is essential to model context between objects for solving these tasks. In the first part of this thesis, we present a technique for detecting an object mentioned in a natural language query. Specifically, we work with referring expressions which are sentences that identify a particular object instance in an image. In many referring expressions, an object is described in relation to another object using prepositions, comparative adjectives, action verbs etc. Our proposed technique can identify both the referred object and the context object mentioned in such expressions. Context is also useful for incrementally understanding scenes and videos. In the second part of this thesis, we propose techniques for searching for objects in an image and events in a video. Our proposed incremental algorithms use the context from previously explored regions to prioritize the regions to explore next. The advantage of incremental understanding is restricting the amount of computation time and/or resources spent for various detection tasks. Our first proposed technique shows how to learn context in indoor scenes in an implicit manner and use it for searching for objects. The second technique shows how explicitly written context rules of one-on-one basketball can be used to sequentially detect events in a game.
Abstract:
The primary aim of this Project Work is to analyse the translation, from Portuguese into English, of economic and financial texts using the ISTRION Machine Translation (MT) platform. Selected content from the economic and financial newsletter Maximus Report is translated with this platform, complemented by other language-processing support tools considered relevant. The Project Work also aims to analyse the platform's potential and to measure the translation results. Finally, it seeks to frame, test, study and measure the criteria under which the translation of these texts could be made more efficient.
Abstract:
Thesis (doctorate), Universidade de Brasília, Centro de Desenvolvimento Sustentável, Programa de Pós-Graduação em Desenvolvimento Sustentável, 2016.
Abstract:
Allelopathy determines the dynamics of plant species in different environments. Understanding this biological phenomenon could help to develop applications in both natural and agricultural systems. This review summarizes the genetic and environmental characteristics that control the production and release of allelochemicals in agroecosystems. This study highlights the current understanding of the environmental changes caused by allelochemicals and summarizes the knowledge about the mechanisms of action of these compounds. Finally, it reviews novel applications of allelopathy in agricultural production systems, including the role of allelochemicals in consortia and their potential use in no-tillage cropping systems through cover crops or mulches.
Abstract:
The development cost of any civil infrastructure is very high. During its life span, a civil structure is subjected to many physical loads and environmental effects that damage it. Failing to identify this damage at an early stage may result in severe property loss and may pose a threat to people and the environment. Thus, there is a need to develop effective damage detection techniques to ensure the safety and integrity of the structure. Statistical analysis is one Structural Health Monitoring method for evaluating a structure. In this study, a civil structure measuring 8 feet in length and 3 feet in diameter, embedded with thermocouple sensors at 4 different levels, is analyzed under controlled and variable conditions. Statistical analysis was used to assess possible damage to the structure, and it detected structural defects at various levels of the structure.
Abstract:
Neuroimaging research involves analyses of huge amounts of biological data that might or might not be related to cognition. This relationship is usually approached using univariate methods, and, therefore, correction methods are mandatory for reducing false positives. However, such corrections also increase the probability of false negatives. Multivariate frameworks have been proposed to help resolve this trade-off. Here we apply multivariate distance matrix regression for the simultaneous analysis of biological and cognitive data, namely, structural connections among 82 brain regions and several latent factors estimating cognitive performance. We tested whether cognitive differences predict distances among individuals regarding their connectivity pattern. Beginning with 3,321 connections among regions, the 36 edges best predicted by the individuals' cognitive scores were selected. Cognitive scores were related to connectivity distances in both the full (3,321) and reduced (36) connectivity patterns. The selected edges connect regions distributed across the entire brain, and the network defined by these edges supports high-order cognitive processes such as (a) (fluid) executive control, (b) (crystallized) recognition, learning, and language processing, and (c) visuospatial processing. This multivariate study suggests that a widespread but limited number of regions in the human brain supports high-level cognitive ability differences. Hum Brain Mapp, 2016. © 2016 Wiley Periodicals, Inc.
Abstract:
This thesis is about young students’ writing in school mathematics and the ways in which this writing is designed, interpreted and understood. Students’ communication can act as a source from which teachers can make inferences regarding students’ mathematical knowledge and understanding. Previous research in mathematics education indicates that teachers assume that the process of interpreting and judging students’ writing is unproblematic. The relationship between what students write and what they know or understand is theoretical as well as empirical. In an era of increased focus on assessment and measurement in education, it is necessary for teachers to know more about the relationship between communication and achievement. To add to this knowledge, the thesis adopts a broad approach and consists of four studies. The aim of these studies is to reach a deep understanding of writing in school mathematics. Such an understanding depends on examining different aspects of writing. Together, the four studies examine how the concept of communication is described in authoritative texts, how students’ writing is viewed by teachers and how students make use of different communicational resources in their writing. The results of the four studies indicate that students’ writing is more complex than is acknowledged by teachers and authoritative texts in mathematics education. Results point to a sophistication in students’ approach to merging the two functions of writing: writing for oneself and writing for others. Results also suggest that students attend, to various extents, to questions regarding how, what and for whom they are writing in school mathematics. The relationship between writing and achievement depends on students’ ability to have their writing reflect their knowledge and on teachers’ thorough knowledge of the different features of writing and their awareness of its complexity.
From a communicational perspective the ability to communicate [in writing] in mathematics can and should be distinguished from other mathematical abilities. By acknowledging that mathematical communication integrates mathematical language and natural language, teachers have an opportunity to turn writing in mathematics into an object of learning. This offers teachers the potential to add to their assessment literacy and offers students the potential to develop their communicational ability in order to write in a way that better reflects their mathematical knowledge.
Abstract:
Kenya lies in the equatorial tropics of East Africa and is known as a global hotspot for aflatoxin contamination, particularly in maize. These toxic and carcinogenic compounds are metabolic products of fungi and therefore depend strongly on water activity. Water activity influences both the drying and the storability of foodstuffs and is thus an important factor in developing energy-efficient, quality-oriented processing methods. This work set out to investigate the change in water activity during the convective drying of maize. Using optimisation software (MS Excel Solver), the gravimetric moisture loss of maize cobs at 37°C, 43°C and 53°C was predicted from sensor-recorded thermo-hygrometric data. This range represents the transition between low- and high-temperature drying. The results show clear differences between the behaviour of the kernels and that of the cob core. Drying in the range of 35°C to 45°C combined with high air velocities (> 1.5 m/s) favoured the drying of the kernels over that of the core and can therefore be recommended for energy-efficient drying of cobs with a high initial moisture content. Further investigations addressed the behaviour of different bulk configurations in batch drying, the usual method for maize. Dehusked and shelled maize led to increased airflow resistance in the bulk, and to both a higher energy demand and less uniform drying, which could only be remedied by additional technical effort such as mixing devices or air reversal. Because of the lower effort required for aeration and control, drying whole cobs in undisturbed bulk can therefore be recommended in particular for small farms in Kenya.
The work also investigated dehumidification using a desiccant (silica gel) combined with a heat source and an enclosed air volume, and compared it with conventional drying. The results showed comparable dehumidification rates during the first 5 hours of drying. The state of the air when silica gel was used was influenced mainly by the enclosed air volume and the temperature. Granular desiccants are advantageous for maize drying from a hygienic standpoint and can be regenerated, for example, in simple ovens, so that the quality losses associated with high-temperature or open-air drying can be avoided. High-quality maize drying technology is very capital-intensive. It can nevertheless be concluded from this work that simple improvements such as sensor-assisted aeration of batch dryers, the use of desiccants and an adapted bulk depth can be practicable solutions for smallholder farmers in Kenya. Further research is needed here, possibly also on the use of renewable energy.
Abstract:
Named Entity Recognition aims to identify and classify entities contained in texts written in natural language, according to certain categories or labels. The Named Entity Recognition system implemented for this dissertation identifies localities in informal texts and assigns each identified locality one of the labels "aldeia" (village), "vila" (town) or "cidade" (city) in a first approach to the problem. A second approach considers the labels "freguesia" (parish), "concelho" (municipality) and "distrito" (district). To classify an entity, a statistical analysis was performed on the number of results returned by the Google Search engine for a query consisting of the entity preceded by a label.
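The label-selection idea in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the function names, the query format `"<label> <entity>"`, and the hit counts are all hypothetical, and a stub dictionary stands in for the search engine so the example runs offline.

```python
from typing import Callable, Dict, Sequence

# Labels from the first approach described in the abstract.
LABELS_FIRST = ("aldeia", "vila", "cidade")


def classify_locality(
    entity: str,
    labels: Sequence[str],
    hit_count: Callable[[str], int],
) -> Dict[str, float]:
    """Return each label's share of the hits for the query '<label> <entity>'.

    `hit_count` is a hypothetical callable that maps a query string to a
    number of search results; it is NOT a real search-engine API.
    """
    counts = {label: hit_count(f"{label} {entity}") for label in labels}
    total = sum(counts.values())
    if total == 0:
        # No evidence at all: every label gets a zero share.
        return {label: 0.0 for label in labels}
    return {label: n / total for label, n in counts.items()}


def best_label(shares: Dict[str, float]) -> str:
    """Pick the label with the largest share (first label wins ties)."""
    return max(shares, key=shares.get)


# Made-up hit counts, for illustration only.
FAKE_HITS = {"aldeia Beja": 120, "vila Beja": 300, "cidade Beja": 9500}

shares = classify_locality("Beja", LABELS_FIRST, lambda q: FAKE_HITS.get(q, 0))
print(best_label(shares), shares)
```

Normalising the counts to shares, rather than comparing raw counts, is one possible design choice; it makes results comparable across entities with very different overall popularity.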
Abstract:
Bangla OCR (Optical Character Recognition) is long-awaited software for the Bengali community all over the world. Numerous efforts suggest that, owing to the inherently complex nature of the Bangla alphabet and its word formation process, developing a high-fidelity OCR that produces reasonably acceptable output remains a challenge. One possible route to improvement is post-processing the OCR output; algorithms such as edit distance and n-gram statistical information have been used to correct misspelled words in language processing. This work presents the first known approach that uses these algorithms to replace misrecognized words produced by Bangla OCR. The assessment is made on a set of fifty documents written in Bangla script and uses a dictionary of 541,167 words. The proposed correction model can correct several words, lowering the recognition error rate by 2.87% and 3.18% for the character-based n-gram and edit distance algorithms respectively. The developed system suggests a list of five alternatives for a misspelled word. In 33.82% of cases the correct word is the topmost of the five suggestions for the n-gram algorithm, while with the edit distance algorithm the first suggestion matches in 36.31% of cases. This work is intended to stimulate ideas for further improvements in character recognition.
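The two ranking strategies named in this abstract can be sketched as below. This is an illustrative sketch, not the authors' system: the function names, the bigram size, the Dice coefficient as the n-gram similarity, and the tiny Latin-script dictionary are assumptions. The code compares Unicode code points, so handling Bangla conjuncts and vowel signs as single units would need extra segmentation that is not shown here.

```python
from typing import List, Sequence, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_ngrams(word: str, n: int = 2) -> Set[str]:
    """Set of character n-grams, with '#' marking the word boundaries."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over character n-gram sets (an assumed choice)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def suggest_by_edit_distance(word: str, dictionary: Sequence[str], k: int = 5) -> List[str]:
    """The k dictionary words closest to `word` by edit distance."""
    return sorted(dictionary, key=lambda w: (edit_distance(word, w), w))[:k]


def suggest_by_ngrams(word: str, dictionary: Sequence[str], k: int = 5) -> List[str]:
    """The k dictionary words most similar to `word` by n-gram overlap."""
    return sorted(dictionary, key=lambda w: (-ngram_similarity(word, w), w))[:k]


# Hypothetical toy dictionary and a misrecognized word ('l' read as 'i').
DICTIONARY = ["bangla", "bangle", "banal", "mango", "tangle", "angle"]
print(suggest_by_edit_distance("bangia", DICTIONARY))
print(suggest_by_ngrams("bangia", DICTIONARY))
```

Sorting the whole dictionary is fine for a toy list; for a dictionary of several hundred thousand words, a candidate filter (for example by word length or shared n-grams) before ranking would likely be needed.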
Abstract:
Descriptions of tourism products in the hotel, aviation, rent-a-car and holiday-package sectors rely mainly on free-text descriptions in highly heterogeneous natural language, with very different styles, presentations and content. Because the tourism sector is highly dynamic and its products and offers change constantly, manual normalization of all this information is not feasible. Offer descriptions, numbering in the thousands, are structured in different ways, may be written in different languages, and may complement and/or overlap one another. In this work a prototype was built for the automatic classification and extraction of information from tourism product descriptions. The information is first classified by type. The relevant elements of each type are then extracted and turned into easily computable objects. From the extracted objects, the prototype uses text and image templates to automatically generate normalized descriptions targeted at a given market. This versatility enables a new set of services for promoting and selling products that would be impossible to implement with the original information. Although it could be applied to other domains, the prototype was evaluated on the normalization of hotel descriptions. The hotel's descriptive sentences are classified by type (Location, Services and/or Equipment) using a machine learning algorithm that achieves an average recall of 96% and precision of 72%. Recall was considered the most important measure, since maximizing it ensures that sentences are not lost to later processing steps. The work also produced and populated a hotel database that supports searching for hotels by their characteristics, which would not be possible using the original content.
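To make the two reported measures concrete, the following small worked example computes precision and recall from confusion counts. The counts are invented for illustration and are not taken from the study; they were chosen only so that the resulting values land near the reported 72% and 96%.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical counts for one sentence type (e.g. "Location"):
# 96 sentences correctly labelled, 37 wrongly labelled, 4 missed.
p, r = precision_recall(tp=96, fp=37, fn=4)
print(f"precision={p:.2f} recall={r:.2f}")
```

With these made-up counts only 4 relevant sentences are missed, at the price of 37 extra sentences passed downstream, which is the kind of trade-off the abstract describes when it prioritizes recall.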
Abstract:
This study offers an initial comparison of the explicit and implicit technological models proposed or assumed by the authors of a series of general treatises, manuals and studies on coffee growing, written by Europeans or North Americans as well as by Caribbean and Latin American authors. Technical recommendations are also contrasted with the various cultivation and processing practices actually in use, as reflected in the pages of those same texts.
Abstract:
Rare diseases pose several obstacles to patients, their families and healthcare workers. One of these is the lack of information, which stems from the absence of reliable, easy-to-consult sources on aspects of the patient's experience. The work presented here aims to generate, from sets of semantically related terms, sentences that can explain the link between those terms and add useful, truthful information in simple, understandable language. The problem addressed is currently not well documented in the literature and represents an interesting challenge both for its complexity and for the lack of training datasets. This kind of task, like others in NLP, can only be tackled with increasingly powerful models that require ever greater resources. For this reason, the recently published Performer mechanism was used, and it was shown to maintain the same level of accuracy and quality in the generated sentences while reducing the resources used. This opens the way to using the most recent neural networks even without the computing centres of multinational companies. The proposed model is therefore able to generate sentences that illustrate the semantic relations between terms extracted from a body of text documents, making it possible to produce summaries of the information and knowledge extracted from them and to make these easily accessible and understandable to patients or non-expert readers.
Abstract:
Twitter is a highly popular social media platform that, on the one hand, allows information to be transmitted in real time and, on the other, represents a source of open-access, homogeneous text data. We propose an analysis of the most common self-reported COVID symptoms from a dataset of Italian tweets to investigate the evolution of the pandemic in Italy from the end of September 2020 to the end of January 2021. After manually filtering the tweets that actually describe COVID symptoms from the database, which contains words related to fever, cough and sore throat, we discuss the usefulness of such filtering. We then compare our time series with the daily data on new hospitalisations in Italy, with the aim of building a simple linear regression model that accounts for the delay observed between tweets mentioning individual symptoms and new hospitalisations. We discuss both the results and the limitations of linear regression, given that our data suggest that the relationship between the time series of symptom tweets and of new hospitalisations changes towards the end of the acquisition period.
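A simple linear regression with a delay between the two series could be set up along the following lines. This is a sketch under assumptions rather than the authors' procedure: the function names, the rule of choosing the lag with the highest R², and the synthetic series are all hypothetical, and no real tweet or hospitalisation data are used.

```python
from typing import Sequence, Tuple


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = slope * x + intercept; also returns R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x has no variance; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy else 0.0
    return slope, intercept, r2


def fit_with_lag(
    tweets: Sequence[float],
    hospitalisations: Sequence[float],
    max_lag: int,
) -> Tuple[int, float, float, float]:
    """Regress hospitalisations[t + lag] on tweets[t] for each lag in
    0..max_lag and return (lag, slope, intercept, r2) for the best R^2."""
    if len(tweets) != len(hospitalisations):
        raise ValueError("both series must cover the same days")
    n = len(tweets)
    best = None
    for lag in range(max_lag + 1):
        x = tweets[: n - lag]        # symptom tweets on day t
        y = hospitalisations[lag:]   # admissions on day t + lag
        if len(x) < 3:
            break                    # too few points left to fit
        slope, intercept, r2 = ols(x, y)
        if best is None or r2 > best[3]:
            best = (lag, slope, intercept, r2)
    if best is None:
        raise ValueError("series too short for any lag")
    return best


# Synthetic data: admissions follow tweets by 4 days (3 * tweets + 10).
tweets = [5, 9, 4, 12, 7, 15, 3, 11, 8, 14, 6, 13, 10, 2, 16, 1]
hosp = [40, 35, 50, 45] + [3 * v + 10 for v in tweets[:12]]

lag, slope, intercept, r2 = fit_with_lag(tweets, hosp, max_lag=7)
print(f"lag={lag} slope={slope:.2f} intercept={intercept:.2f} r2={r2:.3f}")
```

A single slope and lag fitted over the whole period cannot capture a relationship that drifts over time, which is the limitation the abstract notes for the end of the acquisition period; fitting the same model on separate time windows is one way to check for such drift.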