808 results for Information extraction strategies
Abstract:
Yorick Wilks is a central figure in the fields of Natural Language Processing and Artificial Intelligence. His influence extends to many areas of these fields and includes contributions to Machine Translation, word sense disambiguation, dialogue modeling and Information Extraction. This book celebrates the work of Yorick Wilks from the perspective of his peers. It consists of original chapters, each of which analyses an aspect of his work and links it to current thinking in that area. His work has spanned over four decades but is shown to be pertinent to recent developments in language processing, such as the Semantic Web. This volume forms a two-part set together with Words and Intelligence I, Selected Works by Yorick Wilks, by the same editors.
Abstract:
Information extraction or knowledge discovery from large data sets should be linked to a data aggregation process. Data aggregation can yield a new representation of a given data set with a decreased number of objects. A deterministic approach to separable data aggregation means a smaller number of objects without mixing objects from different categories. A statistical approach is less restrictive and allows almost-separable data aggregation with a low level of mixing of objects from different categories. Layers of formal neurons can be designed for the purpose of data aggregation in both the deterministic and the statistical approach. The proposed design method is based on minimization of convex and piecewise-linear (CPL) criterion functions.
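For a concrete sense of what minimizing a CPL criterion looks like, here is a minimal sketch assuming a perceptron-type criterion (one standard CPL choice) minimized by subgradient descent to fit a single formal neuron; the data, learning rate and epoch count are illustrative, not taken from the paper.

```python
import numpy as np

# Minimal sketch (not the paper's algorithm): minimize the convex,
# piecewise-linear criterion  Phi(w,b) = sum_j max(0, 1 - y_j(<w,x_j> + b))
# by subgradient descent, yielding one "formal neuron" (hyperplane)
# that aggregates two categories of objects without mixing them.

def cpl_subgradient_descent(X, y, lr=0.01, epochs=200):
    """X: (n, d) feature matrix; y: labels in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0            # points still penalized by the CPL term
        # subgradient of the active hinge terms
        w -= lr * -(y[active, None] * X[active]).sum(axis=0)
        b -= lr * -y[active].sum()
    return w, b

# Toy usage: two nearly separable categories.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)
w, b = cpl_subgradient_descent(X, y)
print("misclassified:", int(((X @ w + b) * y < 0).sum()))
```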
Abstract:
In this paper, we propose an unsupervised methodology to automatically discover pairs of semantically related words by highlighting their local environment and evaluating their semantic similarity in local and global semantic spaces. This proposal differs from previous research in that it tries to take the best of two different methodologies, i.e. semantic space models and information extraction models. It can be applied to extract close semantic relations, it limits the search space, and it is unsupervised.
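As a rough illustration of the semantic-space half of such a methodology, the sketch below builds window-based co-occurrence vectors from a toy corpus and scores a word pair by cosine similarity; the corpus, window size and scoring are placeholders, not the paper's actual models.

```python
from collections import Counter, defaultdict
import math

# Illustrative sketch (not the paper's system): a local, window-based
# co-occurrence space; candidate word pairs are scored by cosine similarity.
corpus = ("the cell membrane surrounds the cell while "
          "the nuclear membrane surrounds the nucleus").split()

WINDOW = 2
vectors = defaultdict(Counter)
for i, w in enumerate(corpus):
    for j in range(max(0, i - WINDOW), min(len(corpus), i + WINDOW + 1)):
        if i != j:
            vectors[w][corpus[j]] += 1        # count words in the local window

def cosine(u, v):
    shared = set(u) & set(v)
    dot = sum(u[t] * v[t] for t in shared)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

print(cosine(vectors["membrane"], vectors["nucleus"]))
```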
Abstract:
As one of the most popular deep learning models, the convolutional neural network (CNN) has achieved huge success in image information extraction. Traditionally, a CNN is trained by supervised learning with labeled data and used as a classifier by adding a classification layer at the end. Its capability of extracting image features is largely limited by the difficulty of setting up a large training dataset. In this paper, we propose a new unsupervised learning CNN model, which uses a so-called convolutional sparse auto-encoder (CSAE) algorithm to pre-train the CNN. Instead of using labeled natural images for CNN training, the CSAE algorithm can train the CNN with unlabeled artificial images, which enables easy expansion of the training data and unsupervised learning. The CSAE algorithm is especially designed for extracting complex features from specific objects such as Chinese characters. After the features of the artificial images are extracted by the CSAE algorithm, the learned parameters are used to initialize the first convolutional layer of the CNN, and the CNN model is then fine-tuned on scene image patches with a linear classifier. The new CNN model is applied to Chinese scene text detection and is evaluated on a multilingual image dataset that labels Chinese, English and numeral text separately. A detection precision gain of more than 10% is observed over two CNN models.
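A minimal PyTorch sketch of the general idea follows: unsupervised pre-training of a convolutional auto-encoder with a sparsity penalty, then transferring the encoder weights into a CNN's first convolutional layer. The layer sizes, sparsity weight, optimizer and random stand-in data are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ConvSparseAE(nn.Module):
    """One-layer convolutional auto-encoder with a sparse code."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(1, 16, kernel_size=5, padding=2)
        self.decoder = nn.ConvTranspose2d(16, 1, kernel_size=5, padding=2)

    def forward(self, x):
        code = torch.relu(self.encoder(x))
        return self.decoder(code), code

ae = ConvSparseAE()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
unlabeled = torch.rand(32, 1, 28, 28)   # stand-in for artificial character images

for _ in range(10):                     # unsupervised pre-training loop
    recon, code = ae(unlabeled)
    # reconstruction loss plus an L1 sparsity penalty on the code
    loss = nn.functional.mse_loss(recon, unlabeled) + 1e-3 * code.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Transfer the learned filters into the first convolutional layer of a CNN,
# which would then be fine-tuned on labeled scene-image patches.
cnn_first_layer = nn.Conv2d(1, 16, kernel_size=5, padding=2)
cnn_first_layer.load_state_dict(ae.encoder.state_dict())
```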
Abstract:
This study analyzes the qualitative and quantitative patterns of notetaking by learning disabled (LD) and nondisabled (ND) adolescents and the effectiveness of notetaking and review as measured by the subjects' ability to recall information presented during a lecture. The study also examines relationships between certain learner characteristics and notetaking. The following notetaking variables were investigated: note completeness, number of critical ideas recorded, levels of processing information, organizational strategies, fluency of notes, and legibility of notes. The learner characteristics examined pertained to measures of achievement, short-term memory, listening comprehension, and verbal ability. Students from the 11th and 12th grades were randomly selected from four senior high schools in Dade County, Florida. Seventy learning disabled and 79 nondisabled subjects were shown a videotaped lecture and required to take notes. The lecture conditions controlled for presentation rate, prior knowledge, information density, and difficulty level. After 8 weeks, their notes were returned to the subjects for a review period, and a posttest was administered. Results of this study suggest significant differences (p ≤ .01) in the patterns of notetaking between the LD and ND groups not due to differences in the learner characteristics listed above. In addition, certain notetaking variables such as process levels, number of critical ideas, and note completeness were found to be significantly correlated with learning outcome. Further, deficiencies in the spontaneous use of organizational strategies and abbreviations adversely affected the notetaking effectiveness of learning disabled students. Both LD and ND subjects recalled more information recorded in their notes than not recorded; this difference was significant only for the ND group. By contrast, LD subjects compensated for their poor notetaking skills and recalled significantly more information not recorded in their notes than did ND subjects. The major implication of these findings is that LD and ND subjects exhibit very different entry behaviors when asked to perform a notetaking task; hence, teaching approaches to notetaking must differ as well.
Abstract:
As the Web evolves unexpectedly fast, information grows explosively. Useful resources become more and more difficult to find because of their dynamic and unstructured characteristics. A vertical search engine is designed and implemented for a specific domain. Instead of processing the giant volume of miscellaneous information distributed across the Web, a vertical search engine aims to identify relevant information in specific domains or topics and eventually provide users with up-to-date information, highly focused insights and actionable knowledge representations. As mobile devices become more popular, the nature of search is changing, and acquiring information on a mobile device poses unique requirements on traditional search engines that will potentially change every feature they used to have. To summarize, users strongly expect search engines that can satisfy their individual information needs, adapt to their current situation, and present highly personalized search results. In my research, the next-generation vertical search engine is meant to utilize and enrich existing domain information to close the loop of the vertical search engine system, so that knowledge discovery, actionable information extraction, and user-interest modeling and recommendation mutually facilitate one another. I investigate three problems in which domain taxonomy plays an important role: taxonomy generation using a vertical search engine, actionable information extraction based on domain taxonomy, and the use of ensemble taxonomy to capture users' interests. As the underlying theory, ultrametrics, dendrograms, and hierarchical clustering are discussed in depth, and methods for taxonomy generation building on my research on hierarchical clustering are developed. The related vertical search engine techniques are applied in practice in the disaster management domain; in particular, three disaster information management systems were developed and serve as real use cases of my research work.
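As a small illustration of the hierarchical-clustering machinery this thesis builds on, the following sketch clusters toy document vectors with SciPy and reads nested partitions (candidate taxonomy levels) off the dendrogram; the data, linkage method and cut heights are illustrative assumptions, not the thesis's configuration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
doc_vectors = rng.random((20, 50))      # stand-in for TF-IDF rows of 20 documents

# Agglomerative clustering; the resulting cophenetic distances form an ultrametric.
Z = linkage(doc_vectors, method="average", metric="cosine")

# Cutting the dendrogram at increasing heights yields nested partitions,
# i.e., candidate levels of a taxonomy.
for t in (0.2, 0.4, 0.6):
    labels = fcluster(Z, t=t, criterion="distance")
    print(f"cut at {t}: {len(set(labels))} clusters")
```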
Abstract:
Information extraction is a frequent and relevant problem in digital signal processing. In the past few years, different methods have been used for the parameterization of signals and for obtaining efficient descriptors. When signals possess statistical cyclostationary properties, the Cyclic Autocorrelation Function (CAF) and the Spectral Cyclic Density (SCD) can be used to extract second-order cyclostationary information. However, second-order cyclostationary information is insufficient for non-Gaussian signals, whose cyclostationary analysis should comprise higher-order statistical information. This paper proposes a new mathematical tool for higher-order cyclostationary analysis based on the correntropy function. Specifically, cyclostationary analysis is revisited from an information-theoretic standpoint, and the Cyclic Correntropy Function (CCF) and Cyclic Correntropy Spectral Density (CCSD) are defined. Moreover, it is analytically proven that the CCF contains information on second- and higher-order cyclostationary moments, being a generalization of the CAF. The performance of these new functions in the extraction of higher-order cyclostationary characteristics is analyzed in a wireless communication system in which non-Gaussian noise is present.
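For reference, the quantities involved can be written out as follows. This is a standard rendering (with the Gaussian kernel as the usual choice of kernel), so the notation may differ in detail from the paper's.

```latex
% Autocorrelation and its cyclic counterpart, the CAF:
\[
R_x(t,\tau) = \mathbb{E}\big[x(t)\,x(t-\tau)\big], \qquad
R_x^{\alpha}(\tau) = \lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{T/2} R_x(t,\tau)\, e^{-j 2\pi\alpha t}\, dt .
\]
% Correntropy replaces the product with a kernel, injecting higher-order
% moments (Gaussian kernel of width \sigma shown as the usual choice):
\[
V_x(t,\tau) = \mathbb{E}\big[\kappa_\sigma\big(x(t)-x(t-\tau)\big)\big], \qquad
\kappa_\sigma(u) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-u^2/(2\sigma^2)} .
\]
% Cyclic Correntropy Function (CCF) and Cyclic Correntropy Spectral Density (CCSD):
\[
V_x^{\alpha}(\tau) = \lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{T/2} V_x(t,\tau)\, e^{-j 2\pi\alpha t}\, dt , \qquad
S_{V_x}^{\alpha}(f) = \int_{-\infty}^{\infty} V_x^{\alpha}(\tau)\, e^{-j 2\pi f \tau}\, d\tau .
\]
```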
Abstract:
Heading into the 2020s, Physics and Astronomy are undergoing experimental revolutions that will reshape our picture of the fabric of the Universe. The Large Hadron Collider (LHC), the largest particle physics project in the world, produces 30 petabytes of data annually that need to be sifted through, analysed, and modelled. In astrophysics, the Large Synoptic Survey Telescope (LSST) will be taking a high-resolution image of the full sky every 3 days, leading to data rates of 30 terabytes per night over ten years. These experiments endeavour to answer the question of why 96% of the content of the universe currently eludes our physical understanding. Both the LHC and LSST share the 5-dimensional nature of their data, with position, energy and time being the fundamental axes. This talk will present an overview of the experiments and the data that are gathered, and outline the challenges in extracting information. The strategies commonly employed are very similar to industrial data science problems (e.g., data filtering, machine learning, statistical interpretation) and provide a seed for the exchange of knowledge between academia and industry. Speaker Biography: Professor Mark Sullivan. Mark Sullivan is a Professor of Astrophysics in the Department of Physics and Astronomy. Mark completed his PhD at Cambridge and, following postdoctoral study in Durham, Toronto and Oxford, now leads a research group at Southampton studying dark energy using exploding stars called "type Ia supernovae". Mark has many years' experience of research that involves repeatedly imaging the night sky to track the arrival of transient objects, involving significant challenges in data handling, processing, classification and analysis.
Abstract:
The overwhelming amount and unprecedented speed of publication in the biomedical domain make it difficult for life science researchers to acquire and maintain a broad view of the field and gather all information that would be relevant for their research. As a response to this problem, the BioNLP (Biomedical Natural Language Processing) community of researchers has emerged and strives to assist life science researchers by developing modern natural language processing (NLP), information extraction (IE) and information retrieval (IR) methods that can be applied at large scale, to scan the whole publicly available biomedical literature and extract and aggregate the information found within, while automatically normalizing the variability of natural language statements. Among the different tasks, biomedical event extraction has recently received much attention within the BioNLP community. Biomedical event extraction is the identification of biological processes and interactions described in biomedical literature, and their representation as a set of recursive event structures. The 2009–2013 series of BioNLP Shared Tasks on Event Extraction has given rise to a number of event extraction systems, several of which have been applied at large scale (the full set of PubMed abstracts and PubMed Central Open Access full-text articles), leading to the creation of massive biomedical event databases, each containing millions of events. Since top-ranking event extraction systems are based on machine learning and are trained on the narrow-domain, carefully selected Shared Task training data, their performance drops when faced with the topically highly varied PubMed and PubMed Central documents. Specifically, false-positive predictions by these systems lead to the generation of incorrect biomolecular events, which are spotted by end-users. This thesis proposes a novel post-processing approach, utilizing a combination of supervised and unsupervised learning techniques, that can automatically identify and filter out a considerable proportion of incorrect events from large-scale event databases, thus increasing the general credibility of those databases. The second part of this thesis is dedicated to a system we developed for hypothesis generation from large-scale event databases, which is able to discover novel biomolecular interactions among genes/gene products. We cast the hypothesis generation problem as supervised network topology prediction, i.e. predicting new edges in the network, as well as the types and directions of these edges, utilizing a set of features that can be extracted from large biomedical event networks. Routine machine learning evaluation results, as well as manual evaluation results, suggest that the problem is indeed learnable. This work won the Best Paper Award at The 5th International Symposium on Languages in Biology and Medicine (LBM 2013).
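To make the post-processing idea concrete, here is a schematic sketch: a binary classifier is trained on manually checked events and then used to filter an unchecked event database. The feature columns (e.g., extractor confidence, trigger-word frequency, argument distance) are invented placeholders, not the thesis's actual feature set, and the toy labeling rule merely stands in for human annotation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Placeholder features for manually checked events; in the real setting each
# row would describe one extracted event (confidence, trigger stats, etc.).
X_labeled = rng.random((500, 3))
y_labeled = (X_labeled[:, 0] > 0.3).astype(int)   # 1 = correct event (toy rule)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_labeled, y_labeled)

# Apply the filter to a large, unchecked event database.
X_database = rng.random((10_000, 3))
keep = clf.predict_proba(X_database)[:, 1] > 0.5
print(f"retained {keep.sum()} of {len(X_database)} events")
```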
Abstract:
Social interactions are frequently described as social exchanges. In the literature, social exchanges in multiagent systems are an object of study in several contexts, in which social relations are interpreted as social exchanges. Among the problems studied, a fundamental one discussed in the literature is the regulation of social exchanges, for example the emergence of balanced exchanges over time, leading to social equilibrium and/or equilibrium/fairness behaviour. In particular, the problem of regulating social exchanges is difficult when the agents have incomplete information about the exchange strategies of the other agents, especially if the agents have different exchange strategies. This Master's dissertation proposes an approach to the self-regulation of social exchanges in multiagent systems based on Game Theory. It proposes the Game of Self-Regulation of Social Exchange Processes (JAPTS), in an evolutionary and spatial version, in which agents organized in a complex network can evolve their different social exchange strategies. The exchange strategies are defined through the parameters of a fitness function. We analyze the possibility of equilibrium behaviour emerging when the agents, trying to maximize their adaptation through the fitness function, seek to increase the number of successful interactions. We consider a game of incomplete information, since the agents have no information about the strategies of the other agents. For the strategy-learning process, an evolutionary algorithm is used, in which the agents, aiming to maximize their fitness function, act as self-regulators of the exchange processes enabled by the game, contributing to an increase in the number of successful interactions. Five different cases of society composition are analyzed. For some cases, a second type of scenario is also analyzed, in which the network topology is modified, representing a kind of mobility, in order to examine whether the results depend on the neighbourhood. In addition, a third scenario is studied, in which an influence policy is established: the averages of the parameters that define the strategies adopted by the agents are made public at certain moments of the simulation, and the agents that adopt the same exchange strategy, influenced by this, imitate those values. The model was implemented in NetLogo.
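Purely as a toy illustration of the kind of spatial, evolutionary dynamic described above (the actual model was implemented in NetLogo), the sketch below places agents with one strategy parameter each on a lattice and lets every agent imitate whichever nearby strategy would score best against its neighbourhood; the fitness rule, grid size and update scheme are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 20
strategy = rng.random((N, N))               # one strategy parameter per agent

def fitness(s, neighbours):
    # Toy rule: exchanges succeed when strategies are close (balanced exchange).
    return -np.abs(neighbours - s).mean()

for step in range(50):
    new = strategy.copy()
    for i in range(N):
        for j in range(N):
            ni = [((i + di) % N, (j + dj) % N)
                  for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            nb = np.array([strategy[x] for x in ni])       # neighbours' strategies
            cand = [strategy[i, j]] + [strategy[x] for x in ni]
            scores = [fitness(s, nb) for s in cand]
            new[i, j] = cand[int(np.argmax(scores))]       # imitate the fittest
    strategy = new

print("spread of strategies after imitation:", strategy.std())
```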
Abstract:
Humans have a high ability to extract information from visual data acquired by sight. Through a learning process, which starts at birth and continues throughout life, image interpretation becomes almost instinctive. At a glance, one can easily describe a scene with reasonable precision, naming its main components. Usually, this is done by extracting low-level features such as edges, shapes and textures, and associating them with high-level meanings. In this way, a semantic description of the scene is produced. An example of this is the human capacity to recognize and describe other people's physical and behavioral characteristics, or biometrics. Soft biometrics also represent inherent characteristics of the human body and behaviour, but do not allow unique identification of a person. The computer vision field aims to develop methods capable of performing visual interpretation with performance similar to humans'. This thesis proposes computer vision methods that allow high-level information extraction from images in the form of soft biometrics. The problem is approached in two ways, with unsupervised and supervised learning methods. The first seeks to group images via automatic feature-extraction learning, using convolution techniques, evolutionary computing and clustering; the images employed in this approach contain faces and people. The second approach employs convolutional neural networks, which can operate on raw images, learning both the feature extraction and the classification process. Here, images are classified according to gender and clothing, divided into upper and lower parts of the human body. The first approach, when tested with different image datasets, obtained an accuracy of approximately 80% for faces vs. non-faces and 70% for people vs. non-people. The second, tested with images and videos, obtained an accuracy of about 70% for gender, 80% for upper-body clothing and 90% for lower-body clothing. The results of these case studies show that the proposed methods are promising, allowing automatic high-level annotation of images. This opens possibilities for the development of applications in diverse areas, such as content-based image and video retrieval and automatic video surveillance, reducing human effort in the tasks of manual annotation and monitoring.
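As a loose illustration of the first, unsupervised approach (the thesis learns its filters with evolutionary computing; here they are random placeholders), the sketch below pools convolutional responses into feature vectors and clusters the images with k-means.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
images = rng.random((100, 32, 32))              # stand-in for grayscale images
filters = rng.standard_normal((8, 5, 5))        # placeholder filter bank

def features(img):
    feats = []
    for f in filters:
        # valid 2-D convolution via sliding windows, then mean-pooling
        windows = np.lib.stride_tricks.sliding_window_view(img, f.shape)
        feats.append((windows * f).sum(axis=(2, 3)).mean())
    return feats

X = np.array([features(im) for im in images])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))                      # e.g., faces vs. non-faces groups
```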
Abstract:
In natural language processing, automatically detecting and distinguishing the relations of the cause group (CAUSE, RESULT and PURPOSE) in coherent texts is useful when building automatic question-answering systems. For this we use Rhetorical Structure Theory (RST) and its relations, taking as our corpus the RST Treebank (Iruskieta et al., 2013), a corpus composed of scientific abstracts. The corpus is downloaded in XML format, and the most relevant information is retrieved from it with the XPATH tool. This work has three main goals: first, to distinguish the cause-group relations from one another; second, to distinguish these cause-group relations from all the other relations; and finally, to distinguish the EVALUATION and INTERPRETATION relations so that they can be applied in sentiment analysis. To carry out these tasks, we used the most significant patterns obtained with the RhetDB tool and developed two applications: on the one hand, a search tool that takes the patterns to be looked for and searches any text with relational structure; on the other, a tagger that labels relations given the most significant patterns. Moreover, we developed both applications to be as parameterizable as possible, so that anyone can use them for similar tasks without changing the code. After evaluating the taggers, we found that the easiest relation to identify is the PURPOSE relation, and we concluded that CAUSE and RESULT are harder to tell apart. Likewise, we found that EVALUATION and INTERPRETATION can also be distinguished from each other.
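A minimal sketch of the core tagging step is given below, assuming invented element names and cue patterns (the real patterns were mined with RhetDB from the RST Treebank): relation segments are pulled from an RST-style XML document with an XPath query and labeled wherever a cue pattern matches.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RST-style XML; element names and Basque cue phrases here are
# illustrative placeholders, not the corpus schema or the mined patterns.
xml_doc = """<rst>
  <segment id="1">azterketa gainditu ahal izateko</segment>
  <segment id="2">euria dela eta, ekitaldia bertan behera geratu zen</segment>
</rst>"""

PATTERNS = {
    "PURPOSE": re.compile(r"ahal izateko"),
    "CAUSE":   re.compile(r"dela eta"),
}

root = ET.fromstring(xml_doc)
for seg in root.findall(".//segment"):          # XPath-style query over segments
    text = (seg.text or "").lower()
    for label, pat in PATTERNS.items():
        if pat.search(text):
            seg.set("relation", label)          # tag the segment with the relation
            print(seg.get("id"), "->", label)
```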
Abstract:
Master's dissertation, Natural Language Processing and Language Industries, Faculdade de Ciências Humanas e Sociais, Universidade do Algarve, 2014
Abstract:
Background: Developmental coordination disorder (DCD) is a prevalent neurodevelopmental disorder. Best practices include raising parents' awareness and building capacity, but few interventions incorporating these best practices are documented. Objective: To examine whether an evidence-based online module can increase the perceived knowledge and skills of parents of children with DCD and lead to behavioural changes when managing their child's health condition. Methods: A mixed-methods, before-after-follow-up design guided by the theory of planned behaviour was employed. Data about the knowledge, skills and behaviours of parents of children with DCD were collected using questionnaires prior to completing the module, immediately after, and three months later. One-way repeated-measures ANOVAs and thematic analyses were performed on the data as appropriate. Results: Fifty-eight participants completed all questionnaires. There was a significant effect of time on self-reported knowledge [F(2.00,114.00)=16.37, p=0.00] and skills [F(1.81,103.03)=51.37, p=0.00], with higher post- and follow-up scores than pre-intervention scores. Thirty-seven (65%) participants reported an intention to change behaviour post-intervention; 29 (50%) had tried recommended strategies at follow-up. Three themes emerged to describe parents' behavioural change: sharing information, trialling strategies and changing attitudes. Factors influencing parents' ability to implement these behavioural changes included clear recommendations, time, and the 'right' attitude. Perceived outcomes associated with the parental behavioural changes involved improved well-being for the children at school, at home, and for the family as a whole. Conclusions: The online module increased parents' self-reported knowledge and skills in DCD management. Future research should explore its long-term impact on children's outcomes.