969 results for Natural language interface
Abstract:
Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
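For readers who want the mechanics, the EM updates for such an aspect model can be sketched in a few lines. The following Python/numpy code is our own minimal illustration on toy counts; the paper's improved EM variants (which combat overfitting and the locality of EM) are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

def plsa_em(N, K, iters=50):
    """Vanilla EM for the aspect model P(x, y) = sum_z P(z) P(x|z) P(y|z).

    N is an (X, Y) matrix of co-occurrence counts; K is the number of
    aspects. Returns the fitted P(z), P(x|z), P(y|z)."""
    X, Y = N.shape
    pz = np.full(K, 1.0 / K)                    # P(z)
    px_z = rng.dirichlet(np.ones(X), size=K)    # P(x|z), shape (K, X)
    py_z = rng.dirichlet(np.ones(Y), size=K)    # P(y|z), shape (K, Y)
    for _ in range(iters):
        # E-step: posterior P(z | x, y), shape (K, X, Y)
        joint = pz[:, None, None] * px_z[:, :, None] * py_z[:, None, :]
        post = joint / joint.sum(axis=0, keepdims=True)
        # M-step: re-estimate parameters from count-weighted posteriors
        nz = (N[None, :, :] * post).sum(axis=(1, 2))
        px_z = (N[None, :, :] * post).sum(axis=2) / nz[:, None]
        py_z = (N[None, :, :] * post).sum(axis=1) / nz[:, None]
        pz = nz / N.sum()
    return pz, px_z, py_z

counts = rng.integers(0, 5, size=(20, 30))      # toy co-occurrence table
pz, px_z, py_z = plsa_em(counts, K=3)
print(pz)                                        # mixing weights of the 3 aspects
```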
Abstract:
Real-time geoparsing of social media streams (e.g. Twitter, YouTube, Instagram, Flickr, FourSquare) is providing a new 'virtual sensor' capability to end users such as emergency response agencies (e.g. Tsunami early warning centres, Civil protection authorities) and news agencies (e.g. Deutsche Welle, BBC News). Challenges in this area include scaling up natural language processing (NLP) and information retrieval (IR) approaches to handle real-time traffic volumes, reducing false positives, creating real-time infographic displays useful for effective decision support and providing support for trust and credibility analysis using geosemantics. In this seminar I will present ongoing work by the IT Innovation Centre over the last 4 years (TRIDEC and REVEAL FP7 projects) in building such systems, and highlight our research towards improving the trustworthiness and credibility of crisis map displays and real-time analytics for trending topics and influential social networks during major newsworthy events.
Abstract:
This talk will present an overview of the ongoing ERCIM project SMARTDOCS (SeMAntically-cReaTed DOCuments), which aims at automatically generating webpages from RDF data. It will particularly focus on the open issues and the solutions under investigation in the different modules of the project, which relate to document planning, natural language generation and multimedia perspectives. The second part of the talk will be dedicated to the KODA annotation system, a knowledge-base-agnostic annotator designed to provide the RDF annotations required in the document generation process.
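To make the RDF-to-webpage idea concrete, here is a deliberately tiny sketch using Python and rdflib: it reads a toy RDF graph and emits an HTML fragment. The example data and rendering are ours; the SMARTDOCS document-planning and natural language generation modules go far beyond this:

```python
from rdflib import Graph  # assumed installed: pip install rdflib

# Toy RDF input; the real pipeline would start from actual RDF data.
ttl = """
@prefix ex: <http://example.org/> .
ex:alice ex:name "Alice" ; ex:role "Researcher" .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

# Render each property/value pair as an HTML definition-list entry.
rows = "\n".join(
    f"  <dt>{p.split('/')[-1]}</dt><dd>{o}</dd>" for _, p, o in g
)
print(f"<html><body><dl>\n{rows}\n</dl></body></html>")
```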
Abstract:
Title: Data-Driven Text Generation using Neural Networks
Speaker: Pavlos Vougiouklis, University of Southampton
Abstract: Recent work on neural networks shows their great potential at tackling a wide variety of Natural Language Processing (NLP) tasks. This talk will focus on the Natural Language Generation (NLG) problem and, more specifically, on the extent to which neural network language models could be employed for context-sensitive and data-driven text generation. In addition, a neural network architecture for response generation in social media, along with the training methods that enable it to capture contextual information and effectively participate in public conversations, will be discussed.
Speaker Bio: Pavlos Vougiouklis obtained his 5-year Diploma in Electrical and Computer Engineering from the Aristotle University of Thessaloniki in 2013. He was awarded an MSc degree in Software Engineering from the University of Southampton in 2014. In 2015, he joined the Web and Internet Science (WAIS) research group of the University of Southampton, where he is currently working towards his PhD in the field of Neural Network Approaches for Natural Language Processing.
Title: Provenance is Complicated and Boring — Is there a solution?
Speaker: Darren Richardson, University of Southampton
Abstract: Paper trails, auditing, and accountability — arguably not the sexiest terms in computer science. But then you discover that you've possibly been eating horse-meat, and the importance of provenance becomes almost palpable. Having accepted that we should be creating provenance-enabled systems, the challenge of then communicating that provenance to casual users is not trivial: users should not need a detailed working knowledge of your system, and they certainly shouldn't be expected to understand the data model. So how do you give users an insight into the provenance without building a bespoke system for each and every provenance installation?
Speaker Bio: Darren is a final-year Computer Science PhD student. He completed his undergraduate degree in Electronic Engineering at Southampton in 2012.
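As a companion to the first talk's theme, here is a minimal sketch of neural text generation: a one-layer language model that conditions on the previous word and samples the next one. Everything here (vocabulary, weights, function names) is a toy illustration of our own, untrained and unrelated to the speaker's actual architecture:

```python
import numpy as np

# Toy vocabulary; "<s>" starts a sentence, "</s>" ends one.
vocab = ["<s>", "hello", "how", "are", "you", "today", "</s>"]
V = len(vocab)
rng = np.random.default_rng(0)

# One-layer model: embed the previous word, project to logits over the
# vocabulary. Weights are random here; training (e.g. cross-entropy over
# conversation data) is deliberately omitted.
E = rng.normal(scale=0.1, size=(V, 16))   # word embeddings
W = rng.normal(scale=0.1, size=(16, V))   # output projection

def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def generate(max_len=10):
    idx = vocab.index("<s>")
    words = []
    for _ in range(max_len):
        probs = softmax(E[idx] @ W)        # condition on the previous word
        idx = int(rng.choice(V, p=probs))  # sample the next word
        if vocab[idx] == "</s>":
            break
        words.append(vocab[idx])
    return " ".join(words)

print(generate())   # gibberish until the weights are trained
```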
Abstract:
This article presents a spoken dialogue system under development, called TRIVIAL, whose aim is to support the learning of course content by students at the Universidad de Granada. Interaction between users and the system takes place through spoken natural language dialogues, which clearly sets the system apart from other teaching-support tools based on Information and Communication Technologies (ICTs). We believe the system can foster the innovative environment promoted by the European Higher Education Area (EHEA), as it offers a different perspective on course subjects, one in which the student is the main actor in the learning process.
Abstract:
Virtual assistants are intelligent tools that help users find information across a conglomerate of web resources. They are naturally deployed on websites themselves, where they answer users' questions posed in natural language using Artificial Intelligence techniques. In this article we present the most relevant features of the virtual assistant Elvira and its integration into the website of the Universidad de Granada. In parallel with the emergence of virtual assistants, technological advances over the last decade have made information accessible from many different sources, carrying the need for artificial assistance over to other settings. In this work we describe the extension of the Elvira virtual assistant's deployment to mobile devices and social networks.
Abstract:
Thematic geospatial databases at different geographic and temporal scales are needed across a multitude of research areas. One of these is the management of, and early warning for, disaster risks from natural hazards (floods, hurricanes, earthquakes, etc.). News on this topic is routinely published in digital newspapers around the world and carries rich geographic content. This work aims to automatically extract news items delivered through web syndication channels (RSS), georeference them, and store and distribute them as geospatial data. The information is extracted using natural language processing techniques and queries against toponym databases. The case study is applied to Mexico, and all components used are open source.
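The extraction step described above can be approximated in a few lines: pull entries from an RSS feed and match them against a toponym table. The following Python sketch is our own illustration; the gazetteer is a toy stand-in for a real toponym database such as GeoNames, and the feed URL is hypothetical:

```python
import feedparser  # assumed installed: pip install feedparser

# Toy gazetteer mapping toponyms to (lat, lon). A real deployment would
# query a geographic-names database such as GeoNames instead.
GAZETTEER = {
    "Acapulco": (16.86, -99.88),
    "Oaxaca": (17.07, -96.72),
    "Veracruz": (19.17, -96.13),
}

def georeference(feed_url):
    """Fetch an RSS feed and attach coordinates to entries whose text
    mentions a known toponym (naive string matching, no NLP yet)."""
    feed = feedparser.parse(feed_url)
    hits = []
    for entry in feed.entries:
        text = entry.get("title", "") + " " + entry.get("summary", "")
        for place, (lat, lon) in GAZETTEER.items():
            if place in text:
                hits.append({"title": entry.get("title", ""),
                             "place": place, "lat": lat, "lon": lon})
    return hits

# Hypothetical feed URL:
# print(georeference("https://example.org/disaster-news.rss"))
```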
Abstract:
The central question of this research is the assessment of narrative structure by a group of six profoundly deaf children using Portuguese Sign Language (Língua Gestual Portuguesa, LGP). The study asked whether differences in narrative production would be found between children who acquired LGP early and those who had their first contact with their mother tongue late. The general hypothesis was that the natural language of Portuguese deaf children is LGP, and that those with early access to their language perform better at narrating a story. To test this hypothesis, all the children in this investigation were given a task consisting of telling a story in LGP from a sequence of pictures. The results highlight the importance of an early communicative environment for deaf children to achieve overall development comparable to that of their hearing peers. The influence of parents, educators and teachers is also essential for deaf children to develop their natural language, LGP, and to learn a second language. The results obtained confirmed the hypotheses: children who acquired LGP early showed greater development in narrative structure.
Abstract:
Sign language (LG) is the natural language of deaf people, used as the form of expression and communication of the deaf community of a given country. However, it is impossible to write these languages using a common alphabet such as that of the Portuguese language (Língua Portuguesa, LP). In 1974, in Denmark, Valerie Sutton created SignWriting (SW), a writing system for sign languages, countering the idea that visual-spatial languages could not have a graphic representation. Fundamental to the emergence of this system were the pioneering studies of William Stokoe, which recognized the linguistic status of sign languages by attributing to them properties inherent to a language, such as arbitrariness and conventionality. In this work we present SW, a writing system for sign languages already used in other countries, and ask whether its adaptation to Portuguese Sign Language (LGP) is feasible and fruitful. To that end, we write LGP on the basis of distinct vocabulary areas present in the LGP teaching curriculum. Finally, we realize this proposal through a model for an SW training course.
Abstract:
Chatterbox Challenge is an annual web-based contest for artificial conversational entities (ACE). The 2010 instantiation was the tenth consecutive contest, held between March and June in the 60th year following the publication of Alan Turing's influential disquisition 'Computing Machinery and Intelligence'. Loosely based on Turing's viva voce interrogator-hidden witness imitation game, a thought experiment to ascertain a machine's capacity to respond satisfactorily to unrestricted questions, the contest provides a platform for technology comparison and evaluation. This paper provides an insight into the emotion content of the entries since the 2005 Chatterbox Challenge. The authors find that synthetic textual systems, none of which are backed by academic or industry funding, are, on the whole and more than half a century since Weizenbaum's natural language understanding experiment, little further along than Eliza in terms of expressing emotion in dialogue. This may be a failure on the part of the academic AI community to treat the Turing test as an engineering challenge.
Abstract:
Purpose – The purpose of this paper is to consider Turing's two tests for machine intelligence: the parallel-paired, three-participant game presented in his 1950 paper, and the "jury-service" one-to-one measure described two years later in a radio broadcast. Both versions were instantiated in practical Turing tests during the 18th Loebner Prize for artificial intelligence hosted at the University of Reading, UK, in October 2008. This involved jury-service tests in the preliminary phase and parallel-paired tests in the final phase. Design/methodology/approach – Almost 100 test results from the final have been evaluated, and this paper reports some intriguing nuances which arose as a result of the unique contest. Findings – In the 2008 competition, Turing's 30 per cent pass rate was not achieved by any machine in the parallel-paired tests, but Turing's modified prediction, "at least in a hundred years time", is remembered. Originality/value – The paper presents actual responses from "modern Elizas" to human interrogators during contest dialogues that show considerable improvement in artificial conversational entities (ACE). Unlike their ancestor – Weizenbaum's natural language understanding system – ACE are now able to recall, share information and disclose personal interests.
Abstract:
This paper is about the use of natural language to communicate with computers. Most research pursuing this goal considers only requests expressed in English. One way to facilitate the use of several languages in natural language systems is an interlingua: an intermediary representation for natural language information that can be processed by machines. We propose to convert natural language requests into an interlingua, the Universal Networking Language (UNL), and to execute these requests using software components. To achieve this goal, we propose OntoMap, an ontology-based architecture that performs the semantic mapping between UNL sentences and software components. OntoMap also performs component search and retrieval based on semantic information formalized in ontologies and rules.
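A drastically simplified illustration of the semantic mapping idea follows: a small "ontology" that binds request concepts to software components, with naive concept extraction standing in for the UNL conversion. All names here (ONTOLOGY, find_component, the component bindings) are hypothetical, not from OntoMap:

```python
# A toy "ontology" binding request concepts to software components.
# All names below (ONTOLOGY, find_component, the services) are invented.
ONTOLOGY = {
    "send":   {"component": "MailService",    "method": "send_message"},
    "search": {"component": "SearchService",  "method": "query"},
    "book":   {"component": "BookingService", "method": "reserve"},
}

def find_component(request: str):
    """Return the binding for the first known concept in the request.
    Stands in for the real UNL conversion plus rule-based mapping."""
    tokens = request.lower().split()
    for concept, binding in ONTOLOGY.items():
        if concept in tokens:
            return binding
    return None

print(find_component("Please search for flights to Lisbon"))
# -> {'component': 'SearchService', 'method': 'query'}
```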
Abstract:
This paper presents an approach for assisting low-literacy readers in accessing online Web information. The Educational FACILITA tool is a Web content adaptation tool that provides innovative features and follows more intuitive interaction models with regard to accessibility concerns. In particular, we propose an interaction model and a Web application that exploit the natural language processing tasks of lexical elaboration and named entity labeling to improve Web accessibility. We report the results of a pilot usability study carried out with low-literacy users. The preliminary results show that Educational FACILITA improves the comprehension of text elements, although the assistance mechanisms can also confuse users when word sense ambiguity is introduced, by gathering, for a complex word, a list of synonyms with multiple meanings. This points to a future solution in which the correct sense of a complex word in a sentence is identified, addressing this pervasive characteristic of natural languages. The pilot study also found that experienced computer users find the tool more useful than novice computer users do.
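The lexical elaboration task mentioned above can be illustrated with a minimal sketch: annotate complex words with simpler synonyms, deliberately without word sense disambiguation, so the ambiguity problem the abstract describes is visible. The synonym table and function are toy stand-ins of our own:

```python
# Toy synonym table; a real system would draw on lexical resources.
SYNONYMS = {
    "utilize": ["use"],
    "commence": ["begin", "start"],
    "bank": ["riverbank", "financial institution"],  # ambiguous on purpose
}

def elaborate(sentence: str) -> str:
    """Annotate complex words with simpler synonyms, without picking a
    sense, so ambiguous words surface every listed meaning."""
    out = []
    for word in sentence.split():
        key = word.lower().strip(".,;")
        if key in SYNONYMS:
            out.append(f"{word} ({', '.join(SYNONYMS[key])})")
        else:
            out.append(word)
    return " ".join(out)

print(elaborate("We will commence work at the bank."))
```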
Abstract:
Identifying the correct sense of a word in context is crucial for many tasks in natural language processing (machine translation is an example). State-of-the-art methods for Word Sense Disambiguation (WSD) build models using hand-crafted features that usually capture shallow linguistic information. Complex background knowledge, such as semantic relationships, is typically either not used, or used in a specialised manner, due to the limitations of the feature-based modelling techniques employed. On the other hand, empirical results from the use of Inductive Logic Programming (ILP) systems have repeatedly shown that they can use diverse sources of background knowledge when constructing models. In this paper, we investigate whether this ability of ILP systems could be used to improve the predictive accuracy of models for WSD. Specifically, we examine the use of a general-purpose ILP system as a method to construct a set of features using semantic, syntactic and lexical information. This feature set is then used by a common modelling technique in the field (a support vector machine) to construct a classifier for predicting the sense of a word. In our investigation we examine one-shot and incremental approaches to feature-set construction applied to monolingual and bilingual WSD tasks. The monolingual tasks use 32 verbs and 85 verbs and nouns (in English) from the SENSEVAL-3 and SemEval-2007 benchmarks, while the bilingual WSD task consists of 7 highly ambiguous verbs in translating from English to Portuguese. The results are encouraging: the ILP-assisted models show substantial improvements over those that simply use shallow features. In addition, incremental feature-set construction appears to identify smaller and better sets of features. Taken together, the results suggest that the use of ILP with diverse sources of background knowledge provides a way to make substantial progress in the field of WSD.
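To illustrate the second stage of this pipeline, here is a minimal sketch in which hand-written binary features (standing in for the ILP-induced ones) feed a support vector machine that predicts a verb's sense. The data, features and sense labels are invented for illustration, and scikit-learn is assumed available:

```python
import numpy as np
from sklearn.svm import SVC  # assumed installed: pip install scikit-learn

# Hand-written binary features for the verb "hold", standing in for the
# features an ILP system would induce from semantic/syntactic knowledge.
# Columns: [subject_is_person, object_is_concrete, has_particle_up]
X = np.array([
    [1, 1, 0],   # "she held the cup"      -> sense: grasp
    [1, 0, 0],   # "he held a meeting"     -> sense: conduct
    [1, 1, 1],   # "they held up the sign" -> sense: raise
    [1, 0, 0],   # "we held an election"   -> sense: conduct
])
y = np.array(["grasp", "conduct", "raise", "conduct"])

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([[1, 1, 0]]))   # ['grasp'] on this toy data
```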
Abstract:
Using a new proposal for the "picture lowering" operators, we compute the tree-level scattering amplitude in the minimal pure spinor formalism by performing the integration over the pure spinor space as a multidimensional Cauchy-type integral. The amplitude is written in terms of the projective pure spinor variables, which turns out to be useful for relating rigorously the minimal and non-minimal versions of the pure spinor formalism. The natural language for relating these formalisms is the Čech-Dolbeault isomorphism. Moreover, the Dolbeault cocycle corresponding to the tree-level scattering amplitude must be evaluated in SO(10)/SU(5) instead of the whole pure spinor space, which means that the origin is removed from this space. The Čech-Dolbeault language also plays a key role in proving the invariance of the scattering amplitude under BRST, Lorentz and supersymmetry transformations, as well as the decoupling of unphysical states. We also relate the Green's function for the massless scalar field in ten dimensions to the tree-level scattering amplitude and comment on the scattering amplitude at higher orders. In contrast with the traditional picture lowering operators, with our new proposal the tree-level scattering amplitude is independent of the constant spinors introduced to define them, and the BRST-exact terms decouple without integrating over these constant spinors.