23 results for Natural language generation
in Cambridge University Engineering Department Publications Database
Abstract:
Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents BAGEL, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that BAGEL can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation performance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data. © 2010 Association for Computational Linguistics.
Abstract:
Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. This paper presents the beginnings of an automatic statistician, focusing on regression problems. Our system explores an open-ended space of statistical models to discover a good explanation of a data set, and then produces a detailed report with figures and natural-language text. Our approach treats unknown regression functions nonparametrically using Gaussian processes, which has two important consequences. First, Gaussian processes can model functions in terms of high-level properties (e.g. smoothness, trends, periodicity, changepoints). Taken together with the compositional structure of our language of models, this allows us to automatically describe functions in simple terms. Second, the use of flexible nonparametric models and a rich language for composing them in an open-ended manner also results in state-of-the-art extrapolation performance evaluated over 13 real time series data sets from various domains.
Abstract:
A system of computer assisted grammar construction (CAGC) is presented in this paper. The CAGC system is designed to generate broad-coverage grammars for large natural language corpora by utilizing both an extended inside-outside algorithm and an automatic phrase bracketing (AUTO) system which is designed to provide the extended algorithm with constituent information during learning. This paper demonstrates the capability of the CAGC system to deal with realistic natural language problems and the usefulness of the AUTO system for constraining the inside-outside based grammar re-estimation. Performance results, including coverage, recall and precision, are presented for a grammar constructed for the Wall Street Journal (WSJ) corpus using the Penn Treebank.
Abstract:
We extend previous work on fully unsupervised part-of-speech tagging. Using a non-parametric version of the HMM, called the infinite HMM (iHMM), we address the problem of choosing the number of hidden states in unsupervised Markov models for PoS tagging. We experiment with two non-parametric priors, the Dirichlet and Pitman-Yor processes, on the Wall Street Journal dataset using a parallelized implementation of an iHMM inference algorithm. We evaluate the results with a variety of clustering evaluation metrics and achieve equivalent or better performance than previously reported. Building on this promising result, we evaluate the output of the unsupervised PoS tagger as a direct replacement for the output of a fully supervised PoS tagger for the task of shallow parsing and compare the two evaluations. © 2009 ACL and AFNLP.
Abstract:
Construction of geotechnical structures produces various environmental impacts. These include depletion of limited natural resources, generation of wastes and harmful substances during material production and construction, inefficient use of energy during processing of raw materials into construction materials, and emissions of unwanted gases during transportation of materials and use of equipment. With increasing interest in sustainability at the global scale, there is a need to develop a methodology that can assess environmental impacts at such a scale for geotechnical construction. Using embodied energy and gas emission, quantitative measures of environmental impact are evaluated using a case study of a new high speed railway line construction in the UK. Based on the results, the keys to energy savings are (a) to optimise the usage of materials with a high embodied energy intensity value, (b) to optimise the transportation network and logistics for processes using primarily low embodied energy intensity materials, and (c) to reuse as much material on-site as possible to minimise the quantity of spoil or the distance to disposal sites. The evaluated embodied energy and embodied carbon values are compared to those of other types of structures, other activities, and carbon tax values. Such comparisons can inform discussion among the various interested parties (clients, contractors, consultants, policy makers, etc.) on making the construction industry more energy efficient. © Springer Science+Business Media B.V. 2011.