841 results for Natural language generation
Abstract:
The main contribution of this work is to analyze and describe the state-of-the-art performance of answer scoring systems from the SemEval-2013 task, and to continue the development of an answer scoring system (EHU-ALM) developed at the University of the Basque Country. Overall, this master's thesis focuses on finding a configuration that improves the results on the SemEval dataset, using attribute engineering techniques to find optimal feature subsets and trying different hierarchical configurations in order to compare their performance against the traditional one-versus-all approach. Throughout the work we propose two alternative strategies: on the one hand, improving the EHU-ALM system without changing its architecture and, on the other hand, improving the system by adapting it to a hierarchical configuration. To build these new models we describe and use distinct attribute engineering, data preprocessing, and machine learning techniques.
Abstract:
A system of computer-assisted grammar construction (CAGC) is presented in this paper. The CAGC system is designed to generate broad-coverage grammars for large natural language corpora by using both an extended inside-outside algorithm and an automatic phrase bracketing (AUTO) system, which provides the extended algorithm with constituent information during learning. This paper demonstrates the capability of the CAGC system to deal with realistic natural language problems and the usefulness of the AUTO system for constraining the inside-outside-based grammar re-estimation. Performance results, including coverage, recall and precision, are presented for a grammar constructed for the Wall Street Journal (WSJ) corpus using the Penn Treebank.
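The abstract above reports recall and precision without saying how they are computed. A minimal sketch, assuming the common bracket-matching definition (a predicted constituent span counts as correct when the same span appears in the treebank parse), might look like the following. The function name and the example spans are hypothetical and are not taken from the paper.

```python
def bracket_scores(predicted, gold):
    """Return (precision, recall) over unlabeled constituent spans.

    Each span is a (start, end) pair of word offsets. This is an assumed
    scoring scheme, not necessarily the one used for the CAGC system.
    """
    pred, ref = set(predicted), set(gold)
    matched = len(pred & ref)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical spans for a five-word sentence.
gold_spans = [(0, 5), (0, 2), (2, 5), (3, 5)]
predicted_spans = [(0, 5), (0, 2), (1, 3), (3, 5)]
precision, recall = bracket_scores(predicted_spans, gold_spans)
```

Here three of the four predicted spans match the reference, so both scores are 0.75.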
Abstract:
We extend previous work on fully unsupervised part-of-speech tagging. Using a non-parametric version of the HMM, called the infinite HMM (iHMM), we address the problem of choosing the number of hidden states in unsupervised Markov models for PoS tagging. We experiment with two non-parametric priors, the Dirichlet and Pitman-Yor processes, on the Wall Street Journal dataset using a parallelized implementation of an iHMM inference algorithm. We evaluate the results with a variety of clustering evaluation metrics and achieve performance equivalent to or better than previously reported results. Building on this promising result, we evaluate the output of the unsupervised PoS tagger as a direct replacement for the output of a fully supervised PoS tagger for the task of shallow parsing and compare the two evaluations. © 2009 ACL and AFNLP.
Abstract:
Construction of geotechnical structures produces various environmental impacts. These include depletion of limited natural resources, generation of waste and harmful substances during material production and construction, inefficient use of energy when processing raw materials into construction materials, and emission of unwanted gases during transportation of materials and use of equipment. With increasing interest in sustainability at the global scale, there is a need to develop a methodology that can assess environmental impacts at that scale for geotechnical construction. Using embodied energy and gas emissions, quantitative measures of environmental impact are evaluated in a case study of the construction of a new high-speed railway line in the UK. Based on the results, the keys to energy savings are (a) to optimise the use of materials with high embodied energy intensity, (b) to optimise the transportation network and logistics for processes that primarily use materials with low embodied energy intensity, and (c) to reuse as much material on-site as possible to minimise the quantity of spoil or the distance to disposal sites. The evaluated embodied energy and embodied carbon values are compared with those of other types of structures and other activities, and with carbon tax values. Such comparisons can inform discussion among the various interested parties (clients, contractors, consultants, policy makers, etc.) on making the construction industry more energy efficient. © Springer Science+Business Media B.V. 2011.
Abstract:
We introduce a conceptually novel structured prediction model, GPstruct, which is kernelized, non-parametric and Bayesian, by design. We motivate the model with respect to existing approaches, among others, conditional random fields (CRFs), maximum margin Markov networks (M3N), and structured support vector machines (SVMstruct), which embody only a subset of its properties. We present an inference procedure based on Markov Chain Monte Carlo. The framework can be instantiated for a wide range of structured objects such as linear chains, trees, grids, and other general graphs. As a proof of concept, the model is benchmarked on several natural language processing tasks and a video gesture segmentation task involving a linear chain structure. We show prediction accuracies for GPstruct which are comparable to or exceeding those of CRFs and SVMstruct.
Abstract:
We propose a new formally syntax-based method for statistical machine translation. Transductions between parse trees are transformed into a sequence tagging problem, which is then tackled by a search-based structured prediction method. This allows us to automatically acquire translation knowledge from a parallel corpus without the need for complex linguistic parsing. The method achieves results comparable to those of phrase-based methods (such as Pharaoh) while using only about ten percent of the translation table entries. Experiments show that the structured prediction approach to SMT is promising because of its strong ability to combine words.
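Abstract:
On low-end microcomputers with slow serial-processing hardware, the heuristic concepts, hierarchical search and matching strategies, and maximum search length setting proposed in this paper can increase inference speed by more than an order of magnitude. In addition, by introducing semantic information, eliminating ambiguity in stages, combining top-down and bottom-up analysis, and converting every yes/no question into the corresponding declarative sentence, the approach resolves a series of difficult problems in automatic English syntactic analysis and reduces the size of the knowledge base.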
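Abstract:
This paper briefly describes the implementation of a natural language interface for a numerical control (NC) automatic programming expert system. The interface was built for the NC automatic programming expert system that we developed. It runs under UNIX on SUN3/4 workstations and under DOS on IBM/AT machines, and it is written in C. The interface consists of five modules (lexical analysis, syntactic analysis, semantic and pragmatic analysis, target generation, and graphical simulation) together with the corresponding knowledge bases. It accepts the English natural language descriptions of workpieces that the NC programming system requires and handles some relatively simple English language phenomena.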
Abstract:
Following parallel distributed processing (PDP) theory, the author applies an artificial neural network (ANN) method to sentence understanding. In the input layer, a distributed knowledge representation that integrates the syntactic and semantic information of each word in a Chinese sentence with context information is used to complete the case role assignment for six types of Chinese sentence by parallel processing. The model is a four-layer feedforward network consisting of an input layer, two hidden layers, and an output layer (the case role layer). In addition, the neural network method is compared with the traditional symbol processing method used in natural language understanding, and the author concludes that neural networks should be used as a powerful tool in this area.
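The four-layer architecture described in the abstract above (input layer, two hidden layers, case role output layer) can be illustrated with a forward pass. This is a minimal sketch under stated assumptions: the layer sizes, the sigmoid and softmax activations, the random weights, and the input encoding are all hypothetical, since the abstract specifies none of them.

```python
import math
import random

# Hypothetical layer sizes: input, hidden 1, hidden 2, output (case roles).
LAYER_SIZES = [12, 8, 8, 5]


def init_weights(sizes, seed=0):
    """Return one (weight matrix, bias vector) pair per layer transition."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((w, b))
    return layers


def affine(w, b, x):
    """Compute w @ x + b for a list-of-rows matrix w."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, layers):
    """Run the input through two sigmoid hidden layers and a softmax output."""
    h = x
    for w, b in layers[:-1]:
        h = sigmoid(affine(w, b, h))
    w, b = layers[-1]
    return softmax(affine(w, b, h))


layers = init_weights(LAYER_SIZES)
# Hypothetical distributed encoding of one word plus its context.
word_vector = [1.0 if i % 3 == 0 else 0.0 for i in range(LAYER_SIZES[0])]
role_scores = forward(word_vector, layers)
predicted_role = max(range(len(role_scores)), key=role_scores.__getitem__)
```

With untrained random weights the predicted role is meaningless. The sketch only shows how an input vector flows through the three weight matrices that connect four layers.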