Two bracketing schemes for the Penn Treebank


Author(s): Yli-Jyrä, Anssi Mikael
Contributor(s)

University of Helsinki, Department of Modern Languages

Date(s)

2006

Abstract

The trees in the Penn Treebank have a standard representation that involves complete balanced bracketing. This article proposes an alternative to that standard representation. The proposed representation is lossless, yet it reduces the total number of brackets by 28%. The reduction is achieved by omitting the redundant pairs of special brackets that encode initial and final embedding, using a technique proposed by Krauwer and des Tombe (1981). In terms of paired brackets, the maximum nesting depth in sentences decreases by 78%, and 99.9% coverage is achieved with only five non-top levels of paired brackets. The observed shallowness of the reduced bracketing suggests that finite-state methods for parsing and searching could be a feasible option for treebank processing.
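The two quantities the abstract reports on, the total number of brackets and the maximum nesting depth, can be measured on a fully bracketed tree with a few lines of Python. The sketch below is only an illustration of those measurements on the standard (complete, balanced) bracketing; it does not implement the reduced scheme or the technique of Krauwer and des Tombe (1981). The function name `bracket_stats` and the example sentence are hypothetical, and the code assumes that every "(" and ")" character in the input is a structural bracket (in the Penn Treebank, literal parentheses in the text are written as tokens such as -LRB- and -RRB-).

```python
def bracket_stats(tree: str) -> tuple[int, int]:
    """Return (bracket pairs, maximum nesting depth) of a balanced bracketed string.

    Assumption: every '(' and ')' in the input is a structural bracket.
    """
    pairs = 0       # number of opening brackets seen, i.e. number of pairs
    depth = 0       # current nesting depth while scanning left to right
    max_depth = 0   # deepest nesting observed so far
    for ch in tree:
        if ch == "(":
            depth += 1
            pairs += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced bracketing: unexpected ')'")
    if depth != 0:
        raise ValueError("unbalanced bracketing: missing ')'")
    return pairs, max_depth


# Hypothetical example tree in Penn Treebank style (not taken from the corpus).
example = "(S (NP (DT the) (NN dog)) (VP (VBZ barks)))"
print(bracket_stats(example))
```

Running the same measurement over a whole treebank, once for the standard bracketing and once for a reduced one, would give the kind of before-and-after comparison that the abstract summarizes as percentages.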

Format

8 pages

Identifier

http://hdl.handle.net/10138/24957

Language(s)

eng

Publisher

The Linguistic Association of Finland

Relation

A man of measure: Festschrift in honour of Fred Karlsson on his 60th birthday

SKY journal of linguistics, special supplement

Source

Yli-Jyrä, A M 2006, 'Two bracketing schemes for the Penn Treebank'. In A man of measure: Festschrift in honour of Fred Karlsson on his 60th birthday. SKY journal of linguistics, special supplement, no. 19, The Linguistic Association of Finland, Turku, pp. 472-479.

Keywords

#612 Languages and Literature #välimerkit #lauseoppi #phrase boundaries #punctuation #treebanks #puupankit #syntax #jäsentäminen #parsing #keskeisupotus #center-embedding #113 Computer and information sciences #finite-state methods #äärellistilaiset menetelmät

Type

A3 Contribution to book/other compilations (refereed)

info:eu-repo/semantics/bookPart

info:eu-repo/semantics/publishedVersion