Two bracketing schemes for the Penn Treebank
Field | Value |
---|---|
Contributor(s) | University of Helsinki, Department of Modern Languages |
Date(s) | 2006 |
Abstract | The trees in the Penn Treebank have a standard representation that involves complete, balanced bracketing. In this article, an alternative to this standard representation of the treebank is proposed. The proposed representation of the trees is lossless, yet it reduces the total number of brackets by 28%. This is achieved by omitting the redundant pairs of special brackets that encode initial and final embedding, using a technique proposed by Krauwer and des Tombe (1981). In terms of paired brackets, the maximum nesting depth in sentences decreases by 78%. A coverage of 99.9% is achieved with only five non-top levels of paired brackets. The observed shallowness of the reduced bracketing suggests that finite-state methods for parsing and searching could be a feasible option for treebank processing. |
Format | 8 pages |
Identifier | |
Language(s) | eng |
Publisher | The Linguistic Association of Finland |
Relation | A man of measure |
Source | Yli-Jyrä, A. 2006, 'Two bracketing schemes for the Penn Treebank', in A man of measure, The Linguistic Association of Finland, Turku, pp. 472-479. |
Keywords | #612 Languages and Literature #language technology #treebanks #syntax #phrase markers #113 Computer and information sciences #finite-state methods |
Type | A3 Contribution to book/other compilations (refereed); info:eu-repo/semantics/bookPart; info:eu-repo/semantics/publishedVersion |
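The abstract reports two quantities for the standard Penn Treebank bracketing: the total number of brackets and the maximum nesting depth. The following is a minimal sketch of how these baseline figures could be measured for a single bracketed tree. It does not implement the reduced scheme based on Krauwer and des Tombe (1981), whose details are not given in this record. The function name `bracket_stats` and the example tree are hypothetical. The sketch also assumes that literal parentheses inside tokens have already been escaped as `-LRB-`/`-RRB-`, as in the Penn Treebank files, so every `(` and `)` character is a structural bracket.

```python
def bracket_stats(tree: str) -> tuple[int, int]:
    """Return (total brackets, maximum nesting depth) for one bracketed tree.

    Hypothetical helper: it counts raw '(' and ')' characters only, assuming
    token-internal parentheses are escaped (e.g. -LRB-/-RRB-).
    """
    total = 0       # number of bracket characters seen so far
    depth = 0       # current nesting depth
    max_depth = 0   # deepest level reached; the outermost pair is depth 1
    for ch in tree:
        if ch == "(":
            total += 1
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            total += 1
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced bracketing: unexpected ')'")
    if depth != 0:
        raise ValueError("unbalanced bracketing: missing ')'")
    return total, max_depth


if __name__ == "__main__":
    # A toy tree in Penn Treebank style (not taken from the treebank itself).
    print(bracket_stats("(S (NP (DT The) (NN cat)) (VP (VBD sat)))"))
```

The outermost bracket pair is counted as depth 1. The number of non-top levels, in the sense used in the abstract, is therefore the returned depth minus one. The Penn Treebank files also wrap each sentence in an extra unlabeled outer pair, which adds one level and two brackets.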