2 resultados para document sorting

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

20.00% 20.00%

Publicador:

Resumo:

A permutation is said to avoid a pattern if it does not contain any subsequence which is order-isomorphic to it. Donald Knuth, in the first volume of his celebrated book "The art of Computer Programming", observed that the permutations that can be computed (or, equivalently, sorted) by some particular data structures can be characterized in terms of pattern avoidance. In more recent years, the topic was reopened several times, while often in terms of sortable permutations rather than computable ones. The idea to sort permutations by using one of Knuth’s devices suggests to look for a deterministic procedure that decides, in linear time, if there exists a sequence of operations which is able to convert a given permutation into the identical one. In this thesis we show that, for the stack and the restricted deques, there exists an unique way to implement such a procedure. Moreover, we use these sorting procedures to create new sorting algorithms, and we prove some unexpected commutation properties between these procedures and the base step of bubblesort. We also show that the permutations that can be sorted by a combination of the base steps of bubblesort and its dual can be expressed, once again, in terms of pattern avoidance. In the final chapter we give an alternative proof of some enumerative results, in particular for the classes of permutations that can be sorted by the two restricted deques. It is well-known that the permutations that can be sorted through a restricted deque are counted by the Schrӧder numbers. In the thesis, we show how the deterministic sorting procedures yield a bijection between sortable permutations and Schrӧder paths.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis aims at investigating a new approach to document analysis based on the idea of structural patterns in XML vocabularies. My work is founded on the belief that authors do naturally converge to a reasonable use of markup languages and that extreme, yet valid instances are rare and limited. Actual documents, therefore, may be used to derive classes of elements (patterns) persisting across documents and distilling the conceptualization of the documents and their components, and may give ground for automatic tools and services that rely on no background information (such as schemas) at all. The central part of my work consists in introducing from the ground up a formal theory of eight structural patterns (with three sub-patterns) that are able to express the logical organization of any XML document, and verifying their identifiability in a number of different vocabularies. This model is characterized by and validated against three main dimensions: terseness (i.e. the ability to represent the structure of a document with a small number of objects and composition rules), coverage (i.e. the ability to capture any possible situation in any document) and expressiveness (i.e. the ability to make explicit the semantics of structures, relations and dependencies). An algorithm for the automatic recognition of structural patterns is then presented, together with an evaluation of the results of a test performed on a set of more than 1100 documents from eight very different vocabularies. This language-independent analysis confirms the ability of patterns to capture and summarize the guidelines used by the authors in their everyday practice. Finally, I present some systems that work directly on the pattern-based representation of documents. The ability of these tools to cover very different situations and contexts confirms the effectiveness of the model.