6 resultados para Dependency parsing
em Boston University Digital Common
Resumo:
We give a hybrid algorithm for parsing epsilon grammars based on Tomita's non-ϵ-grammar parsing algorithm ([Tom86]) and Nozohoor-Farshi's ϵ-grammar recognition algorithm ([NF91]). The hybrid parser handles the same set of grammars handled by Nozohoor-Farshi's recognizer. The algorithm's details and an example of its use are given. We also discuss the deployment of the hybrid algorithm within a GB parser, and the reason an ϵ grammar parser is needed in our GB parser.
Resumo:
By utilizing structure sharing among its parse trees, a GB parser can increase its efficiency dramatically. Using a GB parser which has as its phrase structure recovery component an implementation of Tomita's algorithm (as described in [Tom86]), we investigate how a GB parser can preserve the structure sharing output by Tomita's algorithm. In this report, we discuss the implications of using Tomita's algorithm in GB parsing, and we give some details of the structuresharing parser currently under construction. We also discuss a method of parallelizing a GB parser, and relate it to the existing literature on parallel GB parsing. Our approach to preserving sharing within a shared-packed forest is applicable not only to GB parsing, but anytime we want to preserve structure sharing in a parse forest in the presence of features.
Resumo:
We describe a GB parser implemented along the lines of those written by Fong [4] and Dorr [2]. The phrase structure recovery component is an implementation of Tomita's generalized LR parsing algorithm (described in [10]), with recursive control flow (similar to Fong's implementation). The major principles implemented are government, binding, bounding, trace theory, case theory, θ-theory, and barriers. The particular version of GB theory we use is that described by Haegeman [5]. The parser is minimal in the sense that it implements the major principles needed in a GB parser, and has fairly good coverage of linguistically interesting portions of the English language.
Resumo:
Various restrictions on the terms allowed for substitution give rise to different cases of semi-unification. Semi-unification on finite and regular terms has already been considered in the literature. We introduce a general case of semi-unification where substitutions are allowed on non-regular terms, and we prove the equivalence of this general case to a well-known undecidable data base dependency problem, thus establishing the undecidability of general semi-unification. We present a unified way of looking at the various problems of semi-unification. We give some properties that are common to all the cases of semi-unification. We also the principality property and the solution set for those problems. We prove that semi-unification on general terms has the principality property. Finally, we present a recursive inseparability result between semi-unification on regular terms and semi-unification on general terms.
Resumo:
Recent measurements of local-area and wide-area traffic have shown that network traffic exhibits variability at a wide range of scales self-similarity. In this paper, we examine a mechanism that gives rise to self-similar network traffic and present some of its performance implications. The mechanism we study is the transfer of files or messages whose size is drawn from a heavy-tailed distribution. We examine its effects through detailed transport-level simulations of multiple TCP streams in an internetwork. First, we show that in a "realistic" client/server network environment i.e., one with bounded resources and coupling among traffic sources competing for resources the degree to which file sizes are heavy-tailed can directly determine the degree of traffic self-similarity at the link level. We show that this causal relationship is not significantly affected by changes in network resources (bottleneck bandwidth and buffer capacity), network topology, the influence of cross-traffic, or the distribution of interarrival times. Second, we show that properties of the transport layer play an important role in preserving and modulating this relationship. In particular, the reliable transmission and flow control mechanisms of TCP (Reno, Tahoe, or Vegas) serve to maintain the long-range dependency structure induced by heavy-tailed file size distributions. In contrast, if a non-flow-controlled and unreliable (UDP-based) transport protocol is used, the resulting traffic shows little self-similar characteristics: although still bursty at short time scales, it has little long-range dependence. If flow-controlled, unreliable transport is employed, the degree of traffic self-similarity is positively correlated with the degree of throttling at the source. Third, in exploring the relationship between file sizes, transport protocols, and self-similarity, we are also able to show some of the performance implications of self-similarity. We present data on the relationship between traffic self-similarity and network performance as captured by performance measures including packet loss rate, retransmission rate, and queueing delay. Increased self-similarity, as expected, results in degradation of performance. Queueing delay, in particular, exhibits a drastic increase with increasing self-similarity. Throughput-related measures such as packet loss and retransmission rate, however, increase only gradually with increasing traffic self-similarity as long as reliable, flow-controlled transport protocol is used.
Resumo:
In this paper we examine a number of admission control and scheduling protocols for high-performance web servers based on a 2-phase policy for serving HTTP requests. The first "registration" phase involves establishing the TCP connection for the HTTP request and parsing/interpreting its arguments, whereas the second "service" phase involves the service/transmission of data in response to the HTTP request. By introducing a delay between these two phases, we show that the performance of a web server could be potentially improved through the adoption of a number of scheduling policies that optimize the utilization of various system components (e.g. memory cache and I/O). In addition, to its premise for improving the performance of a single web server, the delineation between the registration and service phases of an HTTP request may be useful for load balancing purposes on clusters of web servers. We are investigating the use of such a mechanism as part of the Commonwealth testbed being developed at Boston University.