265 results for lazy parsing
Abstract:
The increasing amount of available semistructured data demands efficient mechanisms to store, process, and search an enormous corpus of data to encourage its global adoption. Current techniques to store semistructured documents either map them to relational databases, or use a combination of flat files and indexes. These two approaches result in a mismatch between the tree-structure of semistructured data and the access characteristics of the underlying storage devices. Furthermore, the inefficiency of XML parsing methods has slowed down the large-scale adoption of XML into actual system implementations. The recent development of lazy parsing techniques is a major step towards improving this situation, but lazy parsers still have significant drawbacks that undermine the massive adoption of XML.
Once the processing (storage and parsing) issues for semistructured data have been addressed, another key challenge to leverage semistructured data is to perform effective information discovery on such data. Previous works have addressed this problem in a generic (i.e. domain independent) way, but this process can be improved if knowledge about the specific domain is taken into consideration.
This dissertation had two general goals: The first goal was to devise novel techniques to efficiently store and process semistructured documents. This goal had two specific aims: We proposed a method for storing semistructured documents that maps the physical characteristics of the documents to the geometrical layout of hard drives. We developed a Double-Lazy Parser for semistructured documents which introduces lazy behavior in both the pre-parsing and progressive parsing phases of the standard Document Object Model’s parsing mechanism.
The second goal was to construct a user-friendly and efficient engine for performing Information Discovery over domain-specific semistructured documents. This goal also had two aims: We presented a framework that exploits the domain-specific knowledge to improve the quality of the information discovery process by incorporating domain ontologies. We also proposed meaningful evaluation metrics to compare the results of search systems over semistructured documents.
Abstract:
Clare, A. and King, R.D. (2003) Data mining the yeast genome in a lazy functional language. In Practical Aspects of Declarative Languages (PADL'03) (won Best/Most Practical Paper award).
Abstract:
Attributions of laziness, reflected in teacher comments such as “just try harder and you will shine”, may mask specific cognitive, learning, attentional or emotional problems that could explain low motivation in some children. This paper reports findings from an investigation of 20 children, aged 7 to 10 years, who were regarded as lazy by their parents and teachers. Questionnaire measures provided evidence of low levels of motivation and classroom engagement. Psychometric assessments revealed the presence of a range of difficulties, including phonologically-based learning disabilities and significant problems with attention, in 17 of the 20 children. The paper concludes that the special needs of an unknown number of children may be overlooked because they are simply presumed to be lazy.
Abstract:
The release of the Australian Curriculum English (ACE) by the Australian Curriculum, Assessment and Reporting Authority (ACARA) has revived debates about the role of grammar as English content knowledge. We consider some of the discussion circulating in the mainstream media vis-à-vis the intent of the ACE. We conclude that this curriculum draws upon the complementary tenets of traditional Latin-based grammar and systemic functional linguistics across the three strands of Language, Literature and Literacy in innovative ways. We argue that such an approach is necessary for working with contemporary multimodal and cross-cultural texts. To demonstrate the utility of this new approach, we draw out a set of learning outcomes from Year 6 and then map out a framework for relating the outcomes to the form and function of multimodal language. As a case in point, our analysis is of two online Coca-Cola advertising texts, one each from South Korea and Australia.
Abstract:
A travel article about a river cruise from Amsterdam to Basel. When Captain Plamen Veselinov invites me to join him on the bridge, I can at last put a question that’s been running through my mind for days. It’s about the locks. How does he manage to line up the vessel as it approaches? Is the ship guided in electronically? He returns my questions with a boyish smile that does a good deal to veil his many years on the river. Crunching his way through a heavy Bulgarian accent, he says, “No, it’s all in the eyes and the hands. It’s magic. Don’t tell David Copperfield. He would get very jealous...
Abstract:
The standard method for deciding bit-vector constraints is via eager reduction to propositional logic. This is usually done after first applying powerful rewrite techniques. While often efficient in practice, this method does not scale on problems for which top-level rewrites cannot reduce the problem size sufficiently. A lazy solver can target such problems by doing many satisfiability checks, each of which only reasons about a small subset of the problem. In addition, the lazy approach enables a wide range of optimization techniques that are not available to the eager approach. In this paper we describe the architecture and features of our lazy solver (LBV). We provide a comparative analysis of the eager and lazy approaches, and show how they are complementary in terms of the types of problems they can efficiently solve. For this reason, we propose a portfolio approach that runs a lazy and eager solver in parallel. Our empirical evaluation shows that the lazy solver can solve problems none of the eager solvers can and that the portfolio solver outperforms other solvers both in terms of total number of problems solved and the time taken to solve them.
Abstract:
A new method of specifying the syntax of programming languages, known as hierarchical language specifications (HLS), is proposed. Efficient parallel algorithms for parsing languages generated by HLS are presented. These algorithms run on an exclusive-read exclusive-write parallel random-access machine. They require O(n) processors and O(log2n) time, where n is the length of the string to be parsed. The most important feature of these algorithms is that they do not use a stack.
Abstract:
A new parallel algorithm for transforming an arithmetic infix expression into a parse tree is presented. The technique is based on a result due to Fischer (1980) which enables the construction of the parse tree by appropriately scanning the vector of precedence values associated with the elements of the expression. The algorithm presented here is suitable for execution on a shared memory model of an SIMD machine with no read/write conflicts permitted. It uses O(n) processors and has a time complexity of O(log2n) where n is the expression length. Parallel algorithms for generating code for an SIMD machine are also presented.
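The idea of building a parse tree from a vector of precedence values can be illustrated with a small sequential sketch. This is not the parallel algorithm from the paper, and it is only one plausible reading of the precedence-vector idea: the names `PREC`, `DEPTH_WEIGHT`, `precedence_vector`, `build`, and `parse` are hypothetical, and the sketch assumes well-formed input containing only binary, left-associative operators. Each operator gets its base precedence plus a bonus per level of parenthesis nesting, and the root of any span is the rightmost operator with the smallest value.

```python
# Sequential illustration only; all names here are assumptions for this sketch.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}
DEPTH_WEIGHT = 3  # larger than any base precedence, so nesting always dominates


def precedence_vector(tokens):
    """Drop parentheses and return (items, values); operands get value None."""
    items, values, depth = [], [], 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        else:
            items.append(tok)
            values.append(PREC[tok] + DEPTH_WEIGHT * depth if tok in PREC else None)
    return items, values


def build(items, values, lo, hi):
    """Build the tree for items[lo:hi] as nested (operator, left, right) tuples."""
    if hi - lo == 1:
        return items[lo]  # a single operand is a leaf
    root = None
    for i in range(lo, hi):
        # '<=' keeps the rightmost minimum, which gives left associativity.
        if values[i] is not None and (root is None or values[i] <= values[root]):
            root = i
    return (items[root],
            build(items, values, lo, root),
            build(items, values, root + 1, hi))


def parse(tokens):
    items, values = precedence_vector(tokens)
    return build(items, values, 0, len(items))
```

For example, `parse("( a + b ) * c".split())` scans the values `[None, 4, None, 2, None]` and picks `*` as the root because it has the smallest value.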
Abstract:
We study lazy structure sharing as a tool for optimizing equivalence testing on complex data types. We investigate a number of strategies for implementing lazy structure sharing and provide upper and lower bounds on their performance (how quickly they effect ideal configurations of our data structure). In most cases when the strategies are applied to a restricted case of the problem, the bounds provide nontrivial improvements over the naive linear-time equivalence-testing strategy that employs no optimization. Only one strategy, however, which employs path compression, seems promising for the most general case of the problem.
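A minimal sketch of what lazy structure sharing with path compression might look like is given below. It is an assumption-laden illustration, not one of the strategies analyzed in the paper: the `Node` class and the `find` and `equivalent` functions are hypothetical names, and the data type is assumed to be a finite labeled tree. Structures found equal during a test are merged into one equivalence class, so later tests on them reduce to comparing representatives.

```python
class Node:
    """Hypothetical tree node; `rep` links it to its class representative."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = tuple(children)
        self.rep = self  # each node starts as its own representative


def find(node):
    """Return the representative of node's class, compressing the path."""
    root = node
    while root.rep is not root:
        root = root.rep
    while node.rep is not root:
        # Point node directly at the root, then step to its old parent.
        node.rep, node = root, node.rep
    return root


def equivalent(a, b):
    """Structural equality test that records each success by sharing."""
    a, b = find(a), find(b)
    if a is b:
        return True  # already known equal: no traversal needed
    if a.label != b.label or len(a.children) != len(b.children):
        return False
    if all(equivalent(x, y) for x, y in zip(a.children, b.children)):
        b.rep = a  # lazily share: future tests short-circuit at find()
        return True
    return False
```

After one successful call to `equivalent(t1, t2)`, a repeated call returns at the `a is b` check without walking either tree, which is the saving over the naive linear-time comparison.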