2 resultados para Stochastic Context-Free Grammars
em CORA - Cork Open Research Archive - University College Cork - Ireland
Resumo:
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
Resumo:
Organizations that leverage lessons learned from their experience in the practice of complex real-world activities are faced with five difficult problems. First, how to represent the learning situation in a recognizable way. Second, how to represent what was actually done in terms of repeatable actions. Third, how to assess performance taking account of the particular circumstances. Fourth, how to abstract lessons learned that are re-usable on future occasions. Fifth, how to determine whether to pursue practice maturity or strategic relevance of activities. Here, organizational learning and performance improvement are investigated in a field study using the Context-based Intelligent Assistant Support (CIAS) approach. A new conceptual framework for practice-based organizational learning and performance improvement is presented that supports researchers and practitioners address the problems evoked and contributes to a practice-based approach to activity management. The novelty of the research lies in the simultaneous study of the different levels involved in the activity. Route selection in light rail infrastructure projects involves practices at both the strategic and operational levels; it is part managerial/political and part engineering. Aspectual comparison of practices represented in Contextual Graphs constitutes a new approach to the selection of Key Performance Indicators (KPIs). This approach is free from causality assumptions and forms the basis of a new approach to practice-based organizational learning and performance improvement. The evolution of practices in contextual graphs is shown to be an objective and measurable expression of organizational learning. This diachronic representation is interpreted using a practice-based organizational learning novelty typology. This dissertation shows how lessons learned when effectively leveraged by an organization lead to practice maturity. The practice maturity level of an activity in combination with an assessment of an activity’s strategic relevance can be used by management to prioritize improvement effort.