2 resultados para software analysis
em CORA - Cork Open Research Archive - University College Cork - Ireland
Resumo:
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
Resumo:
This research has explored the relationship between system test complexity and tacit knowledge. It is proposed as part of this thesis, that the process of system testing (comprising of test planning, test development, test execution, test fault analysis, test measurement, and case management), is directly affected by both complexity associated with the system under test, and also by other sources of complexity, independent of the system under test, but related to the wider process of system testing. While a certain amount of knowledge related to the system under test is inherent, tacit in nature, and therefore difficult to make explicit, it has been found that a significant amount of knowledge relating to these other sources of complexity, can indeed be made explicit. While the importance of explicit knowledge has been reinforced by this research, there has been a lack of evidence to suggest that the availability of tacit knowledge to a test team is of any less importance to the process of system testing, when operating in a traditional software development environment. The sentiment was commonly expressed by participants, that even though a considerable amount of explicit knowledge relating to the system is freely available, that a good deal of knowledge relating to the system under test, which is demanded for effective system testing, is actually tacit in nature (approximately 60% of participants operating in a traditional development environment, and 60% of participants operating in an agile development environment, expressed similar sentiments). To cater for the availability of tacit knowledge relating to the system under test, and indeed, both explicit and tacit knowledge required by system testing in general, an appropriate knowledge management structure needs to be in place. This would appear to be required, irrespective of the employed development methodology.