4 resultados para CATEGORICAL-DATA ANALYSIS

em CORA - Cork Open Research Archive - University College Cork - Ireland


Relevância:

100.00% 100.00%

Publicador:

Resumo:

A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Systematic, high-quality observations of the atmosphere, oceans and terrestrial environments are required to improve understanding of climate characteristics and the consequences of climate change. The overall aim of this report is to carry out a comparative assessment of approaches taken to addressing the state of European observations systems and related data analysis by some leading actors in the field. This research reports on approaches to climate observations and analyses in Ireland, Switzerland, Germany, The Netherlands and Austria and explores options for a more coordinated approach to national responses to climate observations in Europe. The key aspects addressed are: an assessment of approaches to develop GCOS and provision of analysis of GCOS data; an evaluation of how these countries are reporting development of GCOS; highlighting best practice in advancing GCOS implementation including analysis of Essential Climate Variables (ECVs); a comparative summary of the differences and synergies in terms of the reporting of climate observations; an overview of relevant European initiatives and recommendations on how identified gaps might be addressed in the short to medium term.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Energy efficiency and user comfort have recently become priorities in the Facility Management (FM) sector. This has resulted in the use of innovative building components, such as thermal solar panels, heat pumps, etc., as they have potential to provide better performance, energy savings and increased user comfort. However, as the complexity of components increases, the requirement for maintenance management also increases. The standard routine for building maintenance is inspection which results in repairs or replacement when a fault is found. This routine leads to unnecessary inspections which have a cost with respect to downtime of a component and work hours. This research proposes an alternative routine: performing building maintenance at the point in time when the component is degrading and requires maintenance, thus reducing the frequency of unnecessary inspections. This thesis demonstrates that statistical techniques can be used as part of a maintenance management methodology to invoke maintenance before failure occurs. The proposed FM process is presented through a scenario utilising current Building Information Modelling (BIM) technology and innovative contractual and organisational models. This FM scenario supports a Degradation based Maintenance (DbM) scheduling methodology, implemented using two statistical techniques, Particle Filters (PFs) and Gaussian Processes (GPs). DbM consists of extracting and tracking a degradation metric for a component. Limits for the degradation metric are identified based on one of a number of proposed processes. These processes determine the limits based on the maturity of the historical information available. DbM is implemented for three case study components: a heat exchanger; a heat pump; and a set of bearings. The identified degradation points for each case study, from a PF, a GP and a hybrid (PF and GP combined) DbM implementation are assessed against known degradation points. The GP implementations are successful for all components. For the PF implementations, the results presented in this thesis find that the extracted metrics and limits identify degradation occurrences accurately for components which are in continuous operation. For components which have seasonal operational periods, the PF may wrongly identify degradation. The GP performs more robustly than the PF, but the PF, on average, results in fewer false positives. The hybrid implementations, which are a combination of GP and PF results, are successful for 2 of 3 case studies and are not affected by seasonal data. Overall, DbM is effectively applied for the three case study components. The accuracy of the implementations is dependant on the relationships modelled by the PF and GP, and on the type and quantity of data available. This novel maintenance process can improve equipment performance and reduce energy wastage from BSCs operation.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

An overview is given of a user interaction monitoring and analysis framework called BaranC. Monitoring and analysing human-digital interaction is an essential part of developing a user model as the basis for investigating user experience. The primary human-digital interaction, such as on a laptop or smartphone, is best understood and modelled in the wider context of the user and their environment. The BaranC framework provides monitoring and analysis capabilities that not only records all user interaction with a digital device (e.g. smartphone), but also collects all available context data (such as from sensors in the digital device itself, a fitness band or a smart appliances). The data collected by BaranC is recorded as a User Digital Imprint (UDI) which is, in effect, the user model and provides the basis for data analysis. BaranC provides functionality that is useful for user experience studies, user interface design evaluation, and providing user assistance services. An important concern for personal data is privacy, and the framework gives the user full control over the monitoring, storing and sharing of their data.