7 results for System analysis - Data processing
in CORA - Cork Open Research Archive - University College Cork - Ireland
Abstract:
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC distinguishes between informational and functional content, and only the informational content is compressed. Thus, the compressed data remains transparent to existing software libraries, which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they show substantial improvement in performance and a significant reduction in system resource requirements.
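To make the partial-compression idea concrete, the following is a minimal sketch (not the CaPC implementation itself) in which frequent informational tokens are replaced with short dictionary codes while functional content, here record and field delimiters, is left verbatim; the escape byte and the code format are illustrative assumptions.

```python
# Sketch of content-aware partial compression: informational tokens are
# replaced with short dictionary codes, while functional content (record and
# field delimiters) passes through untouched, so line- and field-oriented
# tools keep working on the compressed stream.
from collections import Counter
import re

DELIMS = {"\n", "\t", " ", ","}   # functional content, kept verbatim
ESC = "\x01"                      # marks a dictionary code (hypothetical format)

def build_dictionary(text, max_entries=255):
    """Map the most frequent tokens to one-byte codes."""
    tokens = Counter(re.findall(r"[^\n\t ,]+", text))
    common = [t for t, _ in tokens.most_common(max_entries) if len(t) > 2]
    return {tok: ESC + chr(i) for i, tok in enumerate(common)}

def compress(text, dictionary):
    out, token = [], []
    for ch in text:
        if ch in DELIMS:          # flush the current token, keep delimiter as-is
            out.append(dictionary.get("".join(token), "".join(token)))
            out.append(ch)
            token = []
        else:
            token.append(ch)
    out.append(dictionary.get("".join(token), "".join(token)))
    return "".join(out)
```

Because newlines and separators survive compression unchanged, record-oriented machinery such as a MapReduce input splitter can still divide the compressed stream on record boundaries.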
Abstract:
This work considers the static calculation of a program’s average-case time. The number of systems that currently tackle this research problem is quite small, due to the difficulties inherent in average-case analysis. While each of these systems makes a pertinent contribution, and each is individually discussed in this work, only one of them forms the basis of this research. That particular system is known as MOQA. The MOQA system consists of the MOQA language and the MOQA static analysis tool. Its technique for statically determining average-case behaviour centres on maintaining strict control over both the data structure type and the labeling distribution. This research develops and evaluates the MOQA language implementation, and adds to the functions already available in this language. Furthermore, the theory that backs MOQA is generalised, and the range of data structures for which the MOQA static analysis tool can determine average-case behaviour is increased. Also, some of the MOQA applications and extensions suggested in other works are logically examined here. For example, the accuracy of classifying the MOQA language as reversible is investigated, along with the feasibility of incorporating duplicate labels into the MOQA theory. Finally, the analyses that take place during the course of this research reveal some of MOQA’s strengths and weaknesses. This thesis aims to be pragmatic when evaluating the current MOQA theory, the advancements set forth in the following work, and the benefits of MOQA when compared to similar systems. Succinctly, this work’s significant expansion of the MOQA theory is accompanied by a realistic assessment of MOQA’s accomplishments and a serious deliberation of the opportunities available to MOQA in the future.
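As a toy illustration of the kind of quantity such a static analysis derives (emphatically not the MOQA tool itself), the snippet below computes an exact average-case comparison count by enumerating every equally likely labeling of a small input, i.e. a uniform distribution over permutations.

```python
# Brute-force average-case analysis: the mean number of comparisons made by
# linear search, averaged over every labeling (permutation) of the input,
# each assumed equally likely. MOQA derives such figures statically; this
# sketch merely checks one by exhaustive enumeration.
from itertools import permutations

def comparisons_linear_search(data, target):
    count = 0
    for x in data:
        count += 1
        if x == target:
            break
    return count

def average_case(n, target=0):
    total, cases = 0, 0
    for perm in permutations(range(n)):
        total += comparisons_linear_search(perm, target)
        cases += 1
    return total / cases

print(average_case(5))   # (n + 1) / 2 = 3.0 for n = 5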
Abstract:
The power consumption of wireless sensor network (WSN) modules is an important practical concern in building energy management (BEM) system deployments. A set of metrics is created to assess the power profiles of WSNs under real-world conditions. The aim of this work is to understand, and eventually eliminate, the uncertainties in WSN power consumption during long-term deployments, and to assess compatibility with existing and emerging energy harvesting technologies. This paper investigates the key metrics in data processing, wireless data transmission, data sensing and duty-cycle parameters to understand the system power profile from a practical deployment perspective. Based on the proposed analysis, the impact of each individual metric on power consumption in a typical BEM application is presented and the corresponding low-power solutions are investigated.
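The duty-cycle metric in particular reduces to simple time-weighted arithmetic. Below is a minimal sketch of that calculation; all current and timing figures are illustrative placeholders, not measured values from the paper.

```python
# Mean current draw of a node that wakes periodically to sense and transmit,
# computed as a time-weighted average over one duty cycle.

def mean_current_mA(t_sense_ms, i_sense_mA,
                    t_tx_ms, i_tx_mA,
                    period_ms, i_sleep_mA):
    """Time-weighted average current over one duty cycle."""
    t_active = t_sense_ms + t_tx_ms
    t_sleep = period_ms - t_active
    charge = (t_sense_ms * i_sense_mA +
              t_tx_ms * i_tx_mA +
              t_sleep * i_sleep_mA)      # mA*ms accumulated over one period
    return charge / period_ms

# Example: 10 ms sensing at 2 mA, 5 ms radio at 20 mA, 60 s period, 5 uA sleep
# current gives roughly 7 uA average draw.
print(mean_current_mA(10, 2.0, 5, 20.0, 60_000, 0.005))
```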
Abstract:
Buried heat sources can be investigated by examining thermal infrared images and comparing these with the results of theoretical models which predict the thermal anomaly a given heat source may generate. Key factors influencing surface temperature include the geometry and temperature of the heat source, the surface meteorological environment, and the thermal conductivity and anisotropy of the rock. In general, a geothermal heat flux of greater than 2% of solar insolation is required to produce a detectable thermal anomaly in a thermal infrared image. A heat source of, for example, 2-300K above the average surface temperature must be at a depth shallower than 50m for the anomaly to be detectable in a thermal infrared image, under typical terrestrial conditions. Atmospheric factors are of critical importance. While the mean atmospheric temperature has little significance, convection is a dominant factor and can act to swamp the thermal signature entirely. Given a steady-state heat source that produces a detectable thermal anomaly, it is possible to loosely constrain the physical properties of the heat source and surrounding rock, using the surface thermal anomaly as a basis. The success of this technique is highly dependent on the degree to which the physical properties of the host rock are known. Important parameters include the surface thermal properties and thermal conductivity of the rock. Modelling of transient thermal situations was carried out to assess the effect of time-dependent thermal fluxes. One-dimensional finite element models can be readily and accurately applied to the investigation of diurnal heat flow, as with thermal inertia models. Diurnal thermal models of environments on Earth, the Moon and Mars were constructed using finite elements and found to be consistent with published measurements. The heat flow from an injection of hot lava into a near-surface lava tube was also considered. While this approach was useful for study and for long-term monitoring of inhospitable areas, it was found to have little hazard-warning utility, as the time taken for the thermal energy to propagate to the surface in dry rock (several months) is very long. The resolution of the thermal infrared imaging system is an important factor. Presently available satellite-based systems such as Landsat (resolution of 120m) are inadequate for detailed study of geothermal anomalies. Airborne systems, such as TIMS (variable resolution of 3-6m), are much more useful for discriminating small buried heat sources. Planned improvements in the resolution of satellite-based systems will broaden the potential for application of the techniques developed in this thesis. It is important to note, however, that adequate spatial resolution is a necessary but not sufficient condition for the successful application of these techniques.
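The one-dimensional diurnal modelling described above can be pictured with a compact sketch, here using an explicit finite-difference scheme rather than finite elements for brevity; the material properties and sinusoidal surface forcing are generic placeholders, not values from the thesis.

```python
# 1-D transient heat conduction under a sinusoidal diurnal surface forcing,
# solved with an explicit finite-difference scheme on a uniform grid.
import math

k, rho, cp = 2.5, 2700.0, 800.0     # W/mK, kg/m^3, J/kgK (typical dry rock)
alpha = k / (rho * cp)              # thermal diffusivity, m^2/s
dz, nz = 0.02, 200                  # 2 cm cells down to 4 m depth
dt = 0.4 * dz * dz / alpha          # time step satisfying explicit stability
T = [280.0] * nz                    # initial temperature profile, K

def surface_temp(t_s):
    """Sinusoidal diurnal surface forcing (Dirichlet boundary)."""
    return 280.0 + 15.0 * math.sin(2 * math.pi * t_s / 86400.0)

t = 0.0
while t < 3 * 86400.0:              # run three model days
    T[0] = surface_temp(t)
    T_new = T[:]
    for i in range(1, nz - 1):      # interior diffusion update
        T_new[i] = T[i] + alpha * dt / dz**2 * (T[i+1] - 2*T[i] + T[i-1])
    T = T_new
    t += dt

print(f"T at 0.5 m depth after 3 days: {T[25]:.2f} K")
```

The rapid attenuation of the diurnal wave with depth in such runs is what makes deep sources (below roughly 50m here) so hard to detect at the surface.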
Abstract:
This paper describes implementations of two mobile cloud applications, file synchronisation and intensive data processing, using the Context Aware Mobile Cloud Services middleware and the Cloud Personal Assistant (CPA). Both are part of the same mobile cloud project, which is actively developed and currently in its second version. We describe recent changes to the middleware, along with our experimental results for the two application models. We discuss challenges faced during the development of the middleware and their implications. The paper includes a performance analysis of the CPA support for the two applications with respect to existing solutions.
Abstract:
Long reach passive optical networks (LR-PONs), which integrate fibre-to-the-home with metro networks, have been the subject of intensive research in recent years and are considered one of the most promising candidates for the next generation of optical access networks. Such systems ideally have reaches greater than 100km and bit rates of at least 10Gb/s per wavelength in the downstream and upstream directions. Due to the limited equipment sharing that is possible in access networks, the laser transmitters in the terminal units, which are usually the most expensive components, must be as cheap as possible. However, the requirement for low cost is generally incompatible with the need for a transmitter chirp characteristic that is optimised for such long reaches at 10Gb/s, and hence dispersion compensation is required. In this thesis electronic dispersion compensation (EDC) techniques are employed to increase the chromatic dispersion tolerance and to enhance the system performance at the expense of moderate additional implementation complexity. In order to use such EDC in LR-PON architectures, a number of challenges associated with the burst-mode nature of the upstream link need to be overcome. In particular, the EDC must be made adaptive from one burst to the next (burst-mode EDC, or BM-EDC) in time scales on the order of tens to hundreds of nanoseconds. Burst-mode operation of EDC has received little attention to date. The main objective of this thesis is to demonstrate the feasibility of such a concept and to identify the key BM-EDC design parameters required for applications in a 10Gb/s burst-mode link. This is achieved through a combination of simulations and transmission experiments utilising off-line data processing. The research shows that burst-to-burst adaptation can in principle be implemented efficiently, opening the possibility of low overhead, adaptive EDC-enabled burst-mode systems.
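Burst-to-burst adaptation can be pictured with a simple sketch: a feed-forward equaliser whose taps are re-converged by LMS on a short known preamble at the head of each burst. This is an illustrative model only; the tap count, step size, preamble length and toy channel are assumptions, not the parameters studied in the thesis.

```python
# Burst-mode adaptive equalisation sketch: re-train feed-forward equaliser
# (FFE) taps with LMS on each burst's preamble, then equalise the payload.
import numpy as np

def lms_equalise_burst(rx, preamble, n_taps=7, mu=0.01):
    """Train FFE taps on the burst preamble, then equalise the whole burst."""
    taps = np.zeros(n_taps)
    taps[n_taps // 2] = 1.0                      # start from a pass-through response
    for n in range(n_taps - 1, len(preamble)):
        window = rx[n - n_taps + 1:n + 1][::-1]  # newest sample first
        desired = preamble[n - n_taps // 2]      # centre-aligned training symbol
        err = desired - taps @ window            # instantaneous error
        taps += mu * err * window                # LMS tap update
    return np.convolve(rx, taps, mode="same"), taps

# Example: re-converging on a 200-symbol preamble after a toy dispersive channel.
rng = np.random.default_rng(0)
symbols = rng.choice([-1.0, 1.0], size=1000)
rx = np.convolve(symbols, [1.0, 0.4], mode="same")   # simple ISI channel
eq, taps = lms_equalise_burst(rx, symbols[:200])
print("taps after burst training:", np.round(taps, 3))
```

In a real BM-EDC receiver this convergence must complete within tens to hundreds of nanoseconds, which is what pushes the design towards short preambles and hardware-friendly update rules.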
Abstract:
Background: The Human Papillomavirus (HPV) is one of the world’s most common sexually transmitted infections, and a causative factor of oropharyngeal, anal and penile cancers in males. Worldwide, an estimated 39,000 HPV-associated cancers occur each year in men. The highest rates of HPV infection are found in adults aged 18 to 28 years. Clinical evidence indicates that use of a condom in addition to obtaining the HPV vaccine provides the greatest protection from HPV infections. Aim: To explore young men’s attitudes, beliefs, and behavioural intention in relation to receiving the HPV vaccine and using a condom correctly and consistently. Collectively, both behaviours are linked to the prevention of HPV transmission and associated infections. Method: A multi-phase study, underpinned by the Theory of Planned Behaviour, involving a qualitative belief elicitation, a pilot, and a quantitative cross-sectional study was conducted. A belief elicitation phase (n=12) was used to generate items for a newly developed research instrument. Following the pilot, the research instrument was used in a cross-sectional online survey to explore the attitudes, beliefs, and behavioural intention of young men (n=359) with regard to receiving the HPV vaccine and using a condom correctly and consistently. Data Collection: Data collection took place over a three-month timeframe. Male participants were recruited from a university in Southern Ireland via a student email system, as well as through advertisements posted on numerous health, social and sports websites. Sample: Three hundred and fifty-nine male participants aged 18-28 years completed the online questionnaire. Data Analysis: Data were analysed using SPSS. Descriptive, correlational, multiple and hierarchical regression analyses were performed on the indirect and direct variables of the Theory of Planned Behaviour, i.e. attitude, subjective norm, perceived behavioural control, and intention. Status variables were also included in descriptive analyses and hierarchical regressions. Findings are presented through text and graphical representation. Results: Alarming sexual health statistics showed that only 44.3% of participants always used a condom, and 78.6% never used a condom for oral sex. Furthermore, findings reveal that the constructs of the Theory of Planned Behaviour adequately measure males’ attitudes, beliefs and behavioural intention with regard to both behaviours. The Theory of Planned Behaviour has assisted in identifying how social pressures play an influential role in relation to males receiving the HPV vaccine. Attitude was the most significant predictor of males’ intentions to use a condom correctly and consistently. Intention to perform both behaviours was identified as moderate to high. Conclusion: This study has contributed to the field of HPV research, as it is the first piece of research to explore preventative HPV behaviours, i.e. receiving the HPV vaccine and condom use, amongst young males utilising the Theory of Planned Behaviour. A deeper understanding of young males’ attitudes, beliefs, and behavioural intention on this topic has been achieved. Additionally, a new robust research instrument has been constructed. Findings from this study will help inform the implementation of the HPV vaccine in Ireland, as well as influence health promotion campaigns aimed at young males addressing the topic of condom use.
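For readers who want to see the shape of the hierarchical regression step, here is a sketch in Python (the study itself used SPSS): Theory of Planned Behaviour constructs are entered after the status variables, and the change in R-squared indicates their added explanatory power. The column names and the tpb.csv file are hypothetical placeholders, not the study's dataset.

```python
# Hierarchical (blockwise) OLS regression predicting behavioural intention,
# sketched with statsmodels: block 1 enters status variables, block 2 adds
# the direct TPB predictors.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("tpb.csv")   # hypothetical file, one row per respondent

# Block 1: status variables only.
m1 = smf.ols("intention ~ age + relationship_status", data=df).fit()

# Block 2: add the direct TPB predictors.
m2 = smf.ols(
    "intention ~ age + relationship_status"
    " + attitude + subjective_norm + perceived_control",
    data=df,
).fit()

print(f"R2 block 1: {m1.rsquared:.3f}")
print(f"R2 block 2: {m2.rsquared:.3f}  (delta = {m2.rsquared - m1.rsquared:.3f})")
```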