5 results for FUNCTIONAL DATA ANALYSIS

in CORA - Cork Open Research Archive - University College Cork - Ireland


Relevance:

100.00% 100.00%

Publisher:

Abstract:

A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content, in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries, which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
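The distinction between informational and functional content can be illustrated with a toy example. The sketch below is not the thesis's CaPC algorithm, whose details the abstract does not give. It assumes a simple dictionary substitution in which repeated words are the "informational" content and delimiters such as `=`, `,` and newlines are the "functional" content that must survive unchanged. The names `build_codebook`, `compress`, `decompress` and the `@` marker are hypothetical.

```python
import re
from collections import Counter

MARKER = "@"  # assumption: this character never occurs in the input
WORD = re.compile(r"\b[A-Za-z]{4,}\b")  # "informational" tokens in this toy


def build_codebook(text, max_codes=256):
    """Map words that occur more than once to fixed-width codes like '@0a'."""
    counts = Counter(WORD.findall(text))
    frequent = [w for w, c in counts.most_common(max_codes) if c > 1]
    return {w: f"{MARKER}{i:02x}" for i, w in enumerate(frequent)}


def compress(text, codebook):
    """Replace coded words only; delimiters and newlines pass through."""
    if MARKER in text:
        raise ValueError("marker character already present in input")
    return WORD.sub(lambda m: codebook.get(m.group(0), m.group(0)), text)


def decompress(text, codebook):
    """Invert compress() by expanding every fixed-width code."""
    reverse = {code: word for word, code in codebook.items()}
    return re.sub(MARKER + r"[0-9a-f]{2}", lambda m: reverse[m.group(0)], text)


# Made-up log records, for illustration only.
log = "user=alice,action=login\nuser=alice,action=logout\n"
book = build_codebook(log)
packed = compress(log, book)

# Record and field boundaries are untouched, so a line- or comma-based
# parser can still split the compressed text.
for record in packed.splitlines():
    print(record.split(","))
```

Because newlines are never rewritten, the compressed text can still be split on record boundaries, which is the property the abstract describes as keeping the data transparent to existing libraries.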

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The ability to adapt and respond to increases in external osmolarity is an important characteristic that enables bacteria to survive and proliferate in different environmental niches. When challenged with increased osmolarity, due to sodium chloride (NaCl) for example, bacteria elicit a phased response. The first phase, known as the primary response, is the uptake of potassium (K+). This is followed by the secondary response, which is characterised by the synthesis or uptake of compatible solutes (osmoprotectants). The overall osmotic stress response is much broader, however, involving many diverse cellular systems and processes. These ancillary mechanisms are arguably more interesting and give a more complete view of the osmotic stress response. The aim of this thesis was to identify novel genetic loci from the human gut microbiota that confer increased tolerance to osmotic stress using a functional metagenomic approach. Functional metagenomics is a powerful tool that enables the identification of novel genes from as yet uncultured bacteria from diverse environments through cloning, heterologous expression and phenotypic identification of a desired trait. Functional metagenomics does not rely on prior sequence information about known genes and can therefore enable the discovery of completely novel genes and assign functions to new or known genes. Using a functional metagenomic approach, we have assigned a novel function to previously annotated genes: murB, mazG and galE, as well as a putative brp/blh family beta-carotene 15,15’-monooxygenase. Finally, we report the identification of a completely novel salt tolerance determinant with no current known homologues in the databases.
Overall the genes identified originate from diverse taxonomic and phylogenetic groups commonly found in the human gastrointestinal (GI) tract, such as Collinsella and Eggerthella, Akkermansia and Bacteroides from the phyla Actinobacteria, Verrucomicrobia and Bacteroidetes, respectively. In addition, a number of the genes appear to have been acquired via lateral gene transfer and/or encoded on a prophage. To our knowledge, this thesis represents the first investigation to identify novel genes from the human gut microbiota involved in the bacterial osmotic stress response.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Alzheimer’s disease and other dementias are among the most challenging illnesses confronting countries with ageing populations. Treatment options for dementia are limited, and the costs are significant. There is a growing need to develop new treatments for dementia, especially for the elderly. There is also growing evidence that centrally acting angiotensin converting enzyme (ACE) inhibitors, which cross the blood-brain barrier, are associated with a reduced rate of cognitive and functional decline in dementia, especially in Alzheimer’s disease (AD). The aim of this research is to investigate the effects of centrally acting ACE inhibitors (CACE-Is) on the rate of cognitive and functional decline in dementia, using a three-phase Knowledge Discovery in Databases (KDD) process. KDD, as a scientific way to process and analyse clinical data, is used to find useful insights from a variety of clinical databases. The data used are from three clinical databases: the Geriatric Assessment Tool (GAT), the Doxycycline and Rifampin for Alzheimer’s Disease (DARAD), and the Qmci validation databases, which were derived from several different geriatric clinics in Canada. This research involves only patients diagnosed with AD, vascular dementia or mixed dementia. Patients were included if baseline and end-point (at least six months apart) Standardised Mini-Mental State Examination (SMMSE), Quick Mild Cognitive Impairment (Qmci) or Activities of Daily Living (ADL) scores were available. The rates of change are compared between patients taking CACE-Is and those not currently treated with CACE-Is. The results suggest that there is a statistically significant difference in the rate of decline in cognitive and functional scores between CACE-I and NoCACE-I patients. This research also validates that the Qmci, a new short assessment test, has the potential to replace the currently popular screening tests for cognition in the clinic and in clinical trials.
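The comparison of rates of change can be sketched as follows. This is a minimal illustration, assuming the rate is simply the score difference divided by the months between baseline and end-point. The abstract does not state the exact formula or the statistical test used, so no significance test is shown. The function names and the patient values are hypothetical and invented purely for illustration.

```python
from statistics import mean


def rate_of_change(baseline, endpoint, months):
    """Score points gained (positive) or lost (negative) per month."""
    if months < 6:
        # Mirrors the inclusion rule: visits at least six months apart.
        raise ValueError("baseline and end-point must be >= 6 months apart")
    return (endpoint - baseline) / months


def mean_rate_by_group(patients):
    """patients: iterable of (group, baseline, endpoint, months) tuples."""
    rates = {}
    for group, baseline, endpoint, months in patients:
        rates.setdefault(group, []).append(
            rate_of_change(baseline, endpoint, months)
        )
    return {group: mean(values) for group, values in rates.items()}


# Invented SMMSE-style scores, not data from the study.
patients = [
    ("CACE-I", 24, 23, 12),
    ("CACE-I", 22, 21, 6),
    ("NoCACE-I", 24, 21, 12),
    ("NoCACE-I", 20, 17, 6),
]
print(mean_rate_by_group(patients))
```

A real analysis would follow this with an appropriate significance test and adjustment for confounders, neither of which is specified in the abstract.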

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Systematic, high-quality observations of the atmosphere, oceans and terrestrial environments are required to improve understanding of climate characteristics and the consequences of climate change. The overall aim of this report is to carry out a comparative assessment of the approaches taken by some leading actors in the field to addressing the state of European observation systems and related data analysis. This research reports on approaches to climate observations and analyses in Ireland, Switzerland, Germany, the Netherlands and Austria, and explores options for a more coordinated approach to national responses to climate observations in Europe. The key aspects addressed are: an assessment of approaches to developing the Global Climate Observing System (GCOS) and providing analysis of GCOS data; an evaluation of how these countries are reporting the development of GCOS; highlighting best practice in advancing GCOS implementation, including analysis of Essential Climate Variables (ECVs); a comparative summary of the differences and synergies in the reporting of climate observations; and an overview of relevant European initiatives and recommendations on how identified gaps might be addressed in the short to medium term.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Energy efficiency and user comfort have recently become priorities in the Facility Management (FM) sector. This has resulted in the use of innovative building components, such as thermal solar panels and heat pumps, as they have the potential to provide better performance, energy savings and increased user comfort. However, as the complexity of components increases, the requirement for maintenance management also increases. The standard routine for building maintenance is inspection, which results in repairs or replacement when a fault is found. This routine leads to unnecessary inspections, which have a cost in terms of component downtime and work hours. This research proposes an alternative routine: performing building maintenance at the point in time when the component is degrading and requires maintenance, thus reducing the frequency of unnecessary inspections. This thesis demonstrates that statistical techniques can be used as part of a maintenance management methodology to invoke maintenance before failure occurs. The proposed FM process is presented through a scenario utilising current Building Information Modelling (BIM) technology and innovative contractual and organisational models. This FM scenario supports a Degradation-based Maintenance (DbM) scheduling methodology, implemented using two statistical techniques, Particle Filters (PFs) and Gaussian Processes (GPs). DbM consists of extracting and tracking a degradation metric for a component. Limits for the degradation metric are identified based on one of a number of proposed processes. These processes determine the limits based on the maturity of the historical information available. DbM is implemented for three case study components: a heat exchanger, a heat pump, and a set of bearings. The identified degradation points for each case study, from a PF, a GP and a hybrid (PF and GP combined) DbM implementation, are assessed against known degradation points. The GP implementations are successful for all components. For the PF implementations, the results presented in this thesis show that the extracted metrics and limits identify degradation occurrences accurately for components which are in continuous operation. For components which have seasonal operational periods, the PF may wrongly identify degradation. The GP performs more robustly than the PF, but the PF, on average, results in fewer false positives. The hybrid implementations, which are a combination of GP and PF results, are successful for two of the three case studies and are not affected by seasonal data. Overall, DbM is effectively applied for the three case study components. The accuracy of the implementations is dependent on the relationships modelled by the PF and GP, and on the type and quantity of data available. This novel maintenance process can improve equipment performance and reduce energy wastage from the operation of BSCs.
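The idea of tracking a degradation metric against limits derived from historical data can be sketched in a few lines. This is not the thesis's PF or GP implementation. It substitutes a much simpler rule, assuming the limits are the historical mean plus or minus `k` standard deviations and that degradation is flagged after several consecutive out-of-limit samples. The function names, the parameters `k` and `consecutive`, and the sample values are all hypothetical.

```python
from statistics import mean, stdev


def limits_from_history(history, k=3.0):
    """Lower and upper limits as mean -/+ k sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def first_degradation_index(metric, lower, upper, consecutive=3):
    """Index where the first run of `consecutive` out-of-limit samples starts.

    Returns None when no such run exists. Requiring a run rather than a
    single excursion is one simple way to reduce false positives.
    """
    run = 0
    for i, value in enumerate(metric):
        if value < lower or value > upper:
            run += 1
            if run == consecutive:
                return i - consecutive + 1
        else:
            run = 0
    return None


# Invented values for a healthy period and a later monitoring period.
history = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
monitored = [1.0, 1.3, 1.0, 1.25, 1.3, 1.4, 1.5]

lower, upper = limits_from_history(history)
print(first_degradation_index(monitored, lower, upper))
```

A fixed threshold like this cannot account for seasonal operation, which is the kind of behaviour the abstract reports as a difficulty for the PF and as the motivation for comparing it with the GP and hybrid approaches.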