6 results for Compositional data analysis-roots in geosciences
in CORA - Cork Open Research Archive - University College Cork - Ireland
Abstract:
Unpasteurised milk and many cheeses contain a diverse microbiological population. These microorganisms play important roles in dairy foods and can, for example, contribute to the development of flavours and aromas, determine safety, cause spoilage or enhance the health of the consumer. It is thus important to understand thoroughly the microorganisms present in these food types. Traditional culture-dependent and culture-independent methods have provided much detail regarding the microbial content of dairy foods. However, the development of next-generation DNA sequencing technologies has revolutionised our knowledge of complex microbial environments. Throughout this thesis we observe the benefits of applying these technologies to provide a detailed understanding of the bacterial content of dairy foods, including those present in milk pre- and post-pasteurisation, in Irish farmhouse cheeses and in commercially produced cheeses affected by a discolouration defect, as well as to study genomic changes in microbes associated with dairy foods. Through the application of these state-of-the-art technologies we identified microorganisms not previously associated with dairy foods.
Abstract:
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC distinguishes informational from functional content and compresses only the informational content; the compressed data thus remains transparent to existing software libraries, which often rely on functional content to work. Secondly, a context-free, bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. It uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated on a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they show substantial improvement in performance and significant reduction in system resource requirements.
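The abstract does not give CaPC's actual encoding, but the core idea it describes, compressing only informational content while leaving functional content such as record and field delimiters intact so that the stream can still be split on record boundaries, can be sketched roughly as follows. Everything here (the choice of delimiters, the per-field zlib codec, the hex encoding used to keep compressed bytes from colliding with delimiters) is an illustrative assumption, not the thesis's scheme.

```python
import zlib

# Hypothetical "functional" content: delimiters that must survive compression
# so that record-oriented tools (e.g. line-based MapReduce input readers)
# keep working directly on the compressed stream.
FIELD_SEP = b"\t"
RECORD_SEP = b"\n"

def capc_like_compress(data: bytes) -> bytes:
    """Compress each field independently, keeping the tab/newline structure."""
    out_records = []
    for record in data.split(RECORD_SEP):
        fields = record.split(FIELD_SEP)
        # zlib stands in for whatever per-field codec a real scheme would use;
        # hex-encoding guarantees compressed bytes never collide with delimiters.
        out_fields = [zlib.compress(f).hex().encode() for f in fields]
        out_records.append(FIELD_SEP.join(out_fields))
    return RECORD_SEP.join(out_records)

def capc_like_decompress(data: bytes) -> bytes:
    """Invert capc_like_compress, restoring the original bytes."""
    out_records = []
    for record in data.split(RECORD_SEP):
        fields = [zlib.decompress(bytes.fromhex(f.decode()))
                  for f in record.split(FIELD_SEP)]
        out_records.append(FIELD_SEP.join(fields))
    return RECORD_SEP.join(out_records)

if __name__ == "__main__":
    sample = b"alice\t30\nbob\t25"
    packed = capc_like_compress(sample)
    assert capc_like_decompress(packed) == sample
    # The compressed stream can still be split on newlines into whole records.
    assert len(packed.split(RECORD_SEP)) == 2
```

Hex encoding roughly doubles the compressed payload, so a practical scheme would use a delimiter-avoiding binary coding instead; the point of the sketch is only that line-oriented tools can still split the compressed stream into whole logical records.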
Abstract:
Aim: To evaluate the reported use of Data Monitoring Committees (DMCs), the frequency of interim analysis, pre-specified stopping rules and early trial termination in neonatal randomised controlled trials (RCTs). Methods: We reviewed neonatal RCTs published in four high-impact general medical journals, specifically looking at safety issues including documented involvement of a DMC, stated interim analysis, stopping rules and early trial termination. We searched all journal issues over an 11-year period (2003-2013) and recorded predefined parameters on each item for RCTs meeting the inclusion criteria. Results: Seventy neonatal trials were identified in the four general medical journals: Lancet, New England Journal of Medicine (NEJM), British Medical Journal and Journal of the American Medical Association (JAMA). 43 (61.4%) studies reported the presence of a DMC, 36 (51.4%) explicitly mentioned interim analysis, stopping rules were reported in 15 (21.4%) RCTs and 7 (10%) trials were terminated early. Of the four journals reviewed, the NEJM reported these parameters most frequently. Conclusion: While the majority of neonatal RCTs report on DMC involvement and interim analysis, there is still scope for improvement. Clear documentation of safety-related issues should be a central component of reporting in neonatal trials.
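As a quick sanity check, the percentages quoted above all follow from the counts out of the 70 included trials; a few lines of Python reproduce them:

```python
# Reported counts from the review (n = 70 neonatal RCTs).
n_trials = 70
reported = {
    "DMC reported": 43,
    "interim analysis": 36,
    "stopping rules": 15,
    "terminated early": 7,
}
for item, count in reported.items():
    print(f"{item}: {count}/{n_trials} = {100 * count / n_trials:.1f}%")
# -> 61.4%, 51.4%, 21.4%, 10.0%, matching the abstract.
```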
Abstract:
The epoc® blood analysis system (Epocal Inc., Ottawa, Ontario, Canada) is a newly developed in vitro diagnostic hand-held analyzer for testing whole blood samples at the point of care, which rapidly measures blood gases, electrolytes, ionized calcium, glucose, lactate, and hematocrit/calculated hemoglobin. The analytical performance of the epoc® system was evaluated in a tertiary hospital; see the related research article “Analytical evaluation of the epoc® point-of-care blood analysis system in cardiopulmonary bypass patients” [1]. The data presented comprise the linearity analysis for 9 parameters and a comparison study in 40 cardiopulmonary bypass patients across 3 epoc® meters, an Instrumentation Laboratory GEM4000, an Abbott iSTAT, a Nova CCX, and Roche Accu-Chek Inform II and Performa glucose meters.
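The abstract does not state the statistical methods behind the linearity and comparison analyses, but a minimal sketch of the usual approach in method-comparison work (regressing one method against the other and computing the mean bias) might look like the following. The paired glucose readings are entirely hypothetical and stand in for one of the 9 parameters.

```python
import numpy as np

# Hypothetical paired glucose readings (mmol/L): an epoc meter vs. a
# comparator analyzer; the real study compared 9 parameters across devices.
epoc = np.array([4.8, 5.6, 7.1, 9.4, 12.2, 15.8])
comp = np.array([5.0, 5.5, 7.3, 9.1, 12.6, 15.5])

# Ordinary least-squares fit as a simple linearity check; method-comparison
# studies often use Deming or Passing-Bablok regression instead.
slope, intercept = np.polyfit(epoc, comp, 1)

# Bland-Altman style mean bias between the two methods.
bias = np.mean(comp - epoc)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, mean bias={bias:.3f} mmol/L")
```

A slope near 1 and an intercept and mean bias near 0 would indicate close agreement between the hand-held meter and the comparator.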
Abstract:
Systematic, high-quality observations of the atmosphere, oceans and terrestrial environments are required to improve understanding of climate characteristics and the consequences of climate change. The overall aim of this report is to carry out a comparative assessment of the approaches taken to addressing the state of European observation systems and related data analysis by some leading actors in the field. This research reports on approaches to climate observations and analyses in Ireland, Switzerland, Germany, the Netherlands and Austria and explores options for a more coordinated approach to national responses to climate observations in Europe. The key aspects addressed are: an assessment of approaches to developing GCOS and providing analysis of GCOS data; an evaluation of how these countries report the development of GCOS; highlighting of best practice in advancing GCOS implementation, including analysis of Essential Climate Variables (ECVs); a comparative summary of the differences and synergies in the reporting of climate observations; and an overview of relevant European initiatives and recommendations on how identified gaps might be addressed in the short to medium term.
Abstract:
It is estimated that the quantity of digital data being transferred, processed or stored at any one time currently stands at 4.4 zettabytes (4.4 × 2⁷⁰ bytes), and this figure is expected to have grown by a factor of 10, to 44 zettabytes, by 2020. Exploiting this data is, and will remain, a significant challenge. At present there is the capacity to store 33% of the digital data in existence at any one time; by 2020 this capacity is expected to fall to 15%. These statistics suggest that, in the era of Big Data, the identification of important, exploitable data will need to be done in a timely manner. Systems for the monitoring and analysis of data, e.g. stock markets, smart grids and sensor networks, can be made up of massive numbers of individual components. These components can be geographically distributed yet may interact with one another via continuous data streams, which in turn may affect the state of the sender or receiver. This introduces a dynamic causality, which further complicates the overall system by introducing a temporal constraint that is difficult to accommodate. Practical approaches to realising such systems have led to a multiplicity of analysis techniques, each of which concentrates on specific characteristics of the system being analysed and treats these characteristics as the dominant component affecting the results being sought. This multiplicity of analysis techniques introduces another layer of heterogeneity, namely heterogeneity of approach, partitioning the field to the extent that results from one domain are difficult to exploit in another. This raises the question: can a generic solution for the monitoring and analysis of data be identified that accommodates temporal constraints, bridges the gap between expert knowledge and raw data, and enables data to be effectively interpreted and exploited in a transparent manner? The approach proposed in this dissertation acquires, analyses and processes data in a manner that is free of the constraints of any particular analysis technique, while at the same time facilitating these techniques where appropriate. Constraints are applied by defining a workflow based on the production, interpretation and consumption of data. This supports the application of different analysis techniques to the same raw data without the danger of incorporating hidden bias. To illustrate and realise this approach, a software platform has been created that allows for the transparent analysis of data, combining analysis techniques with a maintainable record of provenance so that independent third-party analysis can be applied to verify any derived conclusions. To demonstrate these concepts, a complex real-world example involving the near real-time capture and analysis of neurophysiological data from a neonatal intensive care unit (NICU) was chosen. A system was engineered to gather raw data, analyse that data using different analysis techniques, uncover information, incorporate that information into the system and curate the evolution of the discovered knowledge. The application domain was chosen for three reasons: firstly, because it is complex and no comprehensive solution exists; secondly, because it requires tight interaction with domain experts, thus requiring the handling of subjective knowledge and inference; and thirdly, because, given the dearth of neurophysiologists, there is a real-world need to provide a solution for this domain.
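The abstract describes a workflow over the production, interpretation and consumption of data, with a maintainable provenance record so that third parties can verify derived conclusions. The platform's actual design is not given here; the following is a minimal sketch of that idea under assumed names (ProvenancePipeline, run_step, fingerprint are all hypothetical), where each analysis step logs a hash of its input and output so the derivation can later be replayed and checked.

```python
import hashlib
import json
import time
from typing import Any, Callable, List

def fingerprint(obj: Any) -> str:
    """Stable hash of a step's inputs/outputs for later verification."""
    payload = json.dumps(obj, sort_keys=True, default=str).encode()
    return hashlib.sha256(payload).hexdigest()

class ProvenancePipeline:
    """Runs analysis steps and keeps a verifiable record of what was done."""

    def __init__(self) -> None:
        self.log: List[dict] = []

    def run_step(self, name: str, func: Callable[[Any], Any], data: Any) -> Any:
        result = func(data)
        # Record what was done, to what, and with what outcome, so an
        # independent analyst can replay the step and verify the conclusion.
        self.log.append({
            "step": name,
            "timestamp": time.time(),
            "input": fingerprint(data),
            "output": fingerprint(result),
        })
        return result

if __name__ == "__main__":
    pipe = ProvenancePipeline()
    raw = [1.2, 3.4, 2.2, 9.9]  # stand-in for a raw data stream
    cleaned = pipe.run_step("filter_outliers",
                            lambda xs: [x for x in xs if x < 9], raw)
    mean = pipe.run_step("mean", lambda xs: sum(xs) / len(xs), cleaned)
    print(mean)
    print(json.dumps(pipe.log, indent=2))
```

Because each entry fixes both the input and output of a step, two different analysis techniques can be applied to the same raw data and their provenance trails compared, which is the bias-avoidance property the abstract emphasises.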