38 results for random forest data analysis


Relevance:

100.00% 100.00%

Publisher:

Abstract:

This chapter introduces the latest practices and technologies in the interactive interpretation of environmental data. With environmental data becoming ever larger, more diverse and more complex, there is a need for a new generation of tools that provides new capabilities over and above those of the standard workhorses of science. These new tools aid the scientist in discovering interesting new features (and also problems) in large datasets by allowing the data to be explored interactively using simple, intuitive graphical tools. In this way, new discoveries are made that are commonly missed by automated batch data processing. This chapter discusses the characteristics of environmental science data, common current practice in data analysis and the supporting tools and infrastructure. New approaches are introduced and illustrated from the points of view of both the end user and the underlying technology. We conclude by speculating as to future developments in the field and what must be achieved to fulfil this vision.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: Expression microarrays are increasingly used to obtain large-scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, design experiments and analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples. Results: We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intra-array variability is small (only around 2% of the mean log signal), while interarray variability from replicate array measurements has a standard deviation (SD) of around 0.5 log(2) units (6% of mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identify an underlying structure which reflects some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and by a variety of operators. Conclusions: This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform.
The analysis is used to obtain quantitative estimates of the SD arising from experimental (non biological) intra- and interarray variability, and for a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further more extensive studies of the systems biology of eukaryotic cells.
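The abstract above reports replicate variability as an SD in log(2) units and relative to the mean log signal. As a rough illustration of that arithmetic only, the following Python sketch computes both quantities for one gene measured on several replicate arrays. The function name `replicate_variability` and the input values are hypothetical and are not taken from the study, whose own pipeline is an R function.

```python
import math
import statistics


def replicate_variability(signals):
    """Summarise replicate measurements of one gene on the log2 scale.

    `signals` is a list of positive raw intensities, one per replicate array
    (hypothetical input; the study's actual data format is not given here).
    Returns (mean log2 signal, sample SD in log2 units, SD as % of the mean).
    """
    logs = [math.log2(s) for s in signals]
    mean_log = statistics.mean(logs)
    sd_log = statistics.stdev(logs)  # sample SD across replicates
    return mean_log, sd_log, 100.0 * sd_log / mean_log


if __name__ == "__main__":
    # Made-up intensities chosen only so the log2 values are round numbers.
    mean_log, sd_log, pct = replicate_variability([256, 512, 1024])
    print(f"mean={mean_log:.2f} log2 units, SD={sd_log:.2f}, SD/mean={pct:.1f}%")
```

Expressing the SD relative to the mean log signal, as the abstract does, makes the figure comparable across genes with different expression levels, though it is only meaningful when the mean log signal is well above zero.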

Relevance:

100.00% 100.00%

Publisher:

Abstract:

JASMIN is a super-data-cluster designed to provide a high-performance, high-volume data analysis environment for the UK environmental science community. Thus far JASMIN has been used primarily by the atmospheric science and earth observation communities, both to support their direct scientific workflow and to curate data products in the STFC Centre for Environmental Data Archival (CEDA). Initial JASMIN configuration and first experiences are reported here. Useful improvements in scientific workflow are presented. It is clear from the explosive growth in stored data and use that there was a pent-up demand for a suitable big-data analysis environment. This demand is not yet satisfied, in part because JASMIN does not yet have enough compute, the storage is fully allocated, and not all software needs are met. Plans to address these constraints are introduced.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

BACKGROUND: Genetic polymorphisms of transcription factor 7-like 2 (TCF7L2) have been associated with type 2 diabetes and BMI. OBJECTIVE: The objective was to investigate whether TCF7L2 HapA is associated with weight development and whether such an association is modulated by protein intake or by the glycemic index (GI). DESIGN: The investigation was based on prospective data from 5 cohort studies nested within the European Prospective Investigation into Cancer and Nutrition. Weight change was followed up for a mean (±SD) of 6.8 ± 2.5 y. TCF7L2 rs7903146 and rs10885406 were successfully genotyped in 11,069 individuals and used to derive HapA. Multiple logistic and linear regression analysis was applied to test for the main effect of HapA and its interaction with dietary protein or GI. Analyses from the cohorts were combined by random-effects meta-analysis. RESULTS: HapA was associated neither with baseline BMI (0.03 ± 0.07 BMI units per allele; P = 0.6) nor with annual weight change (8.8 ± 11.7 g/y per allele; P = 0.5). However, a previously shown positive association between intake of protein, particularly of animal origin, and subsequent weight change in this population proved to be attenuated by TCF7L2 HapA (P-interaction = 0.01). We showed that weight gain becomes independent of protein intake with an increasing number of HapA alleles. Substitution of protein with either fat or carbohydrates showed the same effects. No interaction with GI was observed. CONCLUSION: TCF7L2 HapA attenuates the positive association between animal protein intake and long-term body weight change in middle-aged Europeans but does not interact with the GI of the diet.
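The abstract above combines per-cohort analyses by random-effects meta-analysis but does not say which estimator was used. As one common choice, the following Python sketch implements DerSimonian-Laird pooling; this is an assumption for illustration, not the authors' stated method. The function name `random_effects_meta` and the input values are hypothetical.

```python
import math


def random_effects_meta(estimates, std_errors):
    """Pool per-study effect estimates with the DerSimonian-Laird method.

    Returns (pooled estimate, pooled standard error, between-study variance).
    """
    k = len(estimates)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    return pooled, math.sqrt(1.0 / sum_w_star), tau2


if __name__ == "__main__":
    # Made-up per-cohort estimates and standard errors, for illustration only.
    pooled, se, tau2 = random_effects_meta([0.0, 2.0], [1.0, 1.0])
    print(f"pooled={pooled:.3f}, SE={se:.3f}, tau^2={tau2:.3f}")
```

When the cohorts agree closely, the estimated between-study variance falls to zero and the result coincides with a fixed-effect (inverse-variance) pooled estimate.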

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Streamwater nitrate dynamics in the River Hafren, Plynlimon, mid-Wales were investigated over decadal to sub-daily timescales using a range of statistical techniques. Long-term data were derived from weekly grab samples (1984–2010) and high-frequency data from 7-hourly samples (2007–2009) both measured at two sites: a headwater stream draining moorland and a downstream site below plantation forest. This study is one of the first to analyse upland streamwater nitrate dynamics across such a wide range of timescales and report on the principal mechanisms identified. The data analysis provided no clear evidence that the long-term decline in streamwater nitrate concentrations was related to a decline in atmospheric deposition alone, because nitrogen deposition first increased and then decreased during the study period. Increased streamwater temperature and denitrification may also have contributed to the decline in stream nitrate concentrations, the former through increased N uptake rates and the latter resultant from increased dissolved organic carbon concentrations. Strong seasonal cycles, with concentration minimums in the summer, were driven by seasonal flow minimums and seasonal biological activity enhancing nitrate uptake. Complex diurnal dynamics were observed, with seasonal changes in phase and amplitude of the cycling, and the diurnal dynamics were variable along the river. At the moorland site, a regular daily cycle, with minimum concentrations in the early afternoon, corresponding with peak air temperatures, indicated the importance of instream biological processing. At the downstream site, the diurnal dynamics were a composite signal, resultant from advection, dispersion and nitrate processing in the soils of the lower catchment. The diurnal streamwater nitrate dynamics were also affected by drought conditions. 
Enhanced diurnal cycling in Spring 2007 was attributed to increased nitrate availability in the post-drought period as well as low flow rates and high temperatures over this period. The combination of high-frequency short-term measurements and long-term monitoring provides a powerful tool for increasing understanding of the controls of element fluxes and concentrations in surface waters.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We have used the BIOME4 biogeography–biochemistry model and comparison with palaeovegetation data to evaluate the response of six ocean–atmosphere general circulation models to mid-Holocene changes in orbital forcing in the mid- to high-latitudes of the northern hemisphere. All the models produce: (a) a northward shift of the northern limit of boreal forest, in response to simulated summer warming in high-latitudes. The northward shift is markedly asymmetric, with larger shifts in Eurasia than in North America; (b) an expansion of xerophytic vegetation in mid-continental North America and Eurasia, in response to increased temperatures during the growing season; (c) a northward expansion of temperate forests in eastern North America, in response to simulated winter warming. The northward shift of the northern limit of boreal forest and the northward expansion of temperate forests in North America are supported by palaeovegetation data. The expansion of xerophytic vegetation in mid-continental North America is consistent with palaeodata, although the extent may be over-estimated. The simulated expansion of xerophytic vegetation in Eurasia is not supported by the data. Analysis of an asynchronous coupling of one model to an equilibrium-vegetation model suggests vegetation feedback exacerbates this mid-continental drying and produces conditions more unlike the observations. Not all features of the simulations are robust: some models produce winter warming over Europe while others produce winter cooling. As a result, some models show a northward shift of temperate forests (consistent with, though less marked than, the expansion shown by data) and others produce a reduction in temperate forests. Elucidation of the cause of such differences is a focus of the current phase of the Palaeoclimate Modelling Intercomparison Project.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Owing to continuous advances in the computational power of handheld devices like smartphones and tablet computers, it has become possible to perform Big Data operations, including modern data mining processes, onboard these small devices. A decade of research has proved the feasibility of what has been termed Mobile Data Mining, with a focus on one mobile device running data mining processes. However, it was not until 2010 that the authors of this book initiated the Pocket Data Mining (PDM) project, exploiting the seamless communication among handheld devices to perform data analysis tasks that were infeasible until recently. PDM is the process of collaboratively extracting knowledge from distributed data streams in a mobile computing environment. This book provides the reader with an in-depth treatment of this emerging area of research. Details of the techniques used and thorough experimental studies are given. More importantly, and exclusive to this book, the authors provide a detailed practical guide to the deployment of PDM in the mobile environment. An important extension to the basic implementation of PDM, dealing with concept drift, is also reported. In the era of Big Data, potential applications of paramount importance offered by PDM in a variety of domains, including security, business and telemedicine, are discussed.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Smart healthcare is a complex domain for systems integration because of the human and technical factors and the heterogeneous data sources involved. As part of a smart city, it is an area where clinical functions require intelligent collaboration among multiple systems for effective communication among departments, and radiology is one of the areas that relies heavily on intelligent information integration and communication. It therefore faces many challenges regarding integration and interoperability, such as information collision, heterogeneous data sources, policy obstacles, and procedure mismanagement. The purpose of this study is to analyse the data, semantic, and pragmatic interoperability of systems integration in a radiology department, and to develop a pragmatic interoperability framework for guiding the integration. We select an ongoing project at a local hospital for our case study. The project aims to achieve data sharing and interoperability among Radiology Information Systems (RIS), Electronic Patient Record (EPR), and Picture Archiving and Communication Systems (PACS). Qualitative data collection and analysis methods are used. The data sources consisted of documentation, including publications and internal working papers, one year of non-participant observations, and 37 interviews with radiologists, clinicians, directors of IT services, referring clinicians, radiographers, receptionists and secretaries. We identified four primary phases of the data analysis process for the case study: requirements and barriers identification, integration approach, interoperability measurements, and knowledge foundations. Each phase is discussed and supported by qualitative data. Through the analysis we also develop a pragmatic interoperability framework that summarises the empirical findings and proposes recommendations for guiding the integration in the radiology context.