25 results for statistical techniques
Abstract:
This paper introduces the application of linear multivariate statistical techniques, including partial least squares (PLS), canonical correlation analysis (CCA) and reduced rank regression (RRR), to the area of Systems Biology. This new approach aims to extract the important proteins embedded in complex signal transduction pathway models. The analysis is performed on a model of intracellular signalling along the Janus-associated kinase/signal transducer and activator of transcription (JAK/STAT) and mitogen-activated protein kinase (MAPK) signal transduction pathways in interleukin-6 (IL6) stimulated hepatocytes, which produce signal transducer and activator of transcription 3 (STAT3). A region of redundancy within the MAPK pathway that does not affect STAT3 transcription was identified using CCA. This is the core finding of the analysis and cannot be obtained by inspecting the model by eye. In addition, RRR was found to isolate terms that do not significantly contribute to changes in protein concentrations, while the application of PLS does not provide such a detailed picture by virtue of its construction. This analysis has a similar objective to conventional model reduction techniques, with the advantage of maintaining the meaning of the states before and after the reduction process. A significant model reduction is achieved, with only a marginal loss in accuracy, offering a more concise model while retaining the main factors influencing STAT3 transcription. The findings offer a deeper understanding of the reaction terms involved, confirm the relevance of several proteins to the production of acute phase proteins and complement existing findings regarding cross-talk between the two signalling pathways.
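As a rough illustration of the kind of latent-variable analysis described above, the sketch below fits PLS and CCA models relating simulated protein trajectories to a single output and inspects the weights; it uses scikit-learn on synthetic placeholder data, not the IL6/JAK-STAT model from the paper, and RRR (which scikit-learn does not provide directly) is omitted.

```python
# Hedged sketch: PLS and CCA weights on synthetic "pathway" data.
import numpy as np
from sklearn.cross_decomposition import PLSRegression, CCA

rng = np.random.default_rng(0)
n_samples, n_proteins = 200, 10
X = rng.normal(size=(n_samples, n_proteins))           # simulated protein concentrations
y = X[:, :3] @ np.array([0.8, 0.5, 0.3]) + 0.1 * rng.normal(size=n_samples)  # "STAT3"-like output

pls = PLSRegression(n_components=2).fit(X, y)
cca = CCA(n_components=1).fit(X, y.reshape(-1, 1))

# Large weights indicate states that drive the output; small weights flag
# candidate redundant terms, mirroring the idea of the analysis above.
print("PLS x-weights (component 1):", pls.x_weights_[:, 0].round(2))
print("CCA x-weights:", cca.x_weights_[:, 0].round(2))
```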
Abstract:
This paper analyses multivariate statistical techniques for identifying and isolating abnormal process behaviour. These techniques include contribution charts and variable reconstruction, which relate to the application of principal component analysis (PCA). The analysis reveals, firstly, that contribution charts produce variable contributions that are linearly dependent and may lead to an incorrect diagnosis if the number of retained principal components is close to the number of recorded process variables. Secondly, it shows that variable reconstruction affects the geometry of the PCA decomposition. The paper further introduces an improved variable reconstruction method for identifying multiple sensor and process faults and for isolating their influence upon the recorded process variables. It is shown that this method can accommodate the effect of reconstruction, i.e. changes in the covariance matrix of the sensor readings, and correctly re-define the PCA-based monitoring statistics and their confidence limits. (c) 2006 Elsevier Ltd. All rights reserved.
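For readers unfamiliar with PCA-based monitoring, the minimal sketch below computes the Hotelling's T2 and SPE statistics and a simple SPE contribution chart for a simulated sensor fault; the data, fault and thresholds are placeholders and do not reproduce the paper's improved reconstruction method.

```python
# Minimal PCA monitoring sketch on synthetic data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))                          # normal operating data, 8 sensors
X = (X - X.mean(axis=0)) / X.std(axis=0)

pca = PCA(n_components=3).fit(X)
P, lam = pca.components_.T, pca.explained_variance_

x_new = X[0] + np.array([0, 0, 4, 0, 0, 0, 0, 0])      # simulated fault on sensor 3
t = x_new @ P                                          # scores
t2 = np.sum(t**2 / lam)                                # Hotelling's T^2
residual = x_new - t @ P.T
spe = residual @ residual                              # SPE / Q statistic

contributions = residual**2                            # simple SPE contribution chart
print("T2 =", round(t2, 2), "SPE =", round(spe, 2))
print("SPE contributions per sensor:", contributions.round(2))
```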
Abstract:
Do public organizations with similar tasks or structures differ across states with respect to their autonomy and control? If so, why? By comparing the autonomy, control and internal management of state agencies, this book shows how New Public Management doctrines actually work in three small European states with different politico-administrative regimes. Using a unique set of similar survey data on 226 state agencies in Norway, Ireland and Flanders, this study explains differences in agency autonomy, control and management by referring to international isomorphic pressures, state-specific politico-administrative regimes and characteristics of agencies. Therefore, organization theory and neo-institutional schools are used to formulate four competing theoretical perspectives and hypotheses are tested through simple and advanced statistical techniques. By comparing practices between states and between types of agencies, this study substantially enhances scientific knowledge about why public organizations are granted autonomy, why they are controlled in specific ways, and how autonomy affects internal management.
Abstract:
The spatial distributions of marine fauna and of pollution are both highly structured, and the resulting high levels of autocorrelation may invalidate conclusions based on classical statistical approaches. Here we analyse the close correlation observed between proxies for the disturbance associated with gas extraction activities and amphipod distribution patterns around four hydrocarbon platforms. We quantified the amount of variation independently accounted for by natural environmental variables, proxies for the disturbance caused by the platforms, and spatial autocorrelation. This allowed us to demonstrate how each of these three factors significantly affects the community structure of amphipods. Sophisticated statistical techniques are required to take spatial autocorrelation into account; nevertheless, our data demonstrate that this approach not only enables the formulation of robust statistical inferences but also provides a much deeper understanding of the subtle interactions between human disturbance and the natural factors affecting the structure of marine invertebrate communities. (C) 2012 Elsevier Ltd. All rights reserved.
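The sketch below illustrates the variation-partitioning idea in its simplest form: comparing R2 values from nested linear models built on environmental variables, disturbance proxies and spatial coordinates. It is a hedged toy example with synthetic data, not the canonical-ordination machinery typically used for community matrices.

```python
# Toy variation partitioning with nested linear models.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 120
env = rng.normal(size=(n, 2))           # e.g. depth, sediment grain size
dist = rng.normal(size=(n, 1))          # proxy for platform-related disturbance
coords = rng.uniform(size=(n, 2))       # spatial coordinates (autocorrelation proxy)
y = env[:, 0] + 0.5 * dist[:, 0] + coords[:, 1] + rng.normal(scale=0.5, size=n)

def r2(X):
    return LinearRegression().fit(X, y).score(X, y)

full = r2(np.hstack([env, dist, coords]))
print("full model R2:", round(full, 2))
print("unique to disturbance:", round(full - r2(np.hstack([env, coords])), 2))
print("unique to environment:", round(full - r2(np.hstack([dist, coords])), 2))
print("unique to space:", round(full - r2(np.hstack([env, dist])), 2))
```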
Abstract:
Although exogenous factors such as pollutants can act on endogenous drivers (e.g. dispersion) of populations and create spatially autocorrelated distributions, most statistical techniques assume independence of error terms. As there are no studies on soil metal pollutants and microarthropods that explicitly analyse this key issue, we completed a field study of the correlation between Oribatida and metal concentrations in litter, organic matter and soil in an attempt to account for the spatial patterns of both metals and mites. The 50-m wide study area had homogeneous macroscopic features, steep Pb and Cu gradients and high levels of Zn and Cd. Spatial models failed to detect metal-oribatid relationships because the observed latitudinal and longitudinal gradients in oribatid assemblages were independent of the collinear gradients in metal concentrations. It is therefore hypothesised that other spatially variable factors (e.g. fungi, reduced macrofauna) affect oribatid assemblages, which may be influenced by metals only indirectly. (C) 2009 Elsevier Ltd. All rights reserved.
Abstract:
Emotion research has long been dominated by the “standard method” of displaying posed or acted static images of facial expressions of emotion. While this method has been useful, it cannot investigate the dynamic nature of emotion expression. Although continuous self-report traces have enabled the measurement of dynamic expressions of emotion, no consensus has been reached on the statistical techniques that permit inferences to be made with such measures. We propose Generalized Additive Models (GAMs) and Generalized Additive Mixed Models (GAMMs) as techniques that can account for the dynamic nature of such continuous measures. These models allow us to hold constant the shared components of responses that are due to perceived emotion across time, while enabling inference concerning linear differences between groups. The mixed-model (GAMM) approach is preferred because it can account for autocorrelation in time-series data and allows emotion-decoding participants to be modelled as random effects. To increase confidence in linear differences, we assess methods that address interactions between categorical variables and dynamic changes over time. In addition, we comment on the use of Generalized Additive Models to assess the effect size of shared perceived emotion and discuss sample sizes. Finally, we address additional uses: the inference of feature detection, continuous-variable interactions, and the measurement of ambiguity.
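A minimal Python sketch of the GAM idea is shown below: a smooth of time plus a group factor fitted to a simulated continuous rating trace. The pygam library and the synthetic data are assumptions for illustration; analyses like those described above would more typically be run in R's mgcv, and the random-effects GAMM structure is not reproduced here.

```python
# Illustrative GAM fit to a simulated continuous emotion-rating trace.
import numpy as np
from pygam import LinearGAM, s, f

rng = np.random.default_rng(3)
time = np.linspace(0, 60, 300)                      # seconds of a stimulus
group = rng.integers(0, 2, size=300)                # two decoder groups
rating = np.sin(time / 8) + 0.3 * group + rng.normal(scale=0.2, size=300)

X = np.column_stack([time, group])
gam = LinearGAM(s(0) + f(1)).fit(X, rating)         # smooth over time + group factor
gam.summary()
```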
Abstract:
Background: Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique dedicated to therapeutics discovery and drug re-purposing based on differential gene expression analysis. On a normal desktop PC, it is common for a connectivity mapping task with a single gene signature to take more than 2 hours to complete using sscMap, a popular Java application that runs on standard CPUs (Central Processing Units). Here, we describe new software, cudaMap, which has been implemented in CUDA C/C++ to harness the computational power of NVIDIA GPUs (Graphics Processing Units) and greatly reduce processing times for connectivity mapping.
Results: cudaMap can identify candidate therapeutics from the same signature in just over thirty seconds when using an NVIDIA Tesla C2050 GPU. Results from the analysis of multiple gene signatures, which would previously have taken several days, can now be obtained in as little as 10 minutes, greatly facilitating high-throughput candidate therapeutics discovery. We demonstrate dramatic speed differentials between GPU-assisted and CPU-only execution as the computational load increases for high-accuracy evaluation of statistical significance.
Conclusion: Emerging 'omics' technologies are constantly increasing the volume of data and information to be processed in all areas of biomedical research. Embracing the multicore functionality of GPUs represents a major avenue of local accelerated computing. cudaMap will make a strong contribution to the discovery of candidate therapeutics by enabling speedy execution of heavy-duty connectivity mapping tasks, which are increasingly required in modern cancer research. cudaMap is open source and can be freely downloaded from http://purl.oclc.org/NET/cudaMap.
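To give a feel for the computation being accelerated, the sketch below scores a query gene signature against a set of reference expression profiles using a simple signed, rank-based connection score on the CPU. This is a simplified illustration with random data, not cudaMap's or sscMap's exact scoring algorithm or significance testing.

```python
# Simplified CPU illustration of connectivity-style scoring.
import numpy as np

rng = np.random.default_rng(4)
n_genes, n_refs = 10000, 500
reference = rng.normal(size=(n_refs, n_genes))        # reference expression profiles
signature_genes = rng.choice(n_genes, size=50, replace=False)
signature_sign = rng.choice([-1, 1], size=50)         # up/down regulation in the query

ranks = reference.argsort(axis=1).argsort(axis=1) - n_genes / 2   # centred gene ranks
scores = (ranks[:, signature_genes] * signature_sign).sum(axis=1)
scores /= np.abs(ranks).max() * len(signature_genes)              # normalise to roughly [-1, 1]

top = np.argsort(scores)[:5]   # most negatively connected references (candidate "reversers")
print("candidate reference profiles:", top, scores[top].round(3))
```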
Abstract:
In recent years, distillers dried grains and solubles (DDGS), co-products of the bio-ethanol and beverage industries, have become a globally traded commodity for the animal feed sector. As such, it is becoming increasingly important to be able to trace the geographical origin of such commodities should a contamination incident or authenticity issue arise. In this study, 137 DDGS samples from a range of geographical origins (China, the USA, Canada and the European Union) were collected and analyzed. Isotope ratio mass spectrometry (IRMS) was used to analyze the DDGS for their 2H/1H, 13C/12C, 15N/14N, 18O/16O and 34S/32S isotope ratios, which can vary depending on geographical origin and processing. Univariate and multivariate statistical techniques were employed to investigate the feasibility of using the IRMS data to determine the botanical and geographical origin of the DDGS. The results indicated that this commodity could be differentiated according to its place of origin by the analysis of the stable isotopes of hydrogen, carbon, nitrogen and oxygen, but not sulfur. By adding data to the models produced in this study, an isotope databank could potentially be established for DDGS traceability procedures, similar to the one already in place for wine, which would help address feed and food security issues arising worldwide.
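As a hedged sketch of the kind of multivariate classification applied to such IRMS data, the example below runs cross-validated linear discriminant analysis on five simulated isotope-ratio variables against four origin classes; the values and class separations are fabricated for illustration and are not the study's measurements.

```python
# LDA classification of simulated isotope-ratio data by origin.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
origins = np.repeat(["China", "USA", "Canada", "EU"], 35)
# columns stand in for d2H, d13C, d15N, d18O, d34S (simulated per-mil deltas)
X = rng.normal(size=(len(origins), 5)) + np.array([[i] * 5 for i in range(4)]).repeat(35, axis=0)

lda = LinearDiscriminantAnalysis()
acc = cross_val_score(lda, X, origins, cv=5)
print("cross-validated classification accuracy:", acc.mean().round(2))
```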
Abstract:
Mortality models used for forecasting are predominantly based on the statistical properties of time series and do not generally incorporate an understanding of the forces driving secular trends. This paper addresses three research questions: Can the factors found in stochastic mortality-forecasting models be associated with real-world trends in health-related variables? Does the inclusion of health-related factors in the models improve forecasts? Do the resulting models give better forecasts than existing stochastic mortality models? We consider whether the space spanned by the latent factor structure in mortality data can be adequately described by developments in gross domestic product, health expenditure and lifestyle-related risk factors, using statistical techniques developed in macroeconomics and finance. These covariates are then shown to improve forecasts when incorporated into a Bayesian hierarchical model. Results are comparable to or better than those of benchmark stochastic mortality models.
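The sketch below extracts a latent period factor from a simulated log-mortality surface via SVD, in the spirit of Lee-Carter-type factor models; this illustrates the kind of latent structure that the covariates above would be related to, but it is not the paper's Bayesian hierarchical model and the data are synthetic.

```python
# Extracting a latent mortality factor (Lee-Carter style) from simulated data.
import numpy as np

rng = np.random.default_rng(6)
ages, years = 50, 40
trend = np.linspace(0, -1.0, years)                           # secular mortality decline
log_mx = -4 + 0.08 * np.arange(ages)[:, None] + trend[None, :] \
         + rng.normal(scale=0.02, size=(ages, years))         # log death rates, age x year

alpha = log_mx.mean(axis=1, keepdims=True)                    # average age profile
U, S, Vt = np.linalg.svd(log_mx - alpha, full_matrices=False)
beta, kappa = U[:, 0], S[0] * Vt[0]                           # age sensitivities, period factor

# kappa is the latent time factor that could be regressed on GDP, health
# expenditure and lifestyle covariates.
print("period factor (first 5 years):", kappa[:5].round(3))
```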
Abstract:
Efficient identification and follow-up of astronomical transients is hindered by the need for humans to manually select promising candidates from data streams that contain many false positives. These artefacts arise in the difference images produced by most major ground-based time-domain surveys with large-format CCD cameras. This dependence on humans to reject bogus detections is unsustainable for next-generation all-sky surveys, and significant effort is now being invested to solve the problem computationally. In this paper, we explore a simple machine learning approach to real-bogus classification by constructing a training set from the image data of approximately 32 000 real astrophysical transients and bogus detections from the Pan-STARRS1 Medium Deep Survey. We derive our feature representation from the pixel intensity values of a 20 x 20 pixel stamp around the centre of each candidate. This differs from previous work in that it works directly on the pixels rather than relying on catalogued domain knowledge for feature design or selection. Three machine learning algorithms are trained (artificial neural networks, support vector machines and random forests) and their performance is tested on a held-out subset of 25 per cent of the training data. We find the best results from the random forest classifier and demonstrate that, accepting a false positive rate of 1 per cent, the classifier initially suggests a missed detection rate of around 10 per cent. However, we also find that a combination of bright star variability, nuclear transients and uncertainty in human labelling means that our best estimate of the missed detection rate is approximately 6 per cent.
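The sketch below mirrors the pixel-based setup described above: a random forest trained on flattened 20 x 20 stamps with a 25 per cent held-out test set, evaluated at a roughly 1 per cent false positive rate. Random arrays with an injected central source stand in for the Pan-STARRS1 image data, so the numbers it prints are illustrative only.

```python
# Pixel-based real-bogus classification sketch with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 2000
stamps = rng.normal(size=(n, 20, 20))                  # simulated difference-image cutouts
labels = rng.integers(0, 2, size=n)                    # 1 = real transient, 0 = bogus
stamps[labels == 1, 8:12, 8:12] += 2.0                 # give "real" stamps a central source

X = stamps.reshape(n, -1)                              # features = raw pixel intensities
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

# Threshold chosen for ~1% false positive rate, then report the missed detection rate.
threshold = np.quantile(scores[y_te == 0], 0.99)
mdr = np.mean(scores[y_te == 1] < threshold)
print("missed detection rate at ~1% FPR:", round(float(mdr), 3))
```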
Abstract:
Farmed fish are typically genetically different from wild conspecifics. Escapees from fish farms may contribute one-way gene flow from farm to wild gene pools, which can depress population productivity, dilute local adaptations and disrupt coadapted gene complexes. Here, we reanalyse data from two experiments (McGinnity et al., 1997, 2003) in which the performance of Atlantic salmon (Salmo salar) progeny originating from experimental crosses between farm and wild parents (in three different cohorts) was measured in a natural stream under common garden conditions. Previously published analyses focussed on group-level differences but did not account for pedigree structure, as we do here using modern mixed-effect models. Offspring with one or two farm parents exhibited poorer survival in their first and second year of life compared with those with two wild parents, and these group-level inferences were robust to excluding outlier families. Variation in performance among farm, hybrid and wild families was generally similar in magnitude. Farm offspring were generally larger at all life stages examined than wild offspring, but the differences were moderate (5–20%) and similar in magnitude in the wild versus hatchery environments. Quantitative genetic analyses conducted using a Bayesian framework revealed moderate heritability in juvenile fork length and mass and positive genetic correlations (>0.85) between these morphological traits. Our study confirms (using more rigorous statistical techniques) previous studies showing that offspring of wild fish invariably have higher fitness and contributes fresh insights into family-level variation in the performance of farm, wild and hybrid Atlantic salmon families in the wild. It also adds to a small, but growing, number of studies that estimate key evolutionary parameters in wild salmonid populations. Such information is vital in modelling the impacts of introgression by escaped farm salmon.
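A minimal sketch of the family-level mixed-effects idea is given below: fork length modelled with cross type as a fixed effect and family as a random effect, fitted with statsmodels on simulated data. The library choice, variable names and effect sizes are assumptions for illustration; the paper's Bayesian quantitative-genetic (animal-model) analyses are not reproduced here.

```python
# Mixed-effects sketch: cross type as fixed effect, family as random effect.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
families = np.repeat(np.arange(30), 20)                         # 30 families x 20 offspring
cross = np.repeat(rng.choice(["wild", "hybrid", "farm"], size=30), 20)
family_effect = np.repeat(rng.normal(scale=2.0, size=30), 20)
length = 60 + 3 * (cross == "farm") + 1.5 * (cross == "hybrid") \
         + family_effect + rng.normal(scale=4.0, size=families.size)

df = pd.DataFrame({"length": length, "cross": cross, "family": families})
model = smf.mixedlm("length ~ cross", df, groups=df["family"]).fit()
print(model.summary())
```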
Abstract:
The North Atlantic has played a key role in abrupt climate changes because of the sensitivity of the Atlantic Meridional Overturning Circulation (AMOC) to the location and strength of deep water formation. Understanding the role of the AMOC in the rapid warming and gradual cooling cycles known as Dansgaard-Oeschger (DO) events, which are recorded in the Greenland ice cores, is crucial for modelling future climate change. However, palaeoceanographic research into DO events has been hampered by uncertainty in timing, due largely to the lack of a precise chronological framework for marine records. While tephrochronology provides links to the Greenland ice-core records at a few points, radiocarbon remains the primary dating method for most marine cores. Because of variations in the atmospheric and oceanic 14C concentration, radiocarbon ages must be calibrated to provide calendric ages. The IntCal Working Group provides a global estimate of ocean 14C ages for the calibration of marine radiocarbon dates, but the variability of the surface marine reservoir age in the North Atlantic, particularly during Heinrich or DO events, makes calibration uncertain. In addition, the current Marine09 radiocarbon calibration beyond around 15 ka BP is largely based on 'tuning' to the Hulu Cave isotope record, so the timing of events may not be entirely synchronous with the Greenland ice cores. The use of event stratigraphy and independent chronological markers such as tephra provides scope to improve marine radiocarbon reservoir age estimates, particularly in the North Atlantic, where a number of tephra horizons have been identified in both marine sediments and the Greenland ice cores. Quantification of timescale uncertainties is critical, but statistical techniques that take into account the differential dating between events can improve the precision. Such techniques should make it possible to develop specific marine calibration curves for selected regions.
Abstract:
OBJECTIVES: Identify the words and phrases that authors used to describe time-to-event outcomes of dental treatments in patients.
MATERIALS AND METHODS: A systematic handsearch of the 50 dental journals with the highest citation index for 2008 identified articles reporting dental treatment with time-to-event statistics (included "case" articles, n = 95), without time-to-event statistics (active "control" articles, n = 91), and all other articles (passive "control" articles, n = 6796). The included and active control articles were read, and 43 English words indicating that outcomes were studied over time were identified across the title, aim and abstract. Once identified, these words were sought within the 6796 passive controls. The words were divided into six groups. Differences in word use were analysed with Pearson's chi-square across these six groups and the three locations (title, aim and abstract).
RESULTS: In the abstracts, included articles used group 1 (statistical technique) and group 2 (statistical terms) more frequently than the active and passive controls (group 1: 35%, 2%, 0.37%, P < 0.001 and group 2: 31%, 1%, 0.06%, P < 0.001). The included and active controls used group 3 (quasi-statistical) equally, but significantly more often than the passive controls (82%, 78%, 3.21%, P < 0.001). In the aims, use of target words was similar for included and active controls, but less frequent for groups 1-4 in the passive controls (P < 0.001). In the title, group 2 (statistical techniques) and groups 3-5 (outcomes) were similar for included and active controls, but groups 2 and 3 were less frequent in the passive controls (P < 0.001). Significantly more included articles used group 6 words (stating the study duration) (54%, 30%, P = 0.001).
CONCLUSION: All included articles used time-to-event analyses, but two-thirds did not include words to highlight this in the abstract. There is great variation in the words authors used to describe dental time-to-event outcomes. Electronic identification of such articles would be inconsistent, with low sensitivity and specificity. Authors should improve the reporting quality. Journals should allow sufficient space in abstracts to summarize research, and not impose unrealistic word limits. Readers should be mindful of these problems when searching for relevant articles. Additional research is required in this field.
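The sketch below shows the form of the Pearson chi-square comparison described in the methods, applied to an illustrative contingency table of word use across the three article groups; the counts are approximate values made up for demonstration, not the study's data.

```python
# Pearson chi-square test on an illustrative word-use contingency table.
import numpy as np
from scipy.stats import chi2_contingency

# rows: included, active control, passive control; columns: uses group-1 word, does not
table = np.array([
    [33, 62],      # included articles (n = 95)
    [2, 89],       # active controls (n = 91)
    [25, 6771],    # passive controls (n = 6796)
])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3g}")
```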
Abstract:
This study applies spatial statistical techniques, including cokriging, to integrate airborne geophysical (radiometric) data with ground-based measurements of peat depth and soil organic carbon (SOC) in order to monitor change in peat cover for carbon stock calculations. The research is part of the EU-funded Tellus Border project and is supported by the INTERREG IVA development programme of the European Regional Development Fund, which is managed by the Special EU Programmes Body (SEUPB). The premise is that saturated peat attenuates the radiometric signal from the underlying soils and rocks. Contemporaneous ground-based measurements were collected to corroborate the mapped estimates and to develop a statistical model for volumetric carbon content (VCC) to 0.5 metres. Field measurements included ground-penetrating radar, gamma-ray spectrometry and a soil sampling methodology that measured bulk density and soil moisture to determine VCC. One aim of the study was to explore whether airborne radiometric survey data can be used to establish VCC across a region. To account for the footprint of the airborne radiometric data, five cores were obtained at each soil sampling location: one at the centre of the ground radiometric equivalent sample location and one at each of the four corners, 20 metres apart. This soil sampling strategy replicated the methodology deployed for the Tellus Border geochemistry survey. Two key issues arising from this work will be discussed: the first addresses the integration of different sampling supports for airborne and ground-measured data, and the second concerns the compositional nature of the VCC data.
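As a very rough sketch of the geostatistical interpolation step, the example below predicts peat depth over a grid from scattered point measurements using a Gaussian-process regressor as a stand-in for ordinary kriging; the locations, depths and kernel settings are fabricated, and the full cokriging with the airborne radiometric covariate is not shown.

```python
# Gaussian-process interpolation of simulated peat-depth measurements
# (a simplified kriging analogue, not the study's cokriging workflow).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(9)
coords = rng.uniform(0, 1000, size=(80, 2))                   # sample locations (m)
peat_depth = 1.5 + 0.001 * coords[:, 0] + rng.normal(scale=0.2, size=80)

kernel = RBF(length_scale=200.0) + WhiteKernel(noise_level=0.05)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(coords, peat_depth)

grid = np.array([[x, y] for x in np.linspace(0, 1000, 5) for y in np.linspace(0, 1000, 5)])
pred, std = gp.predict(grid, return_std=True)
print("predicted depth (m):", pred[:5].round(2), "+/-", std[:5].round(2))
```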
Abstract:
The techniques of principal component analysis (PCA) and partial least squares (PLS) are introduced from the point of view of providing a multivariate statistical method for modelling process plants. The advantages and limitations of PCA and PLS are discussed from the perspective of the type of data and problems that might be encountered in this application area. These concepts are exemplified by two case studies, the first dealing with data from a continuous stirred tank reactor (CSTR) simulation and the second with a literature source describing a low-density polyethylene (LDPE) reactor simulation.
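A brief sketch of the modelling workflow described above is shown below: PCA summarises correlated process measurements and PLS relates them to a quality variable. The data are a simple simulated CSTR-like example, not the case-study simulations from the paper.

```python
# PCA summary and PLS quality-variable model on simulated process data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(10)
t = np.linspace(0, 10, 300)
temp = 350 + 5 * np.sin(t) + rng.normal(scale=0.5, size=t.size)      # reactor temperature
flow = 1.0 + 0.1 * np.cos(t) + rng.normal(scale=0.02, size=t.size)   # feed flow
conc = 2.0 - 0.01 * (temp - 350) + 0.5 * flow + rng.normal(scale=0.05, size=t.size)

X = np.column_stack([temp, flow])
pca = PCA().fit((X - X.mean(0)) / X.std(0))
print("variance explained by components:", pca.explained_variance_ratio_.round(2))

pls = PLSRegression(n_components=2).fit(X, conc)
print("PLS R2 for outlet concentration:", round(pls.score(X, conc), 2))
```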