973 results for correlated data
Abstract:
Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example finding common patterns and relationships between samples as well as variables. Typically, visualisation methods such as principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data, and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well-known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that, by including prior structural information, it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as on a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing-data imputation results by 3 to 13%.
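To make the ideas of a block-structured correlation matrix and missing-data handling concrete, here is a minimal sketch. It assumes a simple compound-symmetric structure (one shared correlation `rho` per block of variables, zero correlation between blocks); the paper's actual parameterisation and its modified GTM are not reproduced, and the function names `block_correlation` and `observed_log_density` are hypothetical. The second function illustrates the standard reason Gaussian-noise models can cope with missing values: a partially observed vector is scored on its observed coordinates only, using the matching sub-block of the covariance.

```python
import numpy as np


def block_correlation(block_sizes, rhos):
    """Block-diagonal correlation matrix (illustrative assumption):
    unit diagonal, constant correlation rho within each block, zero between blocks."""
    d = sum(block_sizes)
    R = np.zeros((d, d))
    start = 0
    for size, rho in zip(block_sizes, rhos):
        end = start + size
        R[start:end, start:end] = rho  # strongly correlated variables share a block
        start = end
    np.fill_diagonal(R, 1.0)
    return R


def observed_log_density(x, mean, cov):
    """Gaussian log-density of x using only its non-NaN entries.

    Marginalising a Gaussian over missing coordinates simply drops the
    corresponding rows/columns of the covariance matrix."""
    obs = ~np.isnan(x)
    diff = x[obs] - mean[obs]
    sub = cov[np.ix_(obs, obs)]  # covariance of the observed coordinates only
    _, logdet = np.linalg.slogdet(sub)
    quad = diff @ np.linalg.solve(sub, diff)
    k = obs.sum()
    return -0.5 * (k * np.log(2 * np.pi) + logdet + quad)
```

For example, `block_correlation([2, 3], [0.8, 0.5])` encodes two groups of variables (say, two families of geochemical measurements) with strong within-group correlation.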
Abstract:
Substantial altimetry datasets collected by different satellites have only become available during the past five years, but the future will bring a variety of new altimetry missions, both parallel and consecutive in time. The characteristics of each dataset vary with the orbital height and inclination of the spacecraft, as well as with the technical properties of the radar instrument. An integral analysis of datasets with different properties offers advantages in terms of both data quantity and data quality. This thesis is concerned with developing the means for such integral analysis, in particular for dynamic solutions in which precise orbits for the satellites are computed simultaneously. The first half of the thesis discusses the theory and numerical implementation of dynamic multi-satellite altimetry analysis. The most important aspect of this analysis is the use of dual-satellite altimetry crossover points as a bi-directional tracking data type in simultaneous orbit solutions. The central problem is that the spatial and temporal distributions of the crossovers conflict with the time-organised nature of traditional solution methods. Applying them to the adjustment of the orbits of both satellites involved in a dual crossover therefore requires several fundamental changes to the classical least-squares prediction/correction methods. The second half of the thesis applies the developed numerical techniques to the problems of precise orbit computation and gravity field adjustment, using the altimetry datasets of ERS-1 and TOPEX/Poseidon. Although these two datasets can be considered less compatible than those of planned future satellite missions, the results adequately illustrate the merits of a simultaneous solution technique.
In particular, the geographically correlated orbit error is partially observable from a dataset consisting of crossover differences between two sufficiently different altimetry datasets, while it is unobservable from the analysis of the altimetry data of either satellite individually. This error signal, which has a substantial gravity-induced component, can be exploited in simultaneous solutions for the two satellites in which the harmonic coefficients of the gravity field model are also estimated.
Abstract:
South Asians have a higher risk of type 2 diabetes mellitus (T2DM) and cardiovascular disease (CVD) than white Caucasians for a given BMI. Premature biological ageing, assessed by reduction in telomere length (TL), may be mediated by factors resulting from the altered metabolic profiles associated with obesity. We hypothesise that ethnicity and metabolic status are detrimental factors contributing to premature biological ageing. We therefore assessed TL in two age- and BMI-matched South Asian cohorts [T2DM (n = 142) versus non-T2DM (n = 76)] to determine the effects of BMI, gender, lipid profile and CVD profile on biological ageing. Genomic DNA was obtained from the UKADS cohort; biochemical and anthropometric data were collected, and TL was measured by quantitative real-time PCR. Our findings indicated a gender-specific effect, with reduced TL in T2DM men compared with non-T2DM men (P = 0.006). Additionally, in T2DM men, TL was inversely correlated with triglycerides (r = -0.419, P < 0.01) and total cholesterol (r = -0.443, P < 0.01). In summary, TL was reduced among South Asian T2DM men and correlated with triglycerides and total cholesterol. This study highlights enhanced biological ageing among South Asian T2DM men, which appears to be tracked by changes in lipids and BMI, suggesting that raised lipids and BMI may contribute directly to premature ageing.
Abstract:
Purpose - Measurements obtained from the right and left eyes of a subject are often correlated, whereas many statistical tests assume that observations in a sample are independent. Hence, data collected from both eyes cannot be combined without taking this correlation into account. Current practice is reviewed with reference to articles published in three optometry journals, viz., Ophthalmic and Physiological Optics (OPO), Optometry and Vision Science (OVS) and Clinical and Experimental Optometry (CEO), during the period 2009–2012. Recent findings - Of the 230 articles reviewed, 148/230 (64%) obtained data from one eye and 82/230 (36%) from both eyes. Among the 148 one-eye articles, the selection criteria included the right eye, the left eye, a randomly selected eye, the better eye, the worse or diseased eye, and the dominant eye. Of the 82 two-eye articles, the analysis used data from: (1) one eye only, rejecting data from the fellow eye, (2) both eyes separately, (3) both eyes, taking into account the correlation between eyes, or (4) both eyes, using one eye as the treated or diseased eye and the other as a control. In a proportion of studies, data from both eyes were combined without correction. Summary - It is suggested that: (1) investigators should consider whether it is advantageous to collect data from both eyes, (2) if one eye is studied and both are eligible, it should be chosen at random, and (3) two-eye data can be analysed by incorporating eyes as a ‘within subjects’ factor.
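One common way to quantify how strongly the two eyes agree, and how much information is lost by treating them as independent, is the intraclass correlation coefficient (ICC) combined with a design effect. The sketch below computes a one-way random-effects ICC(1) from paired right/left measurements and the effective number of independent eyes using the standard design effect 1 + (m - 1)ρ with cluster size m = 2. This is an illustration under those assumptions, not a procedure prescribed by the review; the function names are hypothetical.

```python
import numpy as np


def icc_two_eyes(right, left):
    """One-way random-effects ICC(1) for paired right/left-eye measurements."""
    data = np.column_stack([right, left]).astype(float)  # shape (n_subjects, 2)
    n, k = data.shape
    subject_means = data.mean(axis=1)
    grand_mean = data.mean()
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    msb = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((data - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def effective_sample_size(n_subjects, icc):
    """Effective number of independent eyes when both eyes of each subject are used.

    Uses the design effect 1 + (m - 1) * icc with cluster size m = 2."""
    return 2 * n_subjects / (1 + icc)
```

When the two eyes are perfectly correlated (ICC = 1), 2n eyes carry only as much information as n independent observations, which is why combining both eyes without correction overstates the sample size.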
Abstract:
We demonstrate a novel phase noise estimation scheme for CO-OFDM in which pilot subcarriers are deliberately correlated with the data subcarriers. This technique reduces the overhead by a factor of 2. © OSA 2014.
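For context, the sketch below shows conventional pilot-aided common phase error (CPE) estimation for one OFDM symbol, the kind of baseline against which pilot overhead is usually measured. It is not the scheme described in the abstract (which correlates pilots with data subcarriers); it simply shows the least-squares phase estimate angle(Σ r·conj(s)) from known pilots and its removal from all subcarriers. Function names are hypothetical.

```python
import numpy as np


def estimate_common_phase_error(rx_pilots, tx_pilots):
    """Least-squares estimate of the phase rotation shared by all subcarriers of one
    OFDM symbol, from the received pilots and the known transmitted pilots."""
    return np.angle(np.sum(rx_pilots * np.conj(tx_pilots)))


def compensate(rx_symbols, phase):
    """Remove the estimated common phase rotation from all subcarriers of the symbol."""
    return rx_symbols * np.exp(-1j * phase)
```

Each dedicated pilot subcarrier used this way carries no payload, which is the overhead that the abstract's scheme aims to reduce.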
Abstract:
Optimal design for parameter estimation in Gaussian process regression models with input-dependent noise is examined. The motivation stems from the area of computer experiments, where computationally demanding simulators are approximated using Gaussian process emulators to act as statistical surrogates. In the case of stochastic simulators, which produce a random output for a given set of model inputs, repeated evaluations are useful, supporting the use of replicate observations in the experimental design. The findings are also applicable to the wider context of experimental design for Gaussian process regression and kriging. Designs are proposed with the aim of minimising the variance of the Gaussian process parameter estimates. A heteroscedastic Gaussian process model is presented which allows for an experimental design technique based on an extension of Fisher information to heteroscedastic models. It is empirically shown that the error of the approximation of the parameter variance by the inverse of the Fisher information is reduced as the number of replicated points is increased. Through a series of simulation experiments on both synthetic data and a systems biology stochastic simulator, optimal designs with replicate observations are shown to outperform space-filling designs both with and without replicate observations. Guidance is provided on best practice for optimal experimental design for stochastic response models. © 2013 Elsevier Inc. All rights reserved.
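The design criterion above rests on the Fisher information of the Gaussian process covariance parameters. For a zero-mean Gaussian likelihood with covariance K(θ), the standard result is I_ij = ½ tr(K⁻¹ ∂K/∂θ_i K⁻¹ ∂K/∂θ_j). The sketch below evaluates it for a squared-exponential kernel with a known, input-dependent noise variance, so replicate inputs can be compared with non-replicated designs. The kernel, the noise function and all parameter values are hypothetical, and the paper's own extension of Fisher information to heteroscedastic models (which also estimates the noise) is not reproduced.

```python
import numpy as np


def fisher_information(x, signal_var, lengthscale, noise_fn):
    """Fisher information for (signal_var, lengthscale) of a zero-mean GP with a
    squared-exponential kernel and known input-dependent noise variance noise_fn(x).

    Uses I_ij = 0.5 * tr(K^-1 dK/dtheta_i K^-1 dK/dtheta_j)."""
    x = np.asarray(x, dtype=float)
    sq_dist = (x[:, None] - x[None, :]) ** 2
    base = np.exp(-sq_dist / (2 * lengthscale ** 2))
    K = signal_var * base + np.diag(noise_fn(x))  # replicates stay invertible via the noise term
    dK = [
        base,                                             # dK / d signal_var
        signal_var * base * sq_dist / lengthscale ** 3,   # dK / d lengthscale
    ]
    A = [np.linalg.solve(K, d) for d in dK]  # K^-1 dK/dtheta_i
    return np.array([[0.5 * np.trace(A[i] @ A[j]) for j in range(2)] for i in range(2)])


def noise(x):
    """Hypothetical heteroscedastic noise variance, increasing with the input."""
    return 0.05 + 0.2 * x
```

One common criterion is D-optimality: choose the design (including how many replicates to place where) that maximises the determinant of this matrix, which minimises the generalised variance of the parameter estimates.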
Abstract:
2000 Mathematics Subject Classification: 62J12, 62K15, 91B42, 62H99.
Abstract:
Flow cytometry analyzers have become trusted companions due to their ability to perform fast and accurate analyses of human blood. The aim of these analyses is to determine whether the blood shows abnormalities that have been correlated with serious disease states, such as infectious mononucleosis, leukemia, and various cancers. Though these analyzers provide important feedback, improving the accuracy of their results is always desirable, as evidenced by the misclassifications reported by some users of these devices. It is therefore advantageous to provide a pattern interpretation framework with better classification ability than is currently available. Toward this end, the purpose of this dissertation was to establish a feature extraction and pattern classification framework capable of improving the accuracy of detecting specific hematological abnormalities in flow cytometric blood data.
This involved extracting a unique and powerful set of shift-invariant statistical features from the multi-dimensional flow cytometry data and using these features as inputs to a pattern classification engine built on an artificial neural network (ANN). The contribution of this method consisted of developing a descriptor matrix that can be used to reliably assess whether a donor’s blood pattern exhibits a clinically abnormal level of variant lymphocytes, blood cells that are potentially indicative of disorders such as leukemia and infectious mononucleosis.
This study showed that the set of shift- and rotation-invariant statistical features extracted from the eigensystem of the flow cytometric data pattern performs better than other commonly used features for this type of disease detection, with an accuracy of 80.7%, a sensitivity of 72.3%, and a specificity of 89.2%. This performance represents a major improvement for this type of hematological classifier, which has historically been plagued by poor performance, with accuracies as low as 60% in some cases. Ultimately, this research shows that an improved feature space was developed that delivers better performance in detecting variant lymphocytes in human blood, providing significant utility for suspect-flagging algorithms that detect blood-related diseases.
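The shift- and rotation-invariance of eigensystem features can be illustrated with a small sketch. The eigenvalues of the sample covariance matrix of a point cloud (events × measured channels) do not change when the cloud is translated, because the covariance is computed about the mean, or rotated, because a rotation R maps the covariance C to RCRᵀ, which has the same eigenvalues. The dissertation's actual descriptor matrix is not specified in the abstract; `eigen_features` is a hypothetical name for this simplest case.

```python
import numpy as np


def eigen_features(points):
    """Shift- and rotation-invariant features of a multi-dimensional point cloud:
    the eigenvalues of its sample covariance matrix, sorted in descending order."""
    points = np.asarray(points, dtype=float)
    cov = np.cov(points, rowvar=False)  # rows are events; centring removes any shift
    eigvals = np.linalg.eigvalsh(cov)   # symmetric matrix -> real eigenvalues, ascending
    return eigvals[::-1]
```

Because these values are unchanged when a population drifts or rotates in channel space, they can be fed to a classifier such as an ANN without first aligning each sample.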
Abstract:
Ensemble stream modeling and data cleaning are sensor information processing systems that have different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events to eliminate uncorrelated noise and to choose the most likely model without overfitting, thus obtaining higher model confidence. Higher-quality streams can be realized by combining many short streams into an ensemble that has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for events such as a bush or natural forest fire, we take the burnt area (BA*), the sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing. There are two reasons for this: first, the histogram of fire activity is highly skewed; second, the measured sensor parameters are highly correlated. Since non-descriptive features do not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory, and conceptual knowledge is learned from sensor streams. The second step is feature induction, in which attributes are cross-validated against single or multiple target variables to minimize training error. We use the F-measure, which combines precision and recall, to assess the false alarm rate for fire events. The multi-target data-cleaning trees use the information purity of the target leaf nodes to learn higher-order features. A sensitive variance measure such as the F-test is performed at each node split to select the best attribute. The ensemble stream model approach performed better when complex features were used with a simpler tree classifier.
The ensemble framework for data cleaning, together with the enhancements to quantify the quality of fitness of sensors (30% spatial, 10% temporal, and 90% mobility reduction), led to the formation of streams for sensor-enabled applications. This further motivates the novelty of stream-quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.
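The evaluation metrics mentioned above can be written out from confusion-matrix counts. The F-measure (F1) is the harmonic mean of precision and recall; the false alarm rate is a separate quantity, the fraction of true non-events that were flagged. The sketch below computes all of them; the function name and any counts passed to it are hypothetical, not results from this research.

```python
def classification_scores(tp, fp, fn, tn):
    """Precision, recall, F1 (harmonic mean of precision and recall) and false alarm rate
    from confusion-matrix counts of detected vs. actual fire events."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    false_alarm_rate = fp / (fp + tn)  # share of non-events wrongly flagged as fires
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_alarm_rate": false_alarm_rate,
    }
```

Reporting F1 together with the false alarm rate makes the trade-off explicit on skewed data such as fire activity, where true negatives vastly outnumber events.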
Abstract:
Time series of varve properties and geochemistry were established from varved sediments of Lake Woserin (north-eastern Germany) covering the recent period AD 2010-1923 and the Mid-Holocene time window 6400-4950 varve years before present (vyr BP), using microfacies analyses, X-ray fluorescence scanning (µ-XRF), microscopic varve chronology and ¹⁴C dating. The microscopic varve chronology was compared to a macroscopic varve chronology for the same sediment interval. Calcite layer thickness during the recent period is significantly correlated with increases in local annual precipitation (r=0.46, p=0.03) and with reduced air pressure (r=-0.72, p<0.0001). Meteorologically consistent with enhanced precipitation at Lake Woserin, a composite 500 hPa anomaly map for years with calcite layer thickness >1 standard deviation depicts a negative air-pressure anomaly wave train centred over southern Europe, with north-eastern Germany at its northern frontal zone. Three centennial-scale intervals of thicker calcite layers around the Mid-Holocene periods 6200-5900, 5750-5400 and 5300-4950 vyr BP might reflect humid conditions favouring calcite precipitation through the transport of Ca²⁺ ions into Lake Woserin, synchronous with wetter conditions in Europe. Calcite layer thickness oscillations of about 88 and 208 years resemble the solar Gleissberg and Suess cycles, suggesting that the recorded hydroclimate changes in north-eastern Germany are modified by solar influences on synoptic-scale atmospheric circulation. However, parts of the periods of thicker calcite layers around 5750-5400 and 5200 vyr BP also coincide with enhanced human catchment activity at Lake Woserin. Therefore, calcite precipitation during these time windows might have been further favoured by anthropogenic deforestation mobilizing Ca²⁺ ions and/or by lake eutrophication.
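Periodicities such as the ~88- and ~208-year oscillations are typically found by spectral analysis of an annually resolved series. The abstract does not state which method was used, so the sketch below shows only the simplest version: a plain FFT periodogram that returns the period of the strongest non-zero-frequency peak. The 1450-year synthetic series (mirroring the 6400-4950 vyr BP window) and the function name are assumptions for illustration. A raw periodogram has coarse frequency resolution at long periods, so practical analyses usually add tapering and significance tests against red noise.

```python
import numpy as np


def dominant_period(series, dt=1.0):
    """Period of the strongest non-zero-frequency peak in a simple FFT periodogram.

    series: evenly spaced values (e.g. annual calcite layer thickness); dt: spacing in years."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()  # remove the mean so the zero-frequency bin does not dominate
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=dt)
    k = 1 + np.argmax(power[1:])  # skip the zero-frequency bin
    return 1.0 / freqs[k]
```

For a 1450-year record, frequency bins are spaced 1/1450 cycles per year apart, so an 88-year cycle is only resolved to within a few years.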
Abstract:
We compare a compilation of δ¹³C data from 220 sediment cores from the glacial Atlantic Ocean with three-dimensional ocean circulation simulations including a marine carbon cycle model. The carbon cycle model employs circulation fields derived from previous climate simulations. All sediment data have been thoroughly quality controlled, focusing on epibenthic foraminiferal species (such as Cibicidoides wuellerstorfi or Planulina ariminensis) to improve the comparability of model and sediment core carbon isotopes. The model captures the general δ¹³C pattern indicated by present-day water column data and Late Holocene sediment cores but underestimates intermediate and deep water values in the South Atlantic. The best agreement with glacial reconstructions is obtained for a model scenario with an altered freshwater balance in the Southern Ocean that mimics enhanced northward sea ice export and melting away from the zone of sea ice production. This results in a shoaled and weakened North Atlantic Deep Water flow and intensified Antarctic Bottom Water export, hence confirming previous reconstructions from paleoproxy records. Moreover, the modeled abyssal ocean is very cold and very saline, which is in line with other proxy data evidence.
Abstract:
Head motion during a Positron Emission Tomography (PET) brain scan can considerably degrade image quality. External motion-tracking devices have proven successful in minimizing this effect, but the associated time, maintenance, and workflow changes inhibit their widespread clinical use. List-mode PET acquisition allows retroactive analysis of coincidence events on any time scale throughout a scan, and therefore potentially offers a data-driven technique for motion detection and characterization. An algorithm was developed to parse list-mode data, divide the full acquisition into short scan intervals, and calculate the line-of-response (LOR) midpoint average for each interval. These LOR midpoint averages, known as “radioactivity centroids,” were presumed to represent the center of the radioactivity distribution in the scanner, and changes in this metric over time were expected to correspond to intra-scan motion.
Several scans of the 3D Hoffman brain phantom were acquired on a GE Discovery IQ PET/CT scanner to test the ability of the radioactivity centroid to indicate intra-scan motion. Each scan incrementally surveyed motion in a different degree of freedom (2 translational and 2 rotational). The radioactivity centroids calculated from these scans correlated linearly with phantom positions/orientations. Centroid measurements over 1-second intervals on scans with ~1 mCi of activity in the center of the field of view had standard deviations of 0.026 cm in the x- and y-dimensions and 0.020 cm in the z-dimension, demonstrating high precision and repeatability for this metric. Radioactivity centroids are thus shown to successfully represent discrete motions on the submillimeter scale. It is also shown that while the radioactivity centroid can precisely indicate the amount of motion during an acquisition, it fails to distinguish what type of motion occurred.
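The centroid computation described above (divide the acquisition into short intervals, then average the LOR midpoints in each) can be sketched as follows. Parsing a scanner's list-mode file is vendor-specific and omitted; the sketch assumes the events are already available as hypothetical arrays of timestamps and the 3D coordinates of the two detector endpoints of each LOR.

```python
import numpy as np


def radioactivity_centroids(times, end_a, end_b, interval):
    """Average line-of-response (LOR) midpoint for each time interval.

    times: (n,) event times in seconds; end_a, end_b: (n, 3) coordinates of the two
    detector endpoints of each LOR. Returns (interval_start_times, centroids)."""
    times = np.asarray(times, dtype=float)
    midpoints = 0.5 * (np.asarray(end_a, dtype=float) + np.asarray(end_b, dtype=float))
    t0 = times.min()
    bins = np.floor((times - t0) / interval).astype(int)  # interval index of each event
    starts, centroids = [], []
    for b in np.unique(bins):
        mask = bins == b
        centroids.append(midpoints[mask].mean(axis=0))  # centroid for this interval
        starts.append(t0 + b * interval)
    return np.array(starts), np.array(centroids)
```

Plotting each centroid coordinate against interval start time gives a motion trace; a step in the trace flags a movement, although, as noted above, the centroid alone does not reveal whether the motion was a translation or a rotation.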
Abstract:
Fluxes of bulk components, of organisms with carbonate and silicate skeletons, and the δ¹⁵N isotopic signal were investigated in a 1-year time-series sediment trap deployed at the pelagic NU mooring site (Namibia Upwelling, ca. 29°S, 13°E) in the central Benguela System. The flux of bulk components mostly shows bimodal seasonality, with major peaks in austral summer and winter and moderate to low export in austral fall and spring. The calcium carbonate fraction dominates the export of particulates throughout the year, followed by lithogenic material and biogenic opal. Planktonic foraminifera and coccolithophorids are major components of the carbonate fraction, while diatoms clearly dominate the biogenic opal fraction. The bulk δ¹⁵N isotopic composition of particulate matter is positively correlated with the total mass flux during summer and fall, but negatively correlated during winter and spring. Seasonal changes in the intensity of the main oceanographic processes affecting the NU site are inferred from variations in bulk component flux and in the flux and diversity patterns of individual species or groups of species. Influence from the Namaqua (Hondeklip) upwelling cell through offshore migration of chlorophyll filaments is stronger in summer, while the winter flux maximum seems to reflect mainly in situ production, with less influence from the coastal and shelf upwelling areas. On a yearly basis, the dominant microorganisms correspond well with the flora and fauna of tropical/subtropical waters, with a minor contribution of near-shore organisms. The simultaneous occurrence of species with different ecological affinities reflects the fact that the mooring site was located in a transitional region with large hydrographic variability over short time intervals.
Abstract:
The viscosity of ionic liquids (ILs) has been modeled as a function of temperature at atmospheric pressure using a new method based on the UNIFAC–VISCO method. This model extends the calculations previously reported by our group (see Zhao et al. J. Chem. Eng. Data 2016, 61, 2160–2169), which used 154 experimental viscosity data points of 25 ionic liquids to regress a set of binary interaction parameters and ion Vogel–Fulcher–Tammann (VFT) parameters. Discrepancies among experimental data for the same IL affect the quality of the correlation and thus the development of the predictive method. In this work, mathematical gnostics was used to analyze the experimental data from different sources and to recommend one set of reliable data for each IL. These recommended data (819 data points in total) for 70 ILs were correlated using this model to obtain an extended set of binary interaction parameters and ion VFT parameters, with a regression accuracy of 1.4%. In addition, 966 experimental viscosity data points for 11 binary mixtures of ILs were collected from the literature to establish this model. The binary data consist of 128 training data points, used to optimize the binary interaction parameters, and 838 test data points, used for comparison with the purely predicted values. The relative average absolute deviation (RAAD) is 2.9% for training and 3.9% for testing.
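Two quantities in this abstract can be written down explicitly: the temperature dependence captured by VFT parameters and the RAAD used to report accuracy. The sketch below uses one common VFT form, ln η = A + B/(T − T0), and the usual RAAD definition, the mean of |η_calc − η_exp| / η_exp (multiply by 100 for a percentage). How the UNIFAC–VISCO-based model combines ion VFT parameters with binary interaction parameters is not reproduced, and all parameter values here are hypothetical.

```python
import numpy as np


def vft_viscosity(T, A, B, T0):
    """Vogel-Fulcher-Tammann form ln(eta) = A + B / (T - T0), with T and T0 in kelvin.
    The units of eta are set by the choice of A."""
    return np.exp(A + B / (np.asarray(T, dtype=float) - T0))


def raad(calculated, experimental):
    """Relative average absolute deviation: mean of |calc - exp| / exp (as a fraction)."""
    calc = np.asarray(calculated, dtype=float)
    expt = np.asarray(experimental, dtype=float)
    return np.mean(np.abs(calc - expt) / expt)
```

With B > 0 the VFT form gives the steep drop in viscosity with rising temperature typical of ionic liquids. A RAAD of 0.029 corresponds to the 2.9% training deviation reported above.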
Abstract:
Purpose: Identify predictors of, and normative data for, quality of life (QOL) in a sample of Portuguese adults from the general population. Methods: A cross-sectional correlational study was undertaken with two hundred and fifty-five (N=255) individuals from the Portuguese general population (mean age 43 yrs, range 25-84 yrs; 148 females, 107 males). Participants completed the European Portuguese versions of the World Health Organization Quality of Life short-form instrument (WHOQOL-Bref) and of the Center for Epidemiologic Studies Depression Scale (CES-D). Demographic information was also collected. Results: Portuguese adults reported their QOL as good. The physical, psychological and environmental domains predicted 44% of the variance of QOL. The strongest predictor was the physical domain and the weakest was social relationships. Age, educational level, socioeconomic status and emotional status were significantly correlated with QOL and explained 25% of the variance of QOL. The strongest predictor of QOL was emotional status, followed by education and age. QOL differed significantly according to marital status, place of residence (mainland or islands), type of cohabitants, occupation and health. Conclusions: The sample of adults from the Portuguese general population reported high levels of QOL. The life domain that best explained QOL was the physical domain. Among other variables, emotional status best predicted QOL. Other variables also influenced overall QOL. These findings inform our understanding of QOL among adults from the Portuguese general population.
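Statements such as "predicted 44% of the variance of QOL" correspond to the coefficient of determination R² of a regression model. The abstract does not specify the exact regression procedure (for example, hierarchical or stepwise entry), so the sketch below only shows the generic ordinary least-squares version with an intercept; the function name and inputs are hypothetical.

```python
import numpy as np


def variance_explained(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X (with intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Here X would hold one column per predictor (e.g. domain scores) and y the overall QOL score; an R² of 0.44 means the fitted model accounts for 44% of the variance in QOL.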