945 resultados para Data modeling
Resumo:
Genomic alterations have been linked to the development and progression of cancer. The technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array-CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for algorithms that can identify gains and losses in the number of copies based on statistical considerations, rather than merely detect trends in the data. We adopt a Bayesian approach, relying on the hidden Markov model to account for the inherent dependence in the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressors) are identified using posterior probabilities. Global trends such as extended regions of altered copy number are detected. Since the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic adenocarcinoma, glioblastoma multiforme and breast cancer are analyzed, and comparisons are made with some widely-used algorithms to illustrate the reliability and success of the technique.
Resumo:
Multiple outcomes data are commonly used to characterize treatment effects in medical research, for instance, multiple symptoms to characterize potential remission of a psychiatric disorder. Often either a global, i.e. symptom-invariant, treatment effect is evaluated. Such a treatment effect may over generalize the effect across the outcomes. On the other hand individual treatment effects, varying across all outcomes, are complicated to interpret, and their estimation may lose precision relative to a global summary. An effective compromise to summarize the treatment effect may be through patterns of the treatment effects, i.e. "differentiated effects." In this paper we propose a two-category model to differentiate treatment effects into two groups. A model fitting algorithm and simulation study are presented, and several methods are developed to analyze heterogeneity presenting in the treatment effects. The method is illustrated using an analysis of schizophrenia symptom data.
Resumo:
We propose a novel class of models for functional data exhibiting skewness or other shape characteristics that vary with spatial or temporal location. We use copulas so that the marginal distributions and the dependence structure can be modeled independently. Dependence is modeled with a Gaussian or t-copula, so that there is an underlying latent Gaussian process. We model the marginal distributions using the skew t family. The mean, variance, and shape parameters are modeled nonparametrically as functions of location. A computationally tractable inferential framework for estimating heterogeneous asymmetric or heavy-tailed marginal distributions is introduced. This framework provides a new set of tools for increasingly complex data collected in medical and public health studies. Our methods were motivated by and are illustrated with a state-of-the-art study of neuronal tracts in multiple sclerosis patients and healthy controls. Using the tools we have developed, we were able to find those locations along the tract most affected by the disease. However, our methods are general and highly relevant to many functional data sets. In addition to the application to one-dimensional tract profiles illustrated here, higher-dimensional extensions of the methodology could have direct applications to other biological data including functional and structural MRI.
Resumo:
Functional neuroimaging techniques enable investigations into the neural basis of human cognition, emotions, and behaviors. In practice, applications of functional magnetic resonance imaging (fMRI) have provided novel insights into the neuropathophysiology of major psychiatric,neurological, and substance abuse disorders, as well as into the neural responses to their treatments. Modern activation studies often compare localized task-induced changes in brain activity between experimental groups. One may also extend voxel-level analyses by simultaneously considering the ensemble of voxels constituting an anatomically defined region of interest (ROI) or by considering means or quantiles of the ROI. In this work we present a Bayesian extension of voxel-level analyses that offers several notable benefits. First, it combines whole-brain voxel-by-voxel modeling and ROI analyses within a unified framework. Secondly, an unstructured variance/covariance for regional mean parameters allows for the study of inter-regional functional connectivity, provided enough subjects are available to allow for accurate estimation. Finally, an exchangeable correlation structure within regions allows for the consideration of intra-regional functional connectivity. We perform estimation for our model using Markov Chain Monte Carlo (MCMC) techniques implemented via Gibbs sampling which, despite the high throughput nature of the data, can be executed quickly (less than 30 minutes). We apply our Bayesian hierarchical model to two novel fMRI data sets: one considering inhibitory control in cocaine-dependent men and the second considering verbal memory in subjects at high risk for Alzheimer’s disease. The unifying hierarchical model presented in this manuscript is shown to enhance the interpretation content of these data sets.
Resumo:
Many seemingly disparate approaches for marginal modeling have been developed in recent years. We demonstrate that many current approaches for marginal modeling of correlated binary outcomes produce likelihoods that are equivalent to the proposed copula-based models herein. These general copula models of underlying latent threshold random variables yield likelihood based models for marginal fixed effects estimation and interpretation in the analysis of correlated binary data. Moreover, we propose a nomenclature and set of model relationships that substantially elucidates the complex area of marginalized models for binary data. A diverse collection of didactic mathematical and numerical examples are given to illustrate concepts.
Resumo:
This paper proposes Poisson log-linear multilevel models to investigate population variability in sleep state transition rates. We specifically propose a Bayesian Poisson regression model that is more flexible, scalable to larger studies, and easily fit than other attempts in the literature. We further use hierarchical random effects to account for pairings of individuals and repeated measures within those individuals, as comparing diseased to non-diseased subjects while minimizing bias is of epidemiologic importance. We estimate essentially non-parametric piecewise constant hazards and smooth them, and allow for time varying covariates and segment of the night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming piecewise constant hazards. This relationship allows us to synthesize two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed.
Resumo:
Riparian zones are dynamic, transitional ecosystems between aquatic and terrestrial ecosystems with well defined vegetation and soil characteristics. Development of an all-encompassing definition for riparian ecotones, because of their high variability, is challenging. However, there are two primary factors that all riparian ecotones are dependent on: the watercourse and its associated floodplain. Previous approaches to riparian boundary delineation have utilized fixed width buffers, but this methodology has proven to be inadequate as it only takes the watercourse into consideration and ignores critical geomorphology, associated vegetation and soil characteristics. Our approach offers advantages over other previously used methods by utilizing: the geospatial modeling capabilities of ArcMap GIS; a better sampling technique along the water course that can distinguish the 50-year flood plain, which is the optimal hydrologic descriptor of riparian ecotones; the Soil Survey Database (SSURGO) and National Wetland Inventory (NWI) databases to distinguish contiguous areas beyond the 50-year plain; and land use/cover characteristics associated with the delineated riparian zones. The model utilizes spatial data readily available from Federal and State agencies and geospatial clearinghouses. An accuracy assessment was performed to assess the impact of varying the 50-year flood height, changing the DEM spatial resolution (1, 3, 5 and 10m), and positional inaccuracies with the National Hydrography Dataset (NHD) streams layer on the boundary placement of the delineated variable width riparian ecotones area. The result of this study is a robust and automated GIS based model attached to ESRI ArcMap software to delineate and classify variable-width riparian ecotones.
DIMENSION REDUCTION FOR POWER SYSTEM MODELING USING PCA METHODS CONSIDERING INCOMPLETE DATA READINGS
Resumo:
Principal Component Analysis (PCA) is a popular method for dimension reduction that can be used in many fields including data compression, image processing, exploratory data analysis, etc. However, traditional PCA method has several drawbacks, since the traditional PCA method is not efficient for dealing with high dimensional data and cannot be effectively applied to compute accurate enough principal components when handling relatively large portion of missing data. In this report, we propose to use EM-PCA method for dimension reduction of power system measurement with missing data, and provide a comparative study of traditional PCA and EM-PCA methods. Our extensive experimental results show that EM-PCA method is more effective and more accurate for dimension reduction of power system measurement data than traditional PCA method when dealing with large portion of missing data set.
Resumo:
Volcán Pacaya is one of three currently active volcanoes in Guatemala. Volcanic activity originates from the local tectonic subduction of the Cocos plate beneath the Caribbean plate along the Pacific Guatemalan coast. Pacaya is characterized by generally strombolian type activity with occasional larger vulcanian type eruptions approximately every ten years. One particularly large eruption occurred on May 27, 2010. Using GPS data collected for approximately 8 years before this eruption and data from an additional three years of collection afterwards, surface movement covering the period of the eruption can be measured and used as a tool to help understand activity at the volcano. Initial positions were obtained from raw data using the Automatic Precise Positioning Service provided by the NASA Jet Propulsion Laboratory. Forward modeling of observed 3-D displacements for three time periods (before, covering and after the May 2010 eruption) revealed that a plausible source for deformation is related to a vertical dike or planar surface trending NNW-SSE through the cone. For three distinct time periods the best fitting models describe deformation of the volcano: 0.45 right lateral movement and 0.55 m tensile opening along the dike mentioned above from October 2001 through January 2009 (pre-eruption); 0.55 m left lateral slip along the dike mentioned above for the period from January 2009 and January 2011 (covering the eruption); -0.025 m dip slip along the dike for the period from January 2011 through March 2013 (post-eruption). In all bestfit models the dike is oriented with a 75° westward dip. These data have respective RMS misfit values of 5.49 cm, 12.38 cm and 6.90 cm for each modeled period. During the time period that includes the eruption the volcano most likely experienced a combination of slip and inflation below the edifice which created a large scar at the surface down the northern flank of the volcano. All models that a dipping dike may be experiencing a combination of inflation and oblique slip below the edifice which augments the possibility of a westward collapse in the future.
Resumo:
When observers are presented with two visual targets appearing in the same position in close temporal proximity, a marked reduction in detection performance of the second target has often been reported, the so-called attentional blink phenomenon. Several studies found a similar decrement of P300 amplitudes during the attentional blink period as observed with detection performances of the second target. However, whether the parallel courses of second target performances and corresponding P300 amplitudes resulted from the same underlying mechanisms remained unclear. The aim of our study was therefore to investigate whether the mechanisms underlying the AB can be assessed by fixed-links modeling and whether this kind of assessment would reveal the same or at least related processes in the behavioral and electrophysiological data. On both levels of observation three highly similar processes could be identified: an increasing, a decreasing and a u-shaped trend. Corresponding processes from the behavioral and electrophysiological data were substantially correlated, with the two u-shaped trends showing the strongest association with each other. Our results provide evidence for the assumption that the same mechanisms underlie attentional blink task performance at the electrophysiological and behavioral levels as assessed by fixed-links models.
Volcanic forcing for climate modeling: a new microphysics-based data set covering years 1600–present
Resumo:
As the understanding and representation of the impacts of volcanic eruptions on climate have improved in the last decades, uncertainties in the stratospheric aerosol forcing from large eruptions are now linked not only to visible optical depth estimates on a global scale but also to details on the size, latitude and altitude distributions of the stratospheric aerosols. Based on our understanding of these uncertainties, we propose a new model-based approach to generating a volcanic forcing for general circulation model (GCM) and chemistry–climate model (CCM) simulations. This new volcanic forcing, covering the 1600–present period, uses an aerosol microphysical model to provide a realistic, physically consistent treatment of the stratospheric sulfate aerosols. Twenty-six eruptions were modeled individually using the latest available ice cores aerosol mass estimates and historical data on the latitude and date of eruptions. The evolution of aerosol spatial and size distribution after the sulfur dioxide discharge are hence characterized for each volcanic eruption. Large variations are seen in hemispheric partitioning and size distributions in relation to location/date of eruptions and injected SO2 masses. Results for recent eruptions show reasonable agreement with observations. By providing these new estimates of spatial distributions of shortwave and long-wave radiative perturbations, this volcanic forcing may help to better constrain the climate model responses to volcanic eruptions in the 1600–present period. The final data set consists of 3-D values (with constant longitude) of spectrally resolved extinction coefficients, single scattering albedos and asymmetry factors calculated for different wavelength bands upon request. Surface area densities for heterogeneous chemistry are also provided.