942 resultados para Data Driven Modeling


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Genomic alterations have been linked to the development and progression of cancer. The technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array-CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for algorithms that can identify gains and losses in the number of copies based on statistical considerations, rather than merely detect trends in the data. We adopt a Bayesian approach, relying on the hidden Markov model to account for the inherent dependence in the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressors) are identified using posterior probabilities. Global trends such as extended regions of altered copy number are detected. Since the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic adenocarcinoma, glioblastoma multiforme and breast cancer are analyzed, and comparisons are made with some widely-used algorithms to illustrate the reliability and success of the technique.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Multiple outcomes data are commonly used to characterize treatment effects in medical research, for instance, multiple symptoms to characterize potential remission of a psychiatric disorder. Often either a global, i.e. symptom-invariant, treatment effect is evaluated. Such a treatment effect may over generalize the effect across the outcomes. On the other hand individual treatment effects, varying across all outcomes, are complicated to interpret, and their estimation may lose precision relative to a global summary. An effective compromise to summarize the treatment effect may be through patterns of the treatment effects, i.e. "differentiated effects." In this paper we propose a two-category model to differentiate treatment effects into two groups. A model fitting algorithm and simulation study are presented, and several methods are developed to analyze heterogeneity presenting in the treatment effects. The method is illustrated using an analysis of schizophrenia symptom data.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Functional neuroimaging techniques enable investigations into the neural basis of human cognition, emotions, and behaviors. In practice, applications of functional magnetic resonance imaging (fMRI) have provided novel insights into the neuropathophysiology of major psychiatric,neurological, and substance abuse disorders, as well as into the neural responses to their treatments. Modern activation studies often compare localized task-induced changes in brain activity between experimental groups. One may also extend voxel-level analyses by simultaneously considering the ensemble of voxels constituting an anatomically defined region of interest (ROI) or by considering means or quantiles of the ROI. In this work we present a Bayesian extension of voxel-level analyses that offers several notable benefits. First, it combines whole-brain voxel-by-voxel modeling and ROI analyses within a unified framework. Secondly, an unstructured variance/covariance for regional mean parameters allows for the study of inter-regional functional connectivity, provided enough subjects are available to allow for accurate estimation. Finally, an exchangeable correlation structure within regions allows for the consideration of intra-regional functional connectivity. We perform estimation for our model using Markov Chain Monte Carlo (MCMC) techniques implemented via Gibbs sampling which, despite the high throughput nature of the data, can be executed quickly (less than 30 minutes). We apply our Bayesian hierarchical model to two novel fMRI data sets: one considering inhibitory control in cocaine-dependent men and the second considering verbal memory in subjects at high risk for Alzheimer’s disease. The unifying hierarchical model presented in this manuscript is shown to enhance the interpretation content of these data sets.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Many seemingly disparate approaches for marginal modeling have been developed in recent years. We demonstrate that many current approaches for marginal modeling of correlated binary outcomes produce likelihoods that are equivalent to the proposed copula-based models herein. These general copula models of underlying latent threshold random variables yield likelihood based models for marginal fixed effects estimation and interpretation in the analysis of correlated binary data. Moreover, we propose a nomenclature and set of model relationships that substantially elucidates the complex area of marginalized models for binary data. A diverse collection of didactic mathematical and numerical examples are given to illustrate concepts.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper proposes Poisson log-linear multilevel models to investigate population variability in sleep state transition rates. We specifically propose a Bayesian Poisson regression model that is more flexible, scalable to larger studies, and easily fit than other attempts in the literature. We further use hierarchical random effects to account for pairings of individuals and repeated measures within those individuals, as comparing diseased to non-diseased subjects while minimizing bias is of epidemiologic importance. We estimate essentially non-parametric piecewise constant hazards and smooth them, and allow for time varying covariates and segment of the night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming piecewise constant hazards. This relationship allows us to synthesize two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Riparian zones are dynamic, transitional ecosystems between aquatic and terrestrial ecosystems with well defined vegetation and soil characteristics. Development of an all-encompassing definition for riparian ecotones, because of their high variability, is challenging. However, there are two primary factors that all riparian ecotones are dependent on: the watercourse and its associated floodplain. Previous approaches to riparian boundary delineation have utilized fixed width buffers, but this methodology has proven to be inadequate as it only takes the watercourse into consideration and ignores critical geomorphology, associated vegetation and soil characteristics. Our approach offers advantages over other previously used methods by utilizing: the geospatial modeling capabilities of ArcMap GIS; a better sampling technique along the water course that can distinguish the 50-year flood plain, which is the optimal hydrologic descriptor of riparian ecotones; the Soil Survey Database (SSURGO) and National Wetland Inventory (NWI) databases to distinguish contiguous areas beyond the 50-year plain; and land use/cover characteristics associated with the delineated riparian zones. The model utilizes spatial data readily available from Federal and State agencies and geospatial clearinghouses. An accuracy assessment was performed to assess the impact of varying the 50-year flood height, changing the DEM spatial resolution (1, 3, 5 and 10m), and positional inaccuracies with the National Hydrography Dataset (NHD) streams layer on the boundary placement of the delineated variable width riparian ecotones area. The result of this study is a robust and automated GIS based model attached to ESRI ArcMap software to delineate and classify variable-width riparian ecotones.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Over the past several decades, it has become apparent that anthropogenic activities have resulted in the large-scale enhancement of the levels of many trace gases throughout the troposphere. More recently, attention has been given to the transport pathway taken by these emissions as they are dispersed throughout the atmosphere. The transport pathway determines the physical characteristics of emissions plumes and therefore plays an important role in the chemical transformations that can occur downwind of source regions. For example, the production of ozone (O3) is strongly dependent upon the transport its precursors undergo. O3 can initially be formed within air masses while still over polluted source regions. These polluted air masses can experience continued O3 production or O3 destruction downwind, depending on the air mass's chemical and transport characteristics. At present, however, there are a number of uncertainties in the relationships between transport and O3 production in the North Atlantic lower free troposphere. The first phase of the study presented here used measurements made at the Pico Mountain observatory and model simulations to determine transport pathways for US emissions to the observatory. The Pico Mountain observatory was established in the summer of 2001 in order to address the need to understand the relationships between transport and O3 production. Measurements from the observatory were analyzed in conjunction with model simulations from the Lagrangian particle dispersion model (LPDM), FLEX-PART, in order to determine the transport pathway for events observed at the Pico Mountain observatory during July 2003. A total of 16 events were observed, 4 of which were analyzed in detail. The transport time for these 16 events varied from 4.5 to 7 days, while the transport altitudes over the ocean ranged from 2-8 km, but were typically less than 3 km. In three of the case studies, eastward advection and transport in a weak warm conveyor belt (WCB) airflow was responsible for the export of North American emissions into the FT, while transport in the FT was governed by easterly winds driven by the Azores/Bermuda High (ABH) and transient northerly lows. In the fourth case study, North American emissions were lofted to 6-8 km in a WCB before being entrained in the same cyclone's dry airstream and transported down to the observatory. The results of this study show that the lower marine FT may provide an important transport environment where O3 production may continue, in contrast to transport in the marine boundary layer, where O3 destruction is believed to dominate. The second phase of the study presented here focused on improving the analysis methods that are available with LPDMs. While LPDMs are popular and useful for the analysis of atmospheric trace gas measurements, identifying the transport pathway of emissions from their source to a receptor (the Pico Mountain observatory in our case) using the standard gridded model output, particularly during complex meteorological scenarios can be difficult can be difficult or impossible. The transport study in phase 1 was limited to only 1 month out of more than 3 years of available data and included only 4 case studies out of the 16 events specifically due to this confounding factor. The second phase of this study addressed this difficulty by presenting a method to clearly and easily identify the pathway taken by only those emissions that arrive at a receptor at a particular time, by combining the standard gridded output from forward (i.e., concentrations) and backward (i.e., residence time) LPDM simulations, greatly simplifying similar analyses. The ability of the method to successfully determine the source-to-receptor pathway, restoring this Lagrangian information that is lost when the data are gridded, is proven by comparing the pathway determined from this method with the particle trajectories from both the forward and backward models. A sample analysis is also presented, demonstrating that this method is more accurate and easier to use than existing methods using standard LPDM products. Finally, we discuss potential future work that would be possible by combining the backward LPDM simulation with gridded data from other sources (e.g., chemical transport models) to obtain a Lagrangian sampling of the air that will eventually arrive at a receptor.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Principal Component Analysis (PCA) is a popular method for dimension reduction that can be used in many fields including data compression, image processing, exploratory data analysis, etc. However, traditional PCA method has several drawbacks, since the traditional PCA method is not efficient for dealing with high dimensional data and cannot be effectively applied to compute accurate enough principal components when handling relatively large portion of missing data. In this report, we propose to use EM-PCA method for dimension reduction of power system measurement with missing data, and provide a comparative study of traditional PCA and EM-PCA methods. Our extensive experimental results show that EM-PCA method is more effective and more accurate for dimension reduction of power system measurement data than traditional PCA method when dealing with large portion of missing data set.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Volcán Pacaya is one of three currently active volcanoes in Guatemala. Volcanic activity originates from the local tectonic subduction of the Cocos plate beneath the Caribbean plate along the Pacific Guatemalan coast. Pacaya is characterized by generally strombolian type activity with occasional larger vulcanian type eruptions approximately every ten years. One particularly large eruption occurred on May 27, 2010. Using GPS data collected for approximately 8 years before this eruption and data from an additional three years of collection afterwards, surface movement covering the period of the eruption can be measured and used as a tool to help understand activity at the volcano. Initial positions were obtained from raw data using the Automatic Precise Positioning Service provided by the NASA Jet Propulsion Laboratory. Forward modeling of observed 3-D displacements for three time periods (before, covering and after the May 2010 eruption) revealed that a plausible source for deformation is related to a vertical dike or planar surface trending NNW-SSE through the cone. For three distinct time periods the best fitting models describe deformation of the volcano: 0.45 right lateral movement and 0.55 m tensile opening along the dike mentioned above from October 2001 through January 2009 (pre-eruption); 0.55 m left lateral slip along the dike mentioned above for the period from January 2009 and January 2011 (covering the eruption); -0.025 m dip slip along the dike for the period from January 2011 through March 2013 (post-eruption). In all bestfit models the dike is oriented with a 75° westward dip. These data have respective RMS misfit values of 5.49 cm, 12.38 cm and 6.90 cm for each modeled period. During the time period that includes the eruption the volcano most likely experienced a combination of slip and inflation below the edifice which created a large scar at the surface down the northern flank of the volcano. All models that a dipping dike may be experiencing a combination of inflation and oblique slip below the edifice which augments the possibility of a westward collapse in the future.