973 resultados para Model Fitting
Resumo:
Alternans of cardiac action potential duration (APD) is a well-known arrhythmogenic mechanism which results from dynamical instabilities. The propensity to alternans is classically investigated by examining APD restitution and by deriving APD restitution slopes as predictive markers. However, experiments have shown that such markers are not always accurate for the prediction of alternans. Using a mathematical ventricular cell model known to exhibit unstable dynamics of both membrane potential and Ca2+ cycling, we demonstrate that an accurate marker can be obtained by pacing at cycle lengths (CLs) varying randomly around a basic CL (BCL) and by evaluating the transfer function between the time series of CLs and APDs using an autoregressive-moving-average (ARMA) model. The first pole of this transfer function corresponds to the eigenvalue (λalt) of the dominant eigenmode of the cardiac system, which predicts that alternans occurs when λalt≤−1. For different BCLs, control values of λalt were obtained using eigenmode analysis and compared to the first pole of the transfer function estimated using ARMA model fitting in simulations of random pacing protocols. In all versions of the cell model, this pole provided an accurate estimation of λalt. Furthermore, during slow ramp decreases of BCL or simulated drug application, this approach predicted the onset of alternans by extrapolating the time course of the estimated λalt. In conclusion, stochastic pacing and ARMA model identification represents a novel approach to predict alternans without making any assumptions about its ionic mechanisms. It should therefore be applicable experimentally for any type of myocardial cell.
Resumo:
High-throughput gene expression technologies such as microarrays have been utilized in a variety of scientific applications. Most of the work has been on assessing univariate associations between gene expression with clinical outcome (variable selection) or on developing classification procedures with gene expression data (supervised learning). We consider a hybrid variable selection/classification approach that is based on linear combinations of the gene expression profiles that maximize an accuracy measure summarized using the receiver operating characteristic curve. Under a specific probability model, this leads to consideration of linear discriminant functions. We incorporate an automated variable selection approach using LASSO. An equivalence between LASSO estimation with support vector machines allows for model fitting using standard software. We apply the proposed method to simulated data as well as data from a recently published prostate cancer study.
Resumo:
Traffic particle concentrations show considerable spatial variability within a metropolitan area. We consider latent variable semiparametric regression models for modeling the spatial and temporal variability of black carbon and elemental carbon concentrations in the greater Boston area. Measurements of these pollutants, which are markers of traffic particles, were obtained from several individual exposure studies conducted at specific household locations as well as 15 ambient monitoring sites in the city. The models allow for both flexible, nonlinear effects of covariates and for unexplained spatial and temporal variability in exposure. In addition, the different individual exposure studies recorded different surrogates of traffic particles, with some recording only outdoor concentrations of black or elemental carbon, some recording indoor concentrations of black carbon, and others recording both indoor and outdoor concentrations of black carbon. A joint model for outdoor and indoor exposure that specifies a spatially varying latent variable provides greater spatial coverage in the area of interest. We propose a penalised spline formation of the model that relates to generalised kringing of the latent traffic pollution variable and leads to a natural Bayesian Markov Chain Monte Carlo algorithm for model fitting. We propose methods that allow us to control the degress of freedom of the smoother in a Bayesian framework. Finally, we present results from an analysis that applies the model to data from summer and winter separately
Resumo:
Multiple outcomes data are commonly used to characterize treatment effects in medical research, for instance, multiple symptoms to characterize potential remission of a psychiatric disorder. Often either a global, i.e. symptom-invariant, treatment effect is evaluated. Such a treatment effect may over generalize the effect across the outcomes. On the other hand individual treatment effects, varying across all outcomes, are complicated to interpret, and their estimation may lose precision relative to a global summary. An effective compromise to summarize the treatment effect may be through patterns of the treatment effects, i.e. "differentiated effects." In this paper we propose a two-category model to differentiate treatment effects into two groups. A model fitting algorithm and simulation study are presented, and several methods are developed to analyze heterogeneity presenting in the treatment effects. The method is illustrated using an analysis of schizophrenia symptom data.
Resumo:
Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social sciences and biomedical studies. Standard analyses assume data of different respondents to be mutually independent, excluding application of the methods to familial and other designs in which participants are clustered. In this paper, we develop multilevel latent class model, in which subpopulation mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution. We apply the Expectation-Maximization (EM) algorithm for model fitting by maximum likelihood (ML). This approach works well, but is computationally intensive when either the number of classes or the cluster size is large. We propose a maximum pairwise likelihood (MPL) approach via a modified EM algorithm for this case. We also show that a simple latent class analysis, combined with robust standard errors, provides another consistent, robust, but less efficient inferential procedure. Simulation studies suggest that the three methods work well in finite samples, and that the MPL estimates often enjoy comparable precision as the ML estimates. We apply our methods to the analysis of comorbid symptoms in the Obsessive Compulsive Disorder study. Our models' random effects structure has more straightforward interpretation than those of competing methods, thus should usefully augment tools available for latent class analysis of multilevel data.
Resumo:
Many chronic human lung diseases have their origin in early childhood, yet most murine models used to study them utilize adult mice. An important component of the asthma phenotype is exaggerated airway responses, frequently modelled by methacholine (MCh) challenge. The present study was undertaken to characterize MCh responses in mice from 2 to 8 wk of age measuring absolute lung volume and volume-corrected respiratory mechanics as outcome variables. Female BALB/c mice aged 2, 3, 4, 6, and 8 wk were studied during cumulative intravenous MCh challenge. Following each MCh dose, absolute lung volume was measured plethysmographically at functional residual volume and during a slow inflation to 20-hPa transrespiratory pressure. Respiratory system impedance was measured continuously during the inflation maneuver and partitioned into airway and constant-phase parenchymal components by model fitting. Volume-corrected (specific) estimates of respiratory mechanics were calculated. Intravenous MCh challenge induced a predominantly airway response with no evidence of airway closure in any age group. No changes in functional residual volume were seen in mice of any age during the MCh challenge. The specific airway resistance MCh dose response curves did not show significant differences between the age groups. The results from the present study do not show systematic differences in MCh responsiveness in mice from 2 to 8 wk of age.
Resumo:
Background mortality is an essential component of any forest growth and yield model. Forecasts of mortality contribute largely to the variability and accuracy of model predictions at the tree, stand and forest level. In the present study, I implement and evaluate state-of-the-art techniques to increase the accuracy of individual tree mortality models, similar to those used in many of the current variants of the Forest Vegetation Simulator, using data from North Idaho and Montana. The first technique addresses methods to correct for bias induced by measurement error typically present in competition variables. The second implements survival regression and evaluates its performance against the traditional logistic regression approach. I selected the regression calibration (RC) algorithm as a good candidate for addressing the measurement error problem. Two logistic regression models for each species were fitted, one ignoring the measurement error, which is the “naïve” approach, and the other applying RC. The models fitted with RC outperformed the naïve models in terms of discrimination when the competition variable was found to be statistically significant. The effect of RC was more obvious where measurement error variance was large and for more shade-intolerant species. The process of model fitting and variable selection revealed that past emphasis on DBH as a predictor variable for mortality, while producing models with strong metrics of fit, may make models less generalizable. The evaluation of the error variance estimator developed by Stage and Wykoff (1998), and core to the implementation of RC, in different spatial patterns and diameter distributions, revealed that the Stage and Wykoff estimate notably overestimated the true variance in all simulated stands, but those that are clustered. Results show a systematic bias even when all the assumptions made by the authors are guaranteed. I argue that this is the result of the Poisson-based estimate ignoring the overlapping area of potential plots around a tree. Effects, especially in the application phase, of the variance estimate justify suggested future efforts of improving the accuracy of the variance estimate. The second technique implemented and evaluated is a survival regression model that accounts for the time dependent nature of variables, such as diameter and competition variables, and the interval-censored nature of data collected from remeasured plots. The performance of the model is compared with the traditional logistic regression model as a tool to predict individual tree mortality. Validation of both approaches shows that the survival regression approach discriminates better between dead and alive trees for all species. In conclusion, I showed that the proposed techniques do increase the accuracy of individual tree mortality models, and are a promising first step towards the next generation of background mortality models. I have also identified the next steps to undertake in order to advance mortality models further.
Resumo:
Biochemical maturation of the brain can be studied noninvasively by (1)H magnetic resonance spectroscopy (MRS) in human infants. Detailed time courses of cerebral tissue contents are known for the most abundant metabolites only, and whether or not premature birth affects biochemical maturation of the brain is disputed. Hence, the last trimester of gestation was observed in infants born prematurely, and their cerebral metabolite contents at birth and at expected term were compared with those of fullterm infants. Successful quantitative short-TE (1)H MRS was performed in three cerebral locations in 21 infants in 28 sessions (gestational age 32-43 weeks). The spectra were analyzed with linear combination model fitting, considerably extending the range of observable metabolites to include acetate, alanine, aspartate, cholines, creatines, gamma-aminobutyrate, glucose, glutamine, glutamate, glutathione, glycine, lactate, myo-inositol, macromolecular contributions, N-acetylaspartate, N-acetylaspartylglutamate, o-phosphoethanolamine, scyllo-inositol, taurine, and threonine. Significant effects of age and location were found for many metabolites, including the previously observed neuronal maturation reflected by an increase in N-acetylaspartate. Absolute brain metabolite content in premature infants at term was not considerably different from that in fullterm infants, indicating that prematurity did not affect biochemical brain maturation substantially in the studied population, which did not include infants of extremely low birthweight.
Resumo:
Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of the patients surviving more than 5 years after disease diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. In order to evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short vs. long-term survival) and a small subset of genes whose expression values can be used to predict survival time is selected. Following, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Due to a large heterogeneity observed within prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death with gene expression profile directly. We propose a Bayesian ensemble model for survival prediction which is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model which is flexible to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy using a latent variable formulation to model the covariate effects which decreases computation time for model fitting. Also, our proposed models provides a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares results obtained by Random Forest and Bayesian ensemble methods under the biological/clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.
Resumo:
The factorial validity of the SF-36 was evaluated using confirmatory factor analysis (CFA) methods, structural equation modeling (SEM), and multigroup structural equation modeling (MSEM). First, the measurement and structural model of the hypothesized SF-36 was explicated. Second, the model was tested for the validity of a second-order factorial structure, upon evidence of model misfit, determined the best-fitting model, and tested the validity of the best-fitting model on a second random sample from the same population. Third, the best-fitting model was tested for invariance of the factorial structure across race, age, and educational subgroups using MSEM.^ The findings support the second-order factorial structure of the SF-36 as proposed by Ware and Sherbourne (1992). However, the results suggest that: (a) Mental Health and Physical Health covary; (b) general mental health cross-loads onto Physical Health; (c) general health perception loads onto Mental Health instead of Physical Health; (d) many of the error terms are correlated; and (e) the physical function scale is not reliable across these two samples. This hierarchical factor pattern was replicated across both samples of health care workers, suggesting that the post hoc model fitting was not data specific. Subgroup analysis suggests that the physical function scale is not reliable across the "age" or "education" subgroups and that the general mental health scale path from Mental Health is not reliable across the "white/nonwhite" or "education" subgroups.^ The importance of this study is in the use of SEM and MSEM in evaluating sample data from the use of the SF-36. These methods are uniquely suited to the analysis of latent variable structures and are widely used in other fields. The use of latent variable models for self reported outcome measures has become widespread, and should now be applied to medical outcomes research. Invariance testing is superior to mean scores or summary scores when evaluating differences between groups. From a practical, as well as, psychometric perspective, it seems imperative that construct validity research related to the SF-36 establish whether this same hierarchical structure and invariance holds for other populations.^ This project is presented as three articles to be submitted for publication. ^
Resumo:
Cramér Rao Lower Bounds (CRLB) have become the standard for expression of uncertainties in quantitative MR spectroscopy. If properly interpreted as a lower threshold of the error associated with model fitting, and if the limits of its estimation are respected, CRLB are certainly a very valuable tool to give an idea of minimal uncertainties in magnetic resonance spectroscopy (MRS), although other sources of error may be larger. Unfortunately, it has also become standard practice to use relative CRLB expressed as a percentage of the presently estimated area or concentration value as unsupervised exclusion criterion for bad quality spectra. It is shown that such quality filtering with widely used threshold levels of 20% to 50% CRLB readily causes bias in the estimated mean concentrations of cohort data, leading to wrong or missed statistical findings-and if applied rigorously-to the failure of using MRS as a clinical instrument to diagnose disease characterized by low levels of metabolites. Instead, absolute CRLB in comparison to those of the normal group or CRLB in relation to normal metabolite levels may be more useful as quality criteria. Magn Reson Med, 2015. © 2015 Wiley Periodicals, Inc.
Resumo:
Effects of conspecific neighbours on survival and growth of trees have been found to be related to species abundance. Both positive and negative relationships may explain observed abundance patterns. Surprisingly, it is rarely tested whether such relationships could be biased or even spurious due to transforming neighbourhood variables or influences of spatial aggregation, distance decay of neighbour effects and standardization of effect sizes. To investigate potential biases, communities of 20 identical species were simulated with log-series abundances but without species-specific interactions. No relationship of conspecific neighbour effects on survival or growth with species abundance was expected. Survival and growth of individuals was simulated in random and aggregated spatial patterns using no, linear, or squared distance decay of neighbour effects. Regression coefficients of statistical neighbourhood models were unbiased and unrelated to species abundance. However, variation in the number of conspecific neighbours was positively or negatively related to species abundance depending on transformations of neighbourhood variables, spatial pattern and distance decay. Consequently, effect sizes and standardized regression coefficients, often used in model fitting across large numbers of species, were also positively or negatively related to species abundance depending on transformation of neighbourhood variables, spatial pattern and distance decay. Tests using randomized tree positions and identities provide the best benchmarks by which to critically evaluate relationships of effect sizes or standardized regression coefficients with tree species abundance. This will better guard against potential misinterpretations.
Resumo:
This paper proposed an automated three-dimensional (3D) lumbar intervertebral disc (IVD) segmentation strategy from Magnetic Resonance Imaging (MRI) data. Starting from two user supplied landmarks, the geometrical parameters of all lumbar vertebral bodies and intervertebral discs are automatically extracted from a mid-sagittal slice using a graphical model based template matching approach. Based on the estimated two-dimensional (2D) geometrical parameters, a 3D variable-radius soft tube model of the lumbar spine column is built by model fitting to the 3D data volume. Taking the geometrical information from the 3D lumbar spine column as constraints and segmentation initialization, the disc segmentation is achieved by a multi-kernel diffeomorphic registration between a 3D template of the disc and the observed MRI data. Experiments on 15 patient data sets showed the robustness and the accuracy of the proposed algorithm.
Resumo:
We present a framework for fitting multiple random walks to animal movement paths consisting of ordered sets of step lengths and turning angles. Each step and turn is assigned to one of a number of random walks, each characteristic of a different behavioral state. Behavioral state assignments may be inferred purely from movement data or may include the habitat type in which the animals are located. Switching between different behavioral states may be modeled explicitly using a state transition matrix estimated directly from data, or switching probabilities may take into account the proximity of animals to landscape features. Model fitting is undertaken within a Bayesian framework using the WinBUGS software. These methods allow for identification of different movement states using several properties of observed paths and lead naturally to the formulation of movement models. Analysis of relocation data from elk released in east-central Ontario, Canada, suggests a biphasic movement behavior: elk are either in an "encamped" state in which step lengths are small and turning angles are high, or in an "exploratory" state, in which daily step lengths are several kilometers and turning angles are small. Animals encamp in open habitat (agricultural fields and opened forest), but the exploratory state is not associated with any particular habitat type.
New methods for quantification and analysis of quantitative real-time polymerase chain reaction data
Resumo:
Quantitative real-time polymerase chain reaction (qPCR) is a sensitive gene quantitation method that has been widely used in the biological and biomedical fields. The currently used methods for PCR data analysis, including the threshold cycle (CT) method, linear and non-linear model fitting methods, all require subtracting background fluorescence. However, the removal of background fluorescence is usually inaccurate, and therefore can distort results. Here, we propose a new method, the taking-difference linear regression method, to overcome this limitation. Briefly, for each two consecutive PCR cycles, we subtracted the fluorescence in the former cycle from that in the later cycle, transforming the n cycle raw data into n-1 cycle data. Then linear regression was applied to the natural logarithm of the transformed data. Finally, amplification efficiencies and the initial DNA molecular numbers were calculated for each PCR run. To evaluate this new method, we compared it in terms of accuracy and precision with the original linear regression method with three background corrections, being the mean of cycles 1-3, the mean of cycles 3-7, and the minimum. Three criteria, including threshold identification, max R2, and max slope, were employed to search for target data points. Considering that PCR data are time series data, we also applied linear mixed models. Collectively, when the threshold identification criterion was applied and when the linear mixed model was adopted, the taking-difference linear regression method was superior as it gave an accurate estimation of initial DNA amount and a reasonable estimation of PCR amplification efficiencies. When the criteria of max R2 and max slope were used, the original linear regression method gave an accurate estimation of initial DNA amount. Overall, the taking-difference linear regression method avoids the error in subtracting an unknown background and thus it is theoretically more accurate and reliable. This method is easy to perform and the taking-difference strategy can be extended to all current methods for qPCR data analysis.^