901 resultados para QUANTILE REGRESSION


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge arise in defining an algorithm with low communication, theoretical guarantees and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator ({\em message}) algorithm for solving these issues. The algorithm applies feature selection in parallel for each subset using regularized regression or Bayesian variable selection method, calculates the `median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves very minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments to show excellent performance in feature selection, estimation, prediction, and computation time relative to usual competitors.

While sample space partitioning is useful in handling datasets with large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In the thesis, I propose a new embarrassingly parallel framework named {\em DECO} for distributed variable selection and parameter estimation. In {\em DECO}, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does NOT depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.

For datasets with both large sample sizes and high dimensionality, I propose a new "divided-and-conquer" framework {\em DEME} (DECO-message) by leveraging both the {\em DECO} and the {\em message} algorithm. The new framework first partitions the dataset in the sample space into row cubes using {\em message} and then partition the feature space of the cubes using {\em DECO}. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each with a feasible size that can be stored and fitted in a computer in parallel. The results are then synthezied via the {\em DECO} and {\em message} algorithm in a reverse order to produce the final output. The whole framework is extremely scalable.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

For hepatic schistosomiasis the egg-induced granulomatous response and the development of extensive fibrosis are the main pathologies. We used a Schistosoma japonicum-infected mouse model to characterise the multi-cellular pathways associated with the recovery from hepatic fibrosis following clearance of the infection with the anti-schistosomal drug, praziquantel. In the recovering liver splenomegaly, granuloma density and liver fibrosis were all reduced. Inflammatory cell infiltration into the liver was evident, and the numbers of neutrophils, eosinophils and macrophages were significantly decreased. Transcriptomic analysis revealed the up-regulation of fatty acid metabolism genes and the identification of Peroxisome proliferator activated receptor alpha as the upstream regulator of liver recovery. The aryl hydrocarbon receptor signalling pathway which regulates xenobiotic metabolism was also differentially up-regulated. These findings provide a better understanding of the mechanisms associated with the regression of hepatic schistosomiasis.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper discusses areas for future research opportunities by addressing accounting issues faced by management accountants practicing in hospitality organizations. Specifically, the article focuses on the use of the uniform system of accounts by operating properties, the usefulness of allocating support costs to operated departments, extending our understanding of operating costs and performance measurement systems and the certification of practicing accountants.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-08

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We evaluate the integration of 3D preoperative computed tomography angiography of the coronary arteries with intraoperative 2D X-ray angiographies by a recently proposed novel registration-by-regression method. The method relates image features of 2D projection images to the transformation parameters of the 3D image. We compared different sets of features and studied the influence of preprocessing the training set. For the registration evaluation, a gold standard was developed from eight X-ray angiography sequences from six different patients. The alignment quality was measured using the 3D mean target registration error (mTRE). The registration-by-regression method achieved moderate accuracy (median mTRE of 15 mm) on real images. It does therefore not provide yet a complete solution to the 3D–2D registration problem but it could be used as an initialisation method to eliminate the need for manual initialisation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Current practice for analysing functional neuroimaging data is to average the brain signals recorded at multiple sensors or channels on the scalp over time across hundreds of trials or replicates to eliminate noise and enhance the underlying signal of interest. These studies recording brain signals non-invasively using functional neuroimaging techniques such as electroencephalography (EEG) and magnetoencephalography (MEG) generate complex, high dimensional and noisy data for many subjects at a number of replicates. Single replicate (or single trial) analysis of neuroimaging data have gained focus as they are advantageous to study the features of the signals at each replicate without averaging out important features in the data that the current methods employ. The research here is conducted to systematically develop flexible regression mixed models for single trial analysis of specific brain activities using examples from EEG and MEG to illustrate the models. This thesis follows three specific themes: i) artefact correction to estimate the `brain' signal which is of interest, ii) characterisation of the signals to reduce their dimensions, and iii) model fitting for single trials after accounting for variations between subjects and within subjects (between replicates). The models are developed to establish evidence of two specific neurological phenomena - entrainment of brain signals to an $\alpha$ band of frequencies (8-12Hz) and dipolar brain activation in the same $\alpha$ frequency band in an EEG experiment and a MEG study, respectively.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Autoimmune hepatitis (AIH) is a disease of unknown aetiology with drug-induced AIH being the most complex and not fully understood type. We present the case of a 57-year-old female patient with acute icteric hepatitis after interferon-beta-1b (IFNβ-1b) administration for multiple sclerosis (MS). Based on liver autoimmune serology, histology and appropriate exclusion of other liver diseases, a diagnosis of AIH-related cirrhosis was established. Following discontinuation of IFNβ-1b, a complete resolution of biochemical activity indices was observed and the patient remained untreated on her own decision. However, 3 years later, after a course of intravenous methylprednisolone for MS, a new acute transaminase flare was recorded which subsided again spontaneously after 3 weeks. Liver biopsy and elastography showed significant fibrosis regression (F2 fibrosis). To our knowledge, this is the first report showing spontaneous cirrhosis regression in an IFNβ-1b-induced AIH-like syndrome following drug withdrawal, suggesting that cirrhosis might be reversible if the offending fibrogenic stimulus is withdrawn.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Classical regression analysis can be used to model time series. However, the assumption that model parameters are constant over time is not necessarily adapted to the data. In phytoplankton ecology, the relevance of time-varying parameter values has been shown using a dynamic linear regression model (DLRM). DLRMs, belonging to the class of Bayesian dynamic models, assume the existence of a non-observable time series of model parameters, which are estimated on-line, i.e. after each observation. The aim of this paper was to show how DLRM results could be used to explain variation of a time series of phytoplankton abundance. We applied DLRM to daily concentrations of Dinophysis cf. acuminata, determined in Antifer harbour (French coast of the English Channel), along with physical and chemical covariates (e.g. wind velocity, nutrient concentrations). A single model was built using 1989 and 1990 data, and then applied separately to each year. Equivalent static regression models were investigated for the purpose of comparison. Results showed that most of the Dinophysis cf. acuminata concentration variability was explained by the configuration of the sampling site, the wind regime and tide residual flow. Moreover, the relationships of these factors with the concentration of the microalga varied with time, a fact that could not be detected with static regression. Application of dynamic models to phytoplankton time series, especially in a monitoring context, is discussed.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Mass spectrometry (MS)-based proteomics has seen significant technical advances during the past two decades and mass spectrometry has become a central tool in many biosciences. Despite the popularity of MS-based methods, the handling of the systematic non-biological variation in the data remains a common problem. This biasing variation can result from several sources ranging from sample handling to differences caused by the instrumentation. Normalization is the procedure which aims to account for this biasing variation and make samples comparable. Many normalization methods commonly used in proteomics have been adapted from the DNA-microarray world. Studies comparing normalization methods with proteomics data sets using some variability measures exist. However, a more thorough comparison looking at the quantitative and qualitative differences of the performance of the different normalization methods and at their ability in preserving the true differential expression signal of proteins, is lacking. In this thesis, several popular and widely used normalization methods (the Linear regression normalization, Local regression normalization, Variance stabilizing normalization, Quantile-normalization, Median central tendency normalization and also variants of some of the forementioned methods), representing different strategies in normalization are being compared and evaluated with a benchmark spike-in proteomics data set. The normalization methods are evaluated in several ways. The performance of the normalization methods is evaluated qualitatively and quantitatively on a global scale and in pairwise comparisons of sample groups. In addition, it is investigated, whether performing the normalization globally on the whole data or pairwise for the comparison pairs examined, affects the performance of the normalization method in normalizing the data and preserving the true differential expression signal. In this thesis, both major and minor differences in the performance of the different normalization methods were found. Also, the way in which the normalization was performed (global normalization of the whole data or pairwise normalization of the comparison pair) affected the performance of some of the methods in pairwise comparisons. Differences among variants of the same methods were also observed.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The present work proposes a Hypothesis Test to detect a shift in the variance of a series of independent normal observations using a statistic based on the p-values of the F distribution. Since the probability distribution function of this statistic is intractable, critical values were we estimated numerically through extensive simulation. A regression approach was used to simplify the quantile evaluation and extrapolation. The power of the test was simulated using Monte Carlo simulation, and the results were compared with the Chen test (1997) to prove its efficiency. Time series analysts might find the test useful to address homoscedasticity studies were at most one change might be involved.