23 resultados para Quantile regression
em University of Queensland eSpace - Australia
Finite mixture regression model with random effects: application to neonatal hospital length of stay
Resumo:
A two-component mixture regression model that allows simultaneously for heterogeneity and dependency among observations is proposed. By specifying random effects explicitly in the linear predictor of the mixture probability and the mixture components, parameter estimation is achieved by maximising the corresponding best linear unbiased prediction type log-likelihood. Approximate residual maximum likelihood estimates are obtained via an EM algorithm in the manner of generalised linear mixed model (GLMM). The method can be extended to a g-component mixture regression model with the component density from the exponential family, leading to the development of the class of finite mixture GLMM. For illustration, the method is applied to analyse neonatal length of stay (LOS). It is shown that identification of pertinent factors that influence hospital LOS can provide important information for health care planning and resource allocation. (C) 2002 Elsevier Science B.V. All rights reserved.
Resumo:
The modelling of inpatient length of stay (LOS) has important implications in health care studies. Finite mixture distributions are usually used to model the heterogeneous LOS distribution, due to a certain proportion of patients sustaining-a longer stay. However, the morbidity data are collected from hospitals, observations clustered within the same hospital are often correlated. The generalized linear mixed model approach is adopted to accommodate the inherent correlation via unobservable random effects. An EM algorithm is developed to obtain residual maximum quasi-likelihood estimation. The proposed hierarchical mixture regression approach enables the identification and assessment of factors influencing the long-stay proportion and the LOS for the long-stay patient subgroup. A neonatal LOS data set is used for illustration, (C) 2003 Elsevier Science Ltd. All rights reserved.
Resumo:
In this article we investigate the asymptotic and finite-sample properties of predictors of regression models with autocorrelated errors. We prove new theorems associated with the predictive efficiency of generalized least squares (GLS) and incorrectly structured GLS predictors. We also establish the form associated with their predictive mean squared errors as well as the magnitude of these errors relative to each other and to those generated from the ordinary least squares (OLS) predictor. A large simulation study is used to evaluate the finite-sample performance of forecasts generated from models using different corrections for the serial correlation.
Resumo:
Background and Objective: To examine if commonly recommended assumptions for multivariable logistic regression are addressed in two major epidemiological journals. Methods: Ninety-nine articles from the Journal of Clinical Epidemiology and the American Journal of Epidemiology were surveyed for 10 criteria: six dealing with computation and four with reporting multivariable logistic regression results. Results: Three of the 10 criteria were addressed in 50% or more of the articles. Statistical significance testing or confidence intervals were reported in all articles. Methods for selecting independent variables were described in 82%, and specific procedures used to generate the models were discussed in 65%. Fewer than 50% of the articles indicated if interactions were tested or met the recommended events per independent variable ratio of 10: 1. Fewer than 20% of the articles described conformity to a linear gradient, examined collinearity, reported information on validation procedures, goodness-of-fit, discrimination statistics, or provided complete information on variable coding. There was no significant difference (P >.05) in the proportion of articles meeting the criteria across the two journals. Conclusion: Articles reviewed frequently did not report commonly recommended assumptions for using multivariable logistic regression. (C) 2004 Elsevier Inc. All rights reserved.
Resumo:
Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.
Resumo:
Studies have shown that increased arterial stiffening can be an indication of cardiovascular diseases like hypertension. In clinical practice, this can be detected by measuring the blood pressure (BP) using a sphygmomanometer but it cannot be used for prolonged monitoring. It has been established that pulse wave velocity (PWV) is a direct measure of arterial stiffening but its usefulness is hampered by the absence of non-invasive techniques to estimate it. Pulse transit time (PTT) is a simple and non-invasive method derived from PWV. However, limited knowledge of PTT in children is found in the present literature. The aims of this study are to identify independent variables that confound PTT measure and describe PTT regression equations for healthy children. Therefore, PTT reference values are formulated for future pathological studies. Fifty-five Caucasian children (39 male) aged 8.4 +/- 2.3 yr (range 5-12 yr) were recruited. Predictive equations for PTT were obtained by multiple regressions with age, vascular path length, BP indexes and heart rate. These derived equations were compared in their PWV equivalent against two previously reported equations and significant agreement was obtained (p < 0.05). Findings herein also suggested that PTT can be useful as a continuous surrogate BP monitor in children.
Resumo:
Background Regression to the mean (RTM) is a statistical phenomenon that can make natural variation in repeated data look like real change. It happens when unusually large or small measurements tend to be followed by measurements that are closer to the mean. Methods We give some examples of the phenomenon, and discuss methods to overcome it at the design and analysis stages of a study. Results The effect of RTM in a sample becomes more noticeable with increasing measurement error and when follow-up measurements are only examined on a sub-sample selected using a baseline value. Conclusions RTM is a ubiquitous phenomenon in repeated data and should always be considered as a possible cause of an observed change. Its effect can be alleviated through better study design and use of suitable statistical methods.
Resumo:
Long-term forecasts of pest pressure are central to the effective management of many agricultural insect pests. In the eastern cropping regions of Australia, serious infestations of Helicoverpa punctigera (Wallengren) and H. armigera (Hübner)(Lepidoptera: Noctuidae) are experienced annually. Regression analyses of a long series of light-trap catches of adult moths were used to describe the seasonal dynamics of both species. The size of the spring generation in eastern cropping zones could be related to rainfall in putative source areas in inland Australia. Subsequent generations could be related to the abundance of various crops in agricultural areas, rainfall and the magnitude of the spring population peak. As rainfall figured prominently as a predictor variable, and can itself be predicted using the Southern Oscillation Index (SOI), trap catches were also related to this variable. The geographic distribution of each species was modelled in relation to climate and CLIMEX was used to predict temporal variation in abundance at given putative source sites in inland Australia using historical meteorological data. These predictions were then correlated with subsequent pest abundance data in a major cropping region. The regression-based and bioclimatic-based approaches to predicting pest abundance are compared and their utility in predicting and interpreting pest dynamics are discussed.
Resumo:
Cholesterol is a major component of atherosclerotic plaques. Cholesterol accumulation within the arterial intima and atherosclerotic plaques is determined by the difference of cellular cholesterol synthesis and/or influx from apo B-containing lipoproteins and cholesterol efflux. In humans, apo A-I Milano infusion has led to rapid regression of atherosclerosis in coronary arteries. We hypothesised that a multifunctional plasma delipidation process (PDP) would lead to rapid regression of experimental atherosclerosis and probably impact on adipose tissue lipids. In hyperlipidemic animals, the plasma concentrations of cholesterol, triglyceride and phospholipid were, respectively, 6-, 157-, and 18-fold higher than control animals, which consequently resulted in atherosclerosis. PDP consisted of delipidation of plasma with a mixture of butanol-diisopropyl ether (DIPE). PDP removed considerably more lipid from the hyperlipidemic animals than in normolipidemic animals. PDP treatment of hyperlipidemic animals markedly reduced intensity of lipid staining materials in the arterial wall and led to dramatic reduction of lipid in the adipose tissue. Five PDP treatments increased apolipoprotein A1 concentrations in all animals. Biochemical and hematological parameters were unaffected during PDP treatment. These results show that five PDP treatments led to marked reduction in avian atherosclerosis and removal of lipid from adipose tissue. PDP is a highly effective method for rapid regression of atherosclerosis.
Resumo:
Quantile computation has many applications including data mining and financial data analysis. It has been shown that an is an element of-approximate summary can be maintained so that, given a quantile query d (phi, is an element of), the data item at rank [phi N] may be approximately obtained within the rank error precision is an element of N over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different phi and is an element of poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times for reprocessing clusters and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.
Resumo:
Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical Study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which tender the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression model with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
Resumo:
Background: The residue-wise contact order (RWCO) describes the sequence separations between the residues of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable important information to reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. Results: We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. Conclusion: The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
Resumo:
Pharmacodynamics (PD) is the study of the biochemical and physiological effects of drugs. The construction of optimal designs for dose-ranging trials with multiple periods is considered in this paper, where the outcome of the trial (the effect of the drug) is considered to be a binary response: the success or failure of a drug to bring about a particular change in the subject after a given amount of time. The carryover effect of each dose from one period to the next is assumed to be proportional to the direct effect. It is shown for a logistic regression model that the efficiency of optimal parallel (single-period) or crossover (two-period) design is substantially greater than a balanced design. The optimal designs are also shown to be robust to misspecification of the value of the parameters. Finally, the parallel and crossover designs are combined to provide the experimenter with greater flexibility.
Resumo:
Promoted-ignition testing on carbon steel rods of varying cross-sectional area and shape was performed in high pressure oxygen to assess the effect of sample geometry on the regression rate of the melting interface. Cylindrical and rectangular geometries and three different cross sections were tested and the regression rates of the cylinders were compared to the regression rates of the rectangular samples at test pressures around 6.9 MPa. Tests were recorded and video analysis used to determine the regression rate of the melting interface by a new method based on a drop cycle which was found to provide a good basis for statistical analysis and provide excellent agreement to the standard averaging methods used. Both geometries tested showed the typical trend of decreasing regression rate of the melting interface with increasing cross-sectional area; however, it was shown that the effect of geometry is more significant as the sample's cross sections become larger. Discussion is provided regarding the use of 3.2-mm square rods rather than 3.2-mm cylindrical rods within the standard ASTM test and any effect this may have on the observed regression rate of the melting interface.