42 results for "Prediction error method"
Abstract:
Signal peptides and transmembrane helices both contain a stretch of hydrophobic amino acids. This common feature makes it difficult for signal peptide and transmembrane helix predictors to correctly assign identity to stretches of hydrophobic residues near the N-terminal methionine of a protein sequence. The inability to reliably distinguish between an N-terminal transmembrane helix and a signal peptide is an error with serious consequences for the prediction of protein secretory status or transmembrane topology. In this study, we report a new method for differentiating between protein N-terminal signal peptides and transmembrane helices. Based on the sequence features extracted from hydrophobic regions (amino acid frequency, hydrophobicity, and the start position), we set up discriminant functions and examined them on non-redundant datasets with jackknife tests. The method can be combined with other signal peptide prediction methods to achieve higher prediction accuracy. For Gram-negative bacterial proteins, 95.7% of N-terminal signal peptides and transmembrane helices can be correctly predicted (coefficient 0.90). Given a sensitivity of 90%, transmembrane helices can be distinguished from signal peptides with a precision of 99% (coefficient 0.92). For eukaryotic proteins, 94.2% of N-terminal signal peptides and transmembrane helices can be correctly predicted with coefficient 0.83. Given a sensitivity of 90%, transmembrane helices can be distinguished from signal peptides with a precision of 87% (coefficient 0.85). The method can be used to complement current transmembrane protein prediction and signal peptide prediction methods to improve their prediction accuracy. (C) 2003 Elsevier Inc. All rights reserved.
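The abstract does not state the form of its discriminant functions. As a hedged illustration of the evaluation protocol only, the sketch below swaps in a simple minimum-distance (nearest class mean) discriminant, scores it with a jackknife (leave-one-out) test, and reports sensitivity and precision for the transmembrane class. The two features and all values in `SAMPLES` are hypothetical; real features on different scales would need standardizing first.

```python
from math import dist
from statistics import fmean

# Hypothetical toy features: (mean hydrophobicity of the segment, scaled start position).
SAMPLES = [
    ((1.5, 0.20), "SP"), ((1.6, 0.30), "SP"), ((1.4, 0.25), "SP"),
    ((2.5, 0.80), "TM"), ((2.6, 0.90), "TM"), ((2.4, 0.70), "TM"),
]

def class_means(samples):
    """Mean feature vector for each class label."""
    groups = {}
    for features, label in samples:
        groups.setdefault(label, []).append(features)
    return {label: [fmean(column) for column in zip(*rows)] for label, rows in groups.items()}

def classify(features, means):
    """Minimum-distance rule: pick the class whose mean vector is closest."""
    return min(means, key=lambda label: dist(features, means[label]))

def jackknife(samples):
    """Leave-one-out test: each sample is classified by a rule built without it."""
    predictions = []
    for i, (features, _) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        predictions.append(classify(features, class_means(training)))
    return predictions

def sensitivity_precision(truths, predictions, positive="TM"):
    """Sensitivity and precision for the positive class (0.0 when undefined)."""
    pairs = list(zip(truths, predictions))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision

predictions = jackknife(SAMPLES)
print(predictions)
print(sensitivity_precision([label for _, label in SAMPLES], predictions))
```

Refitting the class means inside the loop is what makes this a jackknife test: no sample ever contributes to the rule that classifies it.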
Abstract:
Patient outcomes in transplantation would improve if dosing of immunosuppressive agents were individualized. The aim of this study is to develop a population pharmacokinetic model of tacrolimus in adult liver transplant recipients and test this model in individualizing therapy. Population analysis was performed on data from 68 patients. Estimates were sought for apparent clearance (CL/F) and apparent volume of distribution (V/F) using the nonlinear mixed effects model program (NONMEM). Factors screened for influence on these parameters were weight, age, sex, transplant type, biliary reconstructive procedure, postoperative day, days of therapy, liver function test results, creatinine clearance, hematocrit, corticosteroid dose, and interacting drugs. The predictive performance of the developed model was evaluated through Bayesian forecasting in an independent cohort of 36 patients. No linear correlation existed between tacrolimus dosage and trough concentration (r² = 0.005). Mean individual Bayesian estimates for CL/F and V/F were 26.5 +/- 8.2 (SD) L/hr and 399 +/- 185 L, respectively. CL/F was greater in patients with normal liver function. V/F increased with patient weight. CL/F decreased with increasing hematocrit. Based on the derived model, a 70-kg patient with an aspartate aminotransferase (AST) level less than 70 U/L would require a tacrolimus dose of 4.7 mg twice daily to achieve a steady-state trough concentration of 10 ng/mL. A 50-kg patient with an AST level greater than 70 U/L would require a dose of 2.6 mg. Marked interindividual variability (43% to 93%) and residual random error (3.3 ng/mL) were observed. Predictions made using the final model were reasonably unbiased (0.56 ng/mL) but imprecise (4.8 ng/mL). Pharmacokinetic information obtained will assist in tacrolimus dosing; however, further investigation into reasons for the pharmacokinetic variability of tacrolimus is required.
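Bias and imprecision figures like the 0.56 and 4.8 ng/mL above are usually computed as the mean prediction error (ME) and the root mean squared prediction error (RMSE) of the Bayesian forecasts against measured troughs. The sketch below assumes those definitions (the abstract does not spell them out) and uses hypothetical concentrations; the sign convention, predicted minus observed, is also an assumption.

```python
from math import sqrt

def prediction_errors(observed, predicted):
    """Return (ME, RMSE) where each error is predicted - observed."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need equal-length, non-empty observed and predicted lists")
    errors = [p - o for o, p in zip(observed, predicted)]
    me = sum(errors) / len(errors)                          # bias
    rmse = sqrt(sum(e * e for e in errors) / len(errors))   # imprecision
    return me, rmse

# Hypothetical trough concentrations (ng/mL): measured vs Bayesian forecast.
observed = [10.0, 8.0, 12.0]
predicted = [11.0, 8.0, 10.0]
me, rmse = prediction_errors(observed, predicted)
print(f"ME = {me:.2f} ng/mL, RMSE = {rmse:.2f} ng/mL")
```

With this convention a positive ME means the model overpredicts on average, and a model can have a near-zero ME while still having a large RMSE, which is exactly the "unbiased but imprecise" pattern reported.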
Abstract:
The large number of protein kinases makes it impractical to determine their specificities and substrates experimentally. Using the available crystal structures, molecular modeling, and sequence analyses of kinases and substrates, we developed a set of rules governing the binding of a heptapeptide substrate motif (surrounding the phosphorylation site) to the kinase and implemented these rules in a web-interfaced program for automated prediction of optimal substrate peptides, taking only the amino acid sequence of a protein kinase as input. We show the utility of the method by analyzing yeast cell cycle control and DNA damage checkpoint pathways. Our method is the only available predictive method generally applicable for identifying possible substrate proteins for protein serine/threonine kinases and helps in silico construction of signaling pathways. The accuracy of prediction is comparable to the accuracy of data from systematic large-scale experimental approaches.
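The kinase-specific binding rules themselves are not given in the abstract. To show only the mechanics of scanning a protein for heptapeptides centred on a phosphorylatable Ser/Thr, the sketch below swaps in a simple additive position-specific score; the preference table `EXAMPLE_PREFS` and the sequence are hypothetical and do not describe any real kinase.

```python
# Hypothetical per-offset residue preferences; offset 0 is the phospho-acceptor.
EXAMPLE_PREFS = {
    -3: {"R": 2.0, "K": 1.0},
    -2: {"R": 1.0},
    1: {"P": -2.0, "L": 0.5},
}

def candidate_sites(sequence, prefs, acceptors="ST", half_window=3):
    """Score each full heptapeptide centred on an acceptor residue, best first.

    Returns (1-based position, peptide, score) tuples; unlisted residues score 0.
    """
    hits = []
    for centre in range(half_window, len(sequence) - half_window):
        if sequence[centre] not in acceptors:
            continue
        score = sum(
            prefs.get(offset, {}).get(sequence[centre + offset], 0.0)
            for offset in range(-half_window, half_window + 1)
        )
        peptide = sequence[centre - half_window:centre + half_window + 1]
        hits.append((centre + 1, peptide, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)

for position, peptide, score in candidate_sites("AARRASLAAKTPAAA", EXAMPLE_PREFS):
    print(position, peptide, score)
```

Acceptor residues too close to either terminus to have a full seven-residue window are skipped here; that is a design choice of the sketch.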
Abstract:
Accurate estimates of body mass in fossil taxa are fundamental to paleobiological reconstruction. Predictive equations derived from correlation with craniodental and body mass data in extant taxa are the most commonly used, but they can be unreliable for species whose morphology departs widely from that of living relatives. Estimates based on proximal limb-bone circumference data are more accurate but are inapplicable where postcranial remains are unknown. In this study we assess the efficacy of predicting body mass in Australian fossil marsupials by using an alternative correlate, endocranial volume. Body mass estimates for a species with highly unusual craniodental anatomy, the Pleistocene marsupial lion (Thylacoleo carnifex), fall within the range determined on the basis of proximal limb-bone circumference data, whereas estimates based on dental data are highly dubious. For all marsupial taxa considered, allometric relationships have small confidence intervals, and percent prediction errors are comparable to those of the best predictors using craniodental data. Although application is limited in some respects, this method may provide a useful means of estimating body mass for species with atypical craniodental or postcranial morphologies and taxa unrepresented by postcranial remains. A trend toward increased encephalization may constrain the method's predictive power with respect to many, but not all, placental clades.
Abstract:
The use of presence/absence data in wildlife management and biological surveys is widespread. There is a growing interest in quantifying the sources of error associated with these data. Using simulated data, we show that false-negative errors (failure to record a species when in fact it is present) can have a significant impact on the statistical estimation of habitat models. We then introduce an extension of logistic modeling, the zero-inflated binomial (ZIB) model, which permits estimation of the rate of false-negative errors and correction of estimates of the probability of occurrence for false-negative errors by using repeated visits to the same site. Our simulations show that even relatively low rates of false negatives bias statistical estimates of habitat effects. The method with three repeated visits eliminates the bias, but estimates are relatively imprecise. Six repeated visits improve the precision of estimates to levels comparable to those achieved with conventional statistics in the absence of false-negative errors. In general, when error rates are ≤ 50%, greater efficiency is gained by adding more sites, whereas when error rates are > 50%, it is better to increase the number of repeated visits. We highlight the flexibility of the method with three case studies, clearly demonstrating the effect of false-negative errors for a range of commonly used survey methods.
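A minimal sketch of the zero-inflated binomial likelihood, assuming a constant occupancy probability `psi` and per-visit detection probability `p` across sites (no habitat covariates, unlike the habitat models in the study). Each occupied site yields Binomial(visits, p) detections and each unoccupied site yields zero, so an all-zero history can mean either absence or repeated false negatives. The brute-force grid search and the simulated survey are illustrative stand-ins for a proper optimiser and real data.

```python
import random
from math import comb, log

def zib_log_likelihood(psi, p, counts, visits):
    """Log-likelihood of a zero-inflated binomial.

    counts[y] = number of sites where the species was recorded on y of `visits` visits.
    """
    total = 0.0
    for y, n_sites in enumerate(counts):
        if n_sites == 0:
            continue
        binomial = comb(visits, y) * p**y * (1 - p) ** (visits - y)
        # Occupied sites give Binomial(visits, p) detections; empty sites always give y = 0.
        likelihood = psi * binomial + (1 - psi) * (y == 0)
        total += n_sites * log(likelihood)
    return total

def fit_zib(detections, visits, step=0.01):
    """Maximum-likelihood (psi, p) by grid search over the open unit square."""
    counts = [0] * (visits + 1)
    for y in detections:
        counts[y] += 1
    grid = [i * step for i in range(1, round(1 / step))]
    return max(
        ((psi, p) for psi in grid for p in grid),
        key=lambda pair: zib_log_likelihood(pair[0], pair[1], counts, visits),
    )

def simulate(n_sites, visits, psi, p, rng):
    """Hypothetical survey with per-visit false-negative rate 1 - p."""
    detections = []
    for _ in range(n_sites):
        occupied = rng.random() < psi
        detections.append(sum(rng.random() < p for _ in range(visits)) if occupied else 0)
    return detections

data = simulate(n_sites=2000, visits=6, psi=0.6, p=0.4, rng=random.Random(1))
naive = sum(y > 0 for y in data) / len(data)  # ignores false negatives
psi_hat, p_hat = fit_zib(data, visits=6)
print(f"naive occupancy {naive:.3f}, ZIB psi {psi_hat:.2f}, p {p_hat:.2f}")
```

In this simulation the naive proportion of sites with at least one detection underestimates occupancy, while the ZIB estimate lands close to the simulated value of 0.6.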
Abstract:
Participation in at least 30 min of moderate-intensity activity on most days is assumed to confer health benefits. This study accordingly determined whether the more vigorous household and garden tasks (sweeping, window cleaning, vacuuming and lawn mowing) are performed by middle-aged men at a moderate intensity of 3-6 metabolic equivalents (METs) in the laboratory and at home. Measured energy expenditure during self-perceived moderate-paced walking was used as a marker of exercise intensity. Energy expenditure was also predicted via indirect methods. Thirty-six males [mean (SD): 40.0 (3.3) years; 179.5 (6.9) cm; 83.4 (14.0) kg] were measured for resting metabolic rate (RMR) and oxygen consumption (VO2) during the five activities using the Douglas bag method. Heart rate, respiratory frequency, CSA (Computer Science Applications) movement counts, Borg scale ratings of perceived exertion and Quetelet's index were also recorded as potential predictors of exercise intensity. Except for vacuuming in the laboratory, which was not significantly different from 3.0 METs (P = 0.98), the MET means in the laboratory and at home were all significantly greater than 3.0 (P ≤ 0.006). The sweeping and vacuuming MET means were significantly higher (P
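The MET arithmetic can be sketched as follows, assuming the conventional 1 MET = 3.5 mL O2/kg/min; the study also measured each subject's resting metabolic rate, which could be passed instead as `resting_ml_per_kg_min`. The VO2 value and the 4.0 mL/kg/min resting rate are hypothetical, the 3-6 MET moderate band follows the abstract, and the "light" and "vigorous" labels for values outside it are assumptions.

```python
def mets(vo2_ml_per_min, body_mass_kg, resting_ml_per_kg_min=3.5):
    """Oxygen uptake expressed as a multiple of resting uptake (METs)."""
    return vo2_ml_per_min / body_mass_kg / resting_ml_per_kg_min

def intensity(met_value):
    """Classify using the 3-6 MET moderate band."""
    if met_value < 3:
        return "light"
    if met_value <= 6:
        return "moderate"
    return "vigorous"

# Hypothetical task: VO2 of 1000 mL/min in an 83.4 kg man (the sample's mean mass).
conventional = mets(1000, 83.4)       # 1 MET = 3.5 mL/kg/min
individual = mets(1000, 83.4, 4.0)    # hypothetical measured RMR of 4.0 mL/kg/min
print(round(conventional, 2), intensity(conventional))
print(round(individual, 2), intensity(individual))
```

The same oxygen uptake falls on different sides of the 3 MET threshold depending on which resting value is used, which is why the choice between a conventional and a measured resting rate matters for tasks near the boundary.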
Abstract:
Background: Reliability or validity studies are important for the evaluation of measurement error in dietary assessment methods. An approach to validation known as the method of triads uses triangulation techniques to calculate the validity coefficient of a food-frequency questionnaire (FFQ). Objective: To assess the validity of FFQ estimates of carotenoid and vitamin E intake against serum biomarker measurements and weighed food records (WFRs), by applying the method of triads. Design: The study population was a sub-sample of adult participants in a randomised controlled trial of beta-carotene and sunscreen in the prevention of skin cancer. Dietary intake was assessed by a self-administered FFQ and a WFR. Nonfasting blood samples were collected and plasma analysed for five carotenoids (alpha-carotene, beta-carotene, beta-cryptoxanthin, lutein, lycopene) and vitamin E. Correlation coefficients were calculated between each pair of the three methods, and the validity coefficient was calculated using the method of triads. The 95% confidence intervals for the validity coefficients were estimated using bootstrap sampling. Results: The validity coefficients of the FFQ were highest for alpha-carotene (0.85) and lycopene (0.62), followed by beta-carotene (0.55) and total carotenoids (0.55), while the lowest validity coefficient was for lutein (0.19). The method of triads could not be used for beta-cryptoxanthin and vitamin E, as one of the three underlying correlations was negative. Conclusions: Results were similar to other studies of validity using biomarkers and the method of triads. For many dietary factors, the upper limit of the validity coefficients was less than 0.5 and therefore only strong relationships between dietary exposure and disease will be detected.
Abstract:
We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two windows of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations. (C) 2004 Wiley-Liss, Inc.
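Two pieces of the description above can be sketched separately: gathering the 5 x 5 = 25 correlated-mutation scores for residue pairs in windows centred on residues i and j, and scoring a predictor by the fraction of true contacts among its best L/k pairs. The correlation matrix, scores and contacts are hypothetical, and rounding L/k down is an assumption.

```python
def window_features(corr, i, j, half=2):
    """Correlated-mutation scores for all pairs in windows of size 2*half+1 around i and j.

    corr is a square matrix (list of lists); the caller must keep both windows in range.
    """
    return [corr[i + a][j + b] for a in range(-half, half + 1) for b in range(-half, half + 1)]

def top_fraction_accuracy(scores, contacts, length, divisor):
    """Fraction of the best length // divisor scored pairs that are true contacts."""
    n_top = max(1, length // divisor)
    ranked = sorted(scores, key=scores.get, reverse=True)[:n_top]
    if not ranked:
        return 0.0
    return sum(pair in contacts for pair in ranked) / len(ranked)

# Hypothetical 10-residue protein.
corr = [[0.1 * ((r * c) % 7) for c in range(10)] for r in range(10)]
print(len(window_features(corr, 3, 6)))  # 25 inputs for the pair (3, 6)

scores = {(1, 5): 0.9, (2, 8): 0.8, (3, 9): 0.7, (4, 10): 0.6, (1, 9): 0.5}
contacts = {(1, 5), (3, 9)}
for divisor in (2, 5, 10):
    print(f"L/{divisor}:", top_fraction_accuracy(scores, contacts, length=10, divisor=divisor))
```

Ranking only the top L/k pairs explains why the reported accuracy rises from L/2 to L/10: the stricter cut keeps only the most confident predictions.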
Abstract:
Superplastic bulging is the most successful industrial application of superplastic forming (SPF), but the non-uniform wall thickness of parts formed in this way is a common technical problem yet to be overcome. A simple and efficient preform design method is proposed, based on a rigid-viscoplastic finite element program developed by the authors that simulates the sheet superplastic forming process together with the prediction of microstructural changes (such as grain growth and cavity growth). The method is applied to the design of a preform mould for manufacturing parts with uniform wall thickness. Examples of formed parts are presented to demonstrate that the technology can improve the uniformity of wall thickness to meet practical requirements. (C) 2004 Elsevier B.V. All rights reserved.
Abstract:
In this paper, we assess the relative performance of the direct valuation method and industry multiplier models using 41,435 firm-quarter Value Line observations over an 11-year (1990–2000) period. Results from both pricing-error and return-prediction analyses indicate that direct valuation yields lower percentage pricing errors and greater return prediction ability than the forward price to aggregated forecasted earnings multiplier model. However, a simple hybrid combination of these two methods leads to more accurate intrinsic value estimates than either method used in isolation. It would appear that fundamental analysis could benefit from using one approach as a check on the other.
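Percentage pricing error is commonly defined as the value estimate minus the market price, scaled by the price. The sketch below assumes that definition, summarises it by the median absolute error, and uses an equal-weight average as one hypothetical form of the "simple hybrid combination"; the prices and estimates are invented so that the two methods' errors partly offset.

```python
from statistics import median

def pricing_error(estimate, price):
    """Signed pricing error as a fraction of the market price."""
    return (estimate - price) / price

def median_abs_error(estimates, prices):
    """Median absolute percentage pricing error (as a fraction)."""
    return median(abs(pricing_error(v, p)) for v, p in zip(estimates, prices))

def hybrid(direct_values, multiplier_values, weight=0.5):
    """Hypothetical blend: weight * direct + (1 - weight) * multiplier."""
    return [weight * d + (1 - weight) * m for d, m in zip(direct_values, multiplier_values)]

prices = [100.0, 50.0, 80.0]
direct = [110.0, 45.0, 84.0]
multiplier = [90.0, 56.0, 76.0]
for name, values in [("direct", direct), ("multiplier", multiplier),
                     ("hybrid", hybrid(direct, multiplier))]:
    print(name, median_abs_error(values, prices))
```

Blending helps only to the extent that the two methods err in different directions, which is one way to read the suggestion of using one approach as a check on the other.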
Abstract:
Background: In paediatric clinical practice treatment is often adjusted in relation to body size, for example the calculation of pharmacological and dialysis dosages. In addition to use of body weight, for some purposes total body water (TBW) and surface area are estimated from anthropometry using equations developed several decades previously. Whether such equations remain valid in contemporary populations is not known. Methods: Total body water was measured using deuterium dilution in 672 subjects (265 infants aged < 1 year; 407 children and adolescents aged 1-19 years) during the period 1990-2003. TBW was predicted (a) using published equations, and (b) directly from data on age, sex, weight, and height. Results: Previously published equations, based on data obtained before 1970, significantly overestimated TBW, with average biases ranging from 4% to 11%. For all equations, the overestimation of TBW was greatest in infancy. New equations were generated. The best equation, incorporating log weight, log height, age, and sex, had a standard error of the estimate of 7.8%. Conclusions: Secular trends in the nutritional status of infants and children are altering the relation between age or weight and TBW. Equations developed in previous decades significantly overestimate TBW in all age groups, especially infancy; however, the relation between TBW and weight may continue to change. This scenario is predicted to apply more generally to many aspects of paediatric clinical practice in which dosages are calculated on the basis of anthropometric data collected in previous decades.
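The best new equation uses log weight, log height, age and sex. The least-squares sketch below assumes the response is ln(TBW) (the abstract does not say whether TBW itself was log-transformed), so that the residual standard deviation reads roughly as a fractional error. It also includes the percentage bias used to compare older equations with measured values. The coefficients used to generate the noise-free synthetic data are hypothetical, and NumPy is required.

```python
import numpy as np

def fit_log_tbw(weight_kg, height_cm, age_y, male, tbw_l):
    """Least-squares fit of ln(TBW) on ln(weight), ln(height), age and sex (male = 1)."""
    X = np.column_stack([
        np.ones(len(tbw_l)),
        np.log(weight_kg),
        np.log(height_cm),
        np.asarray(age_y, dtype=float),
        np.asarray(male, dtype=float),
    ])
    y = np.log(tbw_l)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    # Residual SD on the log scale, roughly a fractional (percentage) error.
    see = float(np.sqrt(residuals @ residuals / (len(y) - X.shape[1])))
    return coef, see

def percent_bias(predicted, measured):
    """Mean of (predicted - measured) / measured, in percent; positive = overestimate."""
    predicted, measured = np.asarray(predicted, float), np.asarray(measured, float)
    return float(np.mean((predicted - measured) / measured) * 100)

# Hypothetical, noise-free data generated from invented coefficients.
rng = np.random.default_rng(0)
n = 50
weight = rng.uniform(3, 70, n)
height = rng.uniform(50, 180, n)
age = rng.uniform(0, 19, n)
male = rng.integers(0, 2, n)
true_coef = np.array([-1.0, 0.8, 0.5, 0.01, 0.05])
tbw = np.exp(true_coef @ np.vstack([np.ones(n), np.log(weight), np.log(height), age, male]))
coef, see = fit_log_tbw(weight, height, age, male, tbw)
print(np.round(coef, 3), see)
print(percent_bias([110.0, 104.0], [100.0, 100.0]))  # hypothetical overestimating equation
```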
Abstract:
Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.
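Following the definition above, contact numbers can be counted directly from C-beta coordinates. The sphere radius is not given in the abstract; 12 Å below is an assumption, as is the handling of glycine, which has no C-beta and would need a C-alpha or virtual C-beta stand-in. The coordinates and the two-state threshold are hypothetical, and NumPy is required.

```python
import numpy as np

def contact_numbers(cb_coords, radius=12.0):
    """Count other residues' C-beta atoms within `radius` angstroms of each residue's C-beta."""
    coords = np.asarray(cb_coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=-1))
    within = distances < radius
    np.fill_diagonal(within, False)  # a residue does not count itself
    return within.sum(axis=1)

coords = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
counts = contact_numbers(coords)
print(counts.tolist())
# Two-state labels at a hypothetical threshold of 1 contact.
print(["contacted" if c >= 1 else "non-contacted" for c in counts])
```

A sequence-based predictor such as the SVR described above is trained to reproduce these per-residue counts from sequence profiles alone.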
Abstract:
The polypeptide backbones and side chains of proteins are constantly moving due to thermal motion and the kinetic energy of the atoms. The B-factors of protein crystal structures reflect the fluctuation of atoms about their average positions and provide important information about protein dynamics. Computational approaches to predict thermal motion are useful for analyzing the dynamic properties of proteins with unknown structures. In this article, we utilize a novel support vector regression (SVR) approach to predict the B-factor distribution (B-factor profile) of a protein from its sequence. We explore schemes for encoding sequences and various settings for the parameters used in SVR. Based on a large dataset of high-resolution proteins, our method predicts the B-factor distribution with a Pearson correlation coefficient (CC) of 0.53. In addition, our method predicts the B-factor profile with a CC of at least 0.56 for more than half of the proteins. Our method also performs well for classifying residues (rigid vs. flexible). For almost all predicted B-factor thresholds, prediction accuracies (percent of correctly predicted residues) are greater than 70%. These results exceed the best results of other sequence-based prediction methods. (C) 2005 Wiley-Liss, Inc.
Abstract:
Background: Lean bodyweight (LBW) has been recommended for scaling drug doses. However, the current methods for predicting LBW are inconsistent at extremes of size and could be misleading with respect to interpreting weight-based regimens. Objective: The objective of the present study was to develop a semi-mechanistic model to predict fat-free mass (FFM) from subject characteristics in a population that includes extremes of size. FFM is considered to closely approximate LBW. There are several reference methods for assessing FFM, whereas there are no reference standards for LBW. Patients and methods: A total of 373 patients (168 male, 205 female) were included in the study. These data arose from two populations. Population A (index dataset) contained anthropometric characteristics, FFM estimated by dual-energy x-ray absorptiometry (DXA, a reference method) and bioelectrical impedance analysis (BIA) data. Population B (test dataset) contained the same anthropometric measures and FFM data as population A, but excluded BIA data. The patients in population A had a wide range of age (18-82 years), bodyweight (40.7-216.5 kg) and BMI values (17.1-69.9 kg/m²). Patients in population B had BMI values of 18.7-38.4 kg/m². A two-stage semi-mechanistic model to predict FFM was developed from the demographic data of population A. For stage 1, a model was developed to predict impedance, and for stage 2, a model that incorporated predicted impedance was used to predict FFM. These two models were combined to provide an overall model to predict FFM from patient characteristics. The developed model for FFM was externally evaluated by predicting into population B. Results: The semi-mechanistic model to predict impedance incorporated sex, height and bodyweight. The developed model provides a good predictor of impedance for both males and females (r² = 0.78, mean error [ME] = 2.30 × 10⁻³, root mean square error [RMSE] = 51.56 [approximately 10% of mean]).
The final model for FFM incorporated sex, height and bodyweight. The developed model for FFM provided good predictive performance for both males and females (r² = 0.93, ME = -0.77, RMSE = 3.33 [approximately 6% of mean]). In addition, the model accurately predicted the FFM of subjects in population B (r² = 0.85, ME = -0.04, RMSE = 4.39 [approximately 7% of mean]). Conclusions: A semi-mechanistic model has been developed to predict FFM (and therefore LBW) from easily accessible patient characteristics. This model has been prospectively evaluated and shown to have good predictive performance.
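The abstract describes the two-stage structure but not the model equations. The sketch below assumes linear forms for both stages and uses height squared over impedance, a common BIA predictor, in stage 2. It only illustrates how predicted impedance from stage 1 feeds stage 2, so that the combined model needs just sex, height and weight at prediction time. All coefficients in the synthetic data are hypothetical; this is not the published FFM equation. NumPy is required.

```python
import numpy as np

def design(*columns):
    """Design matrix: an intercept column followed by the given predictors."""
    return np.column_stack([np.ones(len(columns[0]))] + [np.asarray(c, dtype=float) for c in columns])

def impedance_index(height_cm, z_hat):
    """Height squared over impedance (assumed stage-2 predictor)."""
    return np.asarray(height_cm, dtype=float) ** 2 / z_hat

def fit_two_stage(sex, height_cm, weight_kg, impedance_ohm, ffm_kg):
    # Stage 1: impedance from sex, height and weight.
    stage1, *_ = np.linalg.lstsq(design(sex, height_cm, weight_kg), impedance_ohm, rcond=None)
    # Stage 2: FFM from sex, weight and an index built on *predicted* impedance,
    # so the combined model needs no BIA measurement at prediction time.
    z_hat = design(sex, height_cm, weight_kg) @ stage1
    stage2, *_ = np.linalg.lstsq(
        design(sex, weight_kg, impedance_index(height_cm, z_hat)), ffm_kg, rcond=None)
    return stage1, stage2

def predict_ffm(stage1, stage2, sex, height_cm, weight_kg):
    """Combined model: FFM from sex, height and weight only."""
    z_hat = design(sex, height_cm, weight_kg) @ stage1
    return design(sex, weight_kg, impedance_index(height_cm, z_hat)) @ stage2

# Hypothetical, noise-free index data.
rng = np.random.default_rng(1)
n = 40
sex = rng.integers(0, 2, n)            # 1 = male (hypothetical coding)
height = rng.uniform(150, 195, n)      # cm
weight = rng.uniform(45, 150, n)       # kg
impedance = 900 - 60 * sex - 2.0 * height + 0.5 * weight
ffm = 5 + 8 * sex + 0.1 * weight + 0.6 * height ** 2 / impedance
stage1, stage2 = fit_two_stage(sex, height, weight, impedance, ffm)
print(np.round(stage1, 3), np.round(stage2, 3))
```

External evaluation, as with population B, would then call `predict_ffm` on subjects without BIA data and compare the results with their DXA values.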
Abstract:
MULTIPRED is a web-based computational system for the prediction of peptide binding to multiple molecules (proteins) belonging to human leukocyte antigens (HLA) class I A2, A3 and class II DR supertypes. It uses hidden Markov models and artificial neural network methods as predictive engines. A novel data representation method enables MULTIPRED to predict peptides that promiscuously bind multiple HLA alleles within one HLA supertype. Extensive testing was performed for validation of the prediction models. Testing results show that MULTIPRED is both sensitive and specific, and it has good predictive ability (area under the receiver operating characteristic curve, A_ROC > 0.80). MULTIPRED can be used for the mapping of promiscuous T-cell epitopes as well as the regions of high concentration of these targets, termed T-cell epitope hotspots. MULTIPRED is available at http://antigen.i2r.a-star.edu.sg/multipred/.
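The A_ROC figure can be computed without drawing the curve, as the probability that a randomly chosen binder scores higher than a randomly chosen non-binder (the Mann-Whitney formulation, with ties counted as one half). The scores and labels below are hypothetical, and the quadratic pair loop is written for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """A_ROC as P(score of a binder > score of a non-binder), ties counted as 1/2."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one binder and one non-binder")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted binding scores; label 1 = experimentally confirmed binder.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

An A_ROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported A_ROC > 0.80 means binders usually outrank non-binders.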