985 results for Mathematical prediction.
Abstract:
Background: The majority of peptide bonds in proteins occur in the trans conformation. For proline residues, however, a considerable fraction of prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization in proteins would therefore have many important applications towards the understanding of protein structure and function. Results: In this paper, we propose a new approach to predict proline cis/trans isomerization in proteins using a support vector machine (SVM). Preliminary results indicated that Radial Basis Function (RBF) kernels lead to better prediction performance than polynomial and linear kernel functions. We used single-sequence information with different local window sizes, amino acid compositions of different local sequences, multiple sequence alignments obtained from PSI-BLAST, and secondary structure information predicted by PSIPRED, and explored these different sequence encoding schemes to investigate their effects on prediction performance. Training and testing were performed on a newly enlarged dataset of 2424 non-homologous proteins determined by X-ray diffraction, using 5-fold cross-validation. A window size of 11 provided the best performance for predicting proline cis/trans isomerization from the single amino acid sequence. Using multiple sequence alignments in the form of PSI-BLAST profiles significantly improved prediction performance: accuracy increased from 62.8% with the single sequence to 69.8%, and the Matthews Correlation Coefficient (MCC) improved from 0.26 to 0.40. Furthermore, when coupled with the secondary structure predicted by PSIPRED, our method yielded a prediction accuracy of 71.5% and an MCC of 0.43, respectively 9% and 0.17 higher than those achieved with single-sequence information alone. Conclusion: A new method has been developed to predict proline cis/trans isomerization in proteins based on a support vector machine, using the single amino acid sequence with different local window sizes, the amino acid compositions of local sequences flanking the central proline residue, position-specific scoring matrices (PSSMs) extracted by PSI-BLAST, and predicted secondary structures generated by PSIPRED. The successful application of the SVM approach in this study reinforces that SVM is a powerful tool for predicting proline cis/trans isomerization and for biological sequence analysis in general.
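The classification setup described above can be illustrated with a minimal, hypothetical sketch (not the authors' pipeline or dataset): an RBF-kernel SVM trained on one-hot encodings of fixed-length sequence windows centred on proline residues, using scikit-learn.

```python
# Minimal sketch (hypothetical data, not the authors' pipeline): an RBF-kernel
# SVM over one-hot encoded sequence windows centred on proline residues.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode_window(window):
    """One-hot encode a fixed-length sequence window (e.g. 11 residues)."""
    vec = np.zeros(len(window) * len(AMINO_ACIDS))
    for pos, aa in enumerate(window):
        if aa in AA_INDEX:                      # skip gaps / unknown residues
            vec[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return vec

# Hypothetical toy data: windows of width 11 around a central proline,
# labelled 1 for cis and 0 for trans.
windows = ["ACDEFPGHIKL", "LMNPQPRSTVW", "GGGGGPGGGGG", "AAAAAPAAAAA"]
labels  = [1, 0, 0, 1]

X = np.array([encode_window(w) for w in windows])
y = np.array(labels)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # RBF kernel, as in the abstract
scores = cross_val_score(clf, X, y, cv=2)        # 5-fold CV in the paper; 2 here for the toy set
print("CV accuracy:", scores.mean())
```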
Abstract:
Previous studies have enabled exact prediction of probabilities of identity-by-descent (IBD) in random-mating populations for a few loci (up to four or so), with extension to more loci using approximate regression methods. Here we present a precise predictor of multiple-locus IBD using simple formulas based on exact results for two loci. In particular, the probability of non-IBD, X_ABC, at each of the ordered loci A, B, and C can be well approximated by X_ABC = X_AB X_BC / X_B, which generalizes to X_{12...k} = X_{12} X_{23} ... X_{k-1,k} / X^{k-2}, where X is the probability of non-IBD at each locus. Predictions from this chain rule are very precise with population bottlenecks and migration, but rather poorer in the presence of mutation. From these coefficients, the probabilities of multilocus IBD and non-IBD can also be computed for genomic regions as functions of population size, time, and map distances. An approximate but simple recurrence formula is also developed; it is generally less accurate than the chain rule but more robust with mutation. Used together with the chain rule, it leads to explicit equations for non-IBD in a region. The results can be applied to the detection of quantitative trait loci (QTL) by computing the probability of IBD at candidate loci in terms of identity-by-state at neighboring markers.
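The chain rule lends itself to a short worked example. The sketch below (hypothetical probabilities, not results from the paper) computes X_{12...k} from the adjacent two-locus non-IBD probabilities and the single-locus non-IBD probability X.

```python
# Minimal sketch of the chain-rule approximation for multi-locus non-IBD:
# X_{1..k} ~ (X_12 * X_23 * ... * X_{k-1,k}) / X**(k-2),
# where X is the single-locus non-IBD probability and the X_{i,i+1}
# are exact two-locus non-IBD probabilities for adjacent loci.
from math import prod

def non_ibd_chain(pairwise, single_locus):
    """pairwise: adjacent two-locus non-IBD probabilities X_{i,i+1};
    single_locus: single-locus non-IBD probability X."""
    k = len(pairwise) + 1                 # number of loci
    return prod(pairwise) / single_locus ** (k - 2)

# Hypothetical numbers for three ordered loci A, B, C:
print(non_ibd_chain([0.90, 0.88], 0.95))  # X_ABC ~ X_AB * X_BC / X_B
```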
Abstract:
Objective: Menopause is the consequence of exhaustion of the ovarian follicular pool. AMH, an indirect hormonal marker of ovarian reserve, has recently been proposed as a predictor of age at menopause. Since BMI and smoking status are relevant independent factors associated with age at menopause, we evaluated whether a model including all three of these variables could improve AMH-based prediction of age at menopause. Methods: In the present cohort study, participants were 375 eumenorrheic women aged 19–44 years and a sample of 2,635 Italian menopausal women. AMH values were obtained from the eumenorrheic women. Results: Regression analysis of the AMH data showed that a quadratic function of age provided a good description of these data plotted on a logarithmic scale, with a distribution of residual deviates that was not normal but showed significant left-skewness. Under the hypothesis that menopause can be predicted by AMH dropping below a critical threshold, a model predicting menopausal age was constructed from the AMH regression model and applied to the data on menopause. With the AMH threshold dependent on the covariates BMI and smoking status, the effects of these covariates were shown to be highly significant. Conclusions: In the present study we confirmed the good level of conformity between the distributions of observed and AMH-predicted ages at menopause, and showed that using BMI and smoking status as additional variables improves AMH-based prediction of age at menopause.
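A minimal sketch of the modelling idea, on hypothetical data and with a hypothetical AMH threshold: fit a quadratic in age to log(AMH) and read off the age at which the fitted curve drops below the threshold.

```python
# Minimal sketch (assumed form, not the authors' fitted model): a quadratic
# regression of log(AMH) on age, plus a threshold rule for predicted
# menopausal age where the fitted curve drops below a critical AMH level.
import numpy as np

# Hypothetical data: ages (years) and AMH (ng/mL) for eumenorrheic women.
age = np.array([25, 28, 31, 34, 37, 40, 43])
amh = np.array([4.2, 3.8, 3.1, 2.4, 1.6, 0.9, 0.4])

# Quadratic fit on the log scale, as described in the abstract.
coefs = np.polyfit(age, np.log(amh), deg=2)
log_amh_hat = np.poly1d(coefs)

def predicted_menopause_age(threshold=0.05, ages=np.arange(30, 65, 0.1)):
    """Age at which predicted AMH first falls below the (hypothetical) threshold."""
    below = ages[np.exp(log_amh_hat(ages)) < threshold]
    return below[0] if below.size else None

print(predicted_menopause_age())
```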
Abstract:
Nitrous oxide (N2O) is one of the greenhouse gases that contribute to global warming. Spatial variability of N2O can lead to large uncertainties in prediction. However, previous studies have often ignored spatial dependency when quantifying the relationships between N2O and environmental factors. Few studies have examined the impacts of various spatial correlation structures (e.g. independence, distance-based, and neighbourhood-based) on spatial prediction of N2O emissions. This study aimed to assess the impact of three spatial correlation structures on spatial predictions and to calibrate the spatial prediction using Bayesian model averaging (BMA) based on replicated, irregular point-referenced data. The data were measured in 17 chambers randomly placed across a 271 m² field between October 2007 and September 2008 in the southeast of Australia. We used a Bayesian geostatistical model and a Bayesian spatial conditional autoregressive (CAR) model to investigate and accommodate spatial dependency, and to estimate the effects of environmental variables on N2O emissions across the study site. We compared these with a Bayesian regression model with independent errors. The three approaches resulted in different derived maps of spatial prediction of N2O emissions. We found that incorporating spatial dependency in the model not only substantially improved predictions of N2O emission from soil, but also better quantified the uncertainties of soil parameters in the study. The hybrid model structure obtained by BMA improved the accuracy of spatial prediction of N2O emissions across this study region.
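A minimal sketch of the model-combination step only (not the full Bayesian geostatistical or CAR fits): Bayesian model averaging of the three models' point predictions, with posterior model weights approximated from hypothetical BIC values, a common approximation to the marginal likelihood.

```python
# Minimal sketch (not the authors' full Bayesian workflow): combining
# spatial-model predictions by Bayesian model averaging, with posterior
# model weights approximated from BIC values.
import numpy as np

def bma_weights(bic_values):
    """Approximate posterior model probabilities from BIC (lower BIC = better)."""
    bic = np.asarray(bic_values, dtype=float)
    delta = bic - bic.min()
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Hypothetical BICs for independent-error, geostatistical and CAR models,
# and their point predictions of N2O emission at one new location.
bics = [412.3, 398.7, 401.2]
preds = np.array([2.1, 2.6, 2.4])

w = bma_weights(bics)
print("weights:", w.round(3), "BMA prediction:", float(w @ preds))
```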
Abstract:
The study investigated the influence of traffic and land use parameters on metal build-up on urban road surfaces. Mathematical relationships were developed to predict metals originating from fuel combustion and vehicle wear. The analysis found that nickel and chromium originate from exhaust emissions; lead, copper and zinc from vehicle wear; cadmium from both exhaust and wear; and manganese from geogenic sources. Land use does not demonstrate a clear pattern in relation to the metal build-up process, though its inherent characteristics, such as traffic activities, exert an influence. The equation derived for the fuel-related metal load has a high cross-validated coefficient of determination (Q²) and a low Standard Error of Cross-Validation (SECV), indicating that the model is reliable, while the equation derived for the wear-related metal load has a low Q² and a high SECV, suggesting its use only in preliminary investigations. The Relative Prediction Error values for both equations are well within the error limits for a complex system such as an urban road surface. These equations will be beneficial for developing reliable stormwater treatment strategies in urban areas that specifically focus on the mitigation of metal pollution.
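The two diagnostics cited, Q² and SECV, can be computed with a short leave-one-out cross-validation sketch; the predictors, responses and the linear model form used here are hypothetical stand-ins.

```python
# Minimal sketch (hypothetical data, generic linear model): leave-one-out
# cross-validated Q^2 and Standard Error of Cross-Validation (SECV).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Hypothetical predictors (e.g. traffic volume, congestion) and metal load.
X = np.array([[1200, 0.3], [3400, 0.5], [5600, 0.7], [800, 0.2],
              [4300, 0.6], [2500, 0.4], [6100, 0.8], [1500, 0.35]])
y = np.array([0.8, 1.9, 3.1, 0.5, 2.4, 1.4, 3.5, 1.0])

y_cv = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())

press = np.sum((y - y_cv) ** 2)                  # predictive residual sum of squares
q2 = 1.0 - press / np.sum((y - y.mean()) ** 2)   # cross-validated R^2 (Q^2)
secv = np.sqrt(press / len(y))                   # standard error of cross-validation

print(f"Q2 = {q2:.3f}, SECV = {secv:.3f}")
```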
Abstract:
Accurate prediction of incident duration is not only important information for a Traffic Incident Management System, but also an effective input for travel time prediction. In this paper, hazard-based prediction models are developed for both incident clearance time and arrival time. The data are obtained from the Queensland Department of Transport and Main Roads' STREAMS Incident Management System (SIMS) for one year ending in November 2010. The best-fitting distributions are identified for both clearance and arrival time for three types of incident: crash, stationary vehicle, and hazard. The results show that the Gamma, Log-logistic, and Weibull distributions are the best fits for crash, stationary vehicle, and hazard incidents, respectively. The significant impact factors are identified for crash clearance time and arrival time, and their quantitative influences on crash and hazard incidents are presented for both clearance and arrival times. The model accuracy is analyzed at the end.
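A minimal sketch of the distribution-fitting step, on hypothetical clearance times: the three candidate families named above are fitted with scipy and compared by AIC (the paper's own selection criterion is not stated here, so AIC is an assumption).

```python
# Minimal sketch (hypothetical durations): fitting the candidate parametric
# duration distributions named in the abstract (Gamma, Log-logistic, Weibull)
# to incident clearance times and comparing them by AIC.
import numpy as np
from scipy import stats

# Hypothetical clearance times in minutes for one incident type.
durations = np.array([12, 25, 31, 44, 18, 55, 39, 27, 61, 22, 35, 48])

candidates = {
    "gamma": stats.gamma,
    "log-logistic": stats.fisk,     # scipy's name for the log-logistic
    "weibull": stats.weibull_min,
}

for name, dist in candidates.items():
    params = dist.fit(durations, floc=0)          # fix location at zero
    loglik = np.sum(dist.logpdf(durations, *params))
    aic = 2 * (len(params) - 1) - 2 * loglik      # location is fixed, not estimated
    print(f"{name:13s} AIC = {aic:.1f}")
```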
Abstract:
This paper evaluates the performance of prediction intervals generated from alternative time series models, in the context of tourism forecasting. The forecasting methods considered include the autoregressive (AR) model, the AR model using the bias-corrected bootstrap, seasonal ARIMA models, innovations state space models for exponential smoothing, and Harvey’s structural time series models. We use thirteen monthly time series for the number of tourist arrivals to Hong Kong and Australia. The mean coverage rates and widths of the alternative prediction intervals are evaluated in an empirical setting. It is found that all models produce satisfactory prediction intervals, except for the autoregressive model. In particular, the intervals based on the bias-corrected bootstrap perform best in general, providing tight intervals with accurate coverage rates, especially when the forecast horizon is long.
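The two evaluation criteria, coverage rate and interval width, reduce to a few lines of code; the intervals and outcomes below are hypothetical, and the nominal level is assumed to be 95%.

```python
# Minimal sketch: empirical coverage rate and mean width of a set of
# prediction intervals, computed against realised (hypothetical) arrivals.
import numpy as np

# Hypothetical nominal-95% prediction intervals (lower, upper) and outcomes.
lower = np.array([410, 395, 430, 450, 470])
upper = np.array([520, 505, 545, 570, 600])
actual = np.array([460, 512, 440, 560, 480])

coverage = np.mean((actual >= lower) & (actual <= upper))
mean_width = np.mean(upper - lower)

print(f"coverage rate = {coverage:.2f}, mean interval width = {mean_width:.1f}")
```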
Abstract:
Protein adsorption at solid-liquid interfaces is critical to many applications, including biomaterials, protein microarrays and lab-on-a-chip devices. Despite this general interest, and a large amount of research over the last half-century, protein adsorption cannot be predicted with engineering-level, design-oriented accuracy. Here we describe a Biomolecular Adsorption Database (BAD), freely available online, which archives published protein adsorption data. Piecewise linear regression with breakpoint applied to the data in the BAD suggests that the input variables to protein adsorption, i.e., protein concentration in solution; protein descriptors derived from primary structure (number of residues, global protein hydrophobicity and range of amino acid hydrophobicity, isoelectric point); surface descriptors (contact angle); and fluid environment descriptors (pH, ionic strength), correlate well with the output variable, the protein concentration on the surface. Furthermore, neural network analysis revealed that the size of the BAD makes it sufficiently representative, with a neural network-based predictive error of 5% or less. Interestingly, a consistently better fit is obtained if the BAD is divided into two separate sub-sets representing protein adsorption on hydrophilic and hydrophobic surfaces, respectively. Based on these findings, selected entries from the BAD have been used to construct neural network-based estimation routines, which predict the amount of adsorbed protein, the thickness of the adsorbed layer and the surface tension of the protein-covered surface. While the BAD is of general interest, the prediction of the thickness and surface tension of the protein-covered layers is of particular relevance to the design of microfluidic devices.
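A minimal sketch of the neural-network estimation idea (not the published BAD routines): a small multilayer-perceptron regressor mapping descriptors of the kind listed above to adsorbed amount, on hypothetical data.

```python
# Minimal sketch (hypothetical data, not the BAD estimation routines): a small
# neural-network regressor from adsorption descriptors to adsorbed amount.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical rows: [conc (mg/mL), n_residues, hydrophobicity, pI, contact angle, pH, ionic strength]
X = np.array([
    [0.1, 150, -0.2, 5.1, 20, 7.4, 0.15],
    [1.0, 300,  0.1, 6.8, 75, 7.0, 0.10],
    [0.5, 580, -0.4, 4.7, 40, 7.4, 0.15],
    [2.0, 150, -0.2, 5.1, 75, 5.0, 0.05],
    [0.2, 300,  0.1, 6.8, 20, 7.4, 0.20],
    [1.5, 580, -0.4, 4.7, 40, 6.0, 0.15],
])
y = np.array([0.8, 2.1, 1.4, 2.9, 0.9, 2.3])   # adsorbed amount (mg/m^2), hypothetical

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0))
model.fit(X, y)
print(model.predict(X[:2]))
```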
Abstract:
Most standard algorithms for prediction with expert advice depend on a parameter called the learning rate. This learning rate needs to be large enough to fit the data well, but small enough to prevent overfitting. For the exponential weights algorithm, a sequence of prior work has established theoretical guarantees for higher and higher data-dependent tunings of the learning rate, which allow for increasingly aggressive learning. But in practice such theoretical tunings often still perform worse (as measured by their regret) than ad hoc tuning with an even higher learning rate. To close the gap between theory and practice, we introduce an approach to learn the learning rate. Up to a factor that is at most (poly)logarithmic in the number of experts and the inverse of the learning rate, our method performs as well as if we knew the empirically best learning rate from a large range that includes both conservative small values and values much higher than those for which formal guarantees were previously available. Our method employs a grid of learning rates, yet runs in linear time regardless of the size of the grid.
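For context, the sketch below runs the standard exponential-weights forecaster over a grid of learning rates on a hypothetical loss sequence; it illustrates the sensitivity to the learning rate rather than the paper's tuning method.

```python
# Minimal sketch (not the paper's algorithm): the standard exponential-weights
# forecaster evaluated over a grid of learning rates on hypothetical losses.
import numpy as np

rng = np.random.default_rng(0)
T, K = 200, 5
losses = rng.uniform(0, 1, size=(T, K))         # hypothetical per-expert losses in [0, 1]

def hedge_loss(losses, eta):
    """Total loss of exponential weights with learning rate eta."""
    K = losses.shape[1]
    log_w = np.zeros(K)
    total = 0.0
    for loss_t in losses:
        p = np.exp(log_w - log_w.max())          # normalise in a numerically stable way
        p /= p.sum()
        total += p @ loss_t                      # expected loss of the mixture
        log_w -= eta * loss_t                    # multiplicative-weights update
    return total

best_expert = losses.sum(axis=0).min()
for eta in [0.01, 0.1, 0.5, 1.0, 4.0]:           # grid of learning rates
    print(f"eta={eta:4.2f}  regret = {hedge_loss(losses, eta) - best_expert:.2f}")
```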
Abstract:
The ability to estimate the expected Remaining Useful Life (RUL) is critical to reducing maintenance costs, operational downtime and safety hazards. In most industries, reliability analysis is based on Reliability Centred Maintenance (RCM) and lifetime distribution models. In these models, the lifetime of an asset is estimated using failure time data; however, statistically sufficient failure time data are often difficult to obtain in practice due to fixed time-based replacement and the small populations of identical assets. When condition indicator data are available in addition to failure time data, one alternative to the traditional reliability models is Condition-Based Maintenance (CBM), and covariate-based hazard modelling is one of the CBM approaches. A number of covariate-based hazard models exist; however, little work has been done to evaluate the performance of these models in asset life prediction under various condition indicators and levels of data availability. This paper reviews two covariate-based hazard models, the Proportional Hazard Model (PHM) and the Proportional Covariate Model (PCM). To assess the models’ performance, the expected RUL is compared to the actual RUL. The outcomes demonstrate that both models achieve convincingly good results in RUL prediction; however, PCM has a smaller absolute prediction error, and PHM shows a tendency to over-smooth sudden changes in the condition data compared to PCM. Moreover, the case studies show that PCM is not biased in the case of small sample sizes.
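A minimal sketch of covariate-based hazard modelling and RUL error assessment, using the lifelines library's Cox proportional hazards fitter on hypothetical run-to-failure data (this is a generic PHM-style illustration, not the authors' PHM or PCM implementations).

```python
# Minimal sketch (hypothetical data, lifelines library rather than the authors'
# code): fit a covariate-based proportional hazards model and compare expected
# versus actual lifetimes by absolute error.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical run-to-failure records: failure time (hours), event indicator,
# and a condition indicator (e.g. mean vibration level).
df = pd.DataFrame({
    "time":      [120, 150, 90, 200, 170, 110, 140, 95],
    "event":     [1,   1,   1,  1,   1,   1,   1,   1],
    "vibration": [0.8, 0.5, 1.1, 0.4, 0.6, 0.7, 0.9, 0.5],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")

# Expected lifetime per asset from the fitted model vs. observed lifetime.
expected = cph.predict_expectation(df)
abs_error = np.abs(expected.values.ravel() - df["time"].values)
print("mean absolute prediction error:", abs_error.mean().round(1))
```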
Abstract:
If the land sector is to make significant contributions to mitigating anthropogenic greenhouse gas (GHG) emissions in the coming decades, it must do so while concurrently expanding the production of food and fiber. In our view, mathematical modeling will be required to provide scientific guidance to meet this challenge. To be useful in GHG mitigation policy measures, models must simultaneously meet scientific, software engineering, and human capacity requirements. They can be used to understand GHG fluxes, to evaluate proposed GHG mitigation actions, and to predict and monitor the effects of specific actions; the latter applications require a change in mindset that has parallels with the shift from research modeling to decision support. We compare and contrast six agro-ecosystem models (FullCAM, DayCent, DNDC, APSIM, WNMM, and AgMod), chosen because they are used in Australian agriculture and forestry. Underlying structural similarities in the representations of carbon flows through plants and soils in these models are complemented by a diverse range of emphases and approaches to the subprocesses within the agro-ecosystem. None of these agro-ecosystem models handles all land sector GHG fluxes, and considerable model-based uncertainty exists for soil C fluxes and enteric methane emissions. The models also show diverse approaches to the initialisation of model simulations, software implementation, distribution, licensing, and software quality assurance; each of these will differentially affect their usefulness for policy-driven GHG mitigation prediction and monitoring. Specific requirements imposed on the use of models by Australian mitigation policy settings are discussed, and areas for further scientific development of agro-ecosystem models for use in GHG mitigation policy are proposed.