884 resultados para Prediction error method
Resumo:
A modified UNIQUAC model has been extended to describe and predict the equilibrium relative humidity and moisture content for wood. The method is validated over a range of moisture content from oven-dried state to fiber saturation point, and over a temperature range of 20-70 degrees C. Adjustable parameters and binary interaction parameters of the UNIQUAC model were estimated from experimental data for Caribbean pine and Hoop pine as well as data available in the literature. The two group-interaction parameters for the wood-moisture system were consistent with using function group contributions for H2O, -OH and -CHO. The result reconfirms that the main contributors to water adsorption in cell walls are the hydroxyl groups of the carbohydrates in cellulose and hemicelluloses. This provides some physical insight into the intermolecular force and energy between bound water and the wood material. (c) 2006 Elsevier Ltd. All rights reserved.
Resumo:
Background: The residue-wise contact order (RWCO) describes the sequence separations between the residues of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable important information to reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. Results: We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. Conclusion: The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
Resumo:
We propose a novel interpretation and usage of Neural Network (NN) in modeling physiological signals, which are allowed to be nonlinear and/or nonstationary. The method consists of training a NN for the k-step prediction of a physiological signal, and then examining the connection-weight-space (CWS) of the NN to extract information about the signal generator mechanism. We de. ne a novel feature, Normalized Vector Separation (gamma(ij)), to measure the separation of two arbitrary states i and j in the CWS and use it to track the state changes of the generating system. The performance of the method is examined via synthetic signals and clinical EEG. Synthetic data indicates that gamma(ij) can track the system down to a SNR of 3.5 dB. Clinical data obtained from three patients undergoing carotid endarterectomy of the brain showed that EEG could be modeled (within a root-means-squared-error of 0.01) by the proposed method, and the blood perfusion state of the brain could be monitored via gamma(ij), with small NNs having no more than 21 connection weight altogether.
Resumo:
Univariate linkage analysis is used routinely to localise genes for human complex traits. Often, many traits are analysed but the significance of linkage for each trait is not corrected for multiple trait testing, which increases the experiment-wise type-I error rate. In addition, univariate analyses do not realise the full power provided by multivariate data sets. Multivariate linkage is the ideal solution but it is computationally intensive, so genome-wide analysis and evaluation of empirical significance are often prohibitive. We describe two simple methods that efficiently alleviate these caveats by combining P-values from multiple univariate linkage analyses. The first method estimates empirical pointwise and genome-wide significance between one trait and one marker when multiple traits have been tested. It is as robust as an appropriate Bonferroni adjustment, with the advantage that no assumptions are required about the number of independent tests performed. The second method estimates the significance of linkage between multiple traits and one marker and, therefore, it can be used to localise regions that harbour pleiotropic quantitative trait loci (QTL). We show that this method has greater power than individual univariate analyses to detect a pleiotropic QTL across different situations. In addition, when traits are moderately correlated and the QTL influences all traits, it can outperform formal multivariate VC analysis. This approach is computationally feasible for any number of traits and was not affected by the residual correlation between traits. We illustrate the utility of our approach with a genome scan of three asthma traits measured in families with a twin proband.
Resumo:
Background: Determination of the subcellular location of a protein is essential to understanding its biochemical function. This information can provide insight into the function of hypothetical or novel proteins. These data are difficult to obtain experimentally but have become especially important since many whole genome sequencing projects have been finished and many resulting protein sequences are still lacking detailed functional information. In order to address this paucity of data, many computational prediction methods have been developed. However, these methods have varying levels of accuracy and perform differently based on the sequences that are presented to the underlying algorithm. It is therefore useful to compare these methods and monitor their performance. Results: In order to perform a comprehensive survey of prediction methods, we selected only methods that accepted large batches of protein sequences, were publicly available, and were able to predict localization to at least nine of the major subcellular locations (nucleus, cytosol, mitochondrion, extracellular region, plasma membrane, Golgi apparatus, endoplasmic reticulum (ER), peroxisome, and lysosome). The selected methods were CELLO, MultiLoc, Proteome Analyst, pTarget and WoLF PSORT. These methods were evaluated using 3763 mouse proteins from SwissProt that represent the source of the training sets used in development of the individual methods. In addition, an independent evaluation set of 2145 mouse proteins from LOCATE with a bias towards the subcellular localization underrepresented in SwissProt was used. The sensitivity and specificity were calculated for each method and compared to a theoretical value based on what might be observed by random chance. Conclusion: No individual method had a sufficient level of sensitivity across both evaluation sets that would enable reliable application to hypothetical proteins. All methods showed lower performance on the LOCATE dataset and variable performance on individual subcellular localizations was observed. Proteins localized to the secretory pathway were the most difficult to predict, while nuclear and extracellular proteins were predicted with the highest sensitivity.
Resumo:
Non-technical losses (NTL) identification and prediction are important tasks for many utilities. Data from customer information system (CIS) can be used for NTL analysis. However, in order to accurately and efficiently perform NTL analysis, the original data from CIS need to be pre-processed before any detailed NTL analysis can be carried out. In this paper, we propose a feature selection based method for CIS data pre-processing in order to extract the most relevant information for further analysis such as clustering and classifications. By removing irrelevant and redundant features, feature selection is an essential step in data mining process in finding optimal subset of features to improve the quality of result by giving faster time processing, higher accuracy and simpler results with fewer features. Detailed feature selection analysis is presented in the paper. Both time-domain and load shape data are compared based on the accuracy, consistency and statistical dependencies between features.
Resumo:
We present a novel method for prediction of the onset of a spontaneous (paroxysmal) atrial fibrilation episode by representing the electrocardiograph (ECG) output as two time series corresponding to the interbeat intervals and the lengths of the atrial component of the ECG. We will then show how different entropy measures can be calulated from both of these series and then combined in a neural network trained using the Bayesian evidence procedure to form and effective predictive classifier.
Resumo:
The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. This leads in to a more general discussion of Gaussian processes in section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models and the use of Gaussian processes in classification problems.
Resumo:
We investigate the performance of parity check codes using the mapping onto spin glasses proposed by Sourlas. We study codes where each parity check comprises products of K bits selected from the original digital message with exactly C parity checks per message bit. We show, using the replica method, that these codes saturate Shannon's coding bound for K?8 when the code rate K/C is finite. We then examine the finite temperature case to asses the use of simulated annealing methods for decoding, study the performance of the finite K case and extend the analysis to accommodate different types of noisy channels. The analogy between statistical physics methods and decoding by belief propagation is also discussed.
Resumo:
The performance of Gallager's error-correcting code is investigated via methods of statistical physics. In this method, the transmitted codeword comprises products of the original message bits selected by two randomly-constructed sparse matrices; the number of non-zero row/column elements in these matrices constitutes a family of codes. We show that Shannon's channel capacity is saturated for many of the codes while slightly lower performance is obtained for others which may be of higher practical relevance. Decoding aspects are considered by employing the TAP approach which is identical to the commonly used belief-propagation-based decoding.
Resumo:
An exact solution to a family of parity check error-correcting codes is provided by mapping the problem onto a Husimi cactus. The solution obtained in the thermodynamic limit recovers the replica-symmetric theory results and provides a very good approximation to finite systems of moderate size. The probability propagation decoding algorithm emerges naturally from the analysis. A phase transition between decoding success and failure phases is found to coincide with an information-theoretic upper bound. The method is employed to compare Gallager and MN codes.
Resumo:
Statistical physics is employed to evaluate the performance of error-correcting codes in the case of finite message length for an ensemble of Gallager's error correcting codes. We follow Gallager's approach of upper-bounding the average decoding error rate, but invoke the replica method to reproduce the tightest general bound to date, and to improve on the most accurate zero-error noise level threshold reported in the literature. The relation between the methods used and those presented in the information theory literature are explored.
Resumo:
We analyse Gallager codes by employing a simple mean-field approximation that distorts the model geometry and preserves important interactions between sites. The method naturally recovers the probability propagation decoding algorithm as a minimization of a proper free-energy. We find a thermodynamical phase transition that coincides with information theoretical upper-bounds and explain the practical code performance in terms of the free-energy landscape.
Resumo:
We develop an approach for sparse representations of Gaussian Process (GP) models (which are Bayesian types of kernel machines) in order to overcome their limitations for large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the GP model. By using an appealing parametrisation and projection techniques that use the RKHS norm, recursions for the effective parameters and a sparse Gaussian approximation of the posterior process are obtained. This allows both for a propagation of predictions as well as of Bayesian error measures. The significance and robustness of our approach is demonstrated on a variety of experiments.
Resumo:
Using the magnetization enumerator method, we evaluate the practical and theoretical limitations of symmetric channels with real outputs. Results are presented for several regular Gallager code constructions.