904 resultados para Random coefficient multinomial logit
Resumo:
Matheron's usual variogram estimator can result in unreliable variograms when data are strongly asymmetric or skewed. Asymmetry in a distribution can arise from a long tail of values in the underlying process or from outliers that belong to another population that contaminate the primary process. This paper examines the effects of underlying asymmetry on the variogram and on the accuracy of prediction, and the second one examines the effects arising from outliers. Standard geostatistical texts suggest ways of dealing with underlying asymmetry; however, this is based on informed intuition rather than detailed investigation. To determine whether the methods generally used to deal with underlying asymmetry are appropriate, the effects of different coefficients of skewness on the shape of the experimental variogram and on the model parameters were investigated. Simulated annealing was used to create normally distributed random fields of different size from variograms with different nugget:sill ratios. These data were then modified to give different degrees of asymmetry and the experimental variogram was computed in each case. The effects of standard data transformations on the form of the variogram were also investigated. Cross-validation was used to assess quantitatively the performance of the different variogram models for kriging. The results showed that the shape of the variogram was affected by the degree of asymmetry, and that the effect increased as the size of data set decreased. Transformations of the data were more effective in reducing the skewness coefficient in the larger sets of data. Cross-validation confirmed that variogram models from transformed data were more suitable for kriging than were those from the raw asymmetric data. The results of this study have implications for the 'standard best practice' in dealing with asymmetry in data for geostatistical analyses. (C) 2007 Elsevier Ltd. All rights reserved.
Resumo:
Asymmetry in a distribution can arise from a long tail of values in the underlying process or from outliers that belong to another population that contaminate the primary process. The first paper of this series examined the effects of the former on the variogram and this paper examines the effects of asymmetry arising from outliers. Simulated annealing was used to create normally distributed random fields of different size that are realizations of known processes described by variograms with different nugget:sill ratios. These primary data sets were then contaminated with randomly located and spatially aggregated outliers from a secondary process to produce different degrees of asymmetry. Experimental variograms were computed from these data by Matheron's estimator and by three robust estimators. The effects of standard data transformations on the coefficient of skewness and on the variogram were also investigated. Cross-validation was used to assess the performance of models fitted to experimental variograms computed from a range of data contaminated by outliers for kriging. The results showed that where skewness was caused by outliers the variograms retained their general shape, but showed an increase in the nugget and sill variances and nugget:sill ratios. This effect was only slightly more for the smallest data set than for the two larger data sets and there was little difference between the results for the latter. Overall, the effect of size of data set was small for all analyses. The nugget:sill ratio showed a consistent decrease after transformation to both square roots and logarithms; the decrease was generally larger for the latter, however. Aggregated outliers had different effects on the variogram shape from those that were randomly located, and this also depended on whether they were aggregated near to the edge or the centre of the field. The results of cross-validation showed that the robust estimators and the removal of outliers were the most effective ways of dealing with outliers for variogram estimation and kriging. (C) 2007 Elsevier Ltd. All rights reserved.
Resumo:
[1] Cloud cover is conventionally estimated from satellite images as the observed fraction of cloudy pixels. Active instruments such as radar and Lidar observe in narrow transects that sample only a small percentage of the area over which the cloud fraction is estimated. As a consequence, the fraction estimate has an associated sampling uncertainty, which usually remains unspecified. This paper extends a Bayesian method of cloud fraction estimation, which also provides an analytical estimate of the sampling error. This method is applied to test the sensitivity of this error to sampling characteristics, such as the number of observed transects and the variability of the underlying cloud field. The dependence of the uncertainty on these characteristics is investigated using synthetic data simulated to have properties closely resembling observations of the spaceborne Lidar NASA-LITE mission. Results suggest that the variance of the cloud fraction is greatest for medium cloud cover and least when conditions are mostly cloudy or clear. However, there is a bias in the estimation, which is greatest around 25% and 75% cloud cover. The sampling uncertainty is also affected by the mean lengths of clouds and of clear intervals; shorter lengths decrease uncertainty, primarily because there are more cloud observations in a transect of a given length. Uncertainty also falls with increasing number of transects. Therefore a sampling strategy aimed at minimizing the uncertainty in transect derived cloud fraction will have to take into account both the cloud and clear sky length distributions as well as the cloud fraction of the observed field. These conclusions have implications for the design of future satellite missions. This paper describes the first integrated methodology for the analytical assessment of sampling uncertainty in cloud fraction observations from forthcoming spaceborne radar and Lidar missions such as NASA's Calipso and CloudSat.
Resumo:
A parallel hardware random number generator for use with a VLSI genetic algorithm processing device is proposed. The design uses an systolic array of mixed congruential random number generators. The generators are constantly reseeded with the outputs of the proceeding generators to avoid significant biasing of the randomness of the array which would result in longer times for the algorithm to converge to a solution. 1 Introduction In recent years there has been a growing interest in developing hardware genetic algorithm devices [1, 2, 3]. A genetic algorithm (GA) is a stochastic search and optimization technique which attempts to capture the power of natural selection by evolving a population of candidate solutions by a process of selection and reproduction [4]. In keeping with the evolutionary analogy, the solutions are called chromosomes with each chromosome containing a number of genes. Chromosomes are commonly simple binary strings, the bits being the genes.
Resumo:
We propose a novel method for scoring the accuracy of protein binding site predictions – the Binding-site Distance Test (BDT) score. Recently, the Matthews Correlation Coefficient (MCC) has been used to evaluate binding site predictions, both by developers of new methods and by the assessors for the community wide prediction experiment – CASP8. Whilst being a rigorous scoring method, the MCC does not take into account the actual 3D location of the predicted residues from the observed binding site. Thus, an incorrectly predicted site that is nevertheless close to the observed binding site will obtain an identical score to the same number of nonbinding residues predicted at random. The MCC is somewhat affected by the subjectivity of determining observed binding residues and the ambiguity of choosing distance cutoffs. By contrast the BDT method produces continuous scores ranging between 0 and 1, relating to the distance between the predicted and observed residues. Residues predicted close to the binding site will score higher than those more distant, providing a better reflection of the true accuracy of predictions. The CASP8 function predictions were evaluated using both the MCC and BDT methods and the scores were compared. The BDT was found to strongly correlate with the MCC scores whilst also being less susceptible to the subjectivity of defining binding residues. We therefore suggest that this new simple score is a potentially more robust method for future evaluations of protein-ligand binding site predictions.
Resumo:
The purpose of this study was to improve the prediction of the quantity and type of Volatile Fatty Acids (VFA) produced from fermented substrate in the rumen of lactating cows. A model was formulated that describes the conversion of substrate (soluble carbohydrates, starch, hemi-cellulose, cellulose, and protein) into VFA (acetate, propionate, butyrate, and other VFA). Inputs to the model were observed rates of true rumen digestion of substrates, whereas outputs were observed molar proportions of VFA in rumen fluid. A literature survey generated data of 182 diets (96 roughage and 86 concentrate diets). Coefficient values that define the conversion of a specific substrate into VFA were estimated meta-analytically by regression of the model against observed VFA molar proportions using non-linear regression techniques. Coefficient estimates significantly differed for acetate and propionate production in particular, between different types of substrate and between roughage and concentrate diets. Deviations of fitted from observed VFA molar proportions could be attributed to random error for 100%. In addition to regression against observed data, simulation studies were performed to investigate the potential of the estimation method. Fitted coefficient estimates from simulated data sets appeared accurate, as well as fitted rates of VFA production, although the model accounted for only a small fraction (maximally 45%) of the variation in VFA molar proportions. The simulation results showed that the latter result was merely a consequence of the statistical analysis chosen and should not be interpreted as an indication of inaccuracy of coefficient estimates. Deviations between fitted and observed values corresponded to those obtained in simulations. (c) 2005 Elsevier Ltd. All rights reserved.
Hydrolyzable tannin structures influence relative globular and random coil protein binding strengths
Resumo:
Binding parameters for the interactions of pentagalloyl glucose (PGG) and four hydrolyzable tannins (representing gallotannins and ellagitannins) with gelatin and bovine serum albumin (BSA) have been determined from isothermal titration calorimetry data. Equilibrium binding constants determined for the interaction of PGG and isolated mixtures of tara gallotannins and of sumac gallotannins with gelatin and BSA were of the same order of magnitude for each tannin (in the range of 10(4)-10(5) M-1 for stronger binding sites when using a binding model consisting of two sets of multiple binding sites). In contrast, isolated mixtures of chestnut ellagitannins and of myrabolan ellagitannins exhibited 3-4 orders of magnitude greater equilibrium binding constants for the interaction with gelatin (similar to 2 x 10(6) M-1) than for that with BSA (similar to 8 x 10(2) M-1). Binding stoichiometries revealed that the stronger binding sites on gelatin outnumbered those on BSA by a ratio of at least similar to 2:1 for all of the hydrolyzable tannins studied. Overall, the data revealed that relative binding constants for the interactions with gelatin and BSA are dependent on the structural flexibility of the tannin molecule.
Resumo:
Grass-based diets are of increasing social-economic importance in dairy cattle farming, but their low supply of glucogenic nutrients may limit the production of milk. Current evaluation systems that assess the energy supply and requirements are based on metabolisable energy (ME) or net energy (NE). These systems do not consider the characteristics of the energy delivering nutrients. In contrast, mechanistic models take into account the site of digestion, the type of nutrient absorbed and the type of nutrient required for production of milk constituents, and may therefore give a better prediction of supply and requirement of nutrients. The objective of the present study is to compare the ability of three energy evaluation systems, viz. the Dutch NE system, the agricultural and food research council (AFRC) ME system, and the feed into milk (FIM) ME system, and of a mechanistic model based on Dijkstra et al. [Simulation of digestion in cattle fed sugar cane: prediction of nutrient supply for milk production with locally available supplements. J. Agric. Sci., Cambridge 127, 247-60] and Mills et al. [A mechanistic model of whole-tract digestion and methanogenesis in the lactating dairy cow: model development, evaluation and application. J. Anim. Sci. 79, 1584-97] to predict the feed value of grass-based diets for milk production. The dataset for evaluation consists of 41 treatments of grass-based diets (at least 0.75 g ryegrass/g diet on DM basis). For each model, the predicted energy or nutrient supply, based on observed intake, was compared with predicted requirement based on observed performance. Assessment of the error of energy or nutrient supply relative to requirement is made by calculation of mean square prediction error (MSPE) and by concordance correlation coefficient (CCC). All energy evaluation systems predicted energy requirement to be lower (6-11%) than energy supply. The root MSPE (expressed as a proportion of the supply) was lowest for the mechanistic model (0.061), followed by the Dutch NE system (0.082), FIM ME system (0.097) and AFRCME system(0.118). For the energy evaluation systems, the error due to overall bias of prediction dominated the MSPE, whereas for the mechanistic model, proportionally 0.76 of MSPE was due to random variation. CCC analysis confirmed the higher accuracy and precision of the mechanistic model compared with energy evaluation systems. The error of prediction was positively related to grass protein content for the Dutch NE system, and was also positively related to grass DMI level for all models. In conclusion, current energy evaluation systems overestimate energy supply relative to energy requirement on grass-based diets for dairy cattle. The mechanistic model predicted glucogenic nutrients to limit performance of dairy cattle on grass-based diets, and proved to be more accurate and precise than the energy systems. The mechanistic model could be improved by allowing glucose maintenance and utilization requirements parameters to be variable. (C) 2007 Elsevier B.V. All rights reserved.
Resumo:
Cedrus atlantica (Pinaceae) is a large and exceptionally long-lived conifer native to the Rif and Atlas Mountains of North Africa. To assess levels and patterns of genetic diversity of this species. samples were obtained throughout the natural range in Morocco and from a forest plantation in Arbucies, Girona (Spain) and analyzed using RAPD markers. Within-population genetic diversity was high and comparable to that revealed by isozymes. Managed populations harbored levels of genetic variation similar to those found in their natural counterparts. Genotypic analyses Of Molecular variance (AMOVA) found that most variation was within populations. but significant differentiation was also found between populations. particularly in Morocco. Bayesian estimates of F,, corroborated the AMOVA partitioning and provided evidence for Population differentiation in C. atlantica. Both distance- and Bayesian-based Clustering methods revealed that Moroccan populations comprise two genetically distinct groups. Within each group, estimates of population differentiation were close to those previously reported in other gymnosperms. These results are interpreted in the context of the postglacial history of the species and human impact. The high degree of among-group differentiation recorded here highlights the need for additional conservation measures for some Moroccan Populations of C. atlantica.
Resumo:
Using mixed logit models to analyse choice data is common but requires ex ante specification of the functional forms of preference distributions. We make the case for greater use of bounded functional forms and propose the use of the Marginal Likelihood, calculated using Bayesian techniques, as a single measure of model performance across non nested mixed logit specifications. Using this measure leads to very different rankings of model specifications compared to alternative rule of thumb measures. The approach is illustrated using data from a choice experiment regarding GM food types which provides insights regarding the recent WTO dispute between the EU and the US, Canada and Argentina and whether labelling and trade regimes should be based on the production process or product composition.