933 resultados para proximity query, collision test, distance test, data compression, triangle test
Resumo:
While over-dispersion in capture–recapture studies is well known to lead to poor estimation of population size, current diagnostic tools to detect the presence of heterogeneity have not been specifically developed for capture–recapture studies. To address this, a simple and efficient method of testing for over-dispersion in zero-truncated count data is developed and evaluated. The proposed method generalizes an over-dispersion test previously suggested for un-truncated count data and may also be used for testing residual over-dispersion in zero-inflation data. Simulations suggest that the asymptotic distribution of the test statistic is standard normal and that this approximation is also reasonable for small sample sizes. The method is also shown to be more efficient than an existing test for over-dispersion adapted for the capture–recapture setting. Studies with zero-truncated and zero-inflated count data are used to illustrate the test procedures.
Resumo:
We propose a novel method for scoring the accuracy of protein binding site predictions – the Binding-site Distance Test (BDT) score. Recently, the Matthews Correlation Coefficient (MCC) has been used to evaluate binding site predictions, both by developers of new methods and by the assessors for the community wide prediction experiment – CASP8. Whilst being a rigorous scoring method, the MCC does not take into account the actual 3D location of the predicted residues from the observed binding site. Thus, an incorrectly predicted site that is nevertheless close to the observed binding site will obtain an identical score to the same number of nonbinding residues predicted at random. The MCC is somewhat affected by the subjectivity of determining observed binding residues and the ambiguity of choosing distance cutoffs. By contrast the BDT method produces continuous scores ranging between 0 and 1, relating to the distance between the predicted and observed residues. Residues predicted close to the binding site will score higher than those more distant, providing a better reflection of the true accuracy of predictions. The CASP8 function predictions were evaluated using both the MCC and BDT methods and the scores were compared. The BDT was found to strongly correlate with the MCC scores whilst also being less susceptible to the subjectivity of defining binding residues. We therefore suggest that this new simple score is a potentially more robust method for future evaluations of protein-ligand binding site predictions.
Resumo:
A score test is developed for binary clinical trial data, which incorporates patient non-compliance while respecting randomization. It is assumed in this paper that compliance is all-or-nothing, in the sense that a patient either accepts all of the treatment assigned as specified in the protocol, or none of it. Direct analytic comparisons of the adjusted test statistic for both the score test and the likelihood ratio test are made with the corresponding test statistics that adhere to the intention-to-treat principle. It is shown that no gain in power is possible over the intention-to-treat analysis, by adjusting for patient non-compliance. Sample size formulae are derived and simulation studies are used to demonstrate that the sample size approximation holds. Copyright © 2003 John Wiley & Sons, Ltd.
Resumo:
We describe a simple comparative method for determining whether rates of diversification are correlated with continuous traits in species-level phylogenies. This involves comparing traits of species with net speciation rate (number of nodes linking extant species with the root divided by the root to tip evolutionary distance), using a phylogenetically corrected correlation. We use simulations to examine the power of this test. We find that the approach has acceptable power to uncover relationships between speciation and a continuous trait and is robust to background random extinction; however, the power of the approach is reduced when the rate of trait evolution is decreased. The test has low power to relate diversification to traits when extinction rate is correlated with the trait. Clearly, there are inherent limitations in using only data on extant species to infer correlates of extinction; however, this approach is potentially a powerful tool in analyzing correlates of speciation.
Resumo:
Accurate knowledge of species’ habitat associations is important for conservation planning and policy. Assessing habitat associations is a vital precursor to selecting appropriate indicator species for prioritising sites for conservation or assessing trends in habitat quality. However, much existing knowledge is based on qualitative expert opinion or local scale studies, and may not remain accurate across different spatial scales or geographic locations. Data from biological recording schemes have the potential to provide objective measures of habitat association, with the ability to account for spatial variation. We used data on 50 British butterfly species as a test case to investigate the correspondence of data-derived measures of habitat association with expert opinion, from two different butterfly recording schemes. One scheme collected large quantities of occurrence data (c. 3 million records) and the other, lower quantities of standardised monitoring data (c. 1400 sites). We used general linear mixed effects models to derive scores of association with broad-leaf woodland for both datasets and compared them with scores canvassed from experts. Scores derived from occurrence and abundance data both showed strongly positive correlations with expert opinion. However, only for occurrence data did these fell within the range of correlations between experts. Data-derived scores showed regional spatial variation in the strength of butterfly associations with broad-leaf woodland, with a significant latitudinal trend in 26% of species. Sub-sampling of the data suggested a mean sample size of 5000 occurrence records per species to gain an accurate estimation of habitat association, although habitat specialists are likely to be readily detected using several hundred records. Occurrence data from recording schemes can thus provide easily obtained, objective, quantitative measures of habitat association.
Resumo:
The main purpose of this thesis project is to prediction of symptom severity and cause in data from test battery of the Parkinson’s disease patient, which is based on data mining. The collection of the data is from test battery on a hand in computer. We use the Chi-Square method and check which variables are important and which are not important. Then we apply different data mining techniques on our normalize data and check which technique or method gives good results.The implementation of this thesis is in WEKA. We normalize our data and then apply different methods on this data. The methods which we used are Naïve Bayes, CART and KNN. We draw the Bland Altman and Spearman’s Correlation for checking the final results and prediction of data. The Bland Altman tells how the percentage of our confident level in this data is correct and Spearman’s Correlation tells us our relationship is strong. On the basis of results and analysis we see all three methods give nearly same results. But if we see our CART (J48 Decision Tree) it gives good result of under predicted and over predicted values that’s lies between -2 to +2. The correlation between the Actual and Predicted values is 0,794in CART. Cause gives the better percentage classification result then disability because it can use two classes.
Resumo:
A customer is presumed to gravitate to a facility by the distance to it and the attractiveness of it. However regarding the location of the facility, the presumption is that the customer opts for the shortest route to the nearest facility.This paradox was recently solved by the introduction of the gravity p-median model. The model is yet to be implemented and tested empirically. We implemented the model in an empirical problem of locating locksmiths, vehicle inspections, and retail stores ofv ehicle spare-parts, and we compared the solutions with those of the p-median model. We found the gravity p-median model to be of limited use for the problem of locating facilities as it either gives solutions similar to the p-median model, or it gives unstable solutions due to a non-concave objective function.