864 resultados para Data mining methods
Resumo:
This special issue is a collection of the selected papers published on the proceedings of the First International Conference on Advanced Data Mining and Applications (ADMA) held in Wuhan, China in 2005. The articles focus on the innovative applications of data mining approaches to the problems that involve large data sets, incomplete and noise data, or demand optimal solutions.
Resumo:
This accessible, practice-oriented and compact text provides a hands-on introduction to the principles of market research. Using the market research process as a framework, the authors explain how to collect and describe the necessary data and present the most important and frequently used quantitative analysis techniques, such as ANOVA, regression analysis, factor analysis, and cluster analysis. An explanation is provided of the theoretical choices a market researcher has to make with regard to each technique, as well as how these are translated into actions in IBM SPSS Statistics. This includes a discussion of what the outputs mean and how they should be interpreted from a market research perspective. Each chapter concludes with a case study that illustrates the process based on real-world data. A comprehensive web appendix includes additional analysis techniques, datasets, video files and case studies. Several mobile tags in the text allow readers to quickly browse related web content using a mobile device.
Resumo:
This article explains first, the reasons why a knowledge of statistics is necessary and describes the role that statistics plays in an experimental investigation. Second, the normal distribution is introduced which describes the natural variability shown by many measurements in optometry and vision sciences. Third, the application of the normal distribution to some common statistical problems including how to determine whether an individual observation is a typical member of a population and how to determine the confidence interval for a sample mean is described.
Resumo:
In this second article, statistical ideas are extended to the problem of testing whether there is a true difference between two samples of measurements. First, it will be shown that the difference between the means of two samples comes from a population of such differences which is normally distributed. Second, the 't' distribution, one of the most important in statistics, will be applied to a test of the difference between two means using a simple data set drawn from a clinical experiment in optometry. Third, in making a t-test, a statistical judgement is made as to whether there is a significant difference between the means of two samples. Before the widespread use of statistical software, this judgement was made with reference to a statistical table. Even if such tables are not used, it is useful to understand their logical structure and how to use them. Finally, the analysis of data, which are known to depart significantly from the normal distribution, will be described.
Resumo:
In some studies, the data are not measurements but comprise counts or frequencies of particular events. In such cases, an investigator may be interested in whether one specific event happens more frequently than another or whether an event occurs with a frequency predicted by a scientific model.
Resumo:
In any investigation in optometry involving more that two treatment or patient groups, an investigator should be using ANOVA to analyse the results assuming that the data conform reasonably well to the assumptions of the analysis. Ideally, specific null hypotheses should be built into the experiment from the start so that the treatments variation can be partitioned to test these effects directly. If 'post-hoc' tests are used, then an experimenter should examine the degree of protection offered by the test against the possibilities of making either a type 1 or a type 2 error. All experimenters should be aware of the complexity of ANOVA. The present article describes only one common form of the analysis, viz., that which applies to a single classification of the treatments in a randomised design. There are many different forms of the analysis each of which is appropriate to the analysis of a specific experimental design. The uses of some of the most common forms of ANOVA in optometry have been described in a further article. If in any doubt, an investigator should consult a statistician with experience of the analysis of experiments in optometry since once embarked upon an experiment with an unsuitable design, there may be little that a statistician can do to help.
Resumo:
1. Pearson's correlation coefficient only tests whether the data fit a linear model. With large numbers of observations, quite small values of r become significant and the X variable may only account for a minute proportion of the variance in Y. Hence, the value of r squared should always be calculated and included in a discussion of the significance of r. 2. The use of r assumes that a bivariate normal distribution is present and this assumption should be examined prior to the study. If Pearson's r is not appropriate, then a non-parametric correlation coefficient such as Spearman's rs may be used. 3. A significant correlation should not be interpreted as indicating causation especially in observational studies in which there is a high probability that the two variables are correlated because of their mutual correlations with other variables. 4. In studies of measurement error, there are problems in using r as a test of reliability and the ‘intra-class correlation coefficient’ should be used as an alternative. A correlation test provides only limited information as to the relationship between two variables. Fitting a regression line to the data using the method known as ‘least square’ provides much more information and the methods of regression and their application in optometry will be discussed in the next article.
Resumo:
Multiple regression analysis is a complex statistical method with many potential uses. It has also become one of the most abused of all statistical procedures since anyone with a data base and suitable software can carry it out. An investigator should always have a clear hypothesis in mind before carrying out such a procedure and knowledge of the limitations of each aspect of the analysis. In addition, multiple regression is probably best used in an exploratory context, identifying variables that might profitably be examined by more detailed studies. Where there are many variables potentially influencing Y, they are likely to be intercorrelated and to account for relatively small amounts of the variance. Any analysis in which R squared is less than 50% should be suspect as probably not indicating the presence of significant variables. A further problem relates to sample size. It is often stated that the number of subjects or patients must be at least 5-10 times the number of variables included in the study.5 This advice should be taken only as a rough guide but it does indicate that the variables included should be selected with great care as inclusion of an obviously unimportant variable may have a significant impact on the sample size required.