337 resultados para statistical classification
Exploring variation in measurement as a foundation for statistical thinking in the elementary school
Resumo:
This study was based on the premise that variation is the foundation of statistics and statistical investigations. The study followed the development of fourth-grade students' understanding of variation through participation in a sequence of two lessons based on measurement. In the first lesson all students measured the arm span of one student, revealing pathways students follow in developing understanding of variation and linear measurement (related to research question 1). In the second lesson each student's arm span was measured once, introducing a different aspect of variation for students to observe and contrast. From this second lesson, students' development of the ability to compare their representations for the two scenarios and explain differences in terms of variation was explored (research question 2). Students' documentation, in both workbook and software formats, enabled us to monitor their engagement and identify their increasing appreciation of the need to observe, represent, and contrast the variation in the data. Following the lessons, a written student assessment was used for judging retention of understanding of variation developed through the lessons and the degree of transfer of understanding to a different scenario (research question 3).
Resumo:
Local spatio-temporal features with a Bag-of-visual words model is a popular approach used in human action recognition. Bag-of-features methods suffer from several challenges such as extracting appropriate appearance and motion features from videos, converting extracted features appropriate for classification and designing a suitable classification framework. In this paper we address the problem of efficiently representing the extracted features for classification to improve the overall performance. We introduce two generative supervised topic models, maximum entropy discrimination LDA (MedLDA) and class- specific simplex LDA (css-LDA), to encode the raw features suitable for discriminative SVM based classification. Unsupervised LDA models disconnect topic discovery from the classification task, hence yield poor results compared to the baseline Bag-of-words framework. On the other hand supervised LDA techniques learn the topic structure by considering the class labels and improve the recognition accuracy significantly. MedLDA maximizes likelihood and within class margins using max-margin techniques and yields a sparse highly discriminative topic structure; while in css-LDA separate class specific topics are learned instead of common set of topics across the entire dataset. In our representation first topics are learned and then each video is represented as a topic proportion vector, i.e. it can be comparable to a histogram of topics. Finally SVM classification is done on the learned topic proportion vector. We demonstrate the efficiency of the above two representation techniques through the experiments carried out in two popular datasets. Experimental results demonstrate significantly improved performance compared to the baseline Bag-of-features framework which uses kmeans to construct histogram of words from the feature vectors.
Resumo:
We present an overview of the QUT plant classification system submitted to LifeCLEF 2014. This system uses generic features extracted from a convolutional neural network previously used to perform general object classification. We examine the effectiveness of these features to perform plant classification when used in combination with an extremely randomised forest. Using this system, with minimal tuning, we obtained relatively good results with a score of 0:249 on the test set of LifeCLEF 2014.
Resumo:
Narrative text is a useful way of identifying injury circumstances from the routine emergency department data collections. Automatically classifying narratives based on machine learning techniques is a promising technique, which can consequently reduce the tedious manual classification process. Existing works focus on using Naive Bayes which does not always offer the best performance. This paper proposes the Matrix Factorization approaches along with a learning enhancement process for this task. The results are compared with the performance of various other classification approaches. The impact on the classification results from the parameters setting during the classification of a medical text dataset is discussed. With the selection of right dimension k, Non Negative Matrix Factorization-model method achieves 10 CV accuracy of 0.93.
Resumo:
This thesis presents a promising boundary setting method for solving challenging issues in text classification to produce an effective text classifier. A classifier must identify boundary between classes optimally. However, after the features are selected, the boundary is still unclear with regard to mixed positive and negative documents. A classifier combination method to boost effectiveness of the classification model is also presented. The experiments carried out in the study demonstrate that the proposed classifier is promising.
Resumo:
Introduced in this paper is a Bayesian model for isolating the resonant frequency from combustion chamber resonance. The model shown in this paper focused on characterising the initial rise in the resonant frequency to investigate the rise of in-cylinder bulk temperature associated with combustion. By resolving the model parameters, it is possible to determine: the start of pre-mixed combustion, the start of diffusion combustion, the initial resonant frequency, the resonant frequency as a function of crank angle, the in-cylinder bulk temperature as a function of crank angle and the trapped mass as a function of crank angle. The Bayesian method allows for individual cycles to be examined without cycle-averaging|allowing inter-cycle variability studies. Results are shown for a turbo-charged, common-rail compression ignition engine run at 2000 rpm and full load.
Resumo:
Business processes are prone to continuous and unexpected changes. Process workers may start executing a process differently in order to adjust to changes in workload, season, guidelines or regulations for example. Early detection of business process changes based on their event logs – also known as business process drift detection – enables analysts to identify and act upon changes that may otherwise affect process performance. Previous methods for business process drift detection are based on an exploration of a potentially large feature space and in some cases they require users to manually identify the specific features that characterize the drift. Depending on the explored feature set, these methods may miss certain types of changes. This paper proposes a fully automated and statistically grounded method for detecting process drift. The core idea is to perform statistical tests over the distributions of runs observed in two consecutive time windows. By adaptively sizing the window, the method strikes a trade-off between classification accuracy and drift detection delay. A validation on synthetic and real-life logs shows that the method accurately detects typical change patterns and scales up to the extent it is applicable for online drift detection.
Resumo:
The monitoring of the actual activities of daily living of individuals with lower limb amputation is essential for an evidence-based fitting of the prosthesis, more particularly the choice of components (e.g., knees, ankles, feet)[1-4]. The purpose of this presentation was to give an overview of the categorization of the load regime data to assess the functional output and usage of the prosthesis of lower limb amputees has presented in several publications[5, 6]. The objectives were to present a categorization of load regime and to report the results for a case.
Resumo:
Background There is a need for better understanding of the dispersion of classification-related variable to develop an evidence-based classification of athletes with a disability participating in stationary throwing events. Objectives The purposes of this study are (A) to describe tools designed to comprehend and represent the dispersion of the performance between successive classes, and (B) to present this dispersion for the elite male and female stationary shot-putters who participated in Beijing 2008 Paralympic Games. Study design Retrospective study Methods This study analysed a total of 479 attempts performed by 114 male and female stationary shot-putters in three F30s (F32-F34) and six F50s (F52-F58) classes during the course of eight events during Beijing 2008 Paralympic Games. Results The average differences of best performance were 1.46±0.46 m for males between F54 and F58 classes as well as 1.06±1.18 m for females between F55 and F58 classes. The results demonstrated a linear relationship between best performance and classification while revealing two male Gold Medallists in F33 and F52 classes were outliers. Conclusions This study confirms the benefits of the comparative matrices, performance continuum and dispersion plots to comprehend classification-related variables. The work presented here represents a stepping stone into biomechanical analyses of stationary throwers, particularly on the eve of the London 2012 Paralympic Games where new evidences could be gathered.
Resumo:
The relationship between mathematics and statistical reasoning frequently receives comment (Vere-Jones 1995, Moore 1997); however most of the research into the area tends to focus on mathematics anxiety. Gnaldi (2003) showed that in a statistics course for psychologists, the statistical understanding of students at the end of the course depended on students’ basic numeracy, rather than the number or level of previous mathematics courses the student had undertaken. As part of a study into the development of statistical thinking at the interface between secondary and tertiary education, students enrolled in an introductory data analysis subject were assessed regarding their statistical reasoning, basic numeracy skills, mathematics background and attitudes towards statistics. This work reports on some key relationships between these factors and in particular the importance of numeracy to statistical reasoning.
Resumo:
The relationship between mathematics and statistical reasoning frequently receives comment (Vere-Jones 1995, Moore 1997); however most of the research into the area tends to focus on maths anxiety. Gnaldi (Gnaldi 2003) showed that in a statistics course for psychologists, the statistical understanding of students at the end of the course depended on students’ basic numeracy, rather than the number or level of previous mathematics courses the student had undertaken. As part of a study into the development of statistical thinking at the interface between secondary and tertiary education, students enrolled in an introductory data analysis subject were assessed regarding their statistical reasoning ability, basic numeracy skills and attitudes towards statistics. This work reports on the relationships between these factors and in particular the importance of numeracy to statistical reasoning.
Resumo:
Traditional text classification technology based on machine learning and data mining techniques has made a big progress. However, it is still a big problem on how to draw an exact decision boundary between relevant and irrelevant objects in binary classification due to much uncertainty produced in the process of the traditional algorithms. The proposed model CTTC (Centroid Training for Text Classification) aims to build an uncertainty boundary to absorb as many indeterminate objects as possible so as to elevate the certainty of the relevant and irrelevant groups through the centroid clustering and training process. The clustering starts from the two training subsets labelled as relevant or irrelevant respectively to create two principal centroid vectors by which all the training samples are further separated into three groups: POS, NEG and BND, with all the indeterminate objects absorbed into the uncertain decision boundary BND. Two pairs of centroid vectors are proposed to be trained and optimized through the subsequent iterative multi-learning process, all of which are proposed to collaboratively help predict the polarities of the incoming objects thereafter. For the assessment of the proposed model, F1 and Accuracy have been chosen as the key evaluation measures. We stress the F1 measure because it can display the overall performance improvement of the final classifier better than Accuracy. A large number of experiments have been completed using the proposed model on the Reuters Corpus Volume 1 (RCV1) which is important standard dataset in the field. The experiment results show that the proposed model has significantly improved the binary text classification performance in both F1 and Accuracy compared with three other influential baseline models.
Resumo:
This work explores the potential of Australian native plants as a source of second-generation biodiesel for internal combustion engines application. Biodiesels were evaluated from a number of non-edible oil seeds which are grow naturally in Queensland, Australia. The quality of the produced biodiesels has been investigated by several experimental and numerical methods. The research methodology and numerical model developed in this study can be used for a broad range of biodiesel feedstocks and for the future development of renewable native biodiesel in Australia.