906 results for Decision tree
Abstract:
With increasing interest shown by Universities in workplace learning, especially in STEM disciplines, an issue has arisen amongst educators and industry partners regarding authentic assessment tasks for work integrated learning (WIL) subjects. This paper describes the use of a matrix, which is also available as a decision-tree, based on the features of the WIL experience, in order to facilitate the selection of appropriate assessment strategies. The matrix divides the WIL experiences into seven categories, based on such factors as: the extent to which the experience is compulsory, required for membership of a professional body or elective; whether the student is undertaking a project, or embedding in a professional culture; and other key aspects of the WIL experience. One important variable is linked to the fundamental purpose of the assessment. This question revolves around the focus of the assessment: whether on the person (student development); the process (professional conduct/language); or the product (project, assignment, literature review, report, software). The matrix has been trialed at QUT in the Faculty of Science and Technology, and also at the University of Surrey, UK, and has proven to have good applicability in both universities.
Abstract:
Due to the demand for better and deeper analysis in sports, organizations (both professional teams and broadcasters) are looking to use spatiotemporal data in the form of player tracking information to obtain an advantage over their competitors. However, due to the large volume of data, its unstructured nature, and the lack of associated team activity labels (e.g. strategic/tactical), effective and efficient strategies to deal with such data have yet to be deployed. A bottleneck restricting such solutions is the lack of a suitable representation (i.e. ordering of players) which is immune to the enormous number of possible permutations of player orderings, in addition to the high dimensionality of the temporal signal (e.g. a game of soccer lasts for 90 minutes). We leverage a recent method which utilizes a "role-representation", as well as a feature reduction strategy that uses a spatiotemporal bilinear basis model, to form a compact spatiotemporal representation. Using this representation, we find the most likely formation patterns of a team associated with match events across nearly 14 hours of continuous player and ball tracking data in soccer. Additionally, we show that we can accurately segment a match into distinct game phases and detect highlights (i.e. shots, corners, free-kicks, etc.) completely automatically using a decision-tree formulation.
Abstract:
A cell classification algorithm that uses first-, second- and third-order statistics of pixel intensity distributions over pre-defined regions is implemented and evaluated. A cell image is segmented into six regions extending from a boundary layer to an inner circle. First-, second- and third-order statistical features are extracted from histograms of pixel intensities in these regions. The third-order statistical features used are one-dimensional bispectral invariants. 108 features were considered as candidates for AdaBoost-based fusion. The best 10-stage fused classifier was selected for each class, and a decision tree was constructed for the 6-class problem. The classifier is robust, accurate and fast by design.
Abstract:
Age-related Macular Degeneration (AMD) is one of the major causes of vision loss and blindness in the ageing population. Currently there is no cure for AMD; however, early detection and subsequent treatment may prevent severe vision loss or slow the progression of the disease. AMD can be classified into two types: dry and wet AMD. Most people with macular degeneration are affected by the dry form. Early symptoms of AMD are the formation of drusen and yellow pigmentation. These lesions are identified by manual inspection of fundus images by ophthalmologists. This is a time-consuming, tiresome process, and hence an automated AMD screening tool can aid clinicians significantly in their diagnosis. This study proposes an automated dry AMD detection system using various entropies (Shannon, Kapur, Renyi and Yager), Higher Order Spectra (HOS) bispectra features, Fractal Dimension (FD), and Gabor wavelet features extracted from greyscale fundus images. The features are ranked using the t-test, Kullback–Leibler Divergence (KLD), Chernoff Bound and Bhattacharyya Distance (CBBD), Receiver Operating Characteristic (ROC) curve-based and Wilcoxon ranking methods in order to select optimum features, and classified into normal and AMD classes using Naive Bayes (NB), k-Nearest Neighbour (k-NN), Probabilistic Neural Network (PNN), Decision Tree (DT) and Support Vector Machine (SVM) classifiers. The performance of the proposed system is evaluated using private (Kasturba Medical Hospital, Manipal, India), Automated Retinal Image Analysis (ARIA) and STructured Analysis of the Retina (STARE) datasets. The proposed system yielded the highest average classification accuracies of 90.19%, 95.07% and 95% with 42, 54 and 38 optimal ranked features using the SVM classifier for the private, ARIA and STARE datasets respectively.
This automated AMD detection system can be used for mass fundus image screening and aid clinicians by making better use of their expertise on selected images that require further examination.
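The t-test ranking step described above can be sketched in miniature. The toy feature values below are invented for illustration and are not the study's actual fundus-image features; a per-feature Welch t statistic orders features by how well they separate the normal and AMD classes:

```python
import math

def t_statistic(a, b):
    """Welch's two-sample t statistic between feature values of two classes."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def rank_features(normal, amd):
    """Rank feature indices by |t|, most discriminative first.

    `normal` and `amd` are lists of feature vectors, one vector per image.
    """
    n_features = len(normal[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row in normal]
        b = [row[j] for row in amd]
        scores.append((abs(t_statistic(a, b)), j))
    return [j for _, j in sorted(scores, reverse=True)]

# Toy data: feature 1 separates the classes, feature 0 does not.
normal = [[0.5, 1.0], [0.6, 1.1], [0.4, 0.9]]
amd    = [[0.5, 3.0], [0.6, 3.2], [0.4, 2.9]]
print(rank_features(normal, amd))  # feature 1 ranked first: [1, 0]
```

The top-ranked indices would then feed the NB/k-NN/PNN/DT/SVM classifiers in place of the full feature set.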
Abstract:
Environmental monitoring has become increasingly important due to the significant impact of human activities and climate change on biodiversity. Environmental sound sources such as rain and insect vocalizations are a rich and underexploited source of information in environmental audio recordings. This paper is concerned with the classification of rain within acoustic sensor recordings. We present the novel application of a set of features for classifying environmental acoustics: acoustic entropy, the acoustic complexity index, spectral cover, and background noise. In order to improve the performance of the rain classification system, we automatically classify segments of environmental recordings into the classes of heavy rain or non-rain. A decision tree classifier is experimentally compared with other classifiers. The experimental results show that our system is effective in classifying segments of environmental audio recordings, with an accuracy of 93% for the binary classification of heavy rain/non-rain.
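The acoustic-entropy feature and the decision-tree idea can be illustrated with a one-node stump. The spectra and the 0.9 threshold below are invented for illustration and are not the paper's actual feature extraction; the intuition is that broadband rain noise spreads energy across the spectrum, giving high entropy:

```python
import math

def acoustic_entropy(energies):
    """Shannon entropy of a spectrum's energy distribution, normalised to [0, 1].

    High entropy = energy spread evenly across bins (broadband, noise-like);
    low entropy = energy concentrated in a few bins (tonal calls).
    """
    total = sum(energies)
    probs = [e / total for e in energies if e > 0]
    h = -sum(p * math.log2(p) for p in probs)
    return h / math.log2(len(energies))  # divide by max possible entropy

def classify_segment(energies, threshold=0.9):
    """One-node decision stump: segments with near-uniform spectra are
    labelled heavy rain (the threshold value is illustrative)."""
    return "heavy rain" if acoustic_entropy(energies) > threshold else "non-rain"

flat_spectrum = [1.0] * 16                  # energy in every bin: noise-like
tonal_spectrum = [1.0, 8.0] + [0.01] * 14   # energy in a few bins: a call
print(classify_segment(flat_spectrum))   # heavy rain
print(classify_segment(tonal_spectrum))  # non-rain
```

A trained decision tree generalises this stump to several features (complexity index, spectral cover, background noise) and learns the thresholds from labelled segments.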
Abstract:
Description of a patient's injuries is recorded in narrative text form by hospital emergency departments. For statistical reporting, this text data needs to be mapped to pre-defined codes. Existing research in this field uses the Naïve Bayes probabilistic method to build classifiers for mapping. In this paper, we focus on providing guidance on the selection of a classification method. We build a number of classifiers belonging to different classification families such as decision tree, probabilistic, neural network, instance-based, ensemble-based and kernel-based linear classifiers. Extensive pre-processing is carried out to ensure the quality of the data and, hence, the quality of the classification outcome. Records with a null entry in the injury description are removed. Misspelling correction is carried out by finding and replacing each misspelt word with a sound-alike word. Meaningful phrases have been identified and kept, instead of removing part of a phrase as a stop word. Abbreviations appearing in many forms of entry are manually identified, and only one form of each abbreviation is used. Clustering is utilised to discriminate between non-frequent and frequent terms. This process reduced the number of text features dramatically, from about 28,000 to 5,000. The medical narrative text injury dataset under consideration is composed of many short documents. The data can be characterized as high-dimensional and sparse: few features are irrelevant, but features are correlated with one another. Therefore, matrix factorization techniques such as Singular Value Decomposition (SVD) and Non-negative Matrix Factorization (NNMF) have been used to map the processed feature space to a lower-dimensional feature space, and classifiers have been built on this reduced feature space. In experiments, a set of tests is conducted to determine which classification method is best for medical text classification.
Non-negative Matrix Factorization combined with the Support Vector Machine method can achieve 93% precision, which is higher than all the tested traditional classifiers. We also found that TF/IDF weighting, which works well for long text classification, is inferior to binary weighting in short document classification. Another finding is that the top-n terms should be removed in consultation with medical experts, as this affects classification performance.
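The binary-versus-TF/IDF contrast can be sketched on toy data. The three short "injury narratives" below are invented for illustration, not records from the dataset; in one-occurrence-per-term short documents, term frequency is almost always 0 or 1, so TF/IDF adds little beyond presence/absence:

```python
import math
from collections import Counter

docs = [
    "laceration left hand knife",      # toy short injury narratives
    "fall from ladder fracture arm",
    "burn right hand hot water",
]

vocab = sorted({w for d in docs for w in d.split()})

def binary_vector(doc):
    """1 if the term occurs in the document, else 0."""
    words = set(doc.split())
    return [1 if w in words else 0 for w in vocab]

def tfidf_vector(doc):
    """Classic tf-idf: term frequency times inverse document frequency."""
    tf = Counter(doc.split())
    n = len(docs)
    vec = []
    for w in vocab:
        df = sum(1 for d in docs if w in d.split())  # document frequency
        vec.append(tf[w] * math.log(n / df))
    return vec

print(binary_vector(docs[0]))  # each present term contributes exactly 1
```

With every tf equal to 0 or 1, the tf-idf vector is just the binary vector rescaled per term by idf, which is one plausible reading of why binary weighting held its own on this corpus.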
Abstract:
We describe an investigation into how Massey University’s Pollen Classifynder can accelerate the understanding of pollen and its role in nature. The Classifynder is an imaging microscopy system that can locate, image and classify slide-based pollen samples. Given the laboriousness of purely manual image acquisition and identification it is vital to exploit assistive technologies like the Classifynder to enable acquisition and analysis of pollen samples. It is also vital that we understand the strengths and limitations of automated systems so that they can be used (and improved) to complement the strengths and weaknesses of human analysts to the greatest extent possible. This article reviews some of our experiences with the Classifynder system and our exploration of alternative classifier models to enhance both accuracy and interpretability. Our experiments in the pollen analysis problem domain have been based on samples from the Australian National University’s pollen reference collection (2,890 grains, 15 species) and images bundled with the Classifynder system (400 grains, 4 species). These samples have been represented using the Classifynder image feature set. We additionally work through a real world case study where we assess the ability of the system to determine the pollen make-up of samples of New Zealand honey. In addition to the Classifynder’s native neural network classifier, we have evaluated linear discriminant, support vector machine, decision tree and random forest classifiers on these data with encouraging results. Our hope is that our findings will help enhance the performance of future releases of the Classifynder and other systems for accelerating the acquisition and analysis of pollen samples.
Abstract:
The gross under-resourcing of conservation endeavours has placed an increasing emphasis on spending accountability. Increased accountability has led to monitoring forming a central element of conservation programs. Although there is little doubt that information obtained from monitoring can improve management of biodiversity, the cost (in time and/or money) of gaining this knowledge is rarely considered when making decisions about allocation of resources to monitoring. We present a simple framework allowing managers and policy advisors to make decisions about when to invest in monitoring to improve management. © 2010 Elsevier Ltd.
Abstract:
Over the past few decades, frog species have been experiencing dramatic declines around the world. The reasons for this decline include habitat loss, invasive species, climate change and so on. To better understand the status of frog species, classifying frogs has become increasingly important. In this study, acoustic features are investigated for multi-level classification of Australian frogs at family, genus and species level, covering three families, eleven genera and eighty-five species collected from Queensland, Australia. For each frog species, six instances are selected, from which ten acoustic features are calculated. Then, the multicollinearity between the ten features is studied to select non-correlated features for subsequent analysis. A decision tree (DT) classifier is used to visually and explicitly determine which acoustic features are relatively important for classifying family, which for genus, and which for species. Finally, a weighted support vector machine (SVM) classifier is used for the multi-level classification with the three most important acoustic features at each level. Our experimental results indicate that using different acoustic feature sets can successfully classify frogs at different levels, and the average classification accuracy can be up to 85.6%, 86.1% and 56.2% for family, genus and species respectively.
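The way a decision tree exposes which features matter can be illustrated by scoring a candidate root split with information gain. The feature values, labels and thresholds below are invented for illustration, not the study's measurements:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting one feature at `threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder

# Toy data: two acoustic features per call, a family label per call.
# Feature 0 (say, dominant frequency) separates the families; feature 1 does not.
features = [[2.1, 0.5], [2.3, 0.9], [7.8, 0.6], [8.1, 0.8]]
labels = ["Hylidae", "Hylidae", "Myobatrachidae", "Myobatrachidae"]

for j in range(2):
    col = [row[j] for row in features]
    thr = sorted(col)[len(col) // 2 - 1]  # a median-ish candidate threshold
    print(j, round(information_gain(col, labels, thr), 3))
```

A DT learner repeats this scoring at every node, so the features chosen near the root (here, feature 0) are the ones the abstract describes as "relatively important" for that taxonomic level.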
Abstract:
This thesis studies human gene expression space using high throughput gene expression data from DNA microarrays. In molecular biology, high throughput techniques allow numerical measurements of expression of tens of thousands of genes simultaneously. In a single study, this data is traditionally obtained from a limited number of sample types with a small number of replicates. For organism-wide analysis, this data has been largely unavailable and the global structure of the human transcriptome has remained unknown. This thesis introduces a human transcriptome map of different biological entities and an analysis of its general structure. The map is constructed from gene expression data from the two largest public microarray data repositories, GEO and ArrayExpress. The creation of this map contributed to the development of ArrayExpress by identifying and retrofitting previously unusable and missing data and by improving access to its data. It also contributed to the creation of several new tools for microarray data manipulation and the establishment of data exchange between GEO and ArrayExpress. The data integration for the global map required creation of a new large ontology of human cell types, disease states, organism parts and cell lines. The ontology was used in a new text mining and decision tree based method for automatic conversion of human-readable free text microarray data annotations into categorised format. Data comparability, and minimisation of the systematic measurement errors characteristic of each laboratory in this large cross-laboratory integrated dataset, were ensured by computation of a range of microarray data quality metrics and exclusion of incomparable data. The structure of a global map of human gene expression was then explored by principal component analysis and hierarchical clustering using heuristics and help from another purpose-built sample ontology.
A preface and motivation to the construction and analysis of a global map of human gene expression is given by analysis of two microarray datasets of human malignant melanoma. The analysis of these sets incorporates an indirect comparison of statistical methods for finding differentially expressed genes and points to the need to study gene expression on a global level.
Abstract:
- Objective To compare health service cost and length of stay between a traditional and an accelerated diagnostic approach to assess acute coronary syndromes (ACS) among patients who presented to the emergency department (ED) of a large tertiary hospital in Australia. - Design, setting and participants This historically controlled study analysed data collected from two independent patient cohorts presenting to the ED with potential ACS. The first cohort of 938 patients was recruited in 2008–2010, and these patients were assessed using the traditional diagnostic approach detailed in the national guideline. The second cohort of 921 patients was recruited in 2011–2013 and was assessed with the accelerated diagnostic approach named the Brisbane protocol. The Brisbane protocol applied early serial troponin testing for patients at 0 and 2 h after presentation to the ED, in comparison with 0 and 6 h testing in the traditional assessment process. The Brisbane protocol also defined a low-risk group of patients in whom no objective testing was performed. A decision tree model was used to compare the expected cost and length of stay in hospital between the two approaches. Probabilistic sensitivity analysis was used to account for model uncertainty. - Results Compared with the traditional diagnostic approach, the Brisbane protocol was associated with a reduced expected cost of $1229 (95% CI −$1266 to $5122) and a reduced expected length of stay of 26 h (95% CI −14 to 136 h). The Brisbane protocol allowed physicians to discharge a higher proportion of low-risk and intermediate-risk patients from the ED within 4 h (72% vs 51%). Results from sensitivity analysis suggested the Brisbane protocol had a high chance of being cost-saving and time-saving. - Conclusions This study provides some evidence of cost savings from a decision to adopt the Brisbane protocol. Benefits would arise for the hospital and for patients and their families.
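A decision tree model of this kind folds branch probabilities into an expected cost and length of stay per pathway. The sketch below uses made-up probabilities and dollar figures, not the study's data, purely to show the mechanics:

```python
def expected_value(node):
    """Fold a tree of chance nodes into an expected (cost, hours) pair.

    A node is either a terminal (cost, hours) tuple or a list of
    (probability, child) branches whose probabilities sum to 1.
    """
    if isinstance(node, tuple):
        return node
    cost = sum(p * expected_value(child)[0] for p, child in node)
    hours = sum(p * expected_value(child)[1] for p, child in node)
    return (cost, hours)

# Hypothetical accelerated pathway: most low-risk patients discharged early.
brisbane = [
    (0.7, (500.0, 4.0)),      # discharged within 4 h
    (0.3, [                   # admitted for further testing
        (0.8, (3000.0, 24.0)),
        (0.2, (8000.0, 72.0)),
    ]),
]
# Hypothetical traditional pathway: fewer early discharges, longer stays.
traditional = [
    (0.5, (900.0, 8.0)),
    (0.5, (4500.0, 30.0)),
]

print(expected_value(brisbane))     # expected (cost, length of stay)
print(expected_value(traditional))
```

Probabilistic sensitivity analysis, as used in the study, would re-evaluate this fold many times with branch probabilities and costs drawn from distributions rather than fixed values.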
Abstract:
Esophageal and gastroesophageal junction (GEJ) adenocarcinoma is a rapidly increasing disease with a pathophysiology connected to oxidative stress. Exact pre-treatment clinical staging is essential for optimal care of this lethal malignancy, and the cost-effectiveness of treatment is increasingly important. We measured oxidative metabolism in the distal and proximal esophagus by myeloperoxidase activity (MPA), glutathione content (GSH), and superoxide dismutase (SOD) in 20 patients operated on with Nissen fundoplication and 9 controls during a 4-year follow-up. Further, we assessed the oxidative damage of DNA by 8-hydroxydeoxyguanosine (8-OHdG) in esophageal samples of subjects (13 Barrett's metaplasia, 6 Barrett's esophagus with high-grade dysplasia, 18 adenocarcinoma of the distal esophagus/GEJ, and 14 normal controls). We estimated the accuracy (42 patients) and preoperative prognostic value (55 patients) of PET compared with computed tomography (CT) and endoscopic ultrasound (EUS) in patients with adenocarcinoma of the esophagus/GEJ. Finally, we clarified the specialty-related costs and the utility of either radical (30 patients) or palliative (23 patients) treatment of esophageal/GEJ carcinoma by the 15D health-related quality-of-life (HRQoL) questionnaire and the survival rate. The cost-utility of radical treatment of esophageal/GEJ carcinoma was investigated using a decision tree analysis model comparing radical, palliative, and a hypothetical new treatment. We found elevated oxidative stress (measured by MPA) and decreased antioxidant defense (measured by GSH) after antireflux surgery. This indicates that antireflux surgery is not a perfect solution for oxidative stress of the esophageal mucosa. Elevated oxidative stress may in turn partly explain why adenocarcinoma of the distal esophagus is found even after successful fundoplication.
In GERD patients, proximal esophageal mucosal anti-oxidative defense seems to be defective before, and even years after, successful antireflux surgery. In addition, antireflux surgery apparently does not change the level of oxidative stress in the proximal esophagus, suggesting that defective mucosal anti-oxidative capacity plays a role in the development of oxidative damage to the esophageal mucosa in GERD. Oxidative stress appears to be an important component in the malignant transformation of Barrett's esophagus. DNA damage may be mediated by 8-OHdG, which we found to be increased in Barrett's epithelium and in high-grade dysplasia, as well as in adenocarcinoma of the esophagus/GEJ, compared with controls. The entire esophagus of Barrett's patients suffers from increased oxidative stress (measured by 8-OHdG). PET is a useful tool in the staging and prognostication of adenocarcinoma of the esophagus/GEJ, detecting organ metastases better than CT, although its accuracy in staging of paratumoral and distant lymph nodes is limited. Radical surgery for esophageal/GEJ carcinoma provides the greatest benefit in terms of survival, and its cost-utility appears to be the best of currently available treatments.
Abstract:
Disengagement from services is common before suicide; hence, identifying factors at treatment presentation that predict future suicidality is important. This article explores risk profiles for suicidal ideation among treatment seekers with depression and substance misuse. Participants completed assessments at baseline and 6 months. Baseline demographics, psychiatric history, and current symptoms were entered into a decision tree to predict suicidal ideation at follow-up. Sixty-three percent of participants at baseline and 43.5% at follow-up reported suicidal ideation. Baseline ideation was most salient when psychiatric illness began before adulthood, increasing the rate of follow-up ideation by 16%. Among those without baseline ideation, dysfunctional attitudes were the most important risk factor, increasing rates of suicidal ideation by 35%. These findings provide evidence of factors beyond initial diagnoses that increase the likelihood of suicidal ideation and are worthy of clinical attention. In particular, providing suicide prevention resources to those with high dysfunctional attitudes may be beneficial.
Abstract:
Design of speaker identification schemes for a small number of speakers (around 10) with a high degree of accuracy in a controlled environment is a practical proposition today. When the number of speakers is large (say 50–100), many of these schemes cannot be directly extended, as both recognition error and computation time increase monotonically with population size. The feature selection problem is also complex for such schemes. Though there were earlier attempts to rank-order features based on statistical distance measures, it has been observed only recently that the two best independent measurements are not necessarily the best pair when combined for pattern classification. We propose here a systematic approach to the problem using a decision tree, or hierarchical classifier, with the following objectives: (1) design of an optimal policy at each node of the tree given the tree structure, i.e., the tree skeleton and the features to be used at each node; (2) determination of the optimal feature measurement and decision policy given only the tree skeleton. The applicability of optimization procedures such as dynamic programming to the design of such trees is studied. The experimental results deal with the design of a 50-speaker identification scheme based on this approach.
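The hierarchical structure described above, in which different nodes may use different feature measurements, can be sketched as a two-level nearest-centroid tree. The centroids, features and group names below are hypothetical, invented only to show the routing:

```python
def nearest(centroids, x):
    """Return the key whose centroid is closest to x (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda k: dist(centroids[k]))

# Hypothetical two-level tree: the root uses feature 0 (say, a mean-pitch
# measurement) to pick a speaker group; each leaf node uses feature 1 to
# pick a speaker within that group. Each node thus has its own feature
# measurement and decision policy, as in objective (1) above.
group_centroids = {"low": (100.0,), "high": (200.0,)}
speaker_centroids = {
    "low":  {"s1": (3.0,), "s2": (7.0,)},
    "high": {"s3": (2.0,), "s4": (9.0,)},
}

def identify(features):
    group = nearest(group_centroids, features[:1])          # root decision
    return nearest(speaker_centroids[group], features[1:])  # leaf decision

print(identify((110.0, 6.5)))  # routed to "low", then nearest speaker: s2
```

The design questions in the abstract amount to choosing, for each node, which measurement to take and which decision rule to apply, which is what makes dynamic programming over the tree skeleton a natural fit.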