337 resultados para statistical classification
em Queensland University of Technology - ePrints Archive
Resumo:
We describe an investigation into how Massey University’s Pollen Classifynder can accelerate the understanding of pollen and its role in nature. The Classifynder is an imaging microscopy system that can locate, image and classify slide based pollen samples. Given the laboriousness of purely manual image acquisition and identification it is vital to exploit assistive technologies like the Classifynder to enable acquisition and analysis of pollen samples. It is also vital that we understand the strengths and limitations of automated systems so that they can be used (and improved) to compliment the strengths and weaknesses of human analysts to the greatest extent possible. This article reviews some of our experiences with the Classifynder system and our exploration of alternative classifier models to enhance both accuracy and interpretability. Our experiments in the pollen analysis problem domain have been based on samples from the Australian National University’s pollen reference collection (2,890 grains, 15 species) and images bundled with the Classifynder system (400 grains, 4 species). These samples have been represented using the Classifynder image feature set.We additionally work through a real world case study where we assess the ability of the system to determine the pollen make-up of samples of New Zealand honey. In addition to the Classifynder’s native neural network classifier, we have evaluated linear discriminant, support vector machine, decision tree and random forest classifiers on these data with encouraging results. Our hope is that our findings will help enhance the performance of future releases of the Classifynder and other systems for accelerating the acquisition and analysis of pollen samples.
Resumo:
Objective: The objectives of this article are to explore the extent to which the International Statistical Classification of Diseases and Related Health Problems (ICD) has been used in child abuse research, to describe how the ICD system has been applied and to assess factors affecting the reliability of ICD coded data in child abuse research.----- Methods: PubMed, CINAHL, PsychInfo and Google Scholar were searched for peer reviewed articles written since 1989 that used ICD as the classification system to identify cases and research child abuse using health databases. Snowballing strategies were also employed by searching the bibliographies of retrieved references to identify relevant associated articles. The papers identified through the search were independently screened by two authors for inclusion, resulting in 47 studies selected for the review. Due to heterogeneity of studies metaanalysis was not performed.----- Results: This paper highlights both utility and limitations of ICD coded data. ICD codes have been widely used to conduct research into child maltreatment in health data systems. The codes appear to be used primarily to determine child maltreatment patterns within identified diagnoses or to identify child maltreatment cases for research.----- Conclusions: A significant impediment to the use of ICD codes in child maltreatment research is the under-ascertainment of child maltreatment by using coded data alone. This is most clearly identified and, to some degree, quantified, in research where data linkage is used. Practice Implications: The importance of improved child maltreatment identification will assist in identifying risk factors and creating programs that can prevent and treat child maltreatment and assist in meeting reporting obligations under the CRC.
Resumo:
Background The implementation of the Australian Consumer Law in 2011 highlighted the need for better use of injury data to improve the effectiveness and responsiveness of product safety (PS) initiatives. In the PS system, resources are allocated to different priority issues using risk assessment tools. The rapid exchange of information (RAPEX) tool to prioritise hazards, developed by the European Commission, is currently being adopted in Australia. Injury data is required as a basic input to the RAPEX tool in the risk assessment process. One of the challenges in utilising injury data in the PS system is the complexity of translating detailed clinical coded data into broad categories such as those used in the RAPEX tool. Aims This study aims to translate hospital burns data into a simplified format by mapping the International Statistical Classification of Disease and Related Health Problems (Tenth Revision) Australian Modification (ICD-10-AM) burn codes into RAPEX severity rankings, using these rankings to identify priority areas in childhood product-related burns data. Methods ICD-10-AM burn codes were mapped into four levels of severity using the RAPEX guide table by assigning rankings from 1-4, in order of increasing severity. RAPEX rankings were determined by the thickness and surface area of the burn (BSA) with information extracted from the fourth character of T20-T30 codes for burn thickness, and the fourth and fifth characters of T31 codes for the BSA. Following the mapping process, secondary data analysis of 2008-2010 Queensland Hospital Admitted Patient Data Collection (QHAPDC) paediatric data was conducted to identify priority areas in product-related burns. Results The application of RAPEX rankings in QHAPDC burn data showed approximately 70% of paediatric burns in Queensland hospitals were categorised under RAPEX levels 1 and 2, 25% under RAPEX 3 and 4, with the remaining 5% unclassifiable. In the PS system, prioritisations are made to issues categorised under RAPEX levels 3 and 4. Analysis of external cause codes within these levels showed that flammable materials (for children aged 10-15yo) and hot substances (for children aged <2yo) were the most frequently identified products. Discussion and conclusions The mapping of ICD-10-AM burn codes into RAPEX rankings showed a favourable degree of compatibility between both classification systems, suggesting that ICD-10-AM coded burn data can be simplified to more effectively support PS initiatives. Additionally, the secondary data analysis showed that only 25% of all admitted burn cases in Queensland were severe enough to trigger a PS response.
Resumo:
Background The implementation of the Australian Consumer Law in 2011 highlighted the need for better use of injury data to improve the effectiveness and responsiveness of product safety (PS) initiatives. In the PS system, resources are allocated to different priority issues using risk assessment tools. The rapid exchange of information (RAPEX) tool to prioritise hazards, developed by the European Commission, is currently being adopted in Australia. Injury data is required as a basic input to the RAPEX tool in the risk assessment process. One of the challenges in utilising injury data in the PS system is the complexity of translating detailed clinical coded data into broad categories such as those used in the RAPEX tool. Aims This study aims to translate hospital burns data into a simplified format by mapping the International Statistical Classification of Disease and Related Health Problems (Tenth Revision) Australian Modification (ICD-10-AM) burn codes into RAPEX severity rankings, using these rankings to identify priority areas in childhood product-related burns data. Methods ICD-10-AM burn codes were mapped into four levels of severity using the RAPEX guide table by assigning rankings from 1-4, in order of increasing severity. RAPEX rankings were determined by the thickness and surface area of the burn (BSA) with information extracted from the fourth character of T20-T30 codes for burn thickness, and the fourth and fifth characters of T31 codes for the BSA. Following the mapping process, secondary data analysis of 2008-2010 Queensland Hospital Admitted Patient Data Collection (QHAPDC) paediatric data was conducted to identify priority areas in product-related burns. Results The application of RAPEX rankings in QHAPDC burn data showed approximately 70% of paediatric burns in Queensland hospitals were categorised under RAPEX levels 1 and 2, 25% under RAPEX 3 and 4, with the remaining 5% unclassifiable. In the PS system, prioritisations are made to issues categorised under RAPEX levels 3 and 4. Analysis of external cause codes within these levels showed that flammable materials (for children aged 10-15yo) and hot substances (for children aged <2yo) were the most frequently identified products. Discussion and conclusions The mapping of ICD-10-AM burn codes into RAPEX rankings showed a favourable degree of compatibility between both classification systems, suggesting that ICD-10-AM coded burn data can be simplified to more effectively support PS initiatives. Additionally, the secondary data analysis showed that only 25% of all admitted burn cases in Queensland were severe enough to trigger a PS response.
Resumo:
This paper describes the limitations of using the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Australian Modification (ICD-10-AM) to characterise patient harm in hospitals. Limitations were identified during a project to use diagnoses flagged by Victorian coders as hospital-acquired to devise a classification of 144 categories of hospital acquired diagnoses (the Classification of Hospital Acquired Diagnoses or CHADx). CHADx is a comprehensive data monitoring system designed to allow hospitals to monitor their complication rates month-to-month using a standard method. Difficulties in identifying a single event from linear sequences of codes due to the absence of code linkage were the major obstacles to developing the classification. Obstetric and perinatal episodes also presented challenges in distinguishing condition onset, that is, whether conditions were present on admission or arose after formal admission to hospital. Used in the appropriate way, the CHADx allows hospitals to identify areas for future patient safety and quality initiatives. The value of timing information and code linkage should be recognised in the planning stages of any future electronic systems.
Resumo:
Light Detection and Ranging (LIDAR) has great potential to assist vegetation management in power line corridors by providing more accurate geometric information of the power line assets and vegetation along the corridors. However, the development of algorithms for the automatic processing of LIDAR point cloud data, in particular for feature extraction and classification of raw point cloud data, is in still in its infancy. In this paper, we take advantage of LIDAR intensity and try to classify ground and non-ground points by statistically analyzing the skewness and kurtosis of the intensity data. Moreover, the Hough transform is employed to detected power lines from the filtered object points. The experimental results show the effectiveness of our methods and indicate that better results were obtained by using LIDAR intensity data than elevation data.
Resumo:
The use of appropriate features to characterise an output class or object is critical for all classification problems. In order to find optimal feature descriptors for vegetation species classification in a power line corridor monitoring application, this article evaluates the capability of several spectral and texture features. A new idea of spectral–texture feature descriptor is proposed by incorporating spectral vegetation indices in statistical moment features. The proposed method is evaluated against several classic texture feature descriptors. Object-based classification method is used and a support vector machine is employed as the benchmark classifier. Individual tree crowns are first detected and segmented from aerial images and different feature vectors are extracted to represent each tree crown. The experimental results showed that the proposed spectral moment features outperform or can at least compare with the state-of-the-art texture descriptors in terms of classification accuracy. A comprehensive quantitative evaluation using receiver operating characteristic space analysis further demonstrates the strength of the proposed feature descriptors.
Resumo:
Research has noted a ‘pronounced pattern of increase with increasing remoteness' of death rates in road crashes. However, crash characteristics by remoteness are not commonly or consistently reported, with definitions of rural and urban often relying on proxy representations such as prevailing speed limit. The current paper seeks to evaluate the efficacy of the Accessibility / Remoteness Index of Australia (ARIA+) to identifying trends in road crashes. ARIA+ does not rely on road-specific measures and uses distances to populated centres to attribute a score to an area, which can in turn be grouped into 5 classifications of increasing remoteness. The current paper uses applications of these classifications at the broad level of Australian Bureau of Statistics' Statistical Local Areas, thus avoiding precise crash locating or dedicated mapping software. Analyses used Queensland road crash database details for all 31,346 crashes resulting in a fatality or hospitalisation occurring between 1st July, 2001 and 30th June 2006 inclusive. Results showed that this simplified application of ARIA+ aligned with previous definitions such as speed limit, while also providing further delineation. Differences in crash contributing factors were noted with increasing remoteness such as a greater representation of alcohol and ‘excessive speed for circumstances.' Other factors such as the predominance of younger drivers in crashes differed little by remoteness classification. The results are discussed in terms of the utility of remoteness as a graduated rather than binary (rural/urban) construct and the potential for combining ARIA crash data with census and hospital datasets.
Resumo:
In a seminal data mining article, Leo Breiman [1] argued that to develop effective predictive classification and regression models, we need to move away from the sole dependency on statistical algorithms and embrace a wider toolkit of modeling algorithms that include data mining procedures. Nevertheless, many researchers still rely solely on statistical procedures when undertaking data modeling tasks; the sole reliance on these procedures has lead to the development of irrelevant theory and questionable research conclusions ([1], p.199). We will outline initiatives that the HPC & Research Support group is undertaking to engage researchers with data mining tools and techniques; including a new range of seminars, workshops, and one-on-one consultations covering data mining algorithms, the relationship between data mining and the research cycle, and limitations and problems with these new algorithms. Organisational limitations and restrictions to these initiatives are also discussed.
Resumo:
The use of appropriate features to characterize an output class or object is critical for all classification problems. This paper evaluates the capability of several spectral and texture features for object-based vegetation classification at the species level using airborne high resolution multispectral imagery. Image-objects as the basic classification unit were generated through image segmentation. Statistical moments extracted from original spectral bands and vegetation index image are used as feature descriptors for image objects (i.e. tree crowns). Several state-of-art texture descriptors such as Gray-Level Co-Occurrence Matrix (GLCM), Local Binary Patterns (LBP) and its extensions are also extracted for comparison purpose. Support Vector Machine (SVM) is employed for classification in the object-feature space. The experimental results showed that incorporating spectral vegetation indices can improve the classification accuracy and obtained better results than in original spectral bands, and using moments of Ratio Vegetation Index obtained the highest average classification accuracy in our experiment. The experiments also indicate that the spectral moment features also outperform or can at least compare with the state-of-art texture descriptors in terms of classification accuracy.
Resumo:
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus A3 √((log n)/m) (ignoring log A and log m factors), where m is the number of training patterns. This may explain the generalization performance of neural networks, particularly when the number of training examples is considerably smaller than the number of weights. It also supports heuristics (such as weight decay and early stopping) that attempt to keep the weights small during training. The proof techniques appear to be useful for the analysis of other pattern classifiers: when the input domain is a totally bounded metric space, we use the same approach to give upper bounds on misclassification probability for classifiers with decision boundaries that are far from the training examples.