941 resultados para Automatic classification
Resumo:
A large number of methods have been published that aim to evaluate various components of multi-view geometry systems. Most of these have focused on the feature extraction, description and matching stages (the visual front end), since geometry computation can be evaluated through simulation. Many data sets are constrained to small scale scenes or planar scenes that are not challenging to new algorithms, or require special equipment. This paper presents a method for automatically generating geometry ground truth and challenging test cases from high spatio-temporal resolution video. The objective of the system is to enable data collection at any physical scale, in any location and in various parts of the electromagnetic spectrum. The data generation process consists of collecting high resolution video, computing accurate sparse 3D reconstruction, video frame culling and down sampling, and test case selection. The evaluation process consists of applying a test 2-view geometry method to every test case and comparing the results to the ground truth. This system facilitates the evaluation of the whole geometry computation process or any part thereof against data compatible with a realistic application. A collection of example data sets and evaluations is included to demonstrate the range of applications of the proposed system.
Resumo:
The detection and correction of defects remains among the most time consuming and expensive aspects of software development. Extensive automated testing and code inspections may mitigate their effect, but some code fragments are necessarily more likely to be faulty than others, and automated identification of fault prone modules helps to focus testing and inspections, thus limiting wasted effort and potentially improving detection rates. However, software metrics data is often extremely noisy, with enormous imbalances in the size of the positive and negative classes. In this work, we present a new approach to predictive modelling of fault proneness in software modules, introducing a new feature representation to overcome some of these issues. This rank sum representation offers improved or at worst comparable performance to earlier approaches for standard data sets, and readily allows the user to choose an appropriate trade-off between precision and recall to optimise inspection effort to suit different testing environments. The method is evaluated using the NASA Metrics Data Program (MDP) data sets, and performance is compared with existing studies based on the Support Vector Machine (SVM) and Naïve Bayes (NB) Classifiers, and with our own comprehensive evaluation of these methods.
Resumo:
This work aims at developing a planetary rover capable of acting as an assistant astrobiologist: making a preliminary analysis of the collected visual images that will help to make better use of the scientists time by pointing out the most interesting pieces of data. This paper focuses on the problem of detecting and recognising particular types of stromatolites. Inspired by the processes actual astrobiologists go through in the field when identifying stromatolites, the processes we investigate focus on recognising characteristics associated with biogenicity. The extraction of these characteristics is based on the analysis of geometrical structure enhanced by passing the images of stromatolites into an edge-detection filter and its Fourier Transform, revealing typical spatial frequency patterns. The proposed analysis is performed on both simulated images of stromatolite structures and images of real stromatolites taken in the field by astrobiologists.
Resumo:
Camera-laser calibration is necessary for many robotics and computer vision applications. However, existing calibration toolboxes still require laborious effort from the operator in order to achieve reliable and accurate results. This paper proposes algorithms that augment two existing trustful calibration methods with an automatic extraction of the calibration object from the sensor data. The result is a complete procedure that allows for automatic camera-laser calibration. The first stage of the procedure is automatic camera calibration which is useful in its own right for many applications. The chessboard extraction algorithm it provides is shown to outperform openly available techniques. The second stage completes the procedure by providing automatic camera-laser calibration. The procedure has been verified by extensive experimental tests with the proposed algorithms providing a major reduction in time required from an operator in comparison to manual methods.
Resumo:
Object classification is plagued by the issue of session variation. Session variation describes any variation that makes one instance of an object look different to another, for instance due to pose or illumination variation. Recent work in the challenging task of face verification has shown that session variability modelling provides a mechanism to overcome some of these limitations. However, for computer vision purposes, it has only been applied in the limited setting of face verification. In this paper we propose a local region based intersession variability (ISV) modelling approach, and apply it to challenging real-world data. We propose a region based session variability modelling approach so that local session variations can be modelled, termed Local ISV. We then demonstrate the efficacy of this technique on a challenging real-world fish image database which includes images taken underwater, providing significant real-world session variations. This Local ISV approach provides a relative performance improvement of, on average, 23% on the challenging MOBIO, Multi-PIE and SCface face databases. It also provides a relative performance improvement of 35% on our challenging fish image dataset.
Resumo:
A cell classification algorithm that uses first, second and third order statistics of pixel intensity distributions over pre-defined regions is implemented and evaluated. A cell image is segmented into 6 regions extending from a boundary layer to an inner circle. First, second and third order statistical features are extracted from histograms of pixel intensities in these regions. Third order statistical features used are one-dimensional bispectral invariants. 108 features were considered as candidates for Adaboost based fusion. The best 10 stage fused classifier was selected for each class and a decision tree constructed for the 6-class problem. The classifier is robust, accurate and fast by design.
Resumo:
A long query provides more useful hints for searching relevant documents, but it is likely to introduce noise which affects retrieval performance. In order to smooth such adverse effect, it is important to reduce noisy terms, introduce and boost additional relevant terms. This paper presents a comprehensive framework, called Aspect Hidden Markov Model (AHMM), which integrates query reduction and expansion, for retrieval with long queries. It optimizes the probability distribution of query terms by utilizing intra-query term dependencies as well as the relationships between query terms and words observed in relevance feedback documents. Empirical evaluation on three large-scale TREC collections demonstrates that our approach, which is automatic, achieves salient improvements over various strong baselines, and also reaches a comparable performance to a state of the art method based on user’s interactive query term reduction and expansion.
Resumo:
Real-time image analysis and classification onboard robotic marine vehicles, such as AUVs, is a key step in the realisation of adaptive mission planning for large-scale habitat mapping in previously unexplored environments. This paper describes a novel technique to train, process, and classify images collected onboard an AUV used in relatively shallow waters with poor visibility and non-uniform lighting. The approach utilises Förstner feature detectors and Laws texture energy masks for image characterisation, and a bag of words approach for feature recognition. To improve classification performance we propose a usefulness gain to learn the importance of each histogram component for each class. Experimental results illustrate the performance of the system in characterisation of a variety of marine habitats and its ability to operate onboard an AUV's main processor suitable for real-time mission planning.
Resumo:
This is a discussion of the journal article: "Construcing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation". The article and discussion have appeared in the Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Resumo:
The aim of this research is to report initial experimental results and evaluation of a clinician-driven automated method that can address the issue of misdiagnosis from unstructured radiology reports. Timely diagnosis and reporting of patient symptoms in hospital emergency departments (ED) is a critical component of health services delivery. However, due to disperse information resources and vast amounts of manual processing of unstructured information, a point-of-care accurate diagnosis is often difficult. A rule-based method that considers the occurrence of clinician specified keywords related to radiological findings was developed to identify limb abnormalities, such as fractures. A dataset containing 99 narrative reports of radiological findings was sourced from a tertiary hospital. The rule-based method achieved an F-measure of 0.80 and an accuracy of 0.80. While our method achieves promising performance, a number of avenues for improvement were identified using advanced natural language processing (NLP) techniques.
Resumo:
We present an approach to automatically de-identify health records. In our approach, personal health information is identified using a Conditional Random Fields machine learning classifier, a large set of linguistic and lexical features, and pattern matching techniques. Identified personal information is then removed from the reports. The de-identification of personal health information is fundamental for the sharing and secondary use of electronic health records, for example for data mining and disease monitoring. The effectiveness of our approach is first evaluated on the 2007 i2b2 Shared Task dataset, a widely adopted dataset for evaluating de-identification techniques. Subsequently, we investigate the robustness of the approach to limited training data; we study its effectiveness on different type and quality of data by evaluating the approach on scanned pathology reports from an Australian institution. This data contains optical character recognition errors, as well as linguistic conventions that differ from those contained in the i2b2 dataset, for example different date formats. The findings suggest that our approach compares to the best approach from the 2007 i2b2 Shared Task; in addition, the approach is found to be robust to variations of training size, data type and quality in presence of sufficient training data.
Resumo:
Aims Pathology notification for a Cancer Registry is regarded as the most valid information for the confirmation of a diagnosis of cancer. In view of the importance of pathology data, an automatic medical text analysis system (Medtex) is being developed to perform electronic Cancer Registry data extraction and coding of important clinical information embedded within pathology reports. Methods The system automatically scans HL7 messages received from a Queensland pathology information system and analyses the reports for terms and concepts relevant to a cancer notification. A multitude of data items for cancer notification such as primary site, histological type, stage, and other synoptic data are classified by the system. The underlying extraction and classification technology is based on SNOMED CT1 2. The Queensland Cancer Registry business rules3 and International Classification of Diseases – Oncology – Version 34 have been incorporated. Results The cancer notification services show that the classification of notifiable reports can be achieved with sensitivities of 98% and specificities of 96%5, while the coding of cancer notification items such as basis of diagnosis, histological type and grade, primary site and laterality can be extracted with an overall accuracy of 80%6. In the case of lung cancer staging, the automated stages produced were accurate enough for the purposes of population level research and indicative staging prior to multi-disciplinary team meetings2 7. Medtex also allows for detailed tumour stream synoptic reporting8. Conclusions Medtex demonstrates how medical free-text processing could enable the automation of some Cancer Registry processes. Over 70% of Cancer Registry coding resources are devoted to information acquisition. The development of a clinical decision support system to unlock information from medical free-text could significantly reduce costs arising from duplicated processes and enable improved decision support, enhancing efficiency and timeliness of cancer information for Cancer Registries.
Resumo:
Spatially-explicit modelling of grassland classes is important to site-specific planning for improving grassland and environmental management over large areas. In this study, a climate-based grassland classification model, the Comprehensive and Sequential Classification System (CSCS) was integrated with spatially interpolated climate data to classify grassland in Gansu province, China. The study area is characterized by complex topographic features imposed by plateaus, high mountains, basins and deserts. To improve the quality of the interpolated climate data and the quality of the spatial classification over this complex topography, three linear regression methods, namely an analytic method based on multiple regression and residues (AMMRR), a modification of the AMMRR method through adding the effect of slope and aspect to the interpolation analysis (M-AMMRR) and a method which replaces the IDW approach for residue interpolation in M-AMMRR with an ordinary kriging approach (I-AMMRR), for interpolating climate variables were evaluated. The interpolation outcomes from the best interpolation method were then used in the CSCS model to classify the grassland in the study area. Climate variables interpolated included the annual cumulative temperature and annual total precipitation. The results indicated that the AMMRR and M-AMMRR methods generated acceptable climate surfaces but the best model fit and cross validation result were achieved by the I-AMMRR method. Twenty-six grassland classes were classified for the study area. The four grassland vegetation classes that covered more than half of the total study area were "cool temperate-arid temperate zonal semi-desert", "cool temperate-humid forest steppe and deciduous broad-leaved forest", "temperate-extra-arid temperate zonal desert", and "frigid per-humid rain tundra and alpine meadow". The vegetation classification map generated in this study provides spatial information on the locations and extents of the different grassland classes. This information can be used to facilitate government agencies' decision-making in land-use planning and environmental management, and for vegetation and biodiversity conservation. The information can also be used to assist land managers in the estimation of safe carrying capacities which will help to prevent overgrazing and land degradation.