7 results for automated text classification
Abstract:
The histological grading of cervical intraepithelial neoplasia (CIN) remains subjective, resulting in inter- and intra-observer variation and poor reproducibility in the grading of cervical lesions. This study has attempted to develop an objective grading system using automated machine vision. The architectural features of cervical squamous epithelium are quantitatively analysed using a combination of computerized digital image processing and Delaunay triangulation analysis; 230 images digitally captured from cases previously classified by a gynaecological pathologist included normal cervical squamous epithelium (n = 30), koilocytosis (n = 46), CIN 1 (n = 52), CIN 2 (n = 56), and CIN 3 (n = 46). Intra- and inter-observer variation had kappa values of 0.502 and 0.415, respectively. A machine vision system was developed in KS400 macro programming language to segment and mark the centres of all nuclei within the epithelium. By object-oriented analysis of image components, the positional information of nuclei was used to construct a Delaunay triangulation mesh. Each mesh was analysed to compute triangle dimensions including the mean triangle area, the mean triangle edge length, and the number of triangles per unit area, giving an individual quantitative profile of measurements for each case. Discriminant analysis of the geometric data revealed the significant discriminatory variables from which a classification score was derived. The scoring system distinguished between normal and CIN 3 in 98.7% of cases and between koilocytosis and CIN 1 in 76.5% of cases, but only 62.3% of the CIN cases were classified into the correct group, with the CIN 2 group showing the highest rate of misclassification. Graphical plots of triangulation data demonstrated the continuum of morphological change from normal squamous epithelium to the highest grade of CIN, with overlapping of the groups originally defined by the pathologists.
This study shows that automated location of nuclei in cervical biopsies using computerized image analysis is possible. Analysis of positional information enables quantitative evaluation of architectural features in CIN using Delaunay triangulation meshes, which is effective in the objective classification of CIN. This demonstrates the future potential of automated machine vision systems in diagnostic histopathology. Copyright (C) 2000 John Wiley and Sons, Ltd.
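The mesh measurements named in this abstract (mean triangle area, mean edge length, triangles per unit area) can be illustrated with a minimal sketch. This is not the authors' KS400 implementation: the brute-force empty-circumcircle construction, the function names, the toy nucleus centres, and the use of total mesh area as the "unit area" denominator are all assumptions made for illustration. The brute-force approach also assumes no four points share an empty circumcircle and is only practical for small point sets.

```python
from itertools import combinations
from math import dist


def signed_area2(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The determinant's sign flips with the orientation of abc.
    return det * signed_area2(a, b, c) > 0


def delaunay_triangles(points):
    """Brute-force Delaunay: keep triangles whose circumcircle is empty."""
    triangles = []
    for a, b, c in combinations(points, 3):
        if signed_area2(a, b, c) == 0:
            continue  # skip collinear triples
        others = (p for p in points if p not in (a, b, c))
        if not any(in_circumcircle(a, b, c, p) for p in others):
            triangles.append((a, b, c))
    return triangles


def mesh_profile(points):
    """Per-case profile of triangle measurements (assumed definitions)."""
    triangles = delaunay_triangles(points)
    areas = [abs(signed_area2(*t)) / 2 for t in triangles]
    # Count each shared edge once.
    edges = {frozenset(pair) for t in triangles for pair in combinations(t, 2)}
    lengths = [dist(*edge) for edge in edges]
    total_area = sum(areas)
    return {
        "n_triangles": len(triangles),
        "mean_area": total_area / len(triangles),
        "mean_edge_length": sum(lengths) / len(lengths),
        # Assumption: "unit area" is taken as the total area of the mesh.
        "triangles_per_unit_area": len(triangles) / total_area,
    }


# Hypothetical nucleus centres (tuples, in arbitrary units).
centres = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
profile = mesh_profile(centres)
```

Densely packed nuclei give small triangles and short edges, so a profile of this kind is one plausible input to the discriminant analysis described above.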
Abstract:
PURPOSE. To describe and classify patterns of abnormal fundus autofluorescence (FAF) in eyes with early nonexudative age-related macular disease (AMD). METHODS. FAF images were recorded in eyes with early AMD by confocal scanning laser ophthalmoscopy (cSLO) with excitation at 488 nm (argon or OPSL laser) and emission above 500 or 521 nm (barrier filter). A standardized protocol for image acquisition and generation of mean images after automated alignment was applied, and routine fundus photographs were obtained. FAF images were classified by two independent observers. The κ statistic was applied to assess intra- and interobserver variability. RESULTS. Alterations in FAF were classified into eight phenotypic patterns including normal, minimal change, focal increased, patchy, linear, lacelike, reticular, and speckled. Areas with abnormal increased or decreased FAF signals may or may not have corresponded to funduscopically visible alterations. For intraobserver variability, κ of observer I was 0.80 (95% confidence interval [CI], 0.71-0.89) and of observer II, 0.74 (95% CI, 0.64-0.84). For interobserver variability, κ was 0.77 (95% CI, 0.67-0.87). CONCLUSIONS. Various phenotypic patterns of abnormal FAF can be identified with cSLO imaging. Distinct patterns may reflect heterogeneity at a cellular and molecular level in contrast to a nonspecific aging process. The results indicate that the classification system yields a relatively high degree of intra- and interobserver agreement. It may be applicable for determination of novel prognostic determinants in longitudinal natural history studies, for identification of genetic risk factors, and for monitoring of future therapeutic interventions to slow the progression of early AMD. Copyright © Association for Research in Vision and Ophthalmology.
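For readers unfamiliar with the agreement statistic reported here, the following is a minimal sketch of unweighted Cohen's kappa for two observers. The abstract does not state which kappa variant was used, so the unweighted form is an assumption, and the pattern labels assigned below are made-up toy data, not the study's gradings.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each observer's marginal label frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical gradings of four images by two observers.
observer_1 = ["reticular", "reticular", "patchy", "patchy"]
observer_2 = ["reticular", "patchy", "patchy", "patchy"]
kappa = cohen_kappa(observer_1, observer_2)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over simple percent agreement when some patterns are much more common than others.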
Abstract:
Many plain text information hiding techniques demand deep semantic processing, and so suffer in reliability. In contrast, syntactic processing is a more mature and reliable technology. Assuming a perfect parser, this paper evaluates a set of automated and reversible syntactic transforms that can hide information in plain text without changing the meaning or style of a document. A large representative collection of newspaper text is fed through a prototype system. In contrast to previous work, the output is subjected to human testing to verify that the text has not been significantly compromised by the information hiding procedure, yielding a success rate of 96% and bandwidth of 0.3 bits per sentence. © 2007 SPIE-IS&T.
Abstract:
Pollen grains are microscopic so their identification and quantification has, for decades, depended upon human observers using light microscopes: a labour-intensive approach. Modern improvements in computing and imaging hardware and software now bring automation of pollen analyses within reach. In this paper, we provide the first review in over 15 yr of progress towards automation of the part of palynology concerned with counting and classifying pollen, bringing together literature published from a wide spectrum of sources. We consider which attempts offer the most potential for an automated palynology system for universal application across all fields of research concerned with pollen classification and counting. We discuss what is required to make the datasets of these automated systems as acceptable as those produced by human palynologists, and present suggestions for how automation will generate novel approaches to counting and classifying pollen that have hitherto been unthinkable.
Abstract:
Morphological changes in the retinal vascular network are associated with future risk of many systemic and vascular diseases. However, uncertainty over the presence and nature of some of these associations exists. Analysis of data from large population based studies will help to resolve these uncertainties. The QUARTZ (QUantitative Analysis of Retinal vessel Topology and siZe) retinal image analysis system allows automated processing of large numbers of retinal images. However, an image quality assessment module is needed to achieve full automation. In this paper, we propose such an algorithm, which uses the segmented vessel map to determine the suitability of retinal images for use in the creation of vessel morphometric data suitable for epidemiological studies. This includes an effective 3-dimensional feature set and support vector machine classification. A random subset of 800 retinal images from UK Biobank (a large prospective study of 500,000 middle aged adults; where 68,151 underwent retinal imaging) was used to examine the performance of the image quality algorithm. The algorithm achieved a sensitivity of 95.33% and a specificity of 91.13% for the detection of inadequate images. The strong performance of this image quality algorithm will make rapid automated analysis of vascular morphometry feasible on the entire UK Biobank dataset (and other large retinal datasets), with minimal operator involvement, and at low cost.
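The two performance figures quoted in this abstract follow the standard definitions of sensitivity and specificity, with "inadequate image" as the positive class. The sketch below shows the arithmetic only; the abstract does not report the underlying confusion-matrix counts, so the counts used here are hypothetical toy values.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: inadequate images correctly flagged as inadequate
    fn: inadequate images missed (passed as adequate)
    tn: adequate images correctly passed
    fp: adequate images wrongly flagged as inadequate
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, not taken from the UK Biobank evaluation.
sens, spec = sensitivity_specificity(tp=9, fn=1, tn=8, fp=2)
```

For a quality filter of this kind, high sensitivity keeps unusable images out of the morphometric analysis, while high specificity limits how many usable images are discarded.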
Abstract:
Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.
Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: ‘semi-structured’ and ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed at the prediction of chunks of the report text containing information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested respectively on sets of 635 and 163 manually classified or annotated reports, from the Northern Ireland Cancer Registry.
Results: The best result of 99.4% accuracy – which included only one semi-structured report predicted as unstructured – was produced by the layout classifier with the k-nearest neighbour algorithm, using the binary term occurrence word vector type with stopword filter and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports the performance ranged from 0.97 to 0.94 and from 0.92 to 0.83 in precision and recall, while for unstructured reports performance ranged from 0.91 to 0.64 and from 0.68 to 0.41 in precision and recall. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured ones.
Conclusions: These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.
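The layout classifier described in this abstract combines binary term occurrence vectors, a stopword filter, and nearest-neighbour voting. The sketch below illustrates that combination in plain Python rather than RapidMiner. The stopword list, the Jaccard similarity measure, the value of k, and the example report snippets are all assumptions for illustration, and pruning is not implemented.

```python
from collections import Counter

# Tiny illustrative stopword list (assumption, not RapidMiner's list).
STOPWORDS = {"the", "and", "of", "a"}


def binary_terms(text):
    """Binary term occurrence: the set of distinct non-stopword tokens."""
    return {tok for tok in text.lower().split() if tok not in STOPWORDS}


def jaccard(s, t):
    """Similarity of two term sets (assumed measure)."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def knn_predict(train, text, k=3):
    """Majority label among the k training reports most similar to text."""
    query = binary_terms(text)
    ranked = sorted(train, key=lambda ex: jaccard(query, binary_terms(ex[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical report snippets with manually assigned layout labels.
train = [
    ("tumour size: 22 mm", "semi-structured"),
    ("tumour size: 8 mm nodes positive: 2", "semi-structured"),
    ("the tumour measures 22 mm and two nodes were positive", "unstructured"),
    ("sections show an invasive carcinoma of the breast", "unstructured"),
]
layout = knn_predict(train, "tumour size: 15 mm")
```

In this toy setup the field labels that keep their colon (such as `size:`) act as layout cues, which is one plausible reason a simple term-occurrence representation can separate the two layouts.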
Abstract:
Malware detection is a growing problem particularly on the Android mobile platform due to its increasing popularity and accessibility to numerous third party app markets. This has also been made worse by the increasingly sophisticated detection avoidance techniques employed by emerging malware families. This calls for more effective techniques for detection and classification of Android malware. Hence, in this paper we present an n-opcode analysis based approach that utilizes machine learning to classify and categorize Android malware. This approach enables automated feature discovery that eliminates the need for applying expert or domain knowledge to define the needed features. Our experiments on 2520 samples that were performed using up to 10-gram opcode features showed that an f-measure of 98% is achievable using this approach.
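The n-opcode features mentioned in this abstract can be read as n-grams over an app's opcode sequence. The sketch below counts such n-grams in plain Python. It is not the authors' pipeline: the function name and the short opcode sequence are hypothetical, and disassembly of the app and the downstream classifier are outside its scope.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Count every run of n consecutive opcodes in the sequence."""
    return Counter(tuple(opcodes[i:i + n])
                   for i in range(len(opcodes) - n + 1))


# Hypothetical opcode sequence from one disassembled method.
sequence = ["const-string", "invoke-virtual", "move-result",
            "const-string", "invoke-virtual"]
bigrams = opcode_ngrams(sequence, 2)
```

Because the n-grams are enumerated mechanically from the opcode stream, no expert has to decide in advance which instruction patterns matter; the resulting counts (or their presence/absence) can be passed to a learning algorithm as features.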