820 resultados para Data classification


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Urbanization and the ability to manage for a sustainable future present numerous challenges for geographers and planners in metropolitan regions. Remotely sensed data are inherently suited to provide information on urban land cover characteristics, and their change over time, at various spatial and temporal scales. Data models for establishing the range of urban land cover types and their biophysical composition (vegetation, soil, and impervious surfaces) are integrated to provide a hierarchical approach to classifying land cover within urban environments. These data also provide an essential component for current simulation models of urban growth patterns, as both calibration and validation data. The first stages of the approach have been applied to examine urban growth between 1988 and 1995 for a rapidly developing area in southeast Queensland, Australia. Landsat Thematic Mapper image data provided accurate (83% adjusted overall accuracy) classification of broad land cover types and their change over time. The combination of commonly available remotely sensed data, image processing methods, and emerging urban growth models highlights an important application for current and next generation moderate spatial resolution image data in studies of urban environments.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Examples from the Murray-Darling basin in Australia are used to illustrate different methods of disaggregation of reconnaissance-scale maps. One approach for disaggregation revolves around the de-convolution of the soil-landscape paradigm elaborated during a soil survey. The descriptions of soil ma units and block diagrams in a soil survey report detail soil-landscape relationships or soil toposequences that can be used to disaggregate map units into component landscape elements. Toposequences can be visualised on a computer by combining soil maps with digital elevation data. Expert knowledge or statistics can be used to implement the disaggregation. Use of a restructuring element and k-means clustering are illustrated. Another approach to disaggregation uses training areas to develop rules to extrapolate detailed mapping into other, larger areas where detailed mapping is unavailable. A two-level decision tree example is presented. At one level, the decision tree method is used to capture mapping rules from the training area; at another level, it is used to define the domain over which those rules can be extrapolated. (C) 2001 Elsevier Science B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The principle of using induction rules based on spatial environmental data to model a soil map has previously been demonstrated Whilst the general pattern of classes of large spatial extent and those with close association with geology were delineated small classes and the detailed spatial pattern of the map were less well rendered Here we examine several strategies to improve the quality of the soil map models generated by rule induction Terrain attributes that are better suited to landscape description at a resolution of 250 m are introduced as predictors of soil type A map sampling strategy is developed Classification error is reduced by using boosting rather than cross validation to improve the model Further the benefit of incorporating the local spatial context for each environmental variable into the rule induction is examined The best model was achieved by sampling in proportion to the spatial extent of the mapped classes boosting the decision trees and using spatial contextual information extracted from the environmental variables.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

CysView is a web-based application tool that identifies and classifies proteins according to their disulfide connectivity patterns. It accepts a dataset of annotated protein sequences in various formats and returns a graphical representation of cysteine pairing patterns. CysView displays cysteine patterns for those records in the data with disulfide annotations. It allows the viewing of records grouped by connectivity patterns. CysView's utility as an analysis tool was demonstrated by the rapid and correct classification of scorpion toxin entries from GenPept on the basis of their disulfide pairing patterns. It has proved useful for rapid detection of irrelevant and partial records, or those with incomplete annotations. CysView can be used to support distant homology between proteins. CysView is publicly available at http://research.i2r.a-star.edu.sg/CysView/.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In studies assessing the trends in coronary events, such as the World Health Organization (WHO) MONICA Project (multinational MONItoring of trends and determinants of CArdiovascular disease), the main emphasis has been on coronary deaths and non-fatal definite myocardial infarctions (MI). It is, however, possible that the proportion of milder MIs may be increasing because of improvements in treatment and reductions in levels of risk factors. We used the MI register data of the WHO MONICA Project to investigate several definitions for mild non-fatal MIs that would be applicable in various settings and could be used to assess trends in milder coronary events. Of 38 populations participating in the WHO MONICA MI register study, more than half registered a sufficiently wide spectrum of events that it was possible to identify subsets of milder cases. The event rates and case fatality rates of MI are clearly dependent on the spectrum of non-fatal MIs, which are included. On clinical grounds we propose that the original MONICA category ''non-fatal possible MI'' could bt:divided into two groups: ''non fatal probable MI'' and ''prolonged chest pain.'' Non-fatal probable MIs are cases, which in addition to ''typical symptoms'' have electrocardiogram (EGG) or enzyme changes suggesting cardiac ischemia, but not severe enough to fulfil the criteria for non-fatal definite MI In more than half of the MONICA Collaborating Centers, the registration of MI covers these milder events reasonably well. Proportions of non-fatal probable MIs vary less between populations than do proportions of non fatal possible MIs. Also rates of non-fatal probable MI are somewhat more highly correlated with rates of fatal events and non-fatal definite MI. These findings support the validity of the category of non-fatal probable MI. In each center the increase in event rates and the decrease in case-fatality due to the inclusion of non-fatal probable MI was lar er for women than men. For the WHO MONICA Project and other epidemiological studies the proposed category of non-fatal probable MIs can be used for assessing trends in rates of milder MI. Copyright (C) 1997 Elsevier Science Inc.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The traditional methods employed to detect atherosclerotic lesions allow for the identification of lesions; however, they do not provide specific characterization of the lesion`s biochemistry. Currently, Raman spectroscopy techniques are widely used as a characterization method for unknown substances, which makes this technique very important for detecting atherosclerotic lesions. The spectral interpretation is based on the analysis of frequency peaks present in the signal; however, spectra obtained from the same substance can show peaks slightly different and these differences make difficult the creation of an automatic method for spectral signal analysis. This paper presents a signal analysis method based on a clustering technique that allows for the classification of spectra as well as the inference of a diagnosis about the arterial wall condition. The objective is to develop a computational tool that is able to create clusters of spectra according to the arterial wall state and, after data collection, to allow for the classification of a specific spectrum into its correct cluster.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Optical diagnostic methods, such as near-infrared Raman spectroscopy allow quantification and evaluation of human affecting diseases, which could be useful in identifying and diagnosing atherosclerosis in coronary arteries. The goal of the present work is to apply Independent Component Analysis (ICA) for data reduction and feature extraction of Raman spectra and to perform the Mahalanobis distance for group classification according to histopathology, obtaining feasible diagnostic information to detect atheromatous plaque. An 830nm Ti:sapphire laser pumped by an argon laser provides near-infrared excitation. A spectrograph disperses light scattered from arterial tissues over a liquid-nitrogen cooled CCD to detect the Raman spectra. A total of 111 spectra from arterial fragments were utilized.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The collection of spatial information to quantify changes to the state and condition of the environment is a fundamental component of conservation or sustainable utilization of tropical and subtropical forests, Age is an important structural attribute of old-growth forests influencing biological diversity in Australia eucalypt forests. Aerial photograph interpretation has traditionally been used for mapping the age and structure of forest stands. However this method is subjective and is not able to accurately capture fine to landscape scale variation necessary for ecological studies. Identification and mapping of fine to landscape scale vegetative structural attributes will allow the compilation of information associated with Montreal Process indicators lb and ld, which seek to determine linkages between age structure and the diversity and abundance of forest fauna populations. This project integrated measurements of structural attributes derived from a canopy-height elevation model with results from a geometrical-optical/spectral mixture analysis model to map forest age structure at a landscape scale. The availability of multiple-scale data allows the transfer of high-resolution attributes to landscape scale monitoring. Multispectral image data were obtained from a DMSV (Digital Multi-Spectral Video) sensor over St Mary's State Forest in Southeast Queensland, Australia. Local scene variance levels for different forest tapes calculated from the DMSV data were used to optimize the tree density and canopy size output in a geometric-optical model applied to a Landsat Thematic Mapper (TU) data set. Airborne laser scanner data obtained over the project area were used to calibrate a digital filter to extract tree heights from a digital elevation model that was derived from scanned colour stereopairs. The modelled estimates of tree height, crown size, and tree density were used to produce a decision-tree classification of forest successional stage at a landscape scale. The results obtained (72% accuracy), were limited in validation, but demonstrate potential for using the multi-scale methodology to provide spatial information for forestry policy objectives (ie., monitoring forest age structure).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Complete small subunit ribosomal RNA gene (ssrDNA) and partial (D1-D3) large subunit ribosomal RNA gene (lsrDNA) sequences were used to estimate the phylogeny of the Digenea via maximum parsimony and Bayesian inference. Here we contribute 80 new ssrDNA and 124 new lsrDNA sequences. Fully complementary data sets of the two genes were assembled from newly generated and previously published sequences and comprised 163 digenean taxa representing 77 nominal families and seven aspidogastrean outgroup taxa representing three families. Analyses were conducted on the genes independently as well as combined and separate analyses including only the higher plagiorchiidan taxa were performed using a reduced-taxon alignment including additional characters that could not be otherwise unambiguously aligned. The combined data analyses yielded the most strongly supported results and differences between the two methods of analysis were primarily in their degree of resolution. The Bayesian analysis including all taxa and characters, and incorporating a model of nucleotide substitution (general-time-reversible with among-site rate heterogeneity), was considered the best estimate of the phylogeny and was used to evaluate their classification and evolution. In broad terms, the Digenea forms a dichotomy that is split between a lineage leading to the Brachylaimoidea, Diplostomoidea and Schistosomatoidea (collectively the Diplostomida nomen novum (nom. nov.)) and the remainder of the Digenea (the Plagiorchiida), in which the Bivesiculata nom. nov. and Transversotremata nom. nov. form the two most basal lineages, followed by the Hemiurata. The remainder of the Plagiorchiida forms a large number of independent lineages leading to the crown clade Xiphidiata nom. nov. that comprises the Allocreadioidea, Gorgoderoidea, Microphalloidea and Plagiorchioidea, which are united by the presence of a penetrating stylet in their cercariae. Although a majority of families and to a lesser degree, superfamilies are supported as currently defined, the traditional divisions of the Echinostomida, Plagiorchiida and Strigeida were found to comprise non-natural assemblages. Therefore, the membership of established higher taxa are emended, new taxa erected and a revised, phylogenetically based classification proposed and discussed in light of ontogeny, morphology and taxonomic history. (C) 2003 Australian Society for Parasitology Inc. Published by Elsevier Science Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

ABSTRACT The objective of this work was to study the distribution of values of the coefficient of variation (CV) in the experiments of papaya crop (Carica papaya L.) by proposing ranges to guide researchers in their evaluation for different characters in the field. The data used in this study were obtained by bibliographical review in Brazilian journals, dissertations and thesis. This study considered the following characters: diameter of the stalk, insertion height of the first fruit, plant height, number of fruits per plant, fruit biomass, fruit length, equatorial diameter of the fruit, pulp thickness, fruit firmness, soluble solids and internal cavity diameter, from which, value ranges were obtained for the CV values for each character, based on the methodology proposed by Garcia, Costa and by the standard classification of Pimentel-Gomes. The results obtained in this study indicated that ranges of CV values were different among various characters, presenting a large variation, which justifies the necessity of using specific evaluation range for each character. In addition, the use of classification ranges obtained from methodology of Costa is recommended.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Low noise surfaces have been increasingly considered as a viable and cost-effective alternative to acoustical barriers. However, road planners and administrators frequently lack information on the correlation between the type of road surface and the resulting noise emission profile. To address this problem, a method to identify and classify different types of road pavements was developed, whereby near field road noise is analyzed using statistical learning methods. The vehicle rolling sound signal near the tires and close to the road surface was acquired by two microphones in a special arrangement which implements the Close-Proximity method. A set of features, characterizing the properties of the road pavement, was extracted from the corresponding sound profiles. A feature selection method was used to automatically select those that are most relevant in predicting the type of pavement, while reducing the computational cost. A set of different types of road pavement segments were tested and the performance of the classifier was evaluated. Results of pavement classification performed during a road journey are presented on a map, together with geographical data. This procedure leads to a considerable improvement in the quality of road pavement noise data, thereby increasing the accuracy of road traffic noise prediction models.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

INTRODUCTION: The correct identification of the underlying cause of death and its precise assignment to a code from the International Classification of Diseases are important issues to achieve accurate and universally comparable mortality statistics These factors, among other ones, led to the development of computer software programs in order to automatically identify the underlying cause of death. OBJECTIVE: This work was conceived to compare the underlying causes of death processed respectively by the Automated Classification of Medical Entities (ACME) and the "Sistema de Seleção de Causa Básica de Morte" (SCB) programs. MATERIAL AND METHOD: The comparative evaluation of the underlying causes of death processed respectively by ACME and SCB systems was performed using the input data file for the ACME system that included deaths which occurred in the State of S. Paulo from June to December 1993, totalling 129,104 records of the corresponding death certificates. The differences between underlying causes selected by ACME and SCB systems verified in the month of June, when considered as SCB errors, were used to correct and improve SCB processing logic and its decision tables. RESULTS: The processing of the underlying causes of death by the ACME and SCB systems resulted in 3,278 differences, that were analysed and ascribed to lack of answer to dialogue boxes during processing, to deaths due to human immunodeficiency virus [HIV] disease for which there was no specific provision in any of the systems, to coding and/or keying errors and to actual problems. The detailed analysis of these latter disclosed that the majority of the underlying causes of death processed by the SCB system were correct and that different interpretations were given to the mortality coding rules by each system, that some particular problems could not be explained with the available documentation and that a smaller proportion of problems were identified as SCB errors. CONCLUSION: These results, disclosing a very low and insignificant number of actual problems, guarantees the use of the version of the SCB system for the Ninth Revision of the International Classification of Diseases and assures the continuity of the work which is being undertaken for the Tenth Revision version.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper deals with the establishment of a characterization methodology of electric power profiles of medium voltage (MV) consumers. The characterization is supported on the data base knowledge discovery process (KDD). Data Mining techniques are used with the purpose of obtaining typical load profiles of MV customers and specific knowledge of their customers’ consumption habits. In order to form the different customers’ classes and to find a set of representative consumption patterns, a hierarchical clustering algorithm and a clustering ensemble combination approach (WEACS) are used. Taking into account the typical consumption profile of the class to which the customers belong, new tariff options were defined and new energy coefficients prices were proposed. Finally, and with the results obtained, the consequences that these will have in the interaction between customer and electric power suppliers are analyzed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The introduction of Electric Vehicles (EVs) together with the implementation of smart grids will raise new challenges to power system operators. This paper proposes a demand response program for electric vehicle users which provides the network operator with another useful resource that consists in reducing vehicles charging necessities. This demand response program enables vehicle users to get some profit by agreeing to reduce their travel necessities and minimum battery level requirements on a given period. To support network operator actions, the amount of demand response usage can be estimated using data mining techniques applied to a database containing a large set of operation scenarios. The paper includes a case study based on simulated operation scenarios that consider different operation conditions, e.g. available renewable generation, and considering a diversity of distributed resources and electric vehicles with vehicle-to-grid capacity and demand response capacity in a 33 bus distribution network.