847 results for classification methods


Relevance:

70.00%

Publisher:

Abstract:

We propose a novel framework in which an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon. Preferences on the expected sentiment labels of those lexicon words are expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatic domain-specific feature acquisition. The word-class distributions of these self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods despite using no labeled documents.
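The pseudo-labeling loop described above can be sketched minimally in Python. The toy lexicon, the margin-based confidence measure, and the threshold are illustrative assumptions, not the paper's generalized-expectation machinery:

```python
from collections import Counter

# Hypothetical mini sentiment lexicon (an assumption, not the paper's lexicon).
LEXICON = {"great": "pos", "excellent": "pos", "awful": "neg", "boring": "neg"}

def lexicon_score(doc):
    """Count lexicon hits per class; confidence = margin between classes."""
    votes = Counter(LEXICON[w] for w in doc.split() if w in LEXICON)
    if not votes:
        return None, 0
    label, hits = votes.most_common(1)[0]
    return label, hits - (sum(votes.values()) - hits)

def pseudo_label(docs, min_margin=2):
    """Keep only documents classified with high confidence (margin >= threshold),
    to serve as pseudo-labeled examples for training a second classifier."""
    out = []
    for d in docs:
        label, margin = lexicon_score(d)
        if label is not None and margin >= min_margin:
            out.append((d, label))
    return out

docs = [
    "great excellent acting",          # confident positive
    "awful boring plot awful pacing",  # confident negative
    "great but boring",                # ambiguous: margin 0, dropped
]
labeled = pseudo_label(docs)
```

In the full framework, the retained documents would then feed feature acquisition and a second, constrained classifier.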

Relevance:

70.00%

Publisher:

Abstract:

With the increasing spatial resolution of satellite optical sensors, new strategies must be developed to classify remote sensing images. The abundance of detail in these images greatly reduces the effectiveness of spectral classifications; many textural classification methods, notably statistical approaches, are no longer suitable. Structural approaches, by contrast, offer an interesting opening: these object-oriented approaches study the structure of the image in order to interpret its meaning. An algorithm of this type is proposed in the first part of this thesis. Based on the detection and analysis of keypoints (KPC: KeyPoint-based Classification), it offers an effective solution to the problem of classifying images of very high spatial resolution. Classifications performed on the data demonstrate in particular its ability to differentiate visually similar textures. Furthermore, it has been shown in the literature that evidential fusion, based on Dempster-Shafer theory, is well suited to remote sensing images because of its ability to integrate concepts such as ambiguity and uncertainty. Few studies, however, have examined the application of this theory to complex textural data such as that produced by structural classifications. The second part of this thesis aims to fill this gap by investigating the fusion of multi-scale KPC classifications using Dempster-Shafer theory. The tests carried out show that this multi-scale approach improves the final classification when the initial image is of low quality. Moreover, the study highlights the potential improvement brought by estimating the reliability of the intermediate classifications, and provides avenues for carrying out these estimates.
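The evidential fusion step rests on Dempster's rule of combination. A minimal sketch, assuming mass functions are represented as dicts mapping focal elements (frozensets of classes) to masses; the two classes and the mass values are invented for illustration:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: combine two basic belief assignments
    (dicts of frozenset -> mass), renormalizing by 1 - conflict."""
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass assigned to incompatible hypotheses
    k = 1.0 - conflict
    return {s: v / k for s, v in combined.items()}

# Two classifiers over classes {urban, forest}; the second is uncertain
# and assigns part of its mass to the full frame (ignorance).
U, F = frozenset({"urban"}), frozenset({"forest"})
frame = U | F
m1 = {U: 0.7, F: 0.3}
m2 = {U: 0.5, frame: 0.5}
fused = dempster_combine(m1, m2)
```

The ability to put mass on the whole frame is what lets the theory model the ambiguity and uncertainty mentioned in the abstract.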

Relevance:

70.00%

Publisher:

Abstract:

In this work we focus on pattern recognition methods for EMG-based upper-limb prosthetic control. After giving a detailed review of the most widely used classification methods, we propose a new classification approach. It results from a comparison, based on Fourier analysis, between able-bodied and trans-radial amputee subjects. We therefore suggest a classification method that considers each surface electrode's contribution separately, together with five time-domain features, obtaining an average classification accuracy of 75% on a sample of trans-radial amputees. To improve the method and its robustness, we propose an automatic feature selection procedure formulated as a minimization problem.
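As a sketch of per-window time-domain feature extraction, the following computes five classical EMG features (mean absolute value, waveform length, zero crossings, slope sign changes, RMS). This is a commonly used set and an assumption here, since the abstract does not name its exact five features:

```python
def td_features(x, thresh=0.01):
    """Five classical time-domain EMG features for one analysis window."""
    n = len(x)
    # Mean absolute value
    mav = sum(abs(v) for v in x) / n
    # Waveform length: cumulative absolute first difference
    wl = sum(abs(x[i + 1] - x[i]) for i in range(n - 1))
    # Zero crossings above a small amplitude threshold
    zc = sum(1 for i in range(n - 1)
             if x[i] * x[i + 1] < 0 and abs(x[i] - x[i + 1]) >= thresh)
    # Slope sign changes above the threshold
    ssc = sum(1 for i in range(1, n - 1)
              if (x[i] - x[i - 1]) * (x[i] - x[i + 1]) > 0
              and (abs(x[i] - x[i - 1]) >= thresh
                   or abs(x[i] - x[i + 1]) >= thresh))
    # Root mean square amplitude
    rms = (sum(v * v for v in x) / n) ** 0.5
    return mav, wl, zc, ssc, rms

window = [0.0, 0.5, -0.5, 0.3, -0.2, 0.0]  # toy EMG samples
feats = td_features(window)
```

In a per-electrode scheme like the one proposed, this vector would be computed separately for each surface electrode and concatenated before classification.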

Relevance:

60.00%

Publisher:

Abstract:

It is a major challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences, because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. The innovative technique presented in this paper makes a breakthrough on this difficulty. It discovers both positive and negative patterns in text documents as higher-level features and uses them to accurately weight low-level features (terms) based on their specificity and their distributions in the higher-level features. Substantial experiments using this technique on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine, and pattern-based methods, on precision, recall and F-measure.
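A minimal, illustrative reading of the pattern-deployment idea: mine frequent termsets from positive documents, then accumulate each term's weight from the supports and lengths of the patterns containing it. The mining procedure and the weighting formula below are simplified stand-ins, not the authors' algorithm:

```python
from collections import Counter
from itertools import combinations

def frequent_patterns(docs, min_support=2, max_len=2):
    """Naive frequent-termset mining over positive documents
    (illustrative stand-in for the paper's pattern discovery)."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.split()))
        for k in range(1, max_len + 1):
            for combo in combinations(terms, k):
                counts[combo] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

def term_weights(patterns):
    """Deploy patterns over terms: each term accumulates support from
    the patterns containing it, scaled by pattern length (specificity)."""
    w = Counter()
    for pattern, support in patterns.items():
        for term in pattern:
            w[term] += support * len(pattern)
    return dict(w)

pos_docs = ["touch screen phone", "touch screen tablet", "phone battery"]
pats = frequent_patterns(pos_docs)
weights = term_weights(pats)
```

Terms that co-occur in longer, well-supported patterns ("screen", "touch") end up weighted above terms that appear only in short or infrequent ones.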

Relevance:

60.00%

Publisher:

Abstract:

It is a major challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences, because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. This research presents a promising method, Relevance Feature Discovery (RFD), for solving this challenging issue. It discovers both positive and negative patterns in text documents as high-level features and uses them to accurately weight low-level features (terms) based on their specificity and their distributions in the high-level features. The thesis also introduces an adaptive model (called ARFD) to enhance the flexibility of using RFD in adaptive environments. ARFD automatically updates the system's knowledge based on a sliding window over newly incoming feedback documents, and can efficiently decide which incoming documents bring new knowledge into the system. Substantial experiments using the proposed models on Reuters Corpus Volume 1 and TREC topics show that the proposed models significantly outperform both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine, and other pattern-based methods.
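The sliding-window update in ARFD can be caricatured as follows. The "new knowledge" test used here (at least one previously unseen term) and the window size are invented simplifications of the selective-update idea:

```python
from collections import deque

def make_window(size):
    """Bounded sliding window over feedback documents."""
    return deque(maxlen=size)

def update(window, doc, known_terms):
    """Admit a feedback document only if it brings new knowledge
    (at least one unseen term), mimicking a selective update."""
    new_terms = set(doc.split()) - known_terms
    if new_terms:
        window.append(doc)
        known_terms |= new_terms
        return True
    return False

window = make_window(3)
known = set()
added = [update(window, d, known) for d in
         ["touch screen", "screen phone", "touch screen"]]
```

Documents that add nothing are discarded without retraining, which is what makes the adaptive model efficient on streams of feedback.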

Relevance:

60.00%

Publisher:

Abstract:

International comparison is complicated by the use of different terms, classification methods, policy frameworks and system structures, not to mention different languages and terminology. Multi-case studies can assist in the understanding of the influence wielded by cultural, social, economic, historical and political forces upon educational decisions, policy construction and changes over time. But case studies alone are not enough. In this paper, we argue for an ecological or scaled approach that travels through macro, meso and micro levels to build nested case-studies to allow for more comprehensive analysis of the external and internal factors that shape policy-making and education systems. Such an approach allows for deeper understanding of the relationship between globalizing trends and policy developments.

Relevance:

60.00%

Publisher:

Abstract:

This paper evaluates the suitability of sequence classification techniques for analyzing deviant business process executions based on event logs. Deviant process executions are those that deviate in a negative or positive way with respect to normative or desirable outcomes, such as non-compliant executions or executions that undershoot or exceed performance targets. We evaluate a range of feature types and classification methods in terms of their ability to accurately discriminate between normal and deviant executions both when deviances are infrequent (unbalanced) and when deviances are as frequent as normal executions (balanced). We also analyze the ability of the discovered rules to explain potential causes and contributing factors of observed deviances. The evaluation results show that feature types extracted using pattern mining techniques only slightly outperform those based on individual activity frequency. The results also suggest that more complex feature types ought to be explored to achieve higher levels of accuracy.
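The baseline feature type in the evaluation above, individual activity frequency, amounts to encoding each trace as a count vector over the log's activity vocabulary. A sketch with a hypothetical two-trace event log:

```python
from collections import Counter

def activity_frequency_features(log):
    """Encode each trace (list of activity labels) as a frequency
    vector over the log's global activity vocabulary."""
    vocab = sorted({a for trace, _ in log for a in trace})
    rows = []
    for trace, label in log:
        counts = Counter(trace)
        rows.append(([counts[a] for a in vocab], label))
    return vocab, rows

# Hypothetical event log: (trace, deviant?) pairs.
log = [
    (["register", "check", "approve"], 0),
    (["register", "check", "check", "reject"], 1),
]
vocab, rows = activity_frequency_features(log)
```

The resulting vectors can be fed to any standard classifier; pattern-mining feature types replace the single-activity columns with counts of mined subsequences.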

Relevance:

60.00%

Publisher:

Abstract:

It is a major challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences, because of the large scale of terms and data patterns. Most existing popular text mining and classification methods have adopted term-based approaches. However, they all suffer from the problems of polysemy and synonymy. Over the years, the hypothesis has often been held that pattern-based methods should perform better than term-based ones in describing user preferences; yet how to effectively use large-scale patterns remains a hard problem in text mining. To make a breakthrough on this challenging issue, this paper presents an innovative model for relevance feature discovery. It discovers both positive and negative patterns in text documents as higher-level features and deploys them over low-level features (terms). It also classifies terms into categories and updates term weights based on their specificity and their distributions in patterns. Substantial experiments using this model on RCV1, TREC topics and Reuters-21578 show that the proposed model significantly outperforms both the state-of-the-art term-based methods and the pattern-based methods.

Relevance:

60.00%

Publisher:

Abstract:

A novel near-infrared spectroscopy (NIRS) method has been researched and developed for the simultaneous analysis of the chemical components and associated properties of mint (Mentha haplocalyx Briq.) tea samples. The common analytes were total polysaccharide content, total flavonoid content, total phenolic content, and total antioxidant activity. To resolve the NIRS data matrix for these analyses, least squares support vector machines was found to be the best chemometrics method for prediction, although it was closely followed by the radial basis function/partial least squares model. Interestingly, the commonly used partial least squares was unsatisfactory in this case. Additionally, principal component analysis and hierarchical cluster analysis were able to distinguish the mint samples according to their four geographical provinces of origin, and this was further facilitated with the chemometrics classification methods: K-nearest neighbors, linear discriminant analysis, and partial least squares discriminant analysis. In general, given the potential savings in sampling and analysis time as well as in the costs of the special analytical reagents required for the standard individual methods, NIRS offers a very attractive alternative for the simultaneous analysis of mint samples.
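As a sketch of the K-nearest neighbors step used for provenance classification, the following classifies a query spectrum by majority vote among its nearest training spectra. The three-band "spectra" and the province labels are toy stand-ins for real NIRS data:

```python
def knn_predict(train, query, k=3):
    """Classify a spectrum by majority vote among its k nearest
    training spectra under Euclidean distance."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    labels = [lab for _, lab in nearest]
    return max(set(labels), key=labels.count)

# Toy "spectra" (3 bands) tagged by hypothetical province of origin.
train = [
    ([0.10, 0.80, 0.30], "A"), ([0.12, 0.78, 0.31], "A"),
    ([0.60, 0.20, 0.70], "B"), ([0.58, 0.22, 0.69], "B"),
]
pred = knn_predict(train, [0.11, 0.79, 0.30], k=3)
```

Real NIRS spectra have hundreds of wavelength channels, so in practice the distance is usually computed after preprocessing or dimensionality reduction (e.g. PCA scores).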

Relevance:

60.00%

Publisher:

Abstract:

The most difficult operation in flood inundation mapping from optical flood images is to separate fully inundated areas from 'wet' areas where trees and houses are partly covered by water. This is a typical instance of the mixed-pixel problem. A number of automatic image classification algorithms for information extraction have been developed over the years for flood mapping using optical remote sensing images. Most classification algorithms assign a pixel to the class label with the greatest likelihood. However, these hard classification methods often fail to generate a reliable flood inundation map because of the presence of mixed pixels in the images. To solve the mixed-pixel problem, advanced image processing techniques are adopted; linear spectral unmixing is one of the most popular soft classification techniques used for mixed-pixel analysis. The performance of linear spectral unmixing depends on two important issues: the method of selecting endmembers and the method of modelling the endmembers for unmixing. This paper presents an improvement to spectral unmixing for reliable flood mapping: an adaptive selection of the endmember subset for each pixel. Using a fixed set of endmembers to unmix all pixels in an entire image can overestimate the endmember spectra residing in a mixed pixel and hence reduce the performance of spectral unmixing. By contrast, applying an adaptively estimated subset of endmembers for each pixel can decrease the residual error in the unmixing results and provide reliable output. This paper also shows that the proposed method improves the accuracy of conventional linear unmixing methods and is easy to apply. Three different linear spectral unmixing methods were applied to test the improvement in unmixing results.
Experiments were conducted on three sets of Landsat-5 TM images of three different flood events in Australia, examining the method under different flooding conditions, and achieved satisfactory flood mapping outcomes.
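At its core, linear spectral unmixing solves a least-squares problem for the abundance of each endmember in a pixel. A minimal two-endmember sketch, solving the normal equations by hand and omitting the sum-to-one and non-negativity constraints often added in practice; the endmember spectra are invented:

```python
def unmix_two(endmembers, pixel):
    """Unconstrained least-squares abundances for two endmembers:
    solve (E^T E) a = E^T p explicitly for the 2x2 case."""
    e1, e2 = endmembers
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    g11, g12, g22 = dot(e1, e1), dot(e1, e2), dot(e2, e2)
    b1, b2 = dot(e1, pixel), dot(e2, pixel)
    det = g11 * g22 - g12 * g12          # Gram determinant
    a1 = (b1 * g22 - b2 * g12) / det
    a2 = (g11 * b2 - g12 * b1) / det
    return a1, a2

# Toy 3-band endmember spectra (hypothetical reflectances).
water = [0.1, 0.2, 0.1]
veg = [0.4, 0.8, 0.6]
# Synthetic mixed pixel: 70% water + 30% vegetation.
pixel = [0.7 * w + 0.3 * v for w, v in zip(water, veg)]
a_water, a_veg = unmix_two([water, veg], pixel)
```

The paper's contribution is upstream of this solve: choosing, per pixel, which subset of endmembers enters the matrix E, so that irrelevant endmembers do not inflate the residual error.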

Relevance:

60.00%

Publisher:

Abstract:

While plants of a single species emit a diversity of volatile organic compounds (VOCs) to attract or repel interacting organisms, these specific messages may be lost in the midst of the hundreds of VOCs produced by sympatric plants of different species, many of which may have no signal content. Receivers must be able to reduce the babel or noise in these VOCs in order to correctly identify the message. For chemical ecologists faced with vast amounts of data on the volatile signatures of plants in different ecological contexts, it is imperative to employ accurate methods of classifying messages, so that suitable bioassays can then be designed to understand message content. We demonstrate the utility of Random Forests (RF), a machine-learning algorithm, for the task of classifying volatile signatures and choosing the minimum set of volatiles for accurate discrimination, using data from sympatric Ficus species as a case study. We demonstrate the advantages of RF over conventional classification methods such as principal component analysis (PCA), as well as over data-mining algorithms such as support vector machines (SVM), diagonal linear discriminant analysis (DLDA) and k-nearest neighbour (KNN) analysis. We show why a tree-building method such as RF, which is increasingly used by the bioinformatics, food technology and medical communities, is particularly advantageous for the study of plant communication using volatiles, dealing, as it must, with abundant noise.
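The volatile-selection task can be illustrated with the Gini impurity decrease that tree-building methods such as RF accumulate per feature when ranking importance. The two-volatile dataset below is invented; an informative volatile yields a large impurity decrease, an uninformative one yields none:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gain(values, labels, threshold):
    """Gini impurity decrease for a single threshold split: the
    quantity tree ensembles accumulate per feature for importance."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    return (gini(labels)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))

# Toy VOC table: rows = plants, columns = two volatiles; labels = species.
voc_a = [0.1, 0.2, 0.9, 0.8]   # separates the two species cleanly
voc_b = [0.5, 0.4, 0.5, 0.4]   # uninformative
species = ["f1", "f1", "f2", "f2"]
gain_a = split_gain(voc_a, species, 0.5)
gain_b = split_gain(voc_b, species, 0.45)
```

Summing such gains over all splits in all trees, then keeping the top-ranked volatiles, is the mechanism behind choosing a minimum discriminating set.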

Relevance:

60.00%

Publisher:

Abstract:

The loss and degradation of forest cover is currently a globally recognised problem. The fragmentation of forests is also affecting the biodiversity and well-being of ecosystems in Kenya. This study focuses on two indigenous tropical montane forests in the Taita Hills in southeastern Kenya. The study is part of the TAITA project within the Department of Geography at the University of Helsinki. The study forests, Ngangao and Chawia, are examined by remote sensing and GIS methods. The main data include black-and-white aerial photography from 1955 and true-colour digital camera data from 2004, which were used to produce aerial mosaics of the study areas. The land cover of the study areas is analysed by visual interpretation, pixel-based supervised classification and object-oriented supervised classification. Forest cover change is studied with GIS methods using the visual interpretations from 1955 and 2004. Furthermore, the present state of the study forests is assessed with leaf area index and canopy closure parameters retrieved from hemispherical photographs, as well as with additional, previously collected forest health monitoring data. The canopy parameters are also compared with textural parameters from the digital aerial mosaics. This study concludes that the classification of forest areas using true-colour data is not an easy task, even though the digital aerial mosaics proved to be very accurate. The best classifications are still achieved with visual interpretation methods, as the accuracies of the pixel-based and object-oriented supervised classification methods are not satisfactory. According to the change detection of land cover in the study areas, the area of indigenous woodland in both forests decreased between 1955 and 2004. In Ngangao, however, the overall woodland area grew, mainly because of plantations of exotic species. In general, the land cover of both study areas is more fragmented in 2004 than in 1955.
Although the forest area has decreased, the forests seem to have a brighter future than before, owing to the increasing appreciation of these forest areas.

Relevance:

60.00%

Publisher:

Abstract:

This research describes two chemometric case studies: the quantification of polycyclic aromatic hydrocarbons (PAHs: naphthalene, fluorene, phenanthrene and fluoranthene) in drinking water using molecular fluorescence spectroscopy, and the classification and characterization of grape juices and their quality parameters by near-infrared spectroscopy. The aim of the first study is the combined application of second-order chemometric methods (N-PLS, U-PLS, U-PLS/RBL and PARAFAC) and spectrofluorimetry for the direct determination of PAHs in drinking water, contributing to knowledge of the potential of these methodologies as a viable alternative to traditional determination by univariate chromatography. The second case study addresses the classification of grape juices and the determination of their quality parameters (relative density and total soluble solids content), measured by near-infrared spectroscopy and chemometric methods. Several chemometric methods, such as HCA, PLS-DA, SVM-DA and SIMCA, were investigated for classifying the grape juice samples, while first-order multivariate calibration methods such as PLS, iPLS and SVM-LS were used to predict the quality parameters. The guiding principle for the studies described here was the need for analytical methodologies with better cost, execution time and ease of operation, and lower waste production, than the methods currently used for quantifying PAHs in tap water and for classifying and characterizing grape juice samples and their quality parameters.

Relevance:

60.00%

Publisher:

Abstract:

NOAA’s National Centers for Coastal Ocean Science Biogeography Branch has mapped and characterized large portions of the coral reef ecosystems inside U.S. coastal and territorial waters, including the U.S. Caribbean. The complementary protocols used in these efforts have enabled scientists and managers to quantitatively and qualitatively compare marine ecosystems in tropical U.S. waters. The Biogeography Branch used similar protocols to generate new benthic habitat maps for Fish Bay, Coral Bay and the St. Thomas East End Reserve (STEER). While this mapping effort marks the third time that some of these shallow-water habitats (≤40 m) have been mapped, it is the first time that nearly 100% of the seafloor has been characterized in each of these areas. It is also the first time that high-resolution imagery describing seafloor depth has been collected in each of these areas. Consequently, these datasets provide new information describing the distribution of coral reef ecosystems and serve as a spatial baseline for monitoring change in Fish Bay, Coral Bay and the STEER. Benthic habitat maps were developed for approximately 64.3 square kilometers of seafloor in and around Fish Bay, Coral Bay and the STEER. Twenty-seven percent (17.5 square kilometers) of these habitat maps describe the seafloor inside the boundaries of the STEER, the Virgin Islands National Park and the Virgin Islands Coral Reef National Monument. The remaining 73% (46.8 square kilometers) describe the seafloor outside of these MPA boundaries. These habitat maps were developed using a combination of semi-automated and manual classification methods. Habitats were interpreted from aerial photographs and LiDAR (Light Detection and Ranging) imagery. In total, 155 distinct combinations of habitat classes describing the geology and biology of the seafloor were identified from the source imagery.

Relevance:

60.00%

Publisher:

Abstract:

NOAA’s Center for Coastal Monitoring and Assessment’s Biogeography Branch has mapped and characterized large portions of the coral reef ecosystems inside U.S. coastal and territorial waters, including the U.S. Caribbean. The complementary protocols used in these efforts have enabled scientists and managers to quantitatively compare different marine ecosystems in tropical U.S. waters. The Biogeography Branch used these same general protocols to generate three seamless habitat maps of the Bank/Shelf (i.e., from 0 to 50 meters) and the Bank/Shelf Escarpment (i.e., from 50 to 1,000 meters and from 1,000 to 1,830 meters) inside Buck Island Reef National Monument (BIRNM). While this mapping effort marks the fourth time that the shallow-water habitats of BIRNM have been mapped, it is the first time habitats deeper than 30 meters (m) have been characterized. Consequently, this habitat map provides information on the distribution of mesophotic and deep-water coral reef ecosystems and serves as a spatial baseline for monitoring change in the Monument. A benthic habitat map was developed for approximately 74.3 square kilometers, or 98%, of BIRNM using a combination of semi-automated and manual classification methods. The remaining 2% was not mapped due to a lack of imagery in the western part of the Monument at depths ranging from 1,000 to 1,400 meters. Habitats were interpreted from orthophotographs, LiDAR (Light Detection and Ranging) imagery and four different types of MBES (Multibeam Echosounder) imagery. Three minimum mapping units (MMUs) (100, 1,000 and 5,000 square meters) were used because of the wide range of depths present in the Monument. The majority of the area characterized was deeper than 30 m, on the Bank/Shelf Escarpment. This escarpment area was dominated by uncolonized sand, which transitioned to mud as depth increased. Bedrock was exposed in some areas of the escarpment, where steep slopes prevented sediment deposition.
Mesophotic corals were seen in the underwater video, but were too sparsely distributed to be reliably mapped from the source imagery. Habitats on the Bank/Shelf were much more variable than those on the Bank/Shelf Escarpment. The majority of this shelf area comprised coral reef and hardbottom habitat dominated by various forms of turf, fleshy, coralline or filamentous algae. Even though algae were the dominant biological cover type, nearly a quarter (24.3%) of the Monument’s Bank/Shelf benthos hosted a live coral cover of 10% to less than 50%. In total, 198 unique combinations of habitat classes describing the geography, geology and biology of the seafloor were identified from the three types of imagery listed above. No thematic accuracy assessment was conducted for areas deeper than about 50 meters, most of which lay in the Bank/Shelf Escarpment. The thematic accuracy of classes in waters shallower than approximately 50 meters ranged from 81.4% to 94.4%. These thematic accuracies are similar to those reported for other NOAA benthic habitat mapping efforts in St. John (>80%), the Main Eight Hawaiian Islands (>84.0%) and the Republic of Palau (>80.0%). These digital map products can be used with confidence by scientists and resource managers for a multitude of applications, including structuring monitoring programs, supporting management decisions, and establishing and managing marine conservation areas. The final deliverables for this project, including the benthic habitat maps, source imagery and in situ field data, are available to the public on a NOAA Biogeography Branch website (http://ccma.nos.noaa.gov/ecosystems/coralreef/stcroix.aspx) and through an interactive, web-based map application (http://ccma.nos.noaa.gov/explorer/biomapper/biomapper.html?id=BUIS). This report documents the process and methods used to create the shallow to deep-water benthic habitat maps for BIRNM.
Chapter 1 provides a short introduction to BIRNM, including its history, marine life and ongoing research activities. Chapter 2 describes the benthic habitat classification scheme used to partition the different habitats into ecologically relevant groups. Chapter 3 explains the steps required to create a benthic habitat map using a combination of semi-automated and visual classification techniques. Chapter 4 details the steps used in the accuracy assessment and reports the thematic accuracy of the final shallow-water map. Chapter 5 summarizes the type and abundance of each habitat class found inside BIRNM, how these habitats compare to past habitat maps, and outlines how these new habitat maps may be used to inform future management activities.
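Thematic accuracy figures like those quoted above come from a confusion matrix comparing mapped classes against reference field data; overall accuracy is the diagonal agreement over all samples. A sketch with a hypothetical three-class matrix (the class names and counts are invented):

```python
def overall_accuracy(confusion):
    """Overall thematic accuracy: correctly classified samples on the
    diagonal divided by all reference samples in the confusion matrix
    (rows = mapped class, columns = reference class)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

# Hypothetical 3-class assessment (e.g. coral, sand, algae).
cm = [[45, 3, 2],
      [4, 38, 3],
      [1, 2, 42]]
acc = overall_accuracy(cm)
```

Per-class producer's and user's accuracies are computed the same way from the matrix's columns and rows, which is how ranges such as 81.4% to 94.4% arise across classes.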