865 results for Associative Classifiers
Abstract:
When remote sensing imagery is combined with statistical classifiers to obtain categorical thematic maps, data on the spatial distribution of the error and uncertainty of the resulting maps are rarely provided. This paper describes, in the context of the GeoViQua FP7 project, feasible approaches for multi-step methods such as hybrid classifiers. For both “per pixel” and “per polygon” strategies, the proposal relies on the available ground truth, which is used to model the spatial distribution of the errors. The results allow classification success to be mapped with a very high level of reliability (R² > 0.94), giving users sound knowledge of the accuracy in every area of the map.
Abstract:
Identification of humans via ECG is being increasingly studied because it can have several advantages over traditional biometric identification techniques. However, difficulties arise because of heart rate variability. In this study we analysed the influence of QT interval correction on the performance of an identification system based on temporal and amplitude features of the ECG. In particular, we tested MLP, Naive Bayes and 3-NN classifiers on the Fantasia database. Results indicate that QT correction can significantly improve the overall system performance. © 2013 IEEE.
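The abstract does not state which QT correction was applied. As a minimal sketch, assuming the widely used Bazett and Fridericia formulas (the function names below are illustrative, not taken from the paper), a heart-rate-corrected QT interval could be computed as follows:

```python
import math


def qtc_bazett(qt_s: float, rr_s: float) -> float:
    """Bazett correction: QTc = QT / sqrt(RR), both intervals in seconds."""
    if rr_s <= 0:
        raise ValueError("RR interval must be positive")
    return qt_s / math.sqrt(rr_s)


def qtc_fridericia(qt_s: float, rr_s: float) -> float:
    """Fridericia correction: QTc = QT / RR^(1/3), both intervals in seconds."""
    if rr_s <= 0:
        raise ValueError("RR interval must be positive")
    return qt_s / rr_s ** (1.0 / 3.0)
```

At an RR interval of exactly one second (60 beats per minute) both formulas leave the QT interval unchanged, which is the reference heart rate they normalise to.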
Abstract:
2000 Mathematics Subject Classification: 16N80, 16S70, 16D25, 13G05.
Abstract:
User queries over image collections, based on semantic similarity, can be processed in several ways. In this paper, we propose to reuse the rules produced by rule-based classifiers in their recognition models as query pattern definitions for searching image collections.
Abstract:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May, 2014
Abstract:
It is well established that accent recognition can reach accuracies of up to 95% when the signals are noise-free, using feature extraction techniques such as mel-frequency cepstral coefficients and binary classifiers such as discriminant analysis, support vector machines and k-nearest neighbors. In this paper, we demonstrate that predictive performance can drop by as much as 15 percentage points when the signals are noisy. Specifically, we perturb the signals with different levels of white noise; as the noise becomes stronger, the out-of-sample predictive performance deteriorates from 95% to 80%, although the in-sample prediction gives overly optimistic results. ACM Computing Classification System (1998): C.3, C.5.1, H.1.2, H.2.4, G.3.
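The perturbation step can be illustrated with a short sketch. It assumes additive Gaussian white noise scaled to a target signal-to-noise ratio in decibels; the abstract does not specify how the noise levels were defined, so `snr_db` and both function names are hypothetical.

```python
import math
import random


def add_white_noise(signal, snr_db, rng=None):
    """Return a copy of `signal` with Gaussian white noise at the given SNR (dB)."""
    rng = rng or random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def empirical_snr_db(clean, noisy):
    """Measure the SNR (dB) actually realised between a clean and a noisy signal."""
    signal_energy = sum(x * x for x in clean)
    noise_energy = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(signal_energy / noise_energy)
```

Lowering `snr_db` strengthens the noise, so sweeping it downward and re-evaluating a classifier on held-out data would reproduce the kind of degradation curve the abstract describes.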
Abstract:
Spamming has been a widespread problem for social networks. In recent years there has been increasing interest in anti-spamming analysis for microblogs such as Twitter. In this paper we present a systematic study of spamming on the Sina Weibo platform, which is currently a dominant microblogging service provider in China. Our research objectives are to understand the specific spamming behaviors in Sina Weibo and to find approaches to identify and block spammers based on spamming behavior classifiers. To start the analysis of spamming behaviors, we devise several effective methods to collect a large set of spammer samples, including the use of proactive honeypots and crawlers, keyword-based searching, and buying spammer samples directly from online merchants. We processed the database associated with these spammer samples and found three representative spamming behaviors: aggressive advertising, repeated duplicate reposting, and aggressive following. We extract various features and compare the behaviors of spammers and legitimate users with regard to these features. We find that spamming behaviors and normal behaviors have distinct characteristics. Based on these findings, we design an automatic online spammer identification system. Tests with real data demonstrate that the system can effectively detect spamming behaviors and identify spammers in Sina Weibo.
Abstract:
One major drawback of coherent optical orthogonal frequency-division multiplexing (CO-OFDM) that hitherto remains unsolved is its vulnerability to nonlinear fiber effects due to its high peak-to-average power ratio. Several digital signal processing techniques have been investigated for the compensation of fiber nonlinearities, e.g., digital back-propagation, nonlinear pre- and post-compensation, and nonlinear equalizers (NLEs) based on the inverse Volterra-series transfer function (IVSTF). Alternatively, nonlinearities can be mitigated using nonlinear decision classifiers such as artificial neural networks (ANNs) based on a multilayer perceptron. In this paper, an ANN-NLE is presented for a 16QAM CO-OFDM system. The capability of the proposed approach to compensate for fiber nonlinearities is numerically demonstrated for up to 100 Gb/s over 1000 km and compared to the benchmark IVSTF-NLE. Results show that, in terms of Q-factor, for 100 Gb/s at 1000 km of transmission, the ANN-NLE outperforms linear equalization and the IVSTF-NLE by 3.2 dB and 1 dB, respectively.
Abstract:
Feature selection is important in the medical field for many reasons. However, selecting important variables is difficult in the presence of censoring, a feature unique to survival data analysis. This paper proposes an approach to deal with the censoring problem in endovascular aortic repair survival data through Bayesian networks. It was merged with and embedded in a hybrid feature selection process that combines Cox's univariate analysis with machine learning approaches such as ensemble artificial neural networks to select the most relevant predictive variables. The proposed algorithm was compared with common survival variable selection approaches such as the least absolute shrinkage and selection operator (LASSO) and the Akaike information criterion (AIC). The results showed that it was capable of dealing with high censoring in the datasets. Moreover, ensemble classifiers increased the area under the ROC curves for each of the two datasets, which were collected from two centers located in the United Kingdom. Furthermore, ensembles constructed with center 1 data enhanced the concordance index of center 2 prediction compared with the model built with a single network. Although the final reduced model obtained using the neural networks and their ensembles is larger than those of the other methods, it outperformed them in both concordance index and sensitivity for center 2 prediction. This indicates that the reduced model is more powerful for cross-center prediction.
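For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of Harrell's C-index for right-censored data. It is a generic textbook formulation, not the authors' implementation, and the argument names are illustrative.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : observed follow-up time of each subject
    events : 1 if the event was observed, 0 if the subject was censored
    risks  : predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means the predicted risks order every comparable pair correctly, 0.5 corresponds to random ordering, and pairs whose earlier member is censored are skipped because their true order is unknown.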
Abstract:
This thesis studies survival analysis techniques that deal with censoring in order to produce predictive tools for the risk of endovascular aortic aneurysm repair (EVAR) re-intervention. Censoring means that some patients do not continue follow-up, so their outcome class is unknown. Existing methods for dealing with censoring have drawbacks and cannot handle the high censoring of the two EVAR datasets collected. Therefore, this thesis presents a new solution to high censoring by modifying an approach that was incapable of differentiating between risk groups of aortic complications. Feature selection (FS) becomes complicated with censoring. Most survival FS methods depend on Cox's model; however, machine learning classifiers (MLC) are preferred. A few methods have adopted MLC to perform survival FS, but they cannot be used with high censoring. This thesis proposes two FS methods that use MLC to evaluate features. Both FS methods use the new solution to deal with censoring. They combine factor analysis with a greedy stepwise FS search that allows eliminated features to re-enter the FS process. The first FS method searches for the best neural network configuration and subset of features. The second approach combines support vector machines, neural networks, and K-nearest neighbor classifiers using simple and weighted majority voting to construct a multiple classifier system (MCS) that improves on the performance of the individual classifiers. It presents a new hybrid FS process that uses the MCS as a wrapper method and merges it with the iterated feature ranking filter method to further reduce the features. The proposed techniques outperformed FS methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in terms of the log-rank test's p-values, sensitivity, and concordance. This shows that the proposed techniques are more powerful in correctly predicting the risk of re-intervention. Consequently, they enable doctors to set an appropriate future observation plan for each patient.
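The simple and weighted majority voting used to build the multiple classifier system can be sketched as follows. This is an illustrative outline only: the function name is hypothetical, and the weighting scheme (for example, validation accuracy per classifier) is an assumption rather than a detail taken from the thesis.

```python
from collections import defaultdict


def majority_vote(predictions, weights=None):
    """Combine the class labels predicted by several classifiers for one sample.

    predictions : one predicted label per base classifier
    weights     : optional per-classifier weights; equal weights give simple voting
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per prediction")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # The label with the largest accumulated weight wins; ties go to the
    # label that was encountered first.
    return max(scores, key=scores.get)
```

With equal weights two weak classifiers can outvote a strong one, whereas weighting lets a more reliable classifier override them, which is the practical difference between the two voting schemes.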
Abstract:
Increasing use of the term Strategic Human Resource Management (SHRM) reflects recognition of the interdependencies between corporate strategy, organization and human resource management in the functioning of the firm. Dyer and Holder (1988) proposed a comprehensive Human Resource Strategic Typology consisting of three strategic types: inducement, investment and involvement. This research attempted to validate their typology empirically and to test the performance implications of the match between corporate strategy and HR strategy. Hypotheses were tested to determine the relationships between internal consistency in HRM sub-systems, the match between corporate strategy and HR strategy, and firm performance. Data were collected by a mail survey of 998 senior HR executives, of whom 263 returned the completed questionnaire. Financial information on 909 firms was collected from secondary sources such as 10-K reports and CD-Disclosure. Profitability ratios were indexed to industry averages. Confirmatory Factor Analysis using LISREL supported the six-factor HR measurement model; the six factors were staffing, training, compensation, appraisal, job design and corporate involvement. Support was also found for the presence of a second-order factor labeled "HR Strategic Orientation" explaining the variations among the six factors. LISREL analysis also supported the congruence hypothesis that HR Strategic Orientation significantly affects firm performance. There was a significant associative relationship between HR Strategy and Corporate Strategy. However, the contingency effects of the match between HR and Corporate strategies were not supported. Several tests were conducted to show that the survey results are affected neither by non-response bias nor by mono-method bias. Implications of these findings for both researchers and practitioners are discussed.
Abstract:
The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be better achieved than is possible from a single source. Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain. The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples. The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm. The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database, called PlasmoTFBM, focuses on gene regulation of Plasmodium falciparum, contains diverse information, and has a simple interface that allows biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
Abstract:
This research aims to establish new optimization methods for pattern recognition and classification of different white blood cells in actual patient data in order to enhance the diagnostic process. Beckman-Coulter Corporation supplied flow cytometry data from numerous patients, which were used as training sets to exploit the different physiological characteristics of the samples provided. Support Vector Machines (SVM) and Artificial Neural Networks (ANN) were used as promising pattern classification techniques to identify different white blood cell samples and to provide medical doctors with diagnostic references for a specific disease state, leukemia. The results show that when a neural network classifier is well configured and trained with cross-validation, it can perform better than support vector classifiers alone for this type of data. Furthermore, a new unsupervised learning algorithm, the Density-based Adaptive Window Clustering (DAWC) algorithm, was designed to process large volumes of data and find the locations of high-density data clusters in real time. It reduces the computational load to ∼O(N) computations, making the algorithm faster and more attractive than current hierarchical algorithms.
Abstract:
Microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study of different classification tools was performed, and the best combinations of gene selection methods and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (the Fisher inverse chi-square method, the Logit method, Stouffer's Z transform method, and the Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of the dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
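Two of the pooling techniques named above can be sketched with the standard library alone. This is a generic illustration of the Fisher inverse chi-square method and of Stouffer's Z method (with optional Liptak-Stouffer weights) for one-sided p-values from independent experiments; it is not the dissertation's code, and the function names are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def fisher_combine(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) follows chi-square with 2k d.o.f."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    # For even degrees of freedom 2k the chi-square survival function is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def stouffer_combine(p_values, weights=None):
    """Stouffer's Z method; passing weights gives the Liptak-Stouffer variant."""
    if weights is None:
        weights = [1.0] * len(p_values)
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return 1.0 - _STD_NORMAL.cdf(numerator / denominator)
```

Both functions expect p-values strictly between 0 and 1. Combining several moderately small p-values yields a pooled p-value smaller than any individual one, which is how pooling can recover genes that no single experiment identifies confidently.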
Abstract:
The South American electric knifefish, Brachyhypopomus gauderio, uses weak electric fields to see and communicate in the dark. Only one study to date has investigated natural behavior in this species during the breeding season; that study proposed that B. gauderio has an exploded lek polygyny breeding system. To test this hypothesis, artificial marshes simulating the native vegetation, temperature, and water conductivities of the South American subtropics were created to study seasonal variation in the associative behavior of B. gauderio during the breeding and non-breeding seasons. Mark/recapture methods were used to keep track of individual fish and their dispersion inside the experimental setup. The experimental design proved to be extremely successful at eliciting reproduction. Differences were found in seasonal variations of social behaviors between adult and juvenile populations. Although no apparent sex differences in movement patterns were found during the breeding season, a trend for male-male aversion was found, suggesting male-male avoidance as a possible strategy guiding aspects of social behavior in this species. Further, movement may be a tactic for mate seeking, as the individuals that moved the most during the breeding season obtained the most opposite-sex interactions. These findings support the exploded lek polygyny model. Social interactions are subject to complex regulation by social, physiological and ecological factors; the extent to which these associations are repeatable may provide novel insights into the evolution of sociality as it has been shaped by natural selection.