862 resultados para malware classification
Resumo:
In this work a new method for clustering and building a topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different type strains can be visually inspected and verified. Two well known DNA alignement algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaprotebacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.
Resumo:
In this work the G(A)(0) distribution is assumed as the universal model for amplitude Synthetic Aperture (SAR) imagery data under the Multiplicative Model. The observed data, therefore, is assumed to obey a G(A)(0) (alpha; gamma, n) law, where the parameter n is related to the speckle noise, and (alpha, gamma) are related to the ground truth, giving information about the background. Therefore, maps generated by the estimation of (alpha, gamma) in each coordinate can be used as the input for classification methods. Maximum likelihood estimators are derived and used to form estimated parameter maps. This estimation can be hampered by the presence of corner reflectors, man-made objects used to calibrate SAR images that produce large return values. In order to alleviate this contamination, robust (M) estimators are also derived for the universal model. Gaussian Maximum Likelihood classification is used to obtain maps using hard-to-deal-with simulated data, and the superiority of robust estimation is quantitatively assessed.
Resumo:
In this paper, an improved stochastic discrimination (SD) is introduced to reduce the error rate of the standard SD in the context of multi-class classification problem. The learning procedure of the improved SD consists of two stages. In the first stage, a standard SD, but with shorter learning period is carried out to identify an important space where all the misclassified samples are located. In the second stage, the standard SD is modified by (i) restricting sampling in the important space; and (ii) introducing a new discriminant function for samples in the important space. It is shown by mathematical derivation that the new discriminant function has the same mean, but smaller variance than that of standard SD for samples in the important space. It is also analyzed that the smaller the variance of the discriminant function, the lower the error rate of the classifier. Consequently, the proposed improved SD improves standard SD by its capability of achieving higher classification accuracy. Illustrative examples axe provided to demonstrate the effectiveness of the proposed improved SD.
Classification of lactose and mandelic acid THz spectra using subspace and wavelet-packet algorithms
Resumo:
This work compares classification results of lactose, mandelic acid and dl-mandelic acid, obtained on the basis of their respective THz transients. The performance of three different pre-processing algorithms applied to the time-domain signatures obtained using a THz-transient spectrometer are contrasted by evaluating the classifier performance. A range of amplitudes of zero-mean white Gaussian noise are used to artificially degrade the signal-to-noise ratio of the time-domain signatures to generate the data sets that are presented to the classifier for both learning and validation purposes. This gradual degradation of interferograms by increasing the noise level is equivalent to performing measurements assuming a reduced integration time. Three signal processing algorithms were adopted for the evaluation of the complex insertion loss function of the samples under study; a) standard evaluation by ratioing the sample with the background spectra, b) a subspace identification algorithm and c) a novel wavelet-packet identification procedure. Within class and between class dispersion metrics are adopted for the three data sets. A discrimination metric evaluates how well the three classes can be distinguished within the frequency range 0. 1 - 1.0 THz using the above algorithms.
Resumo:
This work analyzes the use of linear discriminant models, multi-layer perceptron neural networks and wavelet networks for corporate financial distress prediction. Although simple and easy to interpret, linear models require statistical assumptions that may be unrealistic. Neural networks are able to discriminate patterns that are not linearly separable, but the large number of parameters involved in a neural model often causes generalization problems. Wavelet networks are classification models that implement nonlinear discriminant surfaces as the superposition of dilated and translated versions of a single "mother wavelet" function. In this paper, an algorithm is proposed to select dilation and translation parameters that yield a wavelet network classifier with good parsimony characteristics. The models are compared in a case study involving failed and continuing British firms in the period 1997-2000. Problems associated with over-parameterized neural networks are illustrated and the Optimal Brain Damage pruning technique is employed to obtain a parsimonious neural model. The results, supported by a re-sampling study, show that both neural and wavelet networks may be a valid alternative to classical linear discriminant models.