862 results for malware classification
Abstract:
We present an overview of the QUT plant classification system submitted to LifeCLEF 2014. This system uses generic features extracted from a convolutional neural network previously used to perform general object classification. We examine the effectiveness of these features for plant classification when used in combination with an extremely randomised forest. Using this system, with minimal tuning, we obtained relatively good results, with a score of 0.249 on the test set of LifeCLEF 2014.
Abstract:
Narrative text is a useful way of identifying injury circumstances from routine emergency department data collections. Automatically classifying narratives with machine learning techniques is a promising approach that can reduce the tedious manual classification process. Existing works focus on Naive Bayes, which does not always offer the best performance. This paper proposes Matrix Factorization approaches along with a learning enhancement process for this task. The results are compared with the performance of various other classification approaches. The impact of parameter settings on the classification results for a medical text dataset is discussed. With the right choice of dimension k, the Non-Negative Matrix Factorization model achieves a 10-fold cross-validation (CV) accuracy of 0.93.
Abstract:
This thesis presents a promising boundary setting method for solving challenging issues in text classification to produce an effective text classifier. A classifier must identify the boundary between classes optimally. However, even after the features are selected, the boundary remains unclear where positive and negative documents are mixed. A classifier combination method to boost the effectiveness of the classification model is also presented. The experiments carried out in the study demonstrate that the proposed classifier is promising.
Abstract:
The monitoring of the actual activities of daily living of individuals with lower limb amputation is essential for an evidence-based fitting of the prosthesis, more particularly the choice of components (e.g., knees, ankles, feet) [1-4]. The purpose of this presentation was to give an overview of the categorization of load regime data used to assess the functional output and usage of the prostheses of lower limb amputees, as presented in several publications [5, 6]. The objectives were to present a categorization of load regime and to report the results for a case.
Abstract:
Background: There is a need for a better understanding of the dispersion of classification-related variables to develop an evidence-based classification of athletes with a disability participating in stationary throwing events. Objectives: The purposes of this study are (A) to describe tools designed to comprehend and represent the dispersion of the performance between successive classes, and (B) to present this dispersion for the elite male and female stationary shot-putters who participated in the Beijing 2008 Paralympic Games. Study design: Retrospective study. Methods: This study analysed a total of 479 attempts performed by 114 male and female stationary shot-putters in three F30s (F32-F34) and six F50s (F52-F58) classes during the course of eight events at the Beijing 2008 Paralympic Games. Results: The average differences of best performance were 1.46±0.46 m for males between F54 and F58 classes as well as 1.06±1.18 m for females between F55 and F58 classes. The results demonstrated a linear relationship between best performance and classification while revealing that two male Gold Medallists in the F33 and F52 classes were outliers. Conclusions: This study confirms the benefits of the comparative matrices, performance continuum and dispersion plots to comprehend classification-related variables. The work presented here represents a stepping stone into biomechanical analyses of stationary throwers, particularly on the eve of the London 2012 Paralympic Games, where new evidence could be gathered.
Abstract:
Traditional text classification technology based on machine learning and data mining techniques has made great progress. However, drawing an exact decision boundary between relevant and irrelevant objects in binary classification remains a major problem, because the traditional algorithms introduce considerable uncertainty. The proposed model CTTC (Centroid Training for Text Classification) aims to build an uncertainty boundary that absorbs as many indeterminate objects as possible, so as to raise the certainty of the relevant and irrelevant groups through a centroid clustering and training process. The clustering starts from the two training subsets, labelled as relevant and irrelevant respectively, to create two principal centroid vectors by which all the training samples are further separated into three groups: POS, NEG and BND, with all the indeterminate objects absorbed into the uncertain decision boundary BND. Two pairs of centroid vectors are then trained and optimized through a subsequent iterative multi-learning process, and together they help predict the polarities of incoming objects. For the assessment of the proposed model, F1 and Accuracy have been chosen as the key evaluation measures. We stress the F1 measure because it can display the overall performance improvement of the final classifier better than Accuracy. A large number of experiments have been completed using the proposed model on the Reuters Corpus Volume 1 (RCV1), which is an important standard dataset in the field. The experimental results show that the proposed model significantly improves binary text classification performance in both F1 and Accuracy compared with three other influential baseline models.
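The three-way split into POS, NEG and BND described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' CTTC implementation: the cosine similarity measure, the `margin` threshold, the function names and the toy vectors are all assumptions made for illustration, and the iterative training of the two centroid pairs is omitted.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def three_way_split(samples, pos_centroid, neg_centroid, margin=0.1):
    """Assign each sample to POS, NEG or BND.

    `margin` is a hypothetical threshold: samples whose similarity to the
    two centroids differs by no more than `margin` are treated as
    indeterminate and placed in the boundary group BND.
    """
    groups = {"POS": [], "NEG": [], "BND": []}
    for s in samples:
        diff = cosine(s, pos_centroid) - cosine(s, neg_centroid)
        if diff > margin:
            groups["POS"].append(s)
        elif diff < -margin:
            groups["NEG"].append(s)
        else:
            groups["BND"].append(s)
    return groups

# Toy two-dimensional "documents" (made-up data).
relevant = [[1.0, 0.0], [0.9, 0.1]]
irrelevant = [[0.0, 1.0], [0.1, 0.9]]
pos_c = centroid(relevant)
neg_c = centroid(irrelevant)
groups = three_way_split(relevant + irrelevant + [[0.5, 0.5]], pos_c, neg_c)
print({name: len(members) for name, members in groups.items()})
```

In this toy setting the ambiguous vector `[0.5, 0.5]` is equally similar to both centroids, so it falls into BND rather than being forced into either class.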
Abstract:
Semantic perception and object labeling are key requirements for robots interacting with objects on a higher level. Symbolic annotation of objects allows the usage of planning algorithms for object interaction, for instance in a typical fetch-and-carry scenario. In current research, perception is usually based on 3D scene reconstruction and geometric model matching, where trained features are matched with a 3D sample point cloud. In this work we propose a semantic perception method which is based on spatio-semantic features. These features are defined in a natural, symbolic way, such as geometry and spatial relation. In contrast to point-based model matching methods, a spatial ontology is used in which objects are described by how they look, similar to how a human would describe unknown objects to another person. A fuzzy-based reasoning approach matches perceivable features with a spatial ontology of the objects. The approach provides a method which is able to deal with sensor noise and occlusions. Another advantage is that no training phase is needed in order to learn object features. The use case of the proposed method is the detection of soil sample containers in an outdoor environment which have to be collected by a mobile robot. The approach is verified using real-world experiments.
Abstract:
Affect is an important feature of multimedia content and conveys valuable information for multimedia indexing and retrieval. Most existing studies for affective content analysis are limited to low-level features or mid-level representations, and are generally criticized for their incapacity to address the gap between low-level features and high-level human affective perception. The facial expressions of subjects in images carry important semantic information that can substantially influence human affective perception, but have seldom been investigated for affective classification of facial images towards practical applications. This paper presents an automatic image emotion detector (IED) for affective classification of practical (or non-laboratory) data using facial expressions, where many “real-world” challenges are present, including pose, illumination, and size variations. The proposed method is novel, with its framework designed specifically to overcome these challenges using multi-view versions of face and fiducial point detectors, and a combination of point-based texture and geometry. Performance comparisons of several key parameters of relevant algorithms are conducted to explore the optimum parameters for high accuracy and fast computation speed. A comprehensive set of experiments with existing and new datasets shows that the method is effective despite pose variations, fast, appropriate for large-scale data, and as accurate as the method with state-of-the-art performance on laboratory-based data. The proposed method was also applied to affective classification of images from the British Broadcasting Corporation (BBC) in a task typical for a practical application, providing some valuable insights.
Abstract:
The paper presents data on petrology, bulk rock and mineral compositions, and textural classification of the Middle Jurassic Jericho kimberlite (Slave craton, Canada). The kimberlite was emplaced as three steep-sided pipes in granite that was overlain by limestones and minor soft sediments. The pipes are infilled with hypabyssal and pyroclastic kimberlites and connected to a satellite pipe by a dyke. The Jericho kimberlite is classified as a Group Ia, lacking groundmass tetraferriphlogopite and containing monticellite pseudomorphs. The kimberlite formed during several consecutive emplacement events of compositionally different batches of kimberlite magma. Core-logging and thin-section observations identified at least two phases of hypabyssal kimberlites and three phases of pyroclastic kimberlites. Hypabyssal kimberlites intruded as a main dyke (HK1) and as late small-volume aphanitic and vesicular dykes. Massive pyroclastic kimberlite (MPK1) predominantly filled the northern and southern lobes of the pipe and formed from magma different from the HK1 magma. The MPK1 magma crystallized Ti-, Fe-, and Cr-rich phlogopite without rims of barian phlogopite, and clinopyroxene and spinel without atoll structures. MPK1 textures, superficially reminiscent of tuffisitic kimberlite, are caused by pervasive contamination by granite xenoliths. The next explosive events filled the central lobe with two varieties of pyroclastic kimberlite: (1) massive and (2) weakly bedded, normally graded pyroclastic kimberlite. The geology of the Jericho pipe differs from the geology of South African or the Prairie kimberlites, but may resemble Lac de Gras pipes, in which deeper erosion removed the upper facies of resedimented kimberlites.
Abstract:
Classifying each stage of a progressive disease such as Alzheimer’s disease is a key issue for disease prevention and treatment. In this study, we derived structural brain networks from diffusion-weighted MRI using whole-brain tractography, since there is growing interest in relating connectivity measures to clinical, cognitive, and genetic data. Relatively little work has used machine learning to make inferences about variations in brain networks during the progression of Alzheimer’s disease. Here we developed a framework to utilize generalized low rank approximations of matrices (GLRAM) and modified linear discrimination analysis for unsupervised feature learning and classification of connectivity matrices. We apply the methods to brain networks derived from DWI scans of 41 people with Alzheimer’s disease, 73 people with EMCI, 38 people with LMCI, 47 elderly healthy controls and 221 young healthy controls. Our results show that this new framework can significantly improve classification accuracy when combining multiple datasets; this suggests the value of using data beyond the classification task at hand to model variations in brain connectivity.
Abstract:
Human expert analyses are commonly used in bioacoustic studies and can potentially limit the reproducibility of these results. In this paper, a machine learning method is presented to statistically classify avian vocalizations. Automated approaches were applied to isolate bird songs from long field recordings, assess song similarities, and classify songs into distinct variants. Because no positive controls were available to assess the true classification of variants, multiple replicates of automatic classification of song variants were analyzed to investigate clustering uncertainty. The automatic classifications were more similar to the expert classifications than expected by chance. Application of these methods demonstrated the presence of discrete song variants in an island population of the New Zealand hihi (Notiomystis cincta). The geographic patterns of song variation were then revealed by integrating over classification replicates. Because this automated approach considers variation in song variant classification, it reduces potential human bias and facilitates the reproducibility of the results.
Abstract:
Social media platforms that foster user-generated content have altered the ways consumers search for product-related information. Conducting online searches, reading product reviews, and comparing product ratings are becoming a more common information-seeking pathway. This research demonstrates that info-active consumers are becoming less reliant on information provided by retailers or manufacturers; hence, marketing-generated online content may have a reduced impact on their purchasing behaviour. The results of this study indicate that, beyond traditional methods of segmenting consumers, new classifications such as info-active and info-passive would be beneficial in digital marketing in the online context. This cross-sectional, mixed-methods study is based on 43 in-depth interviews and an online survey with 500 consumers from 30 countries.
Abstract:
A combined data matrix consisting of high performance liquid chromatography–diode array detector (HPLC–DAD) and inductively coupled plasma-mass spectrometry (ICP-MS) measurements of samples from the plant roots of the Cortex moutan (CM) produced much better classification and prediction results than those obtained from either of the individual data sets. The HPLC peaks (organic components) of the CM samples and the ICP-MS measurements (trace metal elements) were investigated with the use of principal component analysis (PCA) and the linear discriminant analysis (LDA) methods of data analysis; essentially, qualitative results suggested that discrimination of the CM samples from three different provinces was possible, with the combined matrix producing the best results. Another three methods, K-nearest neighbor (KNN), back-propagation artificial neural network (BP-ANN) and least squares support vector machines (LS-SVM), were applied for the classification and prediction of the samples. Again, the combined data matrix analyzed by the KNN method produced the best results (100% correct; prediction set data). Additionally, multiple linear regression (MLR) was utilized to explore any relationship between the organic constituents and the metal elements of the CM samples; the extracted linear regression equations showed that the essential metals as well as some metallic pollutants were related to the organic compounds on the basis of their concentrations.
Abstract:
A novel combined near- and mid-infrared (NIR and MIR) spectroscopic method has been researched and developed for the analysis of complex substances such as the Traditional Chinese Medicine (TCM), Illicium verum Hook. F. (IVHF), and its noxious adulterant, Illicium lanceolatum A.C. Smith (ILACS). Three types of spectral matrix were submitted for classification with the use of the linear discriminant analysis (LDA) method. The data were pretreated with either the successive projections algorithm (SPA) or the discrete wavelet transform (DWT) method. The SPA method performed somewhat better, principally because it required fewer spectral features for its pretreatment model. Thus, the NIR or MIR matrix, as well as the combined NIR/MIR one, was pretreated by the SPA method and then analysed by LDA. This approach enabled the prediction and classification of the IVHF, ILACS and mixed samples. The MIR spectral data produced somewhat better classification rates than the NIR data. However, the best results were obtained from the combined NIR/MIR data matrix, with 95–100% correct classifications for calibration, validation and prediction. Principal component analysis (PCA) of the three types of spectral data supported the results obtained with the LDA classification method.
Abstract:
Within online learning communities, receiving timely and meaningful insights into the quality of learning activities is an important part of an effective educational experience. Commonly adopted methods, such as the Community of Inquiry framework, rely on manual coding of online discussion transcripts, which is a costly and time-consuming process. There are several efforts underway to enable the automated classification of online discussion messages using supervised machine learning, which would enable the real-time analysis of interactions occurring within online learning communities. This paper investigates the importance of incorporating features that utilise the structure of online discussions for the classification of "cognitive presence", the central dimension of the Community of Inquiry framework, which focuses on the quality of students' critical thinking within online learning communities. We implemented a Conditional Random Field classification solution, which incorporates structural features that may be useful in increasing classification performance over other implementations. Our approach leads to an improvement in classification accuracy of 5.8% over existing techniques when tested on the same dataset, with a precision and recall of 0.630 and 0.504 respectively.
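As a small worked example, the reported precision and recall can be combined into an F1 score, the harmonic mean of the two. The abstract itself does not report F1, so the value below is only derived arithmetic under the standard F1 definition, and the function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as quoted in the abstract above.
f1 = f1_score(0.630, 0.504)
print(round(f1, 3))  # about 0.56
```

Because the harmonic mean is pulled towards the smaller of the two inputs, the result sits closer to the recall of 0.504 than a simple average would.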