849 results for Classification approach
Abstract:
N-gram analysis is an approach that investigates the structure of a program using bytes, characters or text strings. This research uses dynamic analysis to investigate malware detection using a classification approach based on N-gram analysis. A key issue with dynamic analysis is the length of time a program has to be run to ensure a correct classification. The motivation for this research is to find the optimum subset of operational codes (opcodes) that make the best indicators of malware and to determine how long a program has to be monitored to ensure an accurate support vector machine (SVM) classification of benign and malicious software. The experiments within this study represent programs as opcode density histograms gained through dynamic analysis for different program run periods. An SVM is used as the program classifier to determine the ability of different program run lengths to correctly determine the presence of malicious software. The findings show that malware can be detected with different program run lengths using a small number of opcodes.
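As a rough illustration of the opcode density histograms mentioned above, the following sketch computes the relative frequency of a chosen opcode subset over a run-time trace, and repeats the computation on a truncated trace to mimic a shorter run period. The trace, the opcode subset and the function name `opcode_density` are hypothetical and not taken from the study.

```python
from collections import Counter

def opcode_density(trace, opcodes):
    """Return the relative frequency of each selected opcode in a trace."""
    counts = Counter(trace)
    total = len(trace)
    if total == 0:
        return {op: 0.0 for op in opcodes}
    return {op: counts[op] / total for op in opcodes}

# Hypothetical opcode trace captured while the program runs.
trace = ["mov", "push", "mov", "call", "pop", "mov", "ret", "push"]
# Hypothetical subset of opcodes used as features.
selected = ["mov", "push", "call", "xor"]

hist_full = opcode_density(trace, selected)        # whole run
hist_short = opcode_density(trace[:4], selected)   # shorter run period
print(hist_full)
print(hist_short)
```

Each histogram would then serve as one feature vector for a classifier such as an SVM; the classifier itself is left out here to keep the sketch dependency-free.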
Abstract:
N-gram analysis is an approach that investigates the structure of a program using bytes, characters or text strings. This research uses dynamic analysis to investigate malware detection using a classification approach based on N-gram analysis. The motivation for this research is to find a subset of N-gram features that makes a robust indicator of malware. The experiments within this paper represent programs as N-gram density histograms, gained through dynamic analysis. A Support Vector Machine (SVM) is used as the program classifier to determine the ability of N-grams to correctly determine the presence of malicious software. The preliminary findings show that N-gram sizes of N=3 and N=4 present the best avenues for further analysis.
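A minimal sketch of an N-gram density histogram, assuming the program is available as a sequence of tokens (opcodes here). The sequence and the helper name `ngram_density` are hypothetical; the abstract does not specify how the N-grams were extracted.

```python
from collections import Counter

def ngram_density(seq, n):
    """Return the relative frequency of every n-gram that occurs in seq."""
    grams = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}

# Hypothetical token sequence recorded during dynamic analysis.
seq = ["push", "mov", "call", "push", "mov", "call", "ret"]

tri = ngram_density(seq, 3)   # N = 3
quad = ngram_density(seq, 4)  # N = 4
print(tri)
```

Larger N gives more specific features but a sparser histogram, which is one reason the choice of N is worth studying.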
Abstract:
Data on medication use are routinely collected in clinical research. Yet no standardized method exists for categorizing them, whether for describing samples or for studying medication use as a variable. This study was designed to develop a simple, empirically based classification system for categorizing medication use. We used factor analysis to reduce the number of possible medication groupings. This analysis revealed a pattern of medication-use constellations that appears to characterize specific clinical groups. To illustrate the potential of the technique, we applied this classification system to samples in which sleep disorders are prominent: chronic fatigue syndrome and sleep apnea. Our classification method generated 5 factors that appear to cohere logically. They were named: Cardiovascular/metabolic syndrome medications, Symptom-relief medications, Psychotropic medications, Preventive medications and Hormonal medications. Our results show that the medication profile varies with the clinical sample. The medication profile associated with participants with apnea reflects the known comorbid conditions in this clinical group, and the medication profile associated with chronic fatigue syndrome appears to reflect the common perception of this condition as a psychogenic disorder.
Abstract:
The main purpose of brain-computer interfaces (BCIs) is to establish a communication path with the central nervous system (CNS) that is independent of the standard pathways (nerves, muscles), with the aim of controlling a device. The main objective of the current research is to develop an off-line BCI that separates the different EEG patterns resulting from strictly mental tasks performed by an experimental subject, comparing the effectiveness of different signal-preprocessing approaches. We also tested different classification approaches: all versus all, one versus one and a hierarchical classification approach. No preprocessing technique was found to improve the system performance. Furthermore, the hierarchical approach proved capable of producing results above those expected from the literature.
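To make the one-versus-one scheme named above concrete, here is a small sketch in which every pair of classes is decided by a binary classifier and the final label is chosen by majority vote. The nearest-centroid rule stands in for whatever binary classifier the authors used, and the task names and centroid values are hypothetical.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, centroids):
    """Predict a label by majority vote over all pairwise class duels."""
    votes = Counter()
    for a, b in combinations(sorted(centroids), 2):
        # Stand-in binary classifier: the class with the closer centroid wins.
        winner = a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b
        votes[winner] += 1
    return votes.most_common(1)[0][0]

# Hypothetical one-dimensional feature centroids for three mental tasks.
centroids = {"rest": 0.0, "math": 1.0, "rotation": 2.0}

print(one_vs_one_predict(1.1, centroids))
print(one_vs_one_predict(1.9, centroids))
```

With k classes this scheme trains k(k-1)/2 binary classifiers, whereas a hierarchical approach arranges the binary decisions in a tree and evaluates only one path per sample.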
Abstract:
Land cover maps are important tools for regional planning. However, current mappings are tied to very specific purposes and are consequently limited in their capacity to define the wide variety of existing land cover types. In that context, this paper aims to develop a broad and inclusive hierarchical classification system for land cover mapping at the regional scale, which should contribute to a future standardization of classes. It also tests that system in a case study using a classification method based on a fuzzy approach, which has been shown to be more appropriate than conventional approaches. To that end, a hierarchical classification system with three levels of detail was proposed, and a case study was defined by specifying the test area and the classification project. Next, a TM/Landsat-5 image covering the test area was georeferenced. A fuzzy classification approach was then applied to the TM/Landsat-5 image, from which probability images for the mapped classes and an uncertainty image were generated. Finally, a conventional output representing the thematic mapping of the test area was produced.
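The step from per-class probability images to a conventional thematic output can be sketched as follows. This is only an illustration: the per-pixel values and class names are hypothetical, and the uncertainty measure used here (one minus the highest class probability) is an assumption, since the abstract does not say how the uncertainty image was computed.

```python
def harden(memberships):
    """Turn one pixel's class probabilities into a label and an uncertainty."""
    label = max(memberships, key=memberships.get)
    # Assumed uncertainty measure: 1 minus the highest class probability.
    uncertainty = 1.0 - memberships[label]
    return label, uncertainty

# Hypothetical class probabilities for two pixels.
pixels = [
    {"forest": 0.75, "pasture": 0.25, "water": 0.0},
    {"forest": 0.25, "pasture": 0.5, "water": 0.25},
]

thematic = [harden(p) for p in pixels]
print(thematic)
```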
Abstract:
A post-classification change detection technique based on a hybrid classification approach (unsupervised and supervised) was applied to Landsat Thematic Mapper (TM), Landsat Enhanced Thematic Mapper Plus (ETM+), and ASTER images acquired in 1987, 2000 and 2004, respectively, to map land use/cover changes in the Pic Macaya National Park in the southern region of Haiti. Each image was classified individually into six land use/cover classes (built-up, agriculture, herbaceous, open pine forest, mixed forest, and barren land) using unsupervised ISODATA and maximum likelihood supervised classifiers with the aid of ground truth data collected in the field. Ground truth information collected in the field in December 2007, together with equalized stratified random points that were visually interpreted, was used to assess the accuracy of the classification results. The overall accuracy of the land classification for each image was: 1987 (82%), 2000 (82%), 2004 (87%). A post-classification change detection technique was used to produce change images for 1987 to 2000, 1987 to 2004, and 2000 to 2004. Significant changes in land use/cover were found to have occurred over the 17-year period. The results showed increases in built-up (from 10% to 17%) and herbaceous (from 5% to 14%) areas between 1987 and 2004. The increase in herbaceous area was mostly caused by the abandonment of exhausted agricultural lands. At the same time, open pine forest and mixed forest areas lost 75% and 83% of their area, respectively, to other land use/cover types. Open pine forest (from 20% to 14%) and mixed forest (from 18% to 12%) were transformed into agricultural area or barren land. This study illustrated the continuing deforestation, land degradation and soil erosion in the region, which in turn is leading to a decrease in vegetative cover.
The study also showed the importance of Remote Sensing (RS) and Geographic Information System (GIS) technologies for estimating timely changes in land use/cover and for evaluating their causes in order to design an ecologically based management plan for the park.
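Post-classification change detection amounts to comparing two independently classified maps pixel by pixel, and overall accuracy is the share of reference points whose class was predicted correctly. The sketch below shows both on tiny hypothetical maps; the pixel values, class abbreviations and function names are illustrative only and do not reproduce the study's data.

```python
from collections import Counter

def change_matrix(map_t1, map_t2):
    """Count pixels for each (class at time 1, class at time 2) transition."""
    return Counter(zip(map_t1, map_t2))

def overall_accuracy(predicted, reference):
    """Fraction of reference points whose predicted class matches."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)

# Hypothetical classified maps, flattened to one label per pixel.
map_1987 = ["forest", "forest", "agri", "forest", "barren", "agri"]
map_2004 = ["agri", "forest", "agri", "barren", "barren", "built"]

changes = change_matrix(map_1987, map_2004)
changed = sum(n for (before, after), n in changes.items() if before != after)
changed_fraction = changed / len(map_1987)

# Hypothetical ground truth for the 2004 map.
truth_2004 = ["agri", "forest", "agri", "forest", "barren", "built"]
accuracy_2004 = overall_accuracy(map_2004, truth_2004)
print(changes, changed_fraction, accuracy_2004)
```

Because each map carries its own classification errors, errors in either date can show up as spurious change, which is why per-image accuracy is reported alongside the change images.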
Abstract:
Smart homes for the aging population have recently started attracting the attention of the research community. The "health state" of smart homes comprises many different levels; starting with the physical health of citizens, it also includes longer-term health norms and outcomes, as well as the arena of positive behavior changes. One of the problems of interest is to monitor the activities of daily living (ADL) of the elderly, aiming at their protection and well-being. For this purpose, we installed passive infrared (PIR) sensors to detect motion in a specific area inside a smart apartment and used them to collect a set of ADL. In a novel approach, we describe a technology that allows the ground truth collected in one smart home to train activity recognition systems for other smart homes. We asked the users to label all instances of all ADL only once and subsequently applied data mining techniques to cluster in-home sensor firings. Each cluster would therefore represent the instances of the same activity. Once the clusters were associated with their corresponding activities, our system was able to recognize future activities. To improve the activity recognition accuracy, our system preprocessed raw sensor data by identifying overlapping activities. To evaluate the recognition performance on a 200-day dataset, we implemented three different active learning classification algorithms and compared their performance: naive Bayesian (NB), support vector machine (SVM) and random forest (RF). Based on our results, the RF classifier recognized activities with an average specificity of 96.53%, a sensitivity of 68.49%, a precision of 74.41% and an F-measure of 71.33%, outperforming both the NB and SVM classifiers. Further clustering markedly improved the results of the RF classifier. An activity recognition system based on PIR sensors in conjunction with a clustering classification approach was able to detect ADL from datasets collected from different homes.
Thus, our PIR-based smart home technology could improve care and provide valuable information to better understand the functioning of our societies, as well as to inform both individual and collective action in a smart city scenario.
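The reported F-measure can be checked against the reported precision and sensitivity, assuming the F-measure is the usual harmonic mean of precision and recall (sensitivity); the abstract does not state the formula explicitly.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)

# Values reported for the random forest classifier.
f = f_measure(precision=0.7441, recall=0.6849)
print(f"{f:.4f}")  # 0.7133
```

The result agrees with the reported 71.33%, so the figures are internally consistent under that assumption.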
Abstract:
Optimal adjustment of brain networks allows the biased processing of information in response to the demands of the environment and is therefore a prerequisite for adaptive behaviour. It has been widely shown that a biased state of networks is associated with a particular cognitive process. However, those associations were identified by backward categorization of trials and cannot provide a causal association with cognitive processes. This problem remains a big obstacle to advancing the state of our field, in particular human cognitive neuroscience. In my talk, I will present two approaches to address the causal relationships between brain network interactions and behaviour. Firstly, we combined connectivity analysis of fMRI data and a machine learning method to predict inter-individual differences in behaviour and responsiveness to environmental demands. The connectivity-based classification approach outperforms local activation-based classification analysis, suggesting that interactions in brain networks carry information about instantaneous cognitive processes. Secondly, we have recently established a brand new method combining transcranial alternating current stimulation (tACS), transcranial magnetic stimulation (TMS), and EEG. We use the method to measure signal transmission between brain areas while introducing extrinsic oscillatory brain activity and to study the causal association between oscillatory activity and behaviour. We show that phase-matched oscillatory activity creates phase-dependent modulation of signal transmission between brain areas, while phase-shifted oscillatory activity blunts the phase-dependent modulation. The results suggest that phase coherence between brain areas plays a cardinal role in signal transmission in brain networks. In sum, I argue that causal approaches will provide a more concrete backbone for cognitive neuroscience.
Abstract:
ZooScan with ZooProcess and Plankton Identifier (PkID) software is an integrated analysis system for acquisition and classification of digital zooplankton images from preserved zooplankton samples. Zooplankton samples are digitized by the ZooScan and processed by ZooProcess and PkID in order to detect, enumerate, measure and classify the digitized objects. Here we present a semi-automatic approach that entails automated classification of images followed by manual validation, which allows rapid and accurate classification of zooplankton and abiotic objects. We demonstrate this approach with a biweekly zooplankton time series from the Bay of Villefranche-sur-mer, France. The classification approach proposed here provides a practical compromise between a fully automatic method with varying degrees of bias and a manual but accurate classification of zooplankton. We also evaluate the appropriate number of images to include in digital learning sets and compare the accuracy of six classification algorithms. We evaluate the accuracy of the ZooScan for automated measurements of body size and present relationships between machine measures of size and C and N content of selected zooplankton taxa. We demonstrate that the ZooScan system can produce useful measures of zooplankton abundance, biomass and size spectra, for a variety of ecological studies.
Abstract:
Background: Malignancies arising in the large bowel cause the second largest number of deaths from cancer in the Western world. Despite the progress made in recent decades, colorectal cancer remains one of the most frequent and deadly neoplasias in Western countries. Methods: A genomic study of human colorectal cancer was carried out on a total of 31 tumoral samples, corresponding to different stages of the disease, and 33 non-tumoral samples. The study was carried out by hybridisation of the tumour samples against a reference pool of non-tumoral samples using Agilent Human 1A 60-mer oligo microarrays. The results obtained were validated by qRT-PCR. In the subsequent bioinformatics analysis, gene networks were built by means of Bayesian classifiers, variable selection and bootstrap resampling. The consensus among all the induced models produced a hierarchy of dependences and, thus, of variables. Results: After an exhaustive pre-processing stage to ensure data quality (missing-value imputation, probe quality, data smoothing and intraclass variability filtering), the final dataset comprised a total of 8,104 probes. Next, a supervised classification approach and data analysis were carried out to obtain the most relevant genes. Two of them are directly involved in cancer progression and in particular in colorectal cancer. Finally, a supervised classifier was induced to classify new unseen samples. Conclusions: We have developed a tentative model for the diagnosis of colorectal cancer based on a biomarker panel. Our results indicate that the gene profile described herein can discriminate between non-cancerous and cancerous samples with 94.45% accuracy using different supervised classifiers (AUC values ranging from 0.955 to 0.997).
Abstract:
Biotic indices have been developed to summarise the information provided by benthic macroinvertebrates, but their use can require specialized taxonomic expertise and time-consuming processing. Using a high taxonomic level in biotic indices reduces sample processing time but should be considered with caution, since assigning a tolerance level to high taxonomic levels may introduce uncertainty. A methodology for family-level tolerance categorization, based on the affinity of each family with disturbed or undisturbed conditions, was employed. This family tolerance classification approach was tested in two different areas of the Mediterranean Sea affected by sewage discharges. Biotic indices employed at family level responded correctly to the presence of sewage. However, in areas where communities differ among stations and species diversity within each family is high, assigning the same tolerance level to a whole family could lead to errors. Thus, the use of a high taxonomic level in biotic indices should be restricted to areas where the community is homogeneous and families have similar species composition across sites.
Abstract:
The computer simulation of manufacturing systems is commonly carried out using discrete event simulation (DES). Indeed, there appears to be a lack of applications of continuous simulation methods, particularly system dynamics (SD), despite evidence that this technique is suitable for industrial modelling. This paper investigates whether this is due to a decline in the general popularity of SD, or whether modelling of manufacturing systems represents a missed opportunity for SD. On this basis, the paper first gives a review of the concept of SD and fully describes the modelling technique. Following on, a survey of the published applications of SD in the 1990s is made by developing and using a structured classification approach. From this review, observations are made about the application of the SD method and opportunities for future research are suggested.
Abstract:
Common bottlenose dolphins (Tursiops truncatus) produce a wide variety of vocal emissions for communication and echolocation, of which the pulsed repertoire has been the most difficult to categorize. Packets of high repetition, broadband pulses are still largely reported under the general designation of burst-pulses, and traditional attempts to classify these emissions rely mainly on their aural characteristics and on graphical aspects of spectrograms. Here, we present a quantitative analysis of pulsed signals emitted by wild bottlenose dolphins in the Sado estuary, Portugal (2011-2014), and test the reliability of a traditional classification approach. Acoustic parameters (minimum frequency, maximum frequency, peak frequency, duration, repetition rate and inter-click-interval) were extracted from 930 pulsed signals previously categorized using a traditional approach. Discriminant function analysis revealed a high reliability of the traditional classification approach (93.5% of pulsed signals were consistently assigned to their aurally based categories). According to the discriminant function analysis (Wilks' Λ = 0.11, F3, 2.41 = 282.75, P < 0.001), repetition rate is the feature that best enables the discrimination of different pulsed signals (structure coefficient = 0.98). Classification using hierarchical cluster analysis led to a similar categorization pattern: two main signal types with distinct magnitudes of repetition rate were clustered into five groups. The pulsed signals described here present significant differences in their time-frequency features, especially repetition rate (P < 0.001), inter-click-interval (P < 0.001) and duration (P < 0.001). We document the occurrence of a distinct signal type, short burst-pulses, and highlight the existence of a diverse repertoire of pulsed vocalizations emitted in graded sequences.
The use of quantitative analysis of pulsed signals is essential to improve classifications and to better assess the contexts of emission, geographic variation and the functional significance of pulsed signals.
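Two of the acoustic parameters listed above can be derived directly from the arrival times of the individual pulses. The sketch below assumes that repetition rate means pulses per second over the signal's duration, which the abstract does not define; the click times and function names are hypothetical.

```python
def inter_click_intervals(click_times_ms):
    """Time between consecutive pulses, in milliseconds."""
    return [later - earlier
            for earlier, later in zip(click_times_ms, click_times_ms[1:])]

def repetition_rate(click_times_ms):
    """Pulses per second, assuming at least two pulses at distinct times."""
    duration_ms = click_times_ms[-1] - click_times_ms[0]
    return 1000 * (len(click_times_ms) - 1) / duration_ms

# Hypothetical pulse arrival times within one signal, in milliseconds.
clicks = [0, 2, 4, 6, 8]

ici = inter_click_intervals(clicks)
rate = repetition_rate(clicks)
print(ici, rate)
```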
Abstract:
In this work we focus on pattern recognition methods related to EMG upper-limb prosthetic control. After giving a detailed review of the most widely used classification methods, we propose a new classification approach. It results from a comparison of Fourier analyses between able-bodied and trans-radial amputee subjects. We thus suggest a different classification method that considers the contribution of each surface electrode separately, together with five time-domain features, obtaining an average classification accuracy of 75% on a sample of trans-radial amputees. We propose an automatic feature selection procedure, formulated as a minimization problem, in order to improve the method and its robustness.
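As an illustration of per-electrode time-domain features, the sketch below computes five features for each channel window and concatenates them into one feature vector. The abstract does not name its five features, so the set used here (mean absolute value, root mean square, waveform length, zero crossings, slope sign changes) is an assumption, as are the sample values and function names.

```python
import math

def time_domain_features(window):
    """Five assumed time-domain features for one electrode's window."""
    n = len(window)
    pairs = list(zip(window, window[1:]))
    mav = sum(abs(v) for v in window) / n              # mean absolute value
    rms = math.sqrt(sum(v * v for v in window) / n)    # root mean square
    wl = sum(abs(b - a) for a, b in pairs)             # waveform length
    zc = sum(1 for a, b in pairs if a * b < 0)         # zero crossings
    ssc = sum(1 for a, b, c in zip(window, window[1:], window[2:])
              if (b - a) * (b - c) > 0)                # slope sign changes
    return [mav, rms, wl, zc, ssc]

def feature_vector(channels):
    """Keep each electrode separate by concatenating its own features."""
    return [f for window in channels for f in time_domain_features(window)]

# Hypothetical EMG windows from two surface electrodes.
channels = [[1.0, -1.0, 2.0, -2.0], [0.5, 0.5, 0.5, 0.5]]
fv = feature_vector(channels)
print(fv)
```

Keeping one feature group per electrode lets a later feature selection step drop or weight individual electrodes rather than treating the array as a single pooled signal.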