882 resultados para Least-squares support vector machine
Resumo:
Botnets are a group of computers infected with a specific sub-set of a malware family and controlled by one individual, called botmaster. This kind of networks are used not only, but also for virtual extorsion, spam campaigns and identity theft. They implement different types of evasion techniques that make it harder for one to group and detect botnet traffic. This thesis introduces one methodology, called CONDENSER, that outputs clusters through a self-organizing map and that identify domain names generated by an unknown pseudo-random seed that is known by the botnet herder(s). Aditionally DNS Crawler is proposed, this system saves historic DNS data for fast-flux and double fastflux detection, and is used to identify live C&Cs IPs used by real botnets. A program, called CHEWER, was developed to automate the calculation of the SVM parameters and features that better perform against the available domain names associated with DGAs. CONDENSER and DNS Crawler were developed with scalability in mind so the detection of fast-flux and double fast-flux networks become faster. We used a SVM for the DGA classififer, selecting a total of 11 attributes and achieving a Precision of 77,9% and a F-Measure of 83,2%. The feature selection method identified the 3 most significant attributes of the total set of attributes. For clustering, a Self-Organizing Map was used on a total of 81 attributes. The conclusions of this thesis were accepted in Botconf through a submited article. Botconf is known conferênce for research, mitigation and discovery of botnets tailled for the industry, where is presented current work and research. This conference is known for having security and anti-virus companies, law enforcement agencies and researchers.
Resumo:
Data Mining surge, hoje em dia, como uma ferramenta importante e crucial para o sucesso de um negócio. O considerável volume de dados que atualmente se encontra disponível, por si só, não traz valor acrescentado. No entanto, as ferramentas de Data Mining, capazes de transformar dados e mais dados em conhecimento, vêm colmatar esta lacuna, constituindo, assim, um trunfo que ninguém quer perder. O presente trabalho foca-se na utilização das técnicas de Data Mining no âmbito da atividade bancária, mais concretamente na sua atividade de telemarketing. Neste trabalho são aplicados catorze algoritmos a uma base de dados proveniente do call center de um banco português, resultante de uma campanha para a angariação de clientes para depósitos a prazo com taxas de juro favoráveis. Os catorze algoritmos aplicados no caso prático deste projeto podem ser agrupados em sete grupos: Árvores de Decisão, Redes Neuronais, Support Vector Machine, Voted Perceptron, métodos Ensemble, aprendizagem Bayesiana e Regressões. De forma a beneficiar, ainda mais, do que a área de Data Mining tem para oferecer, este trabalho incide ainda sobre o redimensionamento da base de dados em questão, através da aplicação de duas estratégias de seleção de atributos: Best First e Genetic Search. Um dos objetivos deste trabalho prende-se com a comparação dos resultados obtidos com os resultados presentes no estudo dos autores Sérgio Moro, Raul Laureano e Paulo Cortez (Sérgio Moro, Laureano, & Cortez, 2011). Adicionalmente, pretende-se identificar as variáveis mais relevantes aquando da identificação do potencial cliente deste produto financeiro. Como principais conclusões, depreende-se que os resultados obtidos são comparáveis com os resultados publicados pelos autores mencionados, sendo os mesmos de qualidade e consistentes. O algoritmo Bagging é o que apresenta melhores resultados e a variável referente à duração da chamada telefónica é a que mais influencia o sucesso de campanhas similares.
Resumo:
Hospitals are nowadays collecting vast amounts of data related with patient records. All this data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes an implementation of a medical data mining project approach based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, were collected from a Portuguese hospital and related with inpatient hospitalization. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available at the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, where six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which presents a high quality coefficient of determination value (0.81). This model was then opened by using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. Such extracted knowledge confirmed that the obtained predictive model is credible and with potential value for supporting decisions of hospital managers.
Resumo:
Hand gestures are a powerful way for human communication, with lots of potential applications in the area of human computer interaction. Vision-based hand gesture recognition techniques have many proven advantages compared with traditional devices, giving users a simpler and more natural way to communicate with electronic devices. This work proposes a generic system architecture based in computer vision and machine learning, able to be used with any interface for human-computer interaction. The proposed solution is mainly composed of three modules: a pre-processing and hand segmentation module, a static gesture interface module and a dynamic gesture interface module. The experiments showed that the core of visionbased interaction systems could be the same for all applications and thus facilitate the implementation. For hand posture recognition, a SVM (Support Vector Machine) model was trained and used, able to achieve a final accuracy of 99.4%. For dynamic gestures, an HMM (Hidden Markov Model) model was trained for each gesture that the system could recognize with a final average accuracy of 93.7%. The proposed solution as the advantage of being generic enough with the trained models able to work in real-time, allowing its application in a wide range of human-machine applications. To validate the proposed framework two applications were implemented. The first one is a real-time system able to interpret the Portuguese Sign Language. The second one is an online system able to help a robotic soccer game referee judge a game in real time.
Resumo:
A ocupação e consolidação do território na Amazônia apresentam diferentes características relacionadas à dinâmica das conversões de uso e cobertura da terra, que podem ser analisadas utilizando imagens orbitais de sensoriamento remoto. O objetivo do presente trabalho foi avaliar os produtos de detecção de mudanças gerados por análise de vetor de mudança (AVM) e subtração de imagens, a partir de imagens-fração derivadas das imagens ópticas TM/Landsat, para o estudo das conversões de uso e cobertura da terra presentes em área de colonização agrícola na região sudeste de Roraima. Analisaram-se as imagens de mudança provenientes da aplicação do AVM (magnitude, alfa e beta) e da subtração das imagens-fração (solo, sombra e vegetação) quanto à sua capacidade de identificar e discriminar as conversões existentes, de acordo com levantamento de campo. Foram testados dois algoritmos de classificação de imagens do tipo supervisionado, Bhattacharyya e Support Vector Machine. Foram feitos agrupamentos para otimizar a identificação das conversões nas classificações testadas. Houve melhor desempenho do classificador por regiões Bhattacharyya na discriminação das conversões. A utilização das imagens-diferença das frações como informação de entrada para o classificador apresentou qualidade de classificação muito boa ou excelente, sendo superior às classificações utilizando os produtos AVM, isoladamente ou em conjunto com as imagens-diferença.
Resumo:
Dissertação de mestrado integrado em Engenharia Biomédica (área de especialização em Eletrónica Médica)
Resumo:
ABSTRACTThe Amazon várzeas are an important component of the Amazon biome, but anthropic and climatic impacts have been leading to forest loss and interruption of essential ecosystem functions and services. The objectives of this study were to evaluate the capability of the Landsat-based Detection of Trends in Disturbance and Recovery (LandTrendr) algorithm to characterize changes in várzeaforest cover in the Lower Amazon, and to analyze the potential of spectral and temporal attributes to classify forest loss as either natural or anthropogenic. We used a time series of 37 Landsat TM and ETM+ images acquired between 1984 and 2009. We used the LandTrendr algorithm to detect forest cover change and the attributes of "start year", "magnitude", and "duration" of the changes, as well as "NDVI at the end of series". Detection was restricted to areas identified as having forest cover at the start and/or end of the time series. We used the Support Vector Machine (SVM) algorithm to classify the extracted attributes, differentiating between anthropogenic and natural forest loss. Detection reliability was consistently high for change events along the Amazon River channel, but variable for changes within the floodplain. Spectral-temporal trajectories faithfully represented the nature of changes in floodplain forest cover, corroborating field observations. We estimated anthropogenic forest losses to be larger (1.071 ha) than natural losses (884 ha), with a global classification accuracy of 94%. We conclude that the LandTrendr algorithm is a reliable tool for studies of forest dynamics throughout the floodplain.
Resumo:
Speaker Recognition, Speaker Verification, Sparse Kernel Logistic Regression, Support Vector Machine
Resumo:
This paper presents a semisupervised support vector machine (SVM) that integrates the information of both labeled and unlabeled pixels efficiently. Method's performance is illustrated in the relevant problem of very high resolution image classification of urban areas. The SVM is trained with the linear combination of two kernels: a base kernel working only with labeled examples is deformed by a likelihood kernel encoding similarities between labeled and unlabeled examples. Results obtained on very high resolution (VHR) multispectral and hyperspectral images show the relevance of the method in the context of urban image classification. Also, its simplicity and the few parameters involved make the method versatile and workable by unexperienced users.
Resumo:
In this paper we present a prototype of a control flow for an a posteriori drug dose adaptation for Chronic Myelogenous Leukemia (CML) patients. The control flow is modeled using Timed Automata extended with Tasks (TAT) model. The feedback loop of the control flow includes the decision-making process for drug dose adaptation. This is based on the outputs of the body response model represented by the Support Vector Machine (SVM) algorithm for drug concentration prediction. The decision is further checked for conformity with the dose level rules of a medical guideline. We also have developed an automatic code synthesizer for the icycom platform as an extension of the TIMES tool.
Resumo:
The application of support vector machine classification (SVM) to combined information from magnetic resonance imaging (MRI) and [F18]fluorodeoxyglucose positron emission tomography (FDG-PET) has been shown to improve detection and differentiation of Alzheimer's disease dementia (AD) and frontotemporal lobar degeneration. To validate this approach for the most frequent dementia syndrome AD, and to test its applicability to multicenter data, we randomly extracted FDG-PET and MRI data of 28 AD patients and 28 healthy control subjects from the database provided by the Alzheimer's Disease Neuroimaging Initiative (ADNI) and compared them to data of 21 patients with AD and 13 control subjects from our own Leipzig cohort. SVM classification using combined volume-of-interest information from FDG-PET and MRI based on comprehensive quantitative meta-analyses investigating dementia syndromes revealed a higher discrimination accuracy in comparison to single modality classification. For the ADNI dataset accuracy rates of up to 88% and for the Leipzig cohort of up to 100% were obtained. Classifiers trained on the ADNI data discriminated the Leipzig cohorts with an accuracy of 91%. In conclusion, our results suggest SVM classification based on quantitative meta-analyses of multicenter data as a valid method for individual AD diagnosis. Furthermore, combining imaging information from MRI and FDG-PET might substantially improve the accuracy of AD diagnosis.
Resumo:
In this paper, we present and apply a semisupervised support vector machine based on cluster kernels for the problem of very high resolution image classification. In the proposed setting, a base kernel working with labeled samples only is deformed by a likelihood kernel encoding similarities between unlabeled examples. The resulting kernel is used to train a standard support vector machine (SVM) classifier. Experiments carried out on very high resolution (VHR) multispectral and hyperspectral images using very few labeled examples show the relevancy of the method in the context of urban image classification. Its simplicity and the small number of parameters involved make it versatile and workable by unexperimented users.
Resumo:
We present a novel filtering method for multispectral satellite image classification. The proposed method learns a set of spatial filters that maximize class separability of binary support vector machine (SVM) through a gradient descent approach. Regularization issues are discussed in detail and a Frobenius-norm regularization is proposed to efficiently exclude uninformative filters coefficients. Experiments carried out on multiclass one-against-all classification and target detection show the capabilities of the learned spatial filters.
Resumo:
To be diagnostically useful, structural MRI must reliably distinguish Alzheimer's disease (AD) from normal aging in individual scans. Recent advances in statistical learning theory have led to the application of support vector machines to MRI for detection of a variety of disease states. The aims of this study were to assess how successfully support vector machines assigned individual diagnoses and to determine whether data-sets combined from multiple scanners and different centres could be used to obtain effective classification of scans. We used linear support vector machines to classify the grey matter segment of T1-weighted MR scans from pathologically proven AD patients and cognitively normal elderly individuals obtained from two centres with different scanning equipment. Because the clinical diagnosis of mild AD is difficult we also tested the ability of support vector machines to differentiate control scans from patients without post-mortem confirmation. Finally we sought to use these methods to differentiate scans between patients suffering from AD from those with frontotemporal lobar degeneration. Up to 96% of pathologically verified AD patients were correctly classified using whole brain images. Data from different centres were successfully combined achieving comparable results from the separate analyses. Importantly, data from one centre could be used to train a support vector machine to accurately differentiate AD and normal ageing scans obtained from another centre with different subjects and different scanner equipment. Patients with mild, clinically probable AD and age/sex matched controls were correctly separated in 89% of cases which is compatible with published diagnosis rates in the best clinical centres. This method correctly assigned 89% of patients with post-mortem confirmed diagnosis of either AD or frontotemporal lobar degeneration to their respective group. Our study leads to three conclusions: Firstly, support vector machines successfully separate patients with AD from healthy aging subjects. Secondly, they perform well in the differential diagnosis of two different forms of dementia. Thirdly, the method is robust and can be generalized across different centres. This suggests an important role for computer based diagnostic image analysis for clinical practice.