820 resultados para Data classification


Relevância:

70.00% 70.00%

Publicador:

Resumo:

The objective of this thesis is to develop and generalize further the differential evolution based data classification method. For many years, evolutionary algorithms have been successfully applied to many classification tasks. Evolution algorithms are population based, stochastic search algorithms that mimic natural selection and genetics. Differential evolution is an evolutionary algorithm that has gained popularity because of its simplicity and good observed performance. In this thesis a differential evolution classifier with pool of distances is proposed, demonstrated and initially evaluated. The differential evolution classifier is a nearest prototype vector based classifier that applies a global optimization algorithm, differential evolution, to determine the optimal values for all free parameters of the classifier model during the training phase of the classifier. The differential evolution classifier applies the individually optimized distance measure for each new data set to be classified is generalized to cover a pool of distances. Instead of optimizing a single distance measure for the given data set, the selection of the optimal distance measure from a predefined pool of alternative measures is attempted systematically and automatically. Furthermore, instead of only selecting the optimal distance measure from a set of alternatives, an attempt is made to optimize the values of the possible control parameters related with the selected distance measure. Specifically, a pool of alternative distance measures is first created and then the differential evolution algorithm is applied to select the optimal distance measure that yields the highest classification accuracy with the current data. After determining the optimal distance measures for the given data set together with their optimal parameters, all determined distance measures are aggregated to form a single total distance measure. The total distance measure is applied to the final classification decisions. The actual classification process is still based on the nearest prototype vector principle; a sample belongs to the class represented by the nearest prototype vector when measured with the optimized total distance measure. During the training process the differential evolution algorithm determines the optimal class vectors, selects optimal distance metrics, and determines the optimal values for the free parameters of each selected distance measure. The results obtained with the above method confirm that the choice of distance measure is one of the most crucial factors for obtaining higher classification accuracy. The results also demonstrate that it is possible to build a classifier that is able to select the optimal distance measure for the given data set automatically and systematically. After finding optimal distance measures together with optimal parameters from the particular distance measure results are then aggregated to form a total distance, which will be used to form the deviation between the class vectors and samples and thus classify the samples. This thesis also discusses two types of aggregation operators, namely, ordered weighted averaging (OWA) based multi-distances and generalized ordered weighted averaging (GOWA). These aggregation operators were applied in this work to the aggregation of the normalized distance values. The results demonstrate that a proper combination of aggregation operator and weight generation scheme play an important role in obtaining good classification accuracy. The main outcomes of the work are the six new generalized versions of previous method called differential evolution classifier. All these DE classifier demonstrated good results in the classification tasks.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This Project aims to develop methods for data classification in a Data Warehouse for decision-making purposes. We also have as another goal the reduction of an attribute set in a Data Warehouse, in which a given reduced set is capable of keeping the same properties of the original one. Once we achieve a reduced set, we have a smaller computational cost of processing, we are able to identify non-relevant attributes to certain kinds of situations, and finally we are also able to recognize patterns in the database that will help us to take decisions. In order to achieve these main objectives, it will be implemented the Rough Sets algorithm. We chose PostgreSQL as our data base management system due to its efficiency, consolidation and finally, it’s an open-source system (free distribution)

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Semi-supervised learning is a classification paradigm in which just a few labeled instances are available for the training process. To overcome this small amount of initial label information, the information provided by the unlabeled instances is also considered. In this paper, we propose a nature-inspired semi-supervised learning technique based on attraction forces. Instances are represented as points in a k-dimensional space, and the movement of data points is modeled as a dynamical system. As the system runs, data items with the same label cooperate with each other, and data items with different labels compete among them to attract unlabeled points by applying a specific force function. In this way, all unlabeled data items can be classified when the system reaches its stable state. Stability analysis for the proposed dynamical system is performed and some heuristics are proposed for parameter setting. Simulation results show that the proposed technique achieves good classification results on artificial data sets and is comparable to well-known semi-supervised techniques using benchmark data sets.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Advances in biomedical signal acquisition systems for motion analysis have led to lowcost and ubiquitous wearable sensors which can be used to record movement data in different settings. This implies the potential availability of large amounts of quantitative data. It is then crucial to identify and to extract the information of clinical relevance from the large amount of available data. This quantitative and objective information can be an important aid for clinical decision making. Data mining is the process of discovering such information in databases through data processing, selection of informative data, and identification of relevant patterns. The databases considered in this thesis store motion data from wearable sensors (specifically accelerometers) and clinical information (clinical data, scores, tests). The main goal of this thesis is to develop data mining tools which can provide quantitative information to the clinician in the field of movement disorders. This thesis will focus on motor impairment in Parkinson's disease (PD). Different databases related to Parkinson subjects in different stages of the disease were considered for this thesis. Each database is characterized by the data recorded during a specific motor task performed by different groups of subjects. The data mining techniques that were used in this thesis are feature selection (a technique which was used to find relevant information and to discard useless or redundant data), classification, clustering, and regression. The aims were to identify high risk subjects for PD, characterize the differences between early PD subjects and healthy ones, characterize PD subtypes and automatically assess the severity of symptoms in the home setting.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Ecological regions are increasingly used as a spatial unit for planning and environmental management. It is important to define these regions in a scientifically defensible way to justify any decisions made on the basis that they are representative of broad environmental assets. The paper describes a methodology and tool to identify cohesive bioregions. The methodology applies an elicitation process to obtain geographical descriptions for bioregions, each of these is transformed into a Normal density estimate on environmental variables within that region. This prior information is balanced with data classification of environmental datasets using a Bayesian statistical modelling approach to objectively map ecological regions. The method is called model-based clustering as it fits a Normal mixture model to the clusters associated with regions, and it addresses issues of uncertainty in environmental datasets due to overlapping clusters.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Universidade Estadual de Campinas . Faculdade de Educação Física

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Today several different unsupervised classification algorithms are commonly used to cluster similar patterns in a data set based only on its statistical properties. Specially in image data applications, self-organizing methods for unsupervised classification have been successfully applied for clustering pixels or group of pixels in order to perform segmentation tasks. The first important contribution of this paper refers to the development of a self-organizing method for data classification, named Enhanced Independent Component Analysis Mixture Model (EICAMM), which was built by proposing some modifications in the Independent Component Analysis Mixture Model (ICAMM). Such improvements were proposed by considering some of the model limitations as well as by analyzing how it should be improved in order to become more efficient. Moreover, a pre-processing methodology was also proposed, which is based on combining the Sparse Code Shrinkage (SCS) for image denoising and the Sobel edge detector. In the experiments of this work, the EICAMM and other self-organizing models were applied for segmenting images in their original and pre-processed versions. A comparative analysis showed satisfactory and competitive image segmentation results obtained by the proposals presented herein. (C) 2008 Published by Elsevier B.V.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

1. Cluster analysis of reference sites with similar biota is the initial step in creating River Invertebrate Prediction and Classification System (RIVPACS) and similar river bioassessment models such as Australian River Assessment System (AUSRIVAS). This paper describes and tests an alternative prediction method, Assessment by Nearest Neighbour Analysis (ANNA), based on the same philosophy as RIVPACS and AUSRIVAS but without the grouping step that some people view as artificial. 2. The steps in creating ANNA models are: (i) weighting the predictor variables using a multivariate approach analogous to principal axis correlations, (ii) calculating the weighted Euclidian distance from a test site to the reference sites based on the environmental predictors, (iii) predicting the faunal composition based on the nearest reference sites and (iv) calculating an observed/expected (O/E) analogous to RIVPACS/AUSRIVAS. 3. The paper compares AUSRIVAS and ANNA models on 17 datasets representing a variety of habitats and seasons. First, it examines each model's regressions for Observed versus Expected number of taxa, including the r(2), intercept and slope. Second, the two models' assessments of 79 test sites in New Zealand are compared. Third, the models are compared on test and presumed reference sites along a known trace metal gradient. Fourth, ANNA models are evaluated for western Australia, a geographically distinct region of Australia. The comparisons demonstrate that ANNA and AUSRIVAS are generally equivalent in performance, although ANNA turns out to be potentially more robust for the O versus E regressions and is potentially more accurate on the trace metal gradient sites. 4. The ANNA method is recommended for use in bioassessment of rivers, at least for corroborating the results of the well established AUSRIVAS- and RIVPACS-type models, if not to replace them.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Dissertação para a obtenção do grau de Mestre em Engenharia Electrotécnica Ramo de Energia

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Este documento foi redigido no âmbito da dissertação do Mestrado em Engenharia Informática na área de Arquiteturas, Sistemas e Redes, do Departamento de Engenharia Informática, do ISEP, cujo tema é diagnóstico cardíaco a partir de dados acústicos e clínicos. O objetivo deste trabalho é produzir um método que permita diagnosticar automaticamente patologias cardíacas utilizando técnicas de classificação de data mining. Foram utilizados dois tipos de dados: sons cardíacos gravados em ambiente hospitalar e dados clínicos. Numa primeira fase, exploraram-se os sons cardíacos usando uma abordagem baseada em motifs. Numa segunda fase, utilizamos os dados clínicos anotados dos pacientes. Numa terceira fase, avaliamos a combinação das duas abordagens. Na avaliação experimental os modelos baseados em motifs obtiveram melhores resultados do que os construídos a partir dos dados clínicos. A combinação das abordagens mostrou poder ser vantajosa em situações pontuais.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Due to the advances in sensor networks and remote sensing technologies, the acquisition and storage rates of meteorological and climatological data increases every day and ask for novel and efficient processing algorithms. A fundamental problem of data analysis and modeling is the spatial prediction of meteorological variables in complex orography, which serves among others to extended climatological analyses, for the assimilation of data into numerical weather prediction models, for preparing inputs to hydrological models and for real time monitoring and short-term forecasting of weather.In this thesis, a new framework for spatial estimation is proposed by taking advantage of a class of algorithms emerging from the statistical learning theory. Nonparametric kernel-based methods for nonlinear data classification, regression and target detection, known as support vector machines (SVM), are adapted for mapping of meteorological variables in complex orography.With the advent of high resolution digital elevation models, the field of spatial prediction met new horizons. In fact, by exploiting image processing tools along with physical heuristics, an incredible number of terrain features which account for the topographic conditions at multiple spatial scales can be extracted. Such features are highly relevant for the mapping of meteorological variables because they control a considerable part of the spatial variability of meteorological fields in the complex Alpine orography. For instance, patterns of orographic rainfall, wind speed and cold air pools are known to be correlated with particular terrain forms, e.g. convex/concave surfaces and upwind sides of mountain slopes.Kernel-based methods are employed to learn the nonlinear statistical dependence which links the multidimensional space of geographical and topographic explanatory variables to the variable of interest, that is the wind speed as measured at the weather stations or the occurrence of orographic rainfall patterns as extracted from sequences of radar images. Compared to low dimensional models integrating only the geographical coordinates, the proposed framework opens a way to regionalize meteorological variables which are multidimensional in nature and rarely show spatial auto-correlation in the original space making the use of classical geostatistics tangled.The challenges which are explored during the thesis are manifolds. First, the complexity of models is optimized to impose appropriate smoothness properties and reduce the impact of noisy measurements. Secondly, a multiple kernel extension of SVM is considered to select the multiscale features which explain most of the spatial variability of wind speed. Then, SVM target detection methods are implemented to describe the orographic conditions which cause persistent and stationary rainfall patterns. Finally, the optimal splitting of the data is studied to estimate realistic performances and confidence intervals characterizing the uncertainty of predictions.The resulting maps of average wind speeds find applications within renewable resources assessment and opens a route to decrease the temporal scale of analysis to meet hydrological requirements. Furthermore, the maps depicting the susceptibility to orographic rainfall enhancement can be used to improve current radar-based quantitative precipitation estimation and forecasting systems and to generate stochastic ensembles of precipitation fields conditioned upon the orography.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Monissasovelluksissa on hyvin tärkeää vähentää valolähteen vaikutusta kohteen oikean värin havainnoimiseksi. Tämä on tarpeen mm. virtuaalisissa museoissa, telelääketieteessä, verkkokaupassa ja verkkorahassa. Tässä tutkielmassa on kehitetty tekniikkaa kirkkaiden heijastusten poistoon spektrikuvista. Työ sisältää katsauksen yleisen värillisen kuvan ymmärtämiseen, mihin perustuen analysoitiin erilaisia kirkkaiden heijastusten poistO'tekniikoita. Työssä kehitettiin uusi kirkkaiden heijastusten poistO'menetelmä, joka perustuu dikromaattiseen heijastus-malliin, joka kuvaa spektrisen datan objektin omaan väriin ja valaisevan valon väriin perustuen. Ehdotettu kirkkaiden heijastusten poistO'menetelmä hyödyntää erilaisia olemassaolevia menetelmiä, kuten pääkomponenttimenetelmää ja tiedon luokittelu-menetelmää. Yritys kehittää nopeasti toimiva algoritmi, joka myös suoriutuu tehtävästä hyvin, on onnistunut. Kokeet toteutettiin ehdotetun menetelmän mukaisesti ja toimivalla algoritmilla saatiin halutut lopputulokset. Edelleentyö sisältää ehdotuksia esitetyn algoritmin parantamiseksi.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This thesis studies the properties and usability of operators called t-norms, t-conorms, uninorms, as well as many valued implications and equivalences. Into these operators, weights and a generalized mean are embedded for aggregation, and they are used for comparison tasks and for this reason they are referred to as comparison measures. The thesis illustrates how these operators can be weighted with a differential evolution and aggregated with a generalized mean, and the kinds of measures of comparison that can be achieved from this procedure. New operators suitable for comparison measures are suggested. These operators are combination measures based on the use of t-norms and t-conorms, the generalized 3_-uninorm and pseudo equivalence measures based on S-type implications. The empirical part of this thesis demonstrates how these new comparison measures work in the field of classification, for example, in the classification of medical data. The second application area is from the field of sports medicine and it represents an expert system for defining an athlete's aerobic and anaerobic thresholds. The core of this thesis offers definitions for comparison measures and illustrates that there is no actual difference in the results achieved in comparison tasks, by the use of comparison measures based on distance, versus comparison measures based on many valued logical structures. The approach has been highly practical in this thesis and all usage of the measures has been validated mainly by practical testing. In general, many different types of operators suitable for comparison tasks have been presented in fuzzy logic literature and there has been little or no experimental work with these operators.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we present a component based person detection system that is capable of detecting frontal, rear and near side views of people, and partially occluded persons in cluttered scenes. The framework that is described here for people is easily applied to other objects as well. The motivation for developing a component based approach is two fold: first, to enhance the performance of person detection systems on frontal and rear views of people and second, to develop a framework that directly addresses the problem of detecting people who are partially occluded or whose body parts blend in with the background. The data classification is handled by several support vector machine classifiers arranged in two layers. This architecture is known as Adaptive Combination of Classifiers (ACC). The system performs very well and is capable of detecting people even when all components of a person are not found. The performance of the system is significantly better than a full body person detector designed along similar lines. This suggests that the improved performance is due to the components based approach and the ACC data classification structure.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A Segurança Empresarial é um programa preventivo que visa proteger os valores de uma empresa, para preservação das condições operacionais dentro da normalidade. As empresas estão sujeitas a ameaças relacionadas às suas atividades no mercado em que atuam. Muitas dessas ameaças são conhecidas e identificadas e, dentro do possível, é estabelecido um programa de proteção de seus interesses. Entretanto, as ameaças aos demais bens da empresa não são evidentes, nem perceptíveis tão facilmente. As medidas tomadas para prevenir a efetivação dessas ameaças, por meio de um Programa de Segurança Empresarial ou para minorar os problemas decorrentes da concretização dessas ameaças, até a volta às condições habituais de operação das empresas, por meio da execução de um Plano de Contingência, são vistas, não raramente, como despesas e poucas vezes como investimento com retomo. Nem geram resultados operacionais para a empresa. o objetivo final desta dissertação de mestrado foi estabelecer um Programa de Segurança Empresarial para a Fundação Getulio Vargas - FGV, recomendando a revisão das normas e procedimentos de segurança que comporão o Manual de Segurança e sugerir que a FGV o inclua no Plano Estratégico. Para alcançar o objetivo final. foram definidos: Conceitos Básicos de Segurança e Contingência; Critérios de Classificação de Dados; Gestão do Processo de Segurança e Contingência; Função Administração de Segurança. Alguns tipos de pesquisa foram aplicados. Quanto aos meios, caracterizou-se como estudo de caso; para a elaboração das recomendações de revisão das normas e procedimentos de segurança, foram aplicadas pesquisas bibliográfica, documental e de campo. Quanto ao fim, pesquisa aplicada, motivada pela necessidade de resolver problemas concretos, com finalidade prática. Foram efetuadas visitas e entrevistas nas instalações da sede da FGV e no prédio da Bolsa de Valores do Rio de Janeiro - BVRJ. Dada a complexidade e abrangência da Segurança Empresarial, o estudo foi limitado a tratar de segurança física. Foi observada a necessidade da FGV rever as Normas e Procedimentos de Segurança Empresarial, buscando melhorar o nível de segurança geral e adotar instrumentos modernos de prevenção de ocorrência de sinistros que ponham em risco os valores da FGV, a saber: pessoas, instalações físicas, equipamentos, informações, suprimentos e facilidades de comunicação. Está sendo recomendada a adoção da metodologia Sistemática Integrada de Segurança e Contingência - SISC, para minimizarão dos impactos de situações emergências.