648 results for Relational fuzzy clustering
Abstract:
The coverage and volume of geo-referenced datasets are extensive and incessantly growing. The systematic capture of geo-referenced information generates large volumes of spatio-temporal data to be analyzed. Clustering and visualization play a key role in exploratory data analysis and in the extraction of knowledge embedded in these data. However, the special characteristics of these data pose new challenges for visualization and clustering: complex structures, a large number of samples, variables involved in a temporal context, high dimensionality and large variability in cluster shapes.
The central aim of my thesis is to propose new algorithms and methodologies for clustering and visualization, in order to assist knowledge extraction from spatio-temporal geo-referenced data and thus improve decision-making processes.
I present two original algorithms, one for clustering, the Fuzzy Growing Hierarchical Self-Organizing Networks (FGHSON), and one for exploratory visual data analysis, the Tree-structured Self-organizing Maps Component Planes. In addition, I present methodologies that, combined with the FGHSON and the Tree-structured SOM Component Planes, allow space and time to be integrated seamlessly and simultaneously in order to extract knowledge embedded in a temporal context.
The originality of the FGHSON lies in its capability to reflect the underlying structure of a dataset in a hierarchical fuzzy way. A hierarchical fuzzy representation of clusters is crucial when data include complex structures with large variability in cluster shapes, variances, densities and number of clusters. The most important characteristics of the FGHSON include: (1) It does not require the number of clusters to be set a priori. (2) The algorithm executes several self-organizing processes in parallel. Hence, when dealing with large datasets, the processes can be distributed, reducing the computational cost. (3) Only three parameters are necessary to set up the algorithm.
In the case of the Tree-structured SOM Component Planes, the novelty of the algorithm lies in its ability to create a structure that allows visual exploratory data analysis of large high-dimensional datasets. The algorithm creates a hierarchical structure of Self-Organizing Map Component Planes, arranging the projections of similar variables in the same branches of the tree. Hence, similarities in the behavior of variables can be easily detected (e.g. local correlations, maximal and minimal values, and outliers).
Both the FGHSON and the Tree-structured SOM Component Planes were applied to several agroecological problems and proved very efficient in the exploratory analysis and clustering of spatio-temporal datasets.
In this thesis I also tested three soft competitive learning algorithms. Two of them are well-known unsupervised soft competitive algorithms, namely Self-Organizing Maps (SOMs) and Growing Hierarchical Self-Organizing Maps (GHSOMs); the third is our original contribution, the FGHSON. Although the algorithms presented here have been used in several areas, to my knowledge no previous work has applied these techniques to spatio-temporal geospatial data and compared their performance, as is done in this thesis.
I propose original methodologies to explore spatio-temporal geo-referenced datasets through time. Our approach uses time windows to capture temporal similarities and variations by means of the FGHSON clustering algorithm. The developed methodologies are used in two case studies. In the first, the objective was to find similar agroecozones through time; in the second, it was to find similar environmental patterns shifted in time.
Several results presented in this thesis have led to new contributions to agroecological knowledge, for instance in sugar cane and blackberry production.
Finally, in the framework of this thesis we developed several software tools: (1) a Matlab toolbox that implements the FGHSON algorithm, and (2) a program called BIS (Bio-inspired Identification of Similar agroecozones), an interactive graphical user interface tool that integrates the FGHSON algorithm with Google Earth in order to show zones with similar agroecological characteristics.
Abstract:
Fuzzy set theory and fuzzy logic are studied from a mathematical point of view. The main goal is to investigate common mathematical structures in various fuzzy logical inference systems and to establish a general mathematical basis for fuzzy logic when considered as multi-valued logic. The study is composed of six distinct publications. The first paper deals with Mattila's LPC+Ch Calculus. This fuzzy inference system is an attempt to introduce linguistic objects to mathematical logic without defining these objects mathematically. LPC+Ch Calculus is analyzed from an algebraic point of view, and it is demonstrated that a suitable factorization of the set of well-formed formulae (in fact, the Lindenbaum algebra) leads to a structure called ET-algebra, which is introduced at the beginning of the paper. On this basis, all the theorems presented by Mattila, and many others, can be proved in a simple way, as demonstrated in Lemmas 1 and 2 and Propositions 1-3. The conclusion critically discusses some other issues of LPC+Ch Calculus, especially that no formal semantics is given for it. In the second paper, the characterization of solvability of the relational equation RoX=T, where R, X, T are fuzzy relations, X is the unknown one, and o is the minimum-induced composition by Sanchez, is extended to compositions induced by more general products in the general value lattice. Moreover, the procedure also applies to systems of equations. In the third publication, common features in various fuzzy logical systems are investigated. It turns out that adjoint couples and residuated lattices are very often present, though not always explicitly expressed. Some minor new results are also proved. The fourth study concerns Novak's paper, in which Novak introduced first-order fuzzy logic and proved, among other things, the semantico-syntactical completeness of this logic. He also demonstrated that the algebra of his logic is a generalized residuated lattice. It is proved that the examination of Novak's logic can be reduced to the examination of locally finite MV-algebras. In the fifth paper, a multi-valued sentential logic with values of truth in an injective MV-algebra is introduced, and the axiomatizability of this logic is proved. The paper develops some ideas of Goguen and generalizes the results of Pavelka on the unit interval. Our proof of completeness is purely algebraic. A corollary of the Completeness Theorem is that fuzzy logic on the unit interval is semantically complete if, and only if, the algebra of the values of truth is a complete MV-algebra. The Compactness Theorem holds in our well-defined fuzzy sentential logic, while the Deduction Theorem and the Finiteness Theorem do not. Because of its generality and good behaviour, MV-valued logic can be regarded as a mathematical basis of fuzzy reasoning. The last paper is a continuation of the fifth study. The semantics and syntax of fuzzy predicate logic with values of truth in an injective MV-algebra are introduced, and a list of universally valid sentences is established. The system is proved to be semantically complete. This proof is based on an idea utilizing some elementary properties of injective MV-algebras and MV-homomorphisms, and is purely algebraic.
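As an illustration of the relational equation RoX=T mentioned in this abstract, the following minimal sketch covers only the classical max-min case on the unit interval, not the more general products the paper extends it to. It relies on the standard result that, if the equation is solvable, its greatest solution is obtained with the Gödel implication, so solvability can be checked by composing R with that candidate. The function names are hypothetical and chosen for this example.

```python
def max_min_compose(R, X):
    """Max-min composition: (R o X)[i][k] = max_j min(R[i][j], X[j][k])."""
    return [
        [max(min(R[i][j], X[j][k]) for j in range(len(X)))
         for k in range(len(X[0]))]
        for i in range(len(R))
    ]


def godel_implication(a, b):
    """Residuum of min on [0, 1]: a -> b is 1 if a <= b, otherwise b."""
    return 1.0 if a <= b else b


def greatest_candidate(R, T):
    """Candidate X_hat[j][k] = min_i (R[i][j] -> T[i][k])."""
    return [
        [min(godel_implication(R[i][j], T[i][k]) for i in range(len(R)))
         for k in range(len(T[0]))]
        for j in range(len(R[0]))
    ]


def solve(R, T):
    """Return (solvable, X_hat); the equation is solvable iff R o X_hat == T."""
    X_hat = greatest_candidate(R, T)
    return max_min_compose(R, X_hat) == T, X_hat


# Hypothetical data: T is built from a known X, so a solution must exist.
R = [[0.5, 0.8], [0.3, 0.6]]
T = max_min_compose(R, [[0.4], [0.7]])
solvable, X_hat = solve(R, T)
```

Because min and max only select existing values, the equality test involves no rounding error. A right-hand side that exceeds every entry of R, such as R = [[0.2]] and T = [[0.5]], is reported as unsolvable.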
Abstract:
Many classification systems rely on clustering techniques in which a collection of training examples is provided as input, and a number of clusters c1, ..., cm modelling some concept C results as output, such that every cluster ci is labelled as positive or negative. Given a new, unlabelled instance enew, the above classification is used to determine to which particular cluster ci this new instance belongs. In such a setting clusters can overlap, and a new unlabelled instance can be assigned to more than one cluster with conflicting labels. In the literature, such a case is usually solved non-deterministically by making a random choice. This paper presents a novel, hybrid approach to solve this situation by combining a neural network for classification with a defeasible argumentation framework which models preference criteria for performing clustering.
Abstract:
Zonal management in vineyards requires the prior delineation of stable yield zones within the parcel. Among the different methodologies used for zone delineation, cluster analysis of yield data from several years is one of the possibilities cited in the scientific literature. However, there are reasonable doubts concerning the cluster algorithm to be used and the number of zones that have to be delineated within a field. In this paper two different cluster algorithms (k-means and fuzzy c-means) have been compared using the grape yield data corresponding to three successive years (2002, 2003 and 2004) for a ‘Pinot Noir’ vineyard parcel. The final choice of the most recommendable algorithm has been linked to obtaining a stable pattern of spatial yield distribution and to allowing for the delineation of compact and average-sized areas. The general recommendation is to use reclassified maps of two clusters or yield classes (low yield zone and high yield zone) and, consequently, site-specific vineyard management should be based on the prior delineation of just two different zones or sub-parcels. The two tested algorithms are good options for this purpose. However, the fuzzy c-means algorithm allows for a better zoning of the parcel, forming more compact areas with more balanced zonal differences over time.
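To make the fuzzy c-means step concrete, here is a minimal sketch of the textbook algorithm (alternating membership and center updates with fuzzifier m), applied to hypothetical three-year yield vectors. The data values, the deterministic initialization and the function names are assumptions for illustration and are not taken from the paper.

```python
import math


def memberships(point, centers, m):
    """Membership of one point in each cluster (each row sums to 1)."""
    d = [math.dist(point, v) for v in centers]
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:  # the point coincides with one or more centers
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    e = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[j]) ** e for j in range(len(d)))
            for i in range(len(d))]


def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-6):
    """Textbook FCM; returns (centers, membership matrix U)."""
    n, dim = len(points), len(points[0])
    # Assumption: initial centers are picked deterministically from the
    # input order (first ... last point) instead of at random.
    centers = [list(points[(i * (n - 1)) // max(c - 1, 1)]) for i in range(c)]
    U = []
    for _ in range(iters):
        U = [memberships(p, centers, m) for p in points]
        new_centers = []
        for i in range(c):
            w = [U[k][i] ** m for k in range(n)]
            total = sum(w)
            new_centers.append(
                [sum(w[k] * points[k][d] for k in range(n)) / total
                 for d in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, U


# Hypothetical yields (one tuple per grid cell, one value per year).
cells = [(4.0, 5.0, 4.5), (4.2, 4.8, 4.4), (3.9, 5.1, 4.6),
         (9.0, 10.0, 9.5), (9.2, 9.8, 9.6), (8.9, 10.1, 9.4)]
centers, U = fuzzy_c_means(cells, c=2)
zones = [max(range(2), key=lambda i: row[i]) for row in U]  # hardened map
```

Taking the cluster with the highest membership for each cell, as in the last line, is one simple way to turn the fuzzy partition into a two-class zone map.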
Abstract:
In view of the importance of anticipating the occurrence of critical situations in medicine, we propose the use of a fuzzy expert system to predict the need for advanced neonatal resuscitation efforts in the delivery room. This system relates the maternal medical, obstetric and neonatal characteristics to the clinical conditions of the newborn, providing a risk measurement of need of advanced neonatal resuscitation measures. It is structured as a fuzzy composition developed on the basis of the subjective perception of danger of nine neonatologists facing 61 antenatal and intrapartum clinical situations which provide a degree of association with the risk of occurrence of perinatal asphyxia. The resulting relational matrix describes the association between clinical factors and risk of perinatal asphyxia. Analyzing the inputs of the presence or absence of all 61 clinical factors, the system returns the rate of risk of perinatal asphyxia as output. A prospectively collected series of 304 cases of perinatal care was analyzed to ascertain system performance. The fuzzy expert system presented a sensitivity of 76.5% and specificity of 94.8% in the identification of the need for advanced neonatal resuscitation measures, considering a cut-off value of 5 on a scale ranging from 0 to 10. The area under the receiver operating characteristic curve was 0.93. The identification of risk situations plays an important role in the planning of health care. These preliminary results encourage us to develop further studies and to refine this model, which is intended to implement an auxiliary system able to help health care staff to make decisions in perinatal care.
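The reported sensitivity and specificity follow from thresholding the risk score at the cut-off. The sketch below shows that calculation on made-up scores; it assumes a score at or above the cut-off counts as a positive prediction, which the abstract does not specify, and the data and function name are hypothetical.

```python
def sensitivity_specificity(scores, needed, cutoff=5.0):
    """Sensitivity and specificity of 'score >= cutoff' as a predictor."""
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, needed):
        predicted = score >= cutoff  # assumption: cut-off is inclusive
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Made-up risk scores on the 0-10 scale and made-up outcomes.
scores = [8, 6, 3, 2, 1, 7, 4, 0]
needed = [True, True, True, False, False, False, False, False]
sens, spec = sensitivity_specificity(scores, needed)
```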
Abstract:
Lattice valued fuzziness is more general than crispness or fuzziness based on the unit interval. In this work, we present a query language for a lattice based fuzzy database. We define a Lattice Fuzzy Structured Query Language (LFSQL) taking its membership values from an arbitrary lattice L. LFSQL can handle, manage and represent crisp values, linear ordered membership degrees and also allows membership degrees from lattices with non-comparable values. This gives richer membership degrees, and hence makes LFSQL more flexible than FSQL or SQL. In order to handle vagueness or imprecise information, every entry into an L-fuzzy database is an L-fuzzy set instead of crisp values. All of this makes LFSQL an ideal query language to handle imprecise data where some factors are non-comparable. After defining the syntax of the language formally, we provide its semantics using L-fuzzy sets and relations. The semantics can be used in future work to investigate concepts such as functional dependencies. Last but not least, we present a parser for LFSQL implemented in Haskell.
Abstract:
Naively viewed, the process of evolution is a succession of duplication events and gradual mutations in the genome that lead to changes in the functions and interactions of the proteome. The family of Ras-like guanosine triphosphate hydrolases (GTPases) is a good working model for understanding this fundamental phenomenon, because this protein family contains a limited number of members that differ in functionality and interactions. Overall, we wish to understand how single mutations in GTPases affect cell morphology, as well as their degree of impact on asynchronous populations. My master's work aims to classify, in a meaningful way, different phenotypes of the yeast Saccharomyces cerevisiae through the analysis of several morphological criteria of strains expressing mutated and native GTPases. Our approach, based on microscopy and bioinformatic analysis of DIC (differential interference contrast microscopy) images, makes it possible to distinguish the phenotypes of native cells from those of mutants. This method enabled the automated detection and characterization of the mutant phenotypes associated with the over-expression of constitutively active GTPases. The constitutively active GTPase mutants Cdc42 Q61L, Rho5 Q91H, Ras1 Q68L and Rsr1 G12V were analyzed successfully. Indeed, implementing different clustering algorithms makes it possible to analyze data that combine morphological measurements of native and mutant populations. Our results show that the Fuzzy C-Means algorithm partitions native and mutant cells effectively, with the different cell types classified according to several cell shape factors obtained from the DIC images. This analysis shows that the mutations Cdc42 Q61L, Rho5 Q91H, Ras1 Q68L and Rsr1 G12V induce, respectively, amorphous, elongated, round and large phenotypes, which are represented by distinct shape-factor vectors. These distinctions are observed in different proportions (mutant morphology / native morphology) within the mutant populations. The development of new automated methods for the morphological analysis of native and mutant cells proves extremely useful for studying the GTPase family and the specific residues that dictate their functions and interaction network. We can now consider producing GTPase mutants whose function is inverted by targeting divergent residues. The functional substitution is then detected at the morphological level using our new quantitative strategy. This type of analysis can also be transposed to other protein families and contribute significantly to the field of evolutionary biology.
Abstract:
We present some additions to a fuzzy variable radius niche technique called Dynamic Niche Clustering (DNC) (Gan and Warwick, 1999; 2000; 2001) that enable the identification and creation of niches of arbitrary shape through a mechanism called Niche Linkage. We show that by using this mechanism it is possible to attain better feature extraction from the underlying population.
Abstract:
Market risk exposure plays a key role in financial institutions' risk management. One possible measure of this exposure is to evaluate the losses likely to be incurred when the prices of the portfolio's assets decline, using Value-at-Risk (VaR) estimates, one of the most prominent measures of financial downside market risk. This paper suggests an evolving possibilistic fuzzy modeling approach (ePFM) for VaR estimation. The approach is based on an extension of possibilistic fuzzy c-means clustering and functional fuzzy rule-based modeling, which employs memberships and typicalities to update clusters and creates new clusters based on a statistical control distance-based criterion. ePFM also uses a utility measure to evaluate the quality of the current cluster structure. Computational experiments consider data of the main global equity market indexes of United States, London, Germany, Spain and Brazil from January 2000 to December 2012 for VaR estimation using ePFM, traditional VaR benchmarks such as Historical Simulation, GARCH, EWMA and Extreme Value Theory, and state-of-the-art evolving approaches. The results show that ePFM is a potential candidate for VaR modeling, with better performance than alternative approaches.
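For readers unfamiliar with the benchmarks, Historical Simulation is the simplest of them: VaR is read off as an empirical quantile of past losses. The sketch below shows that benchmark only, not ePFM, using one of several common quantile conventions; the return series and function name are hypothetical.

```python
import math


def historical_var(returns, confidence=0.95):
    """Historical-simulation VaR: empirical loss quantile, as a positive loss.

    Uses the convention 'smallest loss whose empirical CDF reaches the
    confidence level'; other interpolation rules are also in use.
    """
    losses = sorted(-r for r in returns)
    index = min(math.ceil(confidence * len(losses)) - 1, len(losses) - 1)
    return losses[index]


# Hypothetical daily returns of an index.
returns = [-0.05, -0.03, -0.02, -0.01, 0.0, 0.01, 0.01, 0.02, 0.02, 0.03]
var_85 = historical_var(returns, confidence=0.85)
```

With ten observations and an 85% confidence level, the estimate is the second-largest loss in the sample.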
Abstract:
Clustering data is a very important task in data mining, image processing and pattern recognition problems. One of the most popular clustering algorithms is Fuzzy C-Means (FCM). This thesis proposes a new way of calculating the cluster centers in the FCM procedure, called ckMeans, and applies it to some variants of FCM, in particular those that use other distances. The goal of this change is to reduce the number of iterations and the processing time of these algorithms without affecting the quality of the partition, or even to improve the number of correct classifications in some cases. We also developed an algorithm based on ckMeans to handle interval data, considering interval membership degrees. This algorithm allows data to be represented without converting interval data into point values, as happens in other extensions of FCM that deal with interval data. To validate the proposed methodologies, we compared the clusterings produced by the ckMeans, K-Means and FCM algorithms (since the center calculation proposed here is similar to that of K-Means), considering three different distances and several known databases. For interval data, the results of Interval ckMeans were compared with the results of other clustering algorithms applied to an interval database containing the minimum and maximum monthly temperatures for a given year for 37 cities distributed across continents.
Abstract:
Image segmentation is the process of labeling pixels as belonging to different objects, an important step in many image processing systems. This work proposes a clustering method for the segmentation of color digital images with textural features. This is done by reducing the dimensionality of the histograms of color images and using the Skew Divergence to calculate the fuzzy affinity functions. This approach is appropriate for segmenting images that have colorful textural features, such as geological, dermoscopic and other natural images, for example images containing mountains, grass or forests. Furthermore, experimental results of colored texture clustering using images of aquifers' sedimentary porous rocks are presented and analyzed in terms of precision to verify its effectiveness.
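The skew divergence is a smoothed variant of the Kullback-Leibler divergence that stays finite when one histogram has empty bins. The sketch below uses one common formulation, KL(p || alpha*q + (1 - alpha)*p); the argument order, the value of alpha and the way the paper turns the divergence into a fuzzy affinity are not given in the abstract, so all of these are assumptions, as are the histograms.

```python
import math


def normalize(hist):
    """Scale non-negative bin counts so that they sum to 1."""
    total = sum(hist)
    return [h / total for h in hist]


def skew_divergence(p, q, alpha=0.99):
    """KL(p || alpha*q + (1 - alpha)*p); finite even if q has empty bins."""
    mix = [alpha * qi + (1.0 - alpha) * pi for pi, qi in zip(p, q)]
    return sum(pi * math.log(pi / mi) for pi, mi in zip(p, mix) if pi > 0.0)


# Hypothetical 3-bin color histograms of two image regions.
p = normalize([5, 5, 0])
q = normalize([0, 5, 5])
d_pq = skew_divergence(p, q)
```

Mixing a small share of p into q guarantees that every bin with mass in p also has mass in the mixture, which is what keeps the logarithm defined.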
Abstract:
A methodology for pipeline leakage detection using a combination of clustering and classification tools for fault detection is presented here. A fuzzy system is used to classify the running mode and identify the operational and process transients. The relationship between these transients and the mass balance deviation is discussed. This strategy allows for better identification of the leakage because the thresholds are adjusted by the fuzzy system as a function of the running mode and the classified transient level. The fuzzy system is initially trained off-line with a modified data set including simulated leakages. The methodology is applied to a small-scale LPG pipeline monitoring case where portability, robustness and reliability are amongst the most important criteria for the detection system. The results are very encouraging, with relatively low levels of false alarms and increased leakage detection at low computational cost. (c) 2005 Elsevier B.V. All rights reserved.
Abstract:
Advances in wireless communication and microelectronics enable the development of micro-sensor devices capable of monitoring large regions. Made up of thousands of sensor nodes working collaboratively, Wireless Sensor Networks face severe energy constraints because of the limited battery capacity of the nodes that make up the network. Energy consumption can be minimized by allowing only a few special nodes, called Cluster Heads, to be responsible for receiving data from the nodes in their cluster and forwarding these data to a collection point called the Base Station. The choice of the ideal Cluster Head helps extend the network's stability period, maximizing its useful lifetime. The proposal presented in this dissertation uses Fuzzy Logic and the k-means algorithm, based on information centralized at the Base Station, to elect the ideal Cluster Head in heterogeneous Wireless Sensor Networks. The criteria used to select the Cluster Head are node centrality, energy level and proximity to the Base Station. This dissertation presents the drawbacks of using local information to elect the cluster leader and the importance of treating nodes differently according to the energy discrepancies among them. The proposal is compared with the Low Energy Adaptative Clustering Hierarchy (LEACH) and Distributed energy-efficient clustering algorithm for heterogeneous Wireless sensor networks (DEEC) algorithms. The comparison uses the end of the stability period as well as the useful lifetime of the network.
Abstract:
Salamanca has been considered among the most polluted cities in Mexico. The vehicle fleet, industry and the emissions produced by agriculture, together with orographic and climatic characteristics, have led to an increase in the concentration of particulate matter less than 10 μm in diameter (PM10). In this work, a Multilayer Perceptron Neural Network has been used to predict the pollutant concentration one hour ahead. The database used to train the Neural Network consists of historical time series of meteorological variables (wind speed, wind direction, temperature and relative humidity) and PM10 concentrations. Before the prediction, the Fuzzy c-Means clustering algorithm has been applied in order to find relationships among the pollutant and the meteorological variables. These relationships provide additional information that is used for the prediction. Our experiments with the proposed system show the importance of this set of meteorological variables for the prediction of PM10 concentrations, and the efficiency of the neural network. Performance is estimated using the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE). The results show that the information obtained in the clustering step allows a prediction one hour ahead, using data from the past 2 hours.
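The two error measures named in this abstract have standard definitions, sketched below on made-up observed and predicted PM10 values. The numbers and function names are illustrative only and are not results from the paper.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean Absolute Error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


# Made-up hourly PM10 concentrations and one-hour-ahead predictions.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
errors = (rmse(observed, predicted), mae(observed, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two are often reported together.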
Abstract:
A new method for detecting microcalcifications in regions of interest (ROIs) extracted from digitized mammograms is proposed. The top-hat transform is a technique based on mathematical morphology operations and, in this paper, is used to perform contrast enhancement of the microcalcifications. To improve microcalcification detection, a novel image sub-segmentation approach based on the possibilistic fuzzy c-means algorithm is used. From the original ROIs, window-based features, such as the mean and standard deviation, were extracted; these features were used as an input vector in a classifier. The classifier is based on an artificial neural network to identify patterns belonging to microcalcifications and healthy tissue. Our results show that the proposed method is a good alternative for automatically detecting microcalcifications, a stage that is an important part of early breast cancer detection.
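The white top-hat transform subtracts the morphological opening from the image, so small bright details that do not fit the structuring element are kept while the smooth background is removed. The following minimal sketch assumes a flat square structuring element and windows truncated at the image border; the paper's actual structuring element and parameters are not stated in the abstract, and the test image is synthetic.

```python
def _window_filter(img, size, reduce_fn):
    """Apply min or max over a size x size window, truncated at the borders."""
    h, w, r = len(img), len(img[0]), size // 2
    return [
        [reduce_fn(img[y][x]
                   for y in range(max(0, i - r), min(h, i + r + 1))
                   for x in range(max(0, j - r), min(w, j + r + 1)))
         for j in range(w)]
        for i in range(h)
    ]


def white_tophat(img, size=3):
    """Image minus its opening (erosion followed by dilation)."""
    eroded = _window_filter(img, size, min)
    opened = _window_filter(eroded, size, max)
    return [[img[i][j] - opened[i][j] for j in range(len(img[0]))]
            for i in range(len(img))]


# Synthetic ROI: flat background with one bright pixel standing in for
# a small, high-contrast detail.
roi = [[10] * 5 for _ in range(5)]
roi[2][2] = 50
enhanced = white_tophat(roi, size=3)
```

On this image the background is mapped to zero and only the bright pixel survives, which is the contrast-enhancement effect the abstract refers to.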