818 results for Clustering algorithm
Abstract:
Wireless sensor networks (WSNs) are one of the fastest growing sectors in wireless networking. The rapid adoption of these networks as a solution for many new applications has increased traffic in the radio spectrum. Because WSNs operate in the licence-free industrial, scientific, and medical (ISM) bands, these frequencies have become saturated, and within a few years this saturation will prevent proper operation. The cognitive radio (CR) paradigm has emerged as a solution to this problem. Introducing cognitive capabilities into WSNs allows these networks to be used in applications with stricter requirements on reliability, coverage, or quality of service. Networks that combine all of these features are called cognitive wireless sensor networks (CWSNs). The improved performance of CWSNs enables their use in critical applications where they could not be used before, such as structural monitoring, medical care, military scenarios, or surveillance systems. Nevertheless, these applications also require features that cognitive radio does not provide directly, such as security. Security in CWSNs has received little attention because it is not essential for their basic operation, unlike spectrum sensing or collaboration. However, its study and improvement are essential for the growth of CWSNs. Therefore, the main objective of this thesis is to study the impact of selected cognitive radio attacks on CWSNs and to implement countermeasures using the new cognitive capabilities, especially in the physical layer, while taking into account the limitations of WSNs. Within this thesis, security strategies against two attacks of particular importance in cognitive networks have been developed: the primary user emulation (PUE) attack and the eavesdropping attack. To mitigate the PUE attack, a countermeasure based on anomaly detection has been developed, with two different detection algorithms implemented: the cumulative sum (CUSUM) algorithm and a data clustering algorithm. After verifying their validity, the two algorithms were compared and the factors affecting their performance were investigated. To counter the eavesdropping attack, a countermeasure based on the injection of artificial noise has been developed, so that an attacker cannot distinguish information-bearing signals from noise while the legitimate communication remains unaffected. The impact of this countermeasure on network resources has also been studied. As a parallel result, a test framework for CWSNs has been developed, consisting of a simulator and a real network of cognitive nodes. These tools have been essential for the implementation and extraction of the results presented in this thesis.
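The cumulative sum (CUSUM) detector named in the abstract is a standard change-detection technique. The sketch below is a minimal, generic one-sided CUSUM over received-signal-strength samples; the signal levels, drift, and threshold are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def cusum_detect(samples, target_mean, drift=0.5, threshold=5.0):
    """One-sided CUSUM: flags the first index where the cumulative
    positive deviation from target_mean exceeds threshold."""
    s = 0.0
    for i, x in enumerate(samples):
        s = max(0.0, s + (x - target_mean - drift))
        if s > threshold:
            return i  # change detected at sample i
    return None       # no change detected

# Toy example: RSSI-like readings with a sudden level shift
rng = np.random.default_rng(0)
normal = rng.normal(loc=-70.0, scale=1.0, size=200)  # expected primary-user level (dBm, assumed)
attack = rng.normal(loc=-60.0, scale=1.0, size=50)   # anomalously strong "primary" signal
alarm = cusum_detect(np.concatenate([normal, attack]),
                     target_mean=-70.0, drift=1.0, threshold=8.0)
print("CUSUM alarm at sample:", alarm)
```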
Abstract:
One of the main challenges of fuzzy community detection problems is to be able to measure the quality of a fuzzy partition. In this paper, we present an alternative way of measuring the quality of a fuzzy community detection output based on n-dimensional grouping and overlap functions. Moreover, the proposed modularity measure generalizes the classical Girvan–Newman (GN) modularity for crisp community detection problems and also for crisp overlapping community detection problems. Therefore, it can be used to compare partitions of different natures (i.e., those composed of classical, overlapping, and fuzzy communities). Particularly, as is usually done with the GN modularity, the proposed measure may be used to identify the optimal number of communities to be obtained by any network clustering algorithm in a given network. We illustrate this usage by adapting a well-known fuzzy community detection algorithm in this way, extending it to also handle overlapping community detection problems and to produce a ranking of the overlapping nodes. Some computational experiments show the feasibility of the proposed approach to modularity measures through n-dimensional overlap and grouping functions.
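As a rough illustration of the kind of measure discussed, the sketch below generalizes GN modularity to fuzzy memberships by aggregating the memberships of each node pair with a pluggable two-argument function (the product here). The paper's actual measure is built on n-dimensional grouping and overlap functions, so this is only a simplified, assumption-based analogue.

```python
import numpy as np

def fuzzy_modularity(A, U, pair_fn=lambda a, b: a * b):
    """Fuzzy generalization of Girvan-Newman modularity.

    A       : (n, n) symmetric adjacency matrix
    U       : (n, c) membership matrix, U[i, k] = membership of node i in community k
    pair_fn : aggregates the memberships of two nodes in the same community
              (product here; np.minimum would be another overlap-like choice)
    """
    k = A.sum(axis=1)                  # node degrees
    two_m = A.sum()                    # 2m for an undirected graph
    B = A - np.outer(k, k) / two_m     # modularity matrix
    q = 0.0
    for c in range(U.shape[1]):
        W = pair_fn(U[:, c][:, None], U[:, c][None, :])
        q += (B * W).sum()
    return q / two_m

# Crisp partition as a special case: 0/1 memberships recover the classical GN modularity
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
U_crisp = np.array([[1, 0], [1, 0], [1, 0], [0, 1]], dtype=float)
print(fuzzy_modularity(A, U_crisp))
```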
Abstract:
In this paper, we propose a novel method for the unsupervised clustering of graphs in the context of the constellation approach to object recognition. This method is an EM central clustering algorithm that builds prototypical graphs on the basis of fast matching with graph transformations. Our experiments, both with random graphs and in realistic situations (visual localization), show that our prototypes improve on the set median graphs and also on the prototypes derived from our previous incremental method. We also discuss how the method scales with a growing number of images.
Abstract:
Thesis (Master's)--University of Washington, 2016-06
Abstract:
We describe a network module detection approach which combines a rapid and robust clustering algorithm with an objective measure of the coherence of the modules identified. The approach is applied to the network of genetic regulatory interactions surrounding the tumor suppressor gene p53. This algorithm identifies ten clusters in the p53 network, which are visually coherent and biologically plausible.
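The abstract does not name the clustering algorithm it combines with the coherence measure, so the following stand-in sketch uses networkx's greedy modularity communities on a small hypothetical interaction graph, with partition modularity as a simple objective measure of coherence.

```python
# Illustrative stand-in only: the paper's own rapid and robust clustering
# algorithm is not reproduced here. Gene names below are placeholders.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Hypothetical undirected regulatory-interaction graph
edges = [("p53", "MDM2"), ("p53", "p21"), ("p21", "CDK2"),
         ("MDM2", "MDM4"), ("p53", "BAX"), ("BAX", "BCL2"), ("BCL2", "BAD")]
G = nx.Graph(edges)

modules = greedy_modularity_communities(G)
print("modules:", [sorted(m) for m in modules])
# A simple objective measure of module coherence: the partition modularity
print("modularity:", modularity(G, modules))
```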
Abstract:
Collaborative recommendation is one of the most widely used recommendation approaches: it recommends items to a visitor by referring to the preferences of other users similar to the current one. User profiling over Web transaction data can capture informative knowledge about user tasks or interests. With the discovered usage pattern information, it becomes possible to recommend more relevant content to Web users, or to customize the Web presentation for visitors, via collaborative recommendation. In addition, it helps identify the underlying relationships among Web users, items, and latent tasks during Web mining. In this paper, we propose a Web recommendation framework based on user profiling. In this approach, we employ Probabilistic Latent Semantic Analysis (PLSA) to model co-occurrence activities and develop a modified k-means clustering algorithm to build user profiles as representatives of usage patterns. Moreover, a hidden task model is derived by characterizing the meaningful latent factor space. Given the discovered user profiles, we then choose the profile whose preferences are closest to those of the current user and make collaborative recommendations based on the page weights in the selected profile. Preliminary experimental results on real-world data sets show that the proposed approach can make recommendations accurately and efficiently.
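A minimal sketch of the profile-based recommendation step, assuming a toy user-by-page weight matrix: plain k-means stands in for the paper's modified k-means, and the PLSA co-occurrence modeling is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows: users, columns: pages; values: visit weights (hypothetical data)
sessions = np.array([
    [5, 4, 0, 0, 1],
    [4, 5, 1, 0, 0],
    [0, 1, 5, 4, 0],
    [0, 0, 4, 5, 1],
], dtype=float)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(sessions)
profiles = km.cluster_centers_          # each centroid is a usage-pattern profile

def recommend(active_user, profiles, top_k=2):
    """Pick the closest profile and rank pages the active user has not visited yet."""
    dists = np.linalg.norm(profiles - active_user, axis=1)
    best = profiles[np.argmin(dists)]
    unseen = np.where(active_user == 0)[0]
    return unseen[np.argsort(best[unseen])[::-1]][:top_k]

active = np.array([4, 0, 0, 0, 2], dtype=float)
print("recommended page indices:", recommend(active, profiles))
```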
Abstract:
Textured regions in images can be defined as those regions containing a signal which has some measure of randomness. This thesis is concerned with describing homogeneous texture in terms of a signal model and with developing a means of spatially separating regions of differing texture. A signal model is presented which is based on the assumption that a large class of textures can adequately be represented by their Fourier amplitude spectra only, with the phase spectra modelled by a random process. It is shown that, under mild restrictions, the above model leads to a stationary random process. Results indicate that this assumption is valid for those textures lacking significant local structure. A texture segmentation scheme is described which separates textured regions based on the assumption that each texture has a different distribution of signal energy within its amplitude spectrum. A set of bandpass quadrature filters is applied to the original signal and the envelope of the output of each filter is taken. The filters are designed to have maximum mutual energy concentration in both the spatial and spatial frequency domains, thus providing high spatial and class resolutions. The outputs of these filters are processed using a multi-resolution classifier which applies a clustering algorithm on the data at a low spatial resolution and then performs a boundary estimation operation in which processing is carried out over a range of spatial resolutions. Results demonstrate a high performance, in terms of the classification error, for a range of synthetic and natural textures.
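The filter-and-cluster pipeline can be illustrated with a small Gabor filter bank (each filter is a quadrature pair, so the response magnitude gives the envelope) followed by k-means on the per-pixel envelope features. The synthetic image, filter parameters, and single-resolution clustering here are assumptions; the thesis's multi-resolution classifier and boundary estimation are not reproduced.

```python
import numpy as np
from skimage.filters import gabor
from sklearn.cluster import KMeans

# Synthetic image: left half fine vertical stripes, right half coarse stripes
x = np.arange(128)
img = np.zeros((128, 128))
img[:, :64] = np.sin(2 * np.pi * 0.25 * x[:64])[None, :]
img[:, 64:] = np.sin(2 * np.pi * 0.08 * x[:64])[None, :]

features = []
for freq in (0.08, 0.25):
    for theta in (0.0, np.pi / 2):
        real, imag = gabor(img, frequency=freq, theta=theta)
        features.append(np.sqrt(real**2 + imag**2))   # envelope of the quadrature pair
X = np.stack(features, axis=-1).reshape(-1, len(features))

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
segmentation = labels.reshape(img.shape)
print("pixels per class:", np.bincount(labels))
```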
Abstract:
This research establishes new optimization methods for pattern recognition and classification of different white blood cells in actual patient data to enhance the diagnostic process. Beckman-Coulter Corporation supplied flow cytometry data from numerous patients, which are used as training sets to exploit the different physiological characteristics of the samples provided. Support Vector Machines (SVM) and Artificial Neural Networks (ANN) were used as promising pattern classification techniques to identify different white blood cell samples and to provide medical doctors with diagnostic references for a specific disease state, leukemia. The results show that when a neural network classifier is well configured and trained with cross-validation, it can perform better than support vector classifiers alone on this type of data. Furthermore, a new unsupervised learning algorithm, the Density-based Adaptive Window Clustering (DAWC) algorithm, was designed to process large volumes of data and find the locations of dense data clusters in real time. It reduces the computational load to approximately O(N) operations, making the algorithm faster and more attractive than current hierarchical algorithms.
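A minimal sketch of the kind of comparison described, using scikit-learn: an RBF-kernel SVM and a small neural network evaluated with cross-validation on synthetic stand-in data rather than the Beckman-Coulter flow cytometry sets.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy multi-class data standing in for flow cytometry features (assumed sizes)
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
ann = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                  random_state=0))

print("SVM cross-validated accuracy:", cross_val_score(svm, X, y, cv=5).mean())
print("ANN cross-validated accuracy:", cross_val_score(ann, X, y, cv=5).mean())
```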
Abstract:
Online Social Network (OSN) services provided by Internet companies bring people together to chat, share information, and enjoy information. Meanwhile, huge amounts of data are generated by those services (which can be regarded as social media) every day, every hour, even every minute and every second. Currently, researchers are interested in analyzing OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large scale of OSN data, it is difficult to analyze effectively. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components of social media data: users and user-generated contents. Specifically, it addresses three problems related to social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in social media to benefit other applications, e.g., marketing campaigns? The contribution of this dissertation is briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze users and user-generated contents from social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from social network contents. (4) It introduces three important dimensions of social influence and a dynamic influence model for identifying influential users.
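As an illustrative stand-in only: the dissertation describes a hierarchical co-clustering algorithm, whereas the sketch below uses scikit-learn's non-hierarchical SpectralCoclustering simply to show the idea of grouping users and content items simultaneously on a hypothetical user-by-term matrix.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# Rows: users, columns: terms appearing in their posts (hypothetical counts)
X = np.array([
    [8, 7, 0, 0, 1],
    [6, 9, 1, 0, 0],
    [0, 1, 7, 8, 0],
    [1, 0, 9, 6, 1],
])

model = SpectralCoclustering(n_clusters=2, random_state=0).fit(X)
print("user clusters:", model.row_labels_)
print("term clusters:", model.column_labels_)
```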
Abstract:
Skeletal muscle consists of muscle fiber types with different physiological and biochemical characteristics. Broadly, muscle fibers can be classified into type I and type II, which differ, among other features, in contraction speed and sensitivity to fatigue. These fibers coexist in skeletal muscles, and their relative proportions are modulated according to muscle function and the stimuli to which the muscle is subjected. To identify the proportions of fiber types in muscle composition, many studies use biopsy as the standard procedure. Since surface electromyography (sEMG) allows information to be extracted about the recruitment of different motor units, this study is based on the assumption that EMG can be used to identify different proportions of fiber types in a muscle. The goal of this study was to identify the characteristics of EMG signals that can distinguish, with greater precision, different proportions of fiber types. The combination of characteristics using appropriate mathematical models was also investigated. To achieve this objective, simulated signals were generated with different proportions of recruited motor units and with different signal-to-noise ratios. Thirteen time- and frequency-domain characteristics were extracted from the emulated signals. The results for each extracted feature were submitted to the k-means clustering algorithm to separate the different proportions of motor units recruited in the emulated signals. Mathematical techniques (confusion matrix and capability analysis) were applied to select the characteristics able to identify different proportions of muscle fiber types. As a result, the mean frequency and the median frequency were selected as able to distinguish, with greater precision, the proportions of different muscle fiber types. Subsequently, the features considered most capable were analyzed jointly through principal component analysis. Two principal components were found for the noise-free emulated signals and two for the noisy signals, and in both cases the first principal component was identified as being able to distinguish different proportions of muscle fiber types. The selected characteristics (median frequency, mean frequency, and the first principal components) were then used to analyze real sEMG signals, comparing sedentary people with physically active people who practice strength training (weight training). The results obtained with the different groups of volunteers show that the physically active people had higher values of mean frequency, median frequency, and principal components than the sedentary people. Moreover, these values decreased as the power level increased for both groups, although the decline was more pronounced for the physically active group. Based on these results, it is assumed that the volunteers in the physically active group have higher proportions of type II fibers than the sedentary people. Finally, we conclude that the selected characteristics were able to distinguish different proportions of muscle fiber types, both for the emulated signals and for the real signals. These characteristics can be used in several contexts, for example, to evaluate the progress of people with myopathies and neuromyopathies undergoing physiotherapy, and to analyze the development of athletes seeking to improve their muscle capacity for their sport.
In both cases, extracting these characteristics from surface electromyography signals provides feedback to the physiotherapist and the physical coach, who can analyze the increase in the proportion of a given fiber type, as desired in each case.
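The two selected features, mean frequency and median frequency, can be computed from a power spectral density estimate. The sketch below uses a synthetic surrogate signal and a Welch spectrum; the sampling rate and signal content are assumptions, not the study's recordings.

```python
import numpy as np
from scipy.signal import welch

fs = 2000.0                                    # sampling rate in Hz (assumed)
rng = np.random.default_rng(0)
t = np.arange(0, 2.0, 1.0 / fs)
# Two tones plus noise as a crude stand-in for an sEMG burst
emg = (np.sin(2 * np.pi * 80 * t) + 0.5 * np.sin(2 * np.pi * 150 * t)
       + 0.2 * rng.standard_normal(t.size))

f, pxx = welch(emg, fs=fs, nperseg=1024)

mean_freq = np.sum(f * pxx) / np.sum(pxx)      # spectral centroid
cum = np.cumsum(pxx)
median_freq = f[np.searchsorted(cum, cum[-1] / 2.0)]

print(f"mean frequency:   {mean_freq:.1f} Hz")
print(f"median frequency: {median_freq:.1f} Hz")
```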
Abstract:
This paper addresses the estimation of object boundaries from a set of 3D points. An extension of the constrained clustering algorithm developed by Abrantes and Marques in the context of edge linking is presented. The object surface is approximated using rectangular meshes and simplex nets. Centroid-based forces are used to attract the model nodes towards the data, using competitive learning methods. It is shown that competitive learning improves the model performance in the presence of concavities and allows close surfaces to be discriminated. The proposed model is evaluated using synthetic data and medical images (MRI and ultrasound images).
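A toy sketch of the winner-take-all attraction idea: each data point pulls only its nearest model node, so nodes specialize to nearby data. It uses a 2D contour for brevity rather than the paper's rectangular meshes and simplex nets, and it omits the internal regularization forces of the deformable model.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 300)
# Noisy data points sampled around a unit circle
data = np.c_[np.cos(theta), np.sin(theta)] + 0.02 * rng.standard_normal((300, 2))

# Model nodes initialized on a small inner circle
phi = np.linspace(0, 2 * np.pi, 20, endpoint=False)
nodes = 0.3 * np.c_[np.cos(phi), np.sin(phi)]

lr = 0.1
for _ in range(50):                       # competitive learning iterations
    for p in data:
        winner = np.argmin(np.linalg.norm(nodes - p, axis=1))
        nodes[winner] += lr * (p - nodes[winner])   # attract only the winning node

print("mean node radius after fitting:", np.linalg.norm(nodes, axis=1).mean())
```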
Abstract:
Spam emails (unsolicited or junk emails) impose extremely heavy annual costs in terms of time, storage space, and money on private users and companies. To fight the spam problem effectively, it is not enough to stop spam messages from being delivered to the user's inbox. It is necessary either to try to find and prosecute the spammers, who usually hide behind complex networks of infected devices, or to analyze spammer behavior in order to find appropriate defense strategies. However, such a task is difficult because of camouflage techniques, and it requires manual analysis of correlated spam messages to find the spammers. To facilitate such an analysis, which must be carried out on large amounts of unclassified emails, we propose a categorical clustering methodology, named CCTree, that divides a large volume of spam into campaigns based on their structural similarity. We show the effectiveness and efficiency of our proposed clustering algorithm through several experiments. Then, a self-learning approach is proposed to label spam campaigns according to the spammers' goal, for example phishing. The labeled spam campaigns are used to train a classifier, which can then be applied to classify new spam emails. In addition, the labeled campaigns, together with a set of four other ranking criteria, are ordered according to investigators' priorities. Finally, a semiring-based structure is proposed for the abstract representation of CCTree. The abstract scheme of CCTree, named CCTree term, is applied to formalize the parallelization of CCTree. Through a number of mathematical analyses and experimental results, we show the efficiency and effectiveness of the proposed framework.
Abstract:
This thesis focuses on the development of computational models and their applications to demand-side management within the scope of smart grids. The performance of the smart grid players is studied, and a domestic prosumer model is presented. The economic dispatch problem is addressed, considering production and consumption forecasts obtained from artificial neural networks. Existing demand response models are studied, and a computational tool based on the fuzzy subtractive clustering algorithm is developed. Energy consumption profiles and operating modes are analyzed, including a brief analysis of the introduction of the electric vehicle and of contingencies in the electrical network. Consumer energy management applications within the scope of the InovGrid pilot project are presented. Automation systems are developed for the acquisition, monitoring, control, and supervision of consumption data provided by smart meters, allowing consumer actions to be incorporated into the management of electrical energy consumption.
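Subtractive clustering selects cluster centers at points of high data density and then suppresses the density potential around each selected center. The sketch below is a simplified, single-threshold version with assumed radius parameters and toy data; it is not the thesis's actual tool.

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, rb=None, accept=0.15):
    """Return cluster centers chosen by a simplified subtractive clustering rule."""
    rb = rb if rb is not None else 1.5 * ra
    alpha, beta = 4.0 / ra**2, 4.0 / rb**2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    potential = np.exp(-alpha * d2).sum(axis=1)           # density potential per point
    first = potential.max()
    centers = []
    while potential.max() >= accept * first:
        c = int(np.argmax(potential))
        centers.append(X[c])
        # Suppress the potential around the newly selected center
        potential -= potential[c] * np.exp(-beta * d2[c])
        potential = np.clip(potential, 0.0, None)
    return np.array(centers)

# Toy example: two groups of normalized consumption profiles (hypothetical)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.2, 0.05, (30, 2)), rng.normal(0.8, 0.05, (30, 2))])
print(subtractive_clustering(X, ra=0.4))
```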
Abstract:
In many major cities, fixed route transit systems such as bus and rail serve millions of trips per day. These systems have riders gather at common locations (stations or stops) and board at common times (for example, according to a predetermined schedule or headway). By using common service locations and times, these modes can consolidate many trips that have similar origins and destinations or overlapping routes. However, the routes are not sensitive to changing travel patterns and have no way of identifying which trips are going unserved, or are poorly served, by the existing routes. On the opposite end of the spectrum, personal modes of transportation, such as a private vehicle or taxi, offer service to and from the exact origin and destination of a rider, at close to exactly the time they desire to travel. Despite the apparent increased convenience to users, the presence of a large number of small vehicles results in a disorganized, and potentially congested, road network during high demand periods. The focus of the research presented in this paper is to develop a system that combines the on-demand nature of a personal mode with the efficiency of shared modes. In this system, users submit their request for travel but are asked to make small compromises in their origin and destination location by walking to a nearby meeting point, as well as slightly modifying their time of travel, in order to accommodate other passengers. Because the origin and destination locations of the request can be adjusted, this is a more general case of the Dial-a-Ride problem with time windows. The solution methodology uses a graph clustering algorithm coupled with a greedy insertion technique. A case study is presented using actual requests for taxi trips in Washington DC, and shows a significant decrease in the number of vehicles required to serve the demand.
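A highly simplified sketch of the greedy insertion idea: each request's pickup and drop-off points are inserted into the vehicle route where the added distance is smallest. Time windows, walking to meeting points, and the graph-clustering stage described in the paper are omitted, and all locations are hypothetical.

```python
import itertools
import numpy as np

def route_length(route):
    return sum(np.linalg.norm(route[i + 1] - route[i]) for i in range(len(route) - 1))

def insert_request(routes, pickup, dropoff):
    """Try every (vehicle, pickup position, dropoff position) and keep the cheapest insertion."""
    best = None
    for v, route in enumerate(routes):
        for i, j in itertools.combinations(range(1, len(route) + 2), 2):
            trial = route[:i] + [pickup] + route[i:]
            trial = trial[:j] + [dropoff] + trial[j:]
            cost = route_length(trial) - route_length(route)
            if best is None or cost < best[0]:
                best = (cost, v, trial)
    routes[best[1]] = best[2]

depot = np.zeros(2)
routes = [[depot], [depot]]                     # two vehicles, each starting at a depot
rng = np.random.default_rng(0)
for _ in range(6):                              # six hypothetical travel requests
    insert_request(routes, rng.uniform(0, 10, 2), rng.uniform(0, 10, 2))
print([len(r) for r in routes], "stops per vehicle (including the depot)")
```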