947 results for spatial clustering algorithms
Abstract:
The main goal of this work is to investigate the suitability of applying cluster ensemble techniques (ensembles or committees) to gene expression data. More specifically, we will develop experiments with three different cluster ensemble methods, which have been used in many works in the literature: co-association matrix, relabeling and voting, and ensembles based on graph partitioning. The inputs for these methods will be the partitions generated by three clustering algorithms representing different paradigms: k-means, Expectation-Maximization (EM), and the hierarchical method with average linkage. These algorithms have been widely applied to gene expression data. In general, the results obtained with our experiments indicate that the cluster ensemble methods perform better than the individual techniques. This happens mainly for the heterogeneous ensembles, that is, ensembles built with base partitions generated by different clustering algorithms.
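As an illustration of the co-association idea named in this abstract, here is a minimal sketch in plain Python. It counts how often each pair of items shares a cluster across the base partitions. The consensus step (connected components above a threshold) is an assumption chosen for brevity, not necessarily the consensus function used in the work, and the function names and the toy partitions are hypothetical.

```python
def coassociation_matrix(partitions):
    """Fraction of base partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(coassoc, threshold=0.5):
    """Assumed consensus step: link items whose co-association exceeds
    the threshold and return the connected components as cluster labels."""
    n = len(coassoc)
    label = [-1] * n
    current = 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if label[j] == -1 and coassoc[i][j] > threshold:
                    label[j] = current
                    stack.append(j)
        current += 1
    return label


# Three hypothetical base partitions of five items (label values are arbitrary).
partitions = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
coassoc = coassociation_matrix(partitions)
consensus = consensus_clusters(coassoc)
```

Because the matrix only records whether two items share a cluster, the base partitions do not need matching label names, which is what makes heterogeneous ensembles straightforward to combine this way.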
Abstract:
Data clustering is applied in various fields such as data mining, image processing and pattern recognition. Clustering algorithms split a data set into clusters such that elements within the same cluster have a high degree of similarity, while elements belonging to different clusters have a high degree of dissimilarity. The Fuzzy C-Means algorithm (FCM) is one of the most used and discussed fuzzy clustering algorithms in the literature. The performance of FCM is strongly affected by the selection of the initial cluster centers. Therefore, the choice of a good set of initial cluster centers is very important for the performance of the algorithm. However, in FCM the initial centers are chosen randomly, making it difficult to find a good set. This paper proposes three new methods to obtain initial cluster centers deterministically for the FCM algorithm; they can also be used in variants of FCM. In this work these initialization methods were applied to the ckMeans variant. With the proposed methods, we intend to obtain a set of initial centers that are close to the real cluster centers. With these new initialization approaches we aim to reduce the number of iterations these algorithms need to converge, and their processing time, without affecting the quality of the clustering, or even improving it in some cases. Accordingly, cluster validation indices were used to measure the quality of the clusters obtained by the modified FCM and ckMeans algorithms with the proposed initialization methods when applied to various data sets.
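To make the role of the initial centers concrete, the following is a minimal sketch of the standard FCM iteration that takes the initial centers as an argument and reports how many iterations it needed. It does not reproduce the three initialization methods proposed in the paper, which the abstract does not describe. The function name `fcm`, the fuzzifier default `m=2.0`, the tolerance and the toy data are all assumptions.

```python
import math


def fcm(points, centers, m=2.0, tol=1e-6, max_iter=300):
    """Standard fuzzy c-means starting from the given initial centers.
    Returns (centers, memberships from the last update, iterations used)."""
    centers = [tuple(c) for c in centers]
    dims = len(points[0])
    exponent = 2.0 / (m - 1.0)
    for iteration in range(1, max_iter + 1):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        memberships = []
        for x in points:
            d = [math.dist(x, c) for c in centers]
            if any(v == 0.0 for v in d):
                # A point sitting exactly on a center belongs fully to it.
                row = [1.0 if v == 0.0 else 0.0 for v in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((d[i] / d[j]) ** exponent for j in range(len(d)))
                       for i in range(len(d))]
            memberships.append(row)
        # Center update: mean of the points weighted by u^m.
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * x[k] for w, x in zip(weights, points)) / total
                for k in range(dims)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships, iteration


# Hypothetical one-dimensional data with two well-separated groups.
points = [(0.0,), (0.2,), (0.1,), (5.0,), (5.1,), (4.9,)]
final_centers, memberships, iterations = fcm(points, [(0.0,), (5.0,)])
```

Calling `fcm` on the same data with different initial centers and comparing the returned iteration counts is one simple way to observe the sensitivity to initialization that the abstract discusses.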
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Graduate Program in Computer Science - IBILCE
Abstract:
Graduate Program in Tropical Diseases - FMB
Abstract:
Skin segmentation is a challenging task due to several influences such as unknown lighting conditions, skin-colored backgrounds, and camera limitations. Many skin segmentation approaches have been proposed in the past, including adaptive (in the sense of updating the skin color online) and non-adaptive approaches. In this paper, we compare three skin segmentation approaches that promise to work well for hand tracking, which is our main motivation for this work. Hand tracking can be used widely in VR/AR, e.g. for navigation and object manipulation. The first skin segmentation approach is a well-known non-adaptive approach. It is based on a simple, pre-computed skin color distribution. Methods two and three adaptively estimate the skin color in each frame using clustering algorithms. The second approach uses hierarchical clustering for a simultaneous image and color space segmentation, while the third approach is a pure color space clustering, but with a more sophisticated clustering method. For evaluation, we compared the segmentation results of the approaches against a ground truth dataset. To obtain the ground truth dataset, we labeled about 500 images captured under various conditions.
Abstract:
This paper presents fuzzy clustering algorithms to establish a grassroots ontology – a machine-generated weak ontology – based on folksonomies. Furthermore, it describes a search engine for vaguely associated terms and aggregates them into several meaningful cluster categories, based on the introduced weak grassroots ontology. A potential application of this ontology, weblog extraction, is illustrated using a simple example. Added value and possible future studies are discussed in the conclusion.
Abstract:
This paper uses folksonomies and fuzzy clustering algorithms to establish term-relevant related results. This paper will propose a Meta search engine with the ability to search for vaguely associated terms and aggregate them into several meaningful cluster categories. The potential of the fuzzy weblog extraction is illustrated using a simple example and added value and possible future studies are discussed in the conclusion.
Abstract:
Infections with Schmallenberg virus (SBV), a novel Orthobunyavirus transmitted by biting midges, can cause abortions and malformations of newborns and severe symptoms in adults of domestic and wild ruminants. Understanding the temporal and spatial distribution of the virus in a certain territory is important for the control and prevention of the disease. In this study, the seroprevalence of antibodies against SBV and the spatial spread of the virus were investigated in Swiss dairy cattle by applying a milk serology technique to bulk milk samples. The seroprevalence in cattle herds was significantly higher in December 2012 (99.5%) compared to July 2012 (19.7%). This high between-herd seroprevalence in cattle herds was observed shortly after the first detection of viral infections. Milk samples originating from farms with seropositive animals taken in December 2012 (n=209; mean 160%) revealed significantly higher S/P% ratios than samples collected in July 2012 (n=48; mean 103.6%). This finding suggests a high within-herd seroprevalence in infected herds, which makes testing of bulk tank milk samples a sensitive method for identifying farms with past exposure to SBV. It also suggests that within-herd transmission followed by seroconversion still occurred between July and December. In July 2012, positive bulk tank milk samples were mainly restricted to the western part of Switzerland, whereas in December 2012 all samples except one were positive. A spatial analysis revealed a separation of regions with and without positive farms in July 2012 and no spatial clustering within the regions with positive farms. In contrast to the spatial dispersion of bluetongue virus, a virus that is also transmitted by Culicoides midges, in 2008 in Switzerland, the spread of SBV occurred from the western to the eastern part of the country. The dispersed incursion of SBV took place in the western part of Switzerland and the virus spread rapidly to the remaining territory.
This spatial pattern is consistent with the hypothesis that transmission by Culicoides midges was the main way of spreading.
Abstract:
Facilitation is a major force shaping the structure and diversity of plant communities in terrestrial ecosystems. Detecting positive plant–plant interactions relies on the combination of field experimentation and the demonstration of spatial association between neighboring plants. This has often restricted the study of facilitation to particular sites, limiting the development of systematic assessments of facilitation over regional and global scales. Here we explore whether the frequency of plant spatial associations detected from high-resolution remotely sensed images can be used to infer plant facilitation at the community level in drylands around the globe. We correlated the information from remotely sensed images freely available through Google Earth with detailed field assessments, and used a simple individual-based model to generate patch-size distributions using different assumptions about the type and strength of plant–plant interactions. Most of the patterns found from the remotely sensed images were more right skewed than the patterns from the null model simulating a random distribution. This suggests that the plants in the studied drylands show stronger spatial clustering than expected by chance. We found that positive plant co-occurrence, as measured in the field, was significantly related to the skewness of vegetation patch-size distribution measured using Google Earth images. Our findings suggest that the relative frequency of facilitation may be inferred from spatial pattern signals measured from remotely sensed images, since facilitation often determines positive co-occurrence among neighboring plants. They pave the road for a systematic global assessment of the role of facilitation in terrestrial ecosystems. Read More: http://www.esajournals.org/doi/10.1890/14-2358.1
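The skewness comparison described in this abstract can be sketched as follows. This is a minimal illustration of the moment-based sample skewness, assuming patch sizes are available as a plain list of numbers. The estimator actually used in the study is not stated in the abstract, and the two patch-size lists below are invented for illustration only.

```python
def skewness(values):
    """Moment-based sample skewness: third central moment divided by
    the 1.5 power of the second central moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical patch sizes (e.g. in square metres): many small patches and a
# few large ones give a right-skewed distribution with positive skewness.
observed_patches = [1, 1, 1, 2, 2, 3, 10]
null_model_patches = [1, 2, 3]  # symmetric toy sample, skewness 0

observed_skew = skewness(observed_patches)
null_skew = skewness(null_model_patches)
```

A positive difference between `observed_skew` and `null_skew` corresponds to the "more right skewed than the null model" pattern that the abstract interprets as stronger spatial clustering than expected by chance.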
Abstract:
Purpose. To examine the association between living in proximity to Toxics Release Inventory (TRI) facilities and the incidence of childhood cancer in the State of Texas. Design. This is a secondary data analysis utilizing the publicly available Toxics Release Inventory (TRI), maintained by the U.S. Environmental Protection Agency, which lists the facilities that release any of the 650 TRI chemicals. Total childhood cancer cases and childhood cancer rates (age 0-14 years) by county for the years 1995-2003 were taken from the Texas Cancer Registry, available at the Texas Department of State Health Services website. Setting. This study was limited to the child population of the State of Texas. Method. Analysis was done using Stata version 9 and SPSS version 15.0. SaTScan was used for geographical spatial clustering of childhood cancer cases based on county centroids, using the Poisson clustering algorithm, which adjusts for population density. Pictorial maps were created using MapInfo Professional version 8.0. Results. One hundred and twenty-five counties had no TRI facilities in their region, while 129 counties had at least one TRI facility. An increasing trend for number of facilities and total disposal was observed except for the highest category based on cancer rate quartiles. Linear regression analysis using log transformation for number of facilities and total disposal in predicting cancer rates was computed; however, neither of these variables was found to be a significant predictor. Seven significant geographical spatial clusters of counties with high childhood cancer rates (p<0.05) were indicated. Binomial logistic regression, categorizing the cancer rate into two groups (<=150 and >150), indicated an odds ratio of 1.58 (CI 1.127, 2.222) for the natural log of number of facilities. Conclusion.
We have used a unique methodology, combining GIS and spatial clustering techniques with existing statistical approaches, in examining the association between living in proximity to TRI facilities and the incidence of childhood cancer in the State of Texas. Although a concrete association was not indicated, further studies examining specific TRI chemicals are required. Use of this information can enable researchers and the public to identify potential concerns, gain a better understanding of potential risks, and work with industry and government to reduce toxic chemical use, disposal or other releases and the risks associated with them. TRI data, in conjunction with other information, can be used as a starting point in evaluating exposures and risks.
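The Poisson-based scan used for the county clusters rests on a log-likelihood ratio that compares observed and expected case counts inside a candidate window. The sketch below computes that ratio for a single window in the form commonly given for the Poisson scan statistic. It is not the study's analysis: the function name and the counts are hypothetical, and the scanning over many windows and the Monte Carlo significance testing done by the software are not shown.

```python
import math


def poisson_llr(cases_in, pop_in, total_cases, total_pop):
    """Log-likelihood ratio for one candidate window under the Poisson model.
    Expected cases are proportional to the population inside the window,
    which is how the statistic adjusts for population density."""
    expected = total_cases * pop_in / total_pop
    if cases_in <= expected:
        return 0.0  # only windows with an excess of cases are of interest
    inside = cases_in * math.log(cases_in / expected)
    cases_out = total_cases - cases_in
    if cases_out > 0:
        outside = cases_out * math.log(cases_out / (total_cases - expected))
    else:
        outside = 0.0
    return inside + outside


# Hypothetical window: 30 of 100 cases fall in counties holding 10% of the population.
llr_excess = poisson_llr(cases_in=30, pop_in=1000, total_cases=100, total_pop=10000)
# Same window with exactly the expected number of cases.
llr_none = poisson_llr(cases_in=10, pop_in=1000, total_cases=100, total_pop=10000)
```

The window with the largest ratio is the most likely cluster; its p-value then comes from repeating the scan on data simulated under the null hypothesis.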
Abstract:
Abstract Air pollution is a major threat and a phenomenon that has a specific impact on human health. In addition, changes that occur in the chemical composition of the atmosphere can change the weather and cause acid rain or ozone destruction. Those are phenomena of global importance. The World Health Organization (WHO) considers air pollution one of the most important global priorities. Salamanca, Gto., Mexico has been ranked as one of the most polluted cities in the country. The industry of the area led to major economic development and rapid population growth in the second half of the twentieth century. The impact on air quality is significant, and considerable efforts have been made to measure the concentrations of pollutants. The main pollution sources are locally based plants in the chemical and power generation sectors. The pollutants registered as being of concern are Sulphur Dioxide (SO2) and particles on the order of ∼10 micrometers or less (PM10). Predicting the concentration of those pollutants can be a powerful tool for taking preventive measures such as reducing emissions and alerting the affected population. In this PhD thesis we propose a model to predict concentrations of the pollutants SO2 and PM10 for each monitoring booth in the Atmospheric Monitoring Network of Salamanca (REDMAS, for its Spanish acronym). The proposed models consider the use of meteorological variables as factors influencing the concentration of pollutants. The information used throughout this work is real data from REDMAS. In the proposed model, Artificial Neural Networks (ANN) combined with clustering algorithms are used. The type of ANN used is the Multilayer Perceptron with one hidden layer, using separate structures for the prediction of each pollutant. The meteorological variables used for prediction were: Wind Direction (WD), Wind Speed (WS), Temperature (T) and Relative Humidity (RH).
The clustering algorithms K-means and Fuzzy C-means are used to find relationships between the air pollutants and the weather variables under consideration, and these relationships are added as inputs to the ANN. They provide information to the ANN in order to obtain the prediction of the pollutants. The results of the model proposed in this work are compared with the results of a multivariate linear regression and a multilayer perceptron neural network. The prediction is evaluated with the mean absolute error, the root mean square error, the correlation coefficient and the index of agreement. The results show the importance of meteorological variables in the prediction of the concentration of the pollutants SO2 and PM10 in the city of Salamanca, Gto., Mexico. The results show that the proposed model performs better than the multivariate linear regression and the multilayer perceptron neural network. The models implemented for each monitoring booth have the ability to make predictions of air quality that can be used in a system of real-time forecasting and human health impact analysis. Among the main results of the development of this thesis we can cite: A model based on an artificial neural network combined with clustering algorithms is proposed for predicting, one hour ahead, the concentration of each pollutant (SO2 and PM10). A different model was designed for each pollutant and for each of the three monitoring booths of the REDMAS. A model to predict the average concentration of the pollutants SO2 and PM10 over the next 24 hours is proposed, based on an artificial neural network combined with clustering algorithms. A model was designed for each booth of the REDMAS and each pollutant separately. Summary Air pollution is an acute threat and a phenomenon that has a particular impact on human health.
Changes in the chemical composition of the atmosphere can alter the climate, produce acid rain or destroy ozone, all of them phenomena of great global importance. The World Health Organization (WHO) considers air pollution one of the most important global priorities. Salamanca, Gto., Mexico, has been ranked as one of the most polluted cities in the country. The industry in the area led to major economic development and rapid population growth in the second half of the twentieth century. The effects on air quality are serious, and significant efforts have been made to measure pollutant concentrations. The main pollution sources are stationary sources such as chemical and power generation plants. The pollutants registered as being of concern are Sulphur Dioxide (SO2) and Particles Smaller than 10 micrometers (PM10). Predicting the concentrations of these pollutants can be a powerful tool for taking preventive measures such as reducing emissions to the atmosphere and alerting the affected population. This doctoral thesis proposes a model to predict the concentration of the most critical pollutants, SO2 and PM10, for each monitoring booth of the Atmospheric Monitoring Network of Salamanca (REDMAS). The proposed models use meteorological variables as factors that influence pollutant concentration. The information used throughout this work corresponds to real data obtained from the REDMAS. The Proposed Model (MP, for its Spanish acronym) applies Artificial Neural Networks (ANN) combined with clustering algorithms. The ANN used is the Multilayer Perceptron with one hidden layer, with independent structures for the prediction of each pollutant.
The meteorological variables available for the prediction were: Wind Direction (WD), Wind Speed (WS), Temperature (T) and Relative Humidity (RH). The K-means and Fuzzy C-means clustering algorithms are used to find relationships between the air pollutants under study and the meteorological variables. These relationships provide information to the ANN for predicting the pollutants and are added as an input to the ANN. The results of the model proposed in this work are compared with the results of a Multivariate Linear Regression (MLR) and a Multilayer Perceptron (MLP). The prediction is evaluated with the Mean Absolute Error, the Root Mean Square Error, the correlation coefficient and the index of agreement. The results obtained show the importance of meteorological variables in predicting the concentration of the pollutants SO2 and PM10 in the city of Salamanca, Gto., Mexico. The results show that the MP predicts the concentration of the pollutants SO2 and PM10 better than the MLR and MLP models. The models implemented for each monitoring booth are able to make air quality predictions; these models can be implemented in a system that makes predictions in real time and analyzes the impact on the health of the population. Among the main results obtained in the development of this thesis we can cite: A model based on an artificial neural network combined with clustering algorithms is proposed for predicting the concentration of each pollutant (SO2 and PM10) one hour ahead. A different model was designed for each pollutant and for each of the three monitoring booths of the REDMAS.
A model is proposed to predict the average concentration of the pollutants SO2 and PM10 over the next 24 hours, based on an artificial neural network combined with clustering algorithms. A model was designed for each REDMAS monitoring booth and for each pollutant separately.
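The four evaluation measures named in this thesis abstract can be written down compactly. The sketch below is a minimal plain-Python version that assumes the index of agreement is Willmott's d, the usual meaning of that term in air-quality forecasting, although the abstract does not say which variant was used. The function names and the sample values are hypothetical.

```python
import math


def mean_absolute_error(observed, predicted):
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_square_error(observed, predicted):
    return math.sqrt(
        sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def correlation_coefficient(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (spread_o * spread_p)


def index_of_agreement(observed, predicted):
    """Willmott's d (assumed variant):
    1 - sum (P - O)^2 / sum (|P - mean(O)| + |O - mean(O)|)^2."""
    mean_o = sum(observed) / len(observed)
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
              for o, p in zip(observed, predicted))
    return 1.0 - num / den


# Hypothetical hourly concentrations: every prediction is one unit too high.
observed = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]
mae = mean_absolute_error(observed, predicted)
rmse = root_mean_square_error(observed, predicted)
r = correlation_coefficient(observed, predicted)
d = index_of_agreement(observed, predicted)
```

The toy data show why several measures are reported together: a constant bias leaves the correlation at 1 while the error measures and the index of agreement both register the offset.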
Abstract:
Salamanca, situated in the center of Mexico, is among the cities that suffer most from air pollution in the country. The vehicle fleet and the industry, as well as the orography and climatic characteristics, have contributed to the increase in the concentration of the pollutant Sulphur Dioxide (SO2). In this work, a Multilayer Perceptron Neural Network has been used to predict the pollutant concentration one hour ahead. The database used to train the Neural Network corresponds to historical time series of meteorological variables and air pollutant concentrations of SO2. Before the prediction, Fuzzy c-Means and K-means clustering algorithms have been applied in order to find relationships between the pollutant and the meteorological variables. Our experiments with the proposed system show the importance of this set of meteorological variables for the prediction of SO2 concentrations and the efficiency of the neural network. Performance is estimated using the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE). The results showed that the information obtained in the clustering step allows a prediction one hour ahead using data from the past 2 hours.
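Predicting one hour ahead from the past 2 hours amounts to turning an hourly series into (input window, target) pairs before training. A minimal sketch of that windowing step follows. The exact feature layout used in the paper is not given in the abstract, so the function name, the parameters `n_lags` and `horizon`, and the sample series are assumptions.

```python
def make_lagged_samples(series, n_lags=2, horizon=1):
    """Build (window, target) pairs: each window holds the last n_lags hourly
    values up to time t, and the target is the value at time t + horizon."""
    samples = []
    for t in range(n_lags - 1, len(series) - horizon):
        window = series[t - n_lags + 1:t + 1]
        target = series[t + horizon]
        samples.append((window, target))
    return samples


# Hypothetical hourly SO2 readings.
so2 = [10, 12, 15, 11, 9]
samples = make_lagged_samples(so2)
```

In a setup like the one described, the meteorological variables for the same two hours and the cluster information from the clustering step would be appended to each window before it is passed to the network.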
Abstract:
The cognitive wireless sensor network (CWSN) is a new paradigm that integrates cognitive features into traditional wireless sensor networks (WSNs) to mitigate important problems such as spectrum occupancy. Security in cognitive wireless sensor networks is an important problem since these kinds of networks manage critical applications and data. The specific constraints of WSNs make the problem even more critical, and effective solutions have not yet been implemented. The primary user emulation (PUE) attack is the most studied attack specific to the new cognitive features. This work discusses a new approach, based on anomaly behavior detection and collaboration, to detect the primary user emulation attack in CWSN scenarios. Two non-parametric algorithms suitable for low-resource networks like CWSNs have been used in this work: the cumulative sum and data clustering algorithms. They are compared on characteristics such as detection delay, learning time, scalability, resources, and scenario dependency. The algorithms have been tested using a cognitive simulator that provides important results in this area. Both algorithms have proven valid for detecting PUE attacks, reaching a detection rate of 99% with less than 1% of false positives when using collaboration.
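The cumulative sum detector mentioned in this abstract can be illustrated with the textbook one-sided CUSUM recursion. This is a generic sketch, not the paper's detector: the monitored quantity, the baseline, the slack and the threshold below are all hypothetical, and the collaboration between nodes is not modeled.

```python
def cusum_alarm(samples, baseline_mean, slack, threshold):
    """One-sided CUSUM: accumulate deviations above baseline_mean + slack and
    return the index of the first sample where the sum exceeds the threshold,
    or None if no alarm is raised."""
    s = 0.0
    for t, x in enumerate(samples):
        s = max(0.0, s + (x - baseline_mean - slack))
        if s > threshold:
            return t
    return None


# Hypothetical per-node measurements: normal behavior, then a sustained shift
# such as an emulated primary user might cause.
normal_then_attack = [0.0, 0.0, 0.0, 0.0, 3.0, 3.0, 3.0]
alarm_index = cusum_alarm(normal_then_attack, baseline_mean=0.0, slack=0.5, threshold=4.0)
no_alarm = cusum_alarm([0.0, 0.2, 0.1, 0.0], baseline_mean=0.0, slack=0.5, threshold=4.0)
```

The gap between the start of the shift (index 4) and the alarm is the detection delay, one of the comparison criteria listed in the abstract. Lowering the threshold shortens it at the cost of more false positives.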
Abstract:
The main objective of this thesis is the development of robust and reliable damage identification methods focused on experimental structural systems, primarily reinforced concrete structures externally strengthened with Fiber Reinforced Polymer (FRP) strips. The failure mode of this type of structural system is critical, since it is generally due to a sudden and brittle debonding of the FRP reinforcement strip originating from intermediate cracks caused by bending. Detecting this debonding in its initial stage is essential to prevent future failures, which can be catastrophic. Initially, a review of the Electro-Mechanical Impedance (EMI) method is carried out in order to present its capabilities for damage detection. Once the appropriate technology has been selected, which includes an impedance analyzer as well as novel PZT sensors for smart monitoring, an automatic procedure is designed based on the impedance records of several laboratory structures. Given that impedance measurements are possible thanks to the proper placement of a network of PZT sensors, the presence of damage is estimated by analyzing the results of different damage indicators obtained from the literature. To make this process automatic, so that no prior knowledge of the EMI method is needed to carry out an experiment, a Graphical User Interface has been designed and implemented, turning impedance measurement into an easy and intuitive process. Damage is then evaluated through the corresponding damage indices, attempting to estimate not only its severity but also its approximate location.
Carrying out these experiments on any structure generates large amounts of data that must be processed, and sometimes the damage indices are not sufficient for a complete evaluation of the integrity of a structure. In most cases damage patterns can be found in the data, but no a priori information about the state of the structure is available. At this point, substantial research has been done on pattern recognition techniques, particularly unsupervised learning, finding interesting applications in the field of medicine. From this a creative and innovative idea arises: to detect and track the evolution of damage in different structures as if it were a cancer spreading through the human body. In that sense, the impedance readings are used as intrinsic information about the health of the structure itself, so that the same techniques used in cancer research can be applied. In this case, a hierarchical clustering algorithm has been applied, since it also illustrates the classification of the data graphically, including qualitative and quantitative information about the damage. The effectiveness of this procedure has been investigated through three laboratory structures: an aluminium beam, a bolted aluminium joint and a concrete block strengthened with FRP. The first helps to show the effectiveness of the method in simple single and multiple damage scenarios, so that the conclusions drawn are applied to the other two, which are designed to simulate debonding conditions in different structures. Once the effectiveness of the hierarchical clustering of impedance readings has been demonstrated, the procedure is applied to the reinforced concrete structures strengthened with FRP strips that are the subject of this thesis, detecting and classifying each damage state.
Finally, as an alternative to the previous procedure, a method is proposed for the continuous monitoring of the FRP-concrete interface through a network of FBG sensors permanently installed in that interface. In this way, strain measurements of the interface are obtained under continuous loading conditions, to be used in a multi-objective optimization model whose solution is found by means of a multi-objective extension of the Particle Swarm Optimization (PSO) method. The reliability of this last detection method is investigated through both numerical and experimental examples. ABSTRACT This thesis aims to develop robust and reliable damage identification methods focused on experimental structural systems, in particular Reinforced Concrete (RC) structures externally strengthened with Fiber Reinforced Polymer (FRP) strips. The failure mode of this type of structural system is critical, since it is usually due to sudden and brittle debonding of the FRP reinforcement originating from intermediate flexural cracks. Detection of the debonding in its initial stage is thus essential to prevent future failure, which might be catastrophic. Initially, a review of the Electro-Mechanical Impedance (EMI) method is carried out in order to present its capabilities for local damage detection. Once the appropriate technology is selected, which includes an impedance analyzer as well as novel PZT sensors for smart monitoring, an automated procedure has been designed based on the impedance signatures of several lab-scale structures. Given that capturing impedance measurements is possible thanks to an adequately deployed PZT sensor network, the presence of damage is estimated by analyzing the results of different damage indices obtained from the literature.
To make this process automatic, so that no a priori knowledge of the EMI method is needed to carry out an experimental test, a Graphical User Interface has been designed, turning the impedance measurements into an easy and intuitive procedure. Damage is then assessed through the analysis of the corresponding damage indices, trying to estimate not only the damage severity but also its approximate location. Carrying out these tests on any kind of structure generates large amounts of data to be processed, and sometimes the information provided by damage indices is not enough to achieve a complete analysis of the structural health condition. In most cases, some damage patterns can be found in the data, but no a priori knowledge of the health condition is available for any structure. At this point, substantial research on pattern recognition techniques has been carried out, particularly on unsupervised learning techniques, finding interesting applications in the medical field. From this investigation, a creative and innovative idea arose: to detect and track the evolution of damage in different structures, as if it were a cancer propagating through a human body. In that sense, the impedance signatures are used to give intrinsic information about the health condition of the structure, so that the same clustering algorithms applied in cancer research can be applied to the problem addressed in this dissertation. Hierarchical clustering is then applied, since it also provides a graphical display of the clustered data, including quantitative and qualitative information about damage. The performance of this approach is first investigated using three lab-scale structures: a simple aluminium beam, a bolt-jointed aluminium beam and an FRP-strengthened concrete specimen.
The first one shows the performance of the method on simple single and multiple damage scenarios, so that the first conclusions can be extracted and applied to the other two experimental tests, which are designed to simulate a debonding condition on different structures. Once the performance of the impedance-based hierarchical clustering method is proven to be successful, it is applied to the structural system studied in this dissertation, the RC structures externally strengthened with FRP strips, where the debonding failure in the interface between the FRP and the concrete is successfully detected and classified, thus proving the feasibility of this method. Finally, as an alternative to the previous approach, a continuous monitoring procedure for the FRP-concrete interface is proposed, based on an FBG sensor network permanently deployed within that interface. In this way, strain measurements can be obtained under controlled loading conditions and are then used to implement a multi-objective model updating method solved by a multi-objective extension of the Particle Swarm Optimization (PSO) method. The feasibility of this last proposal is investigated and successfully proven on both numerical and experimental RC beams strengthened with FRP.
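The hierarchical clustering of impedance signatures described in this thesis abstract can be sketched with a small agglomerative procedure. The abstract does not state the linkage criterion or the distance measure, so average linkage and Euclidean distance are assumptions here, and the two-value "signatures" are invented stand-ins for real impedance spectra.

```python
import math


def average_linkage(signatures):
    """Agglomerative clustering with average linkage (assumed criterion).
    Returns the merge history as (members_a, members_b, distance) tuples,
    which is the information a dendrogram displays."""
    clusters = {i: [i] for i in range(len(signatures))}
    merges = []

    def cluster_distance(a, b):
        # Mean pairwise Euclidean distance between the members of two clusters.
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(math.dist(signatures[i], signatures[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        keys = sorted(clusters)
        a, b = min(
            ((p, q) for idx, p in enumerate(keys) for q in keys[idx + 1:]),
            key=lambda pair: cluster_distance(*pair))
        merges.append((sorted(clusters[a]), sorted(clusters[b]), cluster_distance(a, b)))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


# Hypothetical signatures: two readings from a healthy state and two from a damaged state.
signatures = [(1.0, 1.0), (1.1, 1.0), (5.0, 5.0), (5.1, 5.0)]
merges = average_linkage(signatures)
```

Readings from the same structural state merge at small distances, while the last merge joins the two states at a much larger distance. That jump in merge height is the kind of graphical cue a dendrogram offers about damage.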