31 results for Data clustering. Fuzzy C-Means. Cluster centers initialization. Validation indices
at Universidade Federal do Rio Grande do Norte (UFRN)
Abstract:
Data clustering is applied in various fields, such as data mining, image processing and pattern recognition. Clustering algorithms split a data set into clusters such that elements within the same cluster have a high degree of similarity, while elements belonging to different clusters have a high degree of dissimilarity. The Fuzzy C-Means algorithm (FCM) is one of the fuzzy clustering algorithms most used and discussed in the literature. The performance of FCM is strongly affected by the selection of the initial cluster centers, so choosing a good set of initial centers is very important. However, in FCM the initial centers are chosen randomly, which makes it difficult to find a good set. This paper proposes three new methods to obtain initial cluster centers deterministically for the FCM algorithm; they can also be used in variants of FCM. In this work these initialization methods were applied to the ckMeans variant. With the proposed methods, we intend to obtain a set of initial centers that are close to the real cluster centers. With these new initialization approaches, we aim to reduce the number of iterations these algorithms need to converge, as well as their processing time, without affecting the quality of the clustering, or even improving it in some cases. Accordingly, cluster validation indices were used to measure the quality of the clusters obtained by the modified FCM and ckMeans algorithms with the proposed initialization methods when applied to various data sets.
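To make the role of the initial centers concrete, the following is a minimal sketch of the standard FCM update loop in Python with NumPy. It is not the paper's method: the abstract does not describe the three proposed initialization schemes, so the function simply takes the initial centers as an argument. The function name `fuzzy_c_means`, the parameters `m`, `tol` and `max_iter`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def fuzzy_c_means(X, centers, m=2.0, tol=1e-5, max_iter=100):
    """Standard FCM updates, starting from caller-supplied initial centers."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float).copy()
    for iteration in range(1, max_iter + 1):
        # Euclidean distance from every point to every center, shape (n, c).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        # Membership u_ik = d_ik^(-2/(m-1)) / sum_j d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Each center becomes the mean of the points weighted by u^m.
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, U, iteration

# Hypothetical toy data: two well-separated groups of three points.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
# Hand-picked initial centers stand in for a deterministic initialization.
centers, U, n_iter = fuzzy_c_means(X, centers=[[0, 0], [11, 10]])
print(centers, n_iter)
```

The returned iteration count is the quantity a better initialization is meant to reduce, so comparing it across different starting centers is one simple way to observe the effect the abstract describes.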
Abstract:
Clustering data is a very important task in data mining, image processing and pattern recognition problems. One of the most popular clustering algorithms is Fuzzy C-Means (FCM). This thesis proposes a new way of calculating the cluster centers in the FCM procedure, which we call ckMeans, and applies it to some variants of FCM, in particular those that use other distances. The goal of this change is to reduce the number of iterations and the processing time of these algorithms without affecting the quality of the partition, or even to improve the number of correct classifications in some cases. We also developed an algorithm based on ckMeans to handle interval data, considering interval membership degrees. This algorithm allows interval data to be represented without converting them into point data, as happens in other extensions of FCM that deal with interval data. To validate the proposed methodologies, we compared the clusterings produced by the ckMeans, K-Means and FCM algorithms (since the algorithm proposed here to calculate the centers is similar to that of K-Means) considering three different distances, using several well-known databases. The results of Interval ckMeans were also compared with the results of other clustering algorithms when applied to an interval database containing the minimum and maximum monthly temperatures for a given year in 37 cities distributed across continents.
Abstract:
Symbolic Data Analysis (SDA) aims to provide tools for reducing large databases in order to extract knowledge, and to provide techniques for describing such data in complex units, such as intervals or histograms. The objective of this work is to extend classical clustering methods to symbolic interval data based on interval-based distances. The main advantage of using an interval-based distance for interval-based data lies in the fact that it preserves the underlying imprecision of the intervals, which is usually lost when real-valued distances are applied. This work also includes an approach that allows existing indices to be adapted to the interval context. The proposed methods with interval-based distances are compared with the point-valued distances existing in the literature through experiments with simulated and real interval data.
Abstract:
Image segmentation is one of the image processing problems that deserves special attention from the scientific community. This work studies unsupervised clustering and pattern recognition methods applicable to medical image segmentation. Methods based on Natural Computing have proven very attractive for such tasks and are studied here in order to verify their applicability to medical image segmentation. This work implements the following methods: GKA (Genetic K-means Algorithm), GFCMA (Genetic FCM Algorithm), PSOKA (PSO and K-means based Clustering Algorithm) and PSOFCM (PSO and FCM based Clustering Algorithm). In addition, to evaluate the results given by the algorithms, clustering validity indexes are used as quantitative measures. Visual and qualitative evaluations are also carried out, mainly using data from the BrainWeb brain simulator as ground truth.
Abstract:
Self-efficacy, the construct developed by Albert Bandura in 1977 and widely studied around the world, is an individual's belief in their own capacity to successfully perform a certain activity. This study aims to determine the degree of association between sociodemographic characteristics and professional training and the levels of Self-Efficacy at Work (SEW) of the Administrative Assistants at a federal university. This is a descriptive study, submitted to and approved by the Ethics Committee of UFRN. The data analysis, quantitative in nature, was carried out with the aid of the statistical programs R and Minitab. The instruments used were a questionnaire on sociodemographic data and professional training variables and the General Perception of Self-efficacy Scale (GPSES), applied to a sample of 289 Administrative Assistants. The statistical techniques used for data analysis were descriptive statistics, cluster analysis, a reliability test (Cronbach's alpha), and a test of significance (Pearson). The results show a sociodemographic profile of the Administrative Assistants of UFRN with well-distributed characteristics: 48.4% are men and 51.6% women; 59.9% are aged over 40 years; 49.3% are married, 58% are white and 67.8% are Catholic; families are composed of up to four people (75.8%), with children (59.4%) of all age groups; the mothers of these professionals are mostly housewives (51.6%), and most fathers (72%) and mothers (75.8%) have up to a high school education. The Administrative Assistants have high levels of professional training. Most of them fall into two groups of civil servants: recently hired public servants (30.7%) and those with long service (59%). The majority enter the career young and remain in it until retirement, and 72.4% of these professionals have training above the minimum requirement for the job.
The analysis of SEW levels shows medium to high levels for 72% of the Administrative Assistants; those classified as low SEW showed a high average of 2.7, considered close to the overall mean of 2.9 reported in other studies. The cluster analysis allows us to say that the characteristics of the three groups (Low, Medium and High SEW) are similar, and that representatives with all the characteristics investigated can be found at all three SEW levels. The results indicate no association between the sociodemographic and professional training variables and the levels of self-efficacy at work of the Administrative Assistants of UFRN, except for the variable color or race. However, due to the small number of people who declared themselves black (4% of the sample), this result can be interpreted either as mere coincidence or as an indication that the black respondents in this study reported a higher sense of efficacy than the white and brown ones. The study corroborates other studies and highlights the subjectivity of the self-efficacy construct. More research is needed, especially with public servants, to continue and expand studies on the subject, making it possible to compare and confirm the results.
Abstract:
The main objective of this study is to apply recently developed methods of statistical physics to time series analysis, particularly to electrical induction profiles of oil wells, in order to study the petrophysical similarity of those wells in a spatial distribution. For this, we used the DFA (detrended fluctuation analysis) method to find out whether this technique can be used to characterize the fields spatially. After obtaining the DFA values for all wells, we applied cluster analysis, using the non-hierarchical method called K-means. Usually based on the Euclidean distance, K-means consists of dividing the elements of a data matrix N into k groups, so that the similarities among elements belonging to different groups are the smallest possible. To test whether a dataset generated by the K-means method, or randomly generated datasets, form spatial patterns, we created the parameter Ω (neighborhood index). High values of Ω reveal more aggregated data, and low values of Ω indicate scattered data or data without spatial correlation. We thus concluded that the DFA data from the 54 wells are grouped and can be used to characterize spatial fields. Applying the contour level technique, we confirmed the results obtained by K-means, confirming that DFA is effective for spatial analysis.
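The K-means partitioning step mentioned above can be sketched as follows in Python with NumPy. This is a generic illustration of the usual assign-then-recompute iteration with the Euclidean distance, not the authors' code; the function name `k_means`, the caller-supplied initial centers, and the made-up per-well values are assumptions. The Ω index is not implemented because the abstract does not define it.

```python
import numpy as np

def k_means(X, centers, max_iter=100):
    """Basic K-means: assign points to the nearest center, then recompute centers."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        # Euclidean distance from every point to every center, shape (n, k).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical one-value-per-well data (for example a DFA exponent per well).
values = np.array([[0.50], [0.55], [0.60], [0.90], [0.95], [1.00]])
labels, centers = k_means(values, centers=[[0.5], [1.0]])
print(labels, centers.ravel())
```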
Abstract:
In recent years, the DFA, introduced by Peng, has been established as an important tool capable of detecting long-range autocorrelation in non-stationary time series. This technique has been successfully applied in various areas, such as Econophysics, Biophysics, Medicine, Physics and Climatology. In this study, we used the DFA technique to obtain the Hurst exponent (H) of the electric density profile (RHOB) of 53 wells from the Field School of Namorados. We want to know whether H can be used to spatially characterize the field. Two cases arise. In the first, the set of H values reflects the local geology, with wells that are geographically closer showing similar H, and H can then be used in geostatistical procedures. In the second, each well has its own H and the information from the wells is uncorrelated; the profiles show only random fluctuations in H, without any spatial structure. Cluster analysis is a widely used method in statistical analysis. In this work we use the non-hierarchical k-means method. To verify whether a set of data generated by the k-means method shows spatial patterns, we create the parameter Ω (neighborhood index). High Ω indicates more aggregated data; low Ω indicates dispersed data or data without spatial correlation. With the help of this index and the Monte Carlo method, we verify that random cluster data show a distribution of Ω that is lower than the Ω of the actual clusters. We thus conclude that the H data obtained from the 53 wells are grouped and can be used to characterize spatial patterns. The analysis of level curves confirmed the results of the k-means.
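For readers unfamiliar with DFA, the sketch below shows one common first-order variant in Python with NumPy: integrate the mean-removed series, detrend it linearly inside windows of several sizes, and take the slope of log fluctuation against log window size as the scaling exponent. It is an illustration under stated assumptions, not the procedure used in the study: the function name `dfa_exponent`, the choice of non-overlapping windows, the linear detrending order, and the window sizes are all assumptions, and the test signal is synthetic white noise rather than well-log data.

```python
import numpy as np

def dfa_exponent(series, scales):
    """First-order DFA: slope of log F(n) versus log n over the given window sizes."""
    x = np.asarray(series, dtype=float)
    profile = np.cumsum(x - x.mean())  # integrated, mean-removed series
    fluctuations = []
    for n in scales:
        n_windows = len(profile) // n
        segments = profile[: n_windows * n].reshape(n_windows, n)
        t = np.arange(n)
        residual_var = []
        for seg in segments:
            # Remove the local linear trend inside each window.
            coeffs = np.polyfit(t, seg, 1)
            trend = np.polyval(coeffs, t)
            residual_var.append(np.mean((seg - trend) ** 2))
        fluctuations.append(np.sqrt(np.mean(residual_var)))
    slope, _ = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return slope

# Synthetic uncorrelated noise; its exponent is expected to be near 0.5.
rng = np.random.default_rng(0)
alpha = dfa_exponent(rng.standard_normal(4096), scales=[16, 32, 64, 128, 256])
print(alpha)
```

For stationary noise-like signals this exponent is commonly read as an estimate of the Hurst exponent H; values clearly above 0.5 suggest persistent long-range correlation.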
Abstract:
This work proposes a collaborative system for marking dangerous points on transport routes and generating alerts for drivers. It consists of a proximity warning system for danger points, fed by drivers via a mobile device equipped with GPS. The system consolidates data provided by several different drivers and generates a common set of points to be used in the warning system. Although the application is designed to protect drivers, the data it generates can serve as input for the responsible authorities to improve the signage and repair of public roads.
Abstract:
The objective is to establish a methodology for monitoring oil spills on the sea surface in the Submerged Exploration Area of the Polo Region of Guamaré, in the State of Rio Grande do Norte, using orbital Synthetic Aperture Radar (SAR) images integrated with meteo-oceanographic products. This methodology was applied in the following stages: (1) the creation of a base map of the Exploration Area; (2) the processing of NOAA/AVHRR and ERS-2 images to generate meteo-oceanographic products; (3) the processing of RADARSAT-1 images for monitoring oil spills; (4) the integration of RADARSAT-1 images with NOAA/AVHRR and ERS-2 image products; and (5) the structuring of a database. The integration of the RADARSAT-1 image of the Potiguar Basin from 21.05.99 with the base map of the Exploration Area of the Polo Region of Guamaré, for identifying the probable sources of the oil spots, was used successfully in the detection of the probable oil spot found next to the outlet of the submarine outfall in the Exploration Area of the Polo Region of Guamaré. To support the integration of RADARSAT-1 images with NOAA/AVHRR and ERS-2 image products, a methodology was developed for classifying the oil spills identified in RADARSAT-1 images. For this, the following unsupervised classification algorithms were tested: K-means, Fuzzy k-means and Isodata. These algorithms are part of the PCI Geomatics software, which was used for filtering the RADARSAT-1 images. To validate the results, the oil spills submitted to unsupervised classification were compared with the results of the Semivariogram Textural Classifier (STC). This classifier was developed especially for oil spill classification and requires PCI software for the whole processing of RADARSAT-1 images. Finally, the classification results were analyzed through visual analysis, calculation of proportionality of size, and statistical analysis.
Among the three classification algorithms tested, no significant differences were noted in relation to the spills classified with the STC in any of the analyses considered. Therefore, considering all the procedures, the described methodology can be successfully applied using the unsupervised classifiers tested, reducing the time needed to identify and classify oil spills compared with the use of the STC classifier.
Abstract:
This study deals with the ethical and human aspects present in the teaching-learning process in dentists' training. It arises from the growing need for professionals committed to the quality of the services they provide to the population in health care centers. A qualitative approach was used, and data were obtained by means of focus groups, interviews and participant observation. The sample consisted of 28 dentistry students and 33 patients treated at the dentistry course. The results showed that the main problems are the excess of authority in the teacher-student-patient relationship and the dissociation of body, mind and spirit seen in the practice of the biomedical health model. These findings show future professionals' insufficient ability to develop a satisfactory relationship with their patients and the need to consider these aspects during their training.
Abstract:
Leprosy is a chronic infectious disease caused by Mycobacterium leprae. It is known for its great disfiguring capacity and is considered an extremely serious public health problem worldwide. The state of Ceará ranks 13th in number of leprosy cases in Brazil and fourth in the Northeastern region, with an average of 2,149 new cases diagnosed every year. This study aimed to evaluate the knowledge of leprosy patients regarding treatment, and to assess the level of treatment adherence and its possible barriers. The study was conducted at the reference center for dermatology in Fortaleza, Ceará, from September 2010 to October 2010. The data were collected by means of a structured interview, along with the Morisky-Green test, in order to assess treatment adherence and barriers to adherence. A total of 70 patients were interviewed, of whom 66 were new cases. The majority of patients were between 42 and 50 years old, and 37 (52.9%) were male. Most patients were clinically classified as presenting multibacillary leprosy (80%), and 78.6% of them were from Fortaleza, Brazil. The Morisky-Green test indicated that 62.9% of patients presented a low level of adherence (p < 0.005), despite claiming to be aware of the disease risks. However, 57.1% of the patients reported no difficulty adhering to treatment, while 38.6% reported little difficulty. This study shows that, although the patients claimed to be familiar with leprosy and its treatment, the Morisky-Green test clearly demonstrated that they were not actually aware of the principles of therapy, as evidenced by the low degree of treatment adherence.