6 resultados para Data Mining, Yield Improvement, Self Organising Map, Clustering Quality
em Universidade Federal do Rio Grande do Norte(UFRN)
Resumo:
In this paper artificial neural network (ANN) based on supervised and unsupervised algorithms were investigated for use in the study of rheological parameters of solid pharmaceutical excipients, in order to develop computational tools for manufacturing solid dosage forms. Among four supervised neural networks investigated, the best learning performance was achieved by a feedfoward multilayer perceptron whose architectures was composed by eight neurons in the input layer, sixteen neurons in the hidden layer and one neuron in the output layer. Learning and predictive performance relative to repose angle was poor while to Carr index and Hausner ratio (CI and HR, respectively) showed very good fitting capacity and learning, therefore HR and CI were considered suitable descriptors for the next stage of development of supervised ANNs. Clustering capacity was evaluated for five unsupervised strategies. Network based on purely unsupervised competitive strategies, classic "Winner-Take-All", "Frequency-Sensitive Competitive Learning" and "Rival-Penalize Competitive Learning" (WTA, FSCL and RPCL, respectively) were able to perform clustering from database, however this classification was very poor, showing severe classification errors by grouping data with conflicting properties into the same cluster or even the same neuron. On the other hand it could not be established what was the criteria adopted by the neural network for those clustering. Self-Organizing Maps (SOM) and Neural Gas (NG) networks showed better clustering capacity. Both have recognized the two major groupings of data corresponding to lactose (LAC) and cellulose (CEL). However, SOM showed some errors in classify data from minority excipients, magnesium stearate (EMG) , talc (TLC) and attapulgite (ATP). NG network in turn performed a very consistent classification of data and solve the misclassification of SOM, being the most appropriate network for classifying data of the study. The use of NG network in pharmaceutical technology was still unpublished. NG therefore has great potential for use in the development of software for use in automated classification systems of pharmaceutical powders and as a new tool for mining and clustering data in drug development
Resumo:
Self-organizing maps (SOM) are artificial neural networks widely used in the data mining field, mainly because they constitute a dimensionality reduction technique given the fixed grid of neurons associated with the network. In order to properly the partition and visualize the SOM network, the various methods available in the literature must be applied in a post-processing stage, that consists of inferring, through its neurons, relevant characteristics of the data set. In general, such processing applied to the network neurons, instead of the entire database, reduces the computational costs due to vector quantization. This work proposes a post-processing of the SOM neurons in the input and output spaces, combining visualization techniques with algorithms based on gravitational forces and the search for the shortest path with the greatest reward. Such methods take into account the connection strength between neighbouring neurons and characteristics of pattern density and distances among neurons, both associated with the position that the neurons occupy in the data space after training the network. Thus, the goal consists of defining more clearly the arrangement of the clusters present in the data. Experiments were carried out so as to evaluate the proposed methods using various artificially generated data sets, as well as real world data sets. The results obtained were compared with those from a number of well-known methods existent in the literature
Resumo:
Clustering data is a very important task in data mining, image processing and pattern recognition problems. One of the most popular clustering algorithms is the Fuzzy C-Means (FCM). This thesis proposes to implement a new way of calculating the cluster centers in the procedure of FCM algorithm which are called ckMeans, and in some variants of FCM, in particular, here we apply it for those variants that use other distances. The goal of this change is to reduce the number of iterations and processing time of these algorithms without affecting the quality of the partition, or even to improve the number of correct classifications in some cases. Also, we developed an algorithm based on ckMeans to manipulate interval data considering interval membership degrees. This algorithm allows the representation of data without converting interval data into punctual ones, as it happens to other extensions of FCM that deal with interval data. In order to validate the proposed methodologies it was made a comparison between a clustering for ckMeans, K-Means and FCM algorithms (since the algorithm proposed in this paper to calculate the centers is similar to the K-Means) considering three different distances. We used several known databases. In this case, the results of Interval ckMeans were compared with the results of other clustering algorithms when applied to an interval database with minimum and maximum temperature of the month for a given year, referring to 37 cities distributed across continents
Resumo:
Data clustering is applied to various fields such as data mining, image processing and pattern recognition technique. Clustering algorithms splits a data set into clusters such that elements within the same cluster have a high degree of similarity, while elements belonging to different clusters have a high degree of dissimilarity. The Fuzzy C-Means Algorithm (FCM) is a fuzzy clustering algorithm most used and discussed in the literature. The performance of the FCM is strongly affected by the selection of the initial centers of the clusters. Therefore, the choice of a good set of initial cluster centers is very important for the performance of the algorithm. However, in FCM, the choice of initial centers is made randomly, making it difficult to find a good set. This paper proposes three new methods to obtain initial cluster centers, deterministically, the FCM algorithm, and can also be used in variants of the FCM. In this work these initialization methods were applied in variant ckMeans.With the proposed methods, we intend to obtain a set of initial centers which are close to the real cluster centers. With these new approaches startup if you want to reduce the number of iterations to converge these algorithms and processing time without affecting the quality of the cluster or even improve the quality in some cases. Accordingly, cluster validation indices were used to measure the quality of the clusters obtained by the modified FCM and ckMeans algorithms with the proposed initialization methods when applied to various data sets
Resumo:
Soft skills and teamwork practices were identi ed as the main de ciencies of recent graduates in computer courses. This issue led to a realization of a qualitative research aimed at investigating the challenges faced by professors of those courses in conducting, monitoring and assessing collaborative software development projects. Di erent challenges were reported by teachers, including di culties in the assessment of students both in the collective and individual levels. In this context, a quantitative research was conducted with the aim to map soft skill of students to a set of indicators that can be extracted from software repositories using data mining techniques. These indicators are aimed at measuring soft skills, such as teamwork, leadership, problem solving and the pace of communication. Then, a peer assessment approach was applied in a collaborative software development course of the software engineering major at the Federal University of Rio Grande do Norte (UFRN). This research presents a correlation study between the students' soft skills scores and indicators based on mining software repositories. This study contributes: (i) in the presentation of professors' perception of the di culties and opportunities for improving management and monitoring practices in collaborative software development projects; (ii) in investigating relationships between soft skills and activities performed by students using software repositories; (iii) in encouraging the development of soft skills and the use of software repositories among software engineering students; (iv) in contributing to the state of the art of three important areas of software engineering, namely software engineering education, educational data mining and human aspects of software engineering.
Resumo:
Soft skills and teamwork practices were identi ed as the main de ciencies of recent graduates in computer courses. This issue led to a realization of a qualitative research aimed at investigating the challenges faced by professors of those courses in conducting, monitoring and assessing collaborative software development projects. Di erent challenges were reported by teachers, including di culties in the assessment of students both in the collective and individual levels. In this context, a quantitative research was conducted with the aim to map soft skill of students to a set of indicators that can be extracted from software repositories using data mining techniques. These indicators are aimed at measuring soft skills, such as teamwork, leadership, problem solving and the pace of communication. Then, a peer assessment approach was applied in a collaborative software development course of the software engineering major at the Federal University of Rio Grande do Norte (UFRN). This research presents a correlation study between the students' soft skills scores and indicators based on mining software repositories. This study contributes: (i) in the presentation of professors' perception of the di culties and opportunities for improving management and monitoring practices in collaborative software development projects; (ii) in investigating relationships between soft skills and activities performed by students using software repositories; (iii) in encouraging the development of soft skills and the use of software repositories among software engineering students; (iv) in contributing to the state of the art of three important areas of software engineering, namely software engineering education, educational data mining and human aspects of software engineering.