6 resultados para Richards, Lyn: Handling qualitative data
em Cochin University of Science
Resumo:
Knowledge discovery in databases is the non-trivial process of identifying valid, novel potentially useful and ultimately understandable patterns from data. The term Data mining refers to the process which does the exploratory analysis on the data and builds some model on the data. To infer patterns from data, data mining involves different approaches like association rule mining, classification techniques or clustering techniques. Among the many data mining techniques, clustering plays a major role, since it helps to group the related data for assessing properties and drawing conclusions. Most of the clustering algorithms act on a dataset with uniform format, since the similarity or dissimilarity between the data points is a significant factor in finding out the clusters. If a dataset consists of mixed attributes, i.e. a combination of numerical and categorical variables, a preferred approach is to convert different formats into a uniform format. The research study explores the various techniques to convert the mixed data sets to a numerical equivalent, so as to make it equipped for applying the statistical and similar algorithms. The results of clustering mixed category data after conversion to numeric data type have been demonstrated using a crime data set. The thesis also proposes an extension to the well known algorithm for handling mixed data types, to deal with data sets having only categorical data. The proposed conversion has been validated on a data set corresponding to breast cancer. Moreover, another issue with the clustering process is the visualization of output. Different geometric techniques like scatter plot, or projection plots are available, but none of the techniques display the result projecting the whole database but rather demonstrate attribute-pair wise analysis
Resumo:
One major component of power system operation is generation scheduling. The objective of the work is to develop efficient control strategies to the power scheduling problems through Reinforcement Learning approaches. The three important active power scheduling problems are Unit Commitment, Economic Dispatch and Automatic Generation Control. Numerical solution methods proposed for solution of power scheduling are insufficient in handling large and complex systems. Soft Computing methods like Simulated Annealing, Evolutionary Programming etc., are efficient in handling complex cost functions, but find limitation in handling stochastic data existing in a practical system. Also the learning steps are to be repeated for each load demand which increases the computation time.Reinforcement Learning (RL) is a method of learning through interactions with environment. The main advantage of this approach is it does not require a precise mathematical formulation. It can learn either by interacting with the environment or interacting with a simulation model. Several optimization and control problems have been solved through Reinforcement Learning approach. The application of Reinforcement Learning in the field of Power system has been a few. The objective is to introduce and extend Reinforcement Learning approaches for the active power scheduling problems in an implementable manner. The main objectives can be enumerated as:(i) Evolve Reinforcement Learning based solutions to the Unit Commitment Problem.(ii) Find suitable solution strategies through Reinforcement Learning approach for Economic Dispatch. (iii) Extend the Reinforcement Learning solution to Automatic Generation Control with a different perspective. (iv) Check the suitability of the scheduling solutions to one of the existing power systems.First part of the thesis is concerned with the Reinforcement Learning approach to Unit Commitment problem. Unit Commitment Problem is formulated as a multi stage decision process. Q learning solution is developed to obtain the optimwn commitment schedule. Method of state aggregation is used to formulate an efficient solution considering the minimwn up time I down time constraints. The performance of the algorithms are evaluated for different systems and compared with other stochastic methods like Genetic Algorithm.Second stage of the work is concerned with solving Economic Dispatch problem. A simple and straight forward decision making strategy is first proposed in the Learning Automata algorithm. Then to solve the scheduling task of systems with large number of generating units, the problem is formulated as a multi stage decision making task. The solution obtained is extended in order to incorporate the transmission losses in the system. To make the Reinforcement Learning solution more efficient and to handle continuous state space, a fimction approximation strategy is proposed. The performance of the developed algorithms are tested for several standard test cases. Proposed method is compared with other recent methods like Partition Approach Algorithm, Simulated Annealing etc.As the final step of implementing the active power control loops in power system, Automatic Generation Control is also taken into consideration.Reinforcement Learning has already been applied to solve Automatic Generation Control loop. The RL solution is extended to take up the approach of common frequency for all the interconnected areas, more similar to practical systems. Performance of the RL controller is also compared with that of the conventional integral controller.In order to prove the suitability of the proposed methods to practical systems, second plant ofNeyveli Thennal Power Station (NTPS IT) is taken for case study. The perfonnance of the Reinforcement Learning solution is found to be better than the other existing methods, which provide the promising step towards RL based control schemes for practical power industry.Reinforcement Learning is applied to solve the scheduling problems in the power industry and found to give satisfactory perfonnance. Proposed solution provides a scope for getting more profit as the economic schedule is obtained instantaneously. Since Reinforcement Learning method can take the stochastic cost data obtained time to time from a plant, it gives an implementable method. As a further step, with suitable methods to interface with on line data, economic scheduling can be achieved instantaneously in a generation control center. Also power scheduling of systems with different sources such as hydro, thermal etc. can be looked into and Reinforcement Learning solutions can be achieved.
Resumo:
Decision trees are very powerful tools for classification in data mining tasks that involves different types of attributes. When coming to handling numeric data sets, usually they are converted first to categorical types and then classified using information gain concepts. Information gain is a very popular and useful concept which tells you, whether any benefit occurs after splitting with a given attribute as far as information content is concerned. But this process is computationally intensive for large data sets. Also popular decision tree algorithms like ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain as well as statistical mean to split attributes in completely numerical data sets. The new algorithm has been proved to be competent with respect to its information gain counterpart C4.5 and competent with many existing decision tree algorithms against the standard UCI benchmarking datasets using the ANOVA test in statistics. The specific advantages of this proposed new algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, as well as it avoids the conversion to categorical data from huge numeric data sets which also is a time consuming task. So as a summary, huge numeric datasets can be directly submitted to this algorithm without any attribute mappings or information gain computations. It also blends the two closely related fields statistics and data mining
Resumo:
Data centre is a centralized repository,either physical or virtual,for the storage,management and dissemination of data and information organized around a particular body and nerve centre of the present IT revolution.Data centre are expected to serve uniinterruptedly round the year enabling them to perform their functions,it consumes enormous energy in the present scenario.Tremendous growth in the demand from IT Industry made it customary to develop newer technologies for the better operation of data centre.Energy conservation activities in data centre mainly concentrate on the air conditioning system since it is the major mechanical sub-system which consumes considerable share of the total power consumption of the data centre.The data centre energy matrix is best represented by power utilization efficiency(PUE),which is defined as the ratio of the total facility power to the IT equipment power.Its value will be greater than one and a large value of PUE indicates that the sub-systems draw more power from the facility and the performance of the data will be poor from the stand point of energy conservation. PUE values of 1.4 to 1.6 are acievable by proper design and management techniques.Optimizing the air conditioning systems brings enormous opportunity in bringing down the PUE value.The air conditioning system can be optimized by two approaches namely,thermal management and air flow management.thermal management systems are now introduced by some companies but they are highly sophisticated and costly and do not catch much attention in the thumb rules.
Resumo:
This work identifies the importance of plenum pressure on the performance of the data centre. The present methodology followed in the industry considers the pressure drop across the tile as a dependant variable, but it is shown in this work that this is the only one independent variable that is responsible for the entire flow dynamics in the data centre, and any design or assessment procedure must consider the pressure difference across the tile as the primary independent variable. This concept is further explained by the studies on the effect of dampers on the flow characteristics. The dampers have found to introduce an additional pressure drop there by reducing the effective pressure drop across the tile. The effect of damper is to change the flow both in quantitative and qualitative aspects. But the effect of damper on the flow in the quantitative aspect is only considered while using the damper as an aid for capacity control. Results from the present study suggest that the use of dampers must be avoided in data centre and well designed tiles which give required flow rates must be used in the appropriate locations. In the present study the effect of hot air recirculation is studied with suitable assumptions. It identifies that, the pressure drop across the tile is a dominant parameter which governs the recirculation. The rack suction pressure of the hardware along with the pressure drop across the tile determines the point of recirculation in the cold aisle. The positioning of hardware in the racks play an important role in controlling the recirculation point. The present study is thus helpful in the design of data centre air flow, based on the theory of jets. The air flow can be modelled both quantitatively and qualitatively based on the results.
Resumo:
In the present study the effect of hot air recirculation is studied with suitable assumptions. It identifies that, the pressure drop across the tile is a dominant parameter which governs the recirculation. The rack suction pressure of the hardware along with the pressure drop across the tile determines the point of recirculation in the cold aisle. The positioning of hardware in the racks play an important role in controlling the recirculation point. The present study is thus helpful in the design of data centre air flow, based on the theory of jets. The air flow can be modelled both quantitatively and qualitatively based on the results