6 resultados para Feature selection algorithm
em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Resumo:
In this study, feature selection in classification based problems is highlighted. The role of feature selection methods is to select important features by discarding redundant and irrelevant features in the data set, we investigated this case by using fuzzy entropy measures. We developed fuzzy entropy based feature selection method using Yu's similarity and test this using similarity classifier. As the similarity classifier we used Yu's similarity, we tested our similarity on the real world data set which is dermatological data set. By performing feature selection based on fuzzy entropy measures before classification on our data set the empirical results were very promising, the highest classification accuracy of 98.83% was achieved when testing our similarity measure to the data set. The achieved results were then compared with some other results previously obtained using different similarity classifiers, the obtained results show better accuracy than the one achieved before. The used methods helped to reduce the dimensionality of the used data set, to speed up the computation time of a learning algorithm and therefore have simplified the classification task
Resumo:
Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is the deciphering of geneticvariants that are predictive of complex diseases. Modern studies, in the formof genome-wide association studies (GWAS) have afforded researchers with the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining this with methods other than univariate statistics is a challenging task requiring advanced algorithms that are scalable to the genome-wide level. In the future, next-generation sequencing studies (NGS) will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to have the ability to effectively create predictive models for various genotype-phenotype relationships. This work explores the problem of selecting genetic variant subsets that are the most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were demonstrated to not only be effective at predicting the disease phenotypes, but also doing so efficiently through the use of computational shortcuts. While much of the work was able to be run on high-end desktops, some work was further extended so that it could be implemented on parallel computers helping to assure that they will also scale to the NGS data sets. Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm. It was shown that there is no universally optimal algorithm for variant selection in GWAS, but rather methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can result in overly-optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype–phenotype relationships and biological insights from genetic data sets.
Resumo:
The papermaking industry has been continuously developing intelligent solutions to characterize the raw materials it uses, to control the manufacturing process in a robust way, and to guarantee the desired quality of the end product. Based on the much improved imaging techniques and image-based analysis methods, it has become possible to look inside the manufacturing pipeline and propose more effective alternatives to human expertise. This study is focused on the development of image analyses methods for the pulping process of papermaking. Pulping starts with wood disintegration and forming the fiber suspension that is subsequently bleached, mixed with additives and chemicals, and finally dried and shipped to the papermaking mills. At each stage of the process it is important to analyze the properties of the raw material to guarantee the product quality. In order to evaluate properties of fibers, the main component of the pulp suspension, a framework for fiber characterization based on microscopic images is proposed in this thesis as the first contribution. The framework allows computation of fiber length and curl index correlating well with the ground truth values. The bubble detection method, the second contribution, was developed in order to estimate the gas volume at the delignification stage of the pulping process based on high-resolution in-line imaging. The gas volume was estimated accurately and the solution enabled just-in-time process termination whereas the accurate estimation of bubble size categories still remained challenging. As the third contribution of the study, optical flow computation was studied and the methods were successfully applied to pulp flow velocity estimation based on double-exposed images. Finally, a framework for classifying dirt particles in dried pulp sheets, including the semisynthetic ground truth generation, feature selection, and performance comparison of the state-of-the-art classification techniques, was proposed as the fourth contribution. The framework was successfully tested on the semisynthetic and real-world pulp sheet images. These four contributions assist in developing an integrated factory-level vision-based process control.
Resumo:
Electricity price forecasting has become an important area of research in the aftermath of the worldwide deregulation of the power industry that launched competitive electricity markets now embracing all market participants including generation and retail companies, transmission network providers, and market managers. Based on the needs of the market, a variety of approaches forecasting day-ahead electricity prices have been proposed over the last decades. However, most of the existing approaches are reasonably effective for normal range prices but disregard price spike events, which are caused by a number of complex factors and occur during periods of market stress. In the early research, price spikes were truncated before application of the forecasting model to reduce the influence of such observations on the estimation of the model parameters; otherwise, a very large forecast error would be generated on price spike occasions. Electricity price spikes, however, are significant for energy market participants to stay competitive in a market. Accurate price spike forecasting is important for generation companies to strategically bid into the market and to optimally manage their assets; for retailer companies, since they cannot pass the spikes onto final customers, and finally, for market managers to provide better management and planning for the energy market. This doctoral thesis aims at deriving a methodology able to accurately predict not only the day-ahead electricity prices within the normal range but also the price spikes. The Finnish day-ahead energy market of Nord Pool Spot is selected as the case market, and its structure is studied in detail. It is almost universally agreed in the forecasting literature that no single method is best in every situation. Since the real-world problems are often complex in nature, no single model is able to capture different patterns equally well. Therefore, a hybrid methodology that enhances the modeling capabilities appears to be a possibly productive strategy for practical use when electricity prices are predicted. The price forecasting methodology is proposed through a hybrid model applied to the price forecasting in the Finnish day-ahead energy market. The iterative search procedure employed within the methodology is developed to tune the model parameters and select the optimal input set of the explanatory variables. The numerical studies show that the proposed methodology has more accurate behavior than all other examined methods most recently applied to case studies of energy markets in different countries. The obtained results can be considered as providing extensive and useful information for participants of the day-ahead energy market, who have limited and uncertain information for price prediction to set up an optimal short-term operation portfolio. Although the focus of this work is primarily on the Finnish price area of Nord Pool Spot, given the result of this work, it is very likely that the same methodology will give good results when forecasting the prices on energy markets of other countries.
Resumo:
Human beings have always strived to preserve their memories and spread their ideas. In the beginning this was always done through human interpretations, such as telling stories and creating sculptures. Later, technological progress made it possible to create a recording of a phenomenon; first as an analogue recording onto a physical object, and later digitally, as a sequence of bits to be interpreted by a computer. By the end of the 20th century technological advances had made it feasible to distribute media content over a computer network instead of on physical objects, thus enabling the concept of digital media distribution. Many digital media distribution systems already exist, and their continued, and in many cases increasing, usage is an indicator for the high interest in their future enhancements and enriching. By looking at these digital media distribution systems, we have identified three main areas of possible improvement: network structure and coordination, transport of content over the network, and the encoding used for the content. In this thesis, our aim is to show that improvements in performance, efficiency and availability can be done in conjunction with improvements in software quality and reliability through the use of formal methods: mathematical approaches to reasoning about software so that we can prove its correctness, together with the desirable properties. We envision a complete media distribution system based on a distributed architecture, such as peer-to-peer networking, in which different parts of the system have been formally modelled and verified. Starting with the network itself, we show how it can be formally constructed and modularised in the Event-B formalism, such that we can separate the modelling of one node from the modelling of the network itself. We also show how the piece selection algorithm in the BitTorrent peer-to-peer transfer protocol can be adapted for on-demand media streaming, and how this can be modelled in Event-B. Furthermore, we show how modelling one peer in Event-B can give results similar to simulating an entire network of peers. Going further, we introduce a formal specification language for content transfer algorithms, and show that having such a language can make these algorithms easier to understand. We also show how generating Event-B code from this language can result in less complexity compared to creating the models from written specifications. We also consider the decoding part of a media distribution system by showing how video decoding can be done in parallel. This is based on formally defined dependencies between frames and blocks in a video sequence; we have shown that also this step can be performed in a way that is mathematically proven correct. Our modelling and proving in this thesis is, in its majority, tool-based. This provides a demonstration of the advance of formal methods as well as their increased reliability, and thus, advocates for their more wide-spread usage in the future.
Resumo:
The issue of selecting an appropriate healthcare information system is a very essential one. If implemented healthcare information system doesn’t fit particular healthcare institution, for example there are unnecessary functions; healthcare institution wastes its resources and its efficiency decreases. The purpose of this research is to develop a healthcare information system selection model to assist the decision-making process of choosing healthcare information system. Appropriate healthcare information system helps healthcare institutions to become more effective and efficient and keep up with the times. The research is based on comparison analysis of 50 healthcare information systems and 6 interviews with experts from St-Petersburg healthcare institutions that already have experience in healthcare information system utilization. 13 characteristics of healthcare information systems: 5 key and 7 additional features are identified and considered in the selection model development. Variables are used in the selection model in order to narrow the decision algorithm and to avoid duplication of brunches. The questions in the healthcare information systems selection model are designed to be easy-to-understand for common a decision-maker in healthcare institution without permanent establishment.