5 resultados para optimal feature selection
em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Resumo:
Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is the deciphering of geneticvariants that are predictive of complex diseases. Modern studies, in the formof genome-wide association studies (GWAS) have afforded researchers with the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining this with methods other than univariate statistics is a challenging task requiring advanced algorithms that are scalable to the genome-wide level. In the future, next-generation sequencing studies (NGS) will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to have the ability to effectively create predictive models for various genotype-phenotype relationships. This work explores the problem of selecting genetic variant subsets that are the most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were demonstrated to not only be effective at predicting the disease phenotypes, but also doing so efficiently through the use of computational shortcuts. While much of the work was able to be run on high-end desktops, some work was further extended so that it could be implemented on parallel computers helping to assure that they will also scale to the NGS data sets. Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm. It was shown that there is no universally optimal algorithm for variant selection in GWAS, but rather methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can result in overly-optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype–phenotype relationships and biological insights from genetic data sets.
Resumo:
In this study, feature selection in classification based problems is highlighted. The role of feature selection methods is to select important features by discarding redundant and irrelevant features in the data set, we investigated this case by using fuzzy entropy measures. We developed fuzzy entropy based feature selection method using Yu's similarity and test this using similarity classifier. As the similarity classifier we used Yu's similarity, we tested our similarity on the real world data set which is dermatological data set. By performing feature selection based on fuzzy entropy measures before classification on our data set the empirical results were very promising, the highest classification accuracy of 98.83% was achieved when testing our similarity measure to the data set. The achieved results were then compared with some other results previously obtained using different similarity classifiers, the obtained results show better accuracy than the one achieved before. The used methods helped to reduce the dimensionality of the used data set, to speed up the computation time of a learning algorithm and therefore have simplified the classification task
Resumo:
Electricity price forecasting has become an important area of research in the aftermath of the worldwide deregulation of the power industry that launched competitive electricity markets now embracing all market participants including generation and retail companies, transmission network providers, and market managers. Based on the needs of the market, a variety of approaches forecasting day-ahead electricity prices have been proposed over the last decades. However, most of the existing approaches are reasonably effective for normal range prices but disregard price spike events, which are caused by a number of complex factors and occur during periods of market stress. In the early research, price spikes were truncated before application of the forecasting model to reduce the influence of such observations on the estimation of the model parameters; otherwise, a very large forecast error would be generated on price spike occasions. Electricity price spikes, however, are significant for energy market participants to stay competitive in a market. Accurate price spike forecasting is important for generation companies to strategically bid into the market and to optimally manage their assets; for retailer companies, since they cannot pass the spikes onto final customers, and finally, for market managers to provide better management and planning for the energy market. This doctoral thesis aims at deriving a methodology able to accurately predict not only the day-ahead electricity prices within the normal range but also the price spikes. The Finnish day-ahead energy market of Nord Pool Spot is selected as the case market, and its structure is studied in detail. It is almost universally agreed in the forecasting literature that no single method is best in every situation. Since the real-world problems are often complex in nature, no single model is able to capture different patterns equally well. Therefore, a hybrid methodology that enhances the modeling capabilities appears to be a possibly productive strategy for practical use when electricity prices are predicted. The price forecasting methodology is proposed through a hybrid model applied to the price forecasting in the Finnish day-ahead energy market. The iterative search procedure employed within the methodology is developed to tune the model parameters and select the optimal input set of the explanatory variables. The numerical studies show that the proposed methodology has more accurate behavior than all other examined methods most recently applied to case studies of energy markets in different countries. The obtained results can be considered as providing extensive and useful information for participants of the day-ahead energy market, who have limited and uncertain information for price prediction to set up an optimal short-term operation portfolio. Although the focus of this work is primarily on the Finnish price area of Nord Pool Spot, given the result of this work, it is very likely that the same methodology will give good results when forecasting the prices on energy markets of other countries.
Resumo:
The papermaking industry has been continuously developing intelligent solutions to characterize the raw materials it uses, to control the manufacturing process in a robust way, and to guarantee the desired quality of the end product. Based on the much improved imaging techniques and image-based analysis methods, it has become possible to look inside the manufacturing pipeline and propose more effective alternatives to human expertise. This study is focused on the development of image analyses methods for the pulping process of papermaking. Pulping starts with wood disintegration and forming the fiber suspension that is subsequently bleached, mixed with additives and chemicals, and finally dried and shipped to the papermaking mills. At each stage of the process it is important to analyze the properties of the raw material to guarantee the product quality. In order to evaluate properties of fibers, the main component of the pulp suspension, a framework for fiber characterization based on microscopic images is proposed in this thesis as the first contribution. The framework allows computation of fiber length and curl index correlating well with the ground truth values. The bubble detection method, the second contribution, was developed in order to estimate the gas volume at the delignification stage of the pulping process based on high-resolution in-line imaging. The gas volume was estimated accurately and the solution enabled just-in-time process termination whereas the accurate estimation of bubble size categories still remained challenging. As the third contribution of the study, optical flow computation was studied and the methods were successfully applied to pulp flow velocity estimation based on double-exposed images. Finally, a framework for classifying dirt particles in dried pulp sheets, including the semisynthetic ground truth generation, feature selection, and performance comparison of the state-of-the-art classification techniques, was proposed as the fourth contribution. The framework was successfully tested on the semisynthetic and real-world pulp sheet images. These four contributions assist in developing an integrated factory-level vision-based process control.
Resumo:
The purpose of this master’s thesis was to develop a method to be used in the selection of an optimal energy system for buildings and districts. The term optimal energy system was defined as the energy system which best fulfils the requirements of the stakeholder on whose preferences the energy systems are evaluated. The most influential stakeholder in the process of selecting an energy system was considered to be the district developer. The selection method consisted of several steps: Definition of the district, calculating the energy consumption of the district and buildings within the district, defining suitable energy system alternatives for the district, definition of the comparing criteria, calculating the parameters of the comparing criteria for each energy system alternative and finally using a multi-criteria decision method to rank the alternatives. For the purposes of the selection method, the factors affecting the energy consumption of buildings and districts and technologies enabling the use of renewable energy were reviewed. The key element of the selection method was a multi-criteria decision making method, PROMETHEE II. In order to compare the energy system alternatives with the developed method, the comparing criteria were defined in the study. The criteria included costs, environmental impacts and technological and technical characteristics of the energy systems. Each criterion was given an importance, based on a questionnaire which was sent for the steering groups of two district development projects. The selection method was applied in two case study analyses. The results indicate that the selection method provides a viable and easy way to provide the decision makers alternatives and recommendations regarding the selection of an energy system. Since the comparison is carried out by changing the alternatives into numeric form, the presented selection method was found to exclude any unjustified preferences over certain energy systems alternatives which would affect the selection.