19 results for correlation-based feature selection
in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Abstract:
This study addresses feature selection in classification problems. Feature selection methods keep the important features and discard redundant and irrelevant ones from the data set; we investigated this task using fuzzy entropy measures. We developed a fuzzy entropy based feature selection method using Yu's similarity and tested it with a similarity classifier, also based on Yu's similarity, on a real-world dermatological data set. Performing feature selection based on fuzzy entropy measures before classification gave very promising empirical results: the highest classification accuracy achieved with our similarity measure on this data set was 98.83%. Compared with results previously obtained using different similarity classifiers, the obtained results show better accuracy. The methods used helped to reduce the dimensionality of the data set and to speed up the computation time of the learning algorithm, and therefore simplified the classification task.
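The idea of ranking features by fuzzy entropy can be illustrated with a minimal sketch. It is not the method of the thesis: it assumes features scaled to [0, 1], uses per-class feature means as hypothetical "ideal vectors", and substitutes the simple similarity 1 - |x - v| for Yu's similarity. The entropy is the De Luca-Termini measure; the function names are illustrative only.

```python
import math

def de_luca_termini_entropy(memberships):
    """Fuzzy entropy H = -sum(mu*log(mu) + (1-mu)*log(1-mu)), with 0*log(0) taken as 0."""
    total = 0.0
    for mu in memberships:
        for p in (mu, 1.0 - mu):
            if p > 0.0:
                total -= p * math.log(p)
    return total

def rank_features_by_fuzzy_entropy(samples, labels):
    """Return (feature indices sorted from lowest to highest entropy, entropies).

    samples: list of feature vectors, each value scaled to [0, 1]
    labels:  class label for each sample
    """
    n_features = len(samples[0])
    classes = sorted(set(labels))
    # Ideal vector per class: the per-feature class mean (an assumption of this sketch).
    ideals = {}
    for c in classes:
        rows = [x for x, y in zip(samples, labels) if y == c]
        ideals[c] = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
    entropies = []
    for j in range(n_features):
        # Similarity of every sample to every class ideal, on feature j only.
        # 1 - |x - v| is a stand-in similarity, not Yu's similarity.
        sims = [1.0 - abs(x[j] - ideals[c][j]) for x in samples for c in classes]
        entropies.append(de_luca_termini_entropy(sims))
    order = sorted(range(n_features), key=lambda j: entropies[j])
    return order, entropies
```

In this sketch, similarities near 0 or 1 give low entropy (the feature separates the classes crisply), while similarities near 0.5 give high entropy, so the features at the end of the ranking are the candidates for removal.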
Abstract:
Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is the deciphering of genetic variants that are predictive of complex diseases. Modern studies, in the form of genome-wide association studies (GWAS), have afforded researchers the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining this with methods other than univariate statistics is a challenging task requiring advanced algorithms that are scalable to the genome-wide level. In the future, next-generation sequencing (NGS) studies will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to be able to create effective predictive models for various genotype-phenotype relationships. This work explores the problem of selecting genetic variant subsets that are the most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were demonstrated not only to be effective at predicting the disease phenotypes, but also to do so efficiently through the use of computational shortcuts. While much of the work could be run on high-end desktops, some work was further extended so that it could be implemented on parallel computers, helping to ensure that the methods will also scale to NGS data sets. Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm.
It was shown that there is no universally optimal algorithm for variant selection in GWAS, but rather methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can result in overly-optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype–phenotype relationships and biological insights from genetic data sets.
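The point about nested cross-validation can be made concrete with a small sketch. It is not the thesis pipeline: the univariate filter (absolute difference of class means), the nearest-centroid classifier and all function names are assumptions chosen to keep the example dependency-free. What matters is that both feature ranking and the choice of the number of features use only the outer training data, so the outer test folds give an estimate that is not inflated by the selection step.

```python
import random

def rank_features(X, y):
    """Univariate filter: rank features by |mean(class 1) - mean(class 0)|."""
    n = len(X[0])
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    scores = [abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))
              for j in range(n)]
    return sorted(range(n), key=lambda j: -scores[j])

def fit_predict(X_tr, y_tr, X_te, k):
    """Select the top-k features on the training data only, then classify by nearest centroid."""
    feats = rank_features(X_tr, y_tr)[:k]
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X_tr, y_tr) if t == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    preds = []
    for x in X_te:
        d = {c: sum((x[j] - m) ** 2 for j, m in zip(feats, cents[c])) for c in (0, 1)}
        preds.append(min(d, key=d.get))
    return preds

def folds(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_accuracy(X, y, k_feats, n_folds, rng):
    """Plain k-fold accuracy for a fixed number of selected features."""
    correct = 0
    for test in folds(len(X), n_folds, rng):
        held = set(test)
        train = [i for i in range(len(X)) if i not in held]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test], k_feats)
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(X)

def nested_cv(X, y, candidates, outer=5, inner=3, seed=0):
    """Outer loop estimates accuracy; inner loop picks the number of features."""
    rng = random.Random(seed)
    correct = 0
    for test in folds(len(X), outer, rng):
        held = set(test)
        train = [i for i in range(len(X)) if i not in held]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        # Inner loop: model selection sees the outer training data only.
        best_k = max(candidates, key=lambda k: cv_accuracy(X_tr, y_tr, k, inner, rng))
        preds = fit_predict(X_tr, y_tr, [X[i] for i in test], best_k)
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(X)
```

The sketch assumes binary labels 0 and 1 and that every training split contains both classes; a real GWAS pipeline would also need stratified folds and a far more scalable filter.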
Abstract:
The lack of research on private real estate is a well-known problem. Earlier studies have mostly concentrated on the USA or the UK. This master's thesis therefore offers more information about the performance and risk associated with private real estate investments in the Nordic countries, and especially in Finland. The thesis is divided into two independent sections based on the research questions. In the first section, a database analysis is performed to assess the risk-return ratio of direct real estate investment in the Nordic countries. Risk-return ratios are also assessed for different property sectors and economic regions. Finally, diversification strategies based on property sectors and economic regions are reviewed. However, standard deviation alone is usually not a sufficient method for evaluating the riskiness of private real estate. There is demand for a more explicit assessment of property risk. One solution is property risk scoring. In the second section, a risk scorecard based tool is built to make different real estate assets comparable in terms of risk. To do this, nine real estate professionals were interviewed to enhance the structure of the theory-based risk scorecard and to assess weights for the different risk factors.
Abstract:
Perceiving the world visually is a basic act for humans, but for computers it is still an unsolved problem. The variability present in natural environments is an obstacle for effective computer vision. The goal of invariant object recognition is to recognise objects in a digital image despite variations in, for example, pose, lighting or occlusion. In this study, invariant object recognition is considered from the viewpoint of feature extraction. The differences between local and global features are studied with emphasis on Hough transform and Gabor filtering based feature extraction. The methods are examined with respect to four capabilities: generality, invariance, stability, and efficiency. Invariant features are presented using both Hough transform and Gabor filtering. A modified Hough transform technique is also presented in which the distortion tolerance is increased by incorporating local information. In addition, methods for decreasing the computational costs of the Hough transform by employing parallel processing and local information are introduced.
Abstract:
The papermaking industry has been continuously developing intelligent solutions to characterize the raw materials it uses, to control the manufacturing process in a robust way, and to guarantee the desired quality of the end product. Thanks to much improved imaging techniques and image-based analysis methods, it has become possible to look inside the manufacturing pipeline and propose more effective alternatives to human expertise. This study focuses on the development of image analysis methods for the pulping process of papermaking. Pulping starts with wood disintegration and forming the fiber suspension, which is subsequently bleached, mixed with additives and chemicals, and finally dried and shipped to the papermaking mills. At each stage of the process it is important to analyze the properties of the raw material to guarantee the product quality. In order to evaluate the properties of fibers, the main component of the pulp suspension, a framework for fiber characterization based on microscopic images is proposed in this thesis as the first contribution. The framework allows computation of fiber length and curl index that correlate well with the ground truth values. The bubble detection method, the second contribution, was developed in order to estimate the gas volume at the delignification stage of the pulping process based on high-resolution in-line imaging. The gas volume was estimated accurately and the solution enabled just-in-time process termination, whereas the accurate estimation of bubble size categories remained challenging. As the third contribution of the study, optical flow computation was studied and the methods were successfully applied to pulp flow velocity estimation based on double-exposed images.
Finally, a framework for classifying dirt particles in dried pulp sheets, including the semisynthetic ground truth generation, feature selection, and performance comparison of the state-of-the-art classification techniques, was proposed as the fourth contribution. The framework was successfully tested on the semisynthetic and real-world pulp sheet images. These four contributions assist in developing an integrated factory-level vision-based process control.
Abstract:
Electricity price forecasting has become an important area of research in the aftermath of the worldwide deregulation of the power industry that launched competitive electricity markets now embracing all market participants including generation and retail companies, transmission network providers, and market managers. Based on the needs of the market, a variety of approaches forecasting day-ahead electricity prices have been proposed over the last decades. However, most of the existing approaches are reasonably effective for normal range prices but disregard price spike events, which are caused by a number of complex factors and occur during periods of market stress. In the early research, price spikes were truncated before application of the forecasting model to reduce the influence of such observations on the estimation of the model parameters; otherwise, a very large forecast error would be generated on price spike occasions. Electricity price spikes, however, are significant for energy market participants to stay competitive in a market. Accurate price spike forecasting is important for generation companies to strategically bid into the market and to optimally manage their assets; for retailer companies, since they cannot pass the spikes onto final customers, and finally, for market managers to provide better management and planning for the energy market. This doctoral thesis aims at deriving a methodology able to accurately predict not only the day-ahead electricity prices within the normal range but also the price spikes. The Finnish day-ahead energy market of Nord Pool Spot is selected as the case market, and its structure is studied in detail. It is almost universally agreed in the forecasting literature that no single method is best in every situation. Since the real-world problems are often complex in nature, no single model is able to capture different patterns equally well. 
Therefore, a hybrid methodology that enhances the modeling capabilities appears to be a possibly productive strategy for practical use when electricity prices are predicted. The price forecasting methodology is proposed through a hybrid model applied to the price forecasting in the Finnish day-ahead energy market. The iterative search procedure employed within the methodology is developed to tune the model parameters and select the optimal input set of the explanatory variables. The numerical studies show that the proposed methodology has more accurate behavior than all other examined methods most recently applied to case studies of energy markets in different countries. The obtained results can be considered as providing extensive and useful information for participants of the day-ahead energy market, who have limited and uncertain information for price prediction to set up an optimal short-term operation portfolio. Although the focus of this work is primarily on the Finnish price area of Nord Pool Spot, given the result of this work, it is very likely that the same methodology will give good results when forecasting the prices on energy markets of other countries.
Abstract:
Feature extraction is the part of pattern recognition where the sensor data is transformed into a form more suitable for the machine to interpret. The purpose of this step is also to reduce the amount of information passed to the next stages of the system while preserving the information essential for discriminating the data into different classes. For instance, in image analysis the actual image intensities are vulnerable to various environmental effects, such as lighting changes, and feature extraction can be used as a means of detecting features that are invariant to certain types of illumination changes. Finally, classification tries to make decisions based on the previously transformed data. The main focus of this thesis is on developing new methods for embedded feature extraction based on local non-parametric image descriptors. Feature analysis is also carried out for the selected image features. Low-level Local Binary Pattern (LBP) based features play a main role in the analysis. In the embedded domain, the pattern recognition system must usually meet strict performance constraints, such as high speed, compact size and low power consumption. The characteristics of the final system can be seen as a trade-off between these metrics, which is largely affected by the decisions made during the implementation phase. The implementation alternatives of LBP based feature extraction are explored in the embedded domain in the context of focal-plane vision processors. In particular, the thesis demonstrates LBP extraction with the MIPA4k massively parallel focal-plane processor IC. Higher level processing is also incorporated, by means of a framework for implementing a single chip face recognition system. Furthermore, a new method for determining optical flow based on LBPs, designed in particular for the embedded domain, is presented.
Inspired by some of the principles observed through the feature analysis of the Local Binary Patterns, an extension to the well-known non-parametric rank transform is proposed, and its performance is evaluated in face recognition experiments with a standard dataset. Finally, an a priori model where the LBPs are seen as combinations of n-tuples is also presented.
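For readers unfamiliar with Local Binary Patterns, the basic 3x3 operator can be sketched in a few lines. This is the textbook software formulation, not the focal-plane implementation of the thesis; the neighbour ordering, the `>=` comparison and the function names are conventions assumed here.

```python
def lbp_code(img, r, c):
    """8-bit LBP code of pixel (r, c): each neighbour >= centre sets one bit."""
    centre = img[r][c]
    # Neighbours in clockwise order starting at the top-left pixel
    # (the ordering is a convention; any fixed order works).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels of a grayscale image."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

Because the code depends only on the sign of intensity differences, adding a constant to every pixel (a uniform brightness change) leaves it unchanged, which is the kind of illumination invariance the abstract refers to.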
Abstract:
Demand for energy systems that combine high efficiency with the ability to harness renewable energy sources is a key issue in tackling the threat of global warming and saving natural resources. Organic Rankine cycle (ORC) technology has been identified as one of the most promising technologies for recovering low-grade heat sources and for harnessing renewable energy sources that cannot be efficiently utilized by means of more conventional power systems. The ORC is based on the working principle of the Rankine process, but an organic working fluid is adopted in the cycle instead of steam. This thesis presents numerical and experimental results of a study on the design of small-scale ORCs. Two main applications were selected for the thesis: waste heat recovery from small-scale diesel engines, concentrating on the utilization of the exhaust gas heat, and waste heat recovery in large industrial-scale engine power plants, considering the utilization of both the high and low temperature heat sources. The main objective of this work was to identify suitable working fluid candidates and to study the process and turbine design methods that can be applied when power plants based on the use of non-conventional working fluids are considered. The computational work included the use of thermodynamic analysis methods and turbine design methods that were based on the use of highly accurate fluid properties. In addition, the design and loss mechanisms in supersonic ORC turbines were studied by means of computational fluid dynamics. The results indicated that the design of an ORC is highly influenced by the selection of the working fluid and the cycle operational conditions. The results for the turbine designs indicated that the working fluid selection should not be based only on the thermodynamic analysis, but also requires consideration of the turbine design.
The turbines tend to be fast rotating, entailing small blade heights at the turbine rotor inlet and highly supersonic flow in the turbine flow passages, especially when power systems with low power outputs are designed. The results indicated that the ORC is a potential solution for utilizing waste heat streams at both high and low temperatures and in both micro and larger scale applications.
Abstract:
Estimating the relative orientation and position of a camera is one of the integral topics in the field of computer vision. The accuracy of a certain Finnish technology company’s traffic sign inventory and localization process can be improved by utilizing this concept. The company’s localization process uses video data produced by a vehicle-mounted camera. The accuracy of the estimated traffic sign locations depends on the relative orientation between the camera and the vehicle. This thesis proposes a computer vision based software solution which can estimate a camera’s orientation relative to the movement direction of the vehicle by utilizing video data. The task was solved by using feature-based methods and open source software. When using simulated data sets, the camera orientation estimates had an absolute error of 0.31 degrees on average. The software solution can be integrated into the traffic sign localization pipeline of the company in question.
Abstract:
Localization, which is the ability of a mobile robot to estimate its position within its environment, is a key capability for autonomous operation of any mobile robot. This thesis presents a system for indoor coarse and global localization of a mobile robot based on visual information. The system is based on image matching and uses SIFT features as natural landmarks. Features extracted from training images are stored in a database for later use in localization. During localization, an image of the scene is captured using the on-board camera of the robot, features are extracted from the image and the best match is searched from the database. Feature matching is done using the k-d tree algorithm. Experimental results showed that localization accuracy increases with the number of training features used in the training database, while, on the other hand, an increasing number of features tended to have a negative impact on the computational time. For some parts of the environment the error rate was relatively high due to a strong correlation of features taken from those places across the environment.
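The k-d tree matching step can be illustrated with a minimal exact nearest-neighbour search. This is a generic sketch, not the system described in the thesis: descriptors are plain tuples, the tree is a nested `(point, left, right)` tuple, and the function names are assumptions. Real SIFT descriptors are 128-dimensional, where approximate variants of the search are commonly preferred for speed.

```python
def build_kdtree(points, depth=0):
    """Recursively build a k-d tree; each node is (point, left_subtree, right_subtree)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, query, depth=0, best=None):
    """Return the stored point closest to query (exact search with pruning)."""
    if node is None:
        return best
    point, left, right = node
    if best is None or sq_dist(query, point) < sq_dist(query, best):
        best = point
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the current best.
    if diff ** 2 < sq_dist(query, best):
        best = nearest(far, query, depth + 1, best)
    return best
```

A localization step in this style would build the tree once from the training descriptors and then call `nearest` for each descriptor extracted from the current camera image. The build cost and memory grow with the number of stored features, which is consistent with the trade-off between accuracy and computation time noted in the abstract.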
Abstract:
The use of insensitive munitions (IM) technology makes it possible to improve explosives safety and, at the same time, military capability. The Finnish Defence Forces decided to adopt IM technology in 2004, which has required explosives manufacturers to develop their products further. One part of the development of an explosives manufacturer's propelling charge product is the charge container manufactured by Exel Oyj, which is to be developed to meet the revised requirements. In this master's thesis, the requirements for an IM product were determined and a requirement profile for the charge container was drawn up, on the basis of which a literature survey of ballistic materials was carried out. The structural solutions for the charge container were based on a material selection carried out by comparing characteristic property values. The structural solutions were tested both ballistically and with mechanical test methods. In addition, correlation comparisons were made between the test methods, with promising results. As the end result of the project, a charge container solution was developed for Exel Oyj that helps the protected product meet the IM requirements. Based on the strength analysis performed on the structure, the container structure can be considered suitable for its intended use.
Abstract:
The aim of this master's thesis is to identify, from the viewpoint of the Finnish sawmill machinery manufacturer Veisto Oy, the most significant market areas of the near future, whose sawmill industries will see the most high-technology investment in the coming years. The market areas are selected using ranking methods based on both numerical statistics and expert interviews. The first part of the thesis deals with the characteristics of international industrial markets and their analysis. The main emphasis, however, is on screening methods, methods for comparing market areas, and decision-making tools. The second part focuses on screening and analysing the market areas and comparing the various characteristics of the countries. Using decision matrices, the three currently most attractive market areas for Veisto Oy are selected: Russia, the Southern Yellow Pine region of the south-eastern USA, and the largest sawmilling countries of South America (Brazil, Argentina and Chile) treated as one area. Each selected area poses its own challenges for Veisto: in the USA, strong domestic competitors and the difficulty of obtaining new references; in Russia, the uncertainty of investments and the diversity that comes with the size of the market area; and in South America, strong Swedish competitors and, especially in Brazil, substantial protective tariffs.
Abstract:
The surface roughness of paper is one of the quality criteria of paper. It is measured with devices that physically probe the paper surface and with optical devices. The measurements require laboratory conditions, but the paper industry would need faster measurements made directly on the production line. The surface roughness of paper can be expressed as a single roughness value for a sample. In this work, the sample is divided into significant regions, and a separate roughness value is computed for each region. Several methods have been used to measure roughness. In this work, a generally accepted statistical method is used in addition to the distance transform. In paper surface roughness measurement there has been a need to divide the analysed sample into regions according to roughness. The division makes it possible to delimit the regions of the sample that appear clearly rougher. The distance transform produces regions, which were analysed. These regions were merged into connected regions using different segmentation methods. Algorithms based on the PNN (Pairwise Nearest Neighbor) method and on merging neighbouring regions were used. An approach based on splitting and merging regions was also examined. Segmented images have usually been validated by human inspection. The approach of this work is to compare the generally accepted statistical method with the segmentation results. A high correlation between these results indicates successful segmentation. The results of the different experiments were compared with one another by hypothesis testing. Two sample series were analysed, measured with OptiTopo and with a profilometer. The initial parameters of the distance transform that were varied during the experiments were the number and location of the starting points. The same parameter changes were made for all algorithms used for merging regions. After the distance transform, the correlation was stronger for the samples measured with the profilometer than for those measured with OptiTopo.
For the segmented OptiTopo samples the correlation improved more strongly than for the profilometer samples. The correlation was best for the results produced by the PNN method.
Abstract:
Productivity and profitability are important concepts and measures describing the performance and success of a firm. We know that an increase in productivity decreases the costs per unit produced and leads to better profitability. This common knowledge is not, however, enough in the modern business environment. Productivity improvement is one means among others for increasing the profitability of actions. There are many means to increase productivity. The use of these means presupposes operative decisions, and these decisions presuppose information about the effects of these means. Productivity improvement actions are in general made at floor level with machines, cells, activities and human beings. Profitability is most meaningful at the level of the whole firm. It has been very difficult or even impossible to analyze closely enough the economic aspects of changes at floor level with the traditional costing systems. New ideas in accounting have only recently brought in elements which make it possible to consider these phenomena where they actually happen. The aim of this study is to support the selection of objects for productivity improvement, and to develop a method to analyze the effects of a productivity change in an activity on the profitability of a firm. A framework for systemizing the economic management of productivity improvement is developed in this study. This framework is a systematic two-stage way to analyze the effects of productivity improvement actions in an activity on the profitability of a firm. At the first stage of the framework, a simple selection method based on the worth, possibility and necessity of the improvement actions in each activity is presented. This method is called Urgency Analysis. In the second stage, it is analyzed how much a certain change of productivity in an activity affects the profitability of a firm.
A theoretical calculation model with which it is possible to analyze the effects of a productivity improvement in monetary values is presented. On the basis of this theoretical model, a tool is made for the analysis at the firm level. The usefulness of this framework was empirically tested with data from the profit center of one medium-sized Finnish firm which operates in the metal industry. It is concluded that the framework provides valuable information about the economic effects of productivity improvement for supporting management in their decision making.
Abstract:
The purpose of this thesis is to analyse activity-based costing (ABC) and possible modified versions of it in the engineering design context. Design engineers need cost information at their decision-making level, and the cost information should also have a strong future orientation. These demands are high because traditional management accounting has concentrated on the direct actual costs of the products. However, cost accounting has progressed, as ABC was introduced in the late 1980s and adopted widely by companies in the 1990s. ABC has been a success, but it has also drawn criticism. In some cases the ambitious ABC systems have become too complex to build, use and update. This study can be called an action-oriented case study with some normative features. In this thesis, theoretical concepts are assessed and allowed to unfold gradually through interaction with data from three cases. The theoretical starting points are ABC and the theory of the engineering design process (chapter 2). Concepts and research results from these theoretical approaches are summarized in two hypotheses (chapter 2.3). The hypotheses are analysed with two cases (chapter 3). After the two case analyses, the ABC part is extended to cover also other modern cost accounting methods, e.g. process costing and feature costing (chapter 4.1). The ideas from this second theoretical part are operationalized with the third case (chapter 4.2). The knowledge from the theory and the three cases is summarized in the created framework (chapter 4.3). With the created framework it is possible to analyse ABC and its modifications in the engineering design context. The framework collects the factors that guide the choice of the costing method to be used in engineering design. It also illuminates the contents of various ABC-related costing methods. However, the framework needs to be further tested.
On the basis of the three cases it can be said that ABC should be used cautiously when formulating cost information for engineering design. It is suitable when the manufacturing can be considered simple, or when the design engineers are not cost conscious, and in the beginning of the design process when doing adaptive or variant design. If the design engineers need cost information for the embodiment or detailed design, or if manufacturing can be considered complex, or when design engineers are cost conscious, the ABC has to be always evaluated critically.