44 results for Feature selection
in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Abstract:
In this study, feature selection in classification problems is highlighted. The role of feature selection methods is to select important features by discarding redundant and irrelevant features from the data set; we investigated this case using fuzzy entropy measures. We developed a fuzzy entropy based feature selection method using Yu's similarity and tested it with a similarity classifier. Using Yu's similarity as the similarity classifier, we tested our measure on a real-world dermatological data set. By performing feature selection based on fuzzy entropy measures before classification, the empirical results were very promising: the highest classification accuracy of 98.83% was achieved. The achieved results were then compared with results previously obtained using other similarity classifiers, and they show better accuracy than those achieved before. The methods used helped to reduce the dimensionality of the data set and to speed up the computation time of the learning algorithm, and therefore simplified the classification task.
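The fuzzy entropy idea can be sketched as follows. This is a minimal illustration assuming De Luca-Termini entropy over similarity values, with a simple distance-based similarity standing in for Yu's similarity measure used in the thesis; features whose similarity values are crisp (near 0 or 1) get low entropy and are ranked as informative.

```python
import numpy as np

def fuzzy_entropy(mu, eps=1e-12):
    """De Luca-Termini fuzzy entropy of membership values mu in (0, 1)."""
    mu = np.clip(mu, eps, 1 - eps)
    return float(-np.sum(mu * np.log(mu) + (1 - mu) * np.log(1 - mu)))

def rank_features_by_entropy(X, y):
    """Rank features by fuzzy entropy, most informative (lowest entropy) first.

    Similarities are computed against per-class feature means; this simple
    measure is an illustrative stand-in for Yu's similarity.
    """
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # scale to [0, 1]
    entropies = []
    for j in range(X.shape[1]):
        sims = []
        for c in np.unique(y):
            ideal = X[y == c, j].mean()                  # class "ideal" value
            sims.append(1.0 - np.abs(X[:, j] - ideal))   # similarity in [0, 1]
        entropies.append(fuzzy_entropy(np.concatenate(sims)))
    return np.argsort(entropies)
```

Feature selection then keeps the first k indices of the returned ranking before handing the reduced data to the classifier.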
Abstract:
Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is the deciphering of genetic variants that are predictive of complex diseases. Modern studies, in the form of genome-wide association studies (GWAS), have afforded researchers the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining this data with methods other than univariate statistics is a challenging task requiring advanced algorithms that are scalable to the genome-wide level. In the future, next-generation sequencing (NGS) studies will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to effectively create predictive models for various genotype-phenotype relationships. This work explores the problem of selecting genetic variant subsets that are the most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper, and embedded algorithms. The examined machine learning algorithms were demonstrated not only to be effective at predicting the disease phenotypes, but also to do so efficiently through the use of computational shortcuts. While much of the work could be run on high-end desktops, some of it was further extended for parallel computers, helping to ensure that the methods will also scale to NGS data sets. Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm.
It was shown that there is no universally optimal algorithm for variant selection in GWAS; rather, methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can yield overly optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype-phenotype relationships and biological insights from genetic data sets.
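The nested cross-validation point can be illustrated with a short sketch using scikit-learn. The estimator, parameter grid, and synthetic data here are illustrative stand-ins, not the thesis's actual GWAS setup; the key idea is that hyperparameter tuning happens entirely inside each outer fold, so the outer score is unbiased by the search.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for a genotype matrix: 200 samples, 50 features.
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=5, random_state=0)

# Inner loop: tunes the regularization strength C on each training fold.
inner = GridSearchCV(LogisticRegression(max_iter=1000),
                     {"C": [0.01, 0.1, 1.0, 10.0]}, cv=3)

# Outer loop: scores the whole tuning procedure on held-out folds, so the
# reported accuracy reflects generalization, not the hyperparameter search.
scores = cross_val_score(inner, X, y, cv=5)
print(scores.mean())
```

Reporting the inner loop's best score instead of `scores.mean()` is exactly the overly optimistic shortcut the abstract warns against.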
Abstract:
The papermaking industry has been continuously developing intelligent solutions to characterize the raw materials it uses, to control the manufacturing process in a robust way, and to guarantee the desired quality of the end product. Thanks to much-improved imaging techniques and image-based analysis methods, it has become possible to look inside the manufacturing pipeline and propose more effective alternatives to human expertise. This study focuses on the development of image analysis methods for the pulping process of papermaking. Pulping starts with wood disintegration and forming the fiber suspension, which is subsequently bleached, mixed with additives and chemicals, and finally dried and shipped to the papermaking mills. At each stage of the process it is important to analyze the properties of the raw material to guarantee the product quality. In order to evaluate the properties of fibers, the main component of the pulp suspension, a framework for fiber characterization based on microscopic images is proposed in this thesis as the first contribution. The framework allows computation of fiber length and curl index, correlating well with the ground truth values. The bubble detection method, the second contribution, was developed to estimate the gas volume at the delignification stage of the pulping process based on high-resolution in-line imaging. The gas volume was estimated accurately and the solution enabled just-in-time process termination, whereas the accurate estimation of bubble size categories remained challenging. As the third contribution of the study, optical flow computation was studied and the methods were successfully applied to pulp flow velocity estimation based on double-exposed images.
Finally, a framework for classifying dirt particles in dried pulp sheets, including the semisynthetic ground truth generation, feature selection, and performance comparison of the state-of-the-art classification techniques, was proposed as the fourth contribution. The framework was successfully tested on the semisynthetic and real-world pulp sheet images. These four contributions assist in developing an integrated factory-level vision-based process control.
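As an illustration of the fiber characterization idea, the curl index is commonly defined as the ratio of a fiber's contour length to its end-to-end distance, minus one. The sketch below assumes that common definition, which may differ in detail from the thesis's implementation, and takes a fiber already traced as an ordered list of pixel coordinates.

```python
import numpy as np

def curl_index(points):
    """Curl index of a fiber traced as an ordered (N, 2) array of coordinates.

    Contour length divided by end-to-end distance, minus one: a straight
    fiber gives 0, and curlier fibers give larger values.
    """
    points = np.asarray(points, dtype=float)
    contour = np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1))
    span = np.linalg.norm(points[-1] - points[0])
    return contour / span - 1.0
```

In a full pipeline the traced centerline would come from segmenting the microscopic image; `contour` also directly gives the fiber length measurement mentioned in the abstract.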
Abstract:
Electricity price forecasting has become an important area of research in the aftermath of the worldwide deregulation of the power industry, which launched competitive electricity markets now embracing all market participants, including generation and retail companies, transmission network providers, and market managers. Based on the needs of the market, a variety of approaches for forecasting day-ahead electricity prices have been proposed over the last decades. However, most of the existing approaches are reasonably effective for prices in the normal range but disregard price spike events, which are caused by a number of complex factors and occur during periods of market stress. In early research, price spikes were truncated before application of the forecasting model to reduce the influence of such observations on the estimation of the model parameters; otherwise, a very large forecast error would be generated on price spike occasions. Electricity price spikes, however, are significant for energy market participants seeking to stay competitive. Accurate price spike forecasting is important for generation companies, to bid strategically into the market and to manage their assets optimally; for retail companies, since they cannot pass the spikes on to final customers; and for market managers, to provide better management and planning for the energy market. This doctoral thesis aims at deriving a methodology able to accurately predict not only the day-ahead electricity prices within the normal range but also the price spikes. The Finnish day-ahead energy market of Nord Pool Spot is selected as the case market, and its structure is studied in detail. It is almost universally agreed in the forecasting literature that no single method is best in every situation. Since real-world problems are often complex in nature, no single model is able to capture different patterns equally well.
Therefore, a hybrid methodology that enhances the modeling capabilities appears to be a productive strategy for practical use when electricity prices are predicted. The price forecasting methodology is proposed through a hybrid model applied to price forecasting in the Finnish day-ahead energy market. The iterative search procedure employed within the methodology is developed to tune the model parameters and select the optimal input set of explanatory variables. The numerical studies show that the proposed methodology is more accurate than all other examined methods recently applied to case studies of energy markets in different countries. The obtained results provide extensive and useful information for participants of the day-ahead energy market, who have limited and uncertain information for price prediction when setting up an optimal short-term operation portfolio. Although the focus of this work is primarily on the Finnish price area of Nord Pool Spot, given the results of this work, it is very likely that the same methodology will give good results when forecasting prices on the energy markets of other countries.
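The iterative input-selection idea can be sketched as a greedy forward search over candidate explanatory variables. The plain least-squares forecaster and single holdout split below are illustrative stand-ins for the thesis's hybrid model and tuning procedure: at each step the variable that most reduces validation error is added, and the search stops when no candidate improves.

```python
import numpy as np

def forward_select(X, y, max_vars=3):
    """Greedily pick input variables that reduce holdout forecast error."""
    n = X.shape[0]
    tr, va = slice(0, n // 2), slice(n // 2, n)   # simple holdout split
    chosen, best_err = [], np.inf
    while len(chosen) < max_vars:
        errs = {}
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            cols = chosen + [j]
            # Fit a least-squares model (with intercept) on the training half.
            A = np.column_stack([X[tr][:, cols], np.ones(n // 2)])
            w, *_ = np.linalg.lstsq(A, y[tr], rcond=None)
            pred = np.column_stack([X[va][:, cols], np.ones(n - n // 2)]) @ w
            errs[j] = np.mean((pred - y[va]) ** 2)
        j_best = min(errs, key=errs.get)
        if errs[j_best] >= best_err:
            break                                  # no improvement: stop
        chosen.append(j_best)
        best_err = errs[j_best]
    return chosen
```

In the thesis the same loop additionally tunes model parameters; here only the input set is searched, to keep the sketch short.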
Abstract:
Report: The embryo transfer breeding programme "ASMO", its objectives, and a summary of the results of the initial selection
Abstract:
Summary
Abstract:
The objective of this Master's thesis is to identify, for the Finnish sawmill machinery manufacturer Veisto Oy, the most significant near-future market areas, that is, those whose sawmill industries will receive the largest high-technology investments in the coming years. The market areas are selected using ranking methods based on both numerical statistics and expert interviews. The first part of the thesis discusses the characteristics of international industrial markets and their analysis, with the main emphasis on screening methods, market-area comparison methods, and decision-making tools. The second part focuses on screening and analyzing the market areas and comparing the characteristics of the countries. Using decision matrices, the three currently most attractive market areas are selected for Veisto Oy: Russia, the Southern Yellow Pine region of the southeastern USA, and the largest sawmilling countries of South America (Brazil, Argentina, and Chile) treated as a single region. Each selected area poses its own challenges for Veisto: in the USA, strong domestic competitors and obtaining new references; in Russia, investment uncertainty and the diversity brought by the sheer size of the market area; and in South America, strong Swedish competitors as well as substantial protective tariffs, particularly in Brazil.
Abstract:
The objective of this study was to create guidelines for supplier selection and performance evaluation for the case company, Exel Oyj. The guidelines were intended to serve as a starting point for developing the supplier selection and performance evaluation processes. The study focuses on presenting supplier selection criteria and supplier performance evaluation criteria. The criteria were selected and analyzed with the help of theory and empirical data, and clear lists of the criteria were compiled. These lists were used when considering new selection and performance evaluation criteria that the case company can use in the future. The study also covered the supplier selection process as well as tools and metrics related to supplier evaluation. The empirical data was collected by interviewing the purchasing manager and by gathering information from the annual report and the company's website. The study resulted in lists of criteria that the company can utilize in the future, as well as lists of criteria preliminarily selected for the company's use.
Abstract:
The objective of the dissertation is to increase understanding and knowledge in the field where group decision support system (GDSS) and technology selection research overlap in the strategic sense. The purpose is to develop pragmatic, unique, and competent management practices and processes for strategic technology assessment and selection from the whole company's point of view. The combination of the GDSS and technology selection is approached from the points of view of the core competence concept, the lead user method, and different technology types. In this research the aim is to find out how the GDSS contributes to the technology selection process, what aspects should be considered when selecting technologies to be developed or acquired, and what advantages and restrictions the GDSS has in the selection processes. These research objectives are discussed on the basis of experiences and findings in real-life selection meetings. The research has been mainly carried out with constructive, case study research methods. The study contributes novel ideas to the present knowledge and prior literature on the GDSS and technology selection arena. Academic and pragmatic research has been conducted in four areas: 1) the potential benefits of the group support system with the lead user method, where the need assessment process is positioned as information gathering for the selection of wireless technology development projects; 2) integrated technology selection and core competencies management processes both in theory and in practice; 3) potential benefits of the group decision support system in the technology selection processes of different technology types; and 4) linkages between technology selection and R&D project selection in innovative product development networks. New knowledge and understanding has been created on the practical utilization of the GDSS in technology selection decisions.
The study demonstrates that technology selection requires close cooperation between different departments, functions, and strategic business units in order to gather the best knowledge for the decision making. The GDSS is shown to be an effective way to promote communication and cooperation between the selectors. The constructs developed in this study have been tested in many industry fields, for example in information and communication, forest, telecommunication, metal, software, and miscellaneous industries, as well as in non-profit organizations. The pragmatic results in these organizations are among the most relevant proofs confirming the scientific contribution of the study, according to the principles of the constructive research approach.
Abstract:
This thesis is about the detection of local image features. The research topic belongs to the wider area of object detection, a machine vision and pattern recognition problem where an object must be detected (located) in an image. State-of-the-art object detection methods often divide the problem into separate interest point detection and local image description steps, but in this thesis a different technique is used, leading to higher-quality image features that enable more precise localization. Instead of using interest point detection, the landmark positions are marked manually. Therefore, the quality of the image features is not limited by the interest point detection phase and the learning of image features is simplified. The approach combines both interest point detection and local description into one detection phase. Computational efficiency of the descriptor is therefore important, ruling out many of the commonly used descriptors as too heavy. Multiresolution Gabor features have been the main descriptor in this thesis, and improving their efficiency is a significant part of the work. Actual image features are formed from descriptors by using a classifier which can then recognize similar-looking patches in new images. The main classifier is based on Gaussian mixture models. Classifiers are used in a one-class configuration where there are only positive training samples without an explicit background class. The local image feature detection method has been tested with two freely available face detection databases and a proprietary license plate database. The localization performance was very good in these experiments. Other applications based on the same underlying techniques are also presented, including object categorization and fault detection.
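The one-class classifier configuration can be sketched with scikit-learn's GaussianMixture: fit the mixture on positive samples only, then accept a new sample when its log-likelihood exceeds a threshold chosen from the training scores. The synthetic descriptors and the percentile threshold below are illustrative choices, not the thesis's actual Gabor-feature setup.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in for descriptor vectors extracted at positive landmark positions.
positives = rng.normal(loc=0.0, scale=1.0, size=(500, 8))

# Fit the GMM on positives only; no background class is modeled.
gmm = GaussianMixture(n_components=3, random_state=0).fit(positives)

# Threshold at the 5th percentile of training log-likelihoods, so roughly
# 95% of positive training samples would be accepted.
threshold = np.percentile(gmm.score_samples(positives), 5)

def is_target(x):
    """Accept a descriptor if it is likely enough under the positive model."""
    return gmm.score_samples(np.atleast_2d(x))[0] >= threshold
```

At detection time, `is_target` would be evaluated on descriptors computed over image positions, flagging patches that look like the learned feature.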
Abstract:
Perceiving the world visually is a basic act for humans, but for computers it is still an unsolved problem. The variability present in natural environments is an obstacle for effective computer vision. The goal of invariant object recognition is to recognise objects in a digital image despite variations in, for example, pose, lighting, or occlusion. In this study, invariant object recognition is considered from the viewpoint of feature extraction. The differences between local and global features are studied, with emphasis on Hough transform and Gabor filtering based feature extraction. The methods are examined with respect to four capabilities: generality, invariance, stability, and efficiency. Invariant features are presented using both the Hough transform and Gabor filtering. A modified Hough transform technique is also presented in which the distortion tolerance is increased by incorporating local information. In addition, methods for decreasing the computational cost of the Hough transform employing parallel processing and local information are introduced.
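The classical Hough line transform at the core of these methods can be shown in a few lines; the (rho, theta) discretization below is an illustrative choice. Each edge pixel votes for every line rho = x*cos(theta) + y*sin(theta) passing through it, and peaks in the accumulator correspond to lines in the image.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Accumulate Hough votes for a boolean edge map.

    Returns the accumulator (indexed by rho + diag, theta), the sampled
    theta values, and the rho offset diag.
    """
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)  # rho in [-diag, diag]
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        # One vote per theta bin: the rho of the line through (x, y).
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1
    return acc, thetas, diag
```

The quadratic voting cost per pixel is exactly what motivates the parallel-processing and local-information speedups mentioned in the abstract.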
Abstract:
The objective of this study was to determine criteria for selecting a new market for an industrial product. The study focused on established approaches to international market selection and sought to apply one method in practice in the empirical part by means of a case study. The research approach was exploratory and based on secondary analysis. The data sources used were largely secondary, yielding qualitative data; however, interviews were also conducted. A comprehensive literature review of the known theoretical approaches to international market selection formed part of the study, and the three most important approaches were presented in more detail. One of them, the non-systematic approach, formed the framework for the empirical part. The empirical part sought to apply one of the non-systematic models in the international paper industry. The aim was to identify the most attractive countries for possible marketing activities in one end-use area of the product. Climatic conditions, poultry headcount, and poultry growth rate were used as filters to reduce the number of candidate countries. The empirical part of the study clearly suffered from a lack of relevant data; thus, the reliability and validity of the study can also be questioned to some extent.