16 resultados para classification and regression tree

em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Internet on elektronisen postin perusrakenne ja ollut tärkeä tiedonlähde akateemisille käyttäjille jo pitkään. Siitä on tullut merkittävä tietolähde kaupallisille yrityksille niiden pyrkiessä pitämään yhteyttä asiakkaisiinsa ja seuraamaan kilpailijoitansa. WWW:n kasvu sekä määrällisesti että sen moninaisuus on luonut kasvavan kysynnän kehittyneille tiedonhallintapalveluille. Tällaisia palveluja ovet ryhmittely ja luokittelu, tiedon löytäminen ja suodattaminen sekä lähteiden käytön personointi ja seuranta. Vaikka WWW:stä saatavan tieteellisen ja kaupallisesti arvokkaan tiedon määrä on huomattavasti kasvanut viime vuosina sen etsiminen ja löytyminen on edelleen tavanomaisen Internet hakukoneen varassa. Tietojen hakuun kohdistuvien kasvavien ja muuttuvien tarpeiden tyydyttämisestä on tullut monimutkainen tehtävä Internet hakukoneille. Luokittelu ja indeksointi ovat merkittävä osa luotettavan ja täsmällisen tiedon etsimisessä ja löytämisessä. Tämä diplomityö esittelee luokittelussa ja indeksoinnissa käytettävät yleisimmät menetelmät ja niitä käyttäviä sovelluksia ja projekteja, joissa tiedon hakuun liittyvät ongelmat on pyritty ratkaisemaan.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The main objective of the study is to form a framework that provides tools to recognise and classify items whose demand is not smooth but varies highly on size and/or frequency. The framework will then be combined with two other classification methods in order to form a three-dimensional classification model. Forecasting and inventory control of these abnormal demand items is difficult. Therefore another object of this study is to find out which statistical forecasting method is most suitable for forecasting of abnormal demand items. The accuracy of different methods is measured by comparing the forecast to the actual demand. Moreover, the study also aims at finding proper alternatives to the inventory control of abnormal demand items. The study is quantitative and the methodology is a case study. The research methods consist of theory, numerical data, current state analysis and testing of the framework in case company. The results of the study show that the framework makes it possible to recognise and classify the abnormal demand items. It is also noticed that the inventory performance of abnormal demand items differs significantly from the performance of smoothly demanded items. This makes the recognition of abnormal demand items very important.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fluent health information flow is critical for clinical decision-making. However, a considerable part of this information is free-form text and inabilities to utilize it create risks to patient safety and cost-­effective hospital administration. Methods for automated processing of clinical text are emerging. The aim in this doctoral dissertation is to study machine learning and clinical text in order to support health information flow.First, by analyzing the content of authentic patient records, the aim is to specify clinical needs in order to guide the development of machine learning applications.The contributions are a model of the ideal information flow,a model of the problems and challenges in reality, and a road map for the technology development. Second, by developing applications for practical cases,the aim is to concretize ways to support health information flow. Altogether five machine learning applications for three practical cases are described: The first two applications are binary classification and regression related to the practical case of topic labeling and relevance ranking.The third and fourth application are supervised and unsupervised multi-class classification for the practical case of topic segmentation and labeling.These four applications are tested with Finnish intensive care patient records.The fifth application is multi-label classification for the practical task of diagnosis coding. It is tested with English radiology reports.The performance of all these applications is promising. Third, the aim is to study how the quality of machine learning applications can be reliably evaluated.The associations between performance evaluation measures and methods are addressed,and a new hold-out method is introduced.This method contributes not only to processing time but also to the evaluation diversity and quality. The main conclusion is that developing machine learning applications for text requires interdisciplinary, international collaboration. Practical cases are very different, and hence the development must begin from genuine user needs and domain expertise. The technological expertise must cover linguistics,machine learning, and information systems. Finally, the methods must be evaluated both statistically and through authentic user-feedback.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Machine learning provides tools for automated construction of predictive models in data intensive areas of engineering and science. The family of regularized kernel methods have in the recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is from a set of past observations to learn a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings, examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods, based on this approach, has in the past proven to be challenging. Moreover, it is not clear what techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how the techniques can be implemented efficiently. The contributions of this thesis are as follows. First, we develop RankRLS, a computationally efficient kernel method for learning to rank, that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results, Part II consists of the five original research articles that are the main contribution of this thesis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Learning of preference relations has recently received significant attention in machine learning community. It is closely related to the classification and regression analysis and can be reduced to these tasks. However, preference learning involves prediction of ordering of the data points rather than prediction of a single numerical value as in case of regression or a class label as in case of classification. Therefore, studying preference relations within a separate framework facilitates not only better theoretical understanding of the problem, but also motivates development of the efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, natural language processing, etc. For example, algorithms that learn to rank are frequently used in search engines for ordering documents retrieved by the query. Preference learning methods have been also applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from well founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, contribution of this thesis is twofold: kernel functions for structured data that are used to take advantage of various non-vectorial data representations and the preference learning algorithms that are suitable for different tasks, namely efficient learning of preference relations, learning with large amount of training data, and semi-supervised preference learning. Proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics domain. Training of kernel-based ranking algorithms can be infeasible when the size of the training set is large. This problem is addressed by proposing a preference learning algorithm whose computation complexity scales linearly with the number of training data points. We also introduce sparse approximation of the algorithm that can be efficiently trained with large amount of data. For situations when small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the problem of the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, proposed algorithms lead to notably better performance in many preference learning tasks considered.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Perinteisesti ajoneuvojen markkinointikampanjoissa kohderyhmät muodostetaan yksinkertaisella kriteeristöllä koskien henkilön tai hänen ajoneuvonsa ominaisuuksia. Ennustavan analytiikan avulla voidaan tuottaa kohderyhmänmuodostukseen teknisesti kompleksisia mutta kuitenkin helppokäyttöisiä menetelmiä. Tässä työssä on sovellettu luokittelu- ja regressiomenetelmiä uuden auton ostajien joukkoon. Tämän työn menetelmiksi on rajattu tukivektorikone sekä Coxin regressiomalli. Coxin regression avulla on tutkittu elinaika-analyysien soveltuvuutta ostotapahtuman tapahtumahetken mallintamiseen. Luokittelu tukivektorikonetta käyttäen onnistuu tehtävässään noin 72% tapauksissa. Tukivektoriregressiolla mallinnetun hankintahetken virheen keskiarvo on noin neljä kuukautta. Työn tulosten perusteella myös elinaika-analyysin käyttö ostotapahtuman tapahtumahetken mallintamiseen on menetelmänä käyttökelpoinen.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tropical forests are sources of many ecosystem services, but these forests are vanishing rapidly. The situation is severe in Sub-Saharan Africa and especially in Tanzania. The causes of change are multidimensional and strongly interdependent, and only understanding them comprehensively helps to change the ongoing unsustainable trends of forest decline. Ongoing forest changes, their spatiality and connection to humans and environment can be studied with the methods of Land Change Science. The knowledge produced with these methods helps to make arguments about the actors, actions and causes that are behind the forest decline. In this study of Unguja Island in Zanzibar the focus is in the current forest cover and its changes between 1996 and 2009. The cover and changes are measured with often used remote sensing methods of automated land cover classification and post-classification comparison from medium resolution satellite images. Kernel Density Estimation is used to determine the clusters of change, sub-area –analysis provides information about the differences between regions, while distance and regression analyses connect changes to environmental factors. These analyses do not only explain the happened changes, but also allow building quantitative and spatial future scenarios. Similar study has not been made for Unguja and therefore it provides new information, which is beneficial for the whole society. The results show that 572 km2 of Unguja is still forested, but 0,82–1,19% of these forests are disappearing annually. Besides deforestation also vertical degradation and spatial changes are significant problems. Deforestation is most severe in the communal indigenous forests, but also agroforests are decreasing. Spatially deforestation concentrates to the areas close to the coastline, population and Zanzibar Town. Biophysical factors on the other hand do not seem to influence the ongoing deforestation process. If the current trend continues there should be approximately 485 km2 of forests remaining in 2025. Solutions to these deforestation problems should be looked from sustainable land use management, surveying and protection of the forests in risk areas and spatially targeted self-sustainable tree planting schemes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Female sexual dysfunctions, including desire, arousal, orgasm and pain problems, have been shown to be highly prevalent among women around the world. The etiology of these dysfunctions is unclear but associations with health, age, psychological problems, and relationship factors have been identified. Genetic effects explain individual variation in orgasm function to some extent but until now quantitative behavior genetic analyses have not been applied to other sexual functions. In addition, behavior genetics can be applied to exploring the cause of any observed comorbidity between the dysfunctions. Discovering more about the etiology of the dysfunctions may further improve the classification systems which are currently under intense debate. The aims of the present thesis were to evaluate the psychometric properties of a Finnish-language version of a commonly used questionnaire for measuring female sexual function, the Female Sexual Function Index (FSFI), in order to investigate prevalence, comorbidity, and classification, and to explore the balance of genetic and environmental factors in the etiology as well as the associations of a number of biopsychosocial factors with female sexual functions. Female sexual functions were studied through survey methods in a population based sample of Finnish twins and their female siblings. There were two waves of data collection. The first data collection targeted 5,000 female twins aged 33–43 years and the second 7,680 female twins aged 18–33 and their over 18–year-old female siblings (n = 3,983). There was no overlap between the data collections. The combined overall response rate for both data collections was 53% (n = 8,868), with a better response rate in the second (57%) compared to the first (45%). In order to measure female sexual function, the FSFI was used. It includes 19 items which measure female sexual function during the previous four weeks in six subdomains; desire, subjective arousal, lubrication, orgasm, sexual satisfaction, and pain. In line with earlier research in clinical populations, a six factor solution of the Finnish-language version of the FSFI received supported. The internal consistencies of the scales were good to excellent. Some questions about how to avoid overestimating the prevalence of extreme dysfunctions due to women being allocated the score of zero if they had had no sexual activity during the preceding four weeks were raised. The prevalence of female sexual dysfunctions per se ranged from 11% for lubrication dysfunction to 55% for desire dysfunction. The prevalence rates for sexual dysfunction with concomitant sexual distress, in other words, sexual disorders were notably lower ranging from 7% for lubrication disorder to 23% for desire disorder. The comorbidity between the dysfunctions was substantial most notably between arousal and lubrication dysfunction even if these two dysfunctions showed distinct patterns of associations with the other dysfunctions. Genetic influences on individual variation in the six subdomains of FSFI were modest but significant ranging from 3–11% for additive genetic effects and 5–18% for nonadditive genetic effects. The rest of the variation in sexual functions was explained by nonshared environmental influences. A correlated factor model, including additive and nonadditive genetic effects and nonshared environmental effects had the best fit. All in all, every correlation between the genetic factors was significant except between lubrication and pain. All correlations between the nonshared environment factors were significant showing that there is a substantial overlap in genetic and nonshared environmental influences between the dysfunctions. In general, psychological problems, poor satisfaction with the relationship, sexual distress, and poor partner compatibility were associated with more sexual dysfunctions. Age was confounded with relationship length but had over and above relationship length a negative effect on desire and sexual satisfaction and a positive effect on orgasm and pain functions. Alcohol consumption in general was associated with better desire, arousal, lubrication, and orgasm function. Women pregnant with their first child had fewer pain problems than nulliparous nonpregnant women. Multiparous pregnant women had more orgasm problems compared to multiparous nonpregnant women. Having children was associated with less orgasm and pain problems. The conclusions were that desire, subjective arousal, lubrication, orgasm, sexual satisfaction, and pain are separate entities that have distinct associations with a number of different biopsychosocial factors. However, there is also considerable comorbidity between the dysfunctions which are explained by overlap in additive genetic, nonadditive genetic and nonshared environmental influences. Sexual dysfunctions are highly prevalent and are not always associated with sexual distress and this relationship might be moderated by a good relationship and compatibility with partner. Regarding classification, the results supports separate diagnoses for subjective arousal and genital arousal as well as the inclusion of pain under sexual dysfunctions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mergers and acquisitions (M&A) have played very important role in restructuring the pulp and paper industry (PPI). The poor performance and fragmented nature of the industry, overcapacity problems, and globalisation have driven companies to consolidate. The objective of this thesis was to examine how PPI acquirers’ have performed subsequent M&As and whether the deal characteristics have had any impact on performance. Based on the results it seems that PPI companies have not been able to enhance their performance in the long run after M&As although the per-formance of acquiring firms has remained above the industry median, and deal characteristics or the amount of premiums paid do not seem to have had any effect. The statistical significance of the results was tested with change model and regression analysis. Performance was assessed with accrual, cash flow, and market based indicators. Results are congruent with behavioural theory: managers and investors seem to be overoptimistic in determining the synergies from M&As.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Finland’s rural landscape has gone through remarkable changes from the 1950’s, due to agricultural developments. Changed farming practices have influenced especially traditional landscape management, and modifications in the arable land structure and grasslands transitions are notable. The review of the previous studies reveal the importance of the rural landscape composition and structure to species and landscape diversity, whereas including the relevance in presence of the open ditches, size of the field and meadow patches, topology of the natural and agricultural landscape. This land-change study includes applying remote sensed data from two time series and empirical geospatial analysis in Geographic Information Systems (GIS). The aims of this retrospective research is to detect agricultural landscape use and land cover change (LULCC) dynamics and discuss the consequences of agricultural intensification to landscape structure covering from the aspects of landscape ecology. Measurements of LULC are derived directly from pre-processed aerial images by a variety of analytical procedures, including statistical methods and image interpretation. The methodological challenges are confronted in the process of landscape classification and combining change detection approaches with landscape indices. Particular importance is paid on detecting agricultural landscape features at a small scale, demanding comprehensive understanding of such agroecosystems. Topological properties of the classified arable land and valley are determined in order to provide insight and emphasize the aspect the field edges in the agricultural landscape as important habitat. Change detection dynamics are presented with change matrix and additional calculations of gain, loss, swap, net change, change rate and tendencies are made. Transition’s possibility is computed following Markov’s probability model and presented with matrix, as well. Thesis’s spatial aspect is revealed with illustrative maps providing knowledge of location of the classified landscape categories and location of the dynamics of the changes occurred. It was assured that in Rekijoki valley’s landscape, remarkable changes in landscape has occurred. Landscape diversity has been strongly influenced by modern agricultural landscape change, as NP of open ditches has decreased and the MPS of the arable plot has decreased. Overall change in the diversity of the landscape is determined with the decrease of SHDI. Valley landscape considered as traditional land use area has experienced major transitional changes, as meadows class has lost almost one third of the area due to afforestation. Also, remarkable transitions have occurred from forest to meadow and arable land to built area. Boundaries measurement between modern and traditional landscape has indicated noticeable proportional increase in arable land-forest edge type and decrease in arable land-meadow edge type. Probability calculations predict higher future changes for traditional landscape, but also for arable land turning into built area.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis studies the development of service offering model that creates added-value for customers in the field of logistics services. The study focusses on offering classification and structures of model. The purpose of model is to provide value-added solutions for customers and enable superior service experience. The aim of thesis is to define what customers expect from logistics solution provider and what value customers appreciate so greatly that they could invest in value-added services. Value propositions, costs structures of offerings and appropriate pricing methods are studied. First, literature review of creating solution business model and customer value is conducted. Customer value is found out with customer interviews and qualitative empiric data is used. To exploit expertise knowledge of logistics, innovation workshop tool is utilized. Customers and experts are involved in the design process of model. As a result of thesis, three-level value-added service offering model is created based on empiric and theoretical data. Offerings with value propositions are proposed and the level of model reflects the deepness of customer-provider relationship and the amount of added value. Performance efficiency improvements and cost savings create the most added value for customers. Value-based pricing methods, such as performance-based models are suggested to apply. Results indicate the interest of benefitting networks and partnership in field of logistics services. Networks development is proposed to be investigated further.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Customer satisfaction should be the main focus for all of the parts of the business. Usually supply chain behind the business is in a key role when this focus is pursued especially in repair service business. When focusing on the materials that are needed to make repairs to equipment under service contracts, the time aspect of quality is critical. Do late deliveries from supplier have an effect on the service performance of repairs when distribution center of a centralized purchasing unit is acting as a buffer between suppliers and repair service business? And if so, how should the improvement efforts be prioritized? These are the two main questions that this thesis focuses on. Correlation and linear regression was tested between service levels of supplier and distribution center. Percentage of on-time deliveries were compared to outbound delivery service level. It was found that there is statistically significant correlation between inbound and outbound operations success. The other main question of the thesis, improvement prioritization, was answered by creating material availability based supplier classification and additional to that, by developing the decision process for the analysis of most critical suppliers. This was built on a basis of previous supplier and material classification methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ellagitannins are secondary metabolites that are produced by plants. Among other features, they are assumed to function as plants’ defensive compounds against plant-eating herbivores. This thesis focuses on a theory, which suggests that the biological activity of ellagitannins is based on their tendency to oxidize at the highly alkaline gut conditions of insect herbivores (oxidative activity). To study the biological activities of ellagitannins, a wide variety of structurally different ellagitannins were purified from different plant species by using liquid chromatographic techniques. The structures were characterized with the aid of spectrometric methods. Based on the acquired data, it was also possible to create a scheme, which enables the classification and even identification of ellagitannins from plant extracts without the need to isolate each compound for individual characterization. The biological activities of ellagitannins were determined with methods that are based on the abilities of the compounds to scavenge radicals, chelate iron ions, and on their rate of oxidation at high pH. The results showed that ellagitannins possess oxidative activities both at high and neutral pH, and that their activities depend on structure. The occurrence, distribution and content of ellagitannins in Finnish plant species were also studied. The specific ellagitannin profiles of the studied plant species were found to correlate well with their taxonomic classification.