20 results for Recursive Partitioning and Regression Trees (RPART)
in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Abstract:
The aim of this study is to gain a better understanding of the structure and the deformation history of a NW-SE trending regional, crustal-scale shear structure in the Åland archipelago, SW Finland, called the Sottunga-Jurmo shear zone (SJSZ). Approaches involving e.g. structural geology, geochronology, geochemistry and metamorphic petrology were utilised in order to reconstruct the overall deformation history of the study area. The study therefore describes several features of the shear zone including structures, kinematics and lithologies within the study area, the ages of the different deformation phases (ductile to brittle) within the shear zone, as well as some geothermobarometric results. The results indicate that the SJSZ outlines a major crustal discontinuity between the extensively migmatised rocks NE of the shear zone and the unmigmatised, amphibolite facies rocks SW of the zone. The main SJSZ shows overall dextral lateral kinematics with a SW-side up vertical component and deformation partitioning into pure shear and simple shear dominated deformation styles that intensified toward later stages of the deformation history. The deformation partitioning resulted in complex folding and refolding against the SW margin of the SJSZ, including conical and sheath folds, and in the formation of several minor strike-slip shear zones both parallel and conjugate to the main SJSZ in order to accommodate the regional transpressive stresses. Different deformation phases within the study area were dated by SIMS (zircon U-Pb), ID-TIMS (titanite U-Pb) and 40Ar/39Ar (pseudotachylyte whole-rock) methods. The first deformation phase within the ca. 1.88 Ga rocks of the study area is dated at ca. 1.85 Ga, and the shear zone was reactivated twice within the ductile regime (at ca. 1.83 Ga and 1.79 Ga), during which the strain was progressively partitioned into the main SJSZ and the minor shear zones.
The age determinations suggest that the orogenic processes within the study area did not occur in a temporal continuum; instead, the metamorphic zircon rims and titanites show distinct, 10-20 Ma long breaks in deformation between phases of active deformation. The results of this study further imply slow cooling of the rocks through 600-700°C so that at 1.79 Ga, the temperature was still at least 600°C. The highest recorded metamorphic pressures are 6.4-7.1 kbar. At the late stages or soon after the last ductile phase (ca. 1.79 Ga), relatively high-T mylonites and ultramylonites were formed, indicating extreme deformation partitioning and high strain rates. After the rocks reached lower amphibolite facies to amphibolite-greenschist facies transitional conditions (ca. 500-550°C), they cooled rapidly, probably due to crustal uplift and exhumation. The shear zone was reactivated at least once within the semi-brittle to brittle regime between ca. 1.79 Ga and 1.58 Ga, as evidenced by cataclasites and pseudotachylytes. In summary, the results of this study suggest that the Sottunga-Jurmo shear zone (and the South Finland shear zone) defines a major crustal discontinuity, and played a central role in accommodating the regional stresses during and after the Svecofennian orogeny.
Abstract:
Mergers and acquisitions (M&A) have played a very important role in restructuring the pulp and paper industry (PPI). The poor performance and fragmented nature of the industry, overcapacity problems, and globalisation have driven companies to consolidate. The objective of this thesis was to examine how PPI acquirers have performed subsequent to M&As and whether the deal characteristics have had any impact on performance. Based on the results, it seems that PPI companies have not been able to enhance their performance in the long run after M&As, although the performance of acquiring firms has remained above the industry median; neither deal characteristics nor the amount of premiums paid seem to have had any effect. The statistical significance of the results was tested with a change model and regression analysis. Performance was assessed with accrual, cash flow, and market-based indicators. The results are congruent with behavioural theory: managers and investors seem to be overoptimistic in determining the synergies from M&As.
Abstract:
Drying is a major step in the manufacturing process in the pharmaceutical industry, and the selection of the dryer and its operating conditions is sometimes a bottleneck. In spite of the difficulties, these bottlenecks are handled with the utmost care because of good manufacturing practices (GMP) and the industry's image in the global market. The purpose of this work is to study how existing knowledge can be used to select a dryer and its operating conditions for drying pharmaceutical materials, with the help of methods such as case-based reasoning and decision trees, in order to reduce the time and expenditure needed for research. The work consisted of two major parts: a literature survey on the theories of spray drying, case-based reasoning and decision trees, and a practical part comprising data acquisition and testing of the models based on existing and upgraded data. Testing resulted in a combination of two models, case-based reasoning and decision trees, leading to more specific results when compared to conventional methods.
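The retrieval step of case-based reasoning mentioned in this abstract can be illustrated with a minimal sketch: a new drying problem is compared with stored cases, and the solution of the most similar case is reused. The feature names, the numeric values and the dryer labels below are all hypothetical placeholders, not data or recommendations from the thesis, and the range-normalised Euclidean distance is just one possible similarity measure.

```python
import math

# Hypothetical case base: (problem features, dryer used). All values are
# invented for illustration and do not come from the thesis.
CASES = [
    ({"moisture_pct": 60.0, "heat_sensitive": 1.0, "particle_um": 20.0}, "spray dryer"),
    ({"moisture_pct": 25.0, "heat_sensitive": 0.0, "particle_um": 400.0}, "fluid bed dryer"),
    ({"moisture_pct": 15.0, "heat_sensitive": 1.0, "particle_um": 150.0}, "vacuum tray dryer"),
]


def feature_ranges(cases):
    """Return the (min, max) of every feature across the case base."""
    keys = cases[0][0].keys()
    return {k: (min(c[0][k] for c in cases), max(c[0][k] for c in cases)) for k in keys}


def distance(a, b, ranges):
    """Euclidean distance after scaling each feature by its range."""
    total = 0.0
    for key, (low, high) in ranges.items():
        span = (high - low) or 1.0  # avoid division by zero for constant features
        total += ((a[key] - b[key]) / span) ** 2
    return math.sqrt(total)


def retrieve(query, cases):
    """Return the stored case that is closest to the query."""
    ranges = feature_ranges(cases)
    return min(cases, key=lambda case: distance(query, case[0], ranges))


# Hypothetical new problem to be solved by reusing the nearest stored case.
query = {"moisture_pct": 55.0, "heat_sensitive": 1.0, "particle_um": 30.0}
best_features, best_dryer = retrieve(query, CASES)
print(best_dryer)
```

In a fuller system the retrieved solution would then be adapted and the solved problem stored as a new case; the sketch covers retrieval only.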
Abstract:
Fluent health information flow is critical for clinical decision-making. However, a considerable part of this information is free-form text, and the inability to utilize it creates risks to patient safety and cost-effective hospital administration. Methods for automated processing of clinical text are emerging. The aim of this doctoral dissertation is to study machine learning and clinical text in order to support health information flow. First, by analyzing the content of authentic patient records, the aim is to specify clinical needs in order to guide the development of machine learning applications. The contributions are a model of the ideal information flow, a model of the problems and challenges in reality, and a road map for the technology development. Second, by developing applications for practical cases, the aim is to concretize ways to support health information flow. Altogether five machine learning applications for three practical cases are described: The first two applications are binary classification and regression related to the practical case of topic labeling and relevance ranking. The third and fourth applications are supervised and unsupervised multi-class classification for the practical case of topic segmentation and labeling. These four applications are tested with Finnish intensive care patient records. The fifth application is multi-label classification for the practical task of diagnosis coding. It is tested with English radiology reports. The performance of all these applications is promising. Third, the aim is to study how the quality of machine learning applications can be reliably evaluated. The associations between performance evaluation measures and methods are addressed, and a new hold-out method is introduced. This method contributes not only to processing time but also to the evaluation diversity and quality. The main conclusion is that developing machine learning applications for text requires interdisciplinary, international collaboration.
Practical cases are very different, and hence the development must begin from genuine user needs and domain expertise. The technological expertise must cover linguistics, machine learning, and information systems. Finally, the methods must be evaluated both statistically and through authentic user feedback.
Abstract:
Machine learning provides tools for automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how the techniques can be implemented efficiently. The contributions of this thesis are as follows.
First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results; Part II consists of the five original research articles that are the main contribution of this thesis.
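The link this abstract draws between bipartite ranking and AUC can be made concrete: AUC equals the fraction of (positive, negative) example pairs that a scoring function orders correctly. The sketch below computes it directly from that pairwise definition. The function name, the toy scores and the convention of counting ties as half a correct pair are illustrative assumptions; this is not code from the thesis and it does not implement RankRLS or leave-pair-out cross-validation.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical classifier scores and binary labels.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
auc = pairwise_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

Here five of the six positive-negative pairs are ordered correctly, so the pairwise ranking error is one sixth and the AUC is five sixths.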
Abstract:
Tropical forests are sources of many ecosystem services, but these forests are vanishing rapidly. The situation is severe in Sub-Saharan Africa and especially in Tanzania. The causes of change are multidimensional and strongly interdependent, and only a comprehensive understanding of them can help to change the ongoing unsustainable trends of forest decline. Ongoing forest changes, their spatiality and their connection to humans and the environment can be studied with the methods of Land Change Science. The knowledge produced with these methods helps to make arguments about the actors, actions and causes that are behind the forest decline. In this study of Unguja Island in Zanzibar, the focus is on the current forest cover and its changes between 1996 and 2009. The cover and changes are measured with the commonly used remote sensing methods of automated land cover classification and post-classification comparison from medium-resolution satellite images. Kernel Density Estimation is used to determine the clusters of change, sub-area analysis provides information about the differences between regions, and distance and regression analyses connect changes to environmental factors. These analyses not only explain the changes that have occurred, but also allow quantitative and spatial future scenarios to be built. No similar study has been made for Unguja, and the study therefore provides new information that is beneficial for the whole society. The results show that 572 km² of Unguja is still forested, but 0.82–1.19% of these forests are disappearing annually. Besides deforestation, vertical degradation and spatial changes are also significant problems. Deforestation is most severe in the communal indigenous forests, but agroforests are also decreasing. Spatially, deforestation is concentrated in the areas close to the coastline, population and Zanzibar Town. Biophysical factors, on the other hand, do not seem to influence the ongoing deforestation process.
If the current trend continues, approximately 485 km² of forest should remain in 2025. Solutions to these deforestation problems should be sought in sustainable land use management, surveying and protection of the forests in risk areas, and spatially targeted self-sustainable tree planting schemes.
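The 2025 figure can be checked with a rough compound-decline calculation. The sketch below assumes that the 572 km² refers to 2009 (the end of the study period) and that the annual loss rate stays constant; neither the base year nor the compounding model is stated in the abstract, so this is a plausibility check rather than the author's scenario method.

```python
def project_forest_area(area_km2, annual_loss_rate, years):
    """Compound a constant annual fractional loss over a number of years."""
    return area_km2 * (1.0 - annual_loss_rate) ** years


AREA_KM2 = 572.0               # forested area reported in the abstract
LOSS_RATES = (0.0082, 0.0119)  # 0.82% and 1.19% per year, from the abstract
YEARS = 2025 - 2009            # assumption: the 572 km2 figure refers to 2009

high = project_forest_area(AREA_KM2, LOSS_RATES[0], YEARS)  # slowest loss
low = project_forest_area(AREA_KM2, LOSS_RATES[1], YEARS)   # fastest loss
print(f"{low:.0f} to {high:.0f} km2 remaining in 2025")
```

Under these assumptions the projected range is roughly 472 to 501 km², which brackets the approximately 485 km² quoted in the abstract.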
Abstract:
OBJECTIVES: The purpose of this thesis is to examine the liquidity levels of different industries between 2007 and 2013. It also reviews the literature on cash management and liquidity, the various ratios that describe liquidity, and the factors that affect liquidity. In addition, it examines the information and communication sector in more detail. DATA: The data were collected from the Orbis database. Industry-specific averages were either calculated with the formulas presented in Section 2 or retrieved directly from the database. The scatter plots were produced with Excel, and the correlation matrix and regression analyses with SAS EG. RESULTS: This study presents industry-specific averages for the liquidity ratio, the solvency ratio and gearing, as well as for many other ratios that describe or affect liquidity. The study shows that, on average, liquidity and the ability to meet payments have remained fairly stable, but the industry-specific changes are strong. In the IC sector, liquidity is affected by the contribution margin, the number of employees, turnover, the balance sheet total and the payment period.
Abstract:
This thesis investigates factors that affect software testing practice. The thesis consists of empirical studies, in which the affecting factors were analyzed and interpreted using quantitative and qualitative methods. First, the Delphi method was used to specify the scope of the thesis. Secondly, for the quantitative analysis, 40 industry experts from 30 organizational units (OUs) were interviewed. The survey method was used to explore factors that affect software testing practice. Conclusions were derived using correlation and regression analysis. Thirdly, from these 30 OUs, five were further selected for an in-depth case study. The data was collected through 41 semi-structured interviews. The affecting factors and their relationships were interpreted with qualitative analysis using grounded theory as the research method. The practice of software testing was analyzed from the process improvement and knowledge management viewpoints. The qualitative and quantitative results were triangulated to increase the validity of the thesis. Results suggested that testing ought to be adjusted according to the business orientation of the OU; the business orientation affects the testing organization and knowledge management strategy, and the business orientation and the knowledge management strategy affect outsourcing. As a special case, the complex relationship between testing schedules and knowledge transfer is discussed. The results of this thesis can be used in improving testing processes and knowledge management in software testing.
Abstract:
This master's thesis studied demand forecasting for the products of Vaasan & Vaasan Oy. The thesis first examined forecasting and the opportunities it offers a company. In particular, the benefits obtainable from demand forecasting were reviewed. Demand forecasting was sought as a solution especially to problems in shift planning. The thesis reviewed the literature on forecasting methods, and on that basis trial forecasts were made using the company's historical demand data. Trial forecasts were made for six different test products of the Turku bakery. The forecast horizon was the daily demand over two weeks. For this horizon, and for base demand in particular, the quantitative forecasting method with the best forecast accuracy was sought. Trial forecasts were made with moving averages, classical time series analysis, exponential smoothing, Holt's linear exponential smoothing, Winters' seasonal exponential smoothing, autoregressive models, the Box-Jenkins method and regression analysis. Training a neural network on the historical data and using it to help solve the problem was also tried. Based on the results of the trial forecasts, the behaviour of the forecasting methods was analysed for further development. In addition to forecast accuracy, the simplicity and ease of use of each model and its suitability for forecasting the company's many products were assessed. Attention was also paid to seasonal variation, trends and special days. Forecast accuracy for base demand was found to improve clearly if the historical data were first preprocessed to remove special days and weeks.
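Two of the simplest methods named in this abstract, the moving average and simple exponential smoothing, can be sketched in a few lines. The demand series, the window length and the smoothing constant below are hypothetical and are not taken from the thesis; the sketch only shows how each method turns a demand history into a one-step-ahead forecast.

```python
def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    return sum(history[-window:]) / window


def exponential_smoothing_forecast(history, alpha):
    """Simple exponential smoothing; the final level is the next-step forecast."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]  # initialise the level with the first observation
    for value in history[1:]:
        level = alpha * value + (1.0 - alpha) * level
    return level


# Hypothetical daily demand (units) for one product; not data from the thesis.
demand = [120, 130, 125, 140, 135, 150, 145]
ma = moving_average_forecast(demand, window=3)
ses = exponential_smoothing_forecast(demand, alpha=0.5)
print(f"moving average: {ma:.2f}, exponential smoothing: {ses:.2f}")
```

Neither method models trend or seasonality, which is what Holt's and Winters' extensions listed in the abstract add.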
Abstract:
Seaports play an important part in the wellbeing of a nation. Many nations are highly dependent on foreign trade, and most trade is carried by sea vessels. This study is part of a larger research project, in which a simulation model is required in order to carry out further analyses of Finnish macro-logistical networks. The objective of this study is to create a system dynamics simulation model that gives an accurate forecast of the development of demand at Finnish seaports up to 2030. The emphasis of this study is on showing how a detailed system dynamics model of harbor demand can be created with the help of statistical methods. The forecasting methods used were ARIMA (autoregressive integrated moving average) and regression models. The resulting simulation model gives a forecast with confidence intervals and allows different scenarios to be studied. The building process was found to be useful, and the model can be expanded to be more detailed. The required capacity for other parts of the Finnish logistical system could easily be included in the model.
Abstract:
Learning of preference relations has recently received significant attention in the machine learning community. It is closely related to classification and regression analysis and can be reduced to these tasks. However, preference learning involves predicting an ordering of the data points rather than a single numerical value, as in regression, or a class label, as in classification. Therefore, studying preference relations within a separate framework not only facilitates a better theoretical understanding of the problem, but also motivates the development of efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics and natural language processing. For example, algorithms that learn to rank are frequently used in search engines for ordering the documents retrieved by a query. Preference learning methods have also been applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user-generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from the well-founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, the contribution of this thesis is twofold: kernel functions for structured data that are used to take advantage of various non-vectorial data representations, and preference learning algorithms that are suitable for different tasks, namely efficient learning of preference relations, learning with large amounts of training data, and semi-supervised preference learning. The proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in the bioinformatics domain.
Training of kernel-based ranking algorithms can be infeasible when the training set is large. This problem is addressed by proposing a preference learning algorithm whose computational complexity scales linearly with the number of training data points. We also introduce a sparse approximation of the algorithm that can be efficiently trained with large amounts of data. For situations where a small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the problem of efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, the proposed algorithms lead to notably better performance in many of the preference learning tasks considered.
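The general idea of a regularized least-squares approach to preference learning can be illustrated with a deliberately simplified sketch: a linear scoring function is fitted so that differences in predicted scores match differences in observed utilities over all pairs, with a ridge penalty on the weights. This is not one of the algorithms proposed in the thesis. It is linear rather than kernel-based, it enumerates all pairs (so it scales quadratically, unlike the linear-time method described above), and it uses plain gradient descent. The toy data, the regularization strength, the learning rate and the step count are all assumptions chosen for illustration.

```python
from itertools import combinations


def fit_pairwise_rls(X, y, lam=0.01, lr=0.1, steps=500):
    """Minimise the mean squared error on pairwise score differences
    plus lam * ||w||^2, using plain gradient descent."""
    dim = len(X[0])
    pairs = list(combinations(range(len(X)), 2))
    w = [0.0] * dim
    for _ in range(steps):
        grad = [2.0 * lam * wk for wk in w]  # gradient of the ridge penalty
        for i, j in pairs:
            diff = [a - b for a, b in zip(X[i], X[j])]
            residual = (y[i] - y[j]) - sum(wk * dk for wk, dk in zip(w, diff))
            for k in range(dim):
                grad[k] -= 2.0 * residual * diff[k] / len(pairs)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def score(w, x):
    """Linear scoring function used to order objects."""
    return sum(wk * xk for wk, xk in zip(w, x))


# Hypothetical toy data: utilities follow 2*x1 - x2 exactly.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2)]
y = [2.0 * a - b for a, b in X]
w = fit_pairwise_rls(X, y)
predicted_order = sorted(range(len(X)), key=lambda i: score(w, X[i]), reverse=True)
true_order = sorted(range(len(X)), key=lambda i: y[i], reverse=True)
print(w, predicted_order == true_order)
```

The learned weights come out slightly shrunk toward zero relative to (2, -1) because of the ridge penalty, but the induced ordering of the five objects matches the true one.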
Abstract:
This work was carried out as part of the MASTO research project, whose purpose is to develop an adaptive reference model for software testing. The work was conducted as a statistical study using the survey method. In the study, 31 organizational units from different parts of Finland that develop medium-criticality applications were interviewed. The hypotheses were that quality depends on the software development method, customer participation, conformance to the standard, the customer relationship, business orientation, criticality, trust and the level of testing. The hypotheses were examined for correlation with quality by means of correlation and regression analysis. In addition, the study surveyed which software development practices, methods and tools the organizational units used, the problems and improvement suggestions related to software testing, the most significant ways in which the customer can influence software quality, and the greatest benefits and drawbacks of outsourcing software development or testing. The study found that quality correlates positively and statistically significantly with the level of testing, conformance to the standard, customer participation in the design phase, customer participation in steering, trust, and one sub-question related to the customer relationship. Based on the regression analysis, a regression equation was formed in which quality was found to depend positively on conformance to the standard, customer participation in the design phase, and trust.
Abstract:
Researchers in different disciplines have argued for more than a century about how the use of variables in ratio form affects the results of correlation and regression analyses and their correct interpretation. Within strategy research, however, the topic has not received much attention. This is surprising, since ratio variables are very commonly used in empirical strategy research. This work reviews the debate surrounding ratio variables. In addition, an article review is used to determine how common their use is in present-day strategy research. Monte Carlo simulations are used to study how the properties of ratio variables affect the results of correlation and regression analysis, especially in cases with a common denominator.
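The common-denominator problem that this abstract refers to can be reproduced with a small Monte Carlo experiment: three mutually independent variables X, Y and Z are drawn, and the correlation between X and Y is compared with the correlation between the ratios X/Z and Y/Z. The uniform distributions, the sample size and the seed below are assumptions for illustration and are not the simulation design used in the thesis.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate(n=20000, seed=0):
    """Draw independent X, Y, Z and compare corr(X, Y) with corr(X/Z, Y/Z)."""
    rng = random.Random(seed)
    x = [rng.uniform(1.0, 2.0) for _ in range(n)]
    y = [rng.uniform(1.0, 2.0) for _ in range(n)]
    z = [rng.uniform(1.0, 2.0) for _ in range(n)]  # shared denominator
    raw = pearson(x, y)
    ratio = pearson([a / c for a, c in zip(x, z)], [b / c for b, c in zip(y, z)])
    return raw, ratio


raw_r, ratio_r = simulate()
print(f"corr(X, Y) = {raw_r:.3f}, corr(X/Z, Y/Z) = {ratio_r:.3f}")
```

Although X and Y are independent, the two ratios are clearly positively correlated (around 0.5 in this setup), purely because they share the denominator Z.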