25 results for DATA SET

in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevance:

100.00%

Publisher:

Abstract:

Mass spectrometry (MS)-based proteomics has seen significant technical advances during the past two decades, and mass spectrometry has become a central tool in many biosciences. Despite the popularity of MS-based methods, handling the systematic non-biological variation in the data remains a common problem. This biasing variation can arise from several sources, ranging from sample handling to differences caused by the instrumentation. Normalization is the procedure that aims to account for this biasing variation and make samples comparable. Many normalization methods commonly used in proteomics have been adapted from the DNA microarray world. Studies exist that compare normalization methods on proteomics data sets using some variability measures. However, a more thorough comparison that looks at the quantitative and qualitative differences in the performance of the different normalization methods, and at their ability to preserve the true differential expression signal of proteins, is lacking. In this thesis, several popular and widely used normalization methods representing different normalization strategies (linear regression normalization, local regression normalization, variance stabilizing normalization, quantile normalization, median central tendency normalization, and variants of some of the aforementioned methods) are compared and evaluated with a benchmark spike-in proteomics data set. The performance of the normalization methods is evaluated qualitatively and quantitatively, both on a global scale and in pairwise comparisons of sample groups. In addition, the thesis investigates whether performing the normalization globally on the whole data set or pairwise for the comparison pairs examined affects the performance of the normalization method in normalizing the data and preserving the true differential expression signal.
In this thesis, both major and minor differences in the performance of the different normalization methods were found. Also, the way in which the normalization was performed (global normalization of the whole data or pairwise normalization of the comparison pair) affected the performance of some of the methods in pairwise comparisons. Differences among variants of the same methods were also observed.
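
Two of the strategies named in the abstract above, median central tendency normalization and quantile normalization, can be sketched as follows. This is only a minimal illustration under stated assumptions: the intensity matrix is hypothetical, the values are assumed to be log-transformed with no ties or missing values, and the thesis's actual implementations and variants may well differ.

```python
import numpy as np

def median_normalize(x):
    """Shift each sample (column) so that all columns share the same median."""
    col_medians = np.median(x, axis=0)
    return x - col_medians + col_medians.mean()

def quantile_normalize(x):
    """Force every sample (column) to share the same empirical distribution."""
    order = np.argsort(x, axis=0)                      # row indices that sort each column
    ranks = np.argsort(order, axis=0)                  # rank of each value within its column
    mean_quantiles = np.sort(x, axis=0).mean(axis=1)   # mean across samples at each rank
    return mean_quantiles[ranks]

# Hypothetical log2 intensities: 4 proteins (rows) x 3 samples (columns).
log_intensity = np.array([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.5, 6.0],
    [4.0, 2.0, 8.0],
])
median_norm = median_normalize(log_intensity)
quantile_norm = quantile_normalize(log_intensity)
```

After median normalization the columns share one median; after quantile normalization the sorted columns are identical while the within-sample ranking of proteins is preserved.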

Relevance:

70.00%

Publisher:

Abstract:

The purpose of this thesis is to study the factors that explain bilateral fiber trade flows. This is done by analyzing bilateral trade flows during 1990-2006. The thesis also studies whether there are differences between fiber types. A gravity model approach is used to study the trade flows. The gravity model is mostly used to study aggregate data between trading countries; in this thesis it is applied to single fibers and estimated on a panel data set. The regression results show clearly that there are benefits in studying different fibers separately, as the effects differ considerably from each other. Furthermore, the results support the existence of the Linder effect in certain fiber types.
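
For orientation, a common log-linear form of a gravity equation with a Linder-type term is given below. It is a generic textbook-style sketch, not the specification estimated in the thesis. All symbols are assumptions introduced here: T_{ijt} is the trade flow of one fiber type from country i to country j in year t, Y is GDP, D_{ij} is the distance between the countries, and y is GDP per capita.

```latex
\ln T_{ijt} = \beta_0 + \beta_1 \ln Y_{it} + \beta_2 \ln Y_{jt} + \beta_3 \ln D_{ij}
            + \beta_4 \left| \ln y_{it} - \ln y_{jt} \right| + \varepsilon_{ijt}
```

In this form one would typically expect \beta_3 < 0, and a negative \beta_4 would be consistent with the Linder effect, since countries with similar per-capita incomes would then trade more with each other.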

Relevance:

70.00%

Publisher:

Abstract:

Recent years have produced great advances in instrumentation technology. The amount of available data has been increasing due to the simplicity, speed and accuracy of current spectroscopic instruments. Most of these data are, however, meaningless without proper analysis. This has been one of the reasons for the ever-growing success of multivariate handling of such data. Industrial data are commonly not designed data; in other words, there is no exact experimental design, but rather the data have been collected as a routine procedure during an industrial process. This places certain demands on the multivariate modeling, as the selection of samples and variables can have an enormous effect. Common approaches in the modeling of industrial data are PCA (principal component analysis) and PLS (projection to latent structures, or partial least squares), but there are also other methods that should be considered. The more advanced methods include multi-block modeling and nonlinear modeling. In this thesis it is shown that the results of data analysis vary according to the modeling approach used, thus making the selection of the modeling approach dependent on the purpose of the model. If the model is intended to provide accurate predictions, the approach should be different from the case where the purpose of modeling is mostly to obtain information about the variables and the process. For industrial applicability it is essential that the methods are robust and sufficiently simple to apply. In this way the methods and the results can be compared and an approach selected that is suitable for the intended purpose. Differences in data analysis methods are compared with data from different fields of industry in this thesis. In the first two papers, the multi-block method is considered for data originating from the oil and fertilizer industries. The results are compared to those from PLS and priority PLS.
The third paper considers applicability of multivariate models to process control for a reactive crystallization process. In the fourth paper, nonlinear modeling is examined with a data set from the oil industry. The response has a nonlinear relation to the descriptor matrix, and the results are compared between linear modeling, polynomial PLS and nonlinear modeling using nonlinear score vectors.

Relevance:

60.00%

Publisher:

Abstract:

Patients' perception of health-related quality of life during recovery after hip replacement surgery: a six-month follow-up study. This two-phase follow-up study examined patients' perception of health-related quality of life during the recovery period after hip replacement surgery. In the first phase, the purpose was both to describe patients' experiences of being a patient, of the care they received and of the health service organization, and to analyze, on the basis of earlier studies, the outcomes of the surgery from the patient's point of view. In the second phase, the purpose was to evaluate the quality of life experienced by patients after surgery, and whether primary outcomes (physical functioning, pain, anxiety) or economic consequences (costs paid by the patients themselves, use of services) affected health-related quality of life. The aim of the study was to identify possible critical points in time or factors that may slow recovery and thus impair patients' quality of life. This information can be used in nursing when planning appropriate care and support for the recovery period. In the first phase, patients coming for primary surgery (n = 17) described their experiences in theme interviews twice after the surgery. The interview data were analyzed with inductive content analysis. In addition, 17 research articles were analyzed with deductive content analysis for the outcomes of surgery for the patient, the factors affecting the outcomes and the research methods used. In the second phase, patients coming for primary or revision surgery (n = 100) assessed the outcomes of surgery for six months after the operation: health-related quality of life, primary outcomes and economic consequences. The data were collected with several instruments: the Sickness Impact Profile (Finnish version), the State-Trait Anxiety Inventory and the Numeric Rating Scale.
In addition, questionnaires developed for this study were used: a physical functioning measure, a service use measure and a cost measure. The results of the second phase were analyzed with statistical methods. Patients' health-related quality of life improved and pain was relieved after surgery, and physical functioning increased during the recovery period. Despite the positive changes, patients experienced anxiety to the same extent as before surgery. Use of services varied over the recovery period, and there was wide variation in the costs paid by patients. Improved physical functioning and reduced pain improved health-related quality of life. By contrast, poorer quality of life during recovery was associated with greater use of services, whereas costs were not associated with quality of life. Patients' characteristics should be taken into account more when planning appropriate postoperative care and support. Patients need individual guidance, because many background factors (e.g. age, gender, preoperative pain, marital status and type of surgery) affect recovery.

Relevance:

60.00%

Publisher:

Abstract:

Because of the large number of characteristics, there is a need to extract the most relevant characteristics from the input data, so that the amount of information lost is minimal and the classification carried out with the projected data set remains relevant with respect to the original data. To achieve this feature extraction, different statistical techniques, including principal components analysis (PCA), may be used. This thesis describes an extension of PCA that allows the extraction of a finite number of relevant features from high-dimensional fuzzy data and noisy data. PCA finds linear combinations of the original measurement variables that describe the significant variation in the data. The two proposed methods were compared using postoperative patient data. Experimental results demonstrate that the two proposed methods can be used with complex data. Fuzzy PCA was used in the classification problem. Classification was performed with the similarity classifier algorithm, in which the weights of the total similarity measure are optimized with a differential evolution algorithm. This thesis presents a comparison of the classification results based on the data obtained from the fuzzy PCA.
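
The statement that PCA finds linear combinations of the original variables describing the significant variation can be illustrated with ordinary (crisp) PCA computed from a singular value decomposition. This sketch does not implement the fuzzy extension described in the abstract; the data matrix is randomly generated and purely hypothetical, and the function name `pca` is an illustrative choice.

```python
import numpy as np

def pca(x, n_components):
    """Ordinary PCA via SVD of the column-centered data matrix."""
    centered = x - x.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]                 # rows are the principal directions
    scores = centered @ components.T               # data projected onto those directions
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, components, explained[:n_components]

# Hypothetical data: 50 samples with 6 measured characteristics.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 6))
scores, components, explained = pca(data, n_components=2)
```

Each row of `components` holds the weights of one linear combination of the original variables, and `explained` gives the share of total variance captured by each retained component.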

Relevance:

60.00%

Publisher:

Abstract:

The aim of this Master's thesis is to create a framework from the two main theories of the work: "external success factors of business" and "regional competitiveness". Both theories contain factors that influence a company's location decision. On the basis of the framework, two study regions are examined: the Landen region and the Kuuma region. The work results in a picture of both study regions and an analysis of the framework. The first part of the thesis reviews the background of research in the field and the problems that have emerged in earlier studies. After that, all the external success factors of business are presented. The theory of regional competitiveness completes the factors of the framework. The second, empirical part is based on source material on the study regions collected from interviews, newspaper articles and seminars. The results show that the two study regions differ from each other and that each has its own key clusters and successful industries. The framework was created fairly successfully. In the end, it turned out to be well suited to broadening research in the field but poorly suited to the location decision of an individual company.

Relevance:

60.00%

Publisher:

Abstract:

The study focuses on international diversification from the perspective of a Finnish investor. A second objective is to find out whether new covariance matrix estimators make the optimization of the minimum-variance portfolio more efficient. In addition to the ordinary sample covariance matrix, two shrinkage estimators and a flexible multivariate GARCH(1,1) model are used in the optimization. The data consist of Dow Jones sector indices and the OMX-H portfolio index. The international diversification strategy is implemented using a sector approach, and the portfolio is optimized using twelve components. The data cover the years 1996-2005, that is, 120 monthly observations. The performance of the portfolios formed is measured with the Sharpe index. According to the results, there is no statistically significant difference between the risk-adjusted returns of internationally diversified investments and those of the domestic portfolio. Nor does the use of the new covariance matrix estimators yield statistically significant added value compared with portfolio optimization based on the sample covariance matrix.
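
The optimization described above can be sketched roughly as follows. The assumptions are considerable: returns are simulated rather than taken from the Dow Jones or OMX-H indices, short positions are allowed, the shrinkage target is a scaled identity matrix with a fixed, arbitrarily chosen intensity `delta`, and the multivariate GARCH(1,1) estimator is not covered. The shrinkage estimators used in the study may be defined differently.

```python
import numpy as np

def min_variance_weights(cov):
    """Fully invested minimum-variance weights: w = inv(S) 1 / (1' inv(S) 1)."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)
    return raw / raw.sum()

def shrink_covariance(sample_cov, delta):
    """Shrink the sample covariance towards a scaled identity target.

    delta in [0, 1] is the shrinkage intensity (an assumed, fixed value here).
    """
    n = sample_cov.shape[0]
    target = np.eye(n) * np.trace(sample_cov) / n
    return delta * target + (1.0 - delta) * sample_cov

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = returns - risk_free
    return excess.mean() / excess.std(ddof=1)

# Hypothetical data: 120 monthly returns for 12 components.
rng = np.random.default_rng(0)
returns = rng.normal(0.005, 0.04, size=(120, 12))
sample_cov = np.cov(returns, rowvar=False)
shrunk_cov = shrink_covariance(sample_cov, delta=0.2)
weights = min_variance_weights(shrunk_cov)
portfolio_sharpe = sharpe_ratio(returns @ weights)
```

Comparing `portfolio_sharpe` across covariance estimators, as the study does with real index data, would then indicate whether a given estimator adds value over the plain sample covariance matrix.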

Relevance:

60.00%

Publisher:

Abstract:

The objective of this thesis is to find out how information and communication technology affects the global consumption of printing and writing papers. Another objective is to find out whether these effects differ between paper grades. The empirical analysis is conducted by linear regression using three sets of country-level panel data from 1990-2006. The newsprint data set contains 95 countries, the uncoated woodfree paper data set 61 countries and the coated mechanical paper data set 42 countries. The material is based on paper consumption data from RISI’s Industry Statistics Database and on information and communication technology data from the GMID database. The results indicate that the number of Internet users has a statistically significant negative effect on the consumption of newsprint and of coated mechanical paper, while the number of mobile telephone users has a positive effect on the consumption of these papers. The results also indicate that information and communication technologies have only a small effect, or no significant effect at all, on the consumption of uncoated woodfree paper, although these results are somewhat more uncertain.

Relevance:

60.00%

Publisher:

Abstract:

The aim of this Master's thesis is to find out which regional factors Finnish companies take into account when choosing a suitable location for a direct investment within Russia. A few company-internal factors are used as background variables to explain the differences observed between different kinds of companies in the weighting of location factors. Finally, Russian regions are compared in light of these weightings. The first part of the thesis focuses on the theoretical background of foreign direct investment. Earlier studies are reviewed in order to map the factors that have been found to influence the location of investments within a country. The second part is based on empirical data collected with a company survey. The data are used to determine which factors Finnish companies consider when making a location decision. In light of the results, it is evident that the market potential of a region is the most important factor Finnish companies consider when deciding on the location of an investment. Infrastructure and cost advantages also affect the decision. The weightings of different types of companies are very similar. Of the Russian regions, Moscow and St. Petersburg best meet the criteria that Finnish companies set for the location of an investment.

Relevance:

60.00%

Publisher:

Abstract:

Electricity spot prices have always been a demanding data set for time series analysis, mostly because of the non-storability of electricity. This feature, which sets electric power apart from other commodities, causes outstanding price spikes. Moreover, the last several years in the financial world seem to show that ’spiky’ behaviour of time series is no longer an exception but rather a regular phenomenon. The purpose of this paper is to seek patterns and relations within electricity price outliers and to verify how they affect the overall statistics of the data. The study uses techniques such as the classical Box-Jenkins approach, DFT smoothing of the series and GARCH models. The results obtained for two geographically different price series show that patterns in the occurrence of outliers are not straightforward. Additionally, there seems to be no rule that would predict the appearance of a spike from volatility, while the reverse effect is quite prominent. It is concluded that spikes cannot be predicted from the price series alone; some geographical and meteorological variables probably need to be included in the modeling.

Relevance:

60.00%

Publisher:

Abstract:

Peer-to-Peer (P2P) technology has revolutionized file exchange activities, besides enhancing processing power distribution. This technology, which is nowadays freely available to all internet users, also poses a threat, as it enables the illegal distribution of copyrighted digital work. P2P technology evolves at a greater pace than copyright legislation, leading to gaps between the applicability of copyright law and illicit file sharing and downloading. Such issues give consumers strong incentives to practise piracy using P2P systems with a low perceived risk of prosecution, leading to substantial losses for copyright owners. This study focuses on developing insights for content owners into consumer behaviour towards piracy in Finland; the quantitative analyses use a data set based on a survey conducted by the Helsinki Institute for IT. The research approach investigates the significance of three fundamental areas in evaluating consumer behaviour: environment-related factors, innovation-related factors and consumer-related factors. Each of these integrates concepts derived from previous theoretical models such as the technology acceptance model, the theory of reasoned action, the theory of planned behaviour, the issue-risk-judgement model and the Hunt & Vitell model.

Relevance:

60.00%

Publisher:

Abstract:

Due to its non-storability, electricity must be produced at the same time that it is consumed. As a result, prices are determined on an hourly basis, which makes analysis more challenging. Moreover, seasonal fluctuations in demand and supply lead to seasonal behavior of electricity spot prices. The purpose of this thesis is to find and remove all causal effects from electricity spot prices, leaving pure prices for modeling purposes. To achieve this, we use Qlucore Omics Explorer (QOE) for visualization and exploration of the data set, and the time series decomposition method to estimate and extract the deterministic components from the series. To obtain the target series, we use regression based on the background variables (water reservoir and temperature). The result is three price series (for Sweden, Norway and System prices) with no apparent pattern.

Relevance:

60.00%

Publisher:

Abstract:

Although social capital and health have been extensively studied during the last decade, there are still open issues in current empirical research. These concern, for instance, the measurement of the concept in different contexts, as well as the association between different types of social capital and different dimensions of health. The present thesis addressed these questions. The general aim was to promote the understanding of social capital and health by investigating the oldest old and the two major language groups in Finland, Swedish- and Finnish-speakers. Another aim was to contribute to the discussion on methodological issues in social capital and health research. The present thesis investigated two empirical data sets, Umeå 85+ and Health 2000. The Umeå 85+ study was a cross-sectional study of 163 individuals aged 85, 90, and 95 or older, living in the municipality of Umeå, Sweden, in 2000. The Health 2000 survey was a national study of 8,028 persons aged 30 or above carried out in Finland in 2000-2001. Different indicators of structural (e.g. social contacts) and cognitive (e.g. trust) social capital, as well as health indicators, were used as variables in the analyses. The Umeå 85+ data set was analyzed with factor analysis, as well as univariate and multivariate analysis of variance. The Health 2000 data were analyzed with logistic regression techniques. The results showed that the Swedish-speakers in the Finnish data set Health 2000 had a consistently higher prevalence of social capital than the Finnish-speakers, even after controlling for central sociodemographic variables. The results further showed that even if the language group differences in health were small, the Swedish-speakers in general experienced better self-reported health than the Finnish-speakers. Common sociodemographic variables could not explain these observed differences in health.
The results imply that social capital is often, but not always, associated with health. This was clearly seen in the Umeå 85+ data set, where only one health indicator (depressive symptoms) was associated with structural social capital among the oldest old. The results based on the analysis of the Health 2000 survey demonstrated that the cognitive component of social capital was associated with self-rated health and psychological health rather than with participation in social activities and social contacts. In addition, social capital statistically reduced the health advantage, especially for Swedish-speaking men, indicating that a high prevalence of social capital may promote health. Finally, the present thesis also discussed the methodological challenges faced when analyzing social capital and health. It was suggested that certain components of social capital, such as bonding and bridging social capital, may be more relevant than structural and cognitive components when investigating social capital among the two language groups in Finland. The results concerning the oldest old indicated that the structural aspects of social capital probably reflect current living conditions, whereas cognitive social capital reflects attitudes and traits often acquired decades earlier. This is interpreted as an indication that structural and cognitive social capital are closely related yet empirically distinct concepts. Taken together, some components of social capital may be more relevant to study than others, depending on which population group and age group is under study. The results also implied that the choice of cut-off point for dichotomizing self-rated health has an impact on the estimated effects of the explanatory variables. When the whole age interval, 35-64 years, was analyzed with logistic regression techniques, the choice of cut-off point did not matter for the estimated effects of marital status and educational level.
The results changed, however, when the age interval was divided into three shorter intervals. If self-rated health is explored using wide age intervals that do not account for age-dependent covariates, there is a risk of drawing misleading conclusions. In conclusion, the results presented in the thesis suggest that the uneven distribution of social capital observed between the two language groups in Finland is of importance when trying to further understand the health inequalities that exist between Swedish- and Finnish-speakers in Finland. Although social capital seemed to be relevant to the understanding of health among the oldest old, the meaning of social capital is probably different compared to a less vulnerable age group. This should be noted in future empirical research. In the present thesis, it was shown that the relationship between social capital and health is complex and multidimensional. Different aspects of social capital seem to be important for different aspects of health. This reduces the possibility to generalize the results and to recommend general policy implementations in this area. Increased methodological awareness regarding both social capital and health is called for in order to further understand the complex association between them. However, based on the present data and findings, social capital is associated with health. To understand individual health one must also consider social aspects of the individuals’ environment, such as social capital.

Relevance:

60.00%

Publisher:

Abstract:

In this thesis, a classification problem in predicting the creditworthiness of a customer is tackled. This is done by proposing a reliable classification procedure on a given data set. The aim of this thesis is to design a model that gives the best classification accuracy to effectively predict bankruptcy. FRPCA techniques proposed by Yang and Wang have been preferred since they are tolerant to certain types of noise in the data. These include FRPCA1, FRPCA2 and FRPCA3, from which the best method is chosen. Two different approaches are used at the classification stage: the similarity classifier and the FKNN classifier. The algorithms are tested with the Australian credit card screening data set. The results obtained indicate a mean classification accuracy of 83.22% using FRPCA1 with the similarity classifier. The FKNN approach yields a mean classification accuracy of 85.93% when used with FRPCA2, making it a better method for suitable choices of the number of nearest neighbors and the fuzziness parameter. Details on the calibration of the fuzziness parameter and other parameters associated with the similarity classifier are discussed.

Relevance:

60.00%

Publisher:

Abstract:

This study addresses feature selection in classification problems. The role of feature selection methods is to select important features by discarding redundant and irrelevant features in the data set; we investigated this using fuzzy entropy measures. We developed a fuzzy entropy based feature selection method using Yu's similarity and tested it with a similarity classifier, which also used Yu's similarity. The method was tested on a real-world dermatological data set. Performing feature selection based on fuzzy entropy measures before classification gave very promising empirical results: the highest classification accuracy achieved when testing our similarity measure on the data set was 98.83%. The results were then compared with results previously obtained using different similarity classifiers, and they show better accuracy than achieved before. The methods helped to reduce the dimensionality of the data set and to speed up the computation time of the learning algorithm, and have therefore simplified the classification task.
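
The abstract does not state which fuzzy entropy measure was used. As one possible illustration, the sketch below uses the De Luca-Termini entropy, a widely used fuzzy entropy, and ranks features by it. The similarity values are hypothetical, Yu's similarity itself is not implemented, and treating the highest-entropy feature as the first removal candidate is an assumption about the general approach rather than the study's documented procedure.

```python
import numpy as np

def de_luca_termini_entropy(mu):
    """Fuzzy entropy H = -sum(mu*ln(mu) + (1-mu)*ln(1-mu)) of membership values in [0, 1]."""
    mu = np.asarray(mu, dtype=float)
    inner = mu[(mu > 0.0) & (mu < 1.0)]   # values of exactly 0 or 1 contribute zero entropy
    return float(-np.sum(inner * np.log(inner) + (1.0 - inner) * np.log(1.0 - inner)))

# Hypothetical similarity values in [0, 1]: rows are samples, columns are features.
similarities = np.array([
    [0.95, 0.50, 0.10],
    [0.90, 0.55, 0.05],
    [0.99, 0.45, 0.20],
])
feature_entropy = np.array([de_luca_termini_entropy(col) for col in similarities.T])
removal_candidate = int(np.argmax(feature_entropy))  # feature with the least crisp similarities
```

Here the second feature, whose similarities hover around 0.5, has the highest entropy and would be the first candidate for removal.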