906 results for Exploratory statistical data analysis
Abstract:
Recent years have seen great advances in instrumentation technology. The amount of available data has been increasing owing to the simplicity, speed and accuracy of current spectroscopic instruments. Most of these data are, however, meaningless without proper analysis, which has been one of the reasons for the growing success of multivariate methods for handling such data. Industrial data are commonly not designed data; in other words, there is no exact experimental design, but rather the data have been collected as a routine procedure during an industrial process. This places certain demands on the multivariate modeling, as the selection of samples and variables can have an enormous effect. Common approaches to the modeling of industrial data are PCA (principal component analysis) and PLS (projection to latent structures, or partial least squares), but other methods should also be considered. The more advanced methods include multi-block modeling and nonlinear modeling. This thesis shows that the results of data analysis vary with the modeling approach used, so the choice of approach depends on the purpose of the model. If the model is intended to provide accurate predictions, the approach should differ from the case where the purpose of modeling is mainly to obtain information about the variables and the process. For industrial applicability it is essential that the methods are robust and sufficiently simple to apply; in this way the methods and results can be compared and an approach selected that suits the intended purpose. In this thesis, differences between data analysis methods are compared using data from different fields of industry. The first two papers consider the multi-block method for data originating from the oil and fertilizer industries, and the results are compared with those from PLS and priority PLS.
The third paper considers the applicability of multivariate models to process control of a reactive crystallization process. The fourth paper examines nonlinear modeling with a data set from the oil industry; the response has a nonlinear relation to the descriptor matrix, and results are compared between linear modeling, polynomial PLS, and nonlinear modeling using nonlinear score vectors.
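As a rough illustration of the linear PLS modeling discussed above (a generic sketch, not the thesis's implementation), a single-latent-variable PLS1 fit can be written in a few lines of NumPy. All data and names below are invented for the example:

```python
import numpy as np

def pls1_one_component(X, y):
    """Fit a single-latent-variable PLS1 model (NIPALS-style).

    Returns the weight, score, loading and regression vectors so that
    y is approximated by X @ b.
    """
    w = X.T @ y
    w = w / np.linalg.norm(w)          # weight vector
    t = X @ w                          # score vector
    p = X.T @ t / (t @ t)              # X loading
    q = (y @ t) / (t @ t)              # y loading
    b = w * q                          # regression coefficients for one LV
    return w, t, p, q, b

# Invented rank-one example: X varies along a single latent direction,
# and the response is driven by the same latent score.
t_true = np.array([1.0, -2.0, 0.5, 3.0])
p_true = np.array([0.5, 1.0, -0.25])
X = np.outer(t_true, p_true)
y = 2.0 * t_true

w, t, p, q, b = pls1_one_component(X, y)
y_hat = X @ b                          # exact on rank-one data with one LV
```

On data with more latent structure, further components would be extracted from the deflated matrices; multi-block and priority PLS variants then differ in how blocks of variables are weighted.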
Abstract:
A new analytical method was developed to non-destructively determine the pH and degree of polymerisation (DP) of cellulose in fibres in 19th- and 20th-century painting canvases, and to identify the fibre type: cotton, linen, hemp, ramie or jute. The method is based on NIR spectroscopy and multivariate data analysis; for calibration and validation, a reference collection of 199 historical canvas samples was used. The reference collection was analysed destructively using microscopy and chemical analytical methods. Partial least squares regression was used to build quantitative methods to determine pH and DP, and linear discriminant analysis was used to determine the fibre type. To interpret the chemical information obtained, an expert assessment panel developed a categorisation system to discriminate canvases that may not be fit to withstand excessive mechanical stress, e.g. transportation. The limiting DP for this category was found to be 600. With the new method and categorisation system, canvases of 12 Dalí paintings from the Fundació Gala-Salvador Dalí (Figueres, Spain) were non-destructively analysed for pH, DP and fibre type, and their fitness determined, which informs conservation recommendations. The study demonstrates that collection-wide canvas condition surveys can be performed efficiently and non-destructively, which could significantly improve collection management.
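The fibre-type identification step relies on linear discriminant analysis. A minimal two-class Fisher discriminant can be sketched as follows; the toy "spectral features" here are invented and in no way resemble the canvas reference collection:

```python
import numpy as np

def fisher_lda_fit(X0, X1):
    """Two-class Fisher discriminant: w = Sw^{-1} (m1 - m0), with the
    decision threshold at the projected midpoint of the class means."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    c = w @ (m0 + m1) / 2.0
    return w, c

def fisher_lda_predict(X, w, c):
    """Assign class 1 to samples projecting beyond the threshold."""
    return (X @ w > c).astype(int)

# Invented, well-separated 2-D feature clusters for two "fibre types".
X0 = np.array([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2]])
X1 = np.array([[5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.2]])
w, c = fisher_lda_fit(X0, X1)
pred0 = fisher_lda_predict(X0, w, c)   # expected: all class 0
pred1 = fisher_lda_predict(X1, w, c)   # expected: all class 1
```

A five-class problem such as cotton/linen/hemp/ramie/jute generalises this to multiple discriminant directions, but the projection-and-threshold idea is the same.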
Abstract:
Introduction. This study presents the results of a portfolio implementation process over four consecutive years. The plan comprises three phases (initiation, development and consolidation). The sample is 480 first-year nursing students at the University of Girona. The objective is to evaluate the effectiveness of the instrument and to achieve its construction in a self-regulated process. Subjects and methods. The proposed methodology is based on sequential triangulation between methods: the same empirical unit is studied with two research strategies, quantitative and qualitative. Study 1: quantitative, descriptive, longitudinal and prospective. The statistical analysis of paired data for continuous variables following a normal distribution uses Student's t-test; the correlation between two numerical variables is assessed with the Pearson correlation coefficient. Study 2: qualitative, using discussion groups and topics; textual data are analysed with Atlas.ti. Results. The final score of students who prepare the portfolio is higher (7.78) than that of those who do not (7.00) (p ≤ 0.001). A significant correlation exists between the portfolio score and the final score (p ≤ 0.001). The trend study shows a greater sensitivity of the assessment instrument. Conclusion. The final design of the portfolio is mixed and flexible; it encourages student reflection and empowers reflection on the continuum of learning.
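The two statistics named in the methods section are simple to compute by hand. A self-contained sketch with invented scores (not the study's data) shows the paired t statistic and the Pearson coefficient:

```python
import math

def paired_t(before, after):
    """Paired Student t statistic for before/after scores on the same subjects."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)   # sample variance
    return mean_d / math.sqrt(var_d / n)

def pearson_r(x, y):
    """Pearson correlation coefficient between two numeric variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented illustrative scores only.
before = [6.0, 7.0, 8.0]
after = [7.0, 9.0, 11.0]
t_stat = paired_t(before, after)        # differences 1, 2, 3 -> t = 2*sqrt(3)
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # perfectly linear -> r = 1
```

The p-values reported in the abstract would then come from comparing these statistics against the t distribution with n − 1 degrees of freedom.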
Abstract:
The objective of this work was to develop a free-access exploratory data analysis software application for academic use that is easy to install and can be used without user-level programming, given the extensive use of chemometrics and its association with applications that require purchased licenses or routines. The developed software, called Chemostat, provides Hierarchical Cluster Analysis (HCA), Principal Component Analysis (PCA) and interval Principal Component Analysis (iPCA), as well as correction methods, data transformation and outlier detection. Data can be imported from the clipboard, text files, ASCII files or FT-IR Perkin-Elmer “.sp” files. The software generates a variety of charts and tables for analysing results, which can be exported in several formats. Its main features were tested using mid-infrared and near-infrared spectra of vegetable oils and digital images obtained from different types of commercial diesel. To validate the results, the same data sets were analysed using Matlab©, and the results of the two applications matched in the various combinations tested. In addition to the desktop version, the reuse of the algorithms allowed an online version to be provided that offers a unique experience on the web. Both applications are available in English.
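The PCA at the core of chemometric tools like the one described can be sketched generically via the SVD of the mean-centred data matrix (this is not Chemostat's code, and the "spectra" below are invented):

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the mean-centred data matrix.
    Returns scores, loadings and the explained-variance ratio."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2) / (s ** 2).sum()
    return scores, loadings, explained[:n_components]

# Invented "spectra" that vary along a single direction: the first
# principal component should then capture essentially all variance.
direction = np.array([0.6, 0.8, 0.0])
t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
X = np.outer(t, direction)
scores, loadings, explained = pca(X, n_components=2)
```

Interval PCA (iPCA) applies the same decomposition to contiguous sub-ranges of the spectral axis, so the routine above would simply be called on column slices of the data matrix.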
Abstract:
The condensation rate has to be high in the safety pressure suppression pool systems of Boiling Water Reactors (BWR) in order for them to fulfill their safety function. The phenomena associated with such a high direct contact condensation (DCC) rate are very challenging to analyse, whether by experiments or numerical simulations. In this thesis, the suppression pool experiments carried out in the POOLEX facility of Lappeenranta University of Technology were simulated. Two different condensation modes were modelled using the two-phase CFD codes NEPTUNE CFD and TransAT. The DCC models applied were those typically used for separated flows in channels; their applicability to the rapidly condensing flow in the condensation pool context had not been tested earlier. A low Reynolds number case was simulated first. The POOLEX experiment STB-31 was operated near the boundary between the ’quasi-steady oscillatory interface condensation’ mode and the ’condensation within the blowdown pipe’ mode. The condensation models of Lakehal et al. and Coste & Laviéville predicted the condensation rate quite accurately, while the other tested models overestimated it. It was possible to get the direct phase-change solution to settle near the measured values, but a very fine calculation grid was needed. Secondly, a high Reynolds number case corresponding to the ’chugging’ mode was simulated. The POOLEX experiment STB-28 was chosen because various standard and high-speed video samples of bubbles were recorded during it. To extract numerical information from the video material, a pattern recognition procedure was programmed, with which the bubble size distributions and the chugging frequencies were calculated. With the statistical data on bubble sizes and the temporal data on bubble/jet appearance, it was possible to compare the condensation rates between the experiment and the CFD simulations.
In the chugging simulations, a spherically curvilinear calculation grid at the blowdown pipe exit improved convergence and decreased the required cell count. The compressible flow solver with complete steam tables was beneficial for the numerical success of the simulations. The Hughes-Duffey model and, to some extent, the Coste & Laviéville model produced realistic chugging behavior. The initial level of the steam/water interface was an important factor in determining the initiation of chugging: if the interface was initialized with a sufficiently high water level inside the blowdown pipe, the vigorous penetration of a water plug into the pool created a turbulent wake which triggered self-sustaining chugging. A 3D simulation with a suitable DCC model produced qualitatively very realistic shapes of the chugging bubbles and jets. The comparative FFT analysis of the bubble size data and the pool bottom pressure data gave useful information for distinguishing the eigenmodes of chugging, bubbling, and pool structure oscillations.
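The comparative FFT analysis mentioned above amounts to locating the dominant spectral peaks in a sampled signal. A generic sketch (the pressure trace, sampling rate, and 5 Hz component below are invented, not POOLEX data):

```python
import numpy as np

def dominant_frequency(signal, fs):
    """Return the frequency (Hz) of the strongest non-DC component
    in a real-valued signal sampled at fs Hz."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    peak = np.argmax(spectrum[1:]) + 1   # skip the DC bin
    return freqs[peak]

# Invented pool-bottom pressure trace: a 5 Hz oscillation on a constant offset.
fs = 100.0                               # assumed sampling rate, Hz
t = np.arange(0, 10, 1 / fs)             # 10 s record
pressure = 1.0e5 + 500.0 * np.sin(2 * np.pi * 5.0 * t)
f_peak = dominant_frequency(pressure, fs)   # -> 5.0 Hz
```

Applying the same transform to the bubble-size time series and comparing peak locations is what allows the chugging, bubbling, and structural eigenmodes to be told apart.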
Abstract:
Markku Laitinen's keynote presentation at the QQML conference in Limerick, Ireland, on 23 April 2012.
Abstract:
A strategy process was completed in the ESF project “Promotion of Work-related Immigration”, which was implemented at Centre for Economic Development, Transport and the Environment for North Ostrobothnia, and an immigration strategy was drawn up for Northern Ostrobothnia on the basis of the process. Information was collected about the situation in Northern Ostrobothnia from the point of view of immigration and the future availability of labour. The intention was to use the information as background material for the strategy. Employers’ need for support in recruiting foreign labour was investigated with a broad inquiry, to which 1000 respondents replied. The strategy process was carried out together with an outside consultant (Net Effect Oy) by arranging three workshops and a seminar where the workshop results were summarised. A large number of companies, authorities, municipalities, associations, project actors and immigrants engaged in immigration issues participated in the workshops. The draft strategy is based on their experiences about immigration and on statistical data, background inquiries and surveys. To ensure the accuracy of the draft strategy, comments were requested from several parties and received from 64 organisations. The core of the immigration strategy consists of an initial analysis, values, a vision and priorities. The strategy is composed of three priorities. The key aim of the priority Internationalisation and Supporting Diversity is to support diversity in schools, workplaces and people’s everyday lives e.g. through attitude development and by promoting internationalisation in companies and education institutions. The aim of the priority Supporting Entrepreneurship and Recruiting Foreign Labour is to promote entrepreneurship among immigrants and the recruitment of foreign labour and to develop the forecasting of educational needs. 
The priority Developing Integration Services, Regional Cooperation and Networks, in turn, seeks to develop the service structure and policies of immigrant integration and to increase cooperation and exchange of information between regional actors engaged in integration issues. The aim is to use the strategy as a guideline document for immigration issues in Northern Ostrobothnia. The strategy is used to coordinate the existing organisations and operations dealing with immigration issues. In addition, it contains a future-oriented focus and underlines the management of new immigration projects and operations. The main party responsible for the implementation of the strategy is the Immigration Committee. In addition, responsible parties have been assigned to each measure. The implementation of the immigration strategy will be monitored annually on the basis of indicators.
Abstract:
The aim was to evaluate, over 75 days, the impact on production of the remaining burden of ivermectin (IVM)-resistant parasites in naturally infected feedlot calves. The herds came from tick-infested cattle-breeding areas where the systematic use of IVM to control ticks increases the gastrointestinal parasites resistant to this drug. The investigation was carried out in two commercial feedlots in Buenos Aires province. On day 0, two groups of 35 animals each were formed in feedlots A and B; in each feedlot, one group received 1% IVM and the other received 10% ricobendazole (RBZ). Fecal samples were taken from each animal on days 0, 22, 54 and 75 post-treatment (PT), and body weight was recorded. Fecal samples were processed for individual counts of eggs per gram (EPG), and pooled fecal cultures were carried out to identify the parasite genera at each sampling. The fecal egg count reduction (FECR) was calculated on day 22 PT. The study design was a completely randomized block, with commercial feedlot and sex as block variables; data were analysed with a mixed model in the SAS statistical program. The average FECR on day 22 was 28.4% in the IVM group and 94.2% in the RBZ group. From this date on, significant differences in EPG persisted until day 54; EPG counts became equal only near the end of the trial, on day 75 (p=0.16). In both commercial feedlots, especially in the IVM group, Cooperia spp. was the most prevalent parasite in the fecal cultures. A significant difference in weight (P<0.01) on post-treatment day 75 was found between the average weights of the RBZ and IVM groups (246 vs. 238 kg, respectively), representing a difference of 8.3% in weight gain. The production importance of antiparasitic treatment failure in commercial feedlots was demonstrated, and the need for post-treatment controls to evaluate the efficacy of the antiparasitic administered is emphasized.
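One common formulation of the FECR statistic compares pre- and post-treatment mean egg counts of the same group; protocols differ (some compare a treated group against an untreated control), so this is a hedged sketch with invented counts, not the study's data:

```python
def fecr_percent(epg_pre, epg_post):
    """Faecal egg count reduction (%) from pre- and post-treatment EPG
    counts of the same group: 100 * (1 - mean_post / mean_pre).
    One common formulation; FECRT protocols vary in the comparator used."""
    mean_pre = sum(epg_pre) / len(epg_pre)
    mean_post = sum(epg_post) / len(epg_post)
    return 100.0 * (1.0 - mean_post / mean_pre)

# Invented EPG counts for two hypothetical treatment groups.
ivm_pre, ivm_post = [200, 300, 100], [150, 220, 60]
rbz_pre, rbz_post = [250, 150, 200], [10, 20, 5]
fecr_ivm = fecr_percent(ivm_pre, ivm_post)
fecr_rbz = fecr_percent(rbz_pre, rbz_post)
```

A reduction well below the commonly used 95% efficacy threshold, as in the hypothetical IVM group above, is the pattern that flags anthelmintic resistance.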
Abstract:
The agouti is one of the most intensively hunted species throughout the Amazon and the semiarid regions of north-eastern Brazil. Considering the current trend of managing wild animals in captivity, the objective of this study was to determine cardiac reference values for agoutis raised in captivity, based on electrocardiographic (ECG) assessment. Adult agoutis without clinical signs of heart disease were selected (n=30). The animals were physically restrained and the ECG was then performed. Standardized measurements were taken for the statistical analysis of the data. Analysis of the QRS complex showed values compatible with previous reports in comparable animals and with the limited data available for other wild and exotic species, except for the T wave, which showed an amplitude similar to that of the R wave in all the animals studied. The data obtained provide the first reference values for ECG tracings in agoutis, contributing to a better understanding of cardiac electrophysiology and to the identification of myocardial pathology in these animals.
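Reference values of this kind are often summarised as a 95% reference interval, mean ± 1.96 standard deviations, under an approximate-normality assumption. A minimal sketch with invented amplitudes (not the study's measurements):

```python
import math

def reference_interval(values, z=1.96):
    """95% reference interval (mean +/- z * sample SD), assuming the
    measurements are approximately normally distributed."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean - z * sd, mean + z * sd

# Invented R-wave amplitudes (mV), for illustration only.
r_wave = [1.0, 2.0, 3.0, 4.0, 5.0]
low, high = reference_interval(r_wave)
```

With only 30 animals, nonparametric percentile methods are a common alternative when normality is doubtful.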
Abstract:
My dissertation is a diachronic and contrastive study of text types. The research material consists of personal ads in the newspapers Süddeutsche Zeitung and Helsingin Sanomat from the period 1900–1999. The material comprises 652 German and 538 Finnish ads. The ads examined were published in May and were collected from the above newspapers at ten-year intervals. The material was analysed with the statistical program SPSS. The dissertation analyses the development of this text type over one hundred years in two different cultures, the German and the Finnish. The aim of the dissertation is to use this material to identify linguistic and cultural similarities and differences in personal ads. The starting point is that linguistic expressions reflect the societal values of their time, which thus also influence the search for a life partner. The results of the analysis are therefore examined in a broader societal context across the decades. The ad texts are not, however, examined in relation to individual societal events. The dissertation analyses 13 information units in the personal ads: whether these units occur throughout the period in question and whether the same information units occur in ads in both cultures. The dissertation is thus intra- and interlingual as well as intercultural. This method brings out the features that characterise this text type at a given time in both cultures. The dissertation is divided into three parts. The first part provides background on the history of marriage and the concept of family, and on the emergence of the German and Finnish press. The second, theoretical part deals with text and text-type linguistics and current research in these fields. The third and most extensive part consists of a qualitative and quantitative analysis comprising 11 research sections.
The study shows that differences can be detected in the personal-ad text type, for example already in that a German ad differs from a Finnish one in length and amount of information. A Finnish ad relies, in its linguistic economy, on the reader understanding the context of the text type. The dissertation also shows that the historical and cultural context of text types must be taken into account in their analysis, since the analysis demonstrates that text types are bound to history and culture.
Abstract:
Statistical analyses of measurements that can be described by statistical models are of essence in astronomy and in scientific inquiry in general. The sensitivity of such analyses, modelling approaches, and the consequent predictions is sometimes highly dependent on the exact techniques applied, and improvements therein can result in significantly better understanding of the observed system of interest. In particular, optimising the sensitivity of statistical techniques for detecting the faint signatures of low-mass planets orbiting nearby stars is, together with improvements in instrumentation, essential in estimating the properties of the population of such planets, and in the race to detect Earth-analogs, i.e. planets that could support liquid water and, perhaps, life on their surfaces. We review the developments in Bayesian statistical techniques applicable to the detection of planets orbiting nearby stars and to astronomical data analysis problems in general. We also discuss these techniques and demonstrate their usefulness with various examples and detailed descriptions of the mathematics involved. We demonstrate the practical aspects of Bayesian statistical techniques by describing several algorithms and numerical techniques, as well as theoretical constructions, for the estimation of model parameters and for hypothesis testing. We also apply these algorithms to Doppler measurements of nearby stars to show how they can be used in practice to obtain as much information as possible from noisy data. Bayesian statistical techniques are powerful tools for analysing and interpreting noisy data and should be preferred in practice whenever computational limitations are not too restrictive.
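The workhorse numerical technique in such Bayesian analyses is Markov chain Monte Carlo sampling of the posterior. As a toy sketch (not the thesis's algorithms), a random-walk Metropolis sampler for the mean of normally distributed measurements with known noise and a flat prior; the "radial-velocity-like" data are invented:

```python
import numpy as np

def metropolis_mean(data, sigma, n_samples=20000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for the mean mu of N(mu, sigma^2)
    data under a flat prior. Returns the chain of mu samples."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)

    def log_like(mu):
        return -0.5 * np.sum((data - mu) ** 2) / sigma ** 2

    mu = data.mean()                       # start at a sensible value
    ll = log_like(mu)
    chain = np.empty(n_samples)
    for i in range(n_samples):
        prop = mu + step * rng.standard_normal()   # propose a move
        ll_prop = log_like(prop)
        if np.log(rng.random()) < ll_prop - ll:    # Metropolis accept/reject
            mu, ll = prop, ll_prop
        chain[i] = mu
    return chain

# Invented measurements with assumed known noise sigma = 1.
y = [1.2, 0.8, 1.0, 1.1, 0.9]
chain = metropolis_mean(y, sigma=1.0)
posterior_mean = chain[2000:].mean()       # discard burn-in
```

With a flat prior the analytic posterior is N(ȳ, σ²/n), so the chain's mean should settle near the sample mean; real Doppler analyses replace the likelihood with a Keplerian signal model plus noise terms.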
Abstract:
Nursing education aims to select students who are suited to the field, motivated, and likely to succeed in theoretical and clinical studies. The purpose of this follow-up study was to compare the competence and study motivation of nursing students selected by an aptitude test and by a written examination. Based on the results, the aim was to make development proposals concerning student selection for nursing education. The target group consisted of nursing students (N=626) admitted to one university of applied sciences between autumn 2002 and autumn 2004 by two different entrance examination methods (nursing, public health nursing, midwifery). Two cohorts were formed on the basis of the selection method: aptitude test (VAL1, N=368) and written examination (VAL2, N=258). The follow-up data were collected from the students' study records and with two structured instruments measuring the students' self-assessed nursing competence (OSAA instrument) and study motivation (MOTI instrument). Data collection took place in the students' third semester (1st measurement, 2004‒2006, VAL1 n=234, VAL2 n=126) and at graduation (2nd measurement, 2006‒2009, VAL1 n=149, VAL2 n=108). The response rate was 75.0% for the first measurement and 92.4% for the second. The data were analysed with multivariate methods suitable for longitudinal research. Despite small differences, the two selection methods admitted students who were very similar in competence and study motivation. Students selected by the aptitude test experienced their group as more supportive at graduation than those selected by the written examination. The competence of students selected by the written examination, as measured by third-semester grades, was better than that of students selected by the aptitude test.
Study orientation, work experience in health care, basic education, and application priority were most significantly associated with the students' competence and study motivation. The selection method explained most of the observed differences in competence and study motivation, although the explained proportions remained low. The development proposals concern developing and regularly evaluating entrance examination methods, as well as defining motivation for the field and developing its measurement. Suggested topics for further research include testing different selection methods and further developing the instruments used in this study.
Abstract:
Workshop at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
The amount of biological data has grown exponentially in recent decades. Modern biotechnologies, such as microarrays and next-generation sequencing, are capable of producing massive amounts of biomedical data in a single experiment. As the amount of data is growing rapidly, there is an urgent need for reliable computational methods for analyzing and visualizing it. This thesis addresses this need by studying how to efficiently and reliably analyze and visualize high-dimensional data, especially data obtained from gene expression microarray experiments. First, we study ways to improve the quality of microarray data by replacing (imputing) the missing data entries with estimated values. Missing value imputation is commonly used to make incomplete data complete, and thus easier to analyze with statistical and computational methods. Our novel approach was to use curated external biological information as a guide for the missing value imputation. Secondly, we studied the effect of missing value imputation on downstream data analysis methods such as clustering. We compared multiple recent imputation algorithms on 8 publicly available microarray data sets. It was observed that missing value imputation is indeed a rational way to improve the quality of biological data. The research revealed differences between the clustering results obtained with different imputation methods. On most data sets the simple and fast k-NN imputation was good enough, but there was also a need for more advanced imputation methods, such as the Bayesian Principal Component Algorithm (BPCA). Finally, we studied the visualization of biological network data. Biological interaction networks are examples of the outcome of multiple biological experiments, such as those using gene microarray techniques.
Such networks are typically very large and highly connected, so there is a need for fast algorithms that produce visually pleasing layouts. A computationally efficient way to produce layouts of large biological interaction networks was developed: the algorithm uses multilevel optimization within the regular force-directed graph layout algorithm.
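The k-NN imputation mentioned above estimates each missing entry from the most similar rows of the expression matrix. A minimal row-wise sketch (generic, not the thesis's implementation; the tiny genes × arrays matrix is invented):

```python
import numpy as np

def knn_impute(X, k=1):
    """Impute NaNs row-wise: for each row with missing entries, find the
    k complete-enough rows closest on the jointly observed columns and
    fill the gaps with their average."""
    X = X.astype(float).copy()
    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        dists = []
        for j, other in enumerate(X):
            # Skip self and donors that are also missing at the target columns.
            if j == i or np.isnan(other[miss]).any():
                continue
            shared = ~miss & ~np.isnan(other)
            if not shared.any():
                continue
            d = np.sqrt(np.mean((row[shared] - other[shared]) ** 2))
            dists.append((d, j))
        dists.sort()
        neighbours = [j for _, j in dists[:k]]
        X[i, miss] = X[neighbours][:, miss].mean(axis=0)
    return X

# Invented expression matrix: genes x arrays, one missing entry.
expr = np.array([[1.0, 2.0, 3.0],
                 [1.0, 2.0, np.nan],
                 [10.0, 20.0, 30.0]])
imputed = knn_impute(expr, k=1)   # the NaN is filled from the nearest row
```

BPCA replaces this local averaging with a probabilistic low-rank model of the whole matrix, which is what makes it more accurate on some data sets at a higher computational cost.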
Abstract:
Results of subgroup analyses (SA) reported in randomized clinical trials (RCT) cannot be adequately interpreted without information about the methods used in the study design and the data analysis. Our aim was to show how often inaccurate or incomplete reports occur. First, we selected eight methodological aspects of SA on the basis of their importance to a reader in determining the confidence that should be placed in the authors' conclusions regarding such analyses. Then, we reviewed the current practice of reporting these methodological aspects of SA in clinical trials in four leading journals, i.e., the New England Journal of Medicine, the Journal of the American Medical Association, the Lancet, and the American Journal of Public Health. Eight consecutive reports from each journal published after July 1, 1998 were included. Of the 32 trials surveyed, 17 (53%) had at least one SA. Overall, the proportion of RCT reporting a particular methodological aspect ranged from 23 to 94%. Information on whether the SA was planned before or after the data analysis was reported in only 7 (41%) of the studies. Of the total possible number of items to be reported, the NEJM, JAMA, Lancet and AJPH reports clearly mentioned 59, 67, 58 and 72%, respectively. We conclude that current reporting of SA in RCT is incomplete and inaccurate. The results of such SA may have harmful effects on treatment recommendations if accepted without judicious scrutiny. We recommend that editors improve the reporting of SA in RCT by giving authors a list of the important items to be reported.