25 results for Statistical Tolerance Analysis in Helda - Digital Repository of University of Helsinki
Abstract:
In this study we explore the concurrent, combined use of three research methods, namely statistical corpus analysis and two psycholinguistic experiments (a forced-choice task and an acceptability rating task), using verbal synonymy in Finnish as a case in point. In addition to supporting conclusions from earlier studies concerning the relationships between corpus-based and experimental data (e.g., Featherston 2005), we show that each method adds to our understanding of the studied phenomenon in a way that could not be achieved through any single method by itself. Most importantly, whereas relative rarity in a corpus is associated with dispreference in selection, such infrequency does not always entail substantially lower acceptability. Furthermore, we show that forced-choice and acceptability rating tasks pertain to distinct linguistic processes, with category-wise incommensurable scales of measurement, and should therefore be merged with caution, if at all.
Abstract:
FTIR spectroscopy (Fourier transform infrared spectroscopy) is a rapid analytical method. In Fourier instruments, the use of an interferometer makes it possible to measure the entire infrared frequency range in a few seconds. An FTIR spectrometer equipped with an ATR accessory requires hardly any sample preparation, so the method is also easy to use. The ATR accessory also allows many different kinds of samples to be analysed. Infrared spectra can also be measured from samples for which traditional sample preparation methods cannot be used. Information obtained by FTIR spectroscopy is often combined with multivariate statistical analyses. Cluster analysis can be used to group the information obtained from the spectra on the basis of similarity. In hierarchical cluster analysis, the similarity between objects is determined by calculating the distance between them. Principal component analysis reduces the dimensionality of the data and creates new, uncorrelated principal components. The principal components should retain as much of the variation in the original data as possible. The application potential of FTIR spectroscopy combined with multivariate methods has been studied extensively. In the food industry, for example, its suitability for quality control has been investigated. The method has also been used to identify the chemical composition of essential oils and to detect chemotypes of oil plants. In this study, the use of the method was evaluated for classifying extract samples of milk parsley (Peucedanum palustre). FTIR spectra of extract samples from different plant parts of milk parsley were compared with FTIR spectra measured from selected pure compounds. The characteristic absorption bands of the pure compounds were identified from their FTIR spectra. The wavenumber ranges of the intense bands in the furanocoumarin spectra were selected for the multivariate analyses. Multivariate analyses were also performed on the fingerprint region of the IR spectrum, in the wavenumber range 1785-725 cm-1.
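The hierarchical clustering step described above can be illustrated with a minimal sketch. It assumes Euclidean distance and single linkage (the abstract does not say which distance or linkage was used), treats each spectrum as a list of absorbances sampled at the same wavenumbers, and uses made-up toy values rather than data from the study.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two spectra sampled at the same wavenumbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(spectra, n_clusters):
    """Agglomerative (hierarchical) clustering with single linkage.

    Every spectrum starts in its own cluster; the two clusters whose closest
    members are nearest are merged repeatedly until n_clusters remain.
    Returns the clusters as sorted lists of sample indices.
    """
    clusters = [[i] for i in range(len(spectra))]
    while len(clusters) > n_clusters:
        best = None
        for (ia, a), (ib, b) in combinations(enumerate(clusters), 2):
            # Single linkage: cluster distance = smallest member-to-member distance.
            d = min(euclidean(spectra[i], spectra[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, ia, ib)
        _, ia, ib = best
        merged = sorted(clusters[ia] + clusters[ib])
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return sorted(clusters)


# Toy absorbances at three wavenumbers (hypothetical values).
spectra = [[0.10, 0.50, 0.20], [0.12, 0.48, 0.21], [0.80, 0.10, 0.40], [0.78, 0.12, 0.41]]
print(single_linkage(spectra, 2))  # [[0, 1], [2, 3]]
```

Recording the merge distances instead of stopping at a fixed cluster count would give the full dendrogram that hierarchical cluster analysis usually reports.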
The aim was to classify the extract samples according to their collection site and coumarin content. Grouping by collection site was observed; in the analyses based on the wavenumber ranges of the bands, this grouping was mainly explained by the coumarin contents. In these analyses, the extract samples mainly grouped and separated according to their total coumarin contents. Grouping by collection site was also observed in the analyses of the 1785-725 cm-1 range, but the coumarin contents did not explain it. These groupings were possibly influenced by similar concentrations of other compounds in the samples. Other wavenumber ranges were also used in the analyses, but the results hardly differed from the earlier ones. Multivariate analyses of second-order derivative spectra from the fingerprint region did not noticeably change the results either. In further studies, the method could be developed further, for example by using narrower, carefully selected wavenumber ranges of the second-order derivative spectra in the multivariate analyses.
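The two preprocessing steps mentioned above, restricting spectra to a wavenumber window and computing second-order derivative spectra, can be sketched as follows. This is a minimal illustration assuming evenly spaced spectra and a plain central finite difference; smoothing derivatives such as Savitzky-Golay are more common in practice, and the abstract does not state which one was used. The function names are hypothetical.

```python
def select_range(wavenumbers, absorbances, high=1785.0, low=725.0):
    """Keep only the points whose wavenumber (cm-1) lies within [low, high]."""
    pairs = [(w, a) for w, a in zip(wavenumbers, absorbances) if low <= w <= high]
    return [w for w, _ in pairs], [a for _, a in pairs]


def second_derivative(values, spacing):
    """Central finite-difference second derivative of an evenly spaced spectrum.

    The two end points lack a neighbour on one side, so the result is two
    points shorter than the input. The sign of the spacing does not matter
    because it enters squared.
    """
    return [
        (values[i - 1] - 2 * values[i] + values[i + 1]) / spacing ** 2
        for i in range(1, len(values) - 1)
    ]


# A quadratic signal has a constant second derivative.
print(second_derivative([0, 1, 4, 9, 16], 1.0))  # [2.0, 2.0, 2.0]
```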
Abstract:
Metabolomics is a rapidly growing research field that studies the response of biological systems to environmental factors, disease states and genetic modifications. It aims at measuring the complete set of endogenous metabolites, i.e. the metabolome, in a biological sample such as plasma or cells. Because metabolites are the intermediates and end products of biochemical reactions, metabolite compositions and metabolite levels in biological samples can provide a wealth of information on ongoing processes in a living system. Due to the complexity of the metabolome, metabolomic analysis poses a challenge to analytical chemistry. Adequate sample preparation is critical to accurate and reproducible analysis, and the analytical techniques must have high resolution and sensitivity to allow detection of as many metabolites as possible. Furthermore, as the information contained in the metabolome is immense, the data sets collected in metabolomic studies are very large. Extracting the relevant information from such large data sets requires efficient data processing and multivariate data analysis methods. In the research presented in this thesis, metabolomics was used to study mechanisms of polymeric gene delivery to retinal pigment epithelial (RPE) cells. The aim of the study was to detect differences in metabolomic fingerprints between transfected cells and non-transfected controls, and thereafter to identify the metabolites responsible for the discrimination. The plasmid pCMV-β was introduced into RPE cells using the vector polyethyleneimine (PEI). The samples were analyzed using high performance liquid chromatography (HPLC) and ultra performance liquid chromatography (UPLC) coupled to a triple quadrupole (QqQ) mass spectrometer (MS). The software MZmine was used for raw data processing, and principal component analysis (PCA) was used for statistical data analysis.
The results revealed differences in metabolomic fingerprints between transfected cells and non-transfected controls. However, reliable fingerprinting data could not be obtained because of low analysis repeatability. Therefore, no attempts were made to identify the metabolites responsible for discrimination between sample groups. Repeatability and accuracy can be improved by optimizing the analytical protocol. In this study, however, optimization of the analytical methods was hindered by the very small number of samples available for analysis. In conclusion, this study demonstrates that obtaining reliable fingerprinting data is technically demanding, and the protocols need to be thoroughly optimized before they can yield information on the mechanisms of gene delivery.
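Analysis repeatability, the limiting factor above, is commonly quantified as the relative standard deviation (RSD) of a feature's signal across repeated injections. The sketch below assumes that measure and a 30 % cut-off, a threshold often used for quality-control samples in untargeted metabolomics; neither the metric nor the threshold is taken from the thesis, and the feature names and peak areas are hypothetical.

```python
import statistics


def rsd_percent(replicates):
    """Relative standard deviation (sample SD / mean) of replicate measurements, in percent."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)


def repeatable_features(peak_areas, max_rsd=30.0):
    """Names of features whose replicate RSD does not exceed max_rsd percent."""
    return [name for name, areas in peak_areas.items() if rsd_percent(areas) <= max_rsd]


# Hypothetical peak areas from three repeated injections of the same sample.
areas = {
    "feature_a": [1000, 1050, 980],
    "feature_b": [200, 420, 90],
}
print(repeatable_features(areas))  # ['feature_a']
```

Only features passing such a filter would normally be carried forward into PCA, which is one way low repeatability shrinks the usable fingerprint.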
Abstract:
The study describes and analyzes Finland Swedes' attitudes to modern-day linguistic influence, and the relationship between informants' explicitly reported views and the implicit attitudes they express towards language influence. The methods are primarily sociolinguistic. For the analysis of opinions and attitudes I have further developed and tested a new tool in attitude research. Using statistical correlation analysis of data collected through a quantitative survey, I describe the views that Swedish-speaking Finns (N=500) report on the influence of English, on imports, and on domain loss. With experimental matched-guise techniques, I study Finland Swedes' (N=600) subconscious reactions to English imports in spoken text. My results show that the subconscious reactions in some respects differ markedly from the views informants explicitly report: informants respond that they would like English words that enter Swedish to be replaced by Swedish equivalents, but in a matched-guise test of their subconscious attitudes, the informants consider English words in a Swedish context to have a positive effect. The topic is further examined in interviews in which I study 36 informants' implicit attitudes through interactional sociolinguistic analyses. With its focus on pragmatic particles and modality, this part of the study comes close to pragmatic discourse analysis. The study makes a rather strict distinction between explicitly expressed opinions and implicit, subconscious attitudes. The quantitative analyses suggest that the opinions we express can be tied to the explicit in language. The outcome of the matched-guise test shows that it is furthermore possible to find subconscious, implicit attitudes that people rely on in actual situations when they make decisions. The discourse analysis finds many subconscious signals, but it also shows that the signals arise in interaction with one's interlocutor, the situation, and the norms of society.
To account for this I have introduced the concept of socioconscious attitude. Socioconscious attitudes reflect not only the traditions and values the speaker grew up with, but also the speaker's relation to the social situation (s)he takes part in.
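The correlation analysis of survey responses mentioned above can be illustrated with Pearson's correlation coefficient. This is a minimal sketch under the assumption that Pearson's r was used; the abstract only says "statistical correlation analysis", and for ordinal Likert-type items a rank correlation such as Spearman's would be a reasonable alternative. The item scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of item scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 1-5 scores of five respondents on two survey items.
views_on_english = [1, 2, 3, 4, 5]
wish_for_replacements = [5, 4, 3, 2, 1]
print(pearson_r(views_on_english, wish_for_replacements))  # -1.0
```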
Abstract:
The aim of this study is to find out how urban segregation is connected to differences in educational outcomes in public schools. The connection between urban structure and educational outcomes is studied at both the primary and the secondary school level. A secondary aim is to find out whether the free school choice policy introduced in the mid-1990s has an effect on educational outcomes in secondary schools or on the observed relationship between urban structure and educational outcomes. The study is quantitative, and the most important method used is statistical regression analysis. The educational outcome data, covering the years 1999 to 2002, were provided by the Finnish National Board of Education, and the data containing variables that describe the social and physical structure of Helsinki were provided by Statistics Finland and City of Helsinki Urban Facts. The central observation is that there is a clear connection between urban segregation and differences in educational outcomes in public schools. Variables describing urban structure statistically explain up to 70 % of the variation in educational outcomes in primary schools and up to 60 % in secondary schools. The variables most strongly associated with low educational outcomes in Helsinki are an abundance of public housing, a low educational level of the adult population, and a high number of immigrants in the school's catchment area. The regression model was constructed using these variables. The lower coefficient of determination for secondary schools is mostly due to the effects of secondary school choice. Studying the public school market revealed that students selecting a secondary school outside their local catchment area increase the variation in educational outcomes between secondary schools.
When the number of students selecting a school outside their local catchment area is taken into account in the regression model, up to 80 % of the variation in educational outcomes in Helsinki's secondary schools can be explained.
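The percentages above are coefficients of determination (R²): "explaining 70 % of the variation" means R² = 0.70. The sketch below shows how R² is computed, simplified to a single predictor fitted by ordinary least squares; the study's model uses several predictors, and the numbers here are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def r_squared(x, y):
    """Share of the variance in y explained by the fitted line (1 - SS_res / SS_tot)."""
    intercept, slope = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical: share of public housing in a catchment area vs. mean school grade.
print(r_squared([1, 2, 3], [1, 3, 2]))  # 0.25
```

Adding a further predictor, such as the number of students choosing schools outside their catchment area, can only keep or raise the in-sample R², which is consistent with the jump from 60 % to 80 % reported for secondary schools.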
Abstract:
In genetic epidemiology, population-based disease registries are commonly used to collect genotype or other risk factor information concerning affected subjects and their relatives. This work presents two new approaches for the statistical inference of ascertained data, a conditional likelihood approach and a full likelihood approach, for diseases with a variable age-at-onset phenotype, using familial data obtained from a population-based registry of incident cases. The aim is to obtain statistically reliable estimates of general population parameters. The statistical analysis of familial data with variable age at onset becomes more complicated when some of the study subjects are non-susceptible, that is, they never get the disease. A statistical model for variable age at onset with long-term survivors, based on a latent variable approach, is proposed both for studies of familial aggregation and for prospective genetic association studies of candidate genes. In addition, we explore the possibility of a genetic explanation for the observed increase in the incidence of Type 1 diabetes (T1D) in Finland in recent decades, and the hypothesis of non-Mendelian transmission of T1D-associated genes. Both classical and Bayesian statistical inference were used in the modelling and estimation. Although this work comprises five studies with different statistical models, all of them concern data obtained from nationwide registries of T1D and of the genetics of T1D. In the analyses of T1D data, non-Mendelian transmission of T1D susceptibility alleles was not observed. In addition, non-Mendelian transmission of T1D susceptibility genes did not provide a plausible explanation for the increase in T1D incidence in Finland. Instead, the Human Leucocyte Antigen (HLA) associations with T1D were confirmed in the population-based analysis, which combines T1D registry information, a reference sample of healthy subjects and birth cohort information on the Finnish population.
Finally, a substantial familial variation in the susceptibility of T1D nephropathy was observed. The presented studies show the benefits of sophisticated statistical modelling to explore risk factors for complex diseases.
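A common way to model variable age at onset with long-term survivors is a mixture cure model: a latent fraction p of subjects is non-susceptible and never develops the disease, so the population survival is S(t) = p + (1 - p) S_u(t), where S_u is the survival function of susceptible subjects. The sketch below assumes this formulation with a Weibull S_u; the thesis's actual latent variable model, and its correction for registry ascertainment, may differ. Parameter values are hypothetical.

```python
import math


def weibull_survival(t, shape, scale):
    """Probability that a susceptible subject is still disease-free at age t."""
    return math.exp(-((t / scale) ** shape))


def weibull_density(t, shape, scale):
    """Density of age at onset among susceptible subjects."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def population_survival(t, p_nonsusceptible, shape, scale):
    """Mixture cure model: non-susceptible subjects never get the disease."""
    return p_nonsusceptible + (1 - p_nonsusceptible) * weibull_survival(t, shape, scale)


def log_likelihood(records, p_nonsusceptible, shape, scale):
    """Log-likelihood for (age, had_onset) records, ignoring ascertainment.

    An onset at age t contributes (1 - p) f_u(t); a subject still disease-free
    at age t contributes p + (1 - p) S_u(t), since they may be non-susceptible.
    """
    total = 0.0
    for age, had_onset in records:
        if had_onset:
            total += math.log((1 - p_nonsusceptible) * weibull_density(age, shape, scale))
        else:
            total += math.log(population_survival(age, p_nonsusceptible, shape, scale))
    return total


# Survival levels off at the non-susceptible fraction instead of falling to zero.
print(population_survival(1000.0, 0.9, 2.0, 10.0))  # 0.9
```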
Abstract:
Bacteria play an important role in many ecological systems. The molecular characterization of bacteria using either cultivation-dependent or cultivation-independent methods reveals the large scale of bacterial diversity in natural communities and the vast number of subpopulations within a species or genus. Understanding how bacterial diversity varies across different environments and within populations should provide insights into many important questions of bacterial evolution and population dynamics. This thesis presents novel statistical methods for analyzing bacterial diversity using widely employed molecular fingerprinting techniques. The first objective of this thesis was to develop Bayesian clustering models to identify bacterial population structures. Bacterial isolates were identified using multilocus sequence typing (MLST), and Bayesian clustering models were used to explore the evolutionary relationships among isolates. Our method infers genetic population structures within an unsupervised clustering framework in which the dependence between loci is represented using graphical models. The population dynamics that generate such population stratification were investigated using a stochastic model in which homologous recombination between subpopulations can be quantified within a gene flow network. The second part of the thesis focuses on cluster analysis of community compositional data produced by two different cultivation-independent analyses: terminal restriction fragment length polymorphism (T-RFLP) analysis and fatty acid methyl ester (FAME) analysis. The cluster analysis aims to group bacterial communities that are similar in composition, which is an important step towards understanding the overall influence of environmental and ecological perturbations on bacterial diversity.
A common feature of T-RFLP and FAME data is zero-inflation, meaning that zero values are observed much more frequently than would be expected, for example, from a Poisson distribution in the discrete case or a Gaussian distribution in the continuous case. We provide two strategies for modeling zero-inflation in the clustering framework, validated on both synthetic and complex empirical data sets. We show in the thesis that our model, which takes into account dependencies between loci in MLST data, can produce better clustering results than methods that assume independent loci. Furthermore, computationally efficient algorithms for analyzing large-scale data were adopted to meet the increasing computational demand. Our method for detecting homologous recombination in subpopulations may provide a theoretical criterion for defining bacterial species. The clustering of bacterial community data, including T-RFLP and FAME data, is an initial step towards discovering the evolutionary dynamics that structure and maintain bacterial diversity in the natural environment.
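Zero-inflation in the discrete case can be made concrete with the zero-inflated Poisson (ZIP) distribution: with probability pi an observation is a structural zero, otherwise it is drawn from a Poisson distribution with mean lam. This is a minimal sketch of that textbook distribution only; the abstract does not specify the two strategies used in the thesis, and the parameter values are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing count k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: extra mass pi is placed on zero."""
    if k == 0:
        return pi + (1 - pi) * poisson_pmf(0, lam)
    return (1 - pi) * poisson_pmf(k, lam)


# With lam = 2, a plain Poisson gives P(0) of about 0.135; with 50 % zero-inflation
# the probability of a zero rises to about 0.568.
print(round(poisson_pmf(0, 2.0), 3), round(zip_pmf(0, 0.5, 2.0), 3))  # 0.135 0.568
```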
Abstract:
In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data, with an application to mobile device positioning. In the second part of the Thesis, we discuss so-called Bayesian network classifiers and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts and to noise reduction in digital signals.
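The link between Bayesian network classifiers and logistic regression can be seen in the simplest such classifier, naive Bayes with binary features: its posterior log-odds is a linear function of the features, so it has exactly the form of a logistic regression model. The sketch shows only this well-known naive Bayes case; the thesis's results cover Bayesian network classifiers more generally, and the probabilities below are hypothetical.

```python
import math


def naive_bayes_posterior(x, prior1, p1, p0):
    """P(C=1 | x) for binary features under naive Bayes.

    p1[i] = P(x_i = 1 | C = 1) and p0[i] = P(x_i = 1 | C = 0).
    """
    joint1, joint0 = prior1, 1 - prior1
    for xi, a, b in zip(x, p1, p0):
        joint1 *= a if xi else 1 - a
        joint0 *= b if xi else 1 - b
    return joint1 / (joint1 + joint0)


def equivalent_logistic_weights(prior1, p1, p0):
    """Intercept and weights of the logistic model with identical posteriors."""
    w0 = math.log(prior1 / (1 - prior1)) + sum(
        math.log((1 - a) / (1 - b)) for a, b in zip(p1, p0)
    )
    w = [math.log(a * (1 - b) / (b * (1 - a))) for a, b in zip(p1, p0)]
    return w0, w


def logistic(x, w0, w):
    """Logistic regression posterior P(C=1 | x) = sigmoid(w0 + w . x)."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 / (1 + math.exp(-z))


prior1, p1, p0 = 0.3, [0.8, 0.6, 0.1], [0.2, 0.5, 0.4]
w0, w = equivalent_logistic_weights(prior1, p1, p0)
x = [1, 0, 1]
print(abs(naive_bayes_posterior(x, prior1, p1, p0) - logistic(x, w0, w)) < 1e-12)  # True
```

The converse does not hold: a logistic model fitted discriminatively need not correspond to any naive Bayes parameters, which is why the two can behave differently when trained on finite data.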
Abstract:
Drug Analysis without Primary Reference Standards: Application of LC-TOFMS and LC-CLND to Biofluids and Seized Material
Primary reference standards for new drugs, metabolites, designer drugs or rare substances may not be obtainable within a reasonable period of time, or their availability may be hindered by extensive administrative requirements. Standards are usually costly and may have a limited shelf life. Finally, many compounds are not available commercially, and some are not available at all. A new approach within forensic and clinical drug analysis involves substance identification based on accurate mass measurement by liquid chromatography coupled with time-of-flight mass spectrometry (LC-TOFMS), and quantification by LC coupled with chemiluminescence nitrogen detection (LC-CLND), which has an equimolar response to nitrogen. Formula-based identification relies on the fact that the accurate mass of an ion from a chemical compound corresponds to the elemental composition of that compound. Single-calibrant, nitrogen-based quantification is feasible with a nitrogen-specific detector, since approximately 90% of drugs contain nitrogen. A method was developed for toxicological drug screening in 1 ml urine samples by LC-TOFMS. A large target database of exact monoisotopic masses was constructed, representing the elemental formulae of reference drugs and their metabolites. Identification was based on matching the sample component's measured parameters with those in the database, including accurate mass and, if available, retention time. In addition, an algorithm for isotopic pattern matching (SigmaFit) was applied. Differences in ion abundance in urine extracts did not affect the mass accuracy or the SigmaFit values. For routine screening practice, a mass tolerance of 10 ppm and a SigmaFit tolerance of 0.03 were established. Seized street drug samples were analysed instantly by LC-TOFMS and LC-CLND using a dilute-and-shoot approach.
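Formula-based identification with a 10 ppm mass tolerance can be sketched as follows: compute each database entry's theoretical monoisotopic mass from its elemental formula, add a proton for the [M+H]+ ion, and accept entries whose ppm error against the measured m/z is within tolerance. This is a minimal sketch assuming protonated ions only; retention time and the SigmaFit isotope-pattern score used in the study are not reproduced, and the three-entry database and measured value are hypothetical.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276466812


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass of an elemental formula such as {'C': 9, 'H': 13, 'N': 1}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def ppm_error(measured, theoretical):
    """Mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def match(measured_mz, database, tol_ppm=10.0):
    """Database entries whose [M+H]+ ion lies within tol_ppm of the measured m/z."""
    hits = []
    for name, formula in database.items():
        theoretical = monoisotopic_mass(formula) + PROTON
        error = ppm_error(measured_mz, theoretical)
        if abs(error) <= tol_ppm:
            hits.append((name, round(error, 2)))
    return hits


# Hypothetical target database of elemental formulae.
database = {
    "amphetamine": {"C": 9, "H": 13, "N": 1},
    "methamphetamine": {"C": 10, "H": 15, "N": 1},
    "cocaine": {"C": 17, "H": 21, "N": 1, "O": 4},
}
print(match(136.1125, database))  # [('amphetamine', 3.12)]
```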
In the quantitative analysis of amphetamine, heroin and cocaine findings, the mean relative difference between the results of LC-CLND and the reference methods was only 11%. In blood specimens, liquid-liquid extraction recoveries for basic lipophilic drugs were first established and the validity of the generic extraction recovery-corrected single-calibrant LC-CLND was then verified with proficiency test samples. The mean accuracy was 24% and 17% for plasma and whole blood samples, respectively, all results falling within the confidence range of the reference concentrations. Further, metabolic ratios for the opioid drug tramadol were determined in a pharmacogenetic study setting. Extraction recovery estimation, based on model compounds with similar physicochemical characteristics, produced clinically feasible results without reference standards.
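Single-calibrant quantification with an equimolar nitrogen detector can be sketched as follows: one calibrant of known concentration and known number of nitrogen atoms gives the detector response per unit of nitrogen, and any nitrogen-containing analyte is then quantified from its peak area and nitrogen count, optionally divided by an extraction recovery. This is a minimal sketch assuming a perfectly equimolar, linear response; the calibrant, areas and recovery value are hypothetical, and the study's recovery estimation from model compounds is not reproduced.

```python
def response_per_umol_n(cal_area, cal_conc_umol_l, cal_n_atoms):
    """Detector response per µmol/L of nitrogen, from a single calibrant injection."""
    return cal_area / (cal_conc_umol_l * cal_n_atoms)


def quantify_umol_l(area, n_atoms, k, recovery=1.0):
    """Analyte concentration (µmol/L) from its CLND peak area.

    Assumes the same response factor k for every compound (equimolar nitrogen
    response) and corrects for extraction recovery given as a fraction (0-1].
    """
    return area / (k * n_atoms) / recovery


def relative_difference_percent(value, reference):
    """Relative difference from a reference-method result, in percent."""
    return 100.0 * abs(value - reference) / reference


# Hypothetical calibrant with 4 N atoms at 10 µmol/L giving a peak area of 8000.
k = response_per_umol_n(8000, 10, 4)
# Hypothetical analyte with 1 N atom, area 1000, extraction recovery 80 %.
print(quantify_umol_l(1000, 1, k, recovery=0.8))
```

Multiplying a µmol/L result by the analyte's molar mass in g/mol gives µg/L, which is how a nitrogen-based molar result would be compared with mass-based reference concentrations.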
Abstract:
Acacia senegal, the gum arabic producing tree, is the most important component in traditional dryland agroforestry systems in the Blue Nile region, Sudan. The aim of the present study was to provide new knowledge on the potential use of A. senegal in dryland agroforestry systems on clay soils, as well as information on tree/crop interaction and on silvicultural and management tools, with consideration of system productivity, nutrient cycling and sustainability. A further aim was to clarify the intra-specific variation in the performance of A. senegal and, specifically, the adaptation of trees of different origin to the clay soils of the Blue Nile region. In agroforestry systems established at the beginning of the study, tree and crop growth, water use, gum and crop yields, nutrient cycling and system performance were investigated over a period of four years (1999 to 2002). Trees were grown at 5 x 5 m and 10 x 10 m spacing, alone or in mixture with sorghum or sesame; crops were also grown in sole culture. The symbiotic biological N2 fixation by A. senegal was estimated using the 15N natural abundance (δ15N) procedure in eight provenances collected from different environments and soil types of the gum arabic belt and grown in clay soil in the Blue Nile region. Balanites aegyptiaca (a non-legume) was used as a non-N-fixing reference tree species, so as to allow 15N-based estimates of the proportion of the nitrogen in trees derived from the atmosphere. In the planted acacia trees, measurements were made of shoot growth, water-use efficiency (as assessed by the δ13C method) and, starting from the third year, gum production. Carbon isotope ratios were obtained from leaf and branch wood samples. The agroforestry system design caused no statistically significant variation in water use, but the variation between years was highly significant, and the highest water use occurred in the years with high rainfall.
No statistically significant differences were found in sorghum or sesame yields between the intercropping and sole crop systems (yield averages were 1.54 and 1.54 t ha-1 for sorghum and 0.36 and 0.42 t ha-1 for sesame in the intercropped and mono-crop plots, respectively). Thus, at an early stage of agroforestry system management, A. senegal had no detrimental effect on crop yield, but the pattern of resource capture by trees and crops may change as the system matures. Intercropping resulted in taller trees and larger basal and crown diameters compared with sole trees. It also resulted in a higher land equivalent ratio. Analysis of gum yields showed a significant positive relationship between the second gum picking and the total gum yield. The second gum picking thus seems to be a decisive factor in gum production and could be used as an indicator of the total gum yield in a particular year. In trees, the concentrations of N and P were higher in leaves and roots, whereas the levels of K were higher in stems, branches and roots. Soil organic matter, N, P and K contents were highest in the upper soil stratum. There was some indication that the P content of the topsoil increased slightly as the agroforestry plantations aged. At a stocking of 400 trees ha-1 (5 x 5 m spacing), A. senegal accumulated in its biomass a total of 18, 1.21, 7.8 and 972 kg ha-1 of N, P, K and OC, respectively. Trees contributed ca. 217 and 1500 kg ha-1 of K and OC, respectively, to the top 25 cm of soil over the first four years of intercropping. Acacia provenances of clay plain origin showed considerable variation in seed weight. They also had the lowest average seed weight compared with the sandy soil (western) provenances.
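The land equivalent ratio mentioned above sums, over the components of a mixture, the intercrop yield divided by the sole-crop yield; values above 1 mean that more land would be needed in sole cropping to produce the same output. In the sketch below the sorghum pair uses the yields reported above, while the tree component's relative yield is a hypothetical placeholder, because the abstract does not give the figures the study used for trees.

```python
def land_equivalent_ratio(components):
    """Sum of partial LERs, one (intercrop_yield, sole_yield) pair per component."""
    return sum(intercrop / sole for intercrop, sole in components)


sorghum = (1.54, 1.54)  # t ha-1, intercropped vs. sole crop, from the abstract
trees = (0.9, 1.0)      # hypothetical relative tree output (intercropped vs. sole)
print(land_equivalent_ratio([sorghum, trees]))
```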
At the experimental site in the clay soil region, the clay provenances were distinctly superior to the sand provenances in all traits studied, especially in basal diameter and crown width, reflecting their adaptation to the environment. Values of δ13C, indicating water use efficiency, were higher in the sand soil group than in the clay group, both in leaves and in branch wood. This suggests that the sand provenances (with an average value of -28.07‰) displayed conservative water use and high drought tolerance. Of the clay provenances, the local one (Bout) displayed a highly negative value (-29.31‰), which indicates less conservative water use that resulted in high productivity at this particular clay-soil site. Water use thus appeared to correspond to the environmental conditions prevailing at the original locations of these provenances. The results suggest that A. senegal provenances from the clay part of the gum belt are adapted for a faster growth rate and higher biomass and gum productivity than provenances from sand regions. A strong negative relationship was found between per-tree gum yield and water use efficiency, as indicated by δ13C. The differences in water use and gum production were greater among provenance groups than within them, suggesting that selection among rather than within provenances would result in distinct genetic gain in gum yield. The relative δ15N values (‰) were higher in B. aegyptiaca than in the N2-fixing acacia provenances. The amount of Ndfa increased significantly with age in all provenances, indicating that A. senegal is a potentially efficient nitrogen fixer and has an important role in agroforestry development. The total above-ground contribution of fixed N to foliage growth in 4-year-old A. senegal trees was highest in the Rahad sand-soil provenance (46.7 kg N ha-1) and lowest in the Mazmoom clay-soil provenance (28.7 kg N ha-1).
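The proportion of plant nitrogen derived from the atmosphere (Ndfa) in the 15N natural abundance method is usually computed as %Ndfa = 100 × (δ15N_ref - δ15N_fixer) / (δ15N_ref - B), where the reference is a non-fixing plant (here B. aegyptiaca) and B is the δ15N of the fixing species when it relies entirely on N2 fixation. The sketch assumes this standard formula; the B value and δ15N values below are hypothetical, not the study's measurements.

```python
def ndfa_percent(delta_ref, delta_fixer, b_value):
    """Percentage of plant N derived from the atmosphere (15N natural abundance).

    delta_ref:   δ15N (‰) of the non-fixing reference plant
    delta_fixer: δ15N (‰) of the N2-fixing plant
    b_value:     δ15N (‰) of the fixer when fully dependent on N2 fixation
    """
    return 100.0 * (delta_ref - delta_fixer) / (delta_ref - b_value)


def fixed_n_kg_ha(total_n_kg_ha, ndfa):
    """Amount of fixed N (kg ha-1) in a plant N pool with the given %Ndfa."""
    return total_n_kg_ha * ndfa / 100.0


# Hypothetical values: reference 6.0‰, acacia 2.0‰, B value -2.0‰.
share = ndfa_percent(6.0, 2.0, -2.0)
print(share, fixed_n_kg_ha(60.0, share))  # 50.0 30.0
```

The lower the fixer's δ15N relative to the reference, the larger the estimated share of atmospheric N, which is why the higher δ15N values in B. aegyptiaca matter for the estimate.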
This study represents the first use of the δ15N method for estimating the N input by A. senegal in the gum belt of Sudan. Key words: Acacia senegal, agroforestry, clay plain, δ13C, δ15N, gum arabic, nutrient cycling, Ndfa, Sorghum bicolor, Sesamum indicum
Abstract:
There exist various proposals for building a functional, fault-tolerant large-scale quantum computer. Topological quantum computation is a more exotic proposal, which makes use of the properties of quasiparticles that manifest only in certain two-dimensional systems. These so-called anyons exhibit topological degrees of freedom, which in principle can be used to execute quantum computation with intrinsic fault tolerance. This feature is the main incentive to study topological quantum computation. The objective of this thesis is to provide an accessible introduction to the theory. The thesis considers the theory of anyons arising in two-dimensional quantum mechanical systems that are described by gauge theories based on so-called quantum double symmetries. The quasiparticles are shown to exhibit interactions and to carry quantum numbers, both of which are topological in nature. In particular, the addition of the quantum numbers is found not to be unique; instead, the fusion of the quasiparticles is described by a non-trivial fusion algebra. It is discussed how this property can be used to encode quantum information in a manner that is intrinsically protected from decoherence, and how one could, in principle, perform quantum computation by braiding the quasiparticles. As an example of the general discussion, the particle spectrum and the fusion algebra of an anyon model based on the gauge group S_3 are explicitly derived. The fusion algebra is found to branch into multiple proper subalgebras, and the simplest of them is chosen as a model for an illustrative demonstration. The different steps of a topological quantum computation are outlined and the computational power of the model is assessed. It turns out that the chosen model is not universal for quantum computation. However, because the objective was a demonstration of the theory with explicit calculations, none of the other, more complicated fusion subalgebras were considered.
Studying their applicability for quantum computation could be a topic of further research.
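In quantum double models D(G), particle types are labelled by a conjugacy class C of G together with an irreducible representation of the centralizer of a representative of C, and a finite group has as many irreducible representations as conjugacy classes. The sketch below uses these standard facts to count the particle types for G = S_3 by brute force over permutations; it reproduces only the particle count, not the fusion algebra derived in the thesis. As a consistency check, the sum of |C|² × |centralizer| over classes equals |G|² = 36, the total squared quantum dimension.

```python
from itertools import permutations


def compose(p, q):
    """Permutation product p∘q: apply q first, then p."""
    return tuple(p[i] for i in q)


def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def conjugacy_classes(group):
    """Partition a finite permutation group into conjugacy classes."""
    classes, seen = [], set()
    for g in group:
        if g not in seen:
            cls = {compose(compose(h, g), inverse(h)) for h in group}
            classes.append(cls)
            seen |= cls
    return classes


def centralizer(g, group):
    """Elements of the group that commute with g."""
    return [h for h in group if compose(h, g) == compose(g, h)]


def quantum_double_spectrum(group):
    """Per conjugacy class: (class size, centralizer order, number of centralizer irreps).

    The number of irreps of the centralizer equals its number of conjugacy classes.
    """
    spectrum = []
    for cls in conjugacy_classes(group):
        z = centralizer(next(iter(cls)), group)
        spectrum.append((len(cls), len(z), len(conjugacy_classes(z))))
    return spectrum


S3 = list(permutations(range(3)))
spectrum = quantum_double_spectrum(S3)
print(sum(n for _, _, n in spectrum))  # 8 particle types
```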
Abstract:
Digital elevation models (DEMs) have been an important topic in geography and surveying sciences for decades, owing to their geomorphological importance as the reference surface for gravitation-driven material flow, as well as their wide range of uses and applications. When a DEM is used in terrain analysis, for example in automatic drainage basin delineation, errors in the model propagate into the analysis results. The investigation of this phenomenon is known as error propagation analysis, which has a direct influence on the decision-making process based on interpretations and applications of terrain analysis. Additionally, it may have an indirect influence on data acquisition and DEM generation. The focus of the thesis was on fine toposcale DEMs, which are typically represented in a 5-50 m grid and used at application scales of 1:10 000-1:50 000. The thesis presents a three-step framework for investigating error propagation in DEM-based terrain analysis. The framework includes methods for visualising the morphological gross errors of DEMs, exploring the statistical and spatial characteristics of the DEM error, performing analytical and simulation-based error propagation analysis, and interpreting the error propagation analysis results. The DEM error model was built using geostatistical methods. The results show that appropriate and exhaustive reporting of various aspects of fine toposcale DEM error is a complex task. This is due to the high number of outliers in the error distribution and to morphological gross errors, which are detectable with the presented visualisation methods. In addition, a global characterisation of DEM error is a gross generalisation of reality, because the areas in which the assumption of stationarity is not violated are small in extent. This was shown using an exhaustive high-quality reference DEM based on airborne laser scanning, together with local semivariogram analysis.
The error propagation analysis revealed that, as expected, an increase in the DEM vertical error increases the error in surface derivatives. However, contrary to expectations, the spatial autocorrelation of the model appears to have varying effects on the error propagation analysis depending on the application. A spatially uncorrelated DEM error model has been considered a 'worst-case scenario', but this view is now challenged, because none of the DEM derivatives investigated in the study had maximum variation with spatially uncorrelated random error. A significant performance improvement was achieved in simulation-based error propagation analysis by applying process convolution to generate realisations of the DEM error model. In addition, a typology of uncertainty in drainage basin delineation is presented.
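Simulation-based and analytical error propagation can be compared on a deliberately simple case: the east-west gradient of a DEM computed by central differences, with spatially uncorrelated Gaussian vertical error of standard deviation sigma. Analytically, the gradient error then has standard deviation sigma / (√2 × cell size), and a Monte Carlo run should approach that value. This is a minimal sketch under those assumptions; it does not reproduce the thesis's geostatistical error model or its process-convolution realisations, and the tilted-plane DEM is hypothetical.

```python
import math
import random
import statistics


def gradient_x(dem, row, col, cell_size):
    """East-west gradient at (row, col) by central differences."""
    return (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)


def monte_carlo_gradient_sd(dem, row, col, cell_size, sigma, n_realisations, seed=0):
    """SD of the gradient over DEM realisations with uncorrelated N(0, sigma) error."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_realisations):
        noisy = [[z + rng.gauss(0.0, sigma) for z in line] for line in dem]
        values.append(gradient_x(noisy, row, col, cell_size))
    return statistics.stdev(values)


def analytic_gradient_sd(sigma, cell_size):
    """Var[(e1 - e2) / (2h)] = 2 sigma^2 / (4 h^2) for independent errors."""
    return sigma / (math.sqrt(2) * cell_size)


# Hypothetical 3 x 3 DEM: a plane rising 1 m per 10 m cell towards the east.
cell_size = 10.0
dem = [[100.0 + 1.0 * c for c in range(3)] for _ in range(3)]
print(monte_carlo_gradient_sd(dem, 1, 1, cell_size, 1.0, 4000))
print(analytic_gradient_sd(1.0, cell_size))
```

With spatially correlated error, neighbouring cells err in similar directions and the difference-based gradient error changes accordingly; how that plays out for each derivative is exactly what the thesis found to be application-dependent.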