17 results for Compositional data analysis - roots in geosciences
in Helda - Digital Repository of the University of Helsinki
Abstract:
The aim of this thesis is to develop a fully automatic lameness detection system that operates in a milking robot. The instrumentation, measurement software, algorithms for data analysis and a neural network model for lameness detection were developed. Automatic milking has become common practice in dairy husbandry, and in 2006 about 4,000 farms worldwide used over 6,000 milking robots. There is a worldwide movement towards fully automating every process from feeding to milking. The increase in automation is a consequence of growing farm sizes, the demand for more efficient production and rising labour costs. As the level of automation increases, the time that the cattle keeper spends monitoring animals often decreases. This has created a need for systems that automatically monitor the health of farm animals. The popularity of milking robots also offers a new and unique possibility to monitor animals in a single confined space up to four times daily. Lameness is a crucial welfare issue in the modern dairy industry. Limb disorders cause serious welfare, health and economic problems, especially in loose housing of cattle. Lameness causes losses in milk production and leads to early culling of animals. These costs could be reduced with early identification and treatment. At present, only a few methods for automatically detecting lameness have been developed, and the most common methods used for lameness detection and assessment are various visual locomotion scoring systems. The problem with locomotion scoring is that it requires experience to be conducted properly, it is labour-intensive as an on-farm method, and the results are subjective. A four-balance system for measuring the leg load distribution of dairy cows during milking, in order to detect lameness, was developed and set up at the University of Helsinki research farm Suitia. The leg weights of 73 cows were successfully recorded during almost 10,000 robotic milkings over a period of 5 months. The cows were locomotion scored weekly, and the lame cows were inspected clinically for hoof lesions. Unsuccessful measurements, caused by cows standing outside the balances, were removed from the data with a special algorithm, and the mean leg loads and the number of kicks during milking were calculated. In order to develop an expert system to automatically detect lameness cases, a model was needed. A probabilistic neural network (PNN) classifier model was chosen for the task. The data were divided into two parts: 5,074 measurements from 37 cows were used to train the model, and the model was then evaluated on its ability to detect lameness in a validation dataset of 4,868 measurements from 36 cows. The model classified 96% of the measurements correctly as coming from sound or lame cows, and 100% of the lameness cases in the validation data were identified. The proportion of measurements causing false alarms was 1.1%. The developed model has the potential to be used for on-farm decision support and in a real-time lameness monitoring system.
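The abstract does not spell out the PNN, but a probabilistic neural network is essentially a Parzen-window (kernel density) classifier. Below is a minimal sketch of that idea in Python; the leg-load features, kick counts and smoothing parameter are illustrative assumptions, not values or data from the thesis.

```python
import numpy as np

def pnn_classify(X_train, y_train, X_test, sigma=1.0):
    """Minimal probabilistic neural network (Parzen-window classifier):
    each test point is scored against a Gaussian kernel density estimate
    built per class, and the highest-scoring class wins."""
    classes = np.unique(y_train)
    preds = []
    for x in X_test:
        scores = []
        for c in classes:
            Xc = X_train[y_train == c]
            d2 = np.sum((Xc - x) ** 2, axis=1)   # squared distances to class exemplars
            scores.append(np.mean(np.exp(-d2 / (2.0 * sigma ** 2))))
        preds.append(classes[np.argmax(scores)])
    return np.array(preds)

# Hypothetical features: mean load on each of the four legs (kg) plus
# the number of kicks during milking -- invented numbers for illustration.
X_train = np.array([[120, 118, 115, 117, 0],
                    [125, 122, 119, 121, 1],
                    [140, 135,  60, 130, 4],    # one unloaded hind leg, frequent kicking
                    [138, 132,  55, 128, 5]], dtype=float)
y_train = np.array([0, 0, 1, 1])                # 0 = sound, 1 = lame
X_test = np.array([[139, 133, 58, 129, 4]], dtype=float)
print(pnn_classify(X_train, y_train, X_test, sigma=20.0))   # -> [1]
```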
Abstract:
This work belongs to the field of computational high-energy physics (HEP). The key methods used in this thesis to meet the challenges raised by the Large Hadron Collider (LHC) era experiments are object-orientation with software engineering, Monte Carlo simulation, computer cluster technology, and artificial neural networks. The first aspect discussed is the development of hadronic cascade models, used for the accurate simulation of medium-energy hadron-nucleus reactions up to 10 GeV. These models are typically needed in hadronic calorimeter studies and in the estimation of radiation backgrounds. Applications outside HEP include the medical field (such as hadron treatment simulations), space science (satellite shielding), and nuclear physics (spallation studies). Validation results are presented for several significant improvements released in the Geant4 simulation tool, and the significance of the new models for computing in the LHC era is estimated. In particular, we estimate the ability of the Bertini cascade to simulate the Compact Muon Solenoid (CMS) hadron calorimeter (HCAL). LHC test beam activity has a tightly coupled simulation-to-data-analysis cycle; typically, a Geant4 computer experiment is used to understand test beam measurements. Thus, another aspect of this thesis is a description of studies related to developing new CMS H2 test beam data analysis tools and performing data analysis on the basis of CMS Monte Carlo events. These events have been simulated in detail using Geant4 physics models, a full CMS detector description, and event reconstruction. Using the ROOT data analysis framework, we have developed an offline ANN-based approach to tag b-jets associated with heavy neutral Higgs particles, and we show that this kind of NN methodology can be successfully used to separate the Higgs signal from the background in the CMS experiment.
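The tagger in the thesis is built with ROOT's neural network tools; as a language-neutral sketch of the underlying signal/background separation, here is a one-hidden-layer network trained by plain gradient descent in numpy. The two jet variables and their Gaussian distributions are invented stand-ins, not quantities from the CMS analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented discriminating variables per jet (e.g. an impact-parameter-like
# and a vertex-mass-like quantity); signal and background drawn from
# well-separated Gaussians purely for illustration.
n = 1000
sig = rng.normal([3.0, 1.5], 0.8, size=(n, 2))
bkg = rng.normal([0.5, 0.3], 0.8, size=(n, 2))
X = np.vstack([sig, bkg])
y = np.hstack([np.ones(n), np.zeros(n)])

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)   # hidden layer
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)   # output layer

for _ in range(2000):
    H = np.tanh(X @ W1 + b1)              # hidden activations
    p = sigmoid(H @ W2 + b2).ravel()      # P(signal) per jet
    g = (p - y)[:, None] / len(y)         # cross-entropy gradient w.r.t. logits
    gH = (g @ W2.T) * (1.0 - H ** 2)      # backpropagate through tanh
    W2 -= 0.5 * H.T @ g;  b2 -= 0.5 * g.sum(axis=0)
    W1 -= 0.5 * X.T @ gH; b1 -= 0.5 * gH.sum(axis=0)

H = np.tanh(X @ W1 + b1)
p = sigmoid(H @ W2 + b2).ravel()
print("training accuracy:", np.mean((p > 0.5) == y))
```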
Abstract:
In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data, with an application to mobile device positioning. In the second part of the Thesis, we discuss so-called Bayesian network classifiers and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts and to noise reduction in digital signals.
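The stated connection can be made concrete with the textbook special case (the thesis treats more general Bayesian network classifiers): for a naive Bayes classifier whose class-conditional densities are, say, Gaussians with shared variance, the posterior log-odds are linear in the features, which is exactly the functional form of logistic regression.

```latex
% Posterior log-odds of a two-class naive Bayes classifier:
% independence of features given the class splits the likelihood ratio,
% and exponential-family conditionals make each term linear in x_i.
\log \frac{P(Y=1 \mid \mathbf{x})}{P(Y=0 \mid \mathbf{x})}
  = \log \frac{P(Y=1)}{P(Y=0)}
  + \sum_{i} \log \frac{p(x_i \mid Y=1)}{p(x_i \mid Y=0)}
  = w_0 + \mathbf{w}^{\top}\mathbf{x},
\qquad\text{hence}\qquad
P(Y=1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(w_0 + \mathbf{w}^{\top}\mathbf{x})}}.
```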
Abstract:
Accelerator mass spectrometry (AMS) is an ultrasensitive technique for measuring the concentration of a single isotope. The electric and magnetic fields of an electrostatic accelerator system are used to filter out other isotopes from the ion beam. The high ion velocity means that molecules can be destroyed and thus removed from the measurement background. As a result, concentrations down to one atom in 10^16 atoms are measurable. This thesis describes the construction of the new AMS system in the Accelerator Laboratory of the University of Helsinki. The system is described in detail along with the relevant ion optics. System performance and some of the 14C measurements made with the system are described. In the second part of the thesis, a novel statistical model for the analysis of AMS data is presented. Bayesian methods are used in order to make the best use of the available information. In the new model, instrumental drift is modelled with a continuous first-order autoregressive process. This enables rigorous normalization to standards measured at different times. The Poisson statistical nature of a 14C measurement is also taken into account properly, so that uncertainty estimates are much more stable. It is shown that, overall, the new model improves both the accuracy and the precision of AMS measurements. In particular, the results can be improved for samples with very low 14C concentrations or samples measured only a few times.
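As a toy illustration of the two ingredients named above (AR(1) drift, Poisson counts), here is an unnormalized log-posterior one could hand to an MCMC sampler. The variable names, priors and numbers are assumptions for the sketch, not the thesis's actual model specification.

```python
import numpy as np

def log_posterior(drift, phi, tau, counts, live_time, eff):
    """Toy AMS model: instrumental drift d_t is a stationary AR(1) process,
    d_t = phi * d_{t-1} + N(0, tau^2), and the observed 14C counts are
    Poisson with rate eff * exp(d_t) * live_time. Returns the log density
    up to an additive constant (log-factorials omitted)."""
    innov = drift[1:] - phi * drift[:-1]
    lp = -0.5 * np.sum(innov ** 2) / tau ** 2               # AR(1) innovations
    lp += -0.5 * drift[0] ** 2 * (1 - phi ** 2) / tau ** 2  # stationary start
    rate = eff * np.exp(drift) * live_time
    lp += np.sum(counts * np.log(rate) - rate)              # Poisson likelihood
    return lp

# Ten measurement cycles of one hypothetical sample
counts = np.array([48, 52, 47, 55, 50, 49, 53, 51, 46, 54])
print(log_posterior(np.zeros(10), phi=0.9, tau=0.1,
                    counts=counts, live_time=1.0, eff=50.0))
```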
Abstract:
Aims: To develop and validate tools to estimate the residual noise covariance in Planck frequency maps, to quantify signal error effects, and to compare different techniques for producing low-resolution maps. Methods: We derive analytical estimates of the covariance of the residual noise contained in low-resolution maps produced using a number of map-making approaches. We test these analytical predictions using Monte Carlo simulations and assess their impact on angular power spectrum estimation. We use simulations to quantify the level of signal errors incurred in the different resolution downgrading schemes considered in this work. Results: We find excellent agreement between the optimal residual noise covariance matrices and Monte Carlo noise maps. For destriping map-makers, the extent of agreement is dictated by the knee frequency of the correlated noise component and the chosen baseline offset length. Signal striping is shown to be negligible when properly dealt with. In map resolution downgrading, we find that a carefully selected window function is required to reduce aliasing to the sub-percent level at multipoles ell > 2Nside, where Nside is the HEALPix resolution parameter. We show that sufficient characterization of the residual noise is unavoidable if one is to draw reliable constraints on large-scale anisotropy. Conclusions: We have described how to compute low-resolution maps with a controlled sky signal level and a reliable estimate of the covariance of the residual noise. We have also presented a method to smooth the residual noise covariance matrices to describe the noise correlations in smoothed, bandwidth-limited maps.
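For concreteness, one simple downgrading scheme of the kind compared above can be sketched with the healpy library: smooth the map with a window function before averaging pixels, so that power at multipoles ell > 2Nside is suppressed rather than aliased. The window choice below (a Gaussian with FWHM tied to the target pixel size) is an illustrative assumption, not the prescription of this work.

```python
import numpy as np
import healpy as hp

nside_in, nside_out = 128, 16
sky = np.ones(hp.nside2npix(nside_in))      # placeholder high-resolution map

# Gaussian smoothing roughly matched to the target pixel scale,
# to damp power above ell ~ 2 * nside_out before downgrading.
fwhm = 2.0 * hp.nside2resol(nside_out)      # radians
smoothed = hp.smoothing(sky, fwhm=fwhm)

# Downgrade by averaging child pixels into their parent pixels.
low_res = hp.ud_grade(smoothed, nside_out)
print(low_res.size)                         # 12 * nside_out**2 = 3072
```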
Abstract:
The dissertation examines the foreign policies of the United States through the prism of science and technology. The focal point of scrutiny is the policy establishing the International Institute for Applied Systems Analysis (IIASA) and the development of the multilateral part of bridge building in American foreign policy during the 1960s and early 1970s. After a long and arduous negotiation process, the institute was finally established by twelve national member organizations from the following countries: Bulgaria, Canada, Czechoslovakia, the Federal Republic of Germany (FRG), France, the German Democratic Republic (GDR), Great Britain, Italy, Japan, Poland, the Soviet Union and the United States; a few years later Sweden, Finland and the Netherlands also joined. The stated goal of the institute was to bring together researchers from East and West to solve pertinent problems caused by the modernization process experienced in the industrialized world. It originated from President Lyndon B. Johnson's bridge building policies, launched in 1964, and was set in a well-contested and crowded domain of other international organizations for environmental and social planning. Since the distinct need for yet another organization was not evident, the process of negotiations in this multinational environment illuminates the foreign policy ambitions of the United States on the road to the Cold War détente. The study places this project within its political era and juxtaposes it with other international organizations, especially the OECD, the ECE and NATO. Conventionally, Lyndon Johnson's bridge building policies have been seen as a means for the United States to normalize its international relations bilaterally with different East European countries, and the multilateral dimension of the policy has been ignored. This is why IIASA's establishment process in this multilateral environment brings forth new information on US foreign policy goals, the means to achieve these goals, and US relations with other advanced industrialized societies before the time of détente, during the 1960s and early 1970s. Furthermore, the substance of the institute, applied systems analysis, illuminates the differences between European and American methodological thinking in social planning. Systems analysis is closely associated with (American) science and technology policies of the 1960s, especially in its military-administrative applications; thus its analysis within the foreign policy environment of the United States proved particularly fruitful. In the 1960s the institutional structures of the European continent were faltering, and the growing tendencies of integration were in flux. One example of this was the long, drawn-out process of British membership in the EEC; another was de Gaulle's withdrawal from NATO's military-political cooperation. On the other hand, economic cooperation in Europe between East and West, and especially with the Soviet Union, was expanding rapidly. This American initiative to form a new institutional actor has to be seen in that structural context, showing that bridge building was needed not only to the East, but also to the West. The narrative amounts to an analysis of how the United States managed both cooperation and conflict in its hegemonic aspirations in the emerging modern world, and how it used its special relationship with the United Kingdom to achieve its goals. The research is based on the archives of the United States, Great Britain, Sweden, Finland, and IIASA.
The primary sources have been complemented with both contemporary and present-day research literature, periodicals, and interviews.
Abstract:
To test the reliability of the radiocarbon method for determining root age, we analyzed fine roots (originating from the years 1985 to 1993) from ingrowth cores with known maximum root age (1 to 6 years old). For this purpose, three Scots pine (Pinus sylvestris L.) stands were selected from boreal forests in Finland. We determined root 14C age by the radiocarbon method and compared it with the above-mentioned known maximum fine root age. In general, ages determined by the two methods (root 14C age and ingrowth core root maximum age) agreed with each other for roots of small diameter (<0.5 mm). By contrast, in most of the samples of fine roots of larger diameter (1.5-2 mm), the 14C age of root samples from 1987-89 exceeded the ingrowth core root maximum age by 1-10 years. This shows that these roots had received a large amount of older, stored carbon from unknown sources in addition to atmospheric CO2 obtained directly through photosynthesis. We conclude that the 14C signature of fine roots, especially those of larger diameter, may not always be indicative of root age, and that further studies are needed on the extent of possible root uptake of older carbon and its residence time in roots.
Keywords: fine root age, Pinus sylvestris, radiocarbon, root carbon, ingrowth cores, tree ring
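For reference (the conversion is left implicit in the abstract), a conventional radiocarbon age follows from the measured fraction of modern carbon F through the Libby mean life of 8033 years:

```latex
% Standard Stuiver-Polach convention, not a formula quoted in the abstract:
t_{{}^{14}\mathrm{C}} = -8033 \,\ln F \qquad \text{[years BP]}
```

A root that has incorporated older, stored carbon has a lower F and therefore appears older than it is, which is the bias reported above for the larger-diameter roots.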
Abstract:
The aim was to analyse the growth and compositional development of the receptive and expressive lexicons between the ages of 0;9 and 2;0 in full-term (FT) and very-low-birth-weight (VLBW) children who are acquiring Finnish. The associations between the expressive lexicon and grammar at 1;6 and 2;0 in the FT children were also studied. In addition, the language skills of the VLBW children at 2;0 were analysed, as well as the predictive value of the early lexicon for later language performance. Four groups took part in the studies: longitudinal (N = 35) and cross-sectional (N = 146) samples of FT children, and longitudinal (N = 32) and cross-sectional (N = 66) samples of VLBW children. The data were gathered by applying a structured parental rating method (the Finnish version of the Communicative Development Inventory), through analysis of the children's spontaneous speech, and by administering a formal test (Reynell Developmental Language Scales). The FT children acquired their receptive lexicons earlier, at a faster rate and with larger individual variation than their expressive lexicons. The acquisition rate of the expressive lexicon increased from slow to faster in most children (91%). Highly parallel developmental paths for lexical semantic categories were detected in the receptive and expressive lexicons of the Finnish children when analysed in relation to the growth of lexicon size, as described in the literature for children acquiring other languages. The emergence of grammar was closely associated with expressive lexical growth. The VLBW children acquired their receptive lexicons at a slower rate and had weaker language skills at 2;0 than the full-term children. The compositional development of both lexicons occurred at a slower rate in the VLBW children than in the FT controls. However, when the compositional development was analysed in relation to the growth of lexicon size, it occurred in a qualitatively nearly parallel manner in the VLBW children and the FT children. Early receptive and expressive lexicon sizes were significantly associated with later language skills in both groups. The effect of the background variables (gender, length of the mother's basic education, birth weight) on language development differed between the FT and the VLBW children. The results provide new information on early language acquisition by Finnish FT and VLBW children. The results support the view that the early acquisition of semantic lexical categories is related to lexicon growth. The current findings also suggest that early grammatical acquisition is closely related to the growth of expressive vocabulary size. The language development of VLBW children should be followed in clinical work.
Abstract:
Advancements in analysis techniques have led to a rapid accumulation of biological data in databases. Such data are often in the form of sequences of observations, examples including DNA sequences and the amino acid sequences of proteins. The scale and quality of the data promise answers to various biologically relevant questions in more detail than has been possible before. For example, one may wish to identify areas in an amino acid sequence which are important for the function of the corresponding protein, or investigate how characteristics at the level of the DNA sequence affect the adaptation of a bacterial species to its environment. Many of the interesting questions are intimately associated with understanding the evolutionary relationships among the items under consideration. The aim of this work is to develop novel statistical models and computational techniques to meet the challenge of deriving meaning from the increasing amounts of data. Our main concern is modeling the evolutionary relationships based on the observed molecular data. We operate within a Bayesian statistical framework, which allows a probabilistic quantification of the uncertainties related to a particular solution. As the basis of our modeling approach we utilize a partition model, which is used to describe the structure of data by appropriately dividing the data items into clusters of related items. Generalizations and modifications of the partition model are developed and applied to various problems. Large-scale data sets also pose a computational challenge. The models used to describe the data must be realistic enough to capture the essential features of the modeling task at hand but, at the same time, simple enough to make it possible to carry out inference in practice. The partition model fulfills these two requirements. Problem-specific features can be taken into account by modifying the prior probability distributions of the model parameters. The computational efficiency stems from the ability to integrate out the parameters of the partition model analytically, which enables the use of efficient stochastic search algorithms.
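One standard instance of this analytic integration (the thesis's exact parameterization may differ) is categorical data with a conjugate Dirichlet prior: the parameters of a cluster with category counts n_1, ..., n_K integrate out in closed form, leaving a marginal likelihood that a stochastic search over partitions can evaluate cheaply.

```latex
% Dirichlet-multinomial marginal likelihood of one cluster's counts,
% obtained by integrating the category probabilities theta against
% a Dirichlet(alpha_1, ..., alpha_K) prior:
p(\mathbf{n}) = \int \prod_{k=1}^{K} \theta_k^{\,n_k}\,
    \mathrm{Dir}(\boldsymbol{\theta} \mid \boldsymbol{\alpha})\, d\boldsymbol{\theta}
  = \frac{\Gamma\!\left(\sum_k \alpha_k\right)}
         {\Gamma\!\left(\sum_k (\alpha_k + n_k)\right)}
    \prod_{k=1}^{K} \frac{\Gamma(\alpha_k + n_k)}{\Gamma(\alpha_k)} .
```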
Abstract:
Bacteria play an important role in many ecological systems. The molecular characterization of bacteria using either cultivation-dependent or cultivation-independent methods reveals the large scale of bacterial diversity in natural communities, and the vast number of subpopulations within a species or genus. Understanding how bacterial diversity varies across different environments and also within populations should provide insights into many important questions of bacterial evolution and population dynamics. This thesis presents novel statistical methods for analyzing bacterial diversity using widely employed molecular fingerprinting techniques. The first objective of the thesis was to develop Bayesian clustering models to identify bacterial population structures. Bacterial isolates were identified using multilocus sequence typing (MLST), and Bayesian clustering models were used to explore the evolutionary relationships among isolates. Our method involves the inference of genetic population structures via an unsupervised clustering framework where the dependence between loci is represented using graphical models. The population dynamics that generate such a population stratification were investigated using a stochastic model, in which homologous recombination between subpopulations can be quantified within a gene flow network. The second part of the thesis focuses on cluster analysis of community compositional data produced by two different cultivation-independent analyses: terminal restriction fragment length polymorphism (T-RFLP) analysis and fatty acid methyl ester (FAME) analysis. The cluster analysis aims to group bacterial communities that are similar in composition, an important step towards understanding the overall influence of environmental and ecological perturbations on bacterial diversity. A common feature of T-RFLP and FAME data is zero-inflation: a zero value is observed much more frequently than would be expected, for example, from a Poisson distribution in the discrete case or a Gaussian distribution in the continuous case. We provide two strategies for modeling zero-inflation in the clustering framework, validated with both synthetic and empirical complex data sets. We show in the thesis that our model, which takes into account dependencies between loci in MLST data, can produce better clustering results than methods which assume independent loci. Furthermore, computer algorithms that are efficient in analyzing large-scale data were adopted to meet the increasing computational need. Our method for detecting homologous recombination in subpopulations may provide a theoretical criterion for defining bacterial species. The clustering of bacterial community data, including T-RFLP and FAME data, provides an initial effort towards discovering the evolutionary dynamics that structure and maintain bacterial diversity in the natural environment.
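The zero-inflation idea can be illustrated with the discrete case mentioned above: a zero-inflated Poisson mixes a point mass at zero with an ordinary Poisson count. A minimal log-likelihood sketch follows; the thesis embeds this kind of component within its clustering models, and the parameter values below are arbitrary.

```python
import numpy as np
from scipy.special import gammaln

def zip_loglik(x, pi, lam):
    """Zero-inflated Poisson log-likelihood: with probability pi an
    observation is a structural zero, otherwise it is Poisson(lam)."""
    x = np.asarray(x, dtype=float)
    return np.where(
        x == 0,
        np.log(pi + (1 - pi) * np.exp(-lam)),                    # zero from either source
        np.log(1 - pi) + x * np.log(lam) - lam - gammaln(x + 1)  # ordinary Poisson term
    ).sum()

# Toy T-RFLP-like fragment counts with an excess of zeros
counts = [0, 0, 0, 3, 0, 7, 0, 0, 2, 0]
print(zip_loglik(counts, pi=0.5, lam=4.0))
```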
Abstract:
Nitrogen (N) and phosphorus (P) are essential elements for all living organisms. In excess, however, they contribute to several environmental problems such as aquatic and terrestrial eutrophication. Globally, human action has multiplied the volume of N and P cycling since the onset of industrialization. This multiplication is a result of intensified agriculture, increased energy consumption and population growth. Industrial ecology (IE) is a discipline in which human interaction with ecosystems is investigated using a systems-analytical approach. The main idea behind IE is that industrial systems resemble ecosystems and, like them, can be described using material, energy and information flows and stocks. Industrial systems are dependent on the resources provided by the biosphere, and the two cannot be separated from each other. When studying substance flows, the aims of the research from the viewpoint of IE can be, for instance, to elucidate how the cycles of a certain substance could be made more closed and how the flows of a certain substance could be decreased per unit of production (= dematerialization). In Finland, N and P have been studied widely in different ecosystems and environmental emissions; a holistic picture comparing different societal systems has, however, been lacking. In this thesis, flows of N and P in Finland were examined using substance flow analysis (SFA) in the following four subsystems: I) forest industry and the use of wood fuels, II) food production and consumption, III) energy, and IV) municipal waste. A detailed analysis of the situation at the end of the 1990s was performed. Furthermore, the historical development of the N and P flows was investigated in the energy system (III) and the municipal waste system (IV). The main research sources were official statistics, literature, monitoring data, and expert knowledge. The aim was to identify and quantify the main flows of N and P in Finland in the four subsystems studied, to elucidate whether the nutrient systems are cyclic or linear, and to identify how these systems could become more efficient in the use and cycling of N and P. A final aim was to discuss how this type of analysis can be used to support decision-making on environmental problems and solutions. Of the four subsystems, the food production and consumption system and the energy system created the largest N flows in Finland. For P flows, the food production and consumption system (Paper II) was clearly the largest, followed by the forest industry and use of wood fuels and the energy system. The contribution of Finland to N and P flows on a global scale is small, but on a per capita basis we are among the largest producers of these flows, mainly because of relatively high energy and meat consumption. The analysis revealed the openness of all four systems. This openness is due to the highly international character of the Finnish markets, the large-scale use of synthetic fertilizers and energy resources, and the low recycling rate of many waste fractions. Reduction in the use of fuels and synthetic fertilizers, reorganization of the structure of energy production, reduced human intake of nutrients and technological development are crucial in diminishing the N and P flows. To enhance nutrient recycling and replace inorganic fertilizers, the recycling of wastes such as wood ash and sludge could be promoted.
SFA is not usually sufficiently detailed to allow specific recommendations for decision-making, but it does yield useful information about the relative magnitudes of the flows and may reveal unexpected losses. Sustainable development is a widely accepted target for all human action, and SFA is one method that can help to analyse how effective different efforts are in leading towards a more sustainable society. SFA's strength is that it allows a holistic picture of different natural and societal systems to be drawn. Furthermore, when the environmental impact of a certain flow is known, the method can be used to prioritize environmental policy efforts.
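The bookkeeping behind an SFA can be illustrated with a toy balance for one subsystem: inflows minus outflows give the stock change, and the share of outflows returned to the system gives a crude measure of how closed the cycle is. All numbers below are invented for illustration, not results from the thesis.

```python
# Hypothetical N flows for one subsystem, kt N per year
inflows = {"synthetic fertilizer": 150.0,
           "feed imports": 40.0,
           "biological fixation": 30.0}
outflows = {"food products": 60.0,
            "losses to water": 45.0,
            "losses to air": 70.0,
            "recycled waste": 25.0}

# Mass balance: what accumulates in (or is depleted from) the system
stock_change = sum(inflows.values()) - sum(outflows.values())
print(f"Net stock change: {stock_change:+.1f} kt N/year")

# Share of outflows recycled back rather than lost -- a crude cyclicity indicator
recycling_rate = outflows["recycled waste"] / sum(outflows.values())
print(f"Recycling rate: {recycling_rate:.0%}")
```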