11 results for FUNCTIONAL DATA ANALYSIS

in Helda - Digital Repository of University of Helsinki


Relevance:

100.00%

Publisher:

Abstract:

The aim of this thesis is to develop a fully automatic lameness detection system that operates in a milking robot. The instrumentation, measurement software, algorithms for data analysis and a neural network model for lameness detection were developed. Automatic milking has become a common practice in dairy husbandry, and in the year 2006 about 4000 farms worldwide used over 6000 milking robots. There is a worldwide movement with the objective of fully automating every process from feeding to milking. The increase in automation is a consequence of increasing farm sizes, the demand for more efficient production and the growth of labour costs. As the level of automation increases, the time that the cattle keeper uses for monitoring animals often decreases. This has created a need for systems for automatically monitoring the health of farm animals. The popularity of milking robots also offers a new and unique possibility to monitor animals in a single confined space up to four times daily. Lameness is a crucial welfare issue in the modern dairy industry. Limb disorders cause serious welfare, health and economic problems, especially in loose housing of cattle. Lameness causes losses in milk production and leads to early culling of animals. These costs could be reduced with early identification and treatment. At present, only a few methods for automatically detecting lameness have been developed, and the most common methods used for lameness detection and assessment are various visual locomotion scoring systems. The problem with locomotion scoring is that it requires experience to be conducted properly, it is labour-intensive as an on-farm method and the results are subjective. A four-balance system for measuring the leg load distribution of dairy cows during milking in order to detect lameness was developed and set up at the University of Helsinki research farm Suitia.
The leg weights of 73 cows were successfully recorded during almost 10,000 robotic milkings over a period of 5 months. The cows were locomotion scored weekly, and the lame cows were inspected clinically for hoof lesions. Unsuccessful measurements, caused by cows standing outside the balances, were removed from the data with a special algorithm, and the mean leg loads and the number of kicks during milking were calculated. In order to develop an expert system to automatically detect lameness cases, a model was needed. A probabilistic neural network (PNN) classifier model was chosen for the task. The data was divided into two parts, and 5,074 measurements from 37 cows were used to train the model. The operation of the model was evaluated for its ability to detect lameness in the validation dataset, which had 4,868 measurements from 36 cows. The model was able to classify 96% of the measurements correctly as coming from sound or lame cows, and 100% of the lameness cases in the validation data were identified. The number of measurements causing false alarms was 1.1%. The developed model has the potential to be used for on-farm decision support and can be used in a real-time lameness monitoring system.
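
As a rough illustration of the classifier type named in this abstract, the following is a minimal sketch of a probabilistic neural network, written as a Gaussian Parzen-window classifier. It is not the thesis's model: the function name `pnn_classify`, the smoothing parameter `sigma`, the choice of four leg-load fractions as features, and the toy numbers are all assumptions made for this example.

```python
import math


def pnn_classify(train_x, train_y, x, sigma=1.0):
    """Return the label whose mean Gaussian kernel response at x is largest.

    Pattern layer: one Gaussian kernel per training sample.
    Summation layer: kernel responses averaged per class.
    Decision layer: the class with the highest average wins.
    """
    sums = {}
    counts = {}
    for features, label in zip(train_x, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(features, x))
        kernel = math.exp(-sq_dist / (2.0 * sigma ** 2))
        sums[label] = sums.get(label, 0.0) + kernel
        counts[label] = counts.get(label, 0) + 1
    return max(sums, key=lambda label: sums[label] / counts[label])


# Hypothetical toy data: fraction of body weight carried by each of four legs.
TRAIN_X = [
    (0.25, 0.25, 0.25, 0.25),
    (0.26, 0.24, 0.25, 0.25),
    (0.15, 0.30, 0.28, 0.27),
    (0.12, 0.31, 0.29, 0.28),
]
TRAIN_Y = ["sound", "sound", "lame", "lame"]

prediction = pnn_classify(TRAIN_X, TRAIN_Y, (0.14, 0.30, 0.28, 0.28), sigma=0.05)
```

In a sketch like this, `sigma` controls how far each training sample's influence reaches. The abstract does not say how the thesis set its smoothing parameter or which features it used beyond mean leg loads and kick counts.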

Relevance:

100.00%

Publisher:

Abstract:

In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data with an application to mobile device positioning. In the second part of the Thesis, we discuss so-called Bayesian network classifiers, and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.

Relevance:

100.00%

Publisher:

Abstract:

Filamentous fungi of the subphylum Pezizomycotina are well known as protein and secondary metabolite producers. Various industries take advantage of these capabilities. However, the molecular biology of yeasts, i.e. Saccharomycotina and especially that of Saccharomyces cerevisiae, the baker's yeast, is much better known. In an effort to explain fungal phenotypes through their genotypes we have compared protein-coding gene contents of Pezizomycotina and Saccharomycotina. Only biomass degradation and secondary metabolism related protein families seem to have expanded recently in Pezizomycotina. Of the protein families clearly diverged between Pezizomycotina and Saccharomycotina, those related to mitochondrial functions emerge as the most prominent. However, the primary metabolism as described in S. cerevisiae is largely conserved in all fungi. Apart from the known secondary metabolism, Pezizomycotina have pathways that could link secondary metabolism to primary metabolism and a wealth of undescribed enzymes. Previous studies of individual Pezizomycotina genomes have shown that regardless of the difference in production efficiency and diversity of secreted proteins, the content of the known secretion machinery genes in Pezizomycotina and Saccharomycotina appears very similar. Genome-wide analysis of gene products is therefore needed to better understand the efficient secretion of Pezizomycotina. We have developed methods applicable to transcriptome analysis of non-sequenced organisms. TRAC (Transcriptional profiling with the aid of affinity capture) has been previously developed at VTT for fast, focused transcription analysis. We introduce a version of TRAC that allows more powerful signal amplification and multiplexing. We also present computational optimisations of transcriptome analysis of non-sequenced organisms and TRAC analysis in general. Trichoderma reesei is one of the most commonly used Pezizomycotina in the protein production industry.
In order to understand its secretion system better and find clues for improvement of its industrial performance, we have analysed its transcriptomic response to protein secretion stress conditions. In comparison to S. cerevisiae, the response of T. reesei appears different, but still affects the same cellular functions. We also discovered in T. reesei interesting similarities to the mammalian protein secretion stress response. Together these findings highlight targets for more detailed studies.

Relevance:

100.00%

Publisher:

Abstract:

This work belongs to the field of computational high-energy physics (HEP). The key methods used in this thesis work to meet the challenges raised by the Large Hadron Collider (LHC) era experiments are object-orientation with software engineering, Monte Carlo simulation, the computer technology of clusters, and artificial neural networks. The first aspect discussed is the development of hadronic cascade models, used for the accurate simulation of medium-energy hadron-nucleus reactions, up to 10 GeV. These models are typically needed in hadronic calorimeter studies and in the estimation of radiation backgrounds. Various applications outside HEP include the medical field (such as hadron treatment simulations), space science (satellite shielding), and nuclear physics (spallation studies). Validation results are presented for several significant improvements released in the Geant4 simulation tool, and the significance of the new models for computing in the Large Hadron Collider era is estimated. In particular, we estimate the ability of the Bertini cascade to simulate the Compact Muon Solenoid (CMS) hadron calorimeter HCAL. LHC test beam activity has a tightly coupled cycle of simulation-to-data analysis. Typically, a Geant4 computer experiment is used to understand test beam measurements. Thus another aspect of this thesis is a description of studies related to developing new CMS H2 test beam data analysis tools and performing data analysis on the basis of CMS Monte Carlo events. These events have been simulated in detail using Geant4 physics models, full CMS detector description, and event reconstruction. Using the ROOT data analysis framework we have developed an offline ANN-based approach to tag b-jets associated with heavy neutral Higgs particles, and we show that this kind of NN methodology can be successfully used to separate the Higgs signal from the background in the CMS experiment.

Relevance:

100.00%

Publisher:

Abstract:

Accelerator mass spectrometry (AMS) is an ultrasensitive technique for measuring the concentration of a single isotope. The electric and magnetic fields of an electrostatic accelerator system are used to filter out other isotopes from the ion beam. The high velocity means that molecules can be destroyed and removed from the measurement background. As a result, concentrations down to one atom in 10^16 atoms are measurable. This thesis describes the construction of the new AMS system in the Accelerator Laboratory of the University of Helsinki. The system is described in detail along with the relevant ion optics. System performance and some of the 14C measurements done with the system are described. In the second part of the thesis, a novel statistical model for the analysis of AMS data is presented. Bayesian methods are used in order to make the best use of the available information. In the new model, instrumental drift is modelled with a continuous first-order autoregressive process. This enables rigorous normalization to standards measured at different times. The Poisson statistical nature of a 14C measurement is also taken into account properly, so that uncertainty estimates are much more stable. It is shown that, overall, the new model improves both the accuracy and the precision of AMS measurements. In particular, the results can be improved for samples with very low 14C concentrations or measured only a few times.
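
To make the two statistical ingredients named in this abstract concrete, here is a small sketch under stated assumptions. It uses the standard stationary form of a continuous-time first-order autoregressive process, whose covariance between two measurement times decays exponentially with their separation, and the standard Poisson counting uncertainty. The function names and the parameters `sigma` (drift standard deviation) and `tau` (correlation time) are illustrative; the abstract does not give the thesis's actual parameterization.

```python
import math


def car1_covariance(times, sigma, tau):
    """Covariance matrix of a stationary continuous-time AR(1) process.

    Entry (i, j) is sigma^2 * exp(-|t_i - t_j| / tau), so measurements taken
    close together in time share more of the instrumental drift.
    """
    return [
        [sigma ** 2 * math.exp(-abs(ti - tj) / tau) for tj in times]
        for ti in times
    ]


def poisson_relative_uncertainty(counts):
    """Relative 1-sigma uncertainty of a Poisson count: 1 / sqrt(N)."""
    return 1.0 / math.sqrt(counts)


# Hypothetical, irregularly spaced measurement times (arbitrary units).
cov = car1_covariance([0.0, 1.0, 3.0], sigma=2.0, tau=1.5)
rel_err = poisson_relative_uncertainty(100)
```

Because the covariance depends only on time differences, a form like this handles unevenly spaced measurements directly, which is the situation the abstract describes for standards measured at different times.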

Relevance:

100.00%

Publisher:

Abstract:

Aneuploidy is among the most obvious differences between normal and cancer cells. However, mechanisms contributing to development and maintenance of aneuploid cell growth are diverse and incompletely understood. Functional genomics analyses have shown that aneuploidy in cancer cells is correlated with diffuse gene expression signatures and that aneuploidy can arise by a variety of mechanisms, including cytokinesis failures, DNA endoreplication and possibly through polyploid intermediate states. Here, we used a novel cell spot microarray technique to identify genes with a loss-of-function effect inducing polyploidy and/or allowing maintenance of polyploid cell growth of breast cancer cells. Integrative genomics profiling of candidate genes highlighted GINS2 as a potential oncogene frequently overexpressed in clinical breast cancers as well as in several other cancer types. Multivariate analysis indicated GINS2 to be an independent prognostic factor for breast cancer outcome (p = 0.001). Suppression of GINS2 expression effectively inhibited breast cancer cell growth and induced polyploidy. In addition, protein-level detection of nuclear GINS2 accurately distinguished actively proliferating cancer cells, suggesting potential use as an operational biomarker.

Relevance:

100.00%

Publisher:

Abstract:

Aims: Develop and validate tools to estimate residual noise covariance in Planck frequency maps. Quantify signal error effects and compare different techniques to produce low-resolution maps. Methods: We derive analytical estimates of covariance of the residual noise contained in low-resolution maps produced using a number of map-making approaches. We test these analytical predictions using Monte Carlo simulations and their impact on angular power spectrum estimation. We use simulations to quantify the level of signal errors incurred in different resolution downgrading schemes considered in this work. Results: We find an excellent agreement between the optimal residual noise covariance matrices and Monte Carlo noise maps. For destriping map-makers, the extent of agreement is dictated by the knee frequency of the correlated noise component and the chosen baseline offset length. Signal striping is shown to be insignificant when properly dealt with. In map resolution downgrading, we find that a carefully selected window function is required to reduce aliasing to the sub-percent level at multipoles ℓ > 2 Nside, where Nside is the HEALPix resolution parameter. We show that sufficient characterization of the residual noise is unavoidable if one is to draw reliable constraints on large-scale anisotropy. Conclusions: We have described how to compute the low-resolution maps, with a controlled sky signal level, and a reliable estimate of covariance of the residual noise. We have also presented a method to smooth the residual noise covariance matrices to describe the noise correlations in smoothed, bandwidth-limited maps.

Relevance:

90.00%

Publisher:

Abstract:

The purpose of the present study was to investigate the possibilities and interconnections that exist concerning the relationship between the University of Applied Sciences and the Learning by Developing action model (LbD), on the one hand, and education for sustainable development and high-quality learning as a part of professional competence development on the other. The research and learning environment was the Coping at Home research project and its Caring TV project, which provided the context of the Physiotherapy for Elderly People professional study unit. The researcher was a teacher and an evaluator of her own students' learning. The aims of the study were to monitor and evaluate learning at the individual and group level using tools of high-quality learning (improved concept maps) related to understanding the project's core concept of successful ageing. Conceptions were evaluated through aspects of sustainable development and a conceptual basis of physiotherapy. As educational research this was a multi-method case study design experiment. The three research questions were as follows. 1. What kind of individual conceptions and conceptual structures do students build concerning the concept of successful ageing? How many and what kind of concepts and propositions do they have a) before the study unit, b) after the study unit, c) after the social-knowledge building? 2. What kind of social-knowledge building exists? a) What kind of social learning process exists? b) What kind of socially created concepts, propositions and conceptual structures do the students possess after the project? c) What kind of meaning does the social-knowledge building have at an individual level? 3. How do physiotherapy competences develop according to the results of the first and second research questions? The subjects were 22 female, third-year Bachelor of Physiotherapy students at Laurea University of Applied Sciences in Finland.
Individual learning was evaluated in 12 of the 22 students. The data was collected as a part of the learning exercises of the Physiotherapy for Elderly People study unit, with improved concept maps both at individual and group levels. The students were divided into two social-knowledge building groups: the first group had 15 members and the second 7 members. Each group created a group-level concept map on the theme of successful ageing. These face-to-face interactions were recorded with CMapTools and videotaped. The data consists of both individually produced concept maps and group-produced concept maps of the two groups and the videotaped material of these processes. The data analysis was carried out at the intersection of various research traditions. Individually produced data was analysed based on content analysis. Group-produced data was analysed based on content analysis and dialogue analysis. The data was also analysed by simple statistical analysis. In the individually produced improved concept maps the students' conceptions were comprehensive, and the first concept maps were found to have many concepts unrelated to each other. The conceptual structures were between spoke structures and chain structures. Only a few professional concepts were evident. In the second individual improved concept maps the conception was more professional than earlier, particularly from the functional point of view. The conceptual structures mostly resembled spoke structures. After the second individual concept mapping, social mapping interventions were made in the two groups. After this, multidisciplinary concrete links were established between all concepts in almost all individual concept maps, and the interconnectedness of the concepts in different subject areas was thus understood. The conceptual structures were mainly net structures.
The concepts in these individual concept maps were also found to be more professional and concrete than in the previous concept maps of these subjects. In addition, the wider context dependency of the concepts was recognized in many individual concept maps. This implies a conceptual framework for specialists. The social-knowledge building was similar to a social learning process. Both socio-cultural processes and cognitive processes were found to develop students' conceptual awareness and the ability to engage in intentional learning. In the knowledge-building process two aspects were found: knowledge creation and pedagogical action. The discussion during the concept-mapping process was similar to a shared thinking process. In visualising the process with CMapTools, students easily complemented each other's thoughts and words, as if mutually telepathic. Synthesizing, supporting, asking and answering, peer teaching and counselling, tutoring, evaluating and arguing took place, and students were very active, self-directed and creative. It took hundreds of conversations before a common understanding could be found. The use of concept mapping in particular was very effective. The concepts in these group-produced concept maps were found to be professional, and values of sustainable development were observed. The results show the importance of developing the contents and objectives of the European Qualification Framework as well as education for sustainable development, especially in terms of the need for knowledge creation, global responsibility and systemic, holistic and critical thinking in order to develop clinical practice. Keywords: education for sustainable development, learning, knowledge building, improved concept map, conceptual structure, competence, successful ageing

Relevance:

90.00%

Publisher:

Abstract:

During the past ten years, large-scale transcript analysis using microarrays has become a powerful tool to identify and predict functions for new genes. It allows simultaneous monitoring of the expression of thousands of genes and has become a routinely used tool in laboratories worldwide. Microarray analysis will, together with other functional genomics tools, take us closer to understanding the functions of all genes in genomes of living organisms. Flower development is a genetically regulated process which has mostly been studied in the traditional model species Arabidopsis thaliana, Antirrhinum majus and Petunia hybrida. The molecular mechanisms behind flower development in them are partly applicable in other plant systems. However, not all biological phenomena can be approached with just a few model systems. In order to understand and apply the knowledge to ecologically and economically important plants, other species also need to be studied. Sequencing of 17 000 ESTs from nine different cDNA libraries of the ornamental plant Gerbera hybrida made it possible to construct a cDNA microarray with 9000 probes. The probes of the microarray represent all different ESTs in the database. Of the gerbera ESTs, 20% were unique to gerbera while 373 were specific to the Asteraceae family of flowering plants. Gerbera has composite inflorescences with three different types of flowers that vary from each other morphologically. The marginal ray flowers are large, often pigmented and female, while the central disc flowers are smaller and more radially symmetrical perfect flowers. Intermediate trans flowers are similar to ray flowers but smaller in size. This feature, together with the molecular tools applied to gerbera, makes gerbera a unique system in comparison to the common model plants with only a single kind of flower in their inflorescence.
In the first part of this thesis, conditions for gerbera microarray analysis were optimised including experimental design, sample preparation and hybridization, as well as data analysis and verification. Moreover, in the first study, the flower and flower organ-specific genes were identified. After the reliability and reproducibility of the method were confirmed, the microarrays were utilized to investigate transcriptional differences between ray and disc flowers. This study revealed novel information about the morphological development as well as the transcriptional regulation of early stages of development in various flower types of gerbera. The most interesting finding was differential expression of MADS-box genes, suggesting the existence of flower type-specific regulatory complexes in the specification of different types of flowers. The gerbera microarray was further used to profile changes in expression during petal development. Gerbera ray flower petals are large, which makes them an ideal model to study organogenesis. Six different stages were compared and specifically analysed. Expression profiles of genes related to cell structure and growth implied that during stage 2, cells divide, a process which is marked by expression of histones, cyclins and tubulins. Stage 4 was found to be a transition stage between cell division and expansion, and by stage 6 cells had stopped division and instead underwent expansion. Interestingly, at the last analysed stage, stage 9, when cells did not grow any more, the highest number of upregulated genes was detected. The gerbera microarray is a fully functioning tool for large-scale studies of flower development, and correlation with real-time RT-PCR results shows that it is also highly sensitive and reliable. Gene expression data presented here will be a source for gene expression mining or marker gene discovery in future studies that will be performed in the Gerbera Laboratory.
The publicly available data will also serve the plant research community worldwide.

Relevance:

90.00%

Publisher:

Abstract:

The core aim of machine learning is to make a computer program learn from experience. Learning from data is usually defined as a task of learning regularities or patterns in data in order to extract useful information, or to learn the underlying concept. An important sub-field of machine learning is called multi-view learning, where the task is to learn from multiple data sets or views describing the same underlying concept. A typical example of such a scenario would be to study a biological concept using several biological measurements like gene expression, protein expression and metabolic profiles, or to classify web pages based on their content and the contents of their hyperlinks. In this thesis, novel problem formulations and methods for multi-view learning are presented. The contributions include a linear data fusion approach during exploratory data analysis, a new measure to evaluate different kinds of representations for textual data, and an extension of multi-view learning for novel scenarios where the correspondence of samples in the different views or data sets is not known in advance. In order to infer the one-to-one correspondence of samples between two views, a novel concept of multi-view matching is proposed. The matching algorithm is completely data-driven and is demonstrated in several applications such as matching of metabolites between humans and mice, and matching of sentences between documents in two languages.
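
The abstract does not describe how the thesis's matching algorithm works, so the following is only a toy illustration of what inferring a one-to-one correspondence between two views means. It swaps in a plain brute-force assignment that minimizes total squared distance, and it assumes that both views already live in the same feature space, which real multi-view data generally does not. The function name and data are hypothetical.

```python
from itertools import permutations


def best_one_to_one_matching(view_a, view_b):
    """Return a tuple m where sample i of view_a is matched to sample m[i] of view_b.

    Brute force over all permutations, so only practical for a handful of samples.
    """
    def total_cost(perm):
        return sum(
            sum((a - b) ** 2 for a, b in zip(view_a[i], view_b[j]))
            for i, j in enumerate(perm)
        )

    return min(permutations(range(len(view_a))), key=total_cost)


# Hypothetical toy views: the same three samples, shuffled and slightly perturbed.
VIEW_A = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
VIEW_B = [(5.1, 4.9), (0.1, 0.0), (0.9, 1.2)]

matching = best_one_to_one_matching(VIEW_A, VIEW_B)
```

For more than a few samples, the factorial cost of enumeration makes a polynomial-time assignment solver the usual substitute. Whichever solver is used, the cost function is where a data-driven method such as the one the thesis proposes would differ from this sketch.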