38 results for Methods : Data Analysis
in Helda - Digital Repository of University of Helsinki
Abstract:
In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data, with an application to mobile device positioning. In the second part of the Thesis, we discuss so-called Bayesian network classifiers, and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.
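The stated connection between Bayesian network classifiers and logistic regression is easiest to see in the naive Bayes special case (the notation below is generic, not necessarily the Thesis's). With a binary class Y and class-conditionally independent features x_1, ..., x_d, the posterior log-odds is a sum of per-feature terms, which is exactly the functional form of logistic regression:

    \[
    \log\frac{P(Y=1\mid x)}{P(Y=0\mid x)}
      = \log\frac{P(Y=1)}{P(Y=0)}
      + \sum_{j=1}^{d}\log\frac{P(x_j\mid Y=1)}{P(x_j\mid Y=0)}
    \]

The two model families then differ mainly in how the weights are obtained: generatively, from the conditional distributions of the Bayesian network, or discriminatively, by maximizing the conditional likelihood.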
Abstract:
The aim of this thesis is to develop a fully automatic lameness detection system that operates in a milking robot. The instrumentation, measurement software, algorithms for data analysis and a neural network model for lameness detection were developed. Automatic milking has become a common practice in dairy husbandry, and in the year 2006 about 4,000 farms worldwide used over 6,000 milking robots. There is a worldwide movement with the objective of fully automating every process from feeding to milking. The increase in automation is a consequence of growing farm sizes, the demand for more efficient production and rising labour costs. As the level of automation increases, the time that the cattle keeper uses for monitoring animals often decreases. This has created a need for systems that automatically monitor the health of farm animals. The popularity of milking robots also offers a new and unique possibility to monitor animals in a single confined space up to four times daily. Lameness is a crucial welfare issue in the modern dairy industry. Limb disorders cause serious welfare, health and economic problems, especially in loose housing of cattle. Lameness causes losses in milk production and leads to early culling of animals. These costs could be reduced with early identification and treatment. At present, only a few methods for automatically detecting lameness have been developed, and the most common methods used for lameness detection and assessment are various visual locomotion scoring systems. The problem with locomotion scoring is that it needs experience to be conducted properly, it is labour-intensive as an on-farm method, and the results are subjective. A four-balance system for measuring the leg load distribution of dairy cows during milking, in order to detect lameness, was developed and set up at the University of Helsinki research farm Suitia. The leg weights of 73 cows were successfully recorded during almost 10,000 robotic milkings over a period of 5 months. The cows were locomotion scored weekly, and the lame cows were inspected clinically for hoof lesions. Unsuccessful measurements, caused by cows standing outside the balances, were removed from the data with a special algorithm, and the mean leg loads and the number of kicks during milking were calculated. In order to develop an expert system to automatically detect lameness cases, a model was needed. A probabilistic neural network (PNN) classifier model was chosen for the task. The data were divided into two parts: 5,074 measurements from 37 cows were used to train the model, and the model was evaluated for its ability to detect lameness on a validation dataset of 4,868 measurements from 36 cows. The model was able to classify 96% of the measurements correctly as coming from sound or lame cows, and 100% of the lameness cases in the validation data were identified. The proportion of measurements causing false alarms was 1.1%. The developed model has the potential to be used for on-farm decision support and can be used in a real-time lameness monitoring system.
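A probabilistic neural network is, at its core, a Parzen-window (kernel density) classifier. The sketch below is a minimal generic PNN, not the thesis's implementation; the feature layout (four mean leg loads plus a kick count per milking) and the smoothing parameter are assumptions.

    import numpy as np

    def pnn_predict(X_train, y_train, X_new, sigma=0.5):
        """Classify each row of X_new by the class whose training points
        give the highest average Gaussian-kernel density (Parzen window)."""
        classes = np.unique(y_train)
        preds = []
        for x in X_new:
            scores = []
            for c in classes:
                Xc = X_train[y_train == c]            # training points of class c
                d2 = np.sum((Xc - x) ** 2, axis=1)    # squared distances to x
                scores.append(np.mean(np.exp(-d2 / (2.0 * sigma ** 2))))
            preds.append(classes[int(np.argmax(scores))])
        return np.array(preds)

    # Hypothetical features per milking: [LF, RF, LR, RR mean leg load, kicks],
    # standardized; labels 0 = sound, 1 = lame (from locomotion scoring).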
Abstract:
This work belongs to the field of computational high-energy physics (HEP). The key methods used in this thesis work to meet the challenges raised by the Large Hadron Collider (LHC) era experiments are object-oriented software engineering, Monte Carlo simulation, cluster computing, and artificial neural networks. The first aspect discussed is the development of hadronic cascade models, used for the accurate simulation of medium-energy hadron-nucleus reactions up to 10 GeV. These models are typically needed in hadronic calorimeter studies and in the estimation of radiation backgrounds. Various applications outside HEP include the medical field (such as hadron treatment simulations), space science (satellite shielding), and nuclear physics (spallation studies). Validation results are presented for several significant improvements released in the Geant4 simulation tool, and the significance of the new models for computing in the Large Hadron Collider era is estimated. In particular, we estimate the ability of the Bertini cascade to simulate the Compact Muon Solenoid (CMS) hadron calorimeter (HCAL). LHC test beam activity has a tightly coupled cycle of simulation-to-data analysis. Typically, a Geant4 computer experiment is used to understand test beam measurements. Thus another aspect of this thesis is a description of studies related to developing new CMS H2 test beam data analysis tools and performing data analysis on the basis of CMS Monte Carlo events. These events have been simulated in detail using Geant4 physics models, a full CMS detector description, and event reconstruction. Using the ROOT data analysis framework, we have developed an offline ANN-based approach to tag b-jets associated with heavy neutral Higgs particles, and we show that this kind of NN methodology can be successfully used to separate the Higgs signal from the background in the CMS experiment.
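The thesis's tagging network was built with ROOT-based tools; as a rough, generic stand-in, the sketch below trains a small feed-forward classifier to separate signal from background. The feature names, network size, and toy data are illustrative assumptions, not the analysis's actual configuration.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    # Toy stand-in for per-jet discriminating variables (e.g. impact-parameter
    # significance, secondary-vertex mass); real inputs would come from
    # reconstructed Monte Carlo events.
    X_sig = rng.normal(1.0, 1.0, size=(1000, 4))   # "signal-like" jets
    X_bkg = rng.normal(0.0, 1.0, size=(1000, 4))   # "background-like" jets
    X = np.vstack([X_sig, X_bkg])
    y = np.array([1] * 1000 + [0] * 1000)

    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000).fit(X, y)
    p_signal = clf.predict_proba(X)[:, 1]          # NN output used as tag score

A cut on the network output then trades signal efficiency against background rejection.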
Abstract:
Accelerator mass spectrometry (AMS) is an ultrasensitive technique for measuring the concentration of a single isotope. The electric and magnetic fields of an electrostatic accelerator system are used to filter out other isotopes from the ion beam. The high ion velocity means that molecules can be destroyed and removed from the measurement background. As a result, concentrations down to one atom in 10^16 atoms are measurable. This thesis describes the construction of the new AMS system in the Accelerator Laboratory of the University of Helsinki. The system is described in detail along with the relevant ion optics. System performance and some of the 14C measurements done with the system are described. In the second part of the thesis, a novel statistical model for the analysis of AMS data is presented. Bayesian methods are used in order to make the best use of the available information. In the new model, instrumental drift is modelled with a continuous first-order autoregressive process. This enables rigorous normalization to standards measured at different times. The Poisson statistical nature of a 14C measurement is also taken into account properly, so that uncertainty estimates are much more stable. It is shown that, overall, the new model improves both the accuracy and the precision of AMS measurements. In particular, the results can be improved for samples with very low 14C concentrations or samples measured only a few times.
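One way to write such a model down (generic notation; the thesis's exact parameterization may differ) is to let the drift follow an AR(1) process, shown here on a discrete time grid for simplicity, and to let the observed 14C counts be Poisson, with the drift entering through the detection efficiency:

    \[
    d_t = \phi\, d_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0,\sigma^2),
    \]
    \[
    N_{s,t} \sim \mathrm{Poisson}\big(\mu_s\, e^{d_t}\, T_{s,t}\big),
    \]

where \mu_s is the true relative 14C concentration of sample s and T_{s,t} its live time in measurement t. Standards with known \mu_s measured at different times constrain the drift d_t, which is what permits rigorous normalization; a continuous-time version replaces \phi with e^{-\Delta t/\tau} to handle irregular measurement spacing.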
Abstract:
Aims: To develop and validate tools for estimating the residual noise covariance in Planck frequency maps, to quantify signal error effects, and to compare different techniques for producing low-resolution maps. Methods: We derive analytical estimates of the covariance of the residual noise contained in low-resolution maps produced using a number of map-making approaches. We test these analytical predictions using Monte Carlo simulations and assess their impact on angular power spectrum estimation. We use simulations to quantify the level of signal errors incurred in the different resolution downgrading schemes considered in this work. Results: We find an excellent agreement between the optimal residual noise covariance matrices and Monte Carlo noise maps. For destriping map-makers, the extent of agreement is dictated by the knee frequency of the correlated noise component and the chosen baseline offset length. Signal striping is shown to be insignificant when properly dealt with. In map resolution downgrading, we find that a carefully selected window function is required to reduce aliasing to the sub-percent level at multipoles ℓ > 2N_side, where N_side is the HEALPix resolution parameter. We show that sufficient characterization of the residual noise is unavoidable if one is to draw reliable constraints on large-scale anisotropy. Conclusions: We have described how to compute low-resolution maps with a controlled sky signal level and a reliable estimate of the covariance of the residual noise. We have also presented a method for smoothing the residual noise covariance matrices to describe the noise correlations in smoothed, bandwidth-limited maps.
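For orientation, destriping map-making is conventionally formulated as follows (this is the standard generic formulation, not necessarily the paper's exact one). The time-ordered data d are modelled as

    \[
    d = P\,m + F\,a + n,
    \]

where P is the pointing matrix, m the sky map, a a vector of baseline offsets spread over the timeline by F, and n white noise. Once the offsets \hat{a} have been solved, the map estimate is the generalized least-squares solution

    \[
    \hat{m} = (P^{\mathsf{T}} C_n^{-1} P)^{-1} P^{\mathsf{T}} C_n^{-1}\,(d - F\hat{a}),
    \]

and the residual noise covariance matrices discussed above are the covariances of \hat{m} - m implied by this model.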
Abstract:
Microarrays are high-throughput biological assays that allow the screening of thousands of genes for their expression. The main idea behind microarrays is to compute, for each gene, a unique signal that is directly proportional to the quantity of mRNA that was hybridized on the chip. The large number of steps, and the errors associated with each step, make the generated expression signal noisy. As a result, microarray data need to be carefully pre-processed before their analysis can be assumed to lead to reliable and biologically relevant conclusions. This thesis focuses on developing methods for improving the gene signal and further utilizing this improved signal for higher-level analysis. To achieve this, first, approaches for designing microarray experiments using various optimality criteria, considering both biological and technical replicates, are described. A carefully designed experiment leads to a signal with low noise, as the effect of unwanted variation is minimized and the precision of the estimates of the parameters of interest is maximized. Second, a system for improving the gene signal by using three scans at varying scanner sensitivities is developed. A novel Bayesian latent intensity model is then applied to these three sets of expression values, corresponding to the three scans, to estimate the suitably calibrated true signal of genes. Third, a novel image segmentation approach that segregates the fluorescent signal from the undesired noise is developed using an additional dye, SYBR green RNA II. This technique helps identify signal arising only from the hybridized DNA, so that signal corresponding to dust, scratches, dye spills, and other noise is avoided. Fourth, an integrated statistical model is developed, in which signal correction, systematic array effects, dye effects, and differential expression are modelled jointly, as opposed to a sequential application of several methods of analysis. The methods described here have been tested only on cDNA microarrays, but they can also, with some modifications, be applied to other high-throughput technologies. Keywords: High-throughput technology, microarray, cDNA, multiple scans, Bayesian hierarchical models, image analysis, experimental design, MCMC, WinBUGS.
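A schematic version of the multiple-scan idea (generic notation; the thesis's Bayesian latent intensity model is more elaborate): each gene g has a latent true signal x_g, each scan s a sensitivity \beta_s, and observed intensities are noisy and censored at the scanner's saturation level y_max:

    \[
    y_{gs} = \min\{\beta_s x_g + \varepsilon_{gs},\; y_{\max}\}, \qquad \varepsilon_{gs}\sim\mathcal{N}(0,\sigma_s^2).
    \]

Posterior inference on x_g pools the scans, so that spots saturated in the most sensitive scan are recovered from the less sensitive ones, while weak spots benefit from the high-sensitivity scan.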
Abstract:
The core aim of machine learning is to make a computer program learn from experience. Learning from data is usually defined as the task of learning regularities or patterns in data in order to extract useful information, or to learn the underlying concept. An important sub-field of machine learning is multi-view learning, where the task is to learn from multiple data sets, or views, describing the same underlying concept. A typical example of such a scenario would be to study a biological concept using several biological measurements, such as gene expression, protein expression and metabolic profiles, or to classify web pages based on their content and the contents of their hyperlinks. In this thesis, novel problem formulations and methods for multi-view learning are presented. The contributions include a linear data fusion approach for exploratory data analysis, a new measure for evaluating different kinds of representations of textual data, and an extension of multi-view learning to novel scenarios where the correspondence of samples across the different views or data sets is not known in advance. In order to infer the one-to-one correspondence of samples between two views, a novel concept of multi-view matching is proposed. The matching algorithm is completely data-driven and is demonstrated in several applications, such as matching of metabolites between humans and mice, and matching of sentences between documents in two languages.
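The matching step itself can be illustrated with a minimal generic sketch (an illustrative stand-in, not the thesis's algorithm): once the samples of the two views are mapped into a comparable feature space, for example by a shared projection such as CCA, a one-to-one correspondence is a minimum-cost assignment problem.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_views(X, Y):
        """One-to-one matching of rows of X to rows of Y that minimizes
        total Euclidean distance (Hungarian algorithm)."""
        cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
        rows, cols = linear_sum_assignment(cost)
        return rows, cols  # X[rows[i]] is matched to Y[cols[i]]

In practice, the quality of the matching depends entirely on how comparable the two views' feature spaces have been made before the assignment step.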
Abstract:
Many active pharmaceutical ingredients (APIs) have both anhydrate and hydrate forms. Because the physicochemical properties of solid forms differ, changes in solid-state form may result in therapeutic, pharmaceutical, legal and commercial problems. In order to obtain good solid dosage form quality and performance, there is a constant need to understand and control these phase transitions during manufacturing and storage. It is therefore important to detect, and also to quantify, the possible transitions between the different forms. In recent years, vibrational spectroscopy has become an increasingly popular tool for characterising solid-state forms and their phase transitions. It offers several advantages over other characterisation techniques, including the ability to obtain molecular-level information, minimal sample preparation, and the possibility of monitoring changes non-destructively in-line. Dehydration is a phase transition of hydrates that is frequently encountered during dosage form production and storage. The aim of the present thesis was to investigate the dehydration behaviour of diverse pharmaceutical hydrates by near infrared (NIR), Raman and terahertz pulsed spectroscopic (TPS) monitoring, together with multivariate data analysis. The goal was to reveal new perspectives on the investigation of dehydration at the molecular level. Solid-state transformations were monitored during the dehydration of diverse hydrates on a hot stage. The results obtained from the qualitative experiments were used to develop a method for, and perform the quantification of, the solid-state forms during process-induced dehydration in a fluidised bed dryer. Both in situ and in-line process monitoring and quantification were performed. This thesis demonstrated the utility of vibrational spectroscopy techniques and multivariate modelling for monitoring and investigating dehydration behaviour in situ and during fluidised bed drying. All three spectroscopic methods proved complementary in the study of dehydration. NIR spectroscopy models could quantify the solid-state forms in the binary system, but were unable to quantify all the forms in the quaternary system. Raman spectroscopy models, on the other hand, could quantify all four solid-state forms that appeared upon isothermal dehydration. The speed of the spectroscopic methods makes them applicable to monitoring dehydration, and the quantification of multiple forms was performed during the phase transition; the solid-state structural information at the molecular level was thus directly obtained. TPS detected the intermolecular phonon modes, while Raman spectroscopy detected mostly changes in intramolecular vibrations; both techniques revealed information about crystal structure changes. NIR spectroscopy, on the other hand, was more sensitive to the water content and the hydrogen-bonding environment of the water molecules. This study provides a basis for real-time process monitoring using vibrational spectroscopy during pharmaceutical manufacturing.
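Multivariate quantification of spectra typically works along these lines (a generic partial least squares sketch, assuming calibration samples with known form fractions; the thesis's actual models and preprocessing are not specified here):

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    # X_calib: preprocessed spectra (one row per calibration mixture);
    # Y_calib: known mass fractions of the solid-state forms (one column per form).
    rng = np.random.default_rng(0)
    X_calib = rng.random((30, 200))                     # placeholder spectra
    Y_calib = rng.dirichlet(np.ones(4), size=30)        # placeholder form fractions

    pls = PLSRegression(n_components=3).fit(X_calib, Y_calib)

    X_process = rng.random((10, 200))                   # spectra collected during drying
    Y_pred = pls.predict(X_process)                     # estimated form fractions over time

The calibration model, once built from mixtures of known composition, turns each in-line spectrum into an estimate of the form fractions in real time.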
Abstract:
The present study provides a usage-based account of how three grammatical structures, declarative content clauses, interrogative content clauses, and as-predicative constructions, are used in academic research articles. These structures may be used in both knowledge claims and citations, and they often express evaluative meanings. Using the methodology of quantitative corpus linguistics, I investigate how the culture of the academic discipline influences the way in which these constructions are used in research articles. The study compares the rates of occurrence of these grammatical structures and investigates their co-occurrence patterns in articles representing four different disciplines (medicine, physics, law, and literary criticism). The analysis is based on a purpose-built 2-million-word corpus, which has been part-of-speech tagged. The analysis demonstrates that the use of these grammatical structures varies between disciplines, and further shows that the differences observed in the corpus data are linked with differences in the nature of knowledge and the patterns of enquiry. The constructions in focus tend to be more frequently used in the soft disciplines, law and literary criticism, where their co-occurrence patterns are also more varied. This reflects both the greater variety of topics discussed in these disciplines and the higher frequency of references to statements made by other researchers. Knowledge-building in the soft fields normally requires a careful contextualisation of the arguments, giving rise to statements reporting earlier research that employ the constructions in focus. In contrast, knowledge-building in the hard fields is typically a cumulative process, based on agreed-upon methods of analysis. This characteristic is reflected in the structure and contents of research reports, which offer fewer opportunities for using these constructions.
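Rate comparisons of this kind are normalized by sub-corpus size; a minimal sketch of the computation, with hypothetical counts rather than the study's data:

    # Occurrences per million words, so that differently sized
    # sub-corpora are directly comparable. All numbers are hypothetical.
    counts = {"medicine": 412, "physics": 380, "law": 905, "literary_criticism": 841}
    tokens = {"medicine": 500_000, "physics": 500_000, "law": 500_000, "literary_criticism": 500_000}

    rates = {d: counts[d] / tokens[d] * 1_000_000 for d in counts}
    print(rates)  # occurrences per million words, by discipline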
Abstract:
This academic work begins with a compact presentation of the general background to the study, which also includes an autobiographical account of the author's interest in this research. The presentation orients readers who know little about the topic of this research, the structure of the educational system, or the value given to education in Nigeria. It further concentrates on the dynamic interplay between academic and professional qualification and teachers' job effectiveness in secondary schools, in Nigeria in particular and in Africa in general. The aim of this study is to produce a systematic analysis and a rich theoretical and empirical description of teachers' teaching competencies. The theoretical part comprises a comprehensive literature review that focuses on research conducted in the areas of academic and professional qualification and teachers' job effectiveness, teaching competencies, and the role of teacher education, with particular emphasis on school effectiveness and improvement. This research benefits greatly from the functionalist conception of education, which is built upon two emphases: the application of the scientific method to the objective social world, and the use of an analogy between the individual 'organism' and 'society'. To this end, it offers us an opportunity to define terms systematically and to view problems as always being interrelated with other components of society. The empirical part involves describing and interpreting what educational objectives can be achieved with the help of teachers' teaching competencies, in close connection with educational planning and teacher training and development, and how to achieve them without waste. The data used in this study were collected between 2002 and 2003 from teachers, principals, and supervisors of education from the Ministry of Education and the Post Primary Schools Board in the Rivers State of Nigeria (N=300). The data were collected through interviews, documents, observation, and questionnaires, and were analyzed using both qualitative and quantitative methods to strengthen the validity of the findings. The data collected were analyzed to answer the specific research questions and hypotheses posited in this study. The data analysis involved the use of multiple statistical procedures: Percentages Mean Point Value, T-test of Significance, One-Way Analysis of Variance (ANOVA), and Cross Tabulation. The results obtained from the data analysis show that teachers require professional knowledge and professional teaching skills, as well as a broad base of general knowledge (e.g., morality, service, cultural capital, institutional survey). Above all, in order to carry out instructional processes effectively, teachers should be both academically and professionally trained. This study revealed that teachers are not, however, expected to have an extraordinary memory, but are rather looked upon as persons capable of thinking in the right direction. This study may provide a solution to the problem of teacher education and school effectiveness in Nigeria. For this reason, I offer this treatise to anyone seriously committed to improving schools in developing countries in general, and in Nigeria in particular, in order to improve the lives of all its citizens.
In particular, I write this to encourage educational planners, education policy makers, curriculum developers, principals, teachers, and students of education interested in empirical information and methods to conceptualize the issue this study has raised, and to provide them with useful suggestions to help them improve secondary schooling in Nigeria. Multiple audiences, though, exist for any text, and I trust that the academic community will find this piece of work a useful addition to the existing literature on school effectiveness and school improvement. By integrating concepts from a number of disciplines, I aim to describe as holistic a representation as space allows of the components of school effectiveness and quality improvement. The study offers a new perspective on teachers' professional competencies, one that not only takes into consideration the unique characteristics of the variables used in this study, but also acknowledges their environmental and cultural derivation. In addition, researchers should focus their attention on the ways in which both professional and non-professional teachers construct and apply their methodological competencies, such as their grouping procedures and behaviors, in the schooling of students. Keywords: Professional Training, Academic Training, Professionally Qualified, Academically Qualified, Professional Qualification, Academic Qualification, Job Effectiveness, Job Efficiency, Educational Planning, Teacher Training and Development, Nigeria.
Abstract:
The aim of this qualitative study is to chart the operational context of upper-secondary school principals and the historical, cultural and structural factors that steer their day-to-day work. The concepts regarding the study environment and operational culture are defined and analysed in terms of how they are interrelated. Furthermore, it is explained why upper-secondary schools must describe their operational culture within the curriculum. The study also aims to connect the description of the operational culture with the operational system of the upper-secondary school, and to analyse the descriptions of five upper-secondary schools in relation to their commitment to developing a study environment conducive to learning and participation, as well as to supporting interaction. Interview data are used to provide the background for the description of the operational culture and to particularise the results of the analysis. According to the theory used in this study, the sources steering the day-to-day work of the upper-secondary school are the rule system of the state, the municipality, the curriculum, and the school's own administration. The research data consist of the literature concerning steering, forms of steering, and the principal's professional profile in general terms from 1950 to the present, together with the steering texts concerning the educational environment and operational culture. Furthermore, the research data include five descriptions of the operational culture of upper-secondary schools, together with action reports and student guides. The methods of analysis include the level model and content analysis; the first is part of the theory used in this study. For the purpose of content analysis, moreover, classifying grounds are established on the basis of the theoretical and empirical research data. A result of this study is that, from the perspective of steering, the function of describing the operational culture is clearly linked to the evaluation supporting the goals and the vision of a learning organisation. From an administrative point of view, a description is a problem-solving strategy and an instrument of evaluation. The study environment is a structural context in and through which the actors of the school create, change and renew the elements of the structure rooted in the context; this structure is their historically and culturally mediated way of thinking and acting. The initial situation and orientation of the students affect the emphasis of the operational culture descriptions; principals also have their own personal style of leadership. Key words: source of steering, educational environment, operational culture, self-evaluation, learning organisation
Abstract:
Objectives. In this research I analyzed the learning process of teacher students in a planning meeting, using the expansive learning cycle and types-of-interaction approaches. Within the activity theory framework, the expansive learning cycle has been applied widely in analyzing learning processes spanning several years. However, few studies exist that utilize expansive cycles in analyzing short single meetings. In the activity theory framework, talk and interaction have been analyzed using the following types of interaction: coordination, cooperation and communication. In such studies, single interaction situations have been analyzed in which the status and power positions of the participants have been very different. Interactions of self-directed teams, in which the participants are equal, have been examined very little. I am not aware of any studies in which both the learning actions of the expansive cycle and the types of interaction have been utilized in analyzing the same data. The aim of my study was to describe the process of collaborative innovative learning in a situation where a student group tries to accomplish a broad and ill-defined learning task. I aim to describe how this planning process proceeds through the different phases of the learning actions of the expansive cycle. My goal is to understand and describe the transformations in the quality of interaction and the transitions related to them. Another goal of this study is to specify the possible similarities and differences between expansive learning and the types of interaction. Methods. The data for this study consisted of videotaped meetings that were part of a study module for the class teacher degree. The first meeting of the study module was chosen as the primary research material. Five students were present in the group meeting. A transcription of the conversation was analyzed by classifying the turns of conversation according to the phases of the expansive cycle. After that, the material was categorized again using the types of interaction. Results and conclusions. As a result of this study I was able to trace all the phases of the expansive cycle except one. I was also able to identify all the interaction types. When I compared the two modes of analysis side by side, I was able to find connecting main phases. Thus I was able to identify an interdependence between the two ways of analysis at a higher level, although I was not able to observe a correlation at the level of individual phases. Based on this, I conclude that the group's learning consisted simultaneously of specifying and formulating the object during the different phases of expansive learning, and of a transformation in the quality of the interaction while searching for the common object.
Abstract:
This study examined religious home education in an educational, psychological, and sociological context. Growing up within a religious denomination is a process of learning the rules, norms, opinions, and attitudes which serve to make the individual an active member of the group. It is a process of transferring the cultural inheritance between generations. Sabbath-keeping can be regarded as a strong indicator of the Seventh-day Adventist value system, which is why I have concentrated on this specific issue in my study. The purpose of the study was to find out how the Sabbath is transferred from parents to children among Finnish Adventists. It was also examined how parents could make the day of rest positively exceptional for children, and how parental authoritativeness affects the process of transference. According to Bull & Lockhart's (1989) theory, the number of Adventist generations in a family's history influences the transfer of religious tradition. This study aimed to find out whether or not this theory applies to present-day Finland. The nature of religious development among Adventist young people was also one of the interests of the research. The methods used in the study were in-depth interviews (n = 10) and a survey (n = 106). The majority of the interviewees were young adults (age 15-30) who had grown up in Adventist families. The interviews were taped and transcribed for the study, and the survey answers were analysed with the SPSS data analysis program. The number of survey questionnaires evaluated was 106, the whole population of 15-30-year-old Finnish Adventists being about one thousand. A democratic relationship between parents and children, the parents' example, encouragement of the child's own thinking, and positive experiences of the Sabbath and of the religion as a whole, including the social dimension of Adventism, seem to be some of the most significant factors in the transference of religious tradition. Both too severe and too permissive an upbringing were considered to lead to similar results: unsuccessful transfer of values, or even rebellion and the adoption of a way of life totally opposite to that of the parents. In this study the number of Adventist generations in a family's history did not correlate significantly with the end results of value transference. Keywords: Sabbath, intergenerational, value transference, religious home education