12 resultados para data gathering algorithm

em Helda - Digital Repository of University of Helsinki


Relevância:

90.00% 90.00%

Publicador:

Resumo:

Ubiquitous computing is about making computers and computerized artefacts a pervasive part of our everyday lifes, bringing more and more activities into the realm of information. The computationalization, informationalization of everyday activities increases not only our reach, efficiency and capabilities but also the amount and kinds of data gathered about us and our activities. In this thesis, I explore how information systems can be constructed so that they handle this personal data in a reasonable manner. The thesis provides two kinds of results: on one hand, tools and methods for both the construction as well as the evaluation of ubiquitous and mobile systems---on the other hand an evaluation of the privacy aspects of a ubiquitous social awareness system. The work emphasises real-world experiments as the most important way to study privacy. Additionally, the state of current information systems as regards data protection is studied. The tools and methods in this thesis consist of three distinct contributions. An algorithm for locationing in cellular networks is proposed that does not require the location information to be revealed beyond the user's terminal. A prototyping platform for the creation of context-aware ubiquitous applications called ContextPhone is described and released as open source. Finally, a set of methodological findings for the use of smartphones in social scientific field research is reported. A central contribution of this thesis are the pragmatic tools that allow other researchers to carry out experiments. The evaluation of the ubiquitous social awareness application ContextContacts covers both the usage of the system in general as well as an analysis of privacy implications. The usage of the system is analyzed in the light of how users make inferences of others based on real-time contextual cues mediated by the system, based on several long-term field studies. The analysis of privacy implications draws together the social psychological theory of self-presentation and research in privacy for ubiquitous computing, deriving a set of design guidelines for such systems. The main findings from these studies can be summarized as follows: The fact that ubiquitous computing systems gather more data about users can be used to not only study the use of such systems in an effort to create better systems but in general to study phenomena previously unstudied, such as the dynamic change of social networks. Systems that let people create new ways of presenting themselves to others can be fun for the users---but the self-presentation requires several thoughtful design decisions that allow the manipulation of the image mediated by the system. Finally, the growing amount of computational resources available to the users can be used to allow them to use the data themselves, rather than just being passive subjects of data gathering.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In lake-rich regions, the gathering of information about water quality is challenging because only a small proportion of the lakes can be assessed each year by conventional methods. One of the techniques for improving the spatial and temporal representativeness of lake monitoring is remote sensing from satellites and aircrafts. The experimental material included detailed optical measurements in 11 lakes, air- and spaceborne remote sensing measurements with concurrent field sampling, automatic raft measurements and a national dataset of routine water quality measurements from over 1100 lakes. The analyses of the spatially high-resolution airborne remote sensing data from eutrophic and mesotrophic lakes showed that one or a few discrete water quality observations using conventional monitoring can yield a clear over- or underestimation of the overall water quality in a lake. The use of TM-type satellite instruments in addition to routine monitoring results substantially increases the number of lakes for which water quality information can be obtained. The preliminary results indicated that coloured dissolved organic matter (CDOM) can be estimated with TM-type satellite instruments, which could possibly be utilised as an aid in estimating the role of lakes in global carbon budgets. Based on the results of reflectance modelling and experimental data, MERIS satellite instrument has optimal or near-optimal channels for the estimation of turbidity, chlorophyll a and CDOM in Finnish lakes. MERIS images with a 300 m spatial resolution can provide water quality information in different parts of large and medium-sized lakes, and in filling in the gaps resulting from conventional monitoring. Algorithms that would not require simultaneous field data for algorithm training would increase the amount of remote sensing-based information available for lake monitoring. The MERIS Boreal Lakes processor, trained with the optical data and concentration ranges provided by this study, enabled turbidity estimations with good accuracy without the need for algorithm correction with field measurements, while chlorophyll a and CDOM estimations require further development of the processor. The accuracy of interpreting chlorophyll a via semi empirical algorithms can be improved by classifying lakes prior to interpretation according to their CDOM level and trophic status. Optical modelling indicated that the spectral diffuse attenuation coefficient can be estimated with reasonable accuracy from the measured water quality concentrations. This provides more detailed information on light attenuation from routine monitoring measurements than is available through the Secchi disk transparency. The results of this study improve the interpretation of lake water quality by remote sensing and encourage the use of remote sensing in lake monitoring.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This research examines three aspects of becoming a teacher, teacher identity formation in mathematics teacher education: the cognitive and affective aspect, the image of an ideal teacher directing the developmental process, and as an on-going process. The formation of emerging teacher identity was approached in a social psychological framework, in which individual development takes place in social interaction with the context through various experiences. Formation of teacher identity is seen as a dynamic, on-going developmental process, in which an individual intentionally aspires after the ideal image of being a teacher by developing his/her own competence as a teacher. The starting-point was that it is possible to examine formation of teacher identity through conceptualisation of observations that the individual and others have about teacher identity in different situations. The research uses the qualitative case study approach to formation of emerging teacher identity, the individual developmental process and the socially constructed image of an ideal mathematics teacher. Two student cases, John and Mary, and the collective case of teacher educators representing socially shared views of becoming and being a mathematics teacher are presented. The development of each student was examined based on three semi-structured interviews supplemented with written products. The data-gathering took place during the 2005 2006 academic year. The collective case about the ideal image provided during the programme was composed of separate case displays of each teacher educator, which were mainly based on semi-structured interviews in spring term 2006. The intentions and aims set for students were of special interest in the interviews with teacher educators. The interview data was analysed following the modified idea of analytic induction. The formation of teacher identity is elaborated through three themes emerging from theoretical considerations and the cases. First, the profile of one s present state as a teacher may be scrutinised through separate affective and cognitive aspects associated with the teaching profession. The differences between individuals arise through dif-ferent emphasis on these aspects. Similarly, the socially constructed image of an ideal teacher may be profiled through a combination of aspects associated with the teaching profession. Second, the ideal image directing the individual developmental process is the level at which individual and social processes meet. Third, formation of teacher identity is about becoming a teacher both in the eyes of the individual self as well as of others in the context. It is a challenge in academic mathematics teacher education to support the various cognitive and affective aspects associated with being a teacher in a way that being a professional and further development could have a coherent starting-point that an individual can internalise.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This report derives from the EU funded research project “Key Factors Influencing Economic Relationships and Communication in European Food Chains” (FOODCOMM). The research consortium consisted of the following organisations: University of Bonn (UNI BONN), Department of Agricultural and Food Marketing Research (overall project co-ordination); Institute of Agricultural Development in Central and Eastern Europe (IAMO), Department for Agricultural Markets, Marketing and World Agricultural Trade, Halle (Saale), Germany; University of Helsinki, Ruralia Institute Seinäjoki Unit, Finland; Scottish Agricultural College (SAC), Food Marketing Research Team - Land Economy Research Group, Edinburgh and Aberdeen; Ashtown Food Research Centre (AFRC), Teagasc, Food Marketing Unit, Dublin; Institute of Agricultural & Food Economics (IAFE), Department of Market Analysis and Food Processing, Warsaw and Government of Aragon, Center for Agro-Food Research and Technology (CITA), Zaragoza, Spain. The aim of the FOODCOMM project was to examine the role (prevalence, necessity and significance) of economic relationships in selected European food chains and to identify the economic, social and cultural factors which influence co-ordination within these chains. The research project considered meat and cereal commodities in six different European countries (Finland, Germany, Ireland, Poland, Spain, UK/Scotland) and was commissioned against a background of changing European food markets. The research project as a whole consisted of seven different work packages. This report presents the results of qualitative research conducted for work package 5 (WP5) in the pig meat and rye bread chains in Finland. Ruralia Institute would like to give special thanks for all the individuals and companies that kindly gave up their time to take part in the study. Their input has been invaluable to the project. The contribution of research assistant Sanna-Helena Rantala was significant in the data gathering. FOODCOMM project was coordinated by the University of Bonn, Department of Agricultural and Food Market Research. Special thanks especially to Professor Monika Hartmann for acting as the project leader of FOODCOMM.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The history of software development in a somewhat systematical way has been performed for half a century. Despite this time period, serious failures in software development projects still occur. The pertinent mission of software project management is to continuously achieve more and more successful projects. The application of agile software methods and more recently the integration of Lean practices contribute to this trend of continuous improvement in the software industry. One such area warranting proper empirical evidence is the operational efficiency of projects. In the field of software development, Kanban as a process management method has gained momentum recently, mostly due to its linkages to Lean thinking. However, only a few empirical studies investigate the impacts of Kanban on projects in that particular area. The aim of this doctoral thesis is to improve the understanding of how Kanban impacts on software projects. The research is carried out in the area of Lean thinking, which contains a variety of concepts including Kanban. This article-type thesis conducts a set of case studies expanded with the research strategy of quasi-controlled experiment. The data-gathering techniques of interviews, questionnaires, and different types of observations are used to study the case projects, and thereby to understand the impacts of Kanban on software development projects. The research papers of the thesis are refereed, international journal and conference publications. The results highlight new findings regarding the application of Kanban in the software context. The key findings of the thesis suggest that Kanban is applicable to software development. Despite its several benefits reported in this thesis, the empirical evidence implies that Kanban is not all-encompassing but requires additional practices to keep development projects performing appropriately. Implications for research are given, as well. In addition to these findings, the thesis contributes in the area of plan-driven software development by suggesting implications both for research and practitioners. As a conclusion, Kanban can benefit software development projects but additional practices would increase its potential for the projects.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The aim of this thesis is to develop a fully automatic lameness detection system that operates in a milking robot. The instrumentation, measurement software, algorithms for data analysis and a neural network model for lameness detection were developed. Automatic milking has become a common practice in dairy husbandry, and in the year 2006 about 4000 farms worldwide used over 6000 milking robots. There is a worldwide movement with the objective of fully automating every process from feeding to milking. Increase in automation is a consequence of increasing farm sizes, the demand for more efficient production and the growth of labour costs. As the level of automation increases, the time that the cattle keeper uses for monitoring animals often decreases. This has created a need for systems for automatically monitoring the health of farm animals. The popularity of milking robots also offers a new and unique possibility to monitor animals in a single confined space up to four times daily. Lameness is a crucial welfare issue in the modern dairy industry. Limb disorders cause serious welfare, health and economic problems especially in loose housing of cattle. Lameness causes losses in milk production and leads to early culling of animals. These costs could be reduced with early identification and treatment. At present, only a few methods for automatically detecting lameness have been developed, and the most common methods used for lameness detection and assessment are various visual locomotion scoring systems. The problem with locomotion scoring is that it needs experience to be conducted properly, it is labour intensive as an on-farm method and the results are subjective. A four balance system for measuring the leg load distribution of dairy cows during milking in order to detect lameness was developed and set up in the University of Helsinki Research farm Suitia. The leg weights of 73 cows were successfully recorded during almost 10,000 robotic milkings over a period of 5 months. The cows were locomotion scored weekly, and the lame cows were inspected clinically for hoof lesions. Unsuccessful measurements, caused by cows standing outside the balances, were removed from the data with a special algorithm, and the mean leg loads and the number of kicks during milking was calculated. In order to develop an expert system to automatically detect lameness cases, a model was needed. A probabilistic neural network (PNN) classifier model was chosen for the task. The data was divided in two parts and 5,074 measurements from 37 cows were used to train the model. The operation of the model was evaluated for its ability to detect lameness in the validating dataset, which had 4,868 measurements from 36 cows. The model was able to classify 96% of the measurements correctly as sound or lame cows, and 100% of the lameness cases in the validation data were identified. The number of measurements causing false alarms was 1.1%. The developed model has the potential to be used for on-farm decision support and can be used in a real-time lameness monitoring system.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Large-scale chromosome rearrangements such as copy number variants (CNVs) and inversions encompass a considerable proportion of the genetic variation between human individuals. In a number of cases, they have been closely linked with various inheritable diseases. Single-nucleotide polymorphisms (SNPs) are another large part of the genetic variance between individuals. They are also typically abundant and their measuring is straightforward and cheap. This thesis presents computational means of using SNPs to detect the presence of inversions and deletions, a particular variety of CNVs. Technically, the inversion-detection algorithm detects the suppressed recombination rate between inverted and non-inverted haplotype populations whereas the deletion-detection algorithm uses the EM-algorithm to estimate the haplotype frequencies of a window with and without a deletion haplotype. As a contribution to population biology, a coalescent simulator for simulating inversion polymorphisms has been developed. Coalescent simulation is a backward-in-time method of modelling population ancestry. Technically, the simulator also models multiple crossovers by using the Counting model as the chiasma interference model. Finally, this thesis includes an experimental section. The aforementioned methods were tested on synthetic data to evaluate their power and specificity. They were also applied to the HapMap Phase II and Phase III data sets, yielding a number of candidates for previously unknown inversions, deletions and also correctly detecting known such rearrangements.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Analyzing statistical dependencies is a fundamental problem in all empirical science. Dependencies help us understand causes and effects, create new scientific theories, and invent cures to problems. Nowadays, large amounts of data is available, but efficient computational tools for analyzing the data are missing. In this research, we develop efficient algorithms for a commonly occurring search problem - searching for the statistically most significant dependency rules in binary data. We consider dependency rules of the form X->A or X->not A, where X is a set of positive-valued attributes and A is a single attribute. Such rules describe which factors either increase or decrease the probability of the consequent A. A classical example are genetic and environmental factors, which can either cause or prevent a disease. The emphasis in this research is that the discovered dependencies should be genuine - i.e. they should also hold in future data. This is an important distinction from the traditional association rules, which - in spite of their name and a similar appearance to dependency rules - do not necessarily represent statistical dependencies at all or represent only spurious connections, which occur by chance. Therefore, the principal objective is to search for the rules with statistical significance measures. Another important objective is to search for only non-redundant rules, which express the real causes of dependence, without any occasional extra factors. The extra factors do not add any new information on the dependence, but can only blur it and make it less accurate in future data. The problem is computationally very demanding, because the number of all possible rules increases exponentially with the number of attributes. In addition, neither the statistical dependency nor the statistical significance are monotonic properties, which means that the traditional pruning techniques do not work. As a solution, we first derive the mathematical basis for pruning the search space with any well-behaving statistical significance measures. The mathematical theory is complemented by a new algorithmic invention, which enables an efficient search without any heuristic restrictions. The resulting algorithm can be used to search for both positive and negative dependencies with any commonly used statistical measures, like Fisher's exact test, the chi-squared measure, mutual information, and z scores. According to our experiments, the algorithm is well-scalable, especially with Fisher's exact test. It can easily handle even the densest data sets with 10000-20000 attributes. Still, the results are globally optimal, which is a remarkable improvement over the existing solutions. In practice, this means that the user does not have to worry whether the dependencies hold in future data or if the data still contains better, but undiscovered dependencies.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Cell transition data is obtained from a cellular phone that switches its current serving cell tower. The data consists of a sequence of transition events, which are pairs of cell identifiers and transition times. The focus of this thesis is applying data mining methods to such data, developing new algorithms, and extracting knowledge that will be a solid foundation on which to build location-aware applications. In addition to a thorough exploration of the features of the data, the tools and methods developed in this thesis provide solutions to three distinct research problems. First, we develop clustering algorithms that produce a reliable mapping between cell transitions and physical locations observed by users of mobile devices. The main clustering algorithm operates in online fashion, and we consider also a number of offline clustering methods for comparison. Second, we define the concept of significant locations, known as bases, and give an online algorithm for determining them. Finally, we consider the task of predicting the movement of the user, based on historical data. We develop a prediction algorithm that considers paths of movement in their entirety, instead of just the most recent movement history. All of the presented methods are evaluated with a significant body of real cell transition data, collected from about one hundred different individuals. The algorithms developed in this thesis are designed to be implemented on a mobile device, and require no extra hardware sensors or network infrastructure. By not relying on external services and keeping the user information as much as possible on the user s own personal device, we avoid privacy issues and let the users control the disclosure of their location information.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Solar UV radiation is harmful for life on planet Earth, but fortunately the atmospheric oxygen and ozone absorb almost entirely the most energetic UVC radiation photons. However, part of the UVB radiation and much of the UVA radiation reaches the surface of the Earth, and affect human health, environment, materials and drive atmospheric and aquatic photochemical processes. In order to quantify these effects and processes there is a need for ground-based UV measurements and radiative transfer modeling to estimate the amounts of UV radiation reaching the biosphere. Satellite measurements with their near-global spatial coverage and long-term data conti-nuity offer an attractive option for estimation of the surface UV radiation. This work focuses on radiative transfer theory based methods used for estimation of the UV radiation reaching the surface of the Earth. The objectives of the thesis were to implement the surface UV algorithm originally developed at NASA Goddard Space Flight Center for estimation of the surface UV irradiance from the meas-urements of the Dutch-Finnish built Ozone Monitoring Instrument (OMI), to improve the original surface UV algorithm especially in relation with snow cover, to validate the OMI-derived daily surface UV doses against ground-based measurements, and to demonstrate how the satellite-derived surface UV data can be used to study the effects of the UV radiation. The thesis consists of seven original papers and a summary. The summary includes an introduction of the OMI instrument, a review of the methods used for modeling of the surface UV using satellite data as well as the con-clusions of the main results of the original papers. The first two papers describe the algorithm used for estimation of the surface UV amounts from the OMI measurements as well as the unique Very Fast Delivery processing system developed for processing of the OMI data received at the Sodankylä satellite data centre. The third and the fourth papers present algorithm improvements related to the surface UV albedo of the snow-covered land. Fifth paper presents the results of the comparison of the OMI-derived daily erythemal doses with those calculated from the ground-based measurement data. It gives an estimate of the expected accuracy of the OMI-derived sur-face UV doses for various atmospheric and other conditions, and discusses the causes of the differences between the satellite-derived and ground-based data. The last two papers demonstrate the use of the satellite-derived sur-face UV data. Sixth paper presents an assessment of the photochemical decomposition rates in aquatic environment. Seventh paper presents use of satellite-derived daily surface UV doses for planning of the outdoor material weathering tests.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this thesis a manifold learning method is applied to the problem of WLAN positioning and automatic radio map creation. Due to the nature of WLAN signal strength measurements, a signal map created from raw measurements results in non-linear distance relations between measurement points. These signal strength vectors reside in a high-dimensioned coordinate system. With the help of the so called Isomap-algorithm the dimensionality of this map can be reduced, and thus more easily processed. By embedding position-labeled strategic key points, we can automatically adjust the mapping to match the surveyed environment. The environment is thus learned in a semi-supervised way; gathering training points and embedding them in a two-dimensional manifold gives us a rough mapping of the measured environment. After a calibration phase, where the labeled key points in the training data are used to associate coordinates in the manifold representation with geographical locations, we can perform positioning using the adjusted map. This can be achieved through a traditional supervised learning process, which in our case is a simple nearest neighbors matching of a sampled signal strength vector. We deployed this system in two locations in the Kumpula campus in Helsinki, Finland. Results indicate that positioning based on the learned radio map can achieve good accuracy, especially in hallways or other areas in the environment where the WLAN signal is constrained by obstacles such as walls.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The core aim of machine learning is to make a computer program learn from the experience. Learning from data is usually defined as a task of learning regularities or patterns in data in order to extract useful information, or to learn the underlying concept. An important sub-field of machine learning is called multi-view learning where the task is to learn from multiple data sets or views describing the same underlying concept. A typical example of such scenario would be to study a biological concept using several biological measurements like gene expression, protein expression and metabolic profiles, or to classify web pages based on their content and the contents of their hyperlinks. In this thesis, novel problem formulations and methods for multi-view learning are presented. The contributions include a linear data fusion approach during exploratory data analysis, a new measure to evaluate different kinds of representations for textual data, and an extension of multi-view learning for novel scenarios where the correspondence of samples in the different views or data sets is not known in advance. In order to infer the one-to-one correspondence of samples between two views, a novel concept of multi-view matching is proposed. The matching algorithm is completely data-driven and is demonstrated in several applications such as matching of metabolites between humans and mice, and matching of sentences between documents in two languages.