27 results for Modeling Non-Verbal Behaviors Using Machine Learning
Abstract:
The aim of the present set of longitudinal studies was to explore 3- to 7-year-old children's Spontaneous FOcusing on Numerosity (SFON) and its relation to early mathematical development. The specific goals were to capture in method and theory the distinct process by which children focus on numerosity as a part of their activities involving exact number recognition, and individual differences in this process that may be informative in the development of more complex number skills. Over the course of conducting the five studies, fifteen novel tasks were progressively developed for the SFON assessments. The tasks were designed to control for the confounding effects of insufficient number recognition, verbal comprehension, other procedural skills and working memory capacity. Furthermore, the studies explored how children's individual differences in SFON are related to their development of number sequence, subitizing-based enumeration, object counting and basic arithmetic skills. The effect of social interaction on SFON was also tested. Study I captured the first phase of the 3-year longitudinal study with 39 children. It investigated whether there were differences in 3-year-old children's tendency to focus on numerosity, and whether these differences were related to the children's development of cardinality recognition skills from the age of 3 to 4 years. It was found that the two groups of children formed on the basis of the strength of their SFON tendency at the age of 3 years differed in their development of recognising and producing small numbers. The children whose SFON tendency was very predominant developed faster in cardinality-related skills from the age of 3 to 4 years than the children whose SFON tendency was not as predominant. Thus, children's development in cardinality recognition skills is related to their SFON tendency. 
Studies II and III were conducted to investigate, firstly, children's individual differences in SFON, and, secondly, whether children's SFON is related to their counting development. Altogether nine tasks were designed for the assessments of spontaneous and guided focusing on numerosity. The longitudinal data of 39 children in Study II from the age of 3.5 to 6 years showed individual differences in SFON at the ages of 4, 5 and 6 years, as well as stability in children's SFON across tasks used at different ages. The counting skills were assessed at the ages of 3.5, 5 and 6 years. Path analyses indicated a reciprocal tendency in the relationship between SFON and counting development. In Study III, these results on the individual differences in SFON tendency, the stability of SFON across different tasks and the relationship of SFON and mathematical skills were confirmed by a larger-scale cross-sectional study of 183 children with a mean age of 6.5 years (range 6;0-7;0 years). The unique variance that SFON accounted for in number sequence elaboration, object counting and basic arithmetic skills remained statistically significant (partial correlations ranging from .27 to .37) when the effects of non-verbal IQ and verbal comprehension were controlled. In addition, to confirm that the SFON tasks assess SFON tendency independently of enumeration skills, guided focusing tasks were used for children who had failed in SFON tasks. It was explored whether these children were able to succeed in tasks similar to the SFON tasks once they were guided to focus on number. The results showed that these children's poor performance in the SFON tasks was caused not by a deficiency in executing the tasks but by a lack of focusing on numerosity. The longitudinal Study IV of 39 children aimed at increasing the knowledge of associations between children's long-term SFON tendency, subitizing-based enumeration and verbal counting skills. 
Children were tested twice at the age of 4-5 years on their SFON, and once at the age of 5 on their subitizing-based enumeration, number sequence production and object counting skills. Results showed considerable stability in SFON tendency measured at different ages, and a positive direct association between SFON and number sequence production. The association between SFON and object counting skills was significantly mediated by subitizing-based enumeration. These results indicate that the associations between the child's SFON and sub-skills of verbal counting may differ on the basis of how significant a role understanding the cardinal meanings of number words plays in learning these skills. The specific goal of Study V was to investigate whether it is possible to enhance 3-year-old children's SFON tendency, and thus start children's deliberate practice in early mathematical skills. Participants were 3-year-old children in Finnish day care. The SFON scores and cardinality-related skills of the experimental group of 17 children were compared to the corresponding results of the 17 children in the control group. The results show an experimental effect on SFON tendency and subsequent development in cardinality-related skills during the 6-month period from pretest to delayed posttest in the children with some initial SFON tendency in the experimental group. Social interaction has an effect on children's SFON tendency. The results of the five studies indicate that within a child's existing mathematical competence, it is possible to distinguish a separate process, which refers to the child's tendency to spontaneously focus on numerosity. Moreover, there are significant individual differences in children's SFON at the age of 3-7 years. Moderate stability was found in this tendency across different tasks assessed both at the same and at different ages. Furthermore, SFON tendency is related to the development of early mathematical skills. 
Educational implications of the findings emphasise, first, the importance of regarding focusing on numerosity as a separate, essential process in the assessments of young children's mathematical skills. Second, the substantial individual differences in SFON tendency during the childhood years suggest that uncovering and modeling this kind of mathematically meaningful perceiving of the surroundings and tasks could be an efficient tool for promoting young children's mathematical development, and thus for preventing later failures in learning mathematical skills. It is proposed that future studies consider focusing on numerosity as one potential sub-process of activities involving exact number recognition.
Abstract:
The continuous evolution of technology is benefiting our lives to a great extent. The evolution of the Internet of Things and the deployment of wireless sensor networks are making it possible to have more connectivity between people and the devices used extensively in our daily lives. Almost every area of daily life, including the health sector, transportation and agriculture, is benefiting from these technologies. There is great potential for research and refinement in the health sector, as the current system is very often dependent on manual evaluations conducted by clinicians. There is no automatic system for patient health monitoring and assessment, which results in incomplete and less reliable health information. The Internet of Things has great potential to benefit health care applications through automated and remote assessment, monitoring and identification of diseases. Acute pain is the main reason people visit hospitals. An automatic pain detection system based on the Internet of Things with wireless devices can make pain assessment and treatment significantly more efficient. The contribution of this research work is a proposed pain assessment method based on physiological parameters. The physiological parameters chosen for this study are heart rate, electrocardiography, breathing rate and galvanic skin response. As a first step, the relation between these physiological parameters and the acute pain experienced by the test persons is evaluated. The electrocardiography data collected from the test persons is analyzed to extract interbeat intervals. This evaluation clearly demonstrates specific patterns and trends in these parameters as a consequence of pain. This parametric behavior is then used to assess and identify the pain intensity by implementing machine learning algorithms. Support vector machines are used to classify these parameters as influenced by different pain intensities. 
The classification results, with good accuracy rates for two and three levels of pain intensity, give a clear indication of pain and show the feasibility of this pain assessment method. An improved approach building on this research work could use both physiological parameters and electromyography data of facial muscles for classification.
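The interbeat-interval extraction mentioned in this abstract can be illustrated with a minimal sketch. It assumes that R-peak times (in milliseconds) have already been detected in the ECG signal; the function names and the sample values are hypothetical and are not taken from the thesis.

```python
from statistics import mean


def interbeat_intervals(r_peak_times_ms):
    """Return the gaps between successive R-peaks, in milliseconds."""
    return [
        later - earlier
        for earlier, later in zip(r_peak_times_ms, r_peak_times_ms[1:])
    ]


def mean_heart_rate_bpm(intervals_ms):
    """Convert the mean interbeat interval (ms) to beats per minute."""
    return 60000.0 / mean(intervals_ms)


# Made-up R-peak timestamps, for illustration only.
peaks = [0, 800, 1600, 2450]
intervals = interbeat_intervals(peaks)
print(intervals, mean_heart_rate_bpm(intervals))
```

Interval features of this kind could then be passed, together with the other physiological parameters, to a classifier such as a support vector machine.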
Abstract:
In this thesis the author approaches the problem of automated text classification, which is one of the basic tasks in building an Intelligent Internet Search Agent. The work discusses various approaches to solving sub-problems of automated text classification, such as feature extraction and machine learning on text sources. The author also describes her own multiword approach to feature extraction and presents the results of testing this approach using a classifier based on linear discriminant analysis and a classifier that combines unsupervised learning for etalon extraction with supervised learning using the common backpropagation algorithm for a multilayer perceptron.
Abstract:
The aim of this work was to help Euroelektro International Oy grow its customer base by finding the right starting points for making the company's marketing and sales more effective and for selecting profitable target segments. A study was carried out to determine the company's typical customer, customer needs, the purchasing criteria and preferences for machine vision systems, and the people who make the purchasing decision and those who influence it. In addition, the study identified which industries are potential and which are non-potential for Euroelektro. Finally, a development proposal for the company's marketing and sales was drawn up on the basis of the results. The study was limited to companies already using machine vision, companies planning to use machine vision, companies that were thought likely to use machine vision in the future, and companies that deal with machine vision customers. To manage its marketing and sales processes, the company should develop its own monitoring program, with which the success of the chosen marketing strategy could easily be tracked, as well as a quality manual containing standardized procedures for customer acquisition, field sales and carrying out sales projects, together with the job descriptions and areas of responsibility of the various staff members.
Abstract:
Recent advances in machine learning methods increasingly enable the automatic construction of various types of computer-assisted methods that have been difficult or laborious for human experts to program. The tasks for which such tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. Machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight into the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms that incorporate this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suited to the task of information retrieval and to more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms, such as text categorization and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. 
We also design a fast cross-validation algorithm for regularized least-squares type learning algorithms. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and that novel advanced kernels and cost functions can be used efficiently in algorithms.
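The abstract does not describe the fast cross-validation algorithm itself. As a hedged illustration of why such shortcuts are possible for regularized least squares, the sketch below uses the well-known leave-one-out identity for ridge regression: the held-out residual equals the full-fit residual divided by one minus the leverage. To stay dependency-free it is restricted to a single feature with no intercept; all function names and data are hypothetical, and this is not the thesis's algorithm.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form one-feature ridge solution: w = sum(xy) / (sum(x^2) + lam)."""
    s_xx = sum(x * x for x in xs)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    return s_xy / (s_xx + lam)


def loo_residuals_naive(xs, ys, lam):
    """Refit the model n times, each time leaving one example out."""
    residuals = []
    for i in range(len(xs)):
        w = ridge_weight(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        residuals.append(ys[i] - w * xs[i])
    return residuals


def loo_residuals_fast(xs, ys, lam):
    """Fit once, then correct each residual by its leverage (requires lam > 0)."""
    s_xx = sum(x * x for x in xs)
    w = ridge_weight(xs, ys, lam)
    residuals = []
    for x, y in zip(xs, ys):
        leverage = x * x / (s_xx + lam)  # diagonal entry of the hat matrix
        residuals.append((y - w * x) / (1.0 - leverage))
    return residuals


# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
print(loo_residuals_naive(xs, ys, 0.5))
print(loo_residuals_fast(xs, ys, 0.5))
```

Both functions return the same residuals up to rounding, but the second needs only one model fit instead of one fit per example.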
Abstract:
Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, accessible only in the form of natural human language. This thesis describes research done in the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. The work described here involves more specifically the goal of extracting information from statements concerning relations of biomedical entities, such as protein-protein interactions. The approach taken uses full parsing (syntactic analysis of the entire structure of sentences) and machine learning, aiming to develop reliable methods that can further be generalized to apply also to other domains. The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance and introduced a domain-adapted parser that reduced the error rate of the original by 10% while also roughly halving parsing time. 
To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unification of diverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus contributing also to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization. To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships. 
Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.
Abstract:
In this thesis we study the field of opinion mining by giving a comprehensive review of the available research that has been done in this topic. Also using this available knowledge we present a case study of a multilevel opinion mining system for a student organization's sales management system. We describe the field of opinion mining by discussing its historical roots, its motivations and applications as well as the different scientific approaches that have been used to solve this challenging problem of mining opinions. To deal with this huge subfield of natural language processing, we first give an abstraction of the problem of opinion mining and describe the theoretical frameworks that are available for dealing with appraisal language. Then we discuss the relation between opinion mining and computational linguistics which is a crucial pre-processing step for the accuracy of the subsequent steps of opinion mining. The second part of our thesis deals with the semantics of opinions where we describe the different ways used to collect lists of opinion words as well as the methods and techniques available for extracting knowledge from opinions present in unstructured textual data. In the part about collecting lists of opinion words we describe manual, semi manual and automatic ways to do so and give a review of the available lists that are used as gold standards in opinion mining research. For the methods and techniques of opinion mining we divide the task into three levels that are the document, sentence and feature level. The techniques that are presented in the document and sentence level are divided into supervised and unsupervised approaches that are used to determine the subjectivity and polarity of texts and sentences at these levels of analysis. At the feature level we give a description of the techniques available for finding the opinion targets, the polarity of the opinions about these opinion targets and the opinion holders. 
Also at the feature level we discuss the various ways to summarize and visualize the results of this level of analysis. In the third part of our thesis we present a case study of a sales management system that uses free form text and that can benefit from an opinion mining system. Using the knowledge gathered in the review of this field we provide a theoretical multi level opinion mining system (MLOM) that can perform most of the tasks needed from an opinion mining system. Based on the previous research we give some hints that many of the laborious market research tasks that are done by the sales force, which uses this sales management system, can improve their insight about their partners and by that increase the quality of their sales services and their overall results.
Abstract:
Robots working in complex and changing environments need the ability to manipulate and grasp objects. This work examines earlier research and the current state of robotic grasping and of machine learning of robotic grasp points. State-of-the-art methods are reviewed, and Le's machine-learning-based classifier is implemented, because it offers the best success rate of the methods studied and can be adapted to suit the available robot. The implemented method uses features based on an intensity image and a depth image to classify potential grasp points. The results of this implementation are presented.
Abstract:
A new area of machine learning research called deep learning has moved machine learning closer to one of its original goals: artificial intelligence and a general learning algorithm. The key idea is to pretrain models in a completely unsupervised way, after which they can be fine-tuned for the task at hand using supervised learning. In this thesis, a general introduction to deep learning models and algorithms is given and these methods are applied to facial keypoint detection. The task is to predict the positions of 15 keypoints on grayscale face images. Each predicted keypoint is specified by a real-valued (x,y) pair in the space of pixel indices. In the experiments, we pretrained deep belief networks (DBN) and then performed discriminative fine-tuning. We varied the depth and size of the architecture. We tested both deterministic and sampled hidden activations, as well as the effect of additional unlabeled data on pretraining. The experimental results show that our model provides better results than publicly available benchmarks for the dataset.
Abstract:
The growing population in cities increases the energy demand and affects the environment by increasing carbon emissions. Information and communications technology solutions which enable energy optimization are needed to address this growing energy demand in cities and to reduce carbon emissions. District heating systems optimize the energy production by reusing waste energy with combined heat and power plants. Forecasting the heat load demand in residential buildings assists in optimizing energy production and consumption in a district heating system. However, the presence of a large number of factors such as weather forecast, district heating operational parameters and user behavioural parameters, make heat load forecasting a challenging task. This thesis proposes a probabilistic machine learning model using a Naive Bayes classifier, to forecast the hourly heat load demand for three residential buildings in the city of Skellefteå, Sweden over a period of winter and spring seasons. The district heating data collected from the sensors equipped at the residential buildings in Skellefteå, is utilized to build the Bayesian network to forecast the heat load demand for horizons of 1, 2, 3, 6 and 24 hours. The proposed model is validated by using four cases to study the influence of various parameters on the heat load forecast by carrying out trace driven analysis in Weka and GeNIe. Results show that current heat load consumption and outdoor temperature forecast are the two parameters with most influence on the heat load forecast. The proposed model achieves average accuracies of 81.23 % and 76.74 % for a forecast horizon of 1 hour in the three buildings for winter and spring seasons respectively. The model also achieves an average accuracy of 77.97 % for three buildings across both seasons for the forecast horizon of 1 hour by utilizing only 10 % of the training data. 
The results indicate that even a simple model like Naive Bayes classifier can forecast the heat load demand by utilizing less training data.
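A minimal sketch of how a Naive Bayes classifier can forecast a discretized heat load from the two most influential parameters named in the abstract (current heat load and outdoor temperature forecast). The thesis used Weka and GeNIe; the pure-Python code below, the level names and the tiny data set are all illustrative assumptions, not the thesis's model or data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    feature_values = defaultdict(set)      # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            feature_counts[(j, label)][value] += 1
            feature_values[j].add(value)
    return class_counts, feature_counts, feature_values


def predict(model, row):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for j, value in enumerate(row):
            numerator = feature_counts[(j, label)][value] + 1
            denominator = count + len(feature_values[j])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training rows: (current load level, outdoor temperature forecast).
rows = [("high", "cold"), ("high", "cold"), ("high", "mild"),
        ("low", "mild"), ("low", "mild"), ("low", "cold")]
labels = ["high", "high", "high", "low", "low", "low"]  # next-hour load level
model = train_naive_bayes(rows, labels)
print(predict(model, ("high", "cold")))
```

Discretizing the continuous load into levels is what turns the forecasting problem into classification, which is why accuracy is a natural metric here.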
Abstract:
Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is the deciphering of genetic variants that are predictive of complex diseases. Modern studies, in the form of genome-wide association studies (GWAS), have afforded researchers the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining this with methods other than univariate statistics is a challenging task requiring advanced algorithms that are scalable to the genome-wide level. In the future, next-generation sequencing studies (NGS) will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to be able to effectively create predictive models for various genotype-phenotype relationships. This work explores the problem of selecting genetic variant subsets that are the most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were demonstrated not only to be effective at predicting the disease phenotypes, but also to do so efficiently through the use of computational shortcuts. While much of the work could be run on high-end desktops, some work was further extended so that it could be implemented on parallel computers, helping to ensure that the methods will also scale to the NGS data sets. Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm. 
It was shown that there is no universally optimal algorithm for variant selection in GWAS, but rather methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can result in overly-optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype–phenotype relationships and biological insights from genetic data sets.
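The nested cross-validation mentioned in the abstract can be sketched as follows. The inner loop picks a hyperparameter using only the outer training part, so the outer test fold never influences model selection. A one-dimensional k-nearest-neighbour classifier stands in for the thesis's feature selection methods; the model, function names and data are illustrative assumptions.

```python
from collections import Counter


def fold_indices(n, n_folds):
    """Split range(n) into interleaved folds."""
    return [list(range(i, n, n_folds)) for i in range(n_folds)]


def split(data, fold):
    """Return (train, test) with the given index fold held out."""
    held_out = set(fold)
    test = [data[i] for i in fold]
    train = [item for i, item in enumerate(data) if i not in held_out]
    return train, test


def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def cv_accuracy(data, k, n_folds):
    """Plain cross-validated accuracy for one hyperparameter value."""
    scores = [accuracy(*split(data, fold), k)
              for fold in fold_indices(len(data), n_folds)]
    return sum(scores) / len(scores)


def nested_cv(data, candidate_ks, outer_folds, inner_folds):
    """Outer loop estimates performance; inner loop selects k."""
    outer_scores = []
    for fold in fold_indices(len(data), outer_folds):
        outer_train, outer_test = split(data, fold)
        # Model selection sees only the outer training part.
        best_k = max(candidate_ks,
                     key=lambda k: cv_accuracy(outer_train, k, inner_folds))
        outer_scores.append(accuracy(outer_train, outer_test, best_k))
    return sum(outer_scores) / len(outer_scores)


# Made-up, cleanly separable data, for illustration only.
data = ([(float(x), "a") for x in range(6)]
        + [(float(x) + 10.0, "b") for x in range(6)])
print(nested_cv(data, candidate_ks=[1, 3], outer_folds=3, inner_folds=2))
```

Selecting the hyperparameter and reporting accuracy on the same folds would reuse the test data for model selection, which is the source of the overly optimistic estimates the abstract warns about.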