838 resultados para Modeling Rapport Using Machine Learning
Resumo:
Transcription factors (TFs) control the temporal and spatial expression of target genes by interacting with DNA in a sequence-specific manner. Recent advances in high throughput experiments that measure TF-DNA interactions in vitro and in vivo have facilitated the identification of DNA binding sites for thousands of TFs. However, it remains unclear how each individual TF achieves its specificity, especially in the case of paralogous TFs that recognize distinct target genomic sites despite sharing very similar DNA binding motifs. In my work, I used a combination of high throughput in vitro protein-DNA binding assays and machine-learning algorithms to characterize and model the binding specificity of 11 paralogous TFs from 4 distinct structural families. My work proves that even very closely related paralogous TFs, with indistinguishable DNA binding motifs, oftentimes exhibit differential binding specificity for their genomic target sites, especially for sites with moderate binding affinity. Importantly, the differences I identify in vitro and through computational modeling help explain, at least in part, the differential in vivo genomic targeting by paralogous TFs. Future work will focus on in vivo factors that might also be important for specificity differences between paralogous TFs, such as DNA methylation, interactions with protein cofactors, or the chromatin environment. In this larger context, my work emphasizes the importance of intrinsic DNA binding specificity in targeting of paralogous TFs to the genome.
Resumo:
Empirical studies of education programs and systems, by nature, rely upon use of student outcomes that are measurable. Often, these come in the form of test scores. However, in light of growing evidence about the long-run importance of other student skills and behaviors, the time has come for a broader approach to evaluating education. This dissertation undertakes experimental, quasi-experimental, and descriptive analyses to examine social, behavioral, and health-related mechanisms of the educational process. My overarching research question is simply, which inside- and outside-the-classroom features of schools and educational interventions are most beneficial to students in the long term? Furthermore, how can we apply this evidence toward informing policy that could effectively reduce stark social, educational, and economic inequalities?
The first study of three assesses mechanisms by which the Fast Track project, a randomized intervention in the early 1990s for high-risk children in four communities (Durham, NC; Nashville, TN; rural PA; and Seattle, WA), reduced delinquency, arrests, and health and mental health service utilization in adolescence through young adulthood (ages 12-20). A decomposition of treatment effects indicates that about a third of Fast Track’s impact on later crime outcomes can be accounted for by improvements in social and self-regulation skills during childhood (ages 6-11), such as prosocial behavior, emotion regulation and problem solving. These skills proved less valuable for the prevention of mental and physical health problems.
The second study contributes new evidence on how non-instructional investments – such as increased spending on school social workers, guidance counselors, and health services – affect multiple aspects of student performance and well-being. Merging several administrative data sources spanning the 1996-2013 school years in North Carolina, I use an instrumental variables approach to estimate the extent to which local expenditure shifts affect students’ academic and behavioral outcomes. My findings indicate that exogenous increases in spending on non-instructional services not only reduce student absenteeism and disciplinary problems (important predictors of long-term outcomes) but also significantly raise student achievement, in similar magnitude to corresponding increases in instructional spending. Furthermore, subgroup analyses suggest that investments in student support personnel such as social workers, health services, and guidance counselors, in schools with concentrated low-income student populations could go a long way toward closing socioeconomic achievement gaps.
The third study examines individual pathways that lead to high school graduation or dropout. It employs a variety of machine learning techniques, including decision trees, random forests with bagging and boosting, and support vector machines, to predict student dropout using longitudinal administrative data from North Carolina. I consider a large set of predictor measures from grades three through eight including academic achievement, behavioral indicators, and background characteristics. My findings indicate that the most important predictors include eighth grade absences, math scores, and age-for-grade as well as early reading scores. Support vector classification (with a high cost parameter and low gamma parameter) predicts high school dropout with the highest overall validity in the testing dataset at 90.1 percent followed by decision trees with boosting and interaction terms at 89.5 percent.
Resumo:
Brain injury due to lack of oxygen or impaired blood flow around the time of birth, may cause long term neurological dysfunction or death in severe cases. The treatments need to be initiated as soon as possible and tailored according to the nature of the injury to achieve best outcomes. The Electroencephalogram (EEG) currently provides the best insight into neurological activities. However, its interpretation presents formidable challenge for the neurophsiologists. Moreover, such expertise is not widely available particularly around the clock in a typical busy Neonatal Intensive Care Unit (NICU). Therefore, an automated computerized system for detecting and grading the severity of brain injuries could be of great help for medical staff to diagnose and then initiate on-time treatments. In this study, automated systems for detection of neonatal seizures and grading the severity of Hypoxic-Ischemic Encephalopathy (HIE) using EEG and Heart Rate (HR) signals are presented. It is well known that there is a lot of contextual and temporal information present in the EEG and HR signals if examined at longer time scale. The systems developed in the past, exploited this information either at very early stage of the system without any intelligent block or at very later stage where presence of such information is much reduced. This work has particularly focused on the development of a system that can incorporate the contextual information at the middle (classifier) level. This is achieved by using dynamic classifiers that are able to process the sequences of feature vectors rather than only one feature vector at a time.
Resumo:
Research endeavors on spoken dialogue systems in the 1990s and 2000s have led to the deployment of commercial spoken dialogue systems (SDS) in microdomains such as customer service automation, reservation/booking and question answering systems. Recent research in SDS has been focused on the development of applications in different domains (e.g. virtual counseling, personal coaches, social companions) which requires more sophistication than the previous generation of commercial SDS. The focus of this research project is the delivery of behavior change interventions based on the brief intervention counseling style via spoken dialogue systems. Brief interventions (BI) are evidence-based, short, well structured, one-on-one counseling sessions. Many challenges are involved in delivering BIs to people in need, such as finding the time to administer them in busy doctors' offices, obtaining the extra training that helps staff become comfortable providing these interventions, and managing the cost of delivering the interventions. Fortunately, recent developments in spoken dialogue systems make the development of systems that can deliver brief interventions possible. The overall objective of this research is to develop a data-driven, adaptable dialogue system for brief interventions for problematic drinking behavior, based on reinforcement learning methods. The implications of this research project includes, but are not limited to, assessing the feasibility of delivering structured brief health interventions with a data-driven spoken dialogue system. Furthermore, while the experimental system focuses on harmful alcohol drinking as a target behavior in this project, the produced knowledge and experience may also lead to implementation of similarly structured health interventions and assessments other than the alcohol domain (e.g. obesity, drug use, lack of exercise), using statistical machine learning approaches. In addition to designing a dialog system, the semantic and emotional meanings of user utterances have high impact on interaction. To perform domain specific reasoning and recognize concepts in user utterances, a named-entity recognizer and an ontology are designed and evaluated. To understand affective information conveyed through text, lexicons and sentiment analysis module are developed and tested.
Resumo:
In any environment, group dynamics would exist. How we deal with it in a competitive work environment defines who we are using transformative learning. This paper provides useful information from a number of theorists who share perspectives on the complex nature of groups.
Resumo:
The application of custom classification techniques and posterior probability modeling (PPM) using Worldview-2 multispectral imagery to archaeological field survey is presented in this paper. Research is focused on the identification of Neolithic felsite stone tool workshops in the North Mavine region of the Shetland Islands in Northern Scotland. Sample data from known workshops surveyed using differential GPS are used alongside known non-sites to train a linear discriminant analysis (LDA) classifier based on a combination of datasets including Worldview-2 bands, band difference ratios (BDR) and topographical derivatives. Principal components analysis is further used to test and reduce dimensionality caused by redundant datasets. Probability models were generated by LDA using principal components and tested with sites identified through geological field survey. Testing shows the prospective ability of this technique and significance between 0.05 and 0.01, and gain statistics between 0.90 and 0.94, higher than those obtained using maximum likelihood and random forest classifiers. Results suggest that this approach is best suited to relatively homogenous site types, and performs better with correlated data sources. Finally, by combining posterior probability models and least-cost analysis, a survey least-cost efficacy model is generated showing the utility of such approaches to archaeological field survey.
Resumo:
Algorithms for concept drift handling are important for various applications including video analysis and smart grids. In this paper we present decision tree ensemble classication method based on the Random Forest algorithm for concept drift. The weighted majority voting ensemble aggregation rule is employed based on the ideas of Accuracy Weighted Ensemble (AWE) method. Base learner weight in our case is computed for each sample evaluation using base learners accuracy and intrinsic proximity measure of Random Forest. Our algorithm exploits both temporal weighting of samples and ensemble pruning as a forgetting strategy. We present results of empirical comparison of our method with îriginal random forest with incorporated replace-the-looser forgetting andother state-of-the-art concept-drift classiers like AWE2.
Resumo:
Malware detection is a growing problem particularly on the Android mobile platform due to its increasing popularity and accessibility to numerous third party app markets. This has also been made worse by the increasingly sophisticated detection avoidance techniques employed by emerging malware families. This calls for more effective techniques for detection and classification of Android malware. Hence, in this paper we present an n-opcode analysis based approach that utilizes machine learning to classify and categorize Android malware. This approach enables automated feature discovery that eliminates the need for applying expert or domain knowledge to define the needed features. Our experiments on 2520 samples that were performed using up to 10-gram opcode features showed that an f-measure of 98% is achievable using this approach.
Resumo:
There has been an increasing interest in the development of new methods using Pareto optimality to deal with multi-objective criteria (for example, accuracy and time complexity). Once one has developed an approach to a problem of interest, the problem is then how to compare it with the state of art. In machine learning, algorithms are typically evaluated by comparing their performance on different data sets by means of statistical tests. Standard tests used for this purpose are able to consider jointly neither performance measures nor multiple competitors at once. The aim of this paper is to resolve these issues by developing statistical procedures that are able to account for multiple competing measures at the same time and to compare multiple algorithms altogether. In particular, we develop two tests: a frequentist procedure based on the generalized likelihood-ratio test and a Bayesian procedure based on a multinomial-Dirichlet conjugate model. We further extend them by discovering conditional independences among measures to reduce the number of parameters of such models, as usually the number of studied cases is very reduced in such comparisons. Data from a comparison among general purpose classifiers is used to show a practical application of our tests.
Resumo:
In diesem Beitrag wird eine neue Methode zur Analyse des manuellen Kommissionierprozesses vorgestellt, mit der u. a. die Kommissionierzeitanteile automatisch erfasst werden können. Diese Methode basiert auf einer sensorgestützten Bewegungsklassifikation, wie sie bspw. im Sport oder in der Medizin Anwendung findet. Dabei werden mobile Sensoren genutzt, die fortlaufend Messwerte wie z. B. die Beschleunigung oder die Drehgeschwindigkeit des Kommissionierers aufzeichnen. Auf Basis dieser Daten können Informationen über die ausgeführten Bewegungen und insbesondere über die durchlaufenen Bewegungszustände gewonnen werden. Dieser Ansatz wird im vorliegenden Beitrag auf die Kommissionierung übertragen. Dazu werden zunächst Klassen relevanter Bewegungen identifiziert und anschließend mit Verfahren aus dem maschinellen Lernen verarbeitet. Die Klassifikation erfolgt nach dem Prinzip des überwachten Lernens. Dabei werden durchschnittliche Erkennungsraten von bis zu 78,94 Prozent erzielt.
Resumo:
Abstract Heading into the 2020s, Physics and Astronomy are undergoing experimental revolutions that will reshape our picture of the fabric of the Universe. The Large Hadron Collider (LHC), the largest particle physics project in the world, produces 30 petabytes of data annually that need to be sifted through, analysed, and modelled. In astrophysics, the Large Synoptic Survey Telescope (LSST) will be taking a high-resolution image of the full sky every 3 days, leading to data rates of 30 terabytes per night over ten years. These experiments endeavour to answer the question why 96% of the content of the universe currently elude our physical understanding. Both the LHC and LSST share the 5-dimensional nature of their data, with position, energy and time being the fundamental axes. This talk will present an overview of the experiments and data that is gathered, and outlines the challenges in extracting information. Common strategies employed are very similar to industrial data! Science problems (e.g., data filtering, machine learning, statistical interpretation) and provide a seed for exchange of knowledge between academia and industry. Speaker Biography Professor Mark Sullivan Mark Sullivan is a Professor of Astrophysics in the Department of Physics and Astronomy. Mark completed his PhD at Cambridge, and following postdoctoral study in Durham, Toronto and Oxford, now leads a research group at Southampton studying dark energy using exploding stars called "type Ia supernovae". Mark has many years' experience of research that involves repeatedly imaging the night sky to track the arrival of transient objects, involving significant challenges in data handling, processing, classification and analysis.
Resumo:
Thesis (Master's)--University of Washington, 2016-08
Resumo:
In computer vision, training a model that performs classification effectively is highly dependent on the extracted features, and the number of training instances. Conventionally, feature detection and extraction are performed by a domain-expert who, in many cases, is expensive to employ and hard to find. Therefore, image descriptors have emerged to automate these tasks. However, designing an image descriptor still requires domain-expert intervention. Moreover, the majority of machine learning algorithms require a large number of training examples to perform well. However, labelled data is not always available or easy to acquire, and dealing with a large dataset can dramatically slow down the training process. In this paper, we propose a novel Genetic Programming based method that automatically synthesises a descriptor using only two training instances per class. The proposed method combines arithmetic operators to evolve a model that takes an image and generates a feature vector. The performance of the proposed method is assessed using six datasets for texture classification with different degrees of rotation, and is compared with seven domain-expert designed descriptors. The results show that the proposed method is robust to rotation, and has significantly outperformed, or achieved a comparable performance to, the baseline methods.
Resumo:
Research in ubiquitous and pervasive technologies have made it possible to recognise activities of daily living through non-intrusive sensors. The data captured from these sensors are required to be classified using various machine learning or knowledge driven techniques to infer and recognise activities. The process of discovering the activities and activity-object patterns from the sensors tagged to objects as they are used is critical to recognising the activities. In this paper, we propose a topic model process of discovering activities and activity-object patterns from the interactions of low level state-change sensors. We also develop a recognition and segmentation algorithm to recognise activities and recognise activity boundaries. Experimental results we present validates our framework and shows it is comparable to existing approaches.
Resumo:
Current Ambient Intelligence and Intelligent Environment research focuses on the interpretation of a subject’s behaviour at the activity level by logging the Activity of Daily Living (ADL) such as eating, cooking, etc. In general, the sensors employed (e.g. PIR sensors, contact sensors) provide low resolution information. Meanwhile, the expansion of ubiquitous computing allows researchers to gather additional information from different types of sensor which is possible to improve activity analysis. Based on the previous research about sitting posture detection, this research attempts to further analyses human sitting activity. The aim of this research is to use non-intrusive low cost pressure sensor embedded chair system to recognize a subject’s activity by using their detected postures. There are three steps for this research, the first step is to find a hardware solution for low cost sitting posture detection, second step is to find a suitable strategy of sitting posture detection and the last step is to correlate the time-ordered sitting posture sequences with sitting activity. The author initiated a prototype type of sensing system called IntelliChair for sitting posture detection. Two experiments are proceeded in order to determine the hardware architecture of IntelliChair system. The prototype looks at the sensor selection and integration of various sensor and indicates the best for a low cost, non-intrusive system. Subsequently, this research implements signal process theory to explore the frequency feature of sitting posture, for the purpose of determining a suitable sampling rate for IntelliChair system. For second and third step, ten subjects are recruited for the sitting posture data and sitting activity data collection. The former dataset is collected byasking subjects to perform certain pre-defined sitting postures on IntelliChair and it is used for posture recognition experiment. The latter dataset is collected by asking the subjects to perform their normal sitting activity routine on IntelliChair for four hours, and the dataset is used for activity modelling and recognition experiment. For the posture recognition experiment, two Support Vector Machine (SVM) based classifiers are trained (one for spine postures and the other one for leg postures), and their performance evaluated. Hidden Markov Model is utilized for sitting activity modelling and recognition in order to establish the selected sitting activities from sitting posture sequences.2. After experimenting with possible sensors, Force Sensing Resistor (FSR) is selected as the pressure sensing unit for IntelliChair. Eight FSRs are mounted on the seat and back of a chair to gather haptic (i.e., touch-based) posture information. Furthermore, the research explores the possibility of using alternative non-intrusive sensing technology (i.e. vision based Kinect Sensor from Microsoft) and find out the Kinect sensor is not reliable for sitting posture detection due to the joint drifting problem. A suitable sampling rate for IntelliChair is determined according to the experiment result which is 6 Hz. The posture classification performance shows that the SVM based classifier is robust to “familiar” subject data (accuracy is 99.8% with spine postures and 99.9% with leg postures). When dealing with “unfamiliar” subject data, the accuracy is 80.7% for spine posture classification and 42.3% for leg posture classification. The result of activity recognition achieves 41.27% accuracy among four selected activities (i.e. relax, play game, working with PC and watching video). The result of this thesis shows that different individual body characteristics and sitting habits influence both sitting posture and sitting activity recognition. In this case, it suggests that IntelliChair is suitable for individual usage but a training stage is required.