838 results for Modeling Rapport Using Machine Learning


Relevance:

100.00% 100.00%

Publisher:

Abstract:

A variety of physical and biomedical imaging techniques, such as digital holography, interferometric synthetic aperture radar (InSAR), or magnetic resonance imaging (MRI), enable measurement of the phase of a physical quantity in addition to its amplitude. However, the phase can commonly only be measured modulo 2π, as a so-called wrapped phase map. Phase unwrapping is the process of obtaining the underlying physical phase map from the wrapped phase. Tile-based phase unwrapping algorithms operate by first tessellating the phase map, then unwrapping individual tiles, and finally merging them into a continuous phase map. They can be implemented computationally efficiently and are robust to noise. However, they are prone to failure in the presence of phase residues or erroneous unwraps of single tiles. We tried to overcome these shortcomings by creating novel tile unwrapping and merging algorithms as well as a framework that allows them to be combined in a modular fashion. To increase the robustness of the tile unwrapping step, we implemented a model-based algorithm that makes efficient use of linear algebra to unwrap individual tiles. Furthermore, we adapted an established pixel-based unwrapping algorithm to create a quality-guided tile merger. These original algorithms, as well as previously existing ones, were implemented in a modular phase unwrapping C++ framework. By examining different combinations of unwrapping and merging algorithms, we compared our method to existing approaches. We show that the appropriate choice of unwrapping and merging algorithms can significantly improve the unwrapped result in the presence of phase residues and noise. Beyond that, our modular framework allows for efficient design and testing of new tile-based phase unwrapping algorithms. The software developed in this study is freely available.
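To make the wrapping problem concrete, the following is a minimal one-dimensional sketch in Python. It is not the tile-based method or the C++ framework described in the abstract; it only illustrates the classical idea of integrating wrapped phase differences, and the function names `wrap` and `unwrap_1d` as well as the test ramp are assumptions made for illustration.

```python
import math

def wrap(phase):
    """Map a phase value into the interval [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi

def unwrap_1d(wrapped):
    """Unwrap a 1D phase signal by summing wrapped neighbour differences.

    This only works if the true phase changes by less than pi between
    neighbouring samples (no residues, little noise).
    """
    if not wrapped:
        return []
    unwrapped = [wrapped[0]]
    for prev, curr in zip(wrapped, wrapped[1:]):
        unwrapped.append(unwrapped[-1] + wrap(curr - prev))
    return unwrapped

# Hypothetical test signal: a linear phase ramp spanning several multiples of 2*pi.
true_phase = [0.4 * i for i in range(40)]
wrapped_phase = [wrap(p) for p in true_phase]
recovered = unwrap_1d(wrapped_phase)
```

When the step between neighbouring samples exceeds π, or noise creates residues, this simple integration fails, which is the situation the tile-based and quality-guided approaches in the abstract are meant to handle.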

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This dissertation investigates the connection between spectral analysis and frame theory. When considering the spectral properties of a frame, we present a few novel results relating to the spectral decomposition. We first show that scalable frames have the property that the inner product of the scaling coefficients and the eigenvectors must equal the inverse eigenvalues. From this, we prove a similar result when an approximate scaling is obtained. We then focus on the optimization problems inherent to scalable frames by first showing that there is an equivalence between scaling a frame and optimization problems with a non-restrictive objective function. Various objective functions are considered, and an analysis of the solution type is presented. For linear objectives, we can encourage sparse scalings, and with barrier objective functions, we force dense solutions. We further consider frames in high dimensions and derive various solution techniques. From here, we restrict ourselves to various frame classes to add more specificity to the results. Using frames generated from distributions allows for the placement of probabilistic bounds on scalability. For discrete distributions (Bernoulli and Rademacher), we bound the probability of encountering an ONB, and for continuous symmetric distributions (Uniform and Gaussian), we show that symmetry is retained in the transformed domain. We also prove several hyperplane-separation results. With the theory developed, we discuss graph applications of the scalability framework. We make a connection with graph conditioning and show the infeasibility of the problem in the general case. After a modification, we show that any complete graph can be conditioned. We then present a modification of standard PCA (robust PCA) developed by Candès, and give some background on Electron Energy-Loss Spectroscopy (EELS).
We design a novel scheme for the processing of EELS through robust PCA and least-squares regression, and test this scheme on biological samples. Finally, we take the idea of robust PCA and apply the technique of kernel PCA to perform robust manifold learning. We derive the problem and present an algorithm for its solution. We also discuss the differences from RPCA that make theoretical guarantees difficult.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Human relationships have long been studied by scientists from domains such as sociology, psychology, and literature to understand people's desires, goals, actions and expected behaviors. In this dissertation we study inter-personal relationships as expressed in natural language text. Modeling inter-personal relationships from text finds application in general natural language understanding, as well as in real-world domains such as social networks, discussion forums, and intelligent virtual agents. We propose that the study of relationships should incorporate not only linguistic cues in text, but also the contexts in which these cues appear. Our investigations, backed by empirical evaluation, support this thesis and demonstrate that the task benefits from using structured models that incorporate both types of information. We present such structured models to address the task of modeling the nature of relationships between any two given characters from a narrative. To begin with, we assume that relationships are of two types: cooperative and non-cooperative. We first describe an approach to jointly infer relationships between all characters in the narrative, and demonstrate how the task of characterizing the relationship between two characters can benefit from including information about their relationships with other characters in the narrative. We next formulate the relationship-modeling problem as a sequence prediction task to acknowledge the evolving nature of human relationships, and demonstrate the need to model the history of a relationship in predicting its evolution. Thereafter, we present a data-driven method to automatically discover various types of relationships, such as familial, romantic, and hostile. As before, we address the task of modeling evolving relationships, but do not restrict ourselves to two types of relationships. We also demonstrate the need to incorporate not only local historical context but also global context while solving this problem.
Lastly, we demonstrate a practical application of modeling inter-personal relationships in the domain of online educational discussion forums. Such forums offer opportunities for their users to interact and form deeper relationships. With this view, we address the task of identifying the initiation of such deeper relationships between a student and the instructor. Specifically, we analyze the contents of the forums to automatically suggest to instructors the threads that require their intervention. By highlighting scenarios that need direct instructor-student interactions, we alleviate the need for the instructor to manually peruse all threads of the forum and also assist students who have limited avenues for communicating with instructors. We do this by incorporating the discourse structure of the thread through latent variables that abstractly represent the contents of individual posts and model the flow of information in the thread. Such latent structured models, which incorporate the linguistic cues without losing their context, can be helpful in other related natural language understanding tasks as well. We demonstrate this by using the model for a very different task: identifying whether a stated desire has been fulfilled by the end of a story.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In the first part of this thesis we search for physics beyond the Standard Model through the search for anomalous production of the Higgs boson using the razor kinematic variables. We search for anomalous Higgs boson production using proton-proton collisions at a center-of-mass energy of √s = 8 TeV collected by the Compact Muon Solenoid experiment at the Large Hadron Collider, corresponding to an integrated luminosity of 19.8 fb⁻¹.

In the second part we present a novel method for using a quantum annealer to train a classifier to recognize events containing a Higgs boson decaying to two photons. We train that classifier using simulated proton-proton collisions at √s = 8 TeV producing either a Standard Model Higgs boson decaying to two photons or a non-resonant Standard Model process that produces a two-photon final state.

The production mechanisms of the Higgs boson are precisely predicted by the Standard Model based on its association with the mechanism of electroweak symmetry breaking. We measure the yield of Higgs bosons decaying to two photons in kinematic regions predicted to have very little contribution from a Standard Model Higgs boson and search for an excess of events, which would be evidence of either non-standard production or non-standard properties of the Higgs boson. We divide the events into disjoint categories based on kinematic properties and the presence of additional b-quarks produced in the collisions. In each of these disjoint categories, we use the razor kinematic variables to characterize events with topological configurations incompatible with the typical configurations found in Standard Model production of the Higgs boson.

We observe an excess of events with di-photon invariant mass compatible with the Higgs boson mass and localized in a small region of the razor plane. We observe 5 events with a predicted background of 0.54 ± 0.28, an observation with a p-value of 10⁻³ and a local significance of 3.35σ. This background prediction comes from 0.48 predicted non-resonant background events and 0.07 predicted SM Higgs boson events. We proceed to investigate the properties of this excess, finding that it provides a very compelling peak in the di-photon invariant mass distribution and is physically separated in the razor plane from the predicted background. Using another method of measuring the background and significance of the excess, we find a 2.5σ deviation from the Standard Model hypothesis over a broader range of the razor plane.

In the second part of the thesis we transform the problem of training a classifier to distinguish events with a Higgs boson decaying to two photons from events with other sources of photon pairs into the Hamiltonian of a spin system, the ground state of which is the best classifier. We then use a quantum annealer to find the ground state of this Hamiltonian and train the classifier. We find that we are able to do this successfully in less than 400 annealing runs for a problem of median difficulty at the largest problem size considered. The networks trained in this manner exhibit good classification performance, competitive with the more complicated machine learning techniques, and are highly resistant to overtraining. We also find that the nature of the training gives access to additional solutions that can be used to improve the classification performance by up to 1.2% in some regions.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We developed a serious game to teach users how to draw Lewis diagrams. We augmented it with an environment that can record a user's electroencephalographic signals, facial expressions, and pupil data. The aim of this work is to verify whether the environment can allow the game to adapt to the user in real time through automatic detection of the user's need for help, and whether the user is more satisfied with the experience when the game adapts. The results show that the adaptation system can detect the need for help using two machine learning models trained differently, one generalized and the other personalized, with respective performances of 53.4% and 67.5% against a chance level of 33.3%.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Ensemble stream modeling and data cleaning are sensor information processing systems that have different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events so as to eliminate uncorrelated noise and choose the most likely model without overfitting, thus obtaining higher model confidence. Higher quality streams can be realized by combining many short streams into an ensemble that has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for events such as a bush or natural forest fire, we take the burnt area (BA*), a sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing, for two reasons: first, the histogram of fire activity is highly skewed; second, the measured sensor parameters are highly correlated. Since using non-descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory, and conceptual knowledge is learned from sensor streams. The second step is feature induction, which cross-validates attributes with single or multi-target variables to minimize training error. We use the F-measure score, which combines precision and recall, to determine the false alarm rate of fire events. The multi-target data-cleaning trees use the information purity of the target leaf nodes to learn higher order features. A sensitive variance measure such as the F-test is applied at each node's split to select the best attribute. The ensemble stream model approach proved to perform better when using complicated features with a simpler tree classifier.
The ensemble framework for data cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensors led to the formation of streams for sensor-enabled applications. This further motivates the novelty of stream quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Modifications in vegetation cover can have an impact on the climate through changes in biogeochemical and biogeophysical processes. In this paper, the tree canopy cover percentage of a savannah-like ecosystem (montado/dehesa) was estimated at Landsat pixel level for 2011, and the role of different canopy cover percentages in land surface albedo (LSA) and land surface temperature (LST) was analysed. A modelling procedure using an SGB machine-learning algorithm, with Landsat 5-TM spectral bands and derived vegetation indices as explanatory variables, estimated montado canopy cover with good agreement (R² = 78.4%). Overall, montado canopy cover estimations showed that the low canopy cover class (MT_1) is the most representative, with 50.63% of the total montado area. MODIS LSA and LST products were used to investigate the magnitude of differences in mean annual LSA and LST values between contrasting montado canopy cover percentages. As a result, a statistically significant relationship was found between montado canopy cover percentage and mean annual surface albedo (R² = 0.866, p < 0.001) and surface temperature (R² = 0.942, p < 0.001). The comparisons between the four contrasting montado canopy cover classes showed marked differences in LSA (χ² = 192.17, df = 3, p < 0.001) and LST (χ² = 318.18, df = 3, p < 0.001). The highest montado canopy cover class (MT_4) generally had lower albedo than the lowest canopy cover class, presenting a difference of −11.2% in mean annual albedo values. It was also shown that MT_4 and MT_3 are the cooler canopy cover classes, and MT_2 and MT_1 the warmer, with the MT_1 class differing by 3.42 °C from the MT_4 class.
Overall, this research highlighted the role that potential changes in montado canopy cover may play in local land surface albedo and temperature variations, as an increase in these two biogeophysical parameters may potentially bring about, in the long term, local/regional climatic changes moving towards greater aridity.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a study in a field that is poorly explored for the Portuguese language: modality and its automatic tagging. Our main goal was to find a set of attributes for the creation of automatic taggers with improved performance over the bag-of-words (bow) approach. Performance was measured using precision, recall and F1. Because it is a relatively unexplored field, the study covers the creation of the corpus (composed of eleven verbs), the use of a parser to extract syntactic and semantic information from the sentences, and a machine learning approach to identify modality values. Based on three different sets of attributes (from the trigger itself, from the trigger's path in the parse tree, and from the context), the system creates a tagger for each verb, achieving (for almost every verb) an improvement in F1 when compared to the traditional bow approach.
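For readers unfamiliar with the evaluation metrics named above, here is a minimal Python sketch of precision, recall and F1 for a single target label. The label names and the toy gold/predicted sequences are hypothetical and do not come from the paper's corpus.

```python
def precision_recall_f1(gold, predicted, positive):
    """Compute precision, recall and F1 for one label treated as positive."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical modality labels for five occurrences of one verb.
gold = ["epistemic", "deontic", "epistemic", "deontic", "epistemic"]
predicted = ["epistemic", "epistemic", "epistemic", "deontic", "deontic"]
precision, recall, f1 = precision_recall_f1(gold, predicted, "epistemic")
```

F1 is the harmonic mean of precision and recall, so a tagger cannot score well by maximising only one of the two.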

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Acoustic Emission (AE) monitoring can be used to detect the presence of damage as well as determine its location in Structural Health Monitoring (SHM) applications. Information on the time difference with which the signal generated by the damage event arrives at different sensors is essential for localization. This makes the time of arrival (ToA) an important piece of information to retrieve from the AE signal. Generally, it is determined using statistical methods such as the Akaike Information Criterion (AIC), which is particularly prone to errors in the presence of noise. Given that the structures of interest are surrounded by harsh environments, a way to accurately estimate the arrival time in such noisy scenarios is of particular interest. In this work, two new methods based on Machine Learning are presented to estimate the arrival times of AE signals. Inspired by strong results in the field, both are Deep Learning models, a subset of machine learning. They are based on a Convolutional Neural Network (CNN) and a Capsule Neural Network (CapsNet). The primary advantage of such models is that they do not require the user to pre-define selected features; they require only raw data, from which the models establish non-linear relationships between the inputs and outputs. The performance of the models is evaluated using AE signals generated by a custom ray-tracing algorithm that propagates them on an aluminium plate, and is compared to AIC. The relative estimation error on the test set was < 5% for the models, compared to around 45% for AIC. Testing was continued by preparing an experimental setup and acquiring real AE signals. Similar performance was observed: the two models not only outperform AIC by more than an order of magnitude in their average errors, but were also shown to be far more robust than AIC, which fails in the presence of noise.
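The AIC baseline mentioned above can be sketched in a few lines of Python. The version below uses a variance-based form of the AIC picker that is common in onset detection; whether the authors used exactly this form is an assumption. The synthetic signal (low-level noise followed by a decaying sinusoid) is hypothetical and stands in for the ray-traced and experimental signals of the study.

```python
import math
import random

def variance(values):
    """Population variance of a non-empty list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def aic_onset(signal):
    """Return the split index k that minimises the variance-based AIC.

    AIC(k) = k * log(var(signal[:k])) + (n - k - 1) * log(var(signal[k:]))
    """
    n = len(signal)
    best_k, best_aic = None, math.inf
    for k in range(2, n - 2):
        var_before = variance(signal[:k])
        var_after = variance(signal[k:])
        if var_before <= 0 or var_after <= 0:
            continue  # the logarithm is undefined for a zero-variance segment
        aic = k * math.log(var_before) + (n - k - 1) * math.log(var_after)
        if aic < best_aic:
            best_k, best_aic = k, aic
    return best_k

# Hypothetical signal: 120 noise samples, then a decaying burst.
rng = random.Random(0)
onset = 120
noise = [rng.gauss(0.0, 0.01) for _ in range(onset)]
burst = [math.sin(0.3 * i) * math.exp(-0.01 * i) + rng.gauss(0.0, 0.01)
         for i in range(180)]
signal = noise + burst
estimated = aic_onset(signal)
```

With a high signal-to-noise ratio, as here, the minimum of the AIC curve sits close to the true onset; as the noise level approaches the burst amplitude the minimum becomes shallow and unreliable, which is the weakness the abstract attributes to AIC.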

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Although the debate over what data science is has a long history and has not yet reached a complete consensus, Data Science can be summarized as the process of learning from data. Guided by this vision, this thesis presents two independent data science projects developed in the scope of multidisciplinary applied research. The first part analyzes fluorescence microscopy images typically produced in life science experiments, where the objective is to count how many marked neuronal cells are present in each image. Aiming to automate the task to support research in the area, we propose a neural network architecture tuned specifically for this use case, cell ResUnet (c-ResUnet), and discuss the impact of alternative training strategies in overcoming particular challenges of our data. The approach provides good results in terms of both detection and counting, showing performance comparable to the interpretation of human operators. As a meaningful addition, we release the pre-trained model and the Fluorescent Neuronal Cells dataset, which collects pixel-level annotations of where neuronal cells are located. In this way, we hope to help future research in the area and foster innovative methodologies for tackling similar problems. The second part deals with the problem of distributed data management in the context of LHC experiments, with a focus on supporting ATLAS operations concerning data transfer failures. In particular, we analyze error messages produced by failed transfers and propose a Machine Learning pipeline that leverages the word2vec language model and K-means clustering. This provides groups of similar errors that are presented to human operators as suggestions of potential issues to investigate. The approach is demonstrated on one full day of data, showing a promising ability to understand the message content and provide meaningful groupings, in line with incidents previously reported by human operators.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Deep learning methods are extremely promising machine learning tools for analyzing neuroimaging data. However, their potential use in clinical settings is limited by the existing challenges of applying these methods to neuroimaging data. In this study, first, a type of data leakage caused by a slice-level data split, introduced during training and validation of a 2D CNN, is surveyed, and a quantitative assessment of the resulting overestimation of the model's performance is presented. Second, interpretable, leakage-free deep learning software written in Python, with a wide range of options, has been developed to conduct both classification and regression analysis. The software was applied to the study of mild cognitive impairment (MCI) in patients with small vessel disease (SVD) using multi-parametric MRI data, where the cognitive performance of 58 patients, measured by five neuropsychological tests, is predicted using a multi-input CNN model that takes brain images and demographic data. Each of the cognitive test scores was predicted using different MRI-derived features. As MCI due to SVD has been hypothesized to be the effect of white matter damage, the DTI-derived features MD and FA produced the best prediction of the TMT-A score, which is consistent with the existing literature. In a second study, an interpretable deep learning system is developed, aimed at 1) classifying Alzheimer's disease patients and healthy subjects, 2) examining the neural correlates of the disease that causes cognitive decline in AD patients using CNN visualization tools, and 3) highlighting the potential of interpretability techniques to capture a biased deep learning model. Structural magnetic resonance imaging (MRI) data of 200 subjects were used by the proposed CNN model, which was trained using a transfer learning-based approach and produced a balanced accuracy of 71.6%. Brain regions in the frontal and parietal lobes showing cerebral cortex atrophy were highlighted by the visualization tools.
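The slice-level leakage described above can be illustrated with a short Python sketch that contrasts a split over individual 2D slices with a split over subjects. This is not the software from the study; the function names, the 20 subjects and the 30 slices per subject are assumptions chosen for illustration.

```python
import random

def slice_level_split(slices, test_fraction, seed=0):
    """Shuffle individual slices; slices of one subject can land on both sides."""
    shuffled = slices[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def subject_level_split(slices, test_fraction, seed=0):
    """Shuffle subjects first, so each subject is entirely in train or in test."""
    subjects = sorted({subject for subject, _ in slices})
    random.Random(seed).shuffle(subjects)
    n_test = int(len(subjects) * test_fraction)
    test_subjects = set(subjects[:n_test])
    train = [s for s in slices if s[0] not in test_subjects]
    test = [s for s in slices if s[0] in test_subjects]
    return train, test

def shared_subjects(train, test):
    """Subjects that contribute slices to both partitions (the leak)."""
    return {s for s, _ in train} & {s for s, _ in test}

# Hypothetical dataset: 20 subjects with 30 slices each, as (subject, slice) pairs.
slices = [(subject, index) for subject in range(20) for index in range(30)]
leaky_train, leaky_test = slice_level_split(slices, 0.2)
clean_train, clean_test = subject_level_split(slices, 0.2)
print(len(shared_subjects(leaky_train, leaky_test)), "subjects leak with a slice-level split")
print(len(shared_subjects(clean_train, clean_test)), "subjects leak with a subject-level split")
```

Because neighbouring slices of the same brain are highly similar, any subject that appears on both sides lets the model be evaluated on near-copies of its training data, which inflates the reported performance.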

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The discovery of new materials and their functions has always been a fundamental component of technological progress. Nowadays, the quest for new materials is stronger than ever: sustainability, medicine, robotics and electronics are all key areas that depend on the ability to create specifically tailored materials. However, designing materials with desired properties is a difficult task, and the complexity of the discipline makes it difficult to identify general criteria. While scientists have developed a set of best practices (often based on experience and expertise), this is still a trial-and-error process. It becomes even more complex when dealing with advanced functional materials. Their properties depend on structural and morphological features, which in turn depend on fabrication procedures and environment, and subtle alterations lead to dramatically different results. Because of this, materials modeling and design is one of the most prolific research fields. Many techniques and instruments are continuously developed to enable new possibilities, in both the experimental and computational realms. Scientists strive to employ cutting-edge technologies in order to make progress. However, the field is strongly affected by unorganized file management and a proliferation of custom data formats and storage procedures, in both experimental and computational research. Results are difficult to find, interpret and re-use, and a huge amount of time is spent interpreting and re-organizing data. This also strongly limits the application of data-driven and machine learning techniques. This work introduces possible solutions to the problems described above.
Specifically, it covers developing features for specific classes of advanced materials and using them to train machine learning models and accelerate computational predictions for molecular compounds; developing a method for organizing non-homogeneous materials data; automating the process of using device simulations to train machine learning models; and dealing with scattered experimental data and using them to discover new patterns.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Hematological cancers are a heterogeneous family of diseases that can be divided into leukemias, lymphomas, and myelomas, often called “liquid tumors”. Since they cannot be surgically removed, chemotherapy represents the mainstay of their treatment. However, it still faces several challenges, such as drug resistance and low response rates, and the need for new anticancer agents is compelling. The drug discovery process is lengthy, costly, and prone to high failure rates. With the rapid expansion of biological and chemical "big data", computational techniques such as machine learning tools have been increasingly employed to speed up the whole process and reduce its cost. Machine learning algorithms can create complex models that aim to determine the biological activity of compounds against several targets, based on their chemical properties. These models are defined as multi-target Quantitative Structure-Activity Relationship (mt-QSAR) models and can be used to virtually screen small and large chemical libraries for the identification of new molecules with anticancer activity. The aim of my Ph.D. project was to employ machine learning techniques to build an mt-QSAR classification model for the prediction of cytotoxic drugs simultaneously active against 43 hematological cancer cell lines. For this purpose, I first constructed a large and diversified dataset of molecules extracted from the ChEMBL database. Then, I compared the performance of different ML classification algorithms and identified Random Forest as the one returning the best predictions. Finally, I used different approaches to maximize the performance of the model, which achieved an accuracy of 88% by correctly classifying 93% of inactive molecules and 72% of active molecules in a validation set. This model was further applied to the virtual screening of a small dataset of molecules tested in our laboratory, where it showed 100% accuracy in correctly classifying all molecules.
This result is confirmed by our previous in vitro experiments.
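The three validation figures quoted above (88% accuracy, 93% of inactives and 72% of actives correctly classified) are linked by simple arithmetic, sketched below in Python. The derivation assumes that overall accuracy is the class-weighted average of the two per-class rates and that the reported percentages are exact, which they are not (they are rounded), so the implied class balance is only approximate. The function names are hypothetical.

```python
def implied_active_fraction(accuracy, sensitivity, specificity):
    """Solve accuracy = a * sensitivity + (1 - a) * specificity for a."""
    return (specificity - accuracy) / (specificity - sensitivity)

def balanced_accuracy(sensitivity, specificity):
    """Mean of the per-class recalls, insensitive to class imbalance."""
    return (sensitivity + specificity) / 2

# Figures from the abstract: 72% of actives and 93% of inactives correct, 88% overall.
active_fraction = implied_active_fraction(0.88, 0.72, 0.93)
bal_acc = balanced_accuracy(0.72, 0.93)
print(f"implied share of active molecules: {active_fraction:.1%}")
print(f"balanced accuracy: {bal_acc:.1%}")
```

Under these assumptions roughly a quarter of the validation molecules would be active, and the balanced accuracy is lower than the headline accuracy because the majority (inactive) class is the one classified best.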

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The purpose of this study was to assess the benefits of using e-learning resources in a dental training course on Atraumatic Restorative Treatment (ART). This e-course was delivered in DVD format and presented the ART technique and philosophy. The participants were twenty-four dentists from the Brazilian public health system. Prior to receiving the DVD, the dentists answered a questionnaire regarding their personal data, previous knowledge about ART, and general interest in training courses. The dentists also participated in an assessment process consisting of a test applied before and after the course. A single researcher corrected the tests, and intraexaminer reproducibility was calculated (kappa=0.89). Paired t-tests were carried out to compare the means between the assessments, showing a significant improvement in the performance of the subjects on the test taken after the course (p<0.05). A linear regression model was used with the difference between the means as the outcome. A greater improvement in the test results was observed among female dentists (p=0.034), dentists who had worked for a shorter period of time in the public health system (p=0.042), and dentists who used the ART technique only for urgent and/or temporary treatment (p=0.010). In conclusion, e-learning has the potential to improve the knowledge that dentists working in the public health system have about ART, especially those with less clinical experience and less knowledge about the subject.