885 results for Q-learning algorithm


Relevance: 90.00%

Abstract:

This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ is less than one. We explore the statistical and computational challenges inherent in these high dimensional low sample size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent these difficulties. Regularization and kernel algorithms were explored in this research using seven datasets where κ < 1. These techniques require careful tuning, so several extensions of cross-validation were investigated to support better predictive performance. While no single algorithm was universally the best predictor, the regularization technique produced lower test errors in five of the seven datasets studied.

Relevance: 90.00%

Abstract:

Purpose: To investigate the effect of incorporating a beam spreading parameter in a beam angle optimization algorithm and to evaluate its efficacy for creating coplanar IMRT lung plans in conjunction with machine learning generated dose objectives.

Methods: Fifteen anonymized patient cases were each re-planned with ten values over the range of the beam spreading parameter, k, and analyzed with a Wilcoxon signed-rank test to determine whether any particular value resulted in significant improvement over the initially treated plan created by a trained dosimetrist. Dose constraints were generated by a machine learning algorithm and kept constant for each case across all k values. Parameters investigated for potential improvement included mean lung dose, V20 lung, V40 heart, 80% conformity index, and 90% conformity index.

Results: At a significance level of 5%, treatment plans created with this method resulted in significantly better conformity indices. Dose coverage to the PTV was improved by an average of 12% over the initial plans. At the same time, these treatment plans showed no significant difference in mean lung dose, V20 lung, or V40 heart when compared to the initial plans; however, it should be noted that these results could be influenced by the small sample size of patient cases.

Conclusions: The beam angle optimization algorithm, with the inclusion of the beam spreading parameter k, increases the dose conformity of the automatically generated treatment plans over that of the initial plans without adversely affecting the dose to organs at risk. This parameter can be varied according to physician preference in order to control the tradeoff between dose conformity and OAR sparing without compromising the integrity of the plan.

Relevance: 90.00%

Abstract:

This thesis addresses Batch Reinforcement Learning methods in Robotics. This sub-class of Reinforcement Learning has shown promising results and has been the focus of recent research. Three contributions are proposed that aim to extend the state-of-the-art methods, allowing for a faster and more stable learning process, as required for learning in Robotics. The Q-learning update rule is widely applied, since it allows learning without a model of the environment. However, this update rule is transition-based and does not take advantage of the underlying episodic structure of the collected batch of interactions. The Q-Batch update rule is proposed in this thesis to process experiences along the trajectories collected in the interaction phase. This allows a faster propagation of obtained rewards and penalties, resulting in faster and more robust learning. Non-parametric function approximators, such as Gaussian Processes, are explored. This type of approximator makes it possible to encode prior knowledge about the latent function in the form of kernels, providing a higher level of flexibility and accuracy. The application of Gaussian Processes in Batch Reinforcement Learning yielded higher performance in learning tasks than other function approximators used in the literature. Lastly, in order to extract more information from the experiences collected by the agent, model-learning techniques are incorporated to learn the system dynamics. In this way, it is possible to augment the set of collected experiences with experiences generated through planning using the learned models. Experiments were carried out mainly in simulation, with some tests carried out on a physical robotic platform. The obtained results show that the proposed approaches are able to outperform the classical Fitted Q Iteration.
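To make the point about reward propagation concrete, here is a minimal sketch of the standard transition-based tabular Q-learning update, applied to one stored episode in forward and in reverse order. It is not the thesis's Q-Batch rule, whose definition is not given in the abstract; the chain environment, the function names, and the values of `alpha` and `gamma` are all assumptions made for illustration.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9, terminal=False):
    """One transition-based Q-learning update on a tabular Q function."""
    # Bootstrapped value of the next state (zero if the episode ended).
    best_next = 0.0 if terminal else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sweep(Q, trajectory, actions, reverse=False, **kw):
    """Apply q_update once to every transition of a stored episode."""
    steps = reversed(trajectory) if reverse else trajectory
    for s, a, r, s_next, terminal in steps:
        q_update(Q, s, a, r, s_next, actions, terminal=terminal, **kw)


# Hypothetical 4-state chain 0 -> 1 -> 2 -> 3, reward 1.0 on reaching state 3.
ACTIONS = ["right"]
EPISODE = [
    (0, "right", 0.0, 1, False),
    (1, "right", 0.0, 2, False),
    (2, "right", 1.0, 3, True),
]

q_forward = defaultdict(float)
sweep(q_forward, EPISODE, ACTIONS)                 # transitions in collection order

q_backward = defaultdict(float)
sweep(q_backward, EPISODE, ACTIONS, reverse=True)  # transitions along the trajectory, last first
```

In this toy case a single forward sweep only changes the value of the last state-action pair, whereas a single reverse sweep already carries some of the final reward back to the start state. This is one simple way in which using the episodic structure of a batch can speed up propagation.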

Relevance: 90.00%

Abstract:

Developers strive to create innovative Artificial Intelligence (AI) behaviour in their games as a key selling point. Machine Learning is an area of AI that looks at how applications and agents can be programmed to learn their own behaviour without the need to manually design and implement each aspect of it. Machine learning methods have been utilised infrequently within games and are usually trained offline before the game is released to the players. In order to investigate new ways AI could be applied innovatively to games, it is wise to explore how machine learning methods could be utilised in real time as the game is played, so as to allow AI agents to learn directly from the player or their environment. Two machine learning methods were implemented in a simple 2D Fighter test game to allow the agents to fully showcase their learned behaviour as the game is played. The methods chosen were Q-Learning and an N-Gram-based system. It was found that N-Grams and Q-Learning could significantly benefit game developers as they facilitate fast, realistic learning at run-time.
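As an illustration of how an N-Gram-based system can learn from a player at run-time, the sketch below counts which action tends to follow the player's last n-1 actions and predicts the most frequent one. The abstract does not describe the game's actual implementation, so the class name, the action labels, and the choice of n = 3 are hypothetical.

```python
from collections import Counter, defaultdict


class NGramPredictor:
    """Predicts the next action from the previous n-1 observed actions."""

    def __init__(self, n=3):
        if n < 2:
            raise ValueError("n must be at least 2")
        self.n = n
        self.counts = defaultdict(Counter)  # context tuple -> Counter of next actions
        self.history = []

    def observe(self, action):
        """Record one player action and update the n-gram counts online."""
        context = tuple(self.history[-(self.n - 1):])
        if len(context) == self.n - 1:
            self.counts[context][action] += 1
        self.history.append(action)

    def predict(self):
        """Return the most frequent follow-up to the current context, or None."""
        context = tuple(self.history[-(self.n - 1):])
        seen = self.counts.get(context)
        if not seen:
            return None
        return seen.most_common(1)[0][0]


# Hypothetical player who repeats a punch, punch, kick combo.
predictor = NGramPredictor(n=3)
for move in ["punch", "punch", "kick", "punch", "punch", "kick", "punch", "punch"]:
    predictor.observe(move)
next_move = predictor.predict()
```

Because each observation is a dictionary update, a predictor of this kind is cheap enough to run every frame, which fits the abstract's emphasis on learning while the game is being played.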

Relevance: 90.00%

Abstract:

Nowadays robotic applications are widespread and most manipulation tasks are efficiently solved. However, Deformable Objects (DOs) still represent a huge limitation for robots. The main difficulty in DO manipulation is dealing with the shape and dynamics uncertainties, which prevent the use of model-based approaches (since they are excessively computationally complex) and make sensory data difficult to interpret. This thesis reports research activities aimed at addressing some applications in robotic manipulation and sensing of Deformable Linear Objects (DLOs), with particular focus on electric wires. In all the works, a significant effort was made to study an effective strategy for analyzing sensory signals with various machine learning algorithms. The first part of the document focuses on the wire terminals, i.e. detection, grasping, and insertion. First, a pipeline that integrates vision and tactile sensing is developed, then further improvements are proposed for each module. A novel procedure is proposed to gather and label massive amounts of training images for object detection with minimal human intervention. Together with this strategy, we extend a generic object detector based on Convolutional Neural Networks for orientation prediction. The insertion task is also extended by developing a closed-loop control capable of guiding the insertion of a longer and curved segment of wire through a hole, where the contact forces are estimated by means of a Recurrent Neural Network. In the second part of the thesis, the interest shifts to the DLO shape. Robotic reshaping of a DLO is addressed by means of a sequence of pick-and-place primitives, while a decision-making process driven by visual data learns the optimal grasping locations using Deep Q-learning and finds the best releasing point. The success of the solution relies on a reliable interpretation of the DLO shape. For this reason, further developments are made on the visual segmentation.

Relevance: 90.00%

Abstract:

Reinforcement learning is a particular paradigm of machine learning that has recently proved, time and time again, to be a very effective and powerful approach. Cryptography, on the other hand, usually takes the opposite direction. While machine learning aims at analyzing data, cryptography aims at maintaining its privacy by hiding such data. However, the two techniques can be used jointly to create privacy-preserving models, able to make inferences on the data without leaking sensitive information. Despite the large number of studies on machine learning and cryptography, reinforcement learning in particular has never been applied to such cases before. Being able to successfully make use of reinforcement learning in an encrypted scenario would allow us to create an agent that efficiently controls a system without providing it with full knowledge of the environment it is operating in, leading the way to many possible use cases. Therefore, we have decided to apply the reinforcement learning paradigm to encrypted data. In this project we have applied one of the most well-known reinforcement learning algorithms, Deep Q-Learning, to simple simulated environments and studied how the encryption affects the training performance of the agent, in order to see if it is still able to learn how to behave even when the input data is no longer readable by humans. The results of this work highlight that the agent is still able to learn with no issues whatsoever in small state spaces with non-secure encryptions, like AES in ECB mode. For fixed environments, it is also able to reach a suboptimal solution even in the presence of secure modes, like AES in CBC mode, showing a significant improvement with respect to a random agent; however, its ability to generalize in stochastic environments or big state spaces suffers greatly.

Relevance: 90.00%

Abstract:

Reinforcement Learning is an increasingly popular area of Artificial Intelligence. The applications of this learning paradigm are many, but its application in mobile computing is in its infancy. This study aims to provide an overview of current Reinforcement Learning applications on mobile devices, as well as to introduce a new framework for iOS devices: Swift-RL Lib. This new Swift package allows developers to easily support and integrate two of the most common RL algorithms, Q-Learning and Deep Q-Network, in a fully customizable environment. All processes are performed on the device, without any need for remote computation. The framework was tested in different settings and evaluated through several use cases. Through an in-depth performance analysis, we show that the platform provides effective and efficient support for Reinforcement Learning for mobile applications.

Relevance: 90.00%

Abstract:

In the economics and game theory literature there is an open debate on whether automated market pricing algorithms can give rise to anticompetitive behaviour. The aim of this thesis is to develop an actor-critic reinforcement learning model with entropy regularization to set prices in a dynamic game of oligopolistic competition with continuous prices. The model I propose consistently exhibits cooperative behaviour supported by punishment mechanisms that discourage deviation from the equilibrium reached at convergence. The behaviour of this model during learning and after convergence also helps to interpret the actions taken by tabular Q-learning and other pricing algorithms under similar conditions. The results are robust to changes in the number of competing agents and to the type of deviation from the equilibrium obtained at convergence, with deviations to higher prices also being punished.

Relevance: 80.00%

Abstract:

This work proposes a new approach using a committee machine of artificial neural networks to classify masses found in mammograms as benign or malignant. Three shape factors, three edge-sharpness measures, and 14 texture measures are used for the classification of 20 regions of interest (ROIs) related to malignant tumors and 37 ROIs related to benign masses. A group of multilayer perceptrons (MLPs) is employed as a committee machine of neural network classifiers. The classification results are reached by combining the responses of the individual classifiers. Experiments involving changes in the learning algorithm of the committee machine are conducted. The classification accuracy is evaluated using the area A_z under the receiver operating characteristics (ROC) curve. The A_z result for the committee machine is compared with the A_z results obtained using MLPs and single-layer perceptrons (SLPs), as well as a linear discriminant analysis (LDA) classifier. Tests are carried out using the Student's t-distribution. The committee machine classifier outperforms the MLP, SLP, and LDA classifiers in the following cases: with the shape measure of spiculation index, the A_z values of the four methods are, in order, 0.93, 0.84, 0.75, and 0.76; and with the edge-sharpness measure of acutance, the values are 0.79, 0.70, 0.69, and 0.74. Although the features with which improvement is obtained with the committee machines are not the same as those that provided the maximal value of A_z (A_z = 0.99 with some shape features, with or without the committee machine), they correspond to features that are not critically dependent on the accuracy of the boundaries of the masses, which is an important result. (c) 2008 SPIE and IS&T.

Relevance: 80.00%

Abstract:

Dissertation submitted to obtain the Master's degree in Informatics Engineering

Relevance: 80.00%

Abstract:

The Internet of Things (IoT) is a concept that can foster the emergence of innovative applications. In order to minimize parents' concerns about their children's safety, this paper presents the design of a smart Internet of Things system for identifying dangerous situations. The system will be based on real-time collection and analysis of physiological signals monitored by non-invasive and non-intrusive sensors, Radio Frequency IDentification (RFID) tags and a Global Positioning System (GPS) to determine when a child is in danger. A state of danger is assumed when a certain number of biometric reactions to specific situations are validated, according to a self-learning algorithm developed for this architecture. The results of the analysis of the collected data and the location of the child will be available in real time to the child's caregivers in a web application.

Relevance: 80.00%

Abstract:

DNA microarrays are one of the most used technologies for gene expression measurement. However, there are several distinct microarray platforms, from different manufacturers, each with its own measurement protocol, resulting in data that can hardly be compared or directly integrated. Data integration from multiple sources aims to improve the assertiveness of statistical tests, reducing the data dimensionality problem. The integration of heterogeneous DNA microarray platforms comprises a set of tasks that range from the re-annotation of the features used in gene expression to data normalization and batch effect elimination. In this work, a complete methodology for gene expression data integration and application is proposed, which includes a transcript-based re-annotation process and several methods for batch effect attenuation. The integrated data will be used to select the best feature set and learning algorithm for a brain tumor classification case study. The integration will consider data from heterogeneous Agilent and Affymetrix platforms, collected from public gene expression databases, such as The Cancer Genome Atlas and Gene Expression Omnibus.

Relevance: 80.00%

Abstract:

Given a set of images of scenes containing different object categories (e.g. grass, roads), our objective is to discover these objects in each image and to use these object occurrences to perform scene classification (e.g. beach scene, mountain scene). We achieve this by using a supervised learning algorithm able to learn from few images, which eases the user's task. We use a probabilistic model to recognise the objects, and we then classify the scene based on their object occurrences. Experimental results are shown and evaluated to prove the validity of our proposal. Object recognition performance is compared to the approaches of He et al. (2004) and Marti et al. (2001) using their own datasets. Furthermore, an unsupervised method is implemented in order to evaluate the advantages and disadvantages of our supervised classification approach versus an unsupervised one.

Relevance: 80.00%

Abstract:

We consider an agent who has to repeatedly make choices in an uncertain and changing environment, who has full information of the past, who discounts future payoffs, but who has no prior. We provide a learning algorithm that performs almost as well as the best of a given finite number of experts or benchmark strategies and does so at any point in time, provided the agent is sufficiently patient. The key is to find the appropriate degree of forgetting distant past. Standard learning algorithms that treat recent and distant past equally do not have the sequential epsilon optimality property.
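The role of forgetting can be illustrated with a generic exponentially weighted forecaster over experts in which past losses are geometrically discounted. This is a sketch under assumptions, not the algorithm of the paper: the function name, the learning rate `eta`, the discount `forget`, and the loss sequence are all hypothetical.

```python
import math


def discounted_weights(loss_rounds, eta=1.0, forget=0.9):
    """Exponential weights over experts with geometrically discounted past losses.

    loss_rounds: list of rounds, each a list with one loss per expert.
    forget: factor in (0, 1]; 1.0 treats recent and distant past equally.
    Returns the normalized weights after the last round.
    """
    n_experts = len(loss_rounds[0])
    cumulative = [0.0] * n_experts
    for losses in loss_rounds:
        # Older losses shrink by `forget` each round before the new loss is added.
        cumulative = [forget * c + l for c, l in zip(cumulative, losses)]
    shift = min(cumulative)  # subtract the minimum for numerical stability
    raw = [math.exp(-eta * (c - shift)) for c in cumulative]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical changing environment: expert 0 is best for 20 rounds,
# then expert 1 becomes best for the last 5 rounds.
rounds = [[0.0, 1.0]] * 20 + [[1.0, 0.0]] * 5

w_equal = discounted_weights(rounds, forget=1.0)   # no forgetting
w_forget = discounted_weights(rounds, forget=0.5)  # strong forgetting
```

With `forget=1.0` the long early record still dominates and most weight stays on expert 0 after the change. With `forget=0.5` the weights shift to expert 1 within a few rounds, which illustrates why treating recent and distant past equally can perform poorly in a changing environment.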

Relevance: 80.00%

Abstract:

Fluvial deposits are a challenge for modelling flow in sub-surface reservoirs. Connectivity and continuity of permeable bodies have a major impact on fluid flow in porous media. Contemporary object-based and multipoint statistics methods struggle to represent connected structures robustly. An alternative approach to modelling petrophysical properties is based on a machine learning algorithm, Support Vector Regression (SVR). Semi-supervised SVR is able to establish spatial connectivity taking into account prior knowledge of natural similarities. SVR as a learning algorithm is robust to noise and captures dependencies from all available data. Semi-supervised SVR applied to a synthetic fluvial reservoir demonstrated robust results, which are well matched to the flow performance.