973 resultados para Gradient Descent method


Relevância:

90.00% 90.00%

Publicador:

Resumo:

A method for calculating the globally optimal learning rate in on-line gradient-descent training of multilayer neural networks is presented. The method is based on a variational approach which maximizes the decrease in generalization error over a given time frame. We demonstrate the method by computing optimal learning rates in typical learning scenarios. The method can also be employed when different learning rates are allowed for different parameter vectors as well as to determine the relevance of related training algorithms based on modifications to the basic gradient descent rule.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We analyse the dynamics of a number of second order on-line learning algorithms training multi-layer neural networks, using the methods of statistical mechanics. We first consider on-line Newton's method, which is known to provide optimal asymptotic performance. We determine the asymptotic generalization error decay for a soft committee machine, which is shown to compare favourably with the result for standard gradient descent. Matrix momentum provides a practical approximation to this method by allowing an efficient inversion of the Hessian. We consider an idealized matrix momentum algorithm which requires access to the Hessian and find close correspondence with the dynamics of on-line Newton's method. In practice, the Hessian will not be known on-line and we therefore consider matrix momentum using a single example approximation to the Hessian. In this case good asymptotic performance may still be achieved, but the algorithm is now sensitive to parameter choice because of noise in the Hessian estimate. On-line Newton's method is not appropriate during the transient learning phase, since a suboptimal unstable fixed point of the gradient descent dynamics becomes stable for this algorithm. A principled alternative is to use Amari's natural gradient learning algorithm and we show how this method provides a significant reduction in learning time when compared to gradient descent, while retaining the asymptotic performance of on-line Newton's method.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We analyse natural gradient learning in a two-layer feed-forward neural network using a statistical mechanics framework which is appropriate for large input dimension. We find significant improvement over standard gradient descent in both the transient and asymptotic phases of learning.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The general iteration method for nonexpansive mappings on a Banach space is considered. Under some assumption of fast enough convergence on the sequence of (“almost” nonexpansive) perturbed iteration mappings, if the basic method is τ−convergent for a suitable topology τ weaker than the norm topology, then the perturbed method is also τ−convergent. Application is presented to the gradient-prox method for monotone inclusions in Hilbert spaces.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Dynamics of biomolecules over various spatial and time scales are essential for biological functions such as molecular recognition, catalysis and signaling. However, reconstruction of biomolecular dynamics from experimental observables requires the determination of a conformational probability distribution. Unfortunately, these distributions cannot be fully constrained by the limited information from experiments, making the problem an ill-posed one in the terminology of Hadamard. The ill-posed nature of the problem comes from the fact that it has no unique solution. Multiple or even an infinite number of solutions may exist. To avoid the ill-posed nature, the problem needs to be regularized by making assumptions, which inevitably introduce biases into the result.

Here, I present two continuous probability density function approaches to solve an important inverse problem called the RDC trigonometric moment problem. By focusing on interdomain orientations we reduced the problem to determination of a distribution on the 3D rotational space from residual dipolar couplings (RDCs). We derived an analytical equation that relates alignment tensors of adjacent domains, which serves as the foundation of the two methods. In the first approach, the ill-posed nature of the problem was avoided by introducing a continuous distribution model, which enjoys a smoothness assumption. To find the optimal solution for the distribution, we also designed an efficient branch-and-bound algorithm that exploits the mathematical structure of the analytical solutions. The algorithm is guaranteed to find the distribution that best satisfies the analytical relationship. We observed good performance of the method when tested under various levels of experimental noise and when applied to two protein systems. The second approach avoids the use of any model by employing maximum entropy principles. This 'model-free' approach delivers the least biased result which presents our state of knowledge. In this approach, the solution is an exponential function of Lagrange multipliers. To determine the multipliers, a convex objective function is constructed. Consequently, the maximum entropy solution can be found easily by gradient descent methods. Both algorithms can be applied to biomolecular RDC data in general, including data from RNA and DNA molecules.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Les réseaux de capteurs sont formés d’un ensemble de dispositifs capables de prendre individuellement des mesures d’un environnement particulier et d’échanger de l’information afin d’obtenir une représentation de haut niveau sur les activités en cours dans la zone d’intérêt. Une telle détection distribuée, avec de nombreux appareils situés à proximité des phénomènes d’intérêt, est pertinente dans des domaines tels que la surveillance, l’agriculture, l’observation environnementale, la surveillance industrielle, etc. Nous proposons dans cette thèse plusieurs approches pour effectuer l’optimisation des opérations spatio-temporelles de ces dispositifs, en déterminant où les placer dans l’environnement et comment les contrôler au fil du temps afin de détecter les cibles mobiles d’intérêt. La première nouveauté consiste en un modèle de détection réaliste représentant la couverture d’un réseau de capteurs dans son environnement. Nous proposons pour cela un modèle 3D probabiliste de la capacité de détection d’un capteur sur ses abords. Ce modèle inègre également de l’information sur l’environnement grâce à l’évaluation de la visibilité selon le champ de vision. À partir de ce modèle de détection, l’optimisation spatiale est effectuée par la recherche du meilleur emplacement et l’orientation de chaque capteur du réseau. Pour ce faire, nous proposons un nouvel algorithme basé sur la descente du gradient qui a été favorablement comparée avec d’autres méthodes génériques d’optimisation «boites noires» sous l’aspect de la couverture du terrain, tout en étant plus efficace en terme de calculs. Une fois que les capteurs placés dans l’environnement, l’optimisation temporelle consiste à bien couvrir un groupe de cibles mobiles dans l’environnement. D’abord, on effectue la prédiction de la position future des cibles mobiles détectées par les capteurs. La prédiction se fait soit à l’aide de l’historique des autres cibles qui ont traversé le même environnement (prédiction à long terme), ou seulement en utilisant les déplacements précédents de la même cible (prédiction à court terme). Nous proposons de nouveaux algorithmes dans chaque catégorie qui performent mieux ou produits des résultats comparables par rapport aux méthodes existantes. Une fois que les futurs emplacements de cibles sont prédits, les paramètres des capteurs sont optimisés afin que les cibles soient correctement couvertes pendant un certain temps, selon les prédictions. À cet effet, nous proposons une méthode heuristique pour faire un contrôle de capteurs, qui se base sur les prévisions probabilistes de trajectoire des cibles et également sur la couverture probabiliste des capteurs des cibles. Et pour terminer, les méthodes d’optimisation spatiales et temporelles proposées ont été intégrées et appliquées avec succès, ce qui démontre une approche complète et efficace pour l’optimisation spatio-temporelle des réseaux de capteurs.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The present document deals with the optimization of shape of aerodynamic profiles -- The objective is to reduce the drag coefficient on a given profile without penalising the lift coefficient -- A set of control points defining the geometry are passed and parameterized as a B-Spline curve -- These points are modified automatically by means of CFD analysis -- A given shape is defined by an user and a valid volumetric CFD domain is constructed from this planar data and a set of user-defined parameters -- The construction process involves the usage of 2D and 3D meshing algorithms that were coupled into own- code -- The volume of air surrounding the airfoil and mesh quality are also parametrically defined -- Some standard NACA profiles were used by obtaining first its control points in order to test the algorithm -- Navier-Stokes equations were solved for turbulent, steady-state ow of compressible uids using the k-epsilon model and SIMPLE algorithm -- In order to obtain data for the optimization process an utility to extract drag and lift data from the CFD simulation was added -- After a simulation is run drag and lift data are passed to the optimization process -- A gradient-based method using the steepest descent was implemented in order to define the magnitude and direction of the displacement of each control point -- The control points and other parameters defined as the design variables are iteratively modified in order to achieve an optimum -- Preliminary results on conceptual examples show a decrease in drag and a change in geometry that obeys to aerodynamic behavior principles

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A new simple method to design linear-phase finite impulse response (FIR) digital filters, based on the steepest-descent optimization method, is presented in this paper. Starting from the specifications of the desired frequency response and a maximum approximation error a nearly optimum digital filter is obtained. Tests have shown that this method is alternative to other traditional ones such as Frequency Sampling and Parks-McClellan, mainly when other than brick wall frequency response is required as a desired frequency response. (C) 2011 Elsevier Inc. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this thesis, a feed-forward, back-propagating Artificial Neural Network using the gradient descent algorithm is developed to forecast the directional movement of daily returns for WTI, gold and copper futures. Out-of-sample back-test results vary, with some predictive abilities for copper futures but none for either WTI or gold. The best statistically significant hit rate achieved was 57% for copper with an absolute return Sharpe Ratio of 1.25 and a benchmarked Information Ratio of 2.11.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

I use a multi-layer feedforward perceptron, with backpropagation learning implemented via stochastic gradient descent, to extrapolate the volatility smile of Euribor derivatives over low-strikes by training the network on parametric prices.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present a new framework for large-scale data clustering. The main idea is to modify functional dimensionality reduction techniques to directly optimize over discrete labels using stochastic gradient descent. Compared to methods like spectral clustering our approach solves a single optimization problem, rather than an ad-hoc two-stage optimization approach, does not require a matrix inversion, can easily encode prior knowledge in the set of implementable functions, and does not have an ?out-of-sample? problem. Experimental results on both artificial and real-world datasets show the usefulness of our approach.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

An e cient procedure for the blind inversion of a nonlinear Wiener system is proposed. We proved that the problem can be expressed as a problem of blind source separation in nonlinear mixtures, for which a solution has been recently proposed. Based on a quasi-nonparametric relative gradient descent, the proposed algorithm can perform e ciently even in the presence of hard distortions.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Invasive mold infections are life-threatening diseases for which appropriate antifungal therapy is crucial. Their epidemiology is evolving, with the emergence of triazole-resistant Aspergillus spp. and multidrug-resistant non-Aspergillus molds. Despite the lack of interpretive criteria, antifungal susceptibility testing of molds may be useful in guiding antifungal therapy. The standard broth microdilution method (BMD) is demanding and requires expertise. We assessed the performance of a commercialized gradient diffusion method (Etest method) as an alternative to BMD. The MICs or minimal effective concentrations (MECs) of amphotericin B, voriconazole, posaconazole, caspofungin, and micafungin were assessed for 290 clinical isolates of the most representative pathogenic molds (154 Aspergillus and 136 non-Aspergillus isolates) with the BMD and Etest methods. Essential agreements (EAs) within ±2 dilutions of ≥90% between the two methods were considered acceptable. EAs for amphotericin B and voriconazole were >90% for most potentially susceptible species. For posaconazole, the correlation was acceptable for Mucoromycotina but Etest MIC values were consistently lower for Aspergillus spp. (EAs of <90%). Excellent EAs were found for echinocandins with highly susceptible (MECs of <0.015 μg/ml) or intrinsically resistant (MECs of >16 μg/ml) strains. However, MEC determinations lacked consistency between methods for strains exhibiting mid-range MECs for echinocandins. We concluded that the Etest method is an appropriate alternative to BMD for antifungal susceptibility testing of molds under specific circumstances, including testing with amphotericin B or triazoles for non-Aspergillus molds (Mucoromycotina and Fusarium spp.). Additional study of molecularly characterized triazole-resistant Aspergillus isolates is required to confirm the ability of the Etest method to detect voriconazole and posaconazole resistance among Aspergillus spp.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Cette thèse étudie des modèles de séquences de haute dimension basés sur des réseaux de neurones récurrents (RNN) et leur application à la musique et à la parole. Bien qu'en principe les RNN puissent représenter les dépendances à long terme et la dynamique temporelle complexe propres aux séquences d'intérêt comme la vidéo, l'audio et la langue naturelle, ceux-ci n'ont pas été utilisés à leur plein potentiel depuis leur introduction par Rumelhart et al. (1986a) en raison de la difficulté de les entraîner efficacement par descente de gradient. Récemment, l'application fructueuse de l'optimisation Hessian-free et d'autres techniques d'entraînement avancées ont entraîné la recrudescence de leur utilisation dans plusieurs systèmes de l'état de l'art. Le travail de cette thèse prend part à ce développement. L'idée centrale consiste à exploiter la flexibilité des RNN pour apprendre une description probabiliste de séquences de symboles, c'est-à-dire une information de haut niveau associée aux signaux observés, qui en retour pourra servir d'à priori pour améliorer la précision de la recherche d'information. Par exemple, en modélisant l'évolution de groupes de notes dans la musique polyphonique, d'accords dans une progression harmonique, de phonèmes dans un énoncé oral ou encore de sources individuelles dans un mélange audio, nous pouvons améliorer significativement les méthodes de transcription polyphonique, de reconnaissance d'accords, de reconnaissance de la parole et de séparation de sources audio respectivement. L'application pratique de nos modèles à ces tâches est détaillée dans les quatre derniers articles présentés dans cette thèse. Dans le premier article, nous remplaçons la couche de sortie d'un RNN par des machines de Boltzmann restreintes conditionnelles pour décrire des distributions de sortie multimodales beaucoup plus riches. Dans le deuxième article, nous évaluons et proposons des méthodes avancées pour entraîner les RNN. Dans les quatre derniers articles, nous examinons différentes façons de combiner nos modèles symboliques à des réseaux profonds et à la factorisation matricielle non-négative, notamment par des produits d'experts, des architectures entrée/sortie et des cadres génératifs généralisant les modèles de Markov cachés. Nous proposons et analysons également des méthodes d'inférence efficaces pour ces modèles, telles la recherche vorace chronologique, la recherche en faisceau à haute dimension, la recherche en faisceau élagué et la descente de gradient. Finalement, nous abordons les questions de l'étiquette biaisée, du maître imposant, du lissage temporel, de la régularisation et du pré-entraînement.