973 resultados para Gradient Descent method


Relevância:

80.00% 80.00%

Publicador:

Resumo:

L'apprentissage profond est un domaine de recherche en forte croissance en apprentissage automatique qui est parvenu à des résultats impressionnants dans différentes tâches allant de la classification d'images à la parole, en passant par la modélisation du langage. Les réseaux de neurones récurrents, une sous-classe d'architecture profonde, s'avèrent particulièrement prometteurs. Les réseaux récurrents peuvent capter la structure temporelle dans les données. Ils ont potentiellement la capacité d'apprendre des corrélations entre des événements éloignés dans le temps et d'emmagasiner indéfiniment des informations dans leur mémoire interne. Dans ce travail, nous tentons d'abord de comprendre pourquoi la profondeur est utile. Similairement à d'autres travaux de la littérature, nos résultats démontrent que les modèles profonds peuvent être plus efficaces pour représenter certaines familles de fonctions comparativement aux modèles peu profonds. Contrairement à ces travaux, nous effectuons notre analyse théorique sur des réseaux profonds acycliques munis de fonctions d'activation linéaires par parties, puisque ce type de modèle est actuellement l'état de l'art dans différentes tâches de classification. La deuxième partie de cette thèse porte sur le processus d'apprentissage. Nous analysons quelques techniques d'optimisation proposées récemment, telles l'optimisation Hessian free, la descente de gradient naturel et la descente des sous-espaces de Krylov. Nous proposons le cadre théorique des méthodes à région de confiance généralisées et nous montrons que plusieurs de ces algorithmes développés récemment peuvent être vus dans cette perspective. Nous argumentons que certains membres de cette famille d'approches peuvent être mieux adaptés que d'autres à l'optimisation non convexe. La dernière partie de ce document se concentre sur les réseaux de neurones récurrents. Nous étudions d'abord le concept de mémoire et tentons de répondre aux questions suivantes: Les réseaux récurrents peuvent-ils démontrer une mémoire sans limite? Ce comportement peut-il être appris? Nous montrons que cela est possible si des indices sont fournis durant l'apprentissage. Ensuite, nous explorons deux problèmes spécifiques à l'entraînement des réseaux récurrents, à savoir la dissipation et l'explosion du gradient. Notre analyse se termine par une solution au problème d'explosion du gradient qui implique de borner la norme du gradient. Nous proposons également un terme de régularisation conçu spécifiquement pour réduire le problème de dissipation du gradient. Sur un ensemble de données synthétique, nous montrons empiriquement que ces mécanismes peuvent permettre aux réseaux récurrents d'apprendre de façon autonome à mémoriser des informations pour une période de temps indéfinie. Finalement, nous explorons la notion de profondeur dans les réseaux de neurones récurrents. Comparativement aux réseaux acycliques, la définition de profondeur dans les réseaux récurrents est souvent ambiguë. Nous proposons différentes façons d'ajouter de la profondeur dans les réseaux récurrents et nous évaluons empiriquement ces propositions.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Optimal control theory is a powerful tool for solving control problems in quantum mechanics, ranging from the control of chemical reactions to the implementation of gates in a quantum computer. Gradient-based optimization methods are able to find high fidelity controls, but require considerable numerical effort and often yield highly complex solutions. We propose here to employ a two-stage optimization scheme to significantly speed up convergence and achieve simpler controls. The control is initially parametrized using only a few free parameters, such that optimization in this pruned search space can be performed with a simplex method. The result, considered now simply as an arbitrary function on a time grid, is the starting point for further optimization with a gradient-based method that can quickly converge to high fidelities. We illustrate the success of this hybrid technique by optimizing a geometric phase gate for two superconducting transmon qubits coupled with a shared transmission line resonator, showing that a combination of Nelder-Mead simplex and Krotov’s method yields considerably better results than either one of the two methods alone.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The registration of pre-operative volumetric datasets to intra- operative two-dimensional images provides an improved way of verifying patient position and medical instrument loca- tion. In applications from orthopedics to neurosurgery, it has a great value in maintaining up-to-date information about changes due to intervention. We propose a mutual information- based registration algorithm to establish the proper align- ment. For optimization purposes, we compare the perfor- mance of the non-gradient Powell method and two slightly di erent versions of a stochastic gradient ascent strategy: one using a sparsely sampled histogramming approach and the other Parzen windowing to carry out probability density approximation. Our main contribution lies in adopting the stochastic ap- proximation scheme successfully applied in 3D-3D registra- tion problems to the 2D-3D scenario, which obviates the need for the generation of full DRRs at each iteration of pose op- timization. This facilitates a considerable savings in compu- tation expense. We also introduce a new probability density estimator for image intensities via sparse histogramming, de- rive gradient estimates for the density measures required by the maximization procedure and introduce the framework for a multiresolution strategy to the problem. Registration results are presented on uoroscopy and CT datasets of a plastic pelvis and a real skull, and on a high-resolution CT- derived simulated dataset of a real skull, a plastic skull, a plastic pelvis and a plastic lumbar spine segment.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

An efficient model identification algorithm for a large class of linear-in-the-parameters models is introduced that simultaneously optimises the model approximation ability, sparsity and robustness. The derived model parameters in each forward regression step are initially estimated via the orthogonal least squares (OLS), followed by being tuned with a new gradient-descent learning algorithm based on the basis pursuit that minimises the l(1) norm of the parameter estimate vector. The model subset selection cost function includes a D-optimality design criterion that maximises the determinant of the design matrix of the subset to ensure model robustness and to enable the model selection procedure to automatically terminate at a sparse model. The proposed approach is based on the forward OLS algorithm using the modified Gram-Schmidt procedure. Both the parameter tuning procedure, based on basis pursuit, and the model selection criterion, based on the D-optimality that is effective in ensuring model robustness, are integrated with the forward regression. As a consequence the inherent computational efficiency associated with the conventional forward OLS approach is maintained in the proposed algorithm. Examples demonstrate the effectiveness of the new approach.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The speed of convergence while training is an important consideration in the use of neural nets. The authors outline a new training algorithm which reduces both the number of iterations and training time required for convergence of multilayer perceptrons, compared to standard back-propagation and conjugate gradient descent algorithms.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Searching for the optimum tap-length that best balances the complexity and steady-state performance of an adaptive filter has attracted attention recently. Among existing algorithms that can be found in the literature, two of which, namely the segmented filter (SF) and gradient descent (GD) algorithms, are of particular interest as they can search for the optimum tap-length quickly. In this paper, at first, we carefully compare the SF and GD algorithms and show that the two algorithms are equivalent in performance under some constraints, but each has advantages/disadvantages relative to the other. Then, we propose an improved variable tap-length algorithm using the concept of the pseudo fractional tap-length (FT). Updating the tap-length with instantaneous errors in a style similar to that used in the stochastic gradient [or least mean squares (LMS)] algorithm, the proposed FT algorithm not only retains the advantages from both the SF and the GD algorithms but also has significantly less complexity than existing algorithms. Both performance analysis and numerical simulations are given to verify the new proposed algorithm.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A generalized asymptotic expansion in the far field for the problem of cylindrical wave reflection at a homogeneous impedance plane is derived. The expansion is shown to be uniformly valid over all angles of incidence and values of surface impedance, including the limiting cases of zero and infinite impedance. The technique used is a rigorous application of the modified steepest descent method of Ot

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A class identification algorithms is introduced for Gaussian process(GP)models.The fundamental approach is to propose a new kernel function which leads to a covariance matrix with low rank,a property that is consequently exploited for computational efficiency for both model parameter estimation and model predictions.The objective of either maximizing the marginal likelihood or the Kullback–Leibler (K–L) divergence between the estimated output probability density function(pdf)and the true pdf has been used as respective cost functions.For each cost function,an efficient coordinate descent algorithm is proposed to estimate the kernel parameters using a one dimensional derivative free search, and noise variance using a fast gradient descent algorithm. Numerical examples are included to demonstrate the effectiveness of the new identification approaches.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A new class of parameter estimation algorithms is introduced for Gaussian process regression (GPR) models. It is shown that the integration of the GPR model with probability distance measures of (i) the integrated square error and (ii) Kullback–Leibler (K–L) divergence are analytically tractable. An efficient coordinate descent algorithm is proposed to iteratively estimate the kernel width using golden section search which includes a fast gradient descent algorithm as an inner loop to estimate the noise variance. Numerical examples are included to demonstrate the effectiveness of the new identification approaches.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

As part of an international intercomparison project, a set of single column models (SCMs) and cloud-resolving models (CRMs) are run under the weak temperature gradient (WTG) method and the damped gravity wave (DGW) method. For each model, the implementation of the WTG or DGW method involves a simulated column which is coupled to a reference state defined with profiles obtained from the same model in radiative-convective equilibrium. The simulated column has the same surface conditions as the reference state and is initialized with profiles from the reference state. We performed systematic comparison of the behavior of different models under a consistent implementation of the WTG method and the DGW method and systematic comparison of the WTG and DGW methods in models with different physics and numerics. CRMs and SCMs produce a variety of behaviors under both WTG and DGW methods. Some of the models reproduce the reference state while others sustain a large-scale circulation which results in either substantially lower or higher precipitation compared to the value of the reference state. CRMs show a fairly linear relationship between precipitation and circulation strength. SCMs display a wider range of behaviors than CRMs. Some SCMs under the WTG method produce zero precipitation. Within an individual SCM, a DGW simulation and a corresponding WTG simulation can produce different signed circulation. When initialized with a dry troposphere, DGW simulations always result in a precipitating equilibrium state. The greatest sensitivities to the initial moisture conditions occur for multiple stable equilibria in some WTG simulations, corresponding to either a dry equilibrium state when initialized as dry or a precipitating equilibrium state when initialized as moist. Multiple equilibria are seen in more WTG simulations for higher SST. In some models, the existence of multiple equilibria is sensitive to some parameters in the WTG calculations.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A new sparse kernel density estimator with tunable kernels is introduced within a forward constrained regression framework whereby the nonnegative and summing-to-unity constraints of the mixing weights can easily be satisfied. Based on the minimum integrated square error criterion, a recursive algorithm is developed to select significant kernels one at time, and the kernel width of the selected kernel is then tuned using the gradient descent algorithm. Numerical examples are employed to demonstrate that the proposed approach is effective in constructing very sparse kernel density estimators with competitive accuracy to existing kernel density estimators.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This work presents a procedure for electric load forecasting based on adaptive multilayer feedforward neural networks trained by the Backpropagation algorithm. The neural network architecture is formulated by two parameters, the scaling and translation of the postsynaptic functions at each node, and the use of the gradient-descendent method for the adjustment in an iterative way. Besides, the neural network also uses an adaptive process based on fuzzy logic to adjust the network training rate. This methodology provides an efficient modification of the neural network that results in faster convergence and more precise results, in comparison to the conventional formulation Backpropagation algorithm. The adapting of the training rate is effectuated using the information of the global error and global error variation. After finishing the training, the neural network is capable to forecast the electric load of 24 hours ahead. To illustrate the proposed methodology it is used data from a Brazilian Electric Company. © 2003 IEEE.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Pós-graduação em Matemática - IBILCE

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevância:

80.00% 80.00%

Publicador:

Resumo:

[EN] The aim of this work is to propose a model for computing the optical flow in a sequence of images. We introduce a new temporal regularizer that is suitable for large displacements. We propose to decouple the spatial and temporal regularizations to avoid an incongruous formulation. For the spatial regularization we use the Nagel-Enkelmann operator and a newly designed temporal regularization. Our model is based on an energy functional that yields a partial differential equation (PDE). This PDE is embedded into a multipyramidal strategy to recover large displacements. A gradient descent technique is applied at each scale to reach the minimum.