592 resultados para Descent
Resumo:
A presente pesquisa tem como objetivo avaliar cefalometricamente, o espaço e po-sicionamento das coroas dos segundos e terceiros molares superiores permanentes não erupcionados na região da tuberosidade maxilar durante a distalização dos pri-meiros molares superiores, além de verificar a correlação entre estas duas variáveis. A amostra foi constituída de 38 telerradiografias em norma lateral direita, obtidas de 19 pacientes, jovens brasileiros, leucodermas e melanodermas, sendo 6 do sexo masculino e 13 do sexo feminino, com idade média de 9 anos 5 meses 13 dias. A metodologia constou inicialmente da divisão dos tempos (T1) inicial, e após a distali-zação do primeiro molar superior permanente em (T2) por um período médio de 10 meses e 23 dias. Para avaliação do espaço e angulação das coroas existente utili-zou-se uma Linha referencial intracraniana (Linha M) sendo esta demarcada, a partir de dois pontos, o ponto SE localizado na sutura esfenoetmoidal, e o ponto Pt locali-zado na parte anterior da fossa pterigopalatina. Esta linha referencial foi transferida até o ponto F, (Linha M ) ponto este localizado na região mais posterio-inferior da tuberosidade maxilar. O espaço avaliado compreendeu entre a Linha M , até a face distal do primeiro molar superior permanente. Na análise estatística usou-se o teste t (Teste t Student) , e na correlação entre espaço e angulação foi utilizado o coefi-ciente de correlação de Pearson. Concluímos que o espaço correspondente entre a distal dos primeiros molares superiores permanentes e extremidade da tuberosidade maxilar, na fase inicial e após a movimentação distal, não é suficiente para a erup-ção dos segundos e terceiros molares superiores permanentes. A angulação das coroas na fase inicial e após a movimentação distal posicionam-se com angulações mais para distal. Quanto à correlação das angulações das coroas dos segundos e terceiros molares superiores permanentes e o espaço para erupção verificamos que quanto maior a angulação das coroas para distal, menor os espaços oferecidos para a erupção.(AU)
Resumo:
We present an analytic solution to the problem of on-line gradient-descent learning for two-layer neural networks with an arbitrary number of hidden units in both teacher and student networks. The technique, demonstrated here for the case of adaptive input-to-hidden weights, becomes exact as the dimensionality of the input space increases.
Resumo:
An adaptive back-propagation algorithm is studied and compared with gradient descent (standard back-propagation) for on-line learning in two-layer neural networks with an arbitrary number of hidden units. Within a statistical mechanics framework, both numerical studies and a rigorous analysis show that the adaptive back-propagation method results in faster training by breaking the symmetry between hidden units more efficiently and by providing faster convergence to optimal generalization than gradient descent.
Resumo:
We study the effect of two types of noise, data noise and model noise, in an on-line gradient-descent learning scenario for general two-layer student network with an arbitrary number of hidden units. Training examples are randomly drawn input vectors labeled by a two-layer teacher network with an arbitrary number of hidden units. Data is then corrupted by Gaussian noise affecting either the output or the model itself. We examine the effect of both types of noise on the evolution of order parameters and the generalization error in various phases of the learning process.
Resumo:
We complement recent advances in thermodynamic limit analyses of mean on-line gradient descent learning dynamics in multi-layer networks by calculating fluctuations possessed by finite dimensional systems. Fluctuations from the mean dynamics are largest at the onset of specialisation as student hidden unit weight vectors begin to imitate specific teacher vectors, increasing with the degree of symmetry of the initial conditions. In light of this, we include a term to stimulate asymmetry in the learning process, which typically also leads to a significant decrease in training time.
Resumo:
We study the effect of regularization in an on-line gradient-descent learning scenario for a general two-layer student network with an arbitrary number of hidden units. Training examples are randomly drawn input vectors labelled by a two-layer teacher network with an arbitrary number of hidden units which may be corrupted by Gaussian output noise. We examine the effect of weight decay regularization on the dynamical evolution of the order parameters and generalization error in various phases of the learning process, in both noiseless and noisy scenarios.
Resumo:
We present a framework for calculating globally optimal parameters, within a given time frame, for on-line learning in multilayer neural networks. We demonstrate the capability of this method by computing optimal learning rates in typical learning scenarios. A similar treatment allows one to determine the relevance of related training algorithms based on modifications to the basic gradient descent rule as well as to compare different training methods.
Resumo:
The influence of biases on the learning dynamics of a two-layer neural network, a normalized soft-committee machine, is studied for on-line gradient descent learning. Within a statistical mechanics framework, numerical studies show that the inclusion of adjustable biases dramatically alters the learning dynamics found previously. The symmetric phase which has often been predominant in the original model all but disappears for a non-degenerate bias task. The extended model furthermore exhibits a much richer dynamical behavior, e.g. attractive suboptimal symmetric phases even for realizable cases and noiseless data.
Resumo:
An adaptive back-propagation algorithm parameterized by an inverse temperature 1/T is studied and compared with gradient descent (standard back-propagation) for on-line learning in two-layer neural networks with an arbitrary number of hidden units. Within a statistical mechanics framework, we analyse these learning algorithms in both the symmetric and the convergence phase for finite learning rates in the case of uncorrelated teachers of similar but arbitrary length T. These analyses show that adaptive back-propagation results generally in faster training by breaking the symmetry between hidden units more efficiently and by providing faster convergence to optimal generalization than gradient descent.
Resumo:
An analytic investigation of the average case learning and generalization properties of Radial Basis Function Networks (RBFs) is presented, utilising on-line gradient descent as the learning rule. The analytic method employed allows both the calculation of generalization error and the examination of the internal dynamics of the network. The generalization error and internal dynamics are then used to examine the role of the learning rate and the specialization of the hidden units, which gives insight into decreasing the time required for training. The realizable and over-realizable cases are studied in detail; the phase of learning in which the hidden units are unspecialized (symmetric phase) and the phase in which asymptotic convergence occurs are analyzed, and their typical properties found. Finally, simulations are performed which strongly confirm the analytic results.
Resumo:
A method for calculating the globally optimal learning rate in on-line gradient-descent training of multilayer neural networks is presented. The method is based on a variational approach which maximizes the decrease in generalization error over a given time frame. We demonstrate the method by computing optimal learning rates in typical learning scenarios. The method can also be employed when different learning rates are allowed for different parameter vectors as well as to determine the relevance of related training algorithms based on modifications to the basic gradient descent rule.
Resumo:
We analyse the dynamics of a number of second order on-line learning algorithms training multi-layer neural networks, using the methods of statistical mechanics. We first consider on-line Newton's method, which is known to provide optimal asymptotic performance. We determine the asymptotic generalization error decay for a soft committee machine, which is shown to compare favourably with the result for standard gradient descent. Matrix momentum provides a practical approximation to this method by allowing an efficient inversion of the Hessian. We consider an idealized matrix momentum algorithm which requires access to the Hessian and find close correspondence with the dynamics of on-line Newton's method. In practice, the Hessian will not be known on-line and we therefore consider matrix momentum using a single example approximation to the Hessian. In this case good asymptotic performance may still be achieved, but the algorithm is now sensitive to parameter choice because of noise in the Hessian estimate. On-line Newton's method is not appropriate during the transient learning phase, since a suboptimal unstable fixed point of the gradient descent dynamics becomes stable for this algorithm. A principled alternative is to use Amari's natural gradient learning algorithm and we show how this method provides a significant reduction in learning time when compared to gradient descent, while retaining the asymptotic performance of on-line Newton's method.
Resumo:
We analyse natural gradient learning in a two-layer feed-forward neural network using a statistical mechanics framework which is appropriate for large input dimension. We find significant improvement over standard gradient descent in both the transient and asymptotic phases of learning.
Resumo:
Natural gradient learning is an efficient and principled method for improving on-line learning. In practical applications there will be an increased cost required in estimating and inverting the Fisher information matrix. We propose to use the matrix momentum algorithm in order to carry out efficient inversion and study the efficacy of a single step estimation of the Fisher information matrix. We analyse the proposed algorithm in a two-layer network, using a statistical mechanics framework which allows us to describe analytically the learning dynamics, and compare performance with true natural gradient learning and standard gradient descent.
Resumo:
In this paper we review recent theoretical approaches for analysing the dynamics of on-line learning in multilayer neural networks using methods adopted from statistical physics. The analysis is based on monitoring a set of macroscopic variables from which the generalisation error can be calculated. A closed set of dynamical equations for the macroscopic variables is derived analytically and solved numerically. The theoretical framework is then employed for defining optimal learning parameters and for analysing the incorporation of second order information into the learning process using natural gradient descent and matrix-momentum based methods. We will also briefly explain an extension of the original framework for analysing the case where training examples are sampled with repetition.