Miloslav Ciz 2022-04-09 20:51:52 +02:00
parent 2a3b06eb67
commit 3abdc93103
17 changed files with 160 additions and 24 deletions

@@ -80,6 +80,6 @@ And we get:
 And so on until we get all the derivatives.
-Once we have them, we multiply them all by some value (**learning rate**, a distance by which we move in the computed direction) and substract them from the current weights by which we perform the gradient descent and lower the total error.
+Once we have them, we multiply them all by some value (**learning rate**, a distance by which we move in the computed direction) and subtract them from the current weights by which we perform the gradient descent and lower the total error.
 Note that here we've only used one training sample, i.e. the error *E* was computed from the network against a single desired output. If more examples are used in a single update step, they are usually somehow averaged.
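
As a side note on what the changed line describes, here is a minimal C sketch of one such gradient descent update step, assuming the derivatives of the error *E* with respect to each weight have already been computed by backpropagation. The function name `updateWeights`, the array sizes and all the numbers are made up for illustration and are not taken from the wiki or from this commit: gradients from the training samples are averaged, scaled by the learning rate and subtracted from the weights.

```
#include <stdio.h>

#define N_WEIGHTS 3
#define N_SAMPLES 2

/* One gradient descent update step: each weight has the learning rate
   times the (averaged) derivative of the error subtracted from it. */
void updateWeights(double weights[N_WEIGHTS],
                   double gradients[N_SAMPLES][N_WEIGHTS],
                   double learningRate)
{
  for (int i = 0; i < N_WEIGHTS; ++i)
  {
    double avgGradient = 0;

    for (int j = 0; j < N_SAMPLES; ++j) /* average over training samples */
      avgGradient += gradients[j][i];

    avgGradient /= N_SAMPLES;

    weights[i] -= learningRate * avgGradient; /* move against the gradient */
  }
}

int main(void)
{
  double weights[N_WEIGHTS] = { 0.5, -0.2, 0.1 };

  /* made-up derivatives of the error E for two training samples */
  double gradients[N_SAMPLES][N_WEIGHTS] =
  {
    { 0.4, -0.1, 0.3 },
    { 0.2,  0.1, 0.1 }
  };

  updateWeights(weights, gradients, 0.1);

  for (int i = 0; i < N_WEIGHTS; ++i)
    printf("%f ", weights[i]);

  putchar('\n');

  return 0;
}
```

A bigger learning rate moves further along the computed direction in each step; averaging over more samples only smooths that direction, the basic update stays the same.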