To understand how my chosen strategy for finding new weights, stochastic gradient descent (SGD), actually operates, I did a bit of analysis.
The graph below shows the flow of a single input through a trivial, “linear” neural network with only one neuron in each of its three layers (input, hidden, and output).
The expected output value in this case is 1, illustrated by the dashed purple line.
The cyan line shows how the output of the network fairly quickly converges towards the expected value.
The other lines show how the remaining components of the network (the hidden-layer activation, the hidden weight, the output weight, and the error) change over the training iterations.
After only 100 iterations, the output is very close to the expected value (1), and the error has shrunk correspondingly to almost 0.
The hidden weight (hw) decreases moderately, while the output weight (ow) increases substantially.
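The setup above can be sketched in a few lines of code. This is not the original experiment, just a minimal illustration of a 1-1-1 linear network trained with SGD on a squared loss; the input value, target, learning rate, and initial weights are all assumptions chosen for demonstration.

```python
x = 1.0        # the single input value (assumed)
target = 1.0   # the expected output, the dashed line in the graph
lr = 0.1       # learning rate (assumed)

hw = 0.5       # hidden weight, initial value assumed
ow = 0.5       # output weight, initial value assumed

for i in range(100):
    # Forward pass: the network is purely linear, so the
    # hidden activation and output are just multiplications.
    hidden = hw * x
    output = ow * hidden
    error = target - output          # the error curve in the graph

    # Backward pass for the squared loss L = 0.5 * error**2:
    # dL/d(ow) = -error * hidden,  dL/d(hw) = -error * ow * x
    grad_ow = error * hidden
    grad_hw = error * ow * x

    # SGD update: step both weights against their gradients.
    ow += lr * grad_ow
    hw += lr * grad_hw

print(f"output after 100 iterations: {output:.4f}")
```

With this symmetric initialization both weights grow together; with other initial values the two trajectories can split, as in the graph above where hw decreases moderately while ow increases substantially. Either way, the output converges toward the target and the error shrinks toward 0.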