Deep Learning: Guidelines for model optimization and tuning

a rough outline

Rahul S


A neural network model is represented by a set of parameters and hyperparameters. The parameters are the weights and biases of all nodes. The hyperparameters are the design choices that shape the network and its training: the number of layers, the number of nodes in each layer, activation functions, cost functions, learning rate, and optimizers.
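To make the distinction concrete, here is a minimal sketch that assumes a plain fully connected network. The `Hyperparameters` class and `count_parameters` helper are illustrative names, not part of any library, and the layer sizes are arbitrary. It shows that the hyperparameters are chosen up front, while the number of parameters to be learned follows from them.

```python
from dataclasses import dataclass


@dataclass
class Hyperparameters:
    # Illustrative values only; none of these come from the article.
    layer_sizes: tuple = (4, 8, 3)  # input, hidden, and output node counts
    activation: str = "relu"
    learning_rate: float = 0.01
    optimizer: str = "sgd"


def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    Each pair of adjacent layers contributes n_in * n_out weights
    plus n_out biases.
    """
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


hp = Hyperparameters()
n_params = count_parameters(hp.layer_sizes)
print(f"{n_params} trainable parameters for layer sizes {hp.layer_sizes}")
```

Changing a hyperparameter such as `layer_sizes` changes how many weights and biases training has to fit.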

Training a neural network model means finding values for these parameters and hyperparameters that maximize the accuracy of predictions for the use case.

Usually, we start with a network architecture chosen by intuition. Weights and biases are initialized to random values or with a statistical initialization method. We then repeatedly apply the weights and biases to the inputs and compute the error of the resulting predictions. Based on that error, the weights and biases are adjusted so that the error shrinks. This adjustment is iterative and continues until the desired level of performance is reached.
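The loop described above can be sketched in a few lines. This is a deliberately tiny illustration, assuming a single linear node trained with gradient descent on mean squared error; a real network has many layers and uses backpropagation to compute the gradients. The function name `train`, the toy data, and the default values are assumptions made for this example.

```python
import random


def train(xs, ys, learning_rate=0.1, target_loss=1e-6, max_epochs=10_000, seed=0):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    rng = random.Random(seed)
    # Start from random values for the weight and the bias.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    n = len(xs)
    loss = float("inf")
    for _ in range(max_epochs):
        # Apply the current weight and bias to the inputs.
        preds = [w * x + b for x in xs]
        # Compute the error of the predictions.
        errors = [p - y for p, y in zip(preds, ys)]
        loss = sum(e * e for e in errors) / n
        # Stop once the desired level of performance is reached.
        if loss <= target_loss:
            break
        # Adjust weight and bias in the direction that reduces the error.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, loss


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b, loss = train(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, loss={loss:.2e}")
```

On this data the learned values should land close to the true weight 2 and bias 1. The learning rate here is a hyperparameter: too small and the loop needs many more iterations, too large and the updates overshoot and the error grows.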

We also fine-tune the network hyperparameters to improve training speed and reduce the number of iterations. Finally, we save the model, as represented by its parameters and hyperparameters, and use it for predictions.
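As a rough illustration of that last step, the sketch below stores a model's parameters and hyperparameters together in a JSON file and reloads them for prediction. It assumes the same single-node linear model used for illustration (one weight `w`, one bias `b`); the helper names and the file layout are hypothetical, and deep learning frameworks provide their own save formats.

```python
import json
import os
import tempfile


def save_model(path, hyperparameters, parameters):
    """Write hyperparameters and learned parameters to one JSON file."""
    with open(path, "w") as f:
        json.dump({"hyperparameters": hyperparameters, "parameters": parameters}, f)


def load_model(path):
    """Read the saved model back as a plain dictionary."""
    with open(path) as f:
        return json.load(f)


def predict(model, x):
    """Apply the stored weight and bias to a new input."""
    p = model["parameters"]
    return p["w"] * x + p["b"]


# Illustrative values standing in for the result of a training run.
model_path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(model_path, {"learning_rate": 0.1, "max_epochs": 200}, {"w": 2.0, "b": 1.0})

restored = load_model(model_path)
prediction = predict(restored, 3.0)
print(prediction)
```

Keeping the hyperparameters next to the parameters records how the saved weights were produced, which helps when a model has to be retrained or compared with a later version.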

We need to tune our models (i.e., the network hyperparameters) for both efficiency and effectiveness. Optimization can target both training and inference goals. Typical goals include:


- better accuracy.
- lower inference costs.
- smaller model sizes so they can be effectively stored on disk and loaded into memory.
- shorter inference time.
- lower resource utilization.
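Two of the goals above, model size and inference time, are straightforward to quantify. The sketch below is a rough estimate under stated assumptions: model size is approximated as parameter count times bytes per value (ignoring file-format overhead), and latency is averaged over repeated calls to a prediction function. The helper names and the stand-in `predict_fn` are hypothetical.

```python
import time

# Storage cost per value for two common floating-point formats.
BYTES_PER_VALUE = {"float32": 4, "float16": 2}


def model_size_bytes(n_params, dtype="float32"):
    """Approximate size of the stored weights and biases."""
    return n_params * BYTES_PER_VALUE[dtype]


def mean_latency_seconds(predict_fn, sample, repeats=1000):
    """Average wall-clock time of one call to predict_fn."""
    start = time.perf_counter()
    for _ in range(repeats):
        predict_fn(sample)
    return (time.perf_counter() - start) / repeats


size = model_size_bytes(1_000_000)
latency = mean_latency_seconds(lambda x: 2.0 * x + 1.0, 3.0)
print(f"about {size / 1e6:.1f} MB, {latency * 1e6:.2f} microseconds per prediction")
```

Measuring these numbers before and after a tuning change gives a concrete way to check whether the change actually moved the model toward its goals.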

The requirements for better accuracy and lower costs conflict with each other: better accuracy usually means higher costs. We therefore need to strike a balance that delivers the desired outcomes at an affordable cost.