(1) VANISHING GRADIENT: One of the main drawbacks of the sigmoid activation function is that it saturates at large positive and negative inputs. As the input grows in magnitude, the output approaches 0 or 1 and the derivative approaches zero. Because backpropagation multiplies these derivatives layer by layer, the gradients reaching early layers can become vanishingly small, making deep networks slow or impossible to train with sigmoid activations. This is known as the "vanishing gradient" problem.
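A minimal sketch of the effect, using the standard identity that the sigmoid's derivative is σ(x)(1 − σ(x)) (the function and variable names here are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # derivative of sigmoid: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# the gradient peaks at 0.25 (at x = 0) and collapses once the unit saturates
print(sigmoid_grad(0.0))    # 0.25, the maximum possible value
print(sigmoid_grad(10.0))   # already tiny: about 4.5e-5

# even in the best case, 10 stacked sigmoid layers scale the
# gradient by at most 0.25 per layer
print(0.25 ** 10)           # roughly 9.5e-7
```

Since the derivative can never exceed 0.25, every sigmoid layer shrinks the backpropagated gradient by at least a factor of 4, which is why depth compounds the problem.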
(2) NOT ZERO-CENTERED: Another drawback of the sigmoid activation function is that its output is not zero-centered: it lies strictly in (0, 1), so every activation is positive. When all inputs to a neuron are positive, the gradients of that neuron's weights all share the same sign in a given backward pass, which forces gradient descent into inefficient zigzag updates. Zero-centered alternatives such as tanh avoid this issue.
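A quick numerical illustration of the centering difference, comparing sigmoid and tanh on the same random inputs (the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)  # zero-mean inputs

sig = 1.0 / (1.0 + np.exp(-x))
tnh = np.tanh(x)

print(sig.min(), sig.max())  # all outputs strictly positive, in (0, 1)
print(sig.mean())            # near 0.5, far from zero-centered
print(tnh.mean())            # near 0: tanh preserves the centering
```

Even though the inputs are symmetric around zero, the sigmoid shifts everything into the positive range, while tanh keeps the activations centered.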
(3) COMPUTATIONALLY INEFFICIENT: Finally, the sigmoid activation function is more expensive to compute than alternatives such as ReLU. Evaluating a sigmoid requires an exponential per element, whereas ReLU is just a comparison with zero, max(0, x). This cost difference can matter when training large deep learning networks.
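A rough comparison of the two operations on a large array; exact timings depend on hardware and library versions, so this is only a sketch of the relative cost:

```python
import numpy as np
import timeit

x = np.random.default_rng(1).normal(size=1_000_000)

def sigmoid(v):
    # requires an exponential for every element
    return 1.0 / (1.0 + np.exp(-v))

def relu(v):
    # a single elementwise comparison against zero
    return np.maximum(v, 0.0)

t_sig = timeit.timeit(lambda: sigmoid(x), number=20)
t_relu = timeit.timeit(lambda: relu(x), number=20)
print(f"sigmoid: {t_sig:.4f}s  relu: {t_relu:.4f}s")
```

On typical hardware the ReLU pass runs noticeably faster, and the gap widens in frameworks where the activation is evaluated billions of times per training run.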