Deep Learning: Activation Functions — 10 Tricky Questions
1. What are some common activation functions used in deep learning, and how do they differ from each other?
Answer: Some common activation functions used in deep learning include the sigmoid, hyperbolic tangent (tanh), rectified linear unit (ReLU), and softmax functions (a minimal sketch of each appears after this list).
- The sigmoid and tanh functions are both smooth and saturating: sigmoid squashes its input into (0, 1) and tanh into (-1, 1), so their outputs flatten out as the input gets very large or very small.
- The ReLU function is non-saturating and has a simple derivative, making it computationally efficient.
- The softmax function is used for multiclass classification problems and converts a vector of inputs into a probability distribution.
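For reference, here is a minimal sketch of these four functions in plain NumPy (rather than a deep learning framework), purely for illustration:

```python
import numpy as np

def sigmoid(x):
    # Smooth, saturating; squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Smooth, saturating; squashes inputs into (-1, 1)
    return np.tanh(x)

def relu(x):
    # Non-saturating for positive inputs; zero for negative inputs
    return np.maximum(0.0, x)

def softmax(x):
    # Converts a vector of scores into a probability distribution
    shifted = x - np.max(x)      # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("sigmoid:", sigmoid(x))
print("tanh:   ", tanh(x))
print("relu:   ", relu(x))
print("softmax:", softmax(x))    # sums to 1
```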
2. ReLU is non-saturating. What does that mean, and how is it useful?
Answer: ReLU (Rectified Linear Unit) is considered “non-saturating” because it does not saturate for large positive inputs. As the input grows, the output remains equal to the input (f(x) = x for x > 0), rather than flattening out at a maximum value.
The non-saturating property of ReLU helps to prevent the vanishing gradient problem, which can occur when gradients become too small (as with tanh and sigmoid at large-magnitude inputs) and lead to very slow or stalled learning in the early layers of a deep network.
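A quick way to see this numerically (a small NumPy sketch, not tied to any particular framework): the sigmoid’s gradient shrinks toward zero as the input grows, while ReLU’s gradient stays at 1 for any positive input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: sigma(x) * (1 - sigma(x)); approaches 0 as |x| grows
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return (x > 0).astype(float)

xs = np.array([0.0, 2.0, 5.0, 10.0])
for x, sg, rg in zip(xs, sigmoid_grad(xs), relu_grad(xs)):
    print(f"x = {x:5.1f}  sigmoid grad = {sg:.6f}  relu grad = {rg:.1f}")
```

Running this shows the sigmoid gradient dropping from 0.25 at x = 0 to roughly 0.000045 at x = 10, while the ReLU gradient stays at 1.0 — which is why gradients propagated back through many ReLU layers are less prone to vanishing.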