# Baby introduction to Bayesian Hyperparameter Optimization for Machine Learning

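Hyperparameters, in contrast to model parameters, are set by the ML engineer before training. Hyperparameter optimization is the search for the hyperparameters of an algorithm that give the best performance. In equation form:

$$x^* = \arg\min_{x \in \mathcal{X}} f(x)$$

Here $f(x)$ is the objective function (for example, the validation error) evaluated at hyperparameters $x$, $\mathcal{X}$ is the search space, and $x^*$ is the setting that gives the best score.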

Evaluating the objective function is extremely expensive, especially when K-fold cross-validation is an intermediate step, because every evaluation then means training the model K times. Methods like grid search and random search are inefficient because they are completely *uninformed* by past evaluations and spend a significant amount of time evaluating “bad” hyperparameters. Bayesian optimization instead uses the results of past evaluations to choose which hyperparameters to try next, and thus reduces the compute time.
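To make "uninformed" concrete, here is a minimal random search sketch. The `objective` function is a hypothetical, cheap stand-in for an expensive K-fold validation loss, and the search range and trial count are arbitrary choices for illustration.

```python
import random

def objective(x):
    """Hypothetical stand-in for an expensive validation loss (lower is better)."""
    return (x - 2.0) ** 2 + 1.0

def random_search(n_trials, low, high, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        # Each candidate is drawn without ever looking at `history`.
        x = rng.uniform(low, high)
        history.append((x, objective(x)))
    best = min(history, key=lambda pair: pair[1])
    return best, history

best, history = random_search(n_trials=20, low=-5.0, high=5.0)
print("best x: %.3f, best loss: %.3f" % best)
```

Every draw ignores what earlier trials found, so trial 20 is no better informed than trial 1.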

Bayesian optimization keeps track of past evaluation results and uses them to form a probabilistic model mapping hyperparameters to a probability of a score on the objective function. This probabilistic model is called a “surrogate” for the objective function and is represented as p(y | x), where x is a set of hyperparameters and y is the score.
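One common choice of surrogate, assumed here purely for illustration since the text above does not commit to one, is a Gaussian process. Given the past evaluations $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$, it models the score at a new hyperparameter setting $x$ as a normal distribution:

$$p(y \mid x, \mathcal{D}) = \mathcal{N}\big(y;\ \mu_n(x),\ \sigma_n^2(x)\big)$$

Here $\mu_n(x)$ is the predicted score and $\sigma_n(x)$ is the uncertainty of that prediction after $n$ evaluations. Other surrogates, such as tree-structured Parzen estimators or random forests, fill the same role.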

This surrogate is much easier to optimize than the objective function. So, in Bayesian optimization, we:

1) Build a surrogate probability model of the objective function

2) Evaluate, on the true objective function, the hyperparameters that perform best on the surrogate
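The two steps above can be sketched as a loop in plain Python. Everything specific here is an assumption made for illustration: the toy `objective` (a cheap stand-in for a real validation loss, lower is better), the Gaussian process surrogate with a squared-exponential kernel, and the lower confidence bound used to decide what "best on the surrogate" means. Real libraries make these choices differently.

```python
import math

def objective(x):
    """Hypothetical stand-in for an expensive validation loss (lower is better)."""
    return (x - 2.0) ** 2 + 1.0

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalar hyperparameter values."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def surrogate(xs, ys, x_new, noise=1e-4):
    """Step 1: GP posterior mean and std of the score at x_new, given past results."""
    n = len(ys)
    mean = sum(ys) / n
    scale = math.sqrt(sum((y - mean) ** 2 for y in ys) / n) or 1.0
    z = [(y - mean) / scale for y in ys]  # standardize the observed scores
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, z)
    w = solve(K, k_star)
    mu = sum(k * a for k, a in zip(k_star, alpha))
    var = max(1.0 - sum(k * v for k, v in zip(k_star, w)), 0.0)
    return mean + scale * mu, scale * math.sqrt(var)

candidates = [-5.0 + i / 10 for i in range(101)]  # discretized search space
tried = [10, 50, 90]                              # indices of three initial evaluations
xs = [candidates[i] for i in tried]
ys = [objective(x) for x in xs]

def lower_confidence_bound(x, kappa=2.0):
    mu, sigma = surrogate(xs, ys, x)
    return mu - kappa * sigma  # optimistic guess of how low the loss could be

for _ in range(10):
    # Step 2: choose the untried candidate that looks best on the surrogate...
    nxt = min((i for i in range(len(candidates)) if i not in tried),
              key=lambda i: lower_confidence_bound(candidates[i]))
    # ...then pay for one real evaluation and add it to the history.
    tried.append(nxt)
    xs.append(candidates[nxt])
    ys.append(objective(candidates[nxt]))

best_y, best_x = min(zip(ys, xs))
print("best x: %.1f, best loss: %.2f" % (best_x, best_y))
```

Unlike random search, each new candidate depends on every result collected so far, because the surrogate is rebuilt from `xs` and `ys` on each pass through the loop.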