Automate Hyperparameter Tuning in Amazon SageMaker Notebooks

Omkar Gawade
4 min read · Dec 26, 2022
Image by Alex Baydoun on Unsplash.com

Although machine learning may seem a tempting option for making informed decisions or automating a manual process, it is not something that runs without supervision. Machine learning models need to be recalibrated from time to time when model metrics fall below a threshold; that may be because of deteriorating data quality, shifting data distributions, or drastic changes in the training dataset. One of the most popular ways to make sure a model keeps performing within our metric thresholds is to run hyperparameter tuning from time to time. A good analogy for hyperparameter tuning is tuning an instrument: tuning a guitar today doesn't mean it's going to be in tune in a month. It has to be re-tuned to make sure it sounds great!

Types of Hyperparameter tuning:

  1. GridSearchCV: tests every combination of the specified hyperparameters and returns the best-performing model.
  2. RandomizedSearchCV: randomly samples combinations of the specified hyperparameters and returns the best-performing model.
  3. Bayesian optimization: learns from previous hyperparameter combination runs to move subsequent runs toward better performance (e.g., minimizing RMSE in a regression problem or maximizing AUC in a binary classification problem).
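To see the difference between the first two strategies, here is a minimal, SageMaker-free sketch: grid search scores every combination while random search scores only a sampled subset. The toy `score` function is a hypothetical stand-in for a real model metric.

```python
# Illustrative sketch (not SageMaker-specific): grid search vs. random search
# over a toy objective. In practice you would use
# sklearn.model_selection.GridSearchCV / RandomizedSearchCV.
import itertools
import random

def score(eta, max_depth):
    # Toy objective standing in for a model metric (higher is better);
    # peaks at eta=0.3, max_depth=5.
    return -(eta - 0.3) ** 2 - (max_depth - 5) ** 2 / 100

grid = {"eta": [0.1, 0.2, 0.3, 0.4], "max_depth": [3, 5, 7, 9]}

# Grid search: score all 16 combinations.
grid_best = max(
    (dict(zip(grid, combo)) for combo in itertools.product(*grid.values())),
    key=lambda p: score(**p),
)

# Random search: score only 8 randomly sampled combinations.
random.seed(0)
random_best = max(
    ({"eta": random.choice(grid["eta"]),
      "max_depth": random.choice(grid["max_depth"])} for _ in range(8)),
    key=lambda p: score(**p),
)

print(grid_best)  # → {'eta': 0.3, 'max_depth': 5}
```

Bayesian optimization goes a step further: instead of sampling blindly, it uses the scores of earlier runs to pick promising combinations next, which is the strategy we will use below.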

In this article, we will explore how to automate a Bayesian tuning job via Amazon SageMaker notebooks:

To run hyperparameter tuning in a notebook we need train.csv, validation.csv, and an output file path in your S3 bucket. We will be using the XGBoost algorithm image for our hyperparameter tuning, so we need to make sure the files are formatted in accordance with AWS instructions:

Image by Author

Below is how we specify the paths for our training and validation data, and the output location where model artifacts are saved.

Image by Author
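A sketch of that path setup might look like the following; the bucket name and prefix are hypothetical placeholders for your own.

```python
# Hypothetical bucket name and prefix -- replace with your own.
bucket = "my-sagemaker-bucket"
prefix = "xgboost-tuning"

# S3 locations for the training file, validation file, and model artifacts.
s3_input_train = f"s3://{bucket}/{prefix}/train/train.csv"
s3_input_validation = f"s3://{bucket}/{prefix}/validation/validation.csv"
s3_output_path = f"s3://{bucket}/{prefix}/output"
```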

We will select a handful of hyperparameters to tune:

  1. eta: learning rate for the XGBoost classifier.
  2. alpha: L1 regularization term on weights.
  3. gamma: minimum loss reduction required to make a further partition on a leaf node of the tree.
  4. lambda: L2 regularization term on weights.
  5. colsample_bylevel: subsample ratio of columns for each level.
  6. colsample_bytree: subsample ratio of columns when constructing each tree.
  7. max_depth: maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit.
  8. num_round: the number of rounds to run the training.

Please follow the XGBoost documentation on AWS to select and understand all hyperparameters.

Below is how you set up tuning_job_config:

Image by Author

In the above image we can see that we selected “Bayesian” as the tuning strategy and “validation:auc” as the metric to focus on; since we want to maximize AUC, we specify “Maximize” as the objective type.

We also have to specify the number of training jobs to launch and how many of them may run in parallel.

“MaxNumberOfTrainingJobs”: 20
“MaxParallelTrainingJobs”: 4
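Putting it all together, `tuning_job_config` might look like the sketch below. The field names follow the SageMaker `HyperParameterTuningJobConfig` API shape; the min/max ranges are illustrative assumptions, not recommendations.

```python
# Sketch of the tuning job configuration (boto3 / SageMaker API shape).
# Parameter ranges below are illustrative -- tune them for your own data.
tuning_job_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",
        "MetricName": "validation:auc",
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,
        "MaxParallelTrainingJobs": 4,
    },
    "ParameterRanges": {
        # SageMaker expects range bounds as strings.
        "ContinuousParameterRanges": [
            {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.5"},
            {"Name": "alpha", "MinValue": "0", "MaxValue": "2"},
            {"Name": "gamma", "MinValue": "0", "MaxValue": "5"},
            {"Name": "lambda", "MinValue": "0", "MaxValue": "2"},
            {"Name": "colsample_bylevel", "MinValue": "0.1", "MaxValue": "1"},
            {"Name": "colsample_bytree", "MinValue": "0.5", "MaxValue": "1"},
        ],
        "IntegerParameterRanges": [
            {"Name": "max_depth", "MinValue": "3", "MaxValue": "10"},
            {"Name": "num_round", "MinValue": "50", "MaxValue": "500"},
        ],
    },
}
```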

Get the model image from Amazon: provide the training path, validation path, instance type, objective, and MaxRuntimeInSeconds.

Image by Author
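A sketch of that training job definition is below, again following the boto3 SageMaker API shape. The image URI can be looked up with `sagemaker.image_uris.retrieve("xgboost", region, version=...)`; here it is left as a placeholder string, and the role ARN, paths, and instance type are hypothetical.

```python
# Sketch of the training job definition passed to the tuner.
# "<xgboost-image-uri>" stands in for the result of
# sagemaker.image_uris.retrieve("xgboost", region, version=...).
training_job_definition = {
    "AlgorithmSpecification": {
        "TrainingImage": "<xgboost-image-uri>",
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "ContentType": "text/csv",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-sagemaker-bucket/xgboost-tuning/train/",
                "S3DataDistributionType": "FullyReplicated",
            }},
        },
        {
            "ChannelName": "validation",
            "ContentType": "text/csv",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-sagemaker-bucket/xgboost-tuning/validation/",
                "S3DataDistributionType": "FullyReplicated",
            }},
        },
    ],
    "OutputDataConfig": {
        "S3OutputPath": "s3://my-sagemaker-bucket/xgboost-tuning/output",
    },
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 10,
    },
    "StaticHyperParameters": {"objective": "binary:logistic"},
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}
```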

After specifying all the details for the training image, we can go ahead and start the hyperparameter tuning job:

Image by Author
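Launching the job comes down to a single boto3 call; the sketch below builds the request (with abbreviated stand-ins for the config dicts from the earlier steps, and a hypothetical job name) and shows the call commented out so the snippet runs anywhere.

```python
# Abbreviated stand-ins for the full dicts built in the earlier steps.
tuning_job_config = {"Strategy": "Bayesian"}
training_job_definition = {"AlgorithmSpecification": {"TrainingInputMode": "File"}}

request = {
    "HyperParameterTuningJobName": "xgb-bayesian-tuning",  # hypothetical name
    "HyperParameterTuningJobConfig": tuning_job_config,
    "TrainingJobDefinition": training_job_definition,
}

# In the notebook, the actual launch is:
# import boto3
# boto3.client("sagemaker").create_hyper_parameter_tuning_job(**request)
print(request["HyperParameterTuningJobName"])
```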

Once we run the above code and the hyperparameter tuning job starts, we can track its progress in the Amazon SageMaker console under Training > Hyperparameter tuning jobs.

Image by Author

Once the hyperparameter tuning job completes, we can check the hyperparameters that achieved the maximum AUC:

Image by Author
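Programmatically, the winning hyperparameters come back in the `BestTrainingJob` field of the describe call. The helper below extracts them; the `describe_hyper_parameter_tuning_job` call is shown commented, and the mocked response is a hypothetical example of the response shape.

```python
def best_hyperparameters(describe_response):
    """Extract the best training job's tuned hyperparameters and objective value."""
    best = describe_response["BestTrainingJob"]
    return (
        best["TunedHyperParameters"],
        best["FinalHyperParameterTuningJobObjectiveMetric"]["Value"],
    )

# In the notebook:
# import boto3
# desc = boto3.client("sagemaker").describe_hyper_parameter_tuning_job(
#     HyperParameterTuningJobName="xgb-bayesian-tuning")
# params, auc = best_hyperparameters(desc)

# Mocked response illustrating the shape (values are made up):
mock = {"BestTrainingJob": {
    "TunedHyperParameters": {"eta": "0.21", "max_depth": "6"},
    "FinalHyperParameterTuningJobObjectiveMetric": {
        "Name": "validation:auc", "Value": 0.91,
    },
}}
params, auc = best_hyperparameters(mock)
print(params, auc)
```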

The script we have written above can be automated to run at any frequency (weekly, monthly, bi-monthly) using AWS Lambda functions.
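One way to sketch that automation, assuming a scheduled trigger such as an EventBridge cron rule: a Lambda handler relaunches the tuning job with a unique name on each invocation. The config dicts are abbreviated stand-ins for the full ones built earlier, and the boto3 call is commented so the sketch runs anywhere.

```python
import time

# Abbreviated stand-ins for the full dicts built in the earlier steps.
tuning_job_config = {"Strategy": "Bayesian"}
training_job_definition = {"AlgorithmSpecification": {"TrainingInputMode": "File"}}

def lambda_handler(event, context):
    # Unique job name per scheduled run (tuning job names must be unique).
    job_name = f"xgb-bayesian-tuning-{int(time.time())}"
    # import boto3
    # boto3.client("sagemaker").create_hyper_parameter_tuning_job(
    #     HyperParameterTuningJobName=job_name,
    #     HyperParameterTuningJobConfig=tuning_job_config,
    #     TrainingJobDefinition=training_job_definition,
    # )
    return {"started": job_name}

print(lambda_handler({}, None))
```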

Let’s get coding, y’all!

Follow my Medium blog and add me on LinkedIn; I would love to connect with y’all.

LinkedIn: Omkar-gawade


Omkar Gawade

Sr. Data Scientist @ Sourceday | Data professional with 5+ years of experience in analytics, visualization, and modeling.