Week 1 — Hyperparameter tuning: searching for the best architecture
Overview
This course focuses on tools and techniques to effectively manage modeling resources and best serve batch and real-time inference requests.
 Effective search strategies for the best model that will scale for various serving needs
 Constraining model complexity and hardware requirements
 Optimizing and managing compute, storage, and I/O needs
 We'll be going through TensorFlow Model Analysis (TFMA)
Neural Architecture Search
 Neural architecture search (NAS) is a technique for automating the design of artificial neural networks
 It helps find the optimal architecture
 This is a search over a huge space
 AutoML uses algorithms to automate this search
Types of parameters in ML models
 Trainable parameters
 Learned by the algorithm during training
 e.g., weights of a neural network

Hyperparameters
 Set before launching the learning process
 not updated in each training step
Hyperparameters are of two types:
 Model hyperparameters which influence model selection such as the number and width of hidden layers
 Algorithm hyperparameters which influence the speed and quality of the learning algorithm, such as the learning rate for Stochastic Gradient Descent (SGD) and the number of nearest neighbors for a k-Nearest Neighbors (KNN) classifier.
 e.g., learning rate or the number of units in a dense layer
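The distinction can be made concrete with a minimal sketch in plain Python. It assumes a one-weight linear model trained by gradient descent; the names `LEARNING_RATE`, `NUM_STEPS`, and `train` are illustrative and not from the course material.

```python
# Toy example: fit y = w * x with plain gradient descent.
# LEARNING_RATE and NUM_STEPS are hyperparameters (fixed before training);
# w is a trainable parameter (updated at every training step).

LEARNING_RATE = 0.05   # algorithm hyperparameter (illustrative value)
NUM_STEPS = 200        # algorithm hyperparameter (illustrative value)


def train(xs, ys, learning_rate=LEARNING_RATE, num_steps=NUM_STEPS):
    w = 0.0  # trainable parameter, learned from the data
    n = len(xs)
    for _ in range(num_steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with a true weight of 2
learned_w = train(xs, ys)
print(round(learned_w, 3))
```

Changing `LEARNING_RATE` changes how training behaves, but the training loop itself never updates it; only `w` is learned.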
Manual hyperparameter tuning is not scalable
The process of finding the optimal set of hyperparameters is called hyperparameter tuning or hypertuning.
 Hyperparameters can be numerous even for small models
 e.g., shallow DNN
 Architecture choices
 activation functions
 Weight initialization strategy
 Optimization hyperparameters such as learning rate, stop condition
 Tuning them manually can be a real brain teaser
 Tuning helps boost model performance.
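A rough way to see why manual tuning does not scale is to count the combinations. The sketch below assumes a small, hypothetical search space for a shallow DNN; the specific values are illustrative only.

```python
from itertools import product

# Hypothetical search space for a shallow DNN; the values are illustrative.
search_space = {
    "num_layers": [1, 2, 3],
    "units": [32, 64, 128, 256],
    "activation": ["relu", "tanh", "sigmoid"],
    "initializer": ["glorot_uniform", "he_normal"],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

# Each combination would be one full training run if tried by hand.
combinations = list(product(*search_space.values()))
print(len(combinations))  # 3 * 4 * 3 * 2 * 3 = 216
```

Even with only a handful of options per hyperparameter, the number of runs multiplies quickly.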
Automating hyperparameter tuning with Keras Tuner
 Automation is key.
 Keras Tuner:
 Hyperparameter tuning with TensorFlow 2.0
 Many methods available
Keras Autotuner Demo
 Does the model need more or fewer hidden units to perform well?
 How does model size affect the convergence speed?
 Is there any trade-off between convergence speed, model size and accuracy?
 Search automation is the natural path to take
 Keras Tuner's built-in search functionality will help!
 Keras Tuner has four tuners available with built-in strategies:
RandomSearch
Hyperband
BayesianOptimization
Sklearn
import tensorflow as tf
from tensorflow import keras
import keras_tuner as kt

def model_builder(hp):
    '''
    Builds the model and sets up the hyperparameters to tune.
    Args:
        hp: Keras Tuner HyperParameters object
    Returns:
        model with hyperparameters to tune
    '''
    # The model you set up for hypertuning is called a hypermodel.
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28, 28)))
    # Tune the number of units in the dense layer: 16 to 512 in steps of 16.
    hp_units = hp.Int('units', min_value=16, max_value=512, step=16)
    model.add(keras.layers.Dense(units=hp_units, activation='relu'))
    model.add(keras.layers.Dropout(0.2))
    # Softmax output, so the loss below receives probabilities rather than logits.
    model.add(keras.layers.Dense(10, activation='softmax'))
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    return model

# Keras Tuner supports multiple strategies; the one used here is Hyperband.
tuner = kt.Hyperband(
    model_builder,
    objective='val_accuracy',
    max_epochs=10,
    factor=3,
    directory='my_dir',
    project_name='intro_to_kt')

# Callback configuration: stop a trial when val_loss has not improved for 5 epochs.
stop_early = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=5)

# X_train and y_train are assumed to be loaded already (28x28 images, integer labels).
tuner.search(
    X_train, y_train,
    epochs=50,
    validation_split=0.2,
    callbacks=[stop_early])
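Hyperband is built on successive halving: many configurations are trained for a few epochs, only the best fraction (1/factor) survives, and the survivors get more epochs. The following is a minimal sketch of a single successive-halving bracket, not the Keras Tuner implementation; `toy_score` is a hypothetical stand-in for a real training run.

```python
import random


def toy_score(units, epochs):
    # Stand-in for validation accuracy after `epochs` epochs (an assumption,
    # not a real training run): peaks near 48 units and improves with epochs.
    return (1.0 - abs(units - 48) / 512) * (1.0 - 0.5 / epochs)


def successive_halving(configs, min_epochs=1, max_epochs=9, factor=3):
    epochs = min_epochs
    survivors = list(configs)
    while True:
        ranked = sorted(survivors, key=lambda u: toy_score(u, epochs), reverse=True)
        if len(ranked) == 1 or epochs >= max_epochs:
            return ranked[0]
        # Keep the top 1/factor of the configurations and give them more epochs.
        survivors = ranked[: max(1, len(ranked) // factor)]
        epochs = min(max_epochs, epochs * factor)


rng = random.Random(0)
candidates = [rng.randrange(16, 513, 16) for _ in range(9)]
best_units = successive_halving(candidates)
print(best_units)
```

Most of the compute goes to the few configurations that look promising early, which is why this family of methods is cheaper than training every candidate to completion.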
Search Output
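Then, we can set the number of hidden units to 48 by rebuilding the model from the best hyperparameters found:
# Take the best hyperparameter set found by the search.
best_hps = tuner.get_best_hyperparameters()[0]
# Build a fresh model with those hyperparameters and train it.
h_model = tuner.hypermodel.build(best_hps)
# NUM_EPOCHS is not defined in these notes; set it before running.
h_model.fit(X_train, y_train, epochs=NUM_EPOCHS, validation_split=0.2)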
AutoML — Intro to AutoML (Automated Machine Learning)
It is aimed at developers with very little ML experience, helping them use ML models and techniques by automating the entire workflow end to end.
Neural Architecture Search (❤️ of AutoML)
The process of automating architecture engineering is strictly called NAS.
Three main parts:
 Search space: Defines the range of architectures which can be represented.
 Search strategy: Defines how we explore the search space.
 Performance estimation strategy: Helps measure and compare the performance of various architectures.
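The three parts can be sketched as three small pieces of code. This is a toy illustration under stated assumptions: the search space is made up, the search strategy is plain random sampling, and `estimate_performance` is a hypothetical scoring function that replaces real training and validation.

```python
import random

# 1. Search space: the architectures that can be represented (toy example).
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "units": [16, 32, 64, 128],
}


# 2. Search strategy: how the space is explored (here: random sampling).
def sample_architecture(rng):
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


# 3. Performance estimation: how candidates are scored. This stand-in
#    replaces real training and validation; it is purely illustrative.
def estimate_performance(arch):
    return 1.0 - abs(arch["num_layers"] - 2) * 0.1 - abs(arch["units"] - 64) / 1000


def run_search(num_trials=50, seed=0):
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(num_trials)]
    return max(candidates, key=estimate_performance)


best_arch = run_search()
print(best_arch)
```

Swapping any one of the three pieces (a richer space, a smarter strategy, a cheaper estimator) leaves the other two unchanged, which is why NAS is usually described in these terms.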
Understanding Search Spaces
Types of Search Space:
 Macro
 Micro
Node: A node is a layer in a neural network.
An arrow from layer $\text{L}_i$ to $\text{L}_j$ indicates that layer $\text{L}_j$ receives the output of $\text{L}_i$ as input.
Macro Architecture Search Space
A macro search space contains the individual layers and connection types of a neural network.
 A NAS searches within that space for the best model, building the model layer by layer.
In a chain-structured Neural Network Architecture (NNA), the space is parametrized by:
 The operation every layer can execute
 Hyperparameters associated with the operation
 A number n of sequentially fully-connected layers
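One possible encoding of such a chain-structured space is a list of layer specifications. The sketch below is an assumption about representation, not a NAS library API; `LayerSpec`, `OPERATIONS`, and `sample_chain` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class LayerSpec:
    operation: str         # the operation the layer executes
    hyperparameters: dict  # hyperparameters associated with that operation


# Hypothetical operations and hyperparameter choices, for illustration only.
OPERATIONS = {
    "conv": {"filters": [16, 32, 64], "kernel_size": [1, 3, 5]},
    "pool": {"pool_size": [2, 3]},
    "dense": {"units": [32, 64, 128]},
}


def sample_chain(max_layers, rng):
    """Sample a chain in which layer i feeds only layer i + 1."""
    n = rng.randint(1, max_layers)  # number of sequentially connected layers
    chain = []
    for _ in range(n):
        op = rng.choice(list(OPERATIONS))
        hps = {name: rng.choice(values) for name, values in OPERATIONS[op].items()}
        chain.append(LayerSpec(op, hps))
    return chain


chain = sample_chain(max_layers=5, rng=random.Random(0))
for index, layer in enumerate(chain):
    print(index, layer.operation, layer.hyperparameters)
```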
Micro Architecture Search Space
In a micro search space, NAS builds a neural network from cells, where each cell is a smaller network.
 Cells are stacked to produce the final network.
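A minimal sketch of the cell idea, assuming a cell can be represented as a short list of operation names; `CELL_OPS`, `sample_cell`, and `stack_cells` are hypothetical and only illustrate the structure.

```python
import random

# Hypothetical operations a cell may use; the names are illustrative only.
CELL_OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]


def sample_cell(num_nodes, rng):
    """A cell is a small network: here, a short list of operations."""
    return [rng.choice(CELL_OPS) for _ in range(num_nodes)]


def stack_cells(cell, num_repeats):
    """Build the final network by repeating the same cell."""
    return [list(cell) for _ in range(num_repeats)]


rng = random.Random(0)
cell = sample_cell(num_nodes=3, rng=rng)
network = stack_cells(cell, num_repeats=4)
print(network)
```

The search only has to choose the contents of one cell, so the space is much smaller than choosing every layer of the final network individually.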
Search Strategies
 Find the architecture that produces the best performance
A few search strategies
 Grid Search
 Random Search
 Bayesian Optimization
 Evolutionary Algorithms
 Reinforcement Learning
Grid Search & Random Search
Grid Search
 Exhaustive search approach on fixed grid values
Random Search:
 Select the next options randomly within the search space.
Both are suitable for small search spaces.
Both quickly fail as the search space grows.
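Both strategies fit in a few lines. The sketch below assumes a hypothetical `toy_objective` in place of real training, with made-up grid values and bounds.

```python
import random
from itertools import product


def toy_objective(learning_rate, units):
    # Stand-in for validation accuracy; a real run would train a model here.
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(units - 64) / 1000


def grid_search(grid):
    """Evaluate every point on a fixed grid and return the best one."""
    names = list(grid)
    best = max(product(*grid.values()), key=lambda values: toy_objective(*values))
    return dict(zip(names, best))


def random_search(bounds, num_trials, rng):
    """Evaluate randomly drawn points and return the best one."""
    trials = []
    for _ in range(num_trials):
        learning_rate = rng.uniform(*bounds["learning_rate"])
        units = rng.randint(*bounds["units"])
        trials.append({"learning_rate": learning_rate, "units": units})
    return max(trials, key=lambda trial: toy_objective(**trial))


grid = {"learning_rate": [0.001, 0.01, 0.1], "units": [32, 64, 128]}
bounds = {"learning_rate": (0.001, 0.1), "units": (16, 256)}

grid_best = grid_search(grid)
random_best = random_search(bounds, num_trials=20, rng=random.Random(0))
print(grid_best)
print(random_best)
```

Grid search costs one evaluation per grid point, so its cost is the product of the number of values per hyperparameter; random search has a fixed budget but no guarantee of landing near the optimum.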
Bayesian Optimization
 Assumes that a specific probability distribution underlies the performance.
 Tested architectures constrain the probability distribution and guide the selection of the next option.
 This way, promising architectures can be stochastically determined and tested.
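The idea of observations constraining a distribution, which then guides the next choice stochastically, can be sketched with Thompson sampling over a small discrete set of candidates. This is a simplified stand-in: full Bayesian optimization typically fits a surrogate model such as a Gaussian process and uses an acquisition function. The candidate names, `TRUE_ACCURACY`, the noise level, and the prior are all assumptions made for illustration.

```python
import random

# Hypothetical candidates and their true (hidden) mean accuracies.
TRUE_ACCURACY = {"small": 0.80, "medium": 0.90, "large": 0.85}
NOISE_STD = 0.05                  # assumed noise of a single evaluation
PRIOR_MEAN, PRIOR_VAR = 0.5, 1.0  # assumed Gaussian prior for each candidate


def evaluate(arch, rng):
    # Stand-in for training and validating one architecture.
    return rng.gauss(TRUE_ACCURACY[arch], NOISE_STD)


def posterior(observations):
    """Gaussian posterior over the mean accuracy given noisy observations."""
    precision = 1.0 / PRIOR_VAR + len(observations) / NOISE_STD**2
    mean = (PRIOR_MEAN / PRIOR_VAR + sum(observations) / NOISE_STD**2) / precision
    return mean, 1.0 / precision


def search(num_trials=60, seed=0):
    rng = random.Random(seed)
    history = {arch: [] for arch in TRUE_ACCURACY}
    for _ in range(num_trials):
        # Draw one plausible accuracy per candidate from its posterior and
        # test the candidate whose draw is highest (Thompson sampling).
        draws = {}
        for arch, observations in history.items():
            mean, var = posterior(observations)
            draws[arch] = rng.gauss(mean, var ** 0.5)
        chosen = max(draws, key=draws.get)
        history[chosen].append(evaluate(chosen, rng))
    return history


history = search()
for arch, observations in history.items():
    print(arch, len(observations), round(posterior(observations)[0], 3))
```

Each tested candidate narrows its own posterior, so later draws concentrate on candidates that still look promising.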
Evolutionary Methods
Reinforcement Learning
 The agent's goal is to maximize a reward.
 The available options are selected from the search space.
 The performance estimation strategy determines the reward
Reinforcement Learning for NAS
 A NN can also be specified by a variable length string.
 Hence, an RNN, referred to as the controller, can be used to generate that string.
 After training the model described by the string (called the child network) on real data, we measure its accuracy on the validation set.
 This accuracy then determines the reinforcement learning reward.
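The controller loop can be sketched with the REINFORCE policy-gradient update. To keep the example small, a table of logits (one row per token position) replaces the RNN controller, and `child_accuracy` is a hypothetical stand-in for training and validating the child network; the token choices and `TARGET` are made up.

```python
import math
import random

# Hypothetical token choices for a 3-token architecture string.
CHOICES = [["conv3", "conv5"], ["relu", "tanh"], ["16", "32", "64"]]
TARGET = ["conv3", "relu", "64"]  # pretend this string trains best


def child_accuracy(tokens):
    # Stand-in for training the child network and validating it:
    # the reward is the fraction of tokens that match TARGET.
    return sum(t == g for t, g in zip(tokens, TARGET)) / len(TARGET)


def softmax(logits):
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_controller(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * len(options) for options in CHOICES]  # controller parameters
    baseline = 0.0  # running average reward, reduces gradient variance
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        picks = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
        reward = child_accuracy([CHOICES[i][k] for i, k in enumerate(picks)])
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # REINFORCE: d log p(k) / d logit_j = 1[j == k] - p_j
        for i, k in enumerate(picks):
            for j in range(len(logits[i])):
                grad = (1.0 if j == k else 0.0) - probs[i][j]
                logits[i][j] += lr * advantage * grad
    return logits


logits = train_controller()
best = [
    options[max(range(len(options)), key=row.__getitem__)]
    for options, row in zip(CHOICES, logits)
]
print(best)
```

Strings that earn a reward above the running baseline become more likely to be generated again, which is the sense in which validation accuracy acts as the reinforcement learning reward.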
Neural Architecture Search
If you wish to dive more deeply into neural architecture search, feel free to check out these optional references. You won’t have to read these to complete this week’s practice quizzes.
Measuring AutoML Efficacy
Performance Estimation Strategy
The search strategies in neural architecture search need to estimate the performance of generated architectures, so that they can in turn generate even better performing architectures.
The simplest approach is to measure validation accuracy...
Strategies to Reduce the Cost
 Lower fidelity estimates
 Learning Curve Extrapolation
 Weight Inheritance/ Network Morphisms
Lower fidelity estimates
Lower fidelity or precision estimates try to reduce training time by reframing the problem to make it easier to solve.
This is done by:
 Training on a subset of the data
 Using lower resolution images
 Using fewer filters per layer and fewer cells
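The first two options can be sketched in plain Python. `subsample` and `downscale` are hypothetical helpers, and the dataset and image are placeholders rather than real data.

```python
import random


def subsample(dataset, fraction, rng):
    """Return a random subset of the data for a cheaper training run."""
    k = max(1, int(len(dataset) * fraction))
    return rng.sample(dataset, k)


def downscale(image, factor):
    """Lower the resolution by keeping every `factor`-th pixel along each axis."""
    return [row[::factor] for row in image[::factor]]


rng = random.Random(0)
dataset = list(range(1000))           # placeholder for 1000 training examples
small = subsample(dataset, 0.1, rng)  # 10% of the data
image = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8x8 image
low_res = downscale(image, 2)         # 4x4 image
print(len(small), len(low_res), len(low_res[0]))
```

The estimate obtained this way is cheaper but less precise, so it is best used to rank candidates rather than to report final accuracy.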
Learning Curve Extrapolation
Based on the assumption that you have mechanisms to predict the learning curve reliably.
 Extrapolates based on initial learning
 Removes poor performers
 Progressive NAS uses a similar approach: it trains a surrogate model and uses it to predict performance from architectural properties.
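A minimal sketch of extrapolate-then-prune, assuming the learning curve follows acc(t) = a - b / t. That curve shape is an assumption chosen for simplicity, and the candidate names and accuracy values are made up.

```python
def fit_curve(epochs, accuracies):
    """Least-squares fit of acc(t) = a - b / t (an assumed curve shape)."""
    xs = [1.0 / t for t in epochs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    slope = sxy / sxx
    a = mean_y - slope * mean_x
    return a, -slope  # acc(t) = a - b / t with b = -slope


def predict(params, epoch):
    a, b = params
    return a - b / epoch


def prune(partial_curves, final_epoch, keep):
    """Keep the `keep` candidates with the best predicted final accuracy."""
    predicted = {
        name: predict(fit_curve(range(1, len(curve) + 1), curve), final_epoch)
        for name, curve in partial_curves.items()
    }
    return sorted(predicted, key=predicted.get, reverse=True)[:keep]


# Hypothetical validation accuracies after the first 3 epochs.
partial_curves = {
    "arch_a": [0.50, 0.65, 0.70],
    "arch_b": [0.60, 0.65, 0.6667],
    "arch_c": [0.30, 0.40, 0.4333],
}
survivors = prune(partial_curves, final_epoch=50, keep=2)
print(survivors)
```

Here arch_b leads after the first epoch, but the extrapolation ranks arch_a higher at epoch 50, which is the kind of decision that raw early accuracy alone would get wrong.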
Weight Inheritance/Network Morphisms
 Initialize weights of new architectures based on previously trained architectures.
 Similar to transfer learning
 Uses Network Morphism
 Modifies the architecture without changing the underlying function.
 New network inherits knowledge from parent network
 Computational speed up: only a few days of GPU usage
 Network size not inherently bounded
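One function-preserving modification is to insert an identity-initialized layer after a ReLU layer. The sketch below uses plain Python lists with illustrative weights; `deepen` is a hypothetical helper, not a library function.

```python
def relu(vec):
    return [max(0.0, v) for v in vec]


def dense(weights, bias, vec):
    """Compute W @ vec + bias for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def forward(layers, vec):
    """Apply dense + ReLU for every hidden layer, then a linear output layer."""
    for weights, bias in layers[:-1]:
        vec = relu(dense(weights, bias, vec))
    weights, bias = layers[-1]
    return dense(weights, bias, vec)


def deepen(layers, position):
    """Insert an identity-initialized hidden layer at index `position`.

    Requires 1 <= position <= len(layers) - 1, so the new layer follows a
    ReLU: its input is non-negative and relu(I @ x + 0) == x, which means
    the child network computes the same function as the parent.
    """
    width = len(layers[position - 1][0])  # output width of the previous layer
    identity = [[1.0 if i == j else 0.0 for j in range(width)] for i in range(width)]
    new_layer = (identity, [0.0] * width)
    return layers[:position] + [new_layer] + layers[position:]


# Parent network: 2 inputs -> 3 hidden units -> 1 output (illustrative weights).
parent = [
    ([[0.5, -1.0], [1.5, 2.0], [-0.3, 0.8]], [0.1, -0.2, 0.0]),
    ([[1.0, -2.0, 0.5]], [0.3]),
]
child = deepen(parent, position=1)

x = [0.7, -1.2]
print(forward(parent, x), forward(child, x))
```

The child starts with the parent's behavior and only then continues training, so it does not have to relearn from scratch.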
AutoML on the Cloud
Amazon SageMaker Autopilot
Automatically trains and tunes the model for classification or regression based on your data.
Microsoft Azure AutoML
Google Cloud AutoML
How do these Cloud offerings perform AutoML?
 We don't know (or can't say) and they're not about to tell us.
 The underlying algorithms will be similar to what we've learned.
 The algorithms will evolve with the state of the art
AutoML
If you wish to dive more deeply into AutoML, feel free to check out these cloud-based tools. You won’t have to read these to complete this week’s practice quizzes.