Model Monitoring and Logging

Why Monitoring Matters

Monitoring covers not only the ML models themselves but also the systems and infrastructure that make up the whole product or service, such as databases and web servers.

"An ounce of prevention is worth a pound of cure" — Benjamin Franklin

Immediate Data Skews
- Training data is too old, not representative of live data
Model Staleness
- Environment shifts
- Consumer behaviour
- Adversarial scenarios
Negative Feedback Loops: these arise when you train your models on data collected in production. If that data is biased or corrupted in any way, the model trained on it will also perform poorly.

Why is ML monitoring different?

Unlike a pure software system, an ML system has two additional components to consider: the data and the model. The accuracy of an ML system depends on how well the model reflects the world it is meant to model, which in turn depends on the data used for training and on the data it receives while serving requests.

Code and config also take on additional complexity and sensitivity in an ML system due to two aspects:

Entanglement: Refers to the issue where changing anything changes everything
Configuration: Model hyperparameters, versions, and features are often controlled in a system config, and the slightest error here can cause radically different model behavior that won't be picked up by traditional software tests.

Observability in ML

Observability measures how well the internal states of a system can be inferred by knowing the inputs and outputs.

For ML, that means monitoring and analyzing the prediction requests and the generated predictions from your models.

Observability comes from control system theory where observability and controllability are closely linked.

In other words, controlling the accuracy of results overall, usually across different versions of the model, requires observability.

Complexity of observing modern systems

Modern systems can make observability difficult
- Cloud-based systems
- Containerized infrastructure
- Distributed systems
- Microservices

Deep observability for ML

Not only top-level metrics
- Data slices provide a way to analyze different groups of people or different types of conditions
Domain knowledge is important for observability
TensorFlow Model Analysis (TFMA)
Both supervised and unsupervised analysis

Goals of ML observability

The main goal here in the context of observability is to prevent or act upon system failures.

Observations need to provide alerts when a failure happens and, ideally, recommend actions to bring the system back to normal behavior.

Alertable
- Metrics and thresholds designed to make failures obvious
Actionable
- Root cause clearly identified

Monitoring Targets in ML

Basics: Input and output monitoring

Model input distribution

Measure high level statistics on slices of data relevant to domain
Model prediction distribution
Model versions
Input/prediction correlation

Collecting all the context information is often not practical, as the amount of data to process and store could be very large, so it's important to identify the most relevant context and gather that information.
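As a rough illustration of measuring high-level statistics on slices, the sketch below groups logged feature values by a slice key and summarizes each group. The record layout, the field names (`region`, `age`), and the `slice_stats` helper are hypothetical and not part of any particular monitoring tool.

```python
from collections import defaultdict
from statistics import mean, pstdev

def slice_stats(records, slice_key, feature):
    """Return count, mean, and population std dev of `feature` per slice."""
    groups = defaultdict(list)
    for record in records:
        groups[record[slice_key]].append(record[feature])
    return {
        name: {"count": len(values), "mean": mean(values), "std": pstdev(values)}
        for name, values in groups.items()
    }

# Hypothetical logged prediction inputs.
requests = [
    {"region": "eu", "age": 30},
    {"region": "eu", "age": 40},
    {"region": "us", "age": 50},
]
stats = slice_stats(requests, "region", "age")
```

Comparing these per-slice summaries between training data and recent traffic is one simple way to spot a slice whose inputs have moved even when the overall distribution looks stable.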

Logging for ML Monitoring

A log is an immutable, timestamped record of discrete events that happened over time in the system, along with additional information.

Steps for building observability

Start with the out-of-the-box logs, metrics and dashboards
Add agents to collect additional logs and metrics
Add logs-based metrics and alerting to create your own metrics and alerts
Use aggregated sinks and workspaces to centralize your logs and monitoring

Tools for building observability

Google Cloud Monitoring
Amazon CloudWatch
Azure Monitor

Logging - Advantages

Easy to generate
Great when it comes to providing valuable insight
Focus on specific events

Logging - Disadvantages

Excessive logging can impact system performance
Aggregation operations on logs can be expensive (so treat logs-based alerts with caution)
Setting up & maintaining tooling carries with it a significant operational cost

Logging in Machine Learning

Key areas: Use logs to keep track of the model inputs and predictions
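A minimal sketch of what logging one prediction could look like, assuming a JSON-lines log. The field names and the `make_prediction_log` helper are illustrative assumptions, not a schema prescribed by these notes.

```python
import json
import time
import uuid

def make_prediction_log(model_version, features, prediction):
    """Build one JSON log line for a single prediction request."""
    event = {
        "event_id": str(uuid.uuid4()),  # lets a ground-truth label be joined later
        "timestamp": time.time(),       # seconds since the epoch
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }
    return json.dumps(event, sort_keys=True)

# Hypothetical request and prediction.
line = make_prediction_log("v3", {"age": 42, "country": "DE"}, 0.87)
print(line)
```

Recording the model version with every event makes it possible to compare prediction distributions across versions later.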

Input red flags:

A feature becoming unavailable
Notable shifts in the distribution
Patterns specific to your model
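The first two red flags can be checked mechanically. The sketch below is one possible approach, assuming you have stored a training mean and a nonzero standard deviation per numeric feature. The `input_red_flags` helper, the baseline values, and the threshold of 3 standard deviations are all hypothetical.

```python
from statistics import mean

def input_red_flags(batch, baseline, max_shift=3.0):
    """Return warnings for a batch of requests.

    baseline maps feature name -> (training mean, training std), std > 0.
    """
    flags = []
    for feature, (base_mean, base_std) in baseline.items():
        values = [row[feature] for row in batch if row.get(feature) is not None]
        if not values:
            flags.append(f"{feature}: unavailable in every request")
            continue
        shift = abs(mean(values) - base_mean) / base_std
        if shift > max_shift:
            flags.append(f"{feature}: mean shifted by {shift:.1f} training std devs")
    return flags

# Hypothetical training statistics and a small batch of live requests.
baseline = {"age": (40.0, 10.0), "income": (50_000.0, 15_000.0)}
batch = [{"age": 85, "income": None}, {"age": 90}]
flags = input_red_flags(batch, baseline)
```

Patterns specific to your model still need checks written with domain knowledge; a generic test like this one only covers the common cases.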

Storing log data for analysis

Basic log storage is often unstructured
Parsing and storing log data in a queryable format enables analysis
- Extracting values to generate distributions and statistics
- Associating events with timestamps
- Identifying the systems
Enables automated reporting, dashboards, and alerting
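To make the idea of a queryable format concrete, here is a small sketch that parses JSON log lines into an in-memory SQLite table and computes a per-host statistic. The log lines, field names, and table layout are made up for illustration.

```python
import json
import sqlite3

# Hypothetical raw log lines, one JSON object per line.
raw_lines = [
    '{"ts": "2024-01-01T10:00:00", "host": "serving-1", "prediction": 0.2}',
    '{"ts": "2024-01-01T10:05:00", "host": "serving-2", "prediction": 0.8}',
    '{"ts": "2024-01-01T11:00:00", "host": "serving-1", "prediction": 0.6}',
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE predictions (ts TEXT, host TEXT, prediction REAL)")
for line in raw_lines:
    event = json.loads(line)
    conn.execute(
        "INSERT INTO predictions VALUES (?, ?, ?)",
        (event["ts"], event["host"], event["prediction"]),
    )

# Count and average prediction per serving host.
rows = conn.execute(
    "SELECT host, COUNT(*), AVG(prediction) FROM predictions "
    "GROUP BY host ORDER BY host"
).fetchall()
```

Once events are stored with a timestamp and a system identifier, the same table can feed dashboards and alert queries.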

New Training Data

Prediction requests form new training datasets
For supervised learning, labels are required
- Direct labeling
- Manual labeling
- Active learning
- Weak supervision

Tracing for ML systems

Tracing focuses on monitoring and understanding system performance, especially for microservice-based applications.

In monolithic systems, it's relatively easy to collect diagnostic data from different parts of a system. All modules might even run within one process and share common resources for logging.

Diagnosis becomes much more difficult when your services run as separate processes in a distributed system. We can't depend on the traditional approaches that help diagnose monolithic systems; we need fine-grained information about what's going on inside each service.

Tools for building observability

Sequence and parallelism of service requests
Distributed tracing
- Dapper
- Zipkin
- Jaeger

Dapper-Style Tracing

In service-based architectures, Dapper-style tracing works by propagating tracing data between services.

Each service annotates the trace with additional data and passes the tracing header to other services until the final request completes.

Services are responsible for uploading their traces to a tracing back-end. The tracing backend then puts related latency data together like pieces of a puzzle.

Each trace is a call tree, beginning with the entry point of a request and ending with the server's response including all of the RPCs along the way. Each trace consists of small units called spans.
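The propagation described above can be sketched in a few lines. This is a toy, in-process model, not the API of Dapper, Zipkin, or Jaeger: the header names (`trace-id`, `span-id`), the two example services, and the `collected_spans` list standing in for the tracing back-end are all assumptions made for illustration.

```python
import time
import uuid

collected_spans = []  # stands in for the tracing back-end

def start_span(name, headers):
    """Open a span, reusing the trace id from incoming headers if present."""
    return {
        "trace_id": headers.get("trace-id") or uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex,
        "parent_id": headers.get("span-id"),  # None for the root span
        "name": name,
        "start": time.perf_counter(),
    }

def finish_span(span):
    """Record the duration and upload the span to the back-end."""
    span["duration"] = time.perf_counter() - span["start"]
    collected_spans.append(span)

def outgoing_headers(span):
    """Tracing headers a service passes along with a downstream call."""
    return {"trace-id": span["trace_id"], "span-id": span["span_id"]}

def feature_service(headers):
    span = start_span("feature-lookup", headers)
    finish_span(span)

def prediction_service(headers):
    span = start_span("predict", headers)
    feature_service(outgoing_headers(span))  # downstream RPC
    finish_span(span)

prediction_service({})  # entry point: no incoming trace headers
```

Because every span carries the shared trace id and its parent's span id, the back-end can rebuild the call tree and attribute latency to each service.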

Monitoring Machine Learning Models in Production

Check this resource out to learn more about ML monitoring and logging.

Model Decay

What is Model Decay?

Model Decay

Production ML models often operate in dynamic environments
The ground truth in dynamic environments changes
If the model is static and does not change, then it gradually moves farther and farther away from the ground truth

Two main causes of model drift:

Data Drift
Concept Drift

Data Drift (aka Feature Drift)

Statistical properties of input changes
Trained model is not relevant for changed data
For example, the distribution of demographic data like age might change over time.

Concept Drift

Relationship between features and labels changes
The very meaning of what you are trying to predict changes
Prediction drift and label drift are similar

Detecting Drift on Time

Drift creeps into the system slowly with time
If it goes undetected, model accuracy suffers
Important to monitor and detect drift early

Model Decay Detection

Detecting Concept and Data Drift

Log Predictions (Full Requests and Responses)

Incoming prediction requests and generated predictions should be logged
If possible, log the ground truth that should have been predicted
- Can be used as labels for new training data
At a minimum, log the data in the prediction request
- This data can be analyzed using unsupervised statistical methods to detect data drift that will cause model decay

Detecting Drift

Detected by observing the statistical properties of logged data, model predictions, and, where available, ground truth.
Deploy dashboards that plot statistical properties to observe how they change over time
Use specialized libraries for detecting drift
- TensorFlow Data Validation (TFDV)
- Scikit-multiflow library
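Libraries such as those listed above package drift detectors for you. As a library-free sketch of the underlying idea, the function below computes the two-sample Kolmogorov-Smirnov statistic between a reference sample (for example, training data) and recent request data. The `ks_statistic` helper and the sample values are illustrative, and choosing an alert threshold is left open.

```python
from bisect import bisect_right

def ks_statistic(reference, live):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of the two samples (0 = identical, 1 = fully separated)."""
    ref, cur = sorted(reference), sorted(live)
    gap = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap

# Hypothetical values of one numeric feature.
training_ages = [22, 25, 31, 38, 44, 52]
recent_ages = [41, 47, 55, 58, 63, 70]
drift_score = ks_statistic(training_ages, recent_ages)
```

Tracking this score per feature over time, for instance on a dashboard, shows drift as a rising curve rather than a single pass/fail check.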

Continuous Evaluation and Labelling in Vertex Prediction

Vertex Prediction offers continuous evaluation
Vertex Labelling Service can be used to assign ground truth labels to prediction input for retraining.
Azure, AWS, and other cloud providers provide similar services.

Ways to Mitigate Model Decay

When you've detected model decay:

At a minimum, operational and business stakeholders should be notified of the decay
Take steps to bring the model back to acceptable performance

Steps in Mitigating Model Decay

What if Drift is Detected?
- If possible, determine the portion of your training set that is still correct using unsupervised methods such as clustering, or statistical measures of how far two distributions have diverged, such as KL divergence, JS divergence, or the KS test.
- Keep the good data, discard the bad, and add new data OR
- Discard data collected before a certain date and add new data OR
- Create an entirely new training set from new data
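The divergence measures mentioned above are short to write down for discrete distributions. The sketch below compares two histograms of the same feature; the bin proportions are invented, and deciding how large a divergence marks data as "bad" is an assumption you would need to calibrate.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same bins.
    Assumes q is nonzero wherever p is nonzero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical histograms (bin proportions) of one feature.
old_training_hist = [0.5, 0.3, 0.2]
recent_hist = [0.2, 0.3, 0.5]
divergence = js_divergence(old_training_hist, recent_hist)
```

JS divergence is often the more convenient of the two for monitoring because it stays finite when a bin is empty in one of the distributions.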

Fine Tune, or Start Over?

You can either continue training your model, fine-tuning from the last checkpoint using new data, OR
Start over, reinitialize your model, and completely retrain it
Either approach is valid, so it really depends on results
- How much new labelled data do you have?
- How far has it drifted?
Ideally, try both and compare the results

Model Re-Training Policy

It's usually a good idea to establish a policy for when you will retrain your model. The right policy depends on your situation.

Automated Model Retraining

Redesign Data Processing Steps and Model Architecture

When model performance decays beyond an acceptable threshold you might have to consider redesigning your entire pipeline
Re-think feature engineering, feature-selection
You may have to train your model from scratch
Investigate alternative architectures

Addressing Model Decay

Check this blog out to figure out the best retraining strategies to prevent model decay.

Responsible AI

Responsible AI Practices

Development of AI creates new opportunities to improve the lives of people around the world
- Business, healthcare, education, etc.
But it also raises new questions about implementing responsible practices
- Fairness, interpretability, privacy, and security
- Far from solved, active areas of research and development

Human-Centered Design

Actual users' experience is essential

Design your features with appropriate disclosures built-in
Consider augmentation and assistance
- Offering multiple suggestions instead of one right answer
Model potential adverse feedback early in the design process
Engage with a diverse set of users and use-case scenarios

Identify Multiple Metrics

Using several metrics helps you understand the tradeoffs
- Feedback from user surveys
- Quantiles that track overall system performance
- False positive and false negative rates sliced across subgroups
Metrics must be appropriate for the context and goals of your system
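Slicing error rates across subgroups, as listed above, might look like the following sketch for a binary classifier. The `error_rates_by_group` helper, the group names, and the labelled examples are hypothetical.

```python
from collections import defaultdict

def error_rates_by_group(rows):
    """rows: iterable of (group, y_true, y_pred) with binary labels 0/1.
    Returns {group: {"fpr": ..., "fnr": ...}}; a rate is None when undefined."""
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "pos": 0, "neg": 0})
    for group, y_true, y_pred in rows:
        c = counts[group]
        if y_true == 1:
            c["pos"] += 1
            c["fn"] += int(y_pred == 0)
        else:
            c["neg"] += 1
            c["fp"] += int(y_pred == 1)
    return {
        g: {
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for g, c in counts.items()
    }

# Hypothetical (group, true label, predicted label) triples.
rows = [
    ("under_30", 1, 1), ("under_30", 0, 0), ("under_30", 0, 1), ("under_30", 1, 1),
    ("over_60", 1, 0), ("over_60", 1, 1), ("over_60", 0, 0), ("over_60", 0, 0),
]
rates = error_rates_by_group(rows)
```

In this toy data both groups have the same overall accuracy, yet one group sees only false positives and the other only false negatives, which a single top-level metric would hide.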

Analyze your raw data carefully

For sensitive raw data, respect privacy
- Compute aggregate, anonymized summaries
Does your data reflect your users?
- e.g., the system will be used by people of all ages, but all of the data comes from senior citizens
Imperfect proxy labels?
- Relationships between the labels and actual targets

Responsible AI

New technologies always bring new challenges. Ensuring that your applications adhere to responsible AI is a must. Please read this resource to keep yourself updated with this fascinating active research subject.

Legal Requirements for Secure and Private AI

Legal Implications of Data Security and Privacy

Companies must comply with data privacy protection laws in regions where they operate

In Europe for example, you need to comply with GDPR, General Data Protection Regulation.

In California, with CCPA, California Consumer Privacy Act

General Data Protection Regulation (GDPR)

Regulation in EU law on data protection and privacy in the European Union (EU) and the European Economic Area (EEA)
Give control to individuals over their data
Companies should protect the data of employees
When the data processing is based on consent, the data subject has the right to revoke their consent at any time.

California Consumer Privacy Act (CCPA)

Similar to GDPR
Intended to enhance privacy rights and consumer protection for residents of California
User has the right to know what personal data is being collected about them, whether the personal data is sold or disclosed, and to whom
User can access the personal data, block the sale of their data, and request a business to delete their data.

Security and Privacy Harms from ML Models

Defenses

Cryptography

Privacy-enhancing tools (like SMPC and FHE) should be considered to securely train supervised machine learning models
Users can send encrypted prediction requests while preserving the confidentiality of the model
Protects confidentiality of the training data
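Secure multi-party computation (SMPC) can be built from several primitives. One of the simplest is additive secret sharing, sketched below for computing a sum without any party revealing its own input. This is a toy illustration under simplifying assumptions (honest parties, no networking), and the modulus and example inputs are arbitrary choices.

```python
import secrets

PRIME = 2**61 - 1  # all arithmetic is done modulo this prime

def share(value, n_parties):
    """Split value into n random-looking shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine shares; any subset smaller than all of them reveals nothing."""
    return sum(shares) % PRIME

# Three parties each hold one private number (hypothetical values).
private_inputs = [120, 75, 310]
all_shares = [share(v, 3) for v in private_inputs]

# Party j receives the j-th share of every input and adds them up locally.
partial_sums = [sum(s[j] for s in all_shares) % PRIME for j in range(3)]
total = reconstruct(partial_sums)
```

Only the final total is revealed; each individual share is a uniformly random number that carries no information about the input it came from.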

Differential Privacy

Roughly speaking, a model is differentially private if an attacker seeing its predictions cannot tell if a particular user's information was included in the training data.

System for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset.
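A classic way to share an aggregate while protecting individuals is the Laplace mechanism, which is not one of the three training approaches listed in these notes but shows the core idea compactly. The sketch below releases a noisy count; the `private_count` helper, the records, and the choice of epsilon are illustrative assumptions.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng):
    """Epsilon-differentially-private count. Adding or removing one person
    changes a count by at most 1, so the noise scale is 1 / epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)

rng = random.Random(0)  # seeded only to make the example repeatable
ages = [23, 35, 41, 67, 72, 80]  # hypothetical sensitive records
noisy = private_count(ages, lambda age: age >= 65, epsilon=1.0, rng=rng)
```

A smaller epsilon means more noise and stronger privacy. Repeated queries against the same data consume privacy budget, which this sketch does not track.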

There are three different approaches to implement differential privacy:

DP-SGD (Differentially Private Stochastic Gradient Descent)
PATE (Private Aggregation of Teacher Ensembles)
CaPC (Confidential and Private Collaborative learning)

Differentially-Private Stochastic Gradient Descent (DP-SGD)

If an attacker is able to get a copy of a normally trained model, then they can use the weights to extract private information.

DP-SGD eliminates that possibility by applying differential privacy during model training.

Modifies the minibatch stochastic optimization process by adding noise.
Trained model retains differential privacy because of the post processing immunity property of differential privacy.
- Post-processing immunity is a fundamental property of differential privacy.
- It means that regardless of how you process the model's predictions, you can't affect their privacy guarantees.
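The modification DP-SGD makes to a minibatch update can be sketched in plain Python: clip each example's gradient to a fixed norm, sum, add Gaussian noise, then average. The function name, parameter names, and example gradients are assumptions for illustration, and the privacy accounting that turns the noise level into an epsilon is omitted.

```python
import math
import random

def dp_sgd_step(weights, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update on a list of weights."""
    summed = [0.0] * len(weights)
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale the gradient down so its L2 norm is at most clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            summed[i] += g * scale
    batch_size = len(per_example_grads)
    # Gaussian noise calibrated to the clipping norm, added before averaging.
    noisy_mean = [
        (s + rng.gauss(0.0, noise_multiplier * clip_norm)) / batch_size
        for s in summed
    ]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-example gradients
new_weights = dp_sgd_step([0.0, 0.0], grads, lr=0.1, clip_norm=1.0,
                          noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single training example can influence the update, which is what makes the added noise sufficient to mask that example's contribution.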

Private Aggregation of Teacher Ensembles (PATE)

Begins by dividing the sensitive data into k partitions with no overlaps. It trains k models on those partitions separately as teacher models and then aggregates the results in an aggregate teacher model.
During the aggregation for the aggregate teacher, we add noise to the output in a way that won't affect the resulting predictions.
For deployment, we create a student model. To train the student model, we take unlabeled public data and feed it to the aggregate teacher model, which outputs labeled data that maintains privacy. This data is then used as the training set for the student model.
Discard the sensitive data and the teacher models (the left side of the lecture diagram) and deploy only the student model.
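The noisy aggregation step can be sketched as a noisy vote count: tally the teachers' predicted labels, add Laplace noise to each tally, and return the label with the highest noisy count. The helper names, the noise scale, and the votes below are hypothetical, and training the teachers and the student is outside this sketch.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_aggregate(teacher_votes, num_classes, noise_scale, rng):
    """Return the class with the highest noisy vote count (noise_scale > 0)."""
    counts = Counter(teacher_votes)
    noisy_counts = [
        counts.get(label, 0) + laplace_noise(noise_scale, rng)
        for label in range(num_classes)
    ]
    return max(range(num_classes), key=noisy_counts.__getitem__)

rng = random.Random(0)
# Hypothetical votes from 100 teachers on one unlabeled public example.
votes = [1] * 90 + [0] * 6 + [2] * 4
student_label = noisy_aggregate(votes, num_classes=3, noise_scale=1.0, rng=rng)
```

When the teachers agree strongly, the noise almost never changes the winning label, which is why the student's training labels stay useful while any single teacher's partition has little influence.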

Confidential and Private Collaborative Learning (CaPC)

Enables models to collaborate while preserving the privacy of the underlying data
Integrates building blocks from cryptography and differential privacy to provide confidential and private collaborative learning.
Encrypts prediction requests using Homomorphic Encryption (HE)
Uses PATE to add noise to predictions for voting

GDPR and CCPA

Check the GDPR and CCPA websites out to learn more about their regulations and compliance.

Anonymization and Pseudonymisation

GDPR includes many regulations to preserve privacy of user data
Since introduction of GDPR, two terms have been discussed widely
1. Anonymization
2. Pseudonymisation

Data Anonymization

Removes Personally Identifiable Information (PII) from data sets.

Recital 26 of GDPR defines Data Anonymization

True data anonymization is:

Irreversible
Done in such a way that it is impossible to identify the person
Impossible to derive insights or discrete information, even by the party responsible for the anonymization

GDPR does not apply to data that has been anonymized

Pseudonymisation

GDPR Article 4(5) defines pseudonymisation as:

"...the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information"
The data is pseudonymized by replacing the identifiers (like email or name) with an alias or pseudonym.
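One common way to generate such an alias is a keyed hash, sketched below with HMAC-SHA256. The secret key plays the role of the "additional information" that must be stored separately. The `pseudonymize` helper, the key, and the 16-character truncation are illustrative choices, and this sketch is not a statement about what satisfies GDPR in a given case.

```python
import hashlib
import hmac

def pseudonymize(identifier, secret_key):
    """Replace an identifier with a stable alias derived from a secret key."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

# Hypothetical key; in practice it would live in a separate, restricted store.
key = b"keep-this-key-outside-the-dataset"
record = {"email": "ada@example.com", "purchases": 7}
pseudonymized = {**record, "email": pseudonymize(record["email"], key)}
```

The same email always maps to the same alias, so records can still be joined, but without the key the alias cannot be linked back to candidate identifiers.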

Spectrum of Privacy Preservation

What data should be Anonymized?

Any data that reveals the identity of a person, referred to as identifiers
Identifiers apply to any natural or legal person, living or dead, including their dependents, ascendants, and descendants.
Included are other related persons, direct or through interaction
For example: Family names, patronyms, first names, maiden names, aliases, address, phone, bank account details, credit cards, IDs like SSN.

Right to be Forgotten

What is Right to Be Forgotten?

"The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay"

Recitals 65 and 66 and in Article 17 of the GDPR

Right to Rectification

"The data subject shall have the right to obtain from the controller without undue delay the rectification of inaccurate personal data concerning him or her. Taking into account the purposes of the processing, the data subject shall have the right to have incomplete personal data completed, including by means of providing a supplementary statement."

Chapter 3, Art. 16 GDPR

Other Rights of the Data Subject

Chapter 3 defines a number of other rights of the data subject, including:

Art. 15 GDPR - Right of access by the data subject
Art. 18 GDPR - Right to restriction of processing
Art. 20 GDPR - Right to data portability
Art. 21 GDPR - Right to object

Implementing Right To Be Forgotten: Tracking Data

For a valid erasure claim

The company needs to identify all of the information related to the content requested to be removed.
All of the associated metadata must also be erased
- e.g., Derived data, logs etc.

Forgetting Digital Memories

Issues with Hard Delete

Deleting records from a database can cause havoc
User data is often referenced in multiple tables
Deletion breaks these connections, which is difficult to manage in large, complex databases
Can break foreign keys
Anonymization keeps the records, and only anonymizes the fields containing PII.
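The contrast between a hard delete and field-level anonymization can be seen in a small SQLite sketch. The table layout and sample rows are invented, and setting fields to NULL is only one possible way to blank PII.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "user_id INTEGER NOT NULL REFERENCES users(id), total REAL)"
)
conn.execute("INSERT INTO users VALUES (1, 'Ada Example', 'ada@example.com')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")

# A hard delete is rejected because an order still references the user.
try:
    conn.execute("DELETE FROM users WHERE id = 1")
    hard_delete_ok = True
except sqlite3.IntegrityError:
    hard_delete_ok = False

# Anonymizing keeps the row (and the foreign key) but blanks the PII fields.
conn.execute("UPDATE users SET name = NULL, email = NULL WHERE id = 1")
user = conn.execute("SELECT id, name, email FROM users WHERE id = 1").fetchone()
order_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE user_id = 1"
).fetchone()[0]
```

The order history remains intact for reporting, while the row no longer holds the user's name or email. Derived data, logs, and back-ups still need separate handling.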

Challenges in Implementing Right to Be Forgotten

Identifying if data privacy is violated
Organisational changes for enforcing GDPR
Deleting personal data from multiple back-ups
