Week 4 — Model monitoring, Logging & model decay, GDPR, and Privacy
Model Monitoring and Logging
Why Monitoring Matters
Monitoring includes not only our ML models, but also the systems and infrastructure that make up the entire product or service, such as databases and web servers.
"An ounce of prevention is worth a pound of cure" — Benjamin Franklin
- Immediate Data Skews
- Training data is too old, not representative of live data
- Model Staleness
- Environment shifts
- Consumer behaviour
- Adversarial scenarios
- Negative Feedback Loops: these arise when you train your models on data collected in production. If that data is biased or corrupted in any way, then a model trained on it will also perform poorly.
Why is ML monitoring different?
Unlike a pure software system, there are two additional components to consider in an ML system, the data and the model. Unlike in traditional software systems, the accuracy of an ML system depends on how well the model reflects the world it is meant to model which in turn depends on the data used for training and on the data that it receives while serving requests.
Code and config also take on additional complexity and sensitivity in an ML system due to two aspects:
- Entanglement: Refers to the issue where changing anything, changes everything
- Configuration: Model hyperparameters, versions, and features are often controlled in a system config, and the slightest error here can cause radically different model behavior that won't be picked up by traditional software tests.
Observability in ML
Observability measures how well the internal states of a system can be inferred by knowing the inputs and outputs.
For ML, that means monitoring and analyzing the prediction requests and the generated predictions from your models.
Observability comes from control system theory where observability and controllability are closely linked.
i.e., controlling the overall accuracy of results, usually across different versions of the model, requires observability.
Complexity of observing modern systems
- Modern systems can make observability difficult
- Cloud-based systems
- Containerized infrastructure
- Distributed systems
- Microservices
Deep observability for ML
- Not only top-level metrics
- Data slices provide a way to analyze different groups of people or different types of conditions
- Domain knowledge is important for observability
- TensorFlow Model Analysis (TFMA) supports slice-based analysis (see the sketch after this list)
- Both supervised and unsupervised analysis
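For instance, here is a minimal sketch of a TFMA evaluation that computes metrics both overall and per slice. The model path, data path, and the `age_group` slicing feature are hypothetical placeholders:

```python
import tensorflow_model_analysis as tfma

# Evaluate the model overall and sliced by a demographic feature.
eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key='label')],
    metrics_specs=[tfma.MetricsSpec(metrics=[
        tfma.MetricConfig(class_name='AUC'),
        tfma.MetricConfig(class_name='BinaryAccuracy'),
    ])],
    slicing_specs=[
        tfma.SlicingSpec(),                            # overall (no slicing)
        tfma.SlicingSpec(feature_keys=['age_group']),  # one slice per age group
    ],
)

eval_result = tfma.run_model_analysis(
    eval_shared_model=tfma.default_eval_shared_model(
        eval_saved_model_path='serving_model_dir',     # hypothetical model path
        eval_config=eval_config),
    eval_config=eval_config,
    data_location='eval_data.tfrecord',                # hypothetical eval data
)
```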
Goals of ML observability
The main goal of observability in this context is to prevent or act upon system failures.
Observations need to provide alerts when a failure happens, and ideally recommend actions to bring the system back to normal behavior.
- Alertable
- Metrics and thresholds designed to make failures obvious
- Actionable
- Root cause clearly identified
Monitoring Targets in ML
Basics: Input and output monitoring
- Model input distribution
  - Measure high-level statistics on slices of data relevant to the domain
- Model prediction distribution
- Model versions
- Input/prediction correlation
Collecting all of the context information is often not practical, as the amount of data to process and store could be very large, so it's important to identify the most relevant context and try to gather that information.
Logging for ML Monitoring
A log is an immutable, time-stamped record of discrete events that happened over time in the system, along with additional information.
Steps for building observability
- Start with the out-of-the-box logs, metrics and dashboards
- Add agents to collect additional logs and metrics
- Add logs-based metrics and alerting to create your own metrics and alerts
- Use aggregated sinks and workspaces to centralize your logs and monitoring
Tools for building observability
- Google Cloud Monitoring
- Amazon CloudWatch
- Azure Monitor
Logging - Advantages
- Easy to generate
- Great when it comes to providing valuable insight
- Focus on specific events
Logging - Disadvantages
- Excessive logging can impact system performance
- Aggregation operations on logs can be expensive (i.e., treat logs-based alerts with caution)
- Setting up & maintaining tooling carries with it a significant operational cost
Logging in Machine Learning
Key areas: Use logs to keep track of the model inputs and predictions (a minimal sketch follows the list below)
Input red flags:
- A feature becoming unavailable
- Notable shifts in the distribution
- Patterns specific to your model
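As a minimal sketch of such logging, the following uses only the Python standard library to write one JSON object per prediction request; the feature names and model version string are made up for illustration:

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("prediction_log")
logger.setLevel(logging.INFO)
logger.addHandler(logging.FileHandler("predictions.jsonl"))

def log_prediction(features, prediction, model_version):
    # One JSON object per line keeps the log easy to parse later.
    logger.info(json.dumps({
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }))

log_prediction({"age": 42, "country": "DE"}, prediction=0.87, model_version="v1.3")
```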
Storing log data for analysis
- Basic log storage is often unstructured
- Parsing and storing log data in a queryable format enables analysis
- Extracting values to generate distributions and statistics
- Associating events with timestamps
- Identifying the systems
- Enables automated reporting, dashboards, and alerting
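Once logs are in a line-oriented JSON format like the sketch above, loading them into a queryable table is straightforward. This example assumes the hypothetical predictions.jsonl file from the previous sketch:

```python
import pandas as pd

# Load the JSON-lines prediction log into a queryable table.
df = pd.read_json("predictions.jsonl", lines=True)

# Expand the nested feature dicts into columns and inspect distributions.
features = pd.json_normalize(df["features"].tolist())
print(features.describe(include="all"))

# Simple per-model-version statistics of the predictions.
print(df.groupby("model_version")["prediction"].describe())
```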
New Training Data
- Prediction requests form new training datasets
- For supervised learning, labels are required
- Direct labeling
- Manual labeling
- Active learning
- Weak supervision
Tracing for ML systems
Tracing focuses on monitoring and understanding system performance, especially for microservice-based applications.
In monolithic systems, it's relatively easy to collect diagnostic data from different parts of the system. All modules might even run within one process and share common resources for logging.
Collecting this data becomes much more difficult when your services are running as separate processes in a distributed system. We can't depend on the traditional approaches that help diagnose monolithic systems; we need fine-grained information about what's going on inside each service.
Tools for distributed tracing
- Sequence and parallelism of service requests
- Distributed tracing
- Dapper
- Zipkin
- Jaeger
Dapper-Style Tracing
In service-based architectures, Dapper-style tracing works by propagating tracing data between services.
Each service annotates the trace with additional data and passes the tracing header to other services until the final request completes.
Services are responsible for uploading their traces to a tracing backend. The tracing backend then puts related latency data together like pieces of a puzzle.
Each trace is a call tree, beginning with the entry point of a request and ending with the server's response including all of the RPCs along the way. Each trace consists of small units called spans.
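As one way to produce such traces from application code, here is a sketch using the OpenTelemetry Python SDK (the span and service names are illustrative); the exporter would normally ship spans to a backend like Zipkin or Jaeger:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Print finished spans to the console; a real deployment would export
# them to a tracing backend such as Jaeger or Zipkin instead.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter()))
tracer = trace.get_tracer("prediction-service")

# Nested spans form the call tree described above.
with tracer.start_as_current_span("handle_prediction_request"):
    with tracer.start_as_current_span("fetch_features"):
        pass  # e.g., look up features in a feature store
    with tracer.start_as_current_span("run_inference"):
        pass  # e.g., call the model server
```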
Monitoring Machine Learning Models in Production
Check this resource out to learn more about ML monitoring and logging.
Model Decay
What is Model Decay?
Model Decay
- Production ML models often operate in dynamic environments
- The ground truth in dynamic environments changes
- If the model is static and does not change, then it gradually moves farther and farther away from the ground truth
Two main causes of model drift:
- Data Drift
- Concept Drift
Data Drift (aka Feature Drift)
- Statistical properties of input changes
- Trained model is not relevant for changed data
- e.g., the distribution of demographic data like age might change over time.
Concept Drift
- Relationship between features and labels changes
- The very meaning of what you are trying to predict changes
- Prediction drift and label drift are similar
Detecting Drift on Time
- Drift creeps into the system slowly with time
- If it goes undetected, model accuracy suffers
- Important to monitor and detect drift early
Model Decay Detection
Detecting Concept and Data Drift
Log Predictions (Full Requests and Responses)
- Incoming prediction requests and the generated predictions should be logged
- If possible log the ground truth that should have been predicted
- Can be used as labels for new training data
- At a minimum, log the data in the prediction request
- This data can be analyzed using unsupervised statistical methods to detect data drift that will cause model decay
Detecting Drift
- Detected by observing the statistical properties of logged data, model predictions, and, where available, ground truth.
- Deploy dashboards that plot statistical properties to observe how they change over time
- Use specialized libraries for detecting drift (a minimal example follows this list):
- TensorFlow Data Validation (TFDV)
- Scikit-multiflow library
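As a simple example of such an unsupervised check, this sketch applies a two-sample Kolmogorov-Smirnov test from SciPy to one logged feature; the age distributions are synthetic stand-ins for training-time and serving-time data:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
training_ages = rng.normal(35, 8, 10_000)  # feature values seen at training time
serving_ages = rng.normal(42, 8, 10_000)   # same feature in recent prediction logs

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests the
# serving distribution no longer matches the training distribution.
stat, p_value = ks_2samp(training_ages, serving_ages)
if p_value < 0.01:
    print(f"Possible data drift (KS statistic {stat:.3f}, p-value {p_value:.2e})")
```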
Continuous Evaluation and Labelling in Vertex Prediction
- Vertex Prediction offers continuous evaluation
- Vertex Labelling Service can be used to assign ground truth labels to prediction input for retraining.
- Azure, AWS, and other cloud providers provide similar services.
Ways to Mitigate Model Decay
When you've detected model decay:
- At a minimum, operational and business stakeholders should be notified of the decay
- Take steps to bring model back to acceptable performance
Steps in Mitigating Model Decay
- What if Drift is Detected?
- If possible, determine the portion of your training set that is still correct, using unsupervised methods such as clustering, or statistical methods that measure divergence (KL divergence, JS divergence, the KS test); see the sketch after this list.
- Keep the good data, discard the bad, and add new data OR
- Discard data collected before a certain date and add new data OR
- Create an entirely new training set from new data
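As referenced in the first option above, one way to quantify how far a feature has moved is the Jensen-Shannon distance between its old and new distributions. This sketch uses synthetic stand-in data:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
old_values = rng.normal(0.40, 0.1, 5_000)  # a feature in the old training set
new_values = rng.normal(0.55, 0.1, 5_000)  # the same feature in recent data

# Compare the two distributions via histograms over a common binning.
bins = np.linspace(0, 1, 21)
old_hist, _ = np.histogram(old_values, bins=bins)
new_hist, _ = np.histogram(new_values, bins=bins)

# jensenshannon() normalizes the histograms and returns the JS distance
# (square root of the JS divergence); 0 = identical, 1 = disjoint.
print(jensenshannon(old_hist, new_hist))
```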
Fine Tune, or Start Over?
- You can either continue training your model, fine-tuning from the last checkpoint using new data, OR
- Start over, reinitialize your model, and completely retrain it
- Either approach is valid, so it really depends on results
- How much new labelled data do you have?
- How far has it drifted?
- Ideally, try both and compare the results (see the sketch below)
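A minimal Keras sketch of the two options, assuming a hypothetical checkpoint path and synthetic stand-in data:

```python
import numpy as np
import tensorflow as tf

def build_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

rng = np.random.default_rng(0)
new_x = rng.random((200, 4))                    # stand-in for newly labelled data
new_y = rng.integers(0, 2, 200).astype("float32")

# Option 1: fine-tune by resuming from the previous weights on new data only.
finetuned = build_model()
# finetuned.load_weights("checkpoints/last")    # hypothetical checkpoint path
finetuned.fit(new_x, new_y, epochs=5, verbose=0)

# Option 2: start over with fresh weights and retrain on the full curated set.
fresh = build_model()
fresh.fit(new_x, new_y, epochs=20, verbose=0)

# Evaluate both on the same held-out set and deploy whichever performs better.
```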
Model Re-Training Policy
It's usually a good idea to establish policies around when you're going to retrain your model; the right policy depends on how quickly your data and environment change.
Automated Model Retraining
Redesign Data Processing Steps and Model Architecture
- When model performance decays beyond an acceptable threshold you might have to consider redesigning your entire pipeline
- Re-think feature engineering and feature selection
- You may have to train your model from scratch
- Investigate alternative architectures
Addressing Model Decay
Check this blog out to figure out the best retraining strategies to prevent model decay.
GDPR and Privacy
Responsible AI
Responsible AI Practices
- The development of AI creates new opportunities to improve the lives of people around the world
- Business, healthcare, education, etc.
- But it also raises new questions about implementing responsible practices
- Fairness, interpretability, privacy, and security
- Far from solved, active areas of research and development
Human-Centered Design
Actual users' experience is essential
- Design your features with appropriate disclosures built-in
- Consider augmentation and assistance
- Offering multiple suggestions instead of one right answer
- Model potential adverse feedback early in the design process
- Engage with a diverse set of users and use-case scenarios
Identify Multiple Metrics
- Using several metrics helps you understand the tradeoffs
- Feedback from user surveys
- Quantities that track overall system performance
- False positive and false negative rates sliced across subgroups
- Metrics must be appropriate for the context and goals of your system
Analyze your raw data carefully
- For sensitive raw data, respect privacy
- Compute aggregate, anonymized summaries
- Does your data reflect your users?
- e.g., the system will be used for all ages, but all the training data comes from senior citizens
- Imperfect proxy labels?
- Relationships between the labels and actual targets
Responsible AI
New technologies always bring new challenges. Ensuring that your applications adhere to responsible AI is a must. Please read this resource to keep yourself updated with this fascinating active research subject.
Legal Requirements for Secure and Private AI
Legal Implications of Data Security and Privacy
Companies must comply with data privacy protection laws in regions where they operate
In Europe, for example, you need to comply with the GDPR, the General Data Protection Regulation.
In California, with the CCPA, the California Consumer Privacy Act.
General Data Protection Regulation (GDPR)
- Regulation in EU law on data protection and privacy in the European Union (EU) and the European Economic Area (EEA)
- Gives control to individuals over their data
- Companies should protect the data of employees
- When the data processing is based on consent, the data subject has the right to revoke their consent at any time.
California Consumer Privacy Act (CCPA)
- Similar to GDPR
- Intended to enhance privacy rights and consumer protection for residents of California
- User has the right to know what personal data is being collected about them, whether the personal data is sold or disclosed, and to whom
- User can access the personal data, block the sale of their data, and request a business to delete their data.
Security and Privacy Harms from ML Models
Defenses
Cryptography
- Privacy-enhancing tools, like secure multi-party computation (SMPC) and fully homomorphic encryption (FHE), should be considered to securely train supervised machine learning models
- Users can send encrypted prediction requests while preserving the confidentiality of the model
- Protects confidentiality of the training data
Differential Privacy
Roughly speaking, a model is differentially private if an attacker seeing its predictions cannot tell if a particular user's information was included in the training data.
Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in it.
There are three different approaches to implement differential privacy:
- DP-SGD (Differentially Private Stochastic Gradient Descent)
- PATE (Private Aggregation of Teacher Ensembles)
- CaPC (Confidential and Private Collaborative learning)
Differentially-Private Stochastic Gradient Descent (DP-SGD)
If an attacker is able to get a copy of a normally trained model, then they can use the weights to extract private information.
DP-SGD eliminates that possibility by applying differential privacy during model training.
- Modifies the minibatch stochastic optimization process by adding noise.
- The trained model retains differential privacy because of the post-processing immunity property of differential privacy.
- Post-processing immunity is a fundamental property of differential privacy.
- It means that regardless of how you process the model's predictions, you can't weaken their privacy guarantees.
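A minimal sketch of setting up DP-SGD training with the TensorFlow Privacy library; the model and the hyperparameter values are illustrative, not recommendations:

```python
import tensorflow as tf
from tensorflow_privacy.privacy.optimizers.dp_optimizer_keras import (
    DPKerasSGDOptimizer)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(2),
])

# DP-SGD: clip each microbatch gradient, then add calibrated Gaussian noise.
optimizer = DPKerasSGDOptimizer(
    l2_norm_clip=1.0,      # bound on each clipped gradient's L2 norm
    noise_multiplier=1.1,  # noise stddev as a multiple of the clipping norm
    num_microbatches=32,   # must evenly divide the batch size
    learning_rate=0.1,
)

# The loss must stay per-example (no reduction) so gradients can be
# clipped individually before aggregation.
loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE)

model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
# model.fit(train_x, train_y, batch_size=32, epochs=5)  # hypothetical data
```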
Private Aggregation of Teacher Ensembles (PATE)
- Begins by dividing the sensitive data into k partitions with no overlaps. It trains k models on that data separately, as teacher models, and then aggregates the results into an aggregated teacher.
- During aggregation for the aggregated teacher, noise is added to the output in a way that doesn't materially affect the resulting predictions.
- For deployment, we create a student model. To train the student model, we take unlabeled public data and feed it to the aggregated teacher, producing labeled data that maintains privacy. This data is then used as the training set for the student model.
- Discard everything on the left side of the diagram and deploy the student model.
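The heart of PATE's aggregation step is a noisy vote over the teachers' predictions. This toy sketch shows that mechanism; the votes and noise scale are illustrative, and a real deployment must also track the privacy budget:

```python
import numpy as np

def noisy_aggregate(teacher_votes, num_classes, noise_scale=1.0):
    """Noisy-max vote: the label most teachers agree on, plus Laplace noise."""
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    counts += np.random.laplace(0.0, noise_scale, num_classes)
    return int(np.argmax(counts))

# Eight teachers vote on one unlabeled public example (illustrative values).
votes = np.array([2, 2, 1, 2, 0, 2, 2, 1])
print(noisy_aggregate(votes, num_classes=3))
```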
Confidential and Private Collaborative Learning (CaPC)
- Enables models to collaborate while preserving the privacy of the underlying data
- Integrates building blocks from cryptography and differential privacy to provide confidential and private collaborative learning.
- Encrypts prediction requests using Homomorphic Encryption (HE)
- Uses PATE to add noise to predictions for voting
GDPR and CCPA
Check the GDPR and CCPA websites out to learn more about their regulations and compliance.
Anonymization and Pseudonymisation
Data Anonymization in GDPR
- GDPR includes many regulations to preserve privacy of user data
- Since the introduction of GDPR, two terms have been discussed widely
- Anonymization
- Pseudonymisation
Data Anonymization
Removes Personally Identifiable Information (PII) from data sets.
Recital 26 of GDPR defines Data Anonymization
True data anonymization is:
- Irreversible
- Done in such a way that it is impossible to identify the person
- Impossible to derive insights or discrete information, even for the party responsible for the anonymization
GDPR does not apply to data that has been anonymized
Pseudonymisation
- GDPR Article 4(5) defines pseudonymisation as:
  "...the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information"
- The data is pseudonymised by switching the identifiers (like email or name) with an alias or pseudonym (see the sketch below).
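A toy sketch of pseudonymisation using a keyed hash: the alias is stable, so records can still be joined, but it cannot be traced back to the person without the separately stored key (all values shown are made up):

```python
import hashlib
import hmac

# The key is the "additional information" GDPR refers to; store it
# separately from the pseudonymised data (value shown is made up).
SECRET_KEY = b"rotate-me-and-store-separately"

def pseudonymise(identifier: str) -> str:
    # Keyed hash: same input yields the same alias (so records can still
    # be joined), but the alias can't be reversed without the key.
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

print(pseudonymise("jane.doe@example.com"))
```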
Spectrum of Privacy Preservation
What data should be Anonymized?
- Any data that reveals the identity of a person, referred to as identifiers
- Identifiers apply to any natural or legal person, living or dead, including their dependents, ascendants, and descendants.
- Included are other related persons, direct or through interaction
- For example: Family names, patronyms, first names, maiden names, aliases, address, phone, bank account details, credit cards, IDs like SSN.
Right to be Forgotten
What is Right to Be Forgotten?
"The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay"
- Recitals 65 and 66 and in Article 17 of the GDPR
Right to Rectification
"The data subject shall have the right to obtain from the controller without undue delay the rectification of inaccurate personal data concerning him or her. Taking into account the purposes of the processing, the data subject shall have the right to have incomplete personal data completed, including by means of providing a supplementary statement."
- Chapter 3, Art. 16 GDPR
Other Rights of the Data Subject
Chapter 3 defines a number of other rights of the data subject, including:
- Art. 15 GDPR - Right of access by the data subject
- Art. 18 GDPR - Right to restriction of processing
- Art. 20 GDPR - Right to data portability
- Art. 21 GDPR - Right to object
Implementing Right To Be Forgotten: Tracking Data
For a valid erasure claim
- The company needs to identify all of the information related to the content requested to be removed.
- All of the associated metadata must also be erased
- e.g., Derived data, logs etc.
Forgetting Digital Memories
Issues with Hard Delete
- Deleting records from a database can cause havoc
- User data is often referenced in multiple tables
- Deletion breaks these connections, which can be difficult to manage in large, complex databases
- Can break foreign keys
- Anonymization keeps the records, and only anonymizes the fields containing PII.
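A toy sketch of this approach using pandas: rather than deleting the user's row (and breaking references to user_id), only the PII fields are overwritten; the table and field names are hypothetical:

```python
import pandas as pd

users = pd.DataFrame({
    "user_id": [101, 102],                  # referenced by other tables
    "name": ["Jane Doe", "John Roe"],
    "email": ["jane@example.com", "john@example.com"],
    "purchases": [12, 3],
})

def forget_user(df, user_id):
    # Overwrite only the PII fields; the row (and its user_id, which other
    # tables reference) stays, so foreign keys remain valid.
    df.loc[df["user_id"] == user_id, ["name", "email"]] = "REDACTED"
    return df

print(forget_user(users, 101))
```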
Challenges in Implementing Right to Be Forgotten
- Identifying if data privacy is violated
- Organisational changes for enforcing GDPR
- Deleting personal data from multiple back-ups