As businesses increasingly depend on ML (machine learning), we are beginning to see the effects of large-scale ML failures. Nine-figure ML-induced losses are becoming a trend, from Unity's recent $110M loss to Zillow's well-publicized ML failures, and more. Considering this disturbing trend, we thought it pertinent to share how Bigabid approaches ML data quality, real-time and near-real-time monitoring, and performance by segment.
Bigabid focuses on building internal tools for tasks that occur often and where the cost of a mistake is high. Checking the quality of our training data is one of those tasks.
We’ve created internal custom tools to analyze data trends over time. The first is designed to catch issues in the data, e.g. a period of missing values in a feature, an anomaly in a feature’s distribution, etc.
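To illustrate the kind of check such a tool runs, here is a minimal sketch that flags time windows where a feature's null rate spikes. The function name, window size, and threshold are hypothetical choices for illustration; this is not Bigabid's internal tool.

```python
import pandas as pd

def missing_value_periods(series: pd.Series, window: str = "1D",
                          threshold: float = 0.2) -> pd.DataFrame:
    # Resample the null indicator over time windows and keep the windows
    # whose null rate exceeds the threshold.
    null_rate = series.isna().resample(window).mean()
    flagged = null_rate[null_rate > threshold]
    return flagged.rename("null_rate").to_frame()

# Toy example: four days of hourly data, with the second day fully missing.
idx = pd.date_range("2024-01-01", periods=96, freq="h")
s = pd.Series(1.0, index=idx)
s.iloc[24:48] = None  # a full day of missing values
print(missing_value_periods(s))  # flags 2024-01-02 with null_rate 1.0
```

The same resample-and-threshold pattern extends to other per-window statistics (mean, variance, cardinality) for catching distribution anomalies.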
We’ve also created a generic and very useful tool that we internally call our “comparison tool”. It does exactly what the name suggests: it compares two Pandas DataFrames and uses ML techniques to tell you where they differ from one another. In the world of data quality, we use this tool to compare, for example, two different months of our data to check whether a significant, unpredicted difference pops up. Any change in audience targeting, feature distributions, null values, etc. is immediately reported, which saves us weeks of ML debugging (which everyone knows is very difficult).
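One common ML technique for this kind of DataFrame comparison is to train a classifier to distinguish rows of one frame from the other: an AUC near 0.5 means the frames look alike, a high AUC means they differ, and the feature importances point to where. This is a generic sketch of that idea, not the actual comparison tool described above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def compare_frames(df_a: pd.DataFrame, df_b: pd.DataFrame) -> pd.Series:
    # Label each row by its source frame, then measure how well a
    # classifier can tell the two sources apart.
    X = pd.concat([df_a, df_b], ignore_index=True)
    y = np.r_[np.zeros(len(df_a)), np.ones(len(df_b))]
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    auc = cross_val_score(clf, X, y, cv=3, scoring="roc_auc").mean()
    print(f"separability AUC: {auc:.2f}")  # ~0.5 => similar frames
    clf.fit(X, y)
    # Importances rank the columns responsible for the difference.
    return pd.Series(clf.feature_importances_,
                     index=X.columns).sort_values(ascending=False)

# Toy "two months" of data that differ only in column "x".
rng = np.random.default_rng(0)
month_a = pd.DataFrame({"x": rng.normal(0, 1, 500), "y": rng.normal(0, 1, 500)})
month_b = pd.DataFrame({"x": rng.normal(3, 1, 500), "y": rng.normal(0, 1, 500)})
print(compare_frames(month_a, month_b))  # "x" dominates the importances
```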
Some good news! Bigabid will be releasing this tool to help the ML community combat data quality issues; it also supports many other uses, such as A/B test validation, extracting deep insights from data, and more.
For real-time monitoring, Bigabid uses multiple methods. We partner with a leading vendor in the ML monitoring space and also use our own internal tool. Each helps us surface a different class of issues.
Our ML monitoring partner defines Model Performance Management (MPM) as “The foundation of model/MLOps, providing continuous visibility into your production ML, understanding why predictions are made, and enabling teams with actionable insights to refine and react to changes to improve your models. MPM is reliant not only on metrics but also on how well a model can be explained when something eventually goes wrong”. We use this partner mainly for real-time feedback on production data issues, e.g. drifts, ETL issues that result in erroneous features and decreased model performance (in terms of the loss function), and various health/operation metrics.
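Drift checks like the ones such monitors run are often built on a metric such as the Population Stability Index (PSI), which compares a feature's live distribution against a training-time baseline. A minimal sketch of the metric, not our partner's actual implementation:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    # Bin the baseline into quantiles, then compare the live sample's
    # bin frequencies against the baseline's. Common rule of thumb:
    # < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 significant drift.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # cover out-of-range live values
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(1)
baseline = rng.normal(0, 1, 10_000)
print(psi(baseline, rng.normal(0, 1, 10_000)))  # near 0: no drift
print(psi(baseline, rng.normal(1, 1, 10_000)))  # well above 0.25: drift
```

In production, the same check would run per feature on each batch of incoming data, alerting when the score crosses a threshold.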
Bigabid’s proprietary solution is an internal monitoring dashboard. Every ML domain is different, and every industry uses ML differently. We’ve found that in addition to the more “generic” ML monitoring tools, there is always a need for a tailored solution that monitors both the business impact of our models and, via a custom drill-down, the different components of the ML architecture. We run this tool in batch with a custom ETL, and its output can be viewed both in a Tableau dashboard and in a Jupyter notebook for further drill-down.
Tracking performance by segment gives you a deep understanding of model quality on specific slices of the data. It helps you pinpoint critical areas: where the ML model makes mistakes and where it performs best.
For Bigabid, segments might be different clients, GEOs, OSs, etc. We incorporate the segmented view of our models’ performance into everything we do, and for that, you’ve guessed it, we also developed an internal tool.
We use this tool in two main scenarios: research and pre-deployment. In research, we always keep an eye on the impact of the innovation we are researching, whether that’s a new feature, a new model, a change in the architecture, etc., for ALL of the segments in our data. For example, a change might prove beneficial on the test set in aggregate yet damage the performance of one specific segment. It’s crucial to be mindful of this issue and deploy with it in mind. You never want the results in production to surprise you, especially in the ML world, where debugging is a long and tedious process.
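The core of a segmented evaluation can be sketched in a few lines: group predictions by segment and compute a metric per group. The column names and the choice of AUC here are illustrative assumptions, not the internal tool's actual interface.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def performance_by_segment(df: pd.DataFrame, segment_col: str,
                           label_col: str = "label",
                           score_col: str = "score") -> pd.DataFrame:
    rows = []
    for seg, grp in df.groupby(segment_col):
        if grp[label_col].nunique() < 2:
            continue  # AUC is undefined when a segment has only one class
        rows.append({"segment": seg,
                     "auc": roc_auc_score(grp[label_col], grp[score_col]),
                     "n": len(grp)})
    # Worst-performing segments float to the top of the report.
    return pd.DataFrame(rows).sort_values("auc").reset_index(drop=True)

# Toy data: the model is informative for the "US" segment, random for "DE".
rng = np.random.default_rng(2)
n = 2000
us_labels = rng.integers(0, 2, n)
df = pd.DataFrame({
    "segment": ["US"] * n + ["DE"] * n,
    "label": np.r_[us_labels, rng.integers(0, 2, n)],
    "score": np.r_[us_labels + rng.normal(0, 0.4, n), rng.random(n)],
})
print(performance_by_segment(df, "segment"))  # DE near 0.5, US near 1.0
```

A report like this, run on the test set before each deployment, is exactly how an aggregate-level win that hides a per-segment regression gets caught.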
These best practices have brought Bigabid tremendous value over the years. We find them to be a gift that keeps on giving; they have saved us more times than we can count. We now consider them best practices (one could say a must) for every ML team. There is no single silver bullet that will catch all the problems in data and production, but rather a collection of off-the-shelf tools, custom tools, and processes implemented in the team’s day-to-day. Only a combination of these will be effective at eliminating, as much as possible, data issues and ML bugs. Our comparison tool will be released as open source soon, in the hope of helping the community. We believe it’s a strong Swiss-Army-knife type of tool that will greatly benefit every team that adopts it.