Trustworthy Machine Learning - How to build TRUST in Machine Learning, the sane way


Machine learning is becoming an integral part of every aspect of our lives. As these systems grow more complex and make more and more decisions for us, we need to tackle the biggest barrier to their adoption ⚠️: “How can we trust THAT machine learning model?”.

Building trust in machine learning is tough. Loss of trust is possibly the biggest risk a business can face ☠. Unfortunately, people tend to discuss this topic in a superficial, buzzword-laden manner.

In this post, I will explain why it is difficult to build trust in machine learning projects. To gain the most business value from a model, we want stakeholders to trust it. We also want defensive mechanisms that prevent problems from impacting stakeholders and that build developers’ trust in the product.

Why is it difficult to TRUST machine learning models?

Trust needs to be earned, and earning it is not easy, especially when it comes to software. To gain that trust, you want the software to work well, to change according to your needs, and not to break while doing so. There are entire industries dedicated to doing “just that”: understanding whether a code change may negatively affect the end user, and fixing it in time. A machine learning project is even tougher, for the following reasons:

  • Many moving parts 🚴🏻‍♀️ — both the code and the data change in exotic ways. In some cases it is even worse, as you are also the producer of the dataset.
  • Output is “unpredictable” 🎁 — you can’t review in advance every prediction your model will show your end users. This can be mitigated by explaining why the model produced a given prediction.
  • Multiple departments involved 🤼 — different roles own different parts of the pipeline because each requires a different skill set: data ingestion, model development, model deployment, monitoring, and consuming the model. Communicating and working together effectively is not a trivial task.
  • Tough development cycle 😫 — to create a model you need a lot of data, enough computing resources, and enough time. This makes local development harder. You may sample the data, use expensive machines, or move to the cloud, but the feedback loop will still be quite long. And if that weren’t enough, machine learning code is difficult to debug.
  • Machine learning systems are complex 🤯 — they rarely fail in obvious ways. Most of the time your algorithm works, just not well enough, and even defining what “well enough” means, and what should be measured, is a nontrivial task. This becomes tougher still because there are few established best practices, and even those are not focused on trust.

It’s also important to state upfront that building trust is hard and often requires a fundamental change in the way systems are designed, developed, and tested. We should therefore tackle the problem step by step and collect feedback along the way.

A bird’s-eye view of building TRUST

We will cover every step in the machine learning life cycle and what mechanisms we need to harness to improve trust ♻️:

  • Defining success — we define what “the model works well” means. In this section, we discuss how we evaluate our models and whether they will be adopted.
  • Data preparation — garbage in, garbage out(💩+🧠= 💩). In this section, we focus on how we evaluate our data.
  • Model development — machine learning is experimental by nature, and we have little room for error. In this section, we describe what you need to measure and validate offline when developing new models.
  • System integration — is about the machine learning pipeline and the artifacts it produces for production. This section focuses on making sure the code, tests, and artifacts pass certain quality criteria with a big emphasis on reproducibility and automation.
  • Deployment — this is where our models first encounter production traffic. This section focuses on running experiments on portions of the traffic and making sure our models deliver the business value in production that we expect.
  • Model monitoring — is about avoiding model degradation. This section is intended to identify potential problems and allow the developer to mitigate them before they negatively affect the stakeholders.
  • Understanding the models’ predictions — provides insights to both the data scientists and the stakeholders regarding the model predictions at different granularities.
  • Data collection — provides the ability to improve with time.

The following flow chart summarizes the defensive mechanisms in each step in the machine learning lifecycle.

The machine learning flow of TRUST

Pro Tip #1🏅remember it’s a journey! You should focus on what hurts you the most and aim for incremental improvements.
Pro Tip #2
🏅clear and cohesive communication is as significant as the technical “correctness” of the model.

The first thing we want to do is define the success criteria, so we can make sure we are optimizing the right things.

Defining success

Like every software project, you can’t tackle a problem effectively without defining the right KPIs 🎯. No single KPI or metric suits every scenario, so you and the stakeholders must be aligned on the merits of each one before choosing it. There are plenty of popular metrics to choose from, and they fall into two categories: model metrics and business metrics. Choosing the right metrics is an art.

Decide on these metrics before doing any other work. The decision will have a HUGE impact on your entire model life cycle: model development, integration, and deployment, as well as monitoring in production and data collection.

Warning #1 ⚠️don’t take this step lightly! Choose your model metrics and business metrics wisely based on your problem.
Warning #2 ⚠️ the “perfect” metric (ideal fit) may change over time.
Pro Tip #1🏅use simple, observable, and attributable metrics.
Pro Tip #2🏅collect the metrics early on. No one enjoys grepping strings in logs later!
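To make the model-metric vs. business-metric distinction concrete, here is a minimal Python sketch. Everything in it is illustrative: the labels, the `revenue_per_hit` figure, and the idea of valuing each correct positive prediction are assumptions for the example, not numbers from a real system.

```python
# Pairing a model metric (precision) with a hypothetical business metric
# (revenue per prediction). All numbers below are illustrative.

def precision(y_true, y_pred):
    # Model metric: of all positive predictions, how many were correct?
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

def revenue_per_prediction(y_true, y_pred, revenue_per_hit):
    # Hypothetical business metric: assume each correct positive
    # prediction earns revenue_per_hit; average it over all predictions.
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits * revenue_per_hit / len(y_pred)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(f"precision: {precision(y_true, y_pred):.2f}")
print(f"revenue/prediction: {revenue_per_prediction(y_true, y_pred, 10.0):.2f}")
```

A model with high precision but low revenue per prediction (or vice versa) is exactly the kind of mismatch you want to surface before deployment, which is why both kinds of metrics should be tracked from day one.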

The next step toward building trust is to take care of our data. As the famous quote goes, “garbage in, garbage out” (💩+🧠= 💩).

Taking care of your data

Data is a key ingredient of a successful model. You need a high degree of confidence in your data; otherwise, you have to start collecting it anew. The data should also be accessible, or you won’t be able to use it effectively. Tools such as Soda, Great Expectations, and Monte Carlo can help here.

Gathering requirements and putting proper validations in place requires some experience 🧓. Data quality comes in two forms:

  • Intrinsic measures (independent of the use case) — accuracy, completeness, and consistency.
  • Extrinsic measures (dependent on the use case) — relevance, freshness, reliability, usability, and validity.

Warning #1 ⚠️ data quality never gets enough attention and priority 🔍.
Pro Tip #1🏅basic data quality checks will take you far🚀. The most prominent issues are missing data, range violations, type mismatch, and freshness.
Pro Tip #2🏅You should balance validations across different parts of the pipeline. Checking at the source can catch sneakier issues exactly when they happen, but its reach is limited to that specific source; examining the outputs at the end of the pipeline lets you “cover more space”.
Pro Tip #3🏅after basic data quality checks, look at how top tech companies approach data quality.
Pro Tip #4 🏅after basic data quality checks, you can take care of data management.
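The four basic checks from Pro Tip #1 can be sketched in a few lines of plain Python. The schema (a `user_id`, an `age`, an `event_time`), the age range, and the 24-hour freshness window are all assumptions made up for this example; in practice you would encode your own rules, or express them in a tool like Great Expectations.

```python
# A minimal sketch of four basic data quality checks:
# missing data, type mismatch, range violation, and freshness.
# Schema, ranges, and the freshness window are illustrative assumptions.
from datetime import datetime, timedelta, timezone

def validate(records, now=None):
    now = now or datetime.now(timezone.utc)
    issues = []
    for i, row in enumerate(records):
        # Missing data: required fields must be present and non-None.
        for field in ("user_id", "age", "event_time"):
            if row.get(field) is None:
                issues.append((i, f"missing {field}"))
        # Type mismatch: age must be an int.
        if row.get("age") is not None and not isinstance(row["age"], int):
            issues.append((i, "age has wrong type"))
        # Range violation: age must be plausible.
        elif isinstance(row.get("age"), int) and not (0 <= row["age"] <= 120):
            issues.append((i, "age out of range"))
        # Freshness: events older than 24h are considered stale.
        ts = row.get("event_time")
        if isinstance(ts, datetime) and now - ts > timedelta(hours=24):
            issues.append((i, "stale event_time"))
    return issues

now = datetime.now(timezone.utc)
records = [
    {"user_id": 1, "age": 34, "event_time": now},                      # ok
    {"user_id": 2, "age": None, "event_time": now},                    # missing
    {"user_id": 3, "age": "41", "event_time": now},                    # type mismatch
    {"user_id": 4, "age": 250, "event_time": now},                     # range violation
    {"user_id": 5, "age": 28, "event_time": now - timedelta(days=3)},  # stale
]
for idx, problem in validate(records, now=now):
    print(f"row {idx}: {problem}")
```

Even a check this simple, run at ingestion time, will flag most of the issues that would otherwise surface as mysterious model regressions weeks later.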

The next step toward building trust is the model development phase.
