In a previous article, I wrote about the importance (and challenges) of assessing an ML model’s UX/operational/commercial efficacy (i.e., impact when used under ideal/controlled circumstances) and effectiveness (i.e., impact in real-world conditions) as part of an ML product. In the same article, I discussed how similar this problem is to the one medical researchers face when assessing the efficacy and effectiveness of candidate interventions (such as vaccines), and how ML products can draw parallels and borrow some of these terms and learnings from the medical domain. This article will expand on that story.
First, I would like to mention two points on why a well-designed experiment matters for proper evaluation of ML products’ efficacy/effectiveness (and hence why drawing such parallels is important):
1.1. Computational irreducibility
Many real-world systems are complex (and to the best of our current understanding, “computationally irreducible”). Even though science has historically done well by finding small computationally-reducible subsystems within large complex systems (which enabled us to model them, predict their behaviour, and jump ahead in time), our ability to make such predictions about systems is not universal (see this video by Stephen Wolfram, for more). That is, in many complex systems, in order to know what will happen, we need to go through each step of the process and actually observe what happens (rather than using a set of equations to accurately predict the outcome); this is in line with the “measure, model, perturb, repeat” narrative in biology.
Both medicine and certain applications of ML products fall into categories that, to the best of our understanding to date, remain computationally irreducible. Therefore, we need to follow them and see whether our interventions lead to the desired effects. This may be the reason, for instance, that we see surprising results such as placebo effects in many clinical trials; such unpredictable human perception and behaviour can call into question the validity of some qualitative data modalities for the evaluation of ML products’ efficacy/effectiveness in high-stakes decision making. For instance, you can see in Table 17 from this report how a large portion of the placebo arm of Pfizer’s COVID vaccine trial reported symptoms such as fatigue, headache, and more; maybe they felt that the vaccine, which they didn’t actually receive, was working? Or maybe they actually had those feelings? …
1.2. Need for new frameworks and linguistic abstractions in ML products
In addition to the need for proper experiment design for the evaluation of an intervention in a complex system, the need for effective communication and collaborative frameworks in ML product teams is another reason why such parallels matter. When you enter a new domain, it is common to encounter new terminology that most people in that field understand in the same (or a similar) way. For instance, almost everyone in medicine knows the difference between efficacy and effectiveness, which removes the need to describe them at length when communicating.
Whilst specialties within ML products (ML, design, dev, …) have their own terminologies, when it comes to new tasks at the intersection of multiple specialties (particularly when one of these specialties is ML), product teams can benefit from some new terms. Such a common language helps with effective communication and collaborative progress towards common goals. For instance, neuroscience research shows how speaker-listener neural coupling can form as a result of effective communication; this in turn can lead to better alignment and common understanding.
Next, there are two additional vaccine concepts that I did not mention in the first article and that resemble what ML products deal with, namely vaccine (or AI) hesitancy, and antigenic (or concept) shift and drift. Both are worth elaborating on:
2.1. Vaccine (or AI) hesitancy
Vaccine hesitancy, which is defined by WHO as a “delay in acceptance or refusal of vaccines despite availability of vaccination services”, has been reported in more than 90% of countries in the world (hence listed as one of the WHO’s 10 threats to global health). There is ongoing research, both in academia and by governments around the world, into identifying the patterns of such behaviours and potential interventions to change them. One of the common models in the field explains such tendencies based on factors such as complacency, convenience and confidence (i.e., the “Three Cs” model); you can read more here, for instance.
Can we define “AI hesitancy” as delay in acceptance or refusal of using AI, despite indications (e.g., through research, or POC) of AI’s ability to solve a particular problem? If so, how can we systematically study such tendencies in some verticals? What are the factors that can explain such tendencies? Are there any common factors (and potential remedies) between the two worlds? We see growing interest here (e.g., studies at the interface of AI and society), and its further growth will benefit both the AI community and ML products.
2.2. Shifts and drifts
One of the challenges of dealing with viral infections is that viruses are continually evolving to evade the host immune response; this results in two major challenges: antigenic shifts and drifts (for a brief intro, you can watch this Khan Academy video; also see this website and others that track such shifts and drifts for viruses). This is very similar to the shifts and drifts that ML products are likely to face, both in terms of their product-market fit (PMF) and their underlying models’ goodness of learning and generalisation (or prediction accuracy).
Many consider PMF a dynamic concept (as opposed to a one-off discovery), and hence map it onto some metrics (e.g., engagement, simplicity of UX, revenue) and track them under their “product analytics” agenda. On the ML side, shifts and drifts come in many flavours, and they may call for model (re)calibration and (re)training, or (re)adjustment of some of the model’s parameters. Therefore, it is important for ML product teams to draw parallels from good product teams (as well as medicine) in this regard and create a setting that enables them to anticipate and detect drifts and shifts in their ML models (and PMF), and to equip themselves with a range of remedies to deal with them.
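To make the idea of detecting drift a little more concrete, here is a minimal sketch of one common check: comparing the distribution of a single numeric feature at training time with its distribution in live traffic, using the Population Stability Index (PSI). This is only one of many possible drift signals, chosen here for illustration. The function name, the synthetic data, and the 0.1 and 0.25 thresholds (a widely quoted rule of thumb, not a guarantee) are all assumptions of this sketch.

```python
import math
import random


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """Hypothetical helper: PSI between two samples of one numeric feature.

    Larger values mean the current sample looks less like the reference.
    """
    # Bin edges are quantiles of the reference (e.g., training-time) sample.
    ref_sorted = sorted(reference)
    edges = [ref_sorted[int(len(ref_sorted) * i / n_bins)] for i in range(1, n_bins)]

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # The number of edges at or below v is the index of its bin.
            idx = sum(1 for e in edges if v >= e)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


# Synthetic data (an assumption for illustration, not real product data).
rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(5000)]
live_stable = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # same distribution
live_shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]  # mean has moved

psi_stable = population_stability_index(training, live_stable)
psi_shifted = population_stability_index(training, live_shifted)

# Rule-of-thumb thresholds (assumed): below 0.1 stable, above 0.25 investigate.
for name, psi in [("stable", psi_stable), ("shifted", psi_shifted)]:
    status = "investigate" if psi > 0.25 else "watch" if psi > 0.1 else "ok"
    print(f"{name}: PSI={psi:.3f} -> {status}")
```

In practice, a team would run a check like this on a schedule for each important input feature (and for the model’s outputs), and treat a sustained rise as a prompt to look at recalibration or retraining rather than as an automatic trigger.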
This is the 6th article in a series of posts that I wrote about the development of AI-first products (and AI-first digital transformation) — challenges, opportunities, and more. I would love to hear your thoughts, and any learning and experiences that you and your company might have had in this space. Get in touch!
Of course, this is a personal opinion and does not necessarily reflect the viewpoints of AIG or the University of Oxford.