Supervising your AI system can be beneficial at many levels. Here are three ways it can help.
Since most AI solutions behave as black boxes, it is increasingly important to supervise them regularly. Without monitoring, these systems can go wrong without anyone noticing. That is a significant problem if you spot the issues late, and a far bigger one if you never notice them until severe consequences occur.
In practice, we train machine learning systems on a fixed dataset. In reality, however, we almost always subject these systems to data outside their training distribution (the long tail, as some call it). How can you be sure that AI systems work robustly on real data?
Wait and watch is not an option here. Some machine learning models can be trained on the fly, mostly as unsupervised learning mechanisms: you feed new inputs back to the system, and it keeps optimising continuously. While this might be good enough sometimes, at other times failure can be disastrous.
Customers may, at times, tolerate a hundred useless product recommendations. However, they certainly cannot tolerate a single mistake that is hazardous or costs them a significant amount of money. Business users are no different.
Here are three things you may find only by supervising your AI system.
Highlight Statistical Bloopers
By and large, these analytical techniques depend on summary statistics of the dataset. That is a significant problem because multiple datasets with very different characteristics can have the same summary statistics. In Feb 1973, F. J. Anscombe published a journal article in which he says, “A computer should make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding”.
Inspired by Anscombe’s Quartet, Alberto Cairo created the Datasaurus[3] dataset in Aug 2016. The dataset urges people to “never trust summary statistics alone; always visualise your data”: while the data exhibits normal-seeming statistics, plotting it reveals a picture of a dinosaur.
In 2017, researchers at Autodesk published a detailed paper on this property of datasets, titled “Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing”.
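To see the effect for yourself, here is a minimal sketch using NumPy that computes the classic summary statistics for two of Anscombe’s four sets (values reproduced from his 1973 paper). The numbers come out almost identical, yet a scatter plot of each set looks completely different.

```python
import numpy as np

# Two of Anscombe's four sets: set I is a noisy linear relationship,
# set IV is a vertical line of points plus a single outlier.
x1 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
y4 = np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89])

for name, (x, y) in {"set I": (x1, y1), "set IV": (x4, y4)}.items():
    slope, intercept = np.polyfit(x, y, 1)
    print(f"{name}: mean_x={x.mean():.2f} mean_y={y.mean():.2f} "
          f"var_x={x.var(ddof=1):.2f} corr={np.corrcoef(x, y)[0, 1]:.3f} "
          f"fit: y={slope:.2f}x+{intercept:.2f}")

# Both sets report roughly mean_x=9.0, mean_y=7.5, var_x=11.0, corr≈0.816
# and a fitted line of y≈0.5x+3 -- yet their scatter plots tell completely
# different stories, which is exactly why summary statistics alone are not
# enough when supervising a system.
```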
As you go through these references, you will realise that, from an attacker’s perspective, this is a boon, and such scenarios can very well occur in real life. Without a capable red team and regular supervision of your AI, these issues can turn into threats and go undetected for far too long.
Detect Concept Drift
In addition to the statistical bloopers, supervising AI can help you identify a phenomenon known as “Concept Drift”. The “concept” in concept drift refers to the unknown, hidden relationship between the input and output variables. The underlying data can change for various real-life reasons, and when it does, the statically programmed (or assumed) relationships it once supported no longer hold. This is a common occurrence across many real-life applications of machine learning, and “concept drift” is the field’s technical term for it.
You may ask, “Why is this a problem?”
For an entirely static use case, concept drift is not a problem at all. However, in many use cases the relationship between the input parameters (or features) and the output characteristics changes over time. If your machine learning model assumed the data patterns were static, there will be a problem down the line. Such drift cannot be predicted, because the changes in the environment or in the contributing factors may be random or deliberately fabricated. Either way, without supervising the AI system, you would never uncover them.
One of the significant challenges in dealing with concept drift is detecting when it occurs. Here I suggest two ways to handle that.
When you finalise a machine learning model for deployment, record its baseline performance parameters, such as accuracy and skill level. Once the model is deployed, monitor these parameters periodically for change, i.e. supervise the AI regularly. If the difference in a parameter is significant, it may indicate concept drift, and you should take action to fix it.
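As a minimal sketch of that first approach, the snippet below compares live accuracy against the recorded baseline. The five-point tolerance and the way live accuracy is obtained are illustrative assumptions, not a prescription.

```python
from dataclasses import dataclass

@dataclass
class DriftMonitor:
    """Compare live model performance against the baseline recorded at deployment."""
    baseline_accuracy: float      # accuracy recorded when the model was finalised
    tolerance: float = 0.05      # illustrative threshold: a 5-point drop

    def check(self, live_accuracy: float) -> bool:
        """Return True if the drop from baseline suggests possible concept drift."""
        drop = self.baseline_accuracy - live_accuracy
        return drop > self.tolerance

# Example usage: record the baseline at deployment, then periodically feed in
# accuracy measured on fresh, labelled production samples.
monitor = DriftMonitor(baseline_accuracy=0.92)
if monitor.check(live_accuracy=0.84):
    print("Significant performance drop -- investigate possible concept drift.")
```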
The other way to handle it is to assume that drift will occur and, therefore, update the model periodically.
It is unrealistic to expect data distributions to stay stable over a long period. The perfect-world assumptions of machine learning do not hold in most cases, because the data changes over time. This limitation is a growing problem as the use of these tools and methods increases.
Acknowledging that AI or machine learning systems may remain black boxes for a long time, yet will keep evolving continuously, is the key to fixing this.
Find What’s Inside the Black Box
AI systems being black boxes is going to be a persistent, ongoing problem. Despite the flood of trust issues and questions, it will remain nearly impossible to get access to the internal logic of AI systems.
However, the black box problem can be handled somewhat differently with the help of operational red teams and AI supervision.
This approach, however, needs careful design of experiments and scenarios by the testing and red teams, followed by systematic execution. When done correctly, users or organisations can reasonably estimate what is going on inside the black box and act accordingly.
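As a rough illustration of one such experiment, the sketch below sweeps a single input feature over a range and records how the model’s output responds. The `predict` function here is a toy stand-in for whatever prediction endpoint your system actually exposes; it exists only so the example runs end to end.

```python
import numpy as np

def predict(features: np.ndarray) -> float:
    """Stand-in for the black-box model's prediction call.

    In practice this would wrap your deployed system's API; the toy formula
    below is purely an assumption so the sketch can be executed."""
    return float(1.5 * features[0] - 0.3 * features[1] + 2.0)

def probe_feature(base_input, feature_index, values):
    """Vary one feature while holding the rest fixed, recording the output.

    A red team can run many such sweeps to build a rough picture of how the
    black box responds to inputs it was never explicitly tested on."""
    results = []
    for v in values:
        candidate = base_input.copy()
        candidate[feature_index] = v
        results.append((float(v), predict(candidate)))
    return results

# Example: sweep feature 0 across a range and inspect the response curve.
base = np.array([1.0, 4.0])
for value, output in probe_feature(base, 0, np.linspace(-5, 5, 5)):
    print(f"feature_0={value:+.1f} -> prediction={output:.2f}")
```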
Remember that this is true for external parties too, who may probe your AI systems in the same way in order to manipulate them maliciously. This is why AI supervision can play an essential role in identifying these risks upfront, or at least as early as possible.
Go Supervise Your AI
We have supervised machine learning as a concept, but supervising an AI system itself has not been thought through as deeply.
Training an AI system to fall back on human intervention can be one strategy. However, some may question its effectiveness, as it seems counterintuitive to having an AI in the first place.
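One common way to wire up such a fallback is a simple confidence threshold, sketched below. The 0.8 cut-off and the `review_queue` list are illustrative assumptions standing in for a real human-review workflow.

```python
from typing import Any

CONFIDENCE_THRESHOLD = 0.8    # illustrative cut-off, tune for your use case
review_queue: list = []       # stand-in for a real human-review workflow

def decide(item: Any, prediction: Any, confidence: float):
    """Act on the model's prediction only when confidence is high enough;
    otherwise defer the decision to a human reviewer."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return prediction
    review_queue.append((item, prediction, confidence))
    return None  # no automated action taken; a human decides later

# Example usage
print(decide("order-123", "approve", confidence=0.93))  # handled automatically
print(decide("order-456", "approve", confidence=0.41))  # routed to review_queue
```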
Relying solely on fine-tuning machine learning models to produce ever more accurate output is not enough. It will never give you a clear picture of your actual risk exposure.
Several machine learning techniques that feed an AI system rely on finding correlations within a supplied dataset. Whether it is the training dataset or an ongoing feedback dataset, both are mostly treated in the same manner.
Supervising your AI system can be beneficial at many levels.