AI systems have problems. They don’t always work right. They hallucinate, lie, or simply forget the all-important closing bracket in a JSON payload. AI systems need accountability, safeguards and oversight. But what should that look like?

Human-in-the-loop (HITL) and its companion human-on-the-loop are in a nascent stage. These strategies, designed to monitor and govern AI system execution, seem ill-suited to that goal.

Let’s look at a simple example. A user asks a chatbot to craft an email to their boss and send it. The large language model (LLM) churns away, delivers perfectly pandering prose and deploys the message using a “send email” tool.

But wait.

Did the AI think it through? What if it didn’t send it to the right person? What if the LLM chose to insult the user’s boss like only a heartless robot can do? This hypothetical use case needed a human-in-the-loop. With human oversight, the chatbot could have asked the user to review the email to guarantee it was factual and accurate.

Let’s up the ante. What if it is larger than just one email? What if the user sends 500 emails a day? What if the user wants the LLM to craft not one email, but hundreds? And each must be tailored to a specific customer, with specific language and tone to best appeal to that customer’s interests? Suddenly, human-in-the-loop becomes a drain on resources.

This simple example reveals some of the fundamental problems with the concept of HITL. Put simply, humans aren’t good at this task because they:

  • Miss key indicators or issues, especially in large bodies of text like LLM-generated documents.
  • Grow bored or fatigued when reviewing the same workflow over and over. If something is right 90 percent of the time, how likely is it that humans will catch the other 10 percent?
  • Need to sleep and eat, unlike their robot counterparts.

All of these reasons put a natural limit on the capabilities of HITL systems. But beyond the reasons humans aren’t good at HITL, the approach isn’t fundamentally scalable either. If we want agentic AI systems to truly be effective, we need systems that can run hundreds or thousands of times per day, performing tasks that are too complex for human overseers to reasonably understand. Beyond this, if an AI system is used in a low-latency context, humans simply will not be able to react quickly enough to provide meaningful oversight.

It’s clear. We need something better. We need HITL that will work for the agentic age, as masses of AI agents are deployed to act on behalf of humans.

What does that look like? Let’s examine some fundamental concepts that will be crucial in creating valuable, effective and robust HITL systems, without our natural human limitations.

Human-in-the-loop concepts

Lower environment baselines 

Anomaly-based alerting will be a key indicator for AI systems in the future. It will be crucial to distinguish runs that are functioning within their probabilistic boundaries from systems that have gone off the rails. For this, we need effective lower environment baselines that can accurately simulate a production workload.

Elements like load testing, chaos engineering and messy data are all necessary components of an effective lower environment. Examining how AI systems react in these simulated, imperfect environments can give us a high degree of insight into the problems they will run into in production.

Once these problems are identified, we can insert the appropriate monitoring frameworks to detect and prevent these problems.

  • Deterministic anomaly detection thresholds provide us with a simple way to identify problems. Metrics like memory usage, common parameters used in specific execution contexts, the number of AI tool calls, or the number of reads to a database can all be great indicators when something isn’t working as normal (see the sketch after this list).
  
  • AI detection systems can be trained on this lower environment data and supplemented with production data to improve performance. Letting AI do what it does best, finding hidden patterns or weird insights invisible to human eyes, can help us detect problems with other AI.
  • Lower environment functional testing gives us an honest assessment of how effective our AI system will be. What is better than human-in-the-loop? Human before the loop. If a developer can identify that their AI system will not sufficiently complete the task it is assigned to do, there is no need to risk it with production data. Functional testing in an accurate environment can give confidence that an AI system has the bare minimum capabilities for the task at hand.
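To make the deterministic-threshold idea concrete, here is a minimal sketch in Python. The metric names, ceiling values and the detect_anomalies helper are illustrative assumptions; in a real system the ceilings would be derived from your own lower environment baselines.

```python
from dataclasses import dataclass

@dataclass
class Baseline:
    """Ceiling for a single run metric, derived from lower environment runs."""
    max_allowed: float

# Illustrative metric names and ceilings; real values would come from your
# own lower environment baselines.
BASELINES = {
    "tool_calls": Baseline(max_allowed=10),
    "db_reads": Baseline(max_allowed=60),
    "memory_mb": Baseline(max_allowed=1024),
}

def detect_anomalies(run_metrics: dict) -> list:
    """Return the metrics from a single AI run that exceed their baseline ceiling."""
    return [
        name
        for name, value in run_metrics.items()
        if name in BASELINES and value > BASELINES[name].max_allowed
    ]

# Example: a run that made far more tool calls than anything seen in testing.
flags = detect_anomalies({"tool_calls": 23, "db_reads": 18, "memory_mb": 640})
if flags:
    print(f"Run flagged for human review: {flags}")
```

Simple checks like this won’t catch subtle failures, but they are cheap to run on every execution and give the AI-based detection layer a reliable floor to build on.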

Easy UI introspection and approvals

One of the chief issues with HITL systems is that humans grow tired of reviewing. Anything that a human needs to examine, review or approve needs to look good and be easy to use. But how that gets done is another question. Each human touchpoint should be carefully designed to maximize a human’s reviewing capabilities while limiting wasted clicks or confusing JSON blobs. We must focus the UI on:

  • Highlighting anomalies, with easy ways to compare to known-good runs. A good HITL UI will put the human’s eyes directly on problem areas and give them easy ways to identify specific issues amongst complex environmental or AI execution data.
  • Enabling humans to dig deeper into the “why” on specific runs. This may involve incorporating explainable AI concepts into decisions the AI system has made, or it could involve more in-depth log search capabilities to give a user powerful access to additional data that can’t be displayed neatly or on the first screen.
  • One-time or once-per-run approvals. There is little worse in the tech world than mindlessly clicking “accept” as your pipeline slowly ticks forward. AI executions must be better, requiring only a single approval at the run’s most critical point, or finding ways to allow similar requests to continue under the permissions of previous approvals (a rough sketch follows this list).
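As a rough illustration of that last point, here is a minimal sketch of how similar requests could continue under a previous approval. The run_id and scope keys, the one-hour expiry and the helper names are hypothetical; a real system would persist approvals and define scopes far more carefully.

```python
import time

# Hypothetical approval cache: one human approval covers later, similar
# requests in the same run instead of prompting on every action.
APPROVAL_TTL_SECONDS = 3600
_approvals = {}  # (run_id, scope) -> expiry timestamp

def record_approval(run_id, scope):
    """Store a human approval so similar requests in this run can reuse it."""
    _approvals[(run_id, scope)] = time.time() + APPROVAL_TTL_SECONDS

def is_approved(run_id, scope):
    """True if a prior human approval for this run and scope is still valid."""
    expiry = _approvals.get((run_id, scope))
    return expiry is not None and expiry > time.time()

# Usage: the first "send_email" action triggers a human prompt; once approved,
# later sends in the same run proceed without another click.
if not is_approved("run-42", "send_email"):
    record_approval("run-42", "send_email")  # called after the human clicks approve
assert is_approved("run-42", "send_email")
```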

Picking the right stopping mechanism

When there is a problem with an AI system, or when one needs to strictly stop an AI execution for a critical review, knowing how to stop it is just as important as knowing when or why. Different stopping mechanisms give a reviewer different degrees of control, and developers should carefully examine all of their options to best meet the need for oversight. A simple sketch of these options follows the list below.

  • A “full” or “static” stop is likely to occur on every AI execution, involving the AI system stopping and waiting for a human to review pertinent information and approve. This is the most powerful HITL method, and likely the one you are most familiar with. However, it is the one we must use most judiciously, for all the reasons shared above. For highly critical workflows in high-risk environments, humans should stop every request. However, the lack of scalability and the error-prone human condition mean this HITL control will fall short for many use cases and shouldn’t be used for lesser approvals.
  • A “dynamic” stop looks similar to a full stop, where the AI system pauses everything it’s doing and waits for a human to review. Importantly, though, it only does this if predefined criteria are met. This means an AI system can operate in lower-risk settings or on lower-risk systems, and only alert when thresholds are crossed that warrant a human review. This tackles the most obvious problem, human exhaustion, but still tightly couples humans to AI workflows.
  • A “parallel” stop asks for human involvement while continuing other processes and can happen on every run or only when certain conditions are met. This style of HITL control provides flexibility, allowing a human to go for lunch and click approve when they get back. However, the HITL is now further removed from the execution of a given run, so critical workloads could be missing necessary oversight.
  • A “random” stop only pauses randomly selected executions, with the human performing a quality control check on the AI system’s operation, rather than a traditional “in the loop” approval. This HITL control is now likely unfit for critical workloads, but it does provide valuable oversight into the ongoing operation of a large-scale or high-volume AI system.
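One way to picture how these options differ is a small policy check that each run passes through before acting. The StopPolicy names, the should_pause function and the 5 percent sample rate below are illustrative assumptions, not a prescribed design.

```python
import random
from enum import Enum, auto

class StopPolicy(Enum):
    FULL = auto()      # pause every run for review
    DYNAMIC = auto()   # pause only when a risk condition is met
    PARALLEL = auto()  # request review asynchronously; the run keeps going
    RANDOM = auto()    # spot-check a random sample of runs

def should_pause(policy, risk_flagged, sample_rate=0.05):
    """Decide whether this run blocks on a human, under an illustrative policy model."""
    if policy is StopPolicy.FULL:
        return True
    if policy is StopPolicy.DYNAMIC:
        return risk_flagged
    if policy is StopPolicy.PARALLEL:
        # Review happens in parallel elsewhere; this run is never blocked.
        return False
    if policy is StopPolicy.RANDOM:
        return random.random() < sample_rate
    return True  # fail closed on an unrecognized policy

print(should_pause(StopPolicy.DYNAMIC, risk_flagged=True))   # True
print(should_pause(StopPolicy.RANDOM, risk_flagged=False))   # True roughly 5% of the time
```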

None of these approaches is strictly better than another. Instead, developers should look to understand the best type for their use case and ensure they layer on defensive strategies to prevent one HITL failure from cascading into other issues.

End customer feedback 

When it’s all said and done, the customer (or business client, or whoever is benefiting from the AI system) needs to be satisfied that the system is doing its job well. If they aren’t satisfied, developers need to know about it so they can better design the system to meet the customer's needs. This extends to HITL, where dynamically evaluating a HITL control loop by comparing it to customer feedback can allow developers to make meaningful changes to better handle a poor AI run.

Consider this: A low-risk marketing agent sends emails and other material to targeted customers based on their individual interests. Due to the high quantity of emails sent, no human could reasonably be in the loop for every run. Therefore, a random stop HITL system is employed to check in on the occasional email. However, how will the developers know if they are checking often enough or too frequently? They can employ user feedback in the form of the click-through rate and the unsubscribe rate of the marketing agent’s messages.

If the click-through rate spikes, developers can perhaps dial back their oversight, confident that the marketing agent is performing strongly. Conversely, an uptick in customers unsubscribing from the campaign may indicate that the AI agent isn’t performing correctly, and overseers could be delivered more HITL cases to review.
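A minimal sketch of that feedback loop, with made-up thresholds, might look like the following. The 2 percent unsubscribe and 10 percent click-through cutoffs and the adjust_sample_rate helper are assumptions for illustration only.

```python
# Hypothetical tuning rule: adjust the random-stop sampling rate from campaign
# feedback. The cutoffs, bounds and step size are made-up examples, not advice.
def adjust_sample_rate(current_rate, click_through_rate, unsubscribe_rate):
    if unsubscribe_rate > 0.02:        # complaints rising: review more runs
        return min(current_rate * 2, 0.5)
    if click_through_rate > 0.10:      # agent performing well: review fewer runs
        return max(current_rate / 2, 0.01)
    return current_rate                # otherwise leave the rate alone

rate = adjust_sample_rate(current_rate=0.05,
                          click_through_rate=0.12,
                          unsubscribe_rate=0.004)
print(f"New random-stop sample rate: {rate}")  # 0.025 in this example
```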

Importantly, this system is totally automated, once again allowing humans to focus on more important tasks. They don’t have to monitor the HITL queue every day, and they don’t have to monitor the feedback loop for the HITL queue every day either. Rather, the AI system does what it is designed to do, and a human can step in when it really matters.

Final thoughts

Human-in-the-loop is likely to evolve dramatically in the next few years as AI systems skyrocket in usage. Developers can’t get left behind and can’t spend all day checking the box that their AI system has done what it was supposed to do. We need to design smarter systems and let AI and computers do what they are good at, so we can spend more time doing what we are good at: being human.

Keep reading

Are you curious about how a comprehensive data and AI platform can support HITL and stay nimble amid emerging technologies like agentic AI, quantum AI and synthetic data? Read our latest white paper, which explains the methodology needed for transparent and explainable AI outputs with humans in the loop.

The AI blueprint: A leader's guide for organizational trust and ROI during rapid change

Get the white paper


About Author

Josh Beck

Application Security Architect

Josh is an Application Security Architect with SAS' Applied Artificial Intelligence and Modeling division. He is passionate about ensuring the security of innovative AI applications. Beyond his work at SAS, Josh is committed to giving back to the security community through his contributions to OWASP and HackTheBox.
