
Human in the Loop: Episode 4 | The Future of Enterprise Agents

May 15, 2025

Welcome to Human in the Loop.

Scale is the humanity-first AI company. We believe the most powerful and responsible AI systems are built with humans in the loop. And now, it’s also the inspiration behind our new video podcast series Human in the Loop: your guide to understanding what it takes to build and deploy real-world enterprise AI systems.

We share our learnings from working with the largest enterprises, foundation model builders, and governments to equip you with the insights you need to build practical, powerful AI systems in your organization.

About the Episode

In this episode, Scale's Head of Product for Enterprise Solutions, Ben Scharfstein, and Head of Enterprise Engineering, Felix Su, dive into what the next generation of agents will need to work more effectively in an enterprise context.

They cover:

  • Why many agent frameworks today are just wrappers and what next-gen agents must do differently

  • The shift from task automation to goal-driven, long-running agents

  • How proactive agents will act like a chief of staff, managing workflows and surfacing context to uplevel and augment teams

Watch the full episode or read the transcript of their conversation below. 



Episode 4 - The Future of Enterprise Agents

Felix: Today we've got a fun topic: the future of agents. What's next, what's on deck? And specifically, what's needed in the next generation of agents to really help them work, both theoretically and in an enterprise context. Ben, I'll hand it off to you: what is the state of agents today, and what do you see coming up?

Current State of AI Agents

Ben: This will be fun. I think we'll get to share the types of things that you and I talk about in the office all the time. Before we get fully into the future of agents, let's do a quick recap of where we are: agents work, and really cool agents get shown off on X, but bringing them into the enterprise has lagged in practice.

I think the first agents that have really come in and made a difference are coding agents. We see tools like Cursor, Windsurf, and Claude Code come in, and those are tools that people are using every day to actually bring agents into their work.

But for most white-collar work or knowledge work, we don't yet see agents making a difference. And I think that's because we haven't fully integrated into the types of systems, data, and guardrails that we really need to make agents work in the enterprise.

But given what we see coming out of the foundation models and out of a lot of really cool companies who are experimenting on the edge, I'm very optimistic that the next 6, 9, 12 months are going to bring a lot of production agents into the enterprise and into the workforce.

Challenges and Future of AI Agents

Ben: I think you have a really strong point of view on how people are building them today. I've heard you say in the past that most frameworks are just wrappers. Regardless of which framework you're using, they're all doing relatively similar things: wrapping LLM calls and tools, maybe wrapping an MCP server. So what's your opinion on that? Why does it matter? And what will we see in the future that's more than just a wrapper?

Felix: For me, the next generation of what we're working on at Scale, something I'm actively building right now, is agents that can work on their own. They're asynchronous, and they communicate bi-directionally with humans. They can work even when you're not paying attention to them, and when you open up your screen, you can see what they're doing.

As you see what they're doing, you're able to intervene and say, "No, that's bad. Do this." We talked about compounding errors in the last episode. I don't want to wait for it to fail and then come back and say, "You went wrong at step five out of a hundred, so those last 95 steps were useless. Go back and redo it." At step five, if I open up my laptop, I want to be able to say, "No, this is the wrong path." The closest things I've seen so far are deep research products, where you can offload a task and the agent does some work behind the scenes.

I think that needs to become more prevalent, more of a commoditized thing. A lot of these companies have built their own in-house ways to do it, but I think it's important for us to make it a commodity that enterprises can leverage. These are not simple systems to build; it takes a lot of engineering work to think them through. We want to bring that power in so that people aren't just building ChatGPT clones in-house. They're building the kinds of really complex agents that other companies keep private, because that capability isn't really commoditized yet.
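To make that interaction pattern concrete, here is a minimal sketch in Python of an asynchronous agent loop that checks for human feedback between steps. The goal, the step logic, and the intervention message are all placeholder assumptions for illustration, not Scale's actual implementation:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class AgentRun:
    goal: str
    log: list = field(default_factory=list)
    feedback: asyncio.Queue = field(default_factory=asyncio.Queue)

    async def run(self, total_steps: int = 10) -> None:
        for step in range(1, total_steps + 1):
            # Check for human feedback before committing to the next step,
            # so a correction at step 5 doesn't waste the remaining 95.
            try:
                note = self.feedback.get_nowait()
                self.log.append(f"step {step}: replanned after feedback: {note}")
                continue
            except asyncio.QueueEmpty:
                pass
            await asyncio.sleep(0.1)  # stand-in for a real LLM or tool call
            self.log.append(f"step {step}: worked toward {self.goal!r}")

    def intervene(self, note: str) -> None:
        # Called from the UI when the human opens their screen mid-run.
        self.feedback.put_nowait(note)

async def main() -> None:
    run = AgentRun(goal="draft the quarterly report")
    task = asyncio.create_task(run.run())
    await asyncio.sleep(0.35)  # the human checks in while the agent works...
    run.intervene("wrong path: use the Q2 warehouse tables instead")
    await task
    print("\n".join(run.log))

asyncio.run(main())
```

Because the loop polls for feedback before every step, the human's "no, this is the wrong path" lands immediately rather than surfacing only after the run has failed.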

Product Perspectives on AI Agents

Felix: I'm curious about your point of view here. As an engineer, I've built this, but how would you, as a product person, deploy a system like it? What do you think is now possible once you have this tool at your disposal?

Ben: I talk about this a lot. We're at the point now where something you can train someone to do in a day—write a rule book, give them instructions, have them do it—LLMs can already do, and agents can already do. And we're pushing that boundary out further and further, toward tasks that take longer and longer to learn.

And as we've talked about on previous episodes, decision-making is not, in my opinion, going to be handed to agents, because there is a lot of preference involved. Humans are not just right or wrong; they're making trade-offs, and those trade-offs reflect their preferences. So that won't go away. But the ability to execute on tasks is really important, and that's something we'll see agents doing very soon. One of the cool and exciting things for me is that you can expand the capabilities of agents just by thinking about how people do their jobs.

And I think the exciting thing is that we're giving agents more and more of these tools: computer use and browser use, secure coding environments where they can't mess things up, access to the web and search, and all these types of things. And we're allowing them to think, rather than demanding an immediate response.
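As a toy illustration of that tool-giving, here is a sketch of a minimal tool registry and dispatch step. The tool names and the sandboxing note are assumptions for the example; real systems expose tools through function-calling schemas or an MCP server:

```python
from typing import Callable

# Hypothetical tool registry; real agents get these via function-calling
# schemas or an MCP server rather than a plain dict.
TOOLS: dict[str, Callable[[str], str]] = {
    "web_search": lambda query: f"top results for {query!r}",
    # In production this would run inside an isolated sandbox so the agent
    # "can't mess things up"; here it just echoes the source back.
    "run_code": lambda src: f"sandboxed run of: {src[:40]}",
}

def agent_step(tool: str, argument: str) -> str:
    """Execute one tool call chosen by the model and return the observation
    that gets fed back into the next round of thinking."""
    if tool not in TOOLS:
        return f"unknown tool {tool!r}; available: {sorted(TOOLS)}"
    return TOOLS[tool](argument)

print(agent_step("web_search", "enterprise agent guardrails"))
print(agent_step("run_code", "print(2 + 2)"))
```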

Today, you tell an agent something, it analyzes the data and says, "I've got to answer this right now." It's like someone so overeager to jump to an answer that they skim the data and respond immediately. But with these long-running agents, they're going to say, "Actually, I'm going to take a couple of days to really think about this, and I'm going to use all the tools at my disposal."

I think one of the very cool analogies is to the early days of the internet. In Web 1.0, someone presented text and you just read it. Then at a certain point you could read and write, and that was the really big deal of Web 2.0: "I'm not just consuming information." Early LLMs were like that first stage; they just consumed information or gave it back to you. The second stage, which is where we are today, is that they don't just read information or type back; they actually take action and interact. And then the next layer that made Web 2.0 really interesting was jQuery and asynchronous requests happening behind the page.

And that's where we're at this precipice right now. We're getting to these asynchronous agents that are not just doing things live but they're doing things in a more long-running manner.

Felix: Yeah, and something to add to that: I also think there's an opportunity for agents to be proactive. We've talked about reactivity, triggering something, and asynchronous operations. The other day I was testing this capability with our framework. We're booked back-to-back a lot of the time, so with Slack messages you're rushing to figure out how to respond to everything while also sitting in all your meetings. Maybe 70% of those messages are something an agent could look up. Hey, you have access to all the channels I'm a part of: when I get a DM, why don't you take a first pass, go through those channels, formulate a response, double-check it with me, and then send it? So there's asynchronous workflow, there's reasoning, there's search, there's tools, and then there's proactivity.

It's event-driven. It's based on triggers that happen in the real world, without a human needing to solicit help: the agent is implicitly tasked to do that thing already. I think there's a whole world of opportunity for those types of things.
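A rough sketch of what that proactive, trigger-driven pattern could look like, with the Slack wiring, the context lookup, and the approval step all stubbed out as assumptions:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DirectMessage:
    sender: str
    text: str

def draft_reply(dm: DirectMessage, channel_context: list[str]) -> str:
    # Stand-in for an LLM call that drafts a first-pass reply from the
    # channels the user is already a member of.
    return f"(draft for {dm.sender}) re: {dm.text!r}, citing {len(channel_context)} recent messages"

def on_direct_message(
    dm: DirectMessage,
    get_context: Callable[[], list[str]],
    ask_approval: Callable[[str], bool],
    send: Callable[[str], None],
) -> None:
    """Fires on the DM event itself; no human has to ask for help.
    The agent takes a first pass, then double-checks before sending."""
    draft = draft_reply(dm, get_context())
    if ask_approval(draft):  # the human-in-the-loop checkpoint
        send(draft)

# Toy wiring; a real deployment would hook these callbacks up to the
# Slack Events API, a context store, and an approval UI.
on_direct_message(
    DirectMessage(sender="teammate", text="Can you share the Q3 roadmap?"),
    get_context=lambda: ["roadmap thread in #planning, posted yesterday"],
    ask_approval=lambda draft: True,  # auto-approve for the demo
    send=print,
)
```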

Now, the reason we haven't seen a lot of this in practice is that when you ask agents to take over some of your work, as you said, there are a lot of preferences and nuances, and it takes time to tune them. The most common examples you see are on the internet, where people build simple things to solve their own day-to-day tasks. You're not really going to trust something to be that proactive right now. But the goal, and this is why we're working through the hard problems with enterprises today, is that once we learn what you need to control, we can bring those controls into these systems.

It's not that AI is a threat to humans; it's forcing humans to be more human. Because a person doesn't have to do this rote thing, they can do something more human: actually think it through. We talked in a previous episode about Cursor and how it can't fundamentally replace an engineer, because there are so many preferences, biases, and learned experiences that a human will be more nuanced about than an AI. It forces you to be the better engineer, not the one who just churns out code and writes this one function; an AI can do that.

And I think the fun part about generative AI is that it exploded onto the scene, but what it's really done is raise the floor of what we should be doing and what we should be spending our time on. Our goal with these agents is to raise the floor another notch. Instead of just saying, "Hey, we have ChatGPT here," let's let these agents be autonomous and proactive and really be partners with us. To me that's not a threat. I find it very exciting, as somebody whose job could potentially be taken: please, I have more interesting things to work on. I don't need to be doing these day-to-day tasks. And I'm curious what your take is on that. How many things do you do per day where you think, "I just don't want to do this; I could spend my time on other stuff"?

Ben: Totally. Every day I'm using AI more and more to let myself focus on thinking rather than on executing or documenting thoughts. What I mean is, there are a bunch of things where I already know what I need to write, do, or say, and the act of doing it is tedious compared to actually thinking it through and making the decision. AI today, even what I'd call level one and level two agents, can already augment my execution on decisions in a powerful way. And this is before we even get close to the full potential of what agents can do.

AI Agents in the Workforce

Ben: I think one of the very interesting things you're describing, whether it's the Amazon story of flipping packages with a robot doing the work, or an agent going through your Slack messages, is really a level four agent. You're still defining what the task is, saying, "Okay, this is something I want automated," and then building an agent that automates it. Really crossing the chasm into level five is something much more profound: you don't define the task at all, or you only define the long-term goal. You say, "This is my job description." And then an agent identifies the areas where it can make the human more efficient, writes its own code, and integrates into the relevant software and systems so the task gets automated without you ever describing that it needed to be done.

So, one way of thinking about it, specifically for your Slack example, would be: what if you had a chief of staff that sat with you through every single thing you did and just pulled things off your plate, saying, "I can come up with a system to automate that. I can take that off your plate"? It would be proactive not just in solving the problems you defined, but in identifying the problems that need solving. And I think that's really the goal of the agentic workforce, the agentic enterprise: agents acting as amazing colleagues and coworkers, augmenting our work. People very rarely work completely independently; they work in teams, and we're going to have augmentation from agents on those teams.

One of the things I'm excited for is being able to send my digital clone to a meeting on my behalf. It has total context on my decisions, how I talk, how I think; it shows up on the Zoom as an AI avatar of me, and maybe it Slacks me in the middle and says, "Hey, there was a discussion about this thing. What do you want to do about it?" And I can hop into the meeting, or just reply in Slack and let it handle the rest. So much of knowledge work is just conveying information that's in your head or documented somewhere else, getting people on the same page, and then executing after decisions have been made. And I think what you said is really true: we've got more important stuff to do. We're raising the floor and the ceiling of what we can do through AI. AI is going to pull up the efficiency, the capability, the effectiveness, but also how rewarding it is to work at a company where you can focus on the interesting parts of your job and on making hard decisions. Hard is not always good, but the fun, hard decisions are what will really set your company apart.

Felix: Yeah, you said something very crucial there. I see AI as a workforce of executors: we should be the decision-makers, and they're the executors. A good way to think about it is, what if everyone could become a CEO, and the AI agent is their chief of staff? We make the calls that are fundamentally human, like the strategic decisions about what to do next, and the AI learns from them and becomes smarter and more capable. But the one who takes the first stab at the strategy or the next step is always going to be the human, because the AI is fundamentally modeling your behavior. So the concept of a chief of staff is more accessible than it ever was before. As long as we can give it the ability to maintain state, properly store memory, and replay past decisions, and then put it on top of a framework capable of these asynchronous, long-running workflows that work across days, weeks, months, years... the sky's the limit. It all comes down to product design, controls, and guardrails, which is all stuff we're familiar with even in the level one, two, and three work streams we have. So all this agentic enterprise work we're pitching and moving toward is part of the plan, and it's all about good design when it comes down to it. I think it's definitely an exciting part of our future.
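One simple way to get the durable state and replayable decisions Felix mentions is an append-only event log. This is a minimal sketch of the shape of the idea; the event kinds and file name are made up for the example:

```python
import json
from pathlib import Path

LOG = Path("agent_decisions.jsonl")  # hypothetical append-only log

def record(event: dict) -> None:
    # Append-only writes make every decision durable, so a long-running
    # agent can pick up where it left off across days, weeks, or months.
    with LOG.open("a") as f:
        f.write(json.dumps(event) + "\n")

def replay() -> dict:
    # Rebuild in-memory state by replaying past decisions from the start,
    # much as the agent "re-learns" the human's accumulated preferences.
    state: dict = {"preferences": {}, "completed": []}
    if LOG.exists():
        for line in LOG.read_text().splitlines():
            event = json.loads(line)
            if event["kind"] == "preference":
                state["preferences"][event["key"]] = event["value"]
            elif event["kind"] == "task_done":
                state["completed"].append(event["task"])
    return state

record({"kind": "preference", "key": "tone", "value": "concise"})
record({"kind": "task_done", "task": "triage Monday's DMs"})
print(replay())  # {'preferences': {'tone': 'concise'}, 'completed': [...]}
```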

Conclusion and Next Steps

Ben: And I think what's really critical there is how we go from a demo on social media to deployed in the enterprise. It all comes down to the right guardrails, the right human-computer interaction and collaboration, and making sure the full end-to-end experience is precise even when the agentic part doesn't always succeed. Then it's about being really good at routing to humans to get that preference information, pulling in all the context they need, and making sure we go in the right direction with these agents. That's what we work on at Scale: not just building good demos, but building good product that solves really important needs at the highest quality, with agents in the loop and humans in the loop as well.

Felix: Awesome. So that concludes our discussion about where agents are headed next. Next week we're going to be talking about AI governance and red teaming for enterprises. Don't forget to subscribe so you don't miss it, and leave comments below if you have any questions.

