
We try to predict the next 6 months in AI | Human in the Loop: Episode 11

July 31, 2025

In today’s episode, Scale’s enterprise team (Ben Scharfstein, Sam Denton, Felix Su) try to predict the next six months in enterprise AI based on the trends they’re seeing now. They cover everything from the next frontier in agents to the impact on jobs. Tune in in six months to see how accurate they were.

About Human in the Loop

We share our learnings from working with the largest enterprises, foundation model builders, and governments to equip you with the insights you need to build practical, powerful AI systems in your enterprise.

Watch the video or read the transcript below to get their insights from working with leading enterprises and frontier model labs. 

Episode 11 - Making AI Predictions

Monica: Today you guys have prepared some predictions for us for the next six months. Let's go ahead and kick it off to you, Sam. What do you think is happening in the next six months?

Sam: I have three takes overall. The first is that right now, when enterprises think about evaluation, 90% of the time they're looking at evaluating text or rankings of text, and 10% of the time they might be evaluating actions. I think six months from now, that's going to completely flip, and they're going to be evaluating 90% actions and only 10% text. And what that means is that when they evaluate these actions, the actions will actually be taken. You can think about an agent saying, "Hey, I'm choosing between these two options. What do you think I should do?" The enterprise SME will then choose an action, and the action will be taken as a result of that evaluation. The evaluation landscape will really switch from text to actual action.

Ben: What do you think that means in terms of metrics? Are the evaluations going to be more about live production metrics, like, "Did it do the thing that the end user wanted?" Is it just a BI dashboard you're looking at, similar to how Amplitude or any of those tools look at your metrics? Is that what the evaluation looks like? Or are we still doing these offline evaluations where you have an LLM as a judge or a human annotator going in and evaluating?

Sam: I think we'll see both, honestly. Sometimes it's going to be these live metrics you sort of react to, and in other moments, we'll have these really long-running processes where it's okay for the action to be taken 12 hours later when a human SME gets there and is ready to choose between two options.

Ben: Yeah, I think there's a lot of complexity as we go from a deterministic workflow of A, B, and C in a row to these agentic situations where every case is unique and the result of action A then influences what happens next. It's going to be complicated to do these evaluations. I think the production metrics are going to be very important, and setting those up for success—how you actually instrument these things, how you define success, what your reward function or metric is that you're trying to optimize for—is going to be the core of actually building highly performant AI in enterprise.

Sam: Yeah, and for Felix's team, it's going to be really hard to set up these environments and infrastructure to allow an evaluation to happen 12 hours after the action was suggested and then actually take the action 12 hours later. So, looking forward to seeing how you guys figure out how to do that.

Felix: Yeah, it's on me, I guess. I'm curious about what you're describing. You do a rollout, at some point you pause because it needs to be approved, and then you need to click on it and do the evaluation. I'm wondering what you're thinking about in terms of how the evaluation tooling will look. Is it like you're live-dumping all those rollouts into an evaluation table, and then you need to do the evaluation after everything is done? You might have partially filled-out rows and things like that. What do you think the experience would look like?

Sam: Yeah, I can imagine it as, rather than "Option A versus Option B" as in, "Do you prefer this LLM's answer or this LLM's answer?", it's "Tool Call A versus Tool Call B." Do you want the LLM to take this action or that action? So the complexity for the infrastructure for evaluation is actually having those tool calls available live in the evaluation framework, so that you can actually see what happens when you choose Tool Call B and then adjust the trajectory accordingly.

Felix: But do you expect that to be different at the time of inference? I guess I'm seeing two paths here. One is live: you've paused the entire rollout and you're just waiting for it, and that's being logged for an evaluation. Then somebody actions it, which then goes back and triggers the live interaction to continue.

Sam: Yes, exactly.

Felix: That sounds right. So you get a fork in the road. You have Option A, Option B. You click one, it goes back and does something. What if I, as an evaluator, want to see what would have happened if I chose Option B, and then let that rollout happen? I mean, the possibilities are kind of endless with the number of forks that you have to handle. Or are you just going to say that's up to me? Okay. All right. Yes, there will be a lot of opportunity to explore this space where a ton of options will fork out. We talk a lot about different preferences on what to choose and what path it takes. I'm sure this one will be a challenging one for you.
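As a concrete aside, here is a minimal sketch of the pause-and-approve flow Sam and Felix are describing: the agent reaches a fork, persists the candidate tool calls, and resumes once a human evaluator picks one, potentially hours later. The names and the in-memory store are hypothetical stand-ins for a durable queue and a real evaluation UI.

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class PendingDecision:
    """A paused rollout waiting for a human SME to pick a tool call."""
    rollout_id: str
    options: list                 # candidate tool calls the agent proposed
    chosen: int | None = None
    created_at: float = field(default_factory=time.time)

# In practice this would be a durable store (database/queue), not a dict,
# since the approval may arrive many hours after the rollout paused.
PENDING: dict[str, PendingDecision] = {}

def propose_tool_calls(rollout_id: str, options: list) -> str:
    """Agent side: pause the rollout and surface the fork to evaluators."""
    PENDING[rollout_id] = PendingDecision(rollout_id=rollout_id, options=options)
    return rollout_id

def approve(rollout_id: str, choice: int) -> dict:
    """Evaluator side: the chosen tool call is both the evaluation and the action."""
    decision = PENDING.pop(rollout_id)
    decision.chosen = choice
    return decision.options[choice]

def resume_rollout(rollout_id: str, chosen_call: dict):
    """Continue the trajectory with the approved tool call."""
    print(f"[{rollout_id}] executing {chosen_call['tool']}({chosen_call['args']})")

# Example: an agent forks, a human picks option B hours later, the rollout resumes.
rid = propose_tool_calls(
    rollout_id=str(uuid.uuid4()),
    options=[
        {"tool": "refund_order", "args": {"order_id": "A123"}},
        {"tool": "escalate_to_support", "args": {"order_id": "A123"}},
    ],
)
chosen = approve(rid, choice=1)
resume_rollout(rid, chosen)
```

Supporting the forking Felix raises would mean keeping the un-chosen options around so an evaluator can also replay the rollout down the other branch.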

Sam: I think it's just going to be a much more impactful evaluation experience, but also a lot more complex from the backend. You're no longer looking at a static screen with two strings. Okay, my second take for enterprises is that work-life balance is going to get better. I think when COVID happened, everyone kind of brought their computers home and started working nonstop. I think now that we have agents where you can kick off jobs overnight and actually trust that the agents will do these jobs, hopefully people can go back to working nine-to-five sometimes and not have to think about their computer being at home and thinking, "Oh, it's midnight, I really need to get that email out." So I'm hoping that agents can kind of help with work-life balance for people after the effects of COVID.

Felix: I am torn on this one. I think my wife would tell you that because it's easier for me to prompt a model to do the coding for me, I will take my computer with me in situations where normally I would not. Normally, if I need to code and I need to write 300 lines of code, I'm going to say, "All right, we're going to dinner. I'm not going to finish in 30 minutes, so I'm not going to bring my laptop." But now I'm like, "I can finish in 30 minutes."

Sam: So you're bringing Cursor to dinner now?

Felix: Well, I'll be like, "Babe, you wanna drive?" So maybe work-life balance for me is not going to get better. Maybe not the same for me as for everyone else. But yeah, I'm a little torn on this one. There is definitely a lot of opportunity for agents to take over, but I do abuse that free time for sure.

Sam: And then I think my last prediction for the enterprise AI market is that the cutting-edge enterprises will start to create these sandbox environments for them to collect RL data for their agents. I'm really pushing enterprises to do this because I think it's going to unlock a lot for continuous learning and RL and things like that. But I think the cutting-edge enterprises will have these sandbox environments in the next six months where they can collect data on the types of actions or traces that they want these agents to actually know how to do.

Ben: Yeah, I think if you look at what the leading model companies are doing, what they're doing today, what they were doing a few months ago, and you just pull that forward, that's a pretty clear way of seeing what enterprises will be doing. And obviously, one of our core value props is how do we bring enterprises closer to what the leading companies are doing. And they're definitely doing that. We see these RL environments, whether they're coding agents or other types of agents, and that is the state of the art. I agree with you that the delta between what the labs are doing and what enterprise is doing is compressing. So, yeah, I think in six months, if they're working with the right partners, they'll be doing those things. This makes a lot of sense as we go towards continuous learning and as we go towards customizing these agents for enterprise environments.

Felix: I think there's going to be steps to it, though. When you say "sandbox," I think everyone immediately jumps towards the browser-use, VM kind of thought. But we kind of touched upon this last time: there's a step right before that, which is just about consolidating the interfaces to your data. I remember my sister used to work at Bonobos, a company that's a subsidiary of Walmart now. And they had disparate data all over, and it was just really hard for you... if you looked at it as a human, you were like, "If I had access to all of this, technically speaking, I could come up with really good insights." But because it was so disparate, it was really hard to bring them all together. I feel like there's just a lot more pressure now for an enterprise to say, "Look, if we're going to want a clean AI interface to interact with all this stuff, we're going to need to build the abstractions and then pull it together." I think there's a lot of pressure on CTOs, a lot of pressure on the technical officers in enterprises to basically build this interface. I think the next level above that is, if you want an AI to operate in a VM, which is more natural for a human, to actually exercise those things—like clicking on the Excel icon in Windows is completely different than using the Excel API. There's definitely going to be a gradient. I'd say six months is a bit compressed for most of these companies, but it's wishful thinking. But I think it's a step in the right direction.

Sam: Yeah, I also mean not just these browser-use things, but also taking your production database and being able to make edits to it without actually editing your production database. So, taking these snapshots of where you're currently at in terms of your enterprise data, being able to mess around with it, see where you come out on the other side, and then being able to learn from that.

Felix: Oh, that is actually—I wasn't even thinking that. Okay, I get what you mean now. For the people who are as confused as me, what you're saying is being able to have clones.

Sam: Yeah, exactly. Maybe "sandbox" was too loaded of a term.

Felix: Right. If you're going to want an AI to mess around, you're not going to want it to mess around in your production system. So you have to make these clones of everything. I get it now, and I one hundred percent agree with that.
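A minimal sketch of that clone-and-diff idea, using SQLite purely for illustration: snapshot the production store, let the agent edit the copy, and diff the two to see (and learn from) what it would have done. The table, columns, and the agent's edit are invented for the example.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

# Build a throwaway "production" database so the example is self-contained.
prod_path = str(Path(tempfile.mkdtemp()) / "prod.db")
with sqlite3.connect(prod_path) as conn:
    conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO orders VALUES ('A123', 'shipped'), ('B456', 'pending')")

def snapshot_database(source: str) -> str:
    """Clone production so the agent can mutate the copy, never the real thing."""
    sandbox = str(Path(tempfile.mkdtemp()) / "sandbox.db")
    shutil.copyfile(source, sandbox)
    return sandbox

def run_agent_in_sandbox(sandbox: str):
    """Stand-in for the agent: apply its proposed edits to the clone."""
    with sqlite3.connect(sandbox) as conn:
        conn.execute("UPDATE orders SET status = 'refunded' WHERE order_id = 'A123'")

def diff_outcome(source: str, sandbox: str):
    """Compare clone vs. production to see what the agent would have changed."""
    query = "SELECT order_id, status FROM orders ORDER BY order_id"
    with sqlite3.connect(source) as p, sqlite3.connect(sandbox) as s:
        return [(b, a) for b, a in zip(p.execute(query), s.execute(query)) if b != a]

sandbox_path = snapshot_database(prod_path)
run_agent_in_sandbox(sandbox_path)
print(diff_outcome(prod_path, sandbox_path))
# [(('A123', 'shipped'), ('A123', 'refunded'))] -- production is untouched.
```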

Monica: Cool. Let's move on to the next one. Felix, what do you think is going to happen? What are your predictions for the next six months?

Felix: Okay, I have a few. We'll start off softer here. First, I think—and this is a little bit of a hot take—there are going to be some people who give up a little bit on AI. There will be a lot of people who are doubling down, but the major companies, the big tech companies, know that you need to invest. They're willing to take a few shots on goal, miss a few times, and they know this is just part of the game. I know that a few startups will probably do that. There will be a lot of startups who are going to go up and down because they're also willing to take a few shots on goal, but there's going to be a middle layer of people who aren't able or willing to take that many shots, maybe partially because they don't fully understand that it was just the wrong shot as opposed to AI not being effective. You can imagine some enterprises will say, "Hey, let's just build a chat interface on something internal," and then people won't use it, and they'll think, "Oh, maybe nobody needs it internally," or, "Maybe it didn't return revenue for us, so we're just going to give up on this." I think people aren't really talking about that sensitivity in that middle layer, so I think it has to be said.

Ben: Thank you for the layup. I think you need to build the right product and pick the right use cases. And I think in 18 months or in three years, for those enterprises that have given up now, I think it's not a stop, it's a pause. As new interfaces and paradigms emerge, as they get their data in the right place, and as they fix some of the problems that maybe made it hard for them to adopt AI, they'll come back. I do agree with you, Felix, that there are a lot of people where the investment may not be worth it right now, given the problems they're trying to solve or the way they're trying to solve those problems. Everything looks like a nail with some hammers today, and those hammers—ChatGPT, Copilot, Gemini—are super valuable across a wide variety of use cases, but they don't solve everything. Sometimes they try to solve things that are better solved with nuanced and thoughtful product experiences. We'll see what the shape of those companies that give up is and how soon they come back, because I think all of them will at some point. Software as we know it is fundamentally changing, but it has to be applied in the right way. Trying to use an LLM to do something that is better done with a heuristic or by a human doesn't make sense. We're going to see a lot of those use cases falling off in the next six months as the ROI needs to be there, and in some cases, it's not.

Felix: And making something new takes extraordinary vision. Expecting that in your company, somebody is going to have all the stars align and they're going to land the most amazing thing is a pretty tough ask for most companies. An analogy I like to make is that people thought the four-minute mile was unbreakable. As soon as somebody broke it, the next year, a couple of other people broke it, because suddenly your imagination changes and you realize, "Oh, that is possible." I feel like ChatGPT did that for a lot of people. You woke up, you had it, and you were like, "Wait, what?" And now everyone is latching onto that idea. I feel like some people are going to need the three-minute mile to be broken because they need that jolt of "Oh, we can apply this." Chat might not be the perfect use case for some of these people who are going to give up. Maybe they're going to need this next take I have about asynchronous agents. That might be the way to get there. And if they're not seeing it yet, maybe they just need to wait and pause.

Sam: When we saw this four-minute-mile moment with foundation model providers like ChatGPT, we saw AI go from being very open to being a lot more closed. When there's an enterprise equivalent of breaking that four-minute mile, do you think they'll stop talking about the AI they're doing internally? Will it be much more private, or do you think because of what they have to share for the public markets, they'll continue to share?

Ben: I think they're going to talk about it. I don't think there's any doubt that they're going to talk about it.

Felix: If you accomplish something great, it has an impact, it makes a difference on revenue... even just personally, you're going to want to shout your own praises and say, "I made a big difference." I don't doubt that there is more benefit to sharing than there is to being private. Privacy is maybe for labs and people who are building IP and technology that they don't want competitors to have, but for enterprises, I don't think there's any reason why they wouldn't share what they've done.

Ben: Well, it would be just to keep that competitive advantage against their competitors. I think the fact that they're doing it, they're going to talk about. How they do it, they're not going to share. People are going to keep that more private.

Felix: Okay. My next one is that there is going to be a pretty big shift to more autonomous agents. What I was just saying before about the four-minute mile—to me, the three-minute mile is that I don't have to sit next to the AI. An analogy I give sometimes is that the restrictive thing about AI is that I have to sit next to it. ChatGPT, Cursor, everything I have right now—while I'm filming this podcast, there's no work being done for me on my behalf, which fundamentally seems like it should be possible already. The fact that it's not is why we're here, investing in agents, tool use, continuous learning, and reinforcement, because those things are hard and not easy to apply. I feel like enterprises, maybe because of a forcing function from us because we believe in this, are going to move in this direction to say, "Okay, maybe you should just switch the order of operations." Instead of a human sitting next to the AI, the AI is talking to me. Imagine I had an earphone right now and I'm getting messages saying, "Hey, I did this. I finished all these chores for you. They're on your desk. When you go back to your desk, just review them and I can adjust anything you want." That is the next three-minute mile to me. That would change my life significantly. If you're talking about work-life balance, that is a massive difference.

Ben: I totally agree. We talk about this a lot. Long-running, asynchronous agents are really the future. Today we have request and response, and in some ways, it's skeuomorphic to how we use software today. When the iPhone first came out, you had all of these apps that were trying to replicate webpages or what you do on a desktop. They didn't understand the value of being on mobile. I think we're going to have this same transition over the next six, twelve, or eighteen months with agents. Agents are not just software that you request from and get an immediate response. The value of agents is twofold. One is that they can do long-running, asynchronous tasks and take actions. The second is that they can scale horizontally. Today you have one instance of ChatGPT that you request to do something. It goes out, does some deep research, and ten minutes later it responds. What we're going to see is you're going to make a request, and it's going to spawn 10,000 agents that are each going to go do some work, combine that work, and get a response to you in the long run. I think that is the non-skeuomorphic version. That's not the faster horse; that's the car. It's really leaning into what makes these agents unique from the type of software we have today. I've talked about this before, but this is very similar to how Snowflake and data warehouses really separated compute from storage in the cloud and said, "What does cloud really bring to the table?" It's the ability to scale up our compute horizontally and then scale it down. I think agents are going to do the same thing. You can use 10,000 agents for one minute as opposed to doing that work serially for 10,000 minutes. I totally agree with you. The way that we interact with agents is totally going to change. Agents are going to be working for us as opposed to us sitting next to the agents.
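A rough sketch of the fan-out/fan-in pattern Ben describes, assuming an asyncio-style agent runner; the agent body and the synthesis step are stubs, and the concurrency bound is an arbitrary example value.

```python
import asyncio

async def run_agent(task_id: int, question: str) -> str:
    """Stand-in for one agent instance doing a slice of the work (e.g. one document)."""
    await asyncio.sleep(0.01)  # the real agent would call tools / an LLM here
    return f"agent {task_id}: findings for '{question}'"

async def fan_out(question: str, num_agents: int = 10_000) -> str:
    # Spawn many agents in parallel, bounded so we don't overwhelm downstream APIs.
    semaphore = asyncio.Semaphore(100)

    async def bounded(i: int) -> str:
        async with semaphore:
            return await run_agent(i, question)

    partial_results = await asyncio.gather(*(bounded(i) for i in range(num_agents)))
    # Fan-in: combine the partial results (in practice, another LLM call to synthesize).
    return f"combined {len(partial_results)} partial results"

print(asyncio.run(fan_out("review every document in the deal room", num_agents=100)))
```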

Felix: And there's a really interesting thing we should briefly discuss. You were talking about cost going down. I think this is a part of that. It seems kind of counterintuitive—if I scale it up and ask the model a million times, why does cost go down? I think it has to do with GPU utilization. How do economies of scale reduce cost? If they have high utilization, all the machines that are sitting around doing nothing are suddenly being utilized, which can bring the cost down. I'm curious what you think about this as an ML engineer. You said there are a lot of machines sitting around doing nothing. How can these asynchronous AI agents utilize all this compute?

Sam: I think one of the things that we've seen over and over again—I guess there are kind of two sides of the same coin. One is that the more inference-time compute you give LLMs, the more they're able to accomplish. You can imagine saying, "This is a task I want you to do. Based on how much utilization we have in our GPU cluster, use either a lot of compute or a little bit of compute. I'm open to whatever is available." The other thing is bringing more unification between training and inference. You can do inference when you have a lot of demand, and then when that demand dies down, take all your traces from inference and start training. As inference starts scaling back up, you ramp down your training, save a checkpoint, and go back to inference. You can really imagine this unified, maximal usage of GPUs.
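A toy illustration of that unified scheduling idea: serve requests while demand is high, and spend idle GPU time training on the accumulated traces, checkpointing as you go. The class, threshold, and "one step per idle tick" logic are invented purely to show the shape of it.

```python
class UnifiedGpuScheduler:
    """Toy scheduler: serve inference when demand is high, train on collected
    traces when it is low, keeping a checkpoint so training can resume later."""

    def __init__(self, demand_threshold: float = 0.5):
        self.demand_threshold = demand_threshold
        self.traces = []          # inference traces saved for later training
        self.training_step = 0    # acts as the "checkpoint"

    def step(self, current_demand: float, request: str | None = None) -> str:
        if current_demand >= self.demand_threshold:
            # High demand: GPUs serve inference; stash traces for later training.
            response = f"answer to: {request}"
            self.traces.append((request, response))
            return response
        # Low demand: ramp down serving, train on the accumulated traces.
        if self.traces:
            self.training_step += 1   # one optimizer step per idle tick (toy)
            self.traces.pop(0)        # consume a trace
        return f"training (checkpoint step {self.training_step})"

sched = UnifiedGpuScheduler()
print(sched.step(current_demand=0.9, request="summarize this contract"))
print(sched.step(current_demand=0.1))   # off-peak: GPUs switch to training
```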

Ben: As you move towards asynchronous, long-running work, you don't necessarily need to schedule all of that work during peak load. People are doing a lot of requests between nine and five. You wouldn't be doing a lot of this asynchronous work then. Overnight, or in times when there's low utilization, you can do that. You just might need the result before tomorrow.

Felix: Do we have spot pricing for models yet? You know, spot pricing.

Ben: There are companies working on this. This is the future. I don't know what the impact will be on enterprises, but it will be huge.

Felix: Like AWS, they did the whole spot pricing thing where they reduced cost during non-peak load hours. I'm sure every cloud company does that. That'll be huge. Okay, and my last one before I hand it off to Ben is a bit more of a discussion point. How do you guys feel about the potential for more layoffs to happen? I think there are going to be large companies, especially those who have traditionally hired by training up a bunch of people, who may not feel they are able to do that as much anymore with the advent of AI. I think for large tech companies, that is definitely a danger. That's a trend I've been seeing from people who've come from large companies before. That's a fear that we never used to have that we have now. For services companies as well, there is a fear that they're not going to be able to compete with the margins of people that are AI-enabled. So I think there's going to be a shuffling. I don't think it's going to cut 50% of jobs or anything like that. I do think there's a lot of opportunity for people to use AI and be with it. Your baseline should be as good as AI. There are ways to level yourself up, so I encourage everyone to do that. But I do see a short-term period where there will be uncertainty in the job market for sure.

Ben: Yeah, I think there's going to be a reallocation. We see consulting companies laying off and not hiring as many junior people. We see that is true in software as well. I don't think the net result is going to be an increase in unemployment. We're still going to see those people getting reallocated towards other companies that are really leaning in. The efficiency gains that people will get through adopting AI will change the margin profile of their business, and they can hire more people. There are these countervailing effects pulling against each other. We'll see. But I agree with you that companies are laying people off because they realize a function can be done by fewer people. On the flip side, they're also realizing maybe this function is really valuable if you leverage AI, and they want to hire more people. So we'll see where that lands.

Sam: Yeah, it's just a shift. Do you think we'll see an uptick in more creative jobs, like people doing art or music, as a result of this? Or do you think we'll just reallocate the workforce to more startups and smaller, high-velocity companies? Or both?

Ben: Both. I'm not an expert, but I think art's great. People should pick it up.

Felix: In every single industry, there are going to be things that are more accessible to somebody. For me, as someone who is not an artist or a musician, I have an idea of what I want something to look like, but I can't put it into practice. I don't have the skill, but I know what I want. For example, PMs now have tools like V0 at their disposal. You might not have the engineering degree or training to put it together.

Sam: Not everyone has a minor in computer science.

Felix: Some people can, but I'm just saying because you have these tools, you might be able to turn thought into practice, which is how a lot of art is made. So for people who were on that borderline, who were like, "Yeah, I can think about things, and I just want it to become something," I think, a hundred percent, those are the kinds of jobs that will be invented.

Monica: Great. Now let's move on to Ben. What's going to happen in the next six months?

Ben: The most obvious one is that model prices will continue to come down. We talked about this with better GPU utilization. I think there's a lot of model distillation happening at the labs. Sam Altman has talked about this, and it's been proven to be true. The impact of this is that a lot more problems become tractable. If you have to spend a dollar to accomplish a task, you're not going to transform some data for a user to maybe try to upsell them on something. But if that call costs a fraction of a hundredth of a cent, then that becomes a profitable thing for you to do. The implication in terms of what enterprises should be investing in is just to focus on providing the value, don't focus on the cost. In the past, when you had to think about compute costs, your AWS bill was not going to decrease by 100x in three months. But that really could happen now. We see this with our customers. We see that the price for the leading model from OpenAI comes down, and the second most powerful model gets as good as the last generation's most powerful model and its price comes down 10x. So you see a 20x, 30x, or 50x reduction in cost for the same work being done. It's really important just to invest in those workflows, to really focus on the value that AI can bring, and the cost is just going to go down to zero.
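To make that arithmetic concrete, a back-of-the-envelope sketch; the task, call volume, and per-call value are invented:

```python
def task_roi(calls_per_month: int, price_per_call: float, value_per_call: float) -> float:
    """Monthly profit for an LLM-powered task at a given per-call price."""
    return calls_per_month * (value_per_call - price_per_call)

# Hypothetical upsell-personalization task: each call is worth ~$0.02 in expected revenue.
for price in (1.0, 0.01, 0.0001):
    print(f"${price:>8.4f}/call -> ${task_roi(1_000_000, price, 0.02):>12,.0f}/month")
# At $1/call the task loses money; at a hundredth of a cent it is clearly worth running.
```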

Sam: One interesting byproduct of that is the infrastructure that you're asking the LLMs to work with doesn't decrease in cost. We recently were doing some exploration around a Google project with search, and we found that our limiter on cost was not actually the GPUs, but the cost of search APIs. I think there are going to be other effects like that as the cost of LLMs comes down; the bottleneck on cost will come from other sources.

Felix: A hundred percent. You have to assume that cost goes to zero. Just look at the incentives, look at the demand—all the things align towards costs going down. You're going to release new models, and the previous models are going to become cheaper. You're going to see competitors release other models, and they're going to have new techniques that might make their model cheaper, like DeepSeek did. So, a lot of model providers are forced to reduce cost. Think about GPU utilization going up, which brings costs down. All signs point towards cost going down. We even had a customer where we focused a lot on fine-tuning, and then model costs were brought down to a level that made sense for them. As you said, Ben, you should always design with the thought that the cost will go down, and you don't have to overthink it. It aligns really well with something we talked about in a previous podcast, which is you focus on the product UX. You make your thing useful in spite of whatever model it is, and that gives you the flexibility to use that sliding scale to your advantage. If you feel like costs are prohibitively high, you can drop down to a cheaper model and make your way through it. As soon as the model you want gets to a cost you like, turn it back up. This is all part of the product design motion.

Ben: My second prediction for the next six months in enterprises is we're going to see agents really hit production. I don't think this is a particularly hot take; agents are all the rage. There has been a question of what the timeline looks like, and I think things like MCP and other releases—OpenAI is starting to adopt MCP—are really accelerating agents because they're now able to access production-grade tools and the system of record that the company cares about. It's not going to happen overnight, but I think in the next six months, we're really going to see agents in production. The implication of that is going to be that we're moving from efficiency to capability. Tools today are mostly about information retrieval and asking questions, but agents bring new capabilities to bear. Those are going to be spread across two areas. One is automating entire jobs. Today, if you think about what ChatGPT does inside of an enterprise, it's not automating my job; it's just making me more efficient. But as we talked about, we can now have agents that I can scale up 10,000x, and they can actually expand my capabilities. For example, if I'm working in private equity and I need to look at a deal room, that might have taken five analysts three weeks to really go through. Maybe they work 24/7 and it takes them three days, but now it can happen in five minutes. That's a completely new capability that's totally changing what's possible, the way I work, and the speed at which I can work. These job functions are just really going to change. The second thing is it's going to add new capabilities to the products that people build. Adding a chatbot to your website is cool, but really adding a high level of personalization to your product where your customers are able to leverage the agents you've built for them to completely change what's possible is going to be really transformational. We're seeing lots of tech companies launch these agents, and I think we're going to see them come to enterprises in the next six months as well.

Sam: I think that framing is really helpful to think about what it means to have autonomy and where these agents are going. You're going to go from efficiency gains to capability gains, as you say. I think that's a really powerful way of thinking about it.

Felix: A byproduct of that that's going to be interesting is that I was talking to one of our product designers, Stain, and we're building this new agentic thing. I told him, "You need to change a user's impression of what this interface looks like," because people are familiar with a search bar and a chat interface. But if you change to an agentic thing, it still looks like a chat interface if you just think about it naively. The problem is now that you've switched from information to capability, you probably want a different interface. If I look at a chat, I just assume it's going to give me information. If I look at the futuristic version of this, I need to know, first of all, what capabilities you have. How do I know what you can do? What can I ask you? It's funny because right now when you're looking at chat...

Ben: Felix living in the future.

Felix: I'm already there. But if you look at a chat today, you don't mind that it doesn't tell you anything because you don't expect it to have any other capabilities except giving you information. But now you look at a thing, and let's say it can traverse the site, navigate the page, change things on the page, generate things—how would I know that? So now you have this hard problem from a designer's perspective of saying, "I have to change somebody's expectation without too much overload." I do think we're going to see agents in production, and because there's a paradigm shift to capabilities, there needs to be a design shift as well. There are going to be some interesting things that come up.

Ben: My last prediction is that we're going to continue to see an explosion in vertical AI companies in pretty much every sector. I think they're going to be trying to sell to enterprises and be quite effective at doing so. There's going to be a buy-versus-build decision for these enterprises. These verticalized companies are really developing expertise and doing a good job. Right now, we see them in things like legal and customer service, but I think there's going to be a torso and tail of these vertical AI companies coming in. The benefit is that they're obviously experts in what they're doing. The drawback is that they're not necessarily a single source of truth for your AI strategy or for orchestration. The enterprises don't necessarily get to own that technology and capability that could be very core to what they deliver and the way that they work. So I think enterprises are going to face a choice in the next six months, not just in customer support, but in every job function and every role, of whether they want to buy versus build on AI.

Sam: What do you think the right answer is?

Ben: Well, I think it depends. Not just talking our book, I think it really does depend. Sometimes it does make sense for them to go with one of these companies; they're providing very impactful services. I think when it's really core to the business and to the strategy, when you pull forward five years, owning the AI capabilities is really important. Especially for a leading enterprise. In the SMB and mid-market segment, it makes a lot of sense to buy. But if you are a market leader and an industry leader, you don't want to get the average of your competitors. What you want is to be differentiated. One of the core values that enterprises have is distribution and a lot of user data, which means that they can build very customized, bespoke experiences that no one else can build. They don't want to become just like everyone else. They should lean into what makes them great and makes them the industry leader. So building, oftentimes with a partner like Scale, really gives them that edge and expertise over the fifth-best player in the space.
