
We voted on the internet's hottest AI takes | Human in the Loop: Episode 9

July 10, 2025

In today's episode of Human in the Loop, members of Scale’s Enterprise team (Clemens Viernickel, Sam Denton, Felix Su, and Ben Scharfstein) react to controversial statements about AI and vote on whether they agree or disagree. 

The Hot Takes: 

  • Coding agents will replace all engineers
  • Fine-tuning is dead. Context windows killed it
  • Single-agent systems are more reliable in the real world than multi-agent systems
  • The tooling around agents is much more nascent than anyone is willing to accept
  • The primary barrier to enterprise AI adoption is not technical, it's human
  • Rapidfire: The team's personal hot takes

About Human in the Loop

We share our learnings from working with the largest enterprises, foundation model builders, and governments to equip you with the insights you need to build practical, powerful AI systems in your enterprise.



Watch the video or read the transcript below to get their insights from working with leading enterprises and frontier model labs. 

Episode 9 - Reacting to the Internet’s Hottest Takes on AI

Coding agents will replace all engineers

Monica: Coding agents will replace all engineers.

Clemens: The capabilities of the models are definitely going to increase. I find it difficult to say the quality of the model is not good enough, because it's probably going to get even better than an engineer is today. But that doesn't mean there will be no engineers at all. The job of the engineer, as with most tech throughout history, will just change. Engineers will do a different job or they will make different judgment calls, similar to how we coded with punch cards and assembly. You add more and more abstractions, and to me, it feels like there's going to be a massive increase in the abstraction that we're adding. It doesn't mean there's no engineer; the engineer just makes different decisions.

Ben: The hot take that I would say yes to is that maybe engineers won't write code. We'll see if that's the case. I would say that engineers in 10 years probably won't be writing code, in the same way that engineers are not doing long division by hand. It's not the job anymore. It's about designing the system, in the same way that a mechanical engineer is doing a very different thing than they did 50 years ago.

Sam: I feel like it just comes down to the qualifications on the word "all" and the timeline. I mean, we're talking about five years from now versus 300 years from now. It's not a very hot take anymore, but maybe it is.

Felix: If a model is just probabilistically trying to figure out the next best thing to do, there are certain designs where there are 10 options. A model is not going to pick the best option all the time, and neither is a human. It's about picking an option based on our needs, like "I'll pick option one or option three."

Ben: If you write good enough requirements and just state the acceptance criteria... but then we need engineers to do that. So, I don't know. Have PMs do it?

That's a hot take.

Sam: Oh.

Ben: No.

Sam: PMs are definitely not going to... Okay, I'm fine with that.

Fine-tuning is dead. Context windows killed it.

Monica: Fine-tuning is dead. Context windows killed it.

Sam: My take on this is that as we start asking LLMs and agents to do more complex things, the amount of information you would have to put into the context window to solve a long, complex, domain-specific problem doesn't give you as much bang for your buck as framing it as a fine-tuning problem and training a model to solve it.

Ben: I agree. I think the framing that helped me understand the difference is that you put specific information into the context window, but the value of fine-tuning isn't teaching information; it's helping the model learn how to make decisions. As we move towards agents for tool calling that need to understand how to solve specific problems, that's where fine-tuning and reinforcement learning in environments will become really valuable. Historically, fine-tuning was to teach a model to speak in a certain style or respond in a certain way. We've seen it's not good at teaching information. Context is king for many things, but it won't necessarily help an agent do the right tool calling at scale. That's where I think fine-tuning is going to have a resurgence.
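To make the tool-calling point concrete, here is a minimal, hypothetical sketch of the kind of reward you might score an agent's tool calls against when fine-tuning with reinforcement learning. The tool name, arguments, and scoring rule are made up for illustration; they are not a description of any specific setup mentioned in the episode.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str   # which tool the agent chose
    args: dict  # the arguments it passed

def tool_call_reward(predicted: ToolCall, expected: ToolCall) -> float:
    """Score one tool call: right tool first, then right arguments.

    Illustrative reward shaping only; a real setup would combine many
    such checks across a full multi-step rollout.
    """
    if predicted.name != expected.name:
        return 0.0  # wrong tool: no credit
    # Partial credit for each argument that matches the reference call.
    matched = sum(1 for k, v in expected.args.items() if predicted.args.get(k) == v)
    return 0.5 + 0.5 * (matched / max(len(expected.args), 1))

# Example: right tool, one argument wrong -> partial credit.
pred = ToolCall("search_orders", {"customer_id": "42", "status": "open"})
ref  = ToolCall("search_orders", {"customer_id": "42", "status": "shipped"})
print(tool_call_reward(pred, ref))  # 0.75
```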

Felix: I've changed my mind, actually. I was on the other side. Let me explain. I totally agree with you guys. Especially since I've been working on this "long agents" thing, long-term decision-making a hundred percent needs to be tuned. My first thought was about the 99% of applications out there today. From that lens, you can solve a lot of applications that would make an impact today just by throwing stuff into the context window. We did a lot of RAG before because we were trying to pull and shorten things, and now we have million-token context windows. It feels like so many of the easy problems are almost solved. But I a hundred percent agree with you now. The current stage of bringing data to the context window is going to become a solved problem, and then we'll have the next unsolved problem.

Clemens: I was in between on this one. I agree that fine-tuning is dead in the sense that it's a temporary thing. Maybe it's a reverse hot take, but the effort of fine-tuning in the majority of cases doesn't seem worth the hassle. We've seen time and again, at scale, that the progress of the models themselves and the techniques to improve quality have outpaced what you can get from fine-tuning. So I feel fine-tuning is a very temporary thing, and I would agree that it's dead. But I also agree with Sam that the context window is not the reason. There's still a difference between the model's capabilities and the context you're providing. But I feel we won't be talking about fine-tuning in a few years.

Sam: It also depends on how you define fine-tuning, because post-training is a very big category too.

The cost of under investing in AI right now is organizational extinction.

Monica: The cost of under investing in AI right now is organizational extinction.

Felix: We have to say yes; we're an enterprise AI company.

Ben: I think for some companies, yes. You guys are thinking about the losers. Maybe I'm being too technocratic about the optics. If you're Meta and you underinvest—if you don't invest $14 billion—you're going extinct. You should invest a lot more. But I do actually think that for some companies, it's true. The classic thing people say is, "Can the incumbent adopt the tool faster than the startup can build distribution?" I think for a lot of companies, they thought they had a really strong position. For example, Google thought they had a strong position in search, but if they don't adopt AI, they're out of it. And I think that's probably true. Obviously, Google is on the cutting edge of a lot of different things, but even your kind of non-tech or non-software company is at a real risk for disruption. They're going to be disrupted by people that are taking a counter-position, taking AI and really adopting it into their product. We're seeing that enterprises are willing to buy software from companies of a size they would never have considered before, which is very disruptive for large incumbents. Obviously, some industries have such a strong network effect or foothold that they can wait and see, and in many cases, that may be the right thing to do. But everyone should buy enterprise AI.

Sam: I feel I'm just too practical about questions like that, but I think there are many companies where that is true. Maybe not every company, but I think it's a pretty reasonable hot take.

Felix: Even if you aren't talking about companies competing directly in AI, and are just talking about a classic, non-sexy, big corporation, it's also about margins. We have companies come to us saying, "Look, we can't take on some accounts because our margins would be razor-thin." But their clients' expectations are changing. They say, "You have AI tools; you should be able to do it for cheaper." If they don't operationalize AI, they will lose those clients. So even for non-tech companies, at some point, they will encounter a situation where a really important competitor improves their margins, and they are going to lose business. That'll be bad.

Clemens: My 'no' was because it's definitely not true for every company, but we are all here because we strongly believe this is the case for many. The interesting thing we've seen over the past two years is that the enterprises we talk to definitely see it that way. It's astonishing to see the level of urgency and speed these enterprises show with regard to investing in AI. It's not just investing or believing they might go extinct, but it's also the way they're buying software and trying to roll it out. They're trying to increase their investments. Even with the slightest success shown in a pilot, they want to double down and 10x their investment because they definitely believe it.

Sam: And I think the exciting thing is it's finally starting to pay off. I think the people who haven't made that investment are going to be doubling down really quickly, really fast.

Ben: What we saw maybe 12 months ago was, "Let's invest in this. Let's do a proof of concept. Let's build an MVP. Let's prove there's something here." Now, people are coming to us and they know they want to do it. They know they need to do it. We're not trying to sell them on some small skunkworks project. This is a core initiative of the company. That's what we focus on at Scale: problems that get mentioned on earnings calls, that move the stock price—the most important problems for a company. And there are these problems for every single company that we work with. I think for every single enterprise, there are things where they can fundamentally transform themselves with AI.

Felix: I want to be careful about something. I want to encourage companies because these things sometimes take many shots on goal. We can't expect that just because you're a big company and you invest in AI, bang, you'll get something high-value on the first try. (If you work with us, you will. So we do encourage that.) But if you take a shot and it turns out your users don't think it's transformational for their jobs, it doesn't mean AI isn't transformational for your company. It just means you might have chosen the wrong thing. This is what startups do: one out of every X startups is successful. It's the same when you start something new with AI. A lot of these companies don't have experience with that, and the fear of, "Oh, I took a shot, it didn't work, let's just pause on AI," is not the right attitude. You will encounter a competitor that takes the right shot, and then you will be in trouble. That's why it's important to work with vendors who know the space and know which shots to take.

Clemens: It's interesting to see, talking to two life science companies—both of which are super big—they both believe AI is an extinction-level event for them. But still, looking at their priorities for where they want to apply AI, they are orthogonally different. It's surprising to see how the biggest problems a business wants to solve with AI can be very different, even for the same type of company.

Single-agent systems are more reliable in the real world than multi-agent systems.

Monica: Single-agent systems are more reliable in the real world than multi-agent systems.

Clemens: Take that one again. Start with a disagreement, I guess.

Felix: Let me frame this because it's a little weird. Would you agree or disagree? I'll disagree. The reason I disagree is that if you share the full state in the context across your system, I think that is more reliable. If you have all the information, you can make better decisions, and there's always a loss when you do handoffs. It really depends on how you define multi-agent or single agent. I do think multi-agent systems are better if I can have specializations in the state machine I'm designing; that makes perfect sense. For example, in an actor-critic model, you want to separate the state between the two because you don't want the critic model to be polluted by what the actor is doing. So, if you define it as separation of responsibility, I think that is required in many scenarios. But if you're talking about just handing off a sentence to another agent, I think that's disastrous. So you need to separate responsibilities, but if the agents are contained and share the same state, I think that is more reliable.
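A toy sketch can make the distinction Felix draws a bit more concrete: specialized components that all read and write one shared state, instead of handing each other a lossy one-sentence summary. Everything below (the retriever/writer/critic split, the state fields) is an illustrative assumption, not any particular agent framework.

```python
# Toy illustration: specialized "agents" (plain functions here) that share
# one state object, so nothing is lost in handoffs, while each keeps a
# separate responsibility.

shared_state = {
    "task": "Summarize Q2 churn drivers",
    "retrieved_docs": [],  # filled in by the retriever
    "draft": None,         # filled in by the writer
    "critique": None,      # filled in by the critic
}

def retriever(state):
    # A real system would call search/RAG; here we just record the sources.
    state["retrieved_docs"] = ["churn_report_q2.pdf", "support_tickets.csv"]

def writer(state):
    # The writer sees everything the retriever found, not a summary of it.
    state["draft"] = f"Draft based on {len(state['retrieved_docs'])} sources."

def critic(state):
    # The critic reads the full draft plus sources, but its role (judging)
    # stays separate from the writer's role (producing).
    state["critique"] = "Cite the support-ticket data explicitly."

for step in (retriever, writer, critic):
    step(shared_state)

print(shared_state["draft"], "|", shared_state["critique"])
```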

Sam: I'm in the middle because I feel it's directly proportional to the complexity of the problem. The more complex the problem, the more reliable a multi-agent system is. The less complex the problem, the more reliable a single-agent system is. That's my take, but I'll transition to agreeing now.

Ben: We're at a phase where we haven't necessarily built out robustness in multi-agent systems. I don't think single-agent systems will be more reliable in the long term. But today, most people are building single-agent systems that aren't doing complex message passing where you have to manage state. So, they are more reliable now. Obviously, we're seeing labs like Anthropic and others building multi-agent systems that are more reliable or can be better. But today, the vast majority of people are building simpler, single-agent systems.

Clemens: To me, the phrasing is more about the "reliable" part. It's very hard to say whether single-agent systems are better than multi-agent systems; I would probably disagree with that. On reliability, it seems like a first-principles thing. Given that agents aren't that reliable in general yet, if we add multiple unreliable things together, basic statistics says they're probably going to be less reliable. But of course, depending on your use case, it might still be more appropriate to use a multi-agent system, even if it's less reliable.
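Clemens's "basic statistics" point is easy to check on the back of an envelope: if each step in a chain succeeds independently with probability p, the whole chain succeeds with probability p^n, which falls off quickly. The independence assumption is this sketch's own; real agents often share failure modes.

```python
# If each agent/handoff succeeds with probability p, and failures are
# independent, an n-step chain succeeds with probability p ** n.
for p in (0.99, 0.95, 0.90):
    for n in (1, 3, 5, 10):
        print(f"p={p:.2f}, n={n:2d} -> chain success {p ** n:.2f}")
# e.g. 0.95 per step already drops to roughly 0.60 over 10 steps.
```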

Felix: It's like adding dimensionality to a system. When you add dimensionality, you can probably solve a problem in a more specific way, but you also introduce potential error modes that you might not see. So you don't like a single-file, 10,000-line PR? I've contributed to one of those.

The tooling around agents is much more nascent than anyone is willing to accept.

Monica: The tooling around agents is much more nascent than anyone is willing to accept.

Ben: Maybe we are all willing to accept it. I think everyone knows the tooling is very early. It is very, very early.

Felix: I don't know. It's mostly just SDKs wrapping stuff. Realistically, we've had function calling in models for a while, maybe over two years. So, yes, it's easier now. It's an easier way to implement it so you don't have to do all the post-processing, and OpenAI's Agents API allows you to loop a little better. But to me, a lot of it is just better UX. The UX is better, so people like it more, use it more, and feel more powerful because they can do more things without thinking about it. Accessibility and distribution are better. But realistically, have we made massive advancements that improve tool use? Can we now do rollouts of 10, 100, or 200 tools with perfect accuracy? A lot of this comes down to the business logic of the problem. We talked about fine-tuning and reinforcement learning; there's so much to do to make it good in an enterprise setting where most of us work. So, I feel like not much has really been done.
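The loop Felix is describing, where the model asks for a tool, the tool runs, and the result is fed back until the model answers, is roughly what most agent SDKs wrap. Here is a schematic sketch of that loop; `call_model` and the tool registry are hypothetical placeholders, not any real SDK's API.

```python
# Schematic agent loop: most agent SDKs are, at their core, a loop like this.
# `call_model` and TOOLS are stand-ins, not a real model or tool API.

TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def call_model(messages):
    """Placeholder for a chat call that may return either a tool request
    or a final answer. A real implementation would hit a model endpoint."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_weather", "args": {"city": "Paris"}}
    return {"content": "It's sunny in Paris."}

def run_agent(user_message, max_steps=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "tool" in reply:                              # model asked for a tool
            result = TOOLS[reply["tool"]](**reply["args"])
            messages.append({"role": "tool", "content": result})
            continue                                     # loop with the result
        return reply["content"]                          # final answer
    return "Gave up after max_steps."

print(run_agent("What's the weather in Paris?"))
```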

Sam: I submitted this one, and I was really thinking about all the nitty-gritty little details that are so frustrating. Someone sees a Twitter demo, but they don't realize that OpenAI messages don't have IDs on them. It's all these little annoying things you're trying to work through. For example, when you tokenize, it removes whitespace, but when you de-tokenize, it doesn't add it back. When I think about what software engineering tools that have been around for 20 years look like, I think people forget these new tools have only been around for about 12 months.

Felix: I was walking over to your team, and I was asking them how they did on the RLVI. They said, "We're convinced, but we have to do all this stuff." What is that stuff, exactly? I'm always asking for examples.

Sam: Those are the examples I was just giving. That's exactly what I meant. When you call a tool, you have these chat templates that get rid of whitespace for you in the tool call. But then when you want to put the tool call back into the rollout, the whitespace is no longer there, and now all your token IDs are offset. These tiny little things are so frustrating when you're trying to build complex systems at this level.
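The failure mode Sam describes is the kind of thing a round-trip check catches early: decode the rollout, re-tokenize it, and confirm the token sequences still line up before relying on offsets into them. The "tokenizer" below is a deliberately lossy toy, used only to show how dropped whitespace makes the sequences diverge; it is not a real chat template.

```python
# Toy round-trip check: if the decoder drops whitespace, re-tokenizing the
# rendered text no longer matches the original tokens, and every offset shifts.

def encode(text):
    return text.split(" ")      # toy tokens: whitespace-split words

def decode(tokens):
    return "".join(tokens)      # lossy on purpose: whitespace not restored

original = "call tool get_weather ( city = Paris )"
tokens = encode(original)
rendered = decode(tokens)
retokenized = encode(rendered)

if retokenized != tokens:
    print("Round-trip failed: token sequences diverge, so offsets are wrong.")
    print("original tokens:", tokens)
    print("re-tokenized:   ", retokenized)
```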

Ben: The tooling around Twitter-demo agents is pretty good. Devin is killing it. They just crossed a hundred million. But when you actually want to use these things, it's very different to put something on localhost versus on the cloud in Kubernetes, making it scalable and fault-tolerant. I think we're still in the early days. People probably understand and accept that we're still very early. The thing people aren't really talking about is the tooling we're going to have to develop around continuous learning for these agents, so they get better the more you use them. We're still in negative-one innings on that. Researchers talk about continuous learning, but it's not even talked about in the mainstream Twitter-sphere, let alone by people who are really building it.

Clemens: To make this an actual hot take, since we're all agreeing: You mentioned that many tools have been around for 20 years and are getting really good. But given the pace at which new stuff comes out, we're probably going to have early-stage tooling for the foreseeable future. We're in a new world where the velocity of software coming out is so high across the board. By the time OpenAI fixes things in their SDK, they've already come out with the next three SDKs and a new protocol. These are again cutting-edge and maybe the best, but they're still early-stage. I feel like we're in for unstable tooling for a long time.

Ben: That said, the cost of writing code is going to zero, and we're speed-running all of these things. Three, five, or ten years ago, it would have taken us a lot longer to make the tooling robust. Now, with coding agents and Claude writing 70% of the code, we're going to get to robustness a lot quicker.

Sam: Maybe that's why nothing is robust. It's written by AI.

The primary barrier to enterprise AI adoption is not technical, it's human.

Monica: The primary barrier to enterprise AI adoption is not technical, it's human.

Sam: I don't even know how to interpret that. It's a quick hot take. I think the best way to adopt AI in an enterprise setting is to find the right natural place for humans and AI to work together. If that's your belief, then by definition, being adopted by and aligned with humans is going to be your biggest barrier. But many things in enterprise technology present barriers to AI, which is why both feel like barriers to me.

Ben: The way I'm interpreting this is that we don't need to invent new technology to get adoption in enterprises.

Sam: I was just thinking about data ponds and lakes all over the place. I agree with that.

Ben: I think technology needs to get built, but not invented. Implemented, but not invented. If the models got no better starting today, we would still have a 10-year transformation in front of us. Sam Altman was talking to Garry Tan of YC the other day and said that if GPT-3 never improved, every startup in that room could still exist and do big transformations. I think that's true. It's human in the sense that you need to implement it, do the change management, write the basic software that handles data pipelines, security, and all of that, and iterate on the UX. Those are the human things. They need to build the technology. The reason enterprises haven't adopted AI is not because the model isn't good enough. Maybe GPT-3 wasn't good enough, but since GPT-4, the model has been good enough to do a lot of things. Obviously, it's getting better, and reasoning models have really pushed that boundary out. At this point, it's a time problem. Maybe it's not a human problem, but a time problem. It just needs to get implemented. It goes back to the UX piece.

Felix: This goes back to why I think coding won't get replaced by AI. One of the reasons we aren't seeing super successful products today is because it takes a lot of people to design every step. How will these data pieces come together? What is the product aspect? What impact will it have? Think about some of the massive accounts we have. There are so many moving pieces and different things you have to connect in the right way, and then you have to put it into a product that presents a solution. Forty different things could go wrong if you make one wrong decision. This is why I think it's fundamentally a human barrier. Humans have to get together and navigate away from the bad paths and toward the good ones. Maybe only one out of 50 paths is viable and will make a difference. To get a whole village to believe in you and massage it that way without knowing the path, I think is a fundamentally human problem. It's a political problem, a way-of-thinking problem, a personality problem. It's not about tech.

Clemens: The majority of humans just don't like change. There are so many barriers to doing all of this, but I would definitely agree with the take that the primary barrier for many things is going to be human because it's just rapid change, and change is not something people like.

Ben: So what I'm hearing is that management consulting's not going away. Congratulations. Alright.

Personal Hot Takes Rapidfire

Monica: Finishing off the episode, give us your biggest hot take.

Felix: Alright. My hot take is that almost all entry-level jobs have a negative expected value right now. Let me add something to that: I think the only reason to have an entry-level person is to train them to become a senior-level person.

Sam: Alright. I feel like I'm going to get in trouble for this, but I live in New York, and New York's great, but San Francisco is definitely the best city in the US. If you disagree, you don't know what you're doing.

Clemens: A semi-hot take: this NBA season was not that great.

Ben: My hot take, which I don't think is really a hot take, is that watermelon is vastly overrated. It's not close to a top-tier fruit. Watermelon is just terrible. Terrible.

Felix: That invalidates everything you just said.

Monica: Thanks for listening to Human in the Loop.

