AI agents. Worth the frothy hype?

Everybody is talking about agents. Here’s what you need to know to have a reasonable handle on what this technology really is and what it means for the present and future of business.


It seems like you can't throw a rock nowadays without hitting someone talking about how agents are going to solve all of our problems. But what is an agent, really? How might you start looking at your workflows through an agentic lens? And what other nifty words can you string together at after-work meetups to impress your colleagues?

Framing the problem

When it comes to reliability, consistency, accuracy, and predictability, LLMs kind of suck. It's the main reason everyone's high hopes and wide-eyed predictions haven't come to pass and why many businesses seem to be frustrated at their inability to quickly implement and profit from AI. The core issue is that the amazing power of LLMs is also their weakness. They contain nearly all of the world's knowledge, yet it's only accessible through the terribly imprecise and ambiguous interface of human language. In other words, prompting. The most advanced applications in the world at some point need to submit human language to an LLM and cross their fingers that the LLM will be able to return a decent output.

The second problem is a lack of capabilities. An LLM is just a language model. In fact, it's so good at language tasks that it makes us assume it would be capable in other ways. However, LLMs are notoriously bad at math and logic. They also don't "know" anything outside of their training data (like what happened this morning in the news, or recent stock market movements, or proprietary info like the contents of your CRM). So they are incredibly smart in some ways and frustratingly dumb in others.

To compensate for this lack of precision, ability, access, and controllability, clever software engineers have come up with programming techniques that wrap around the AI to augment and utilize the LLMs' strengths.

One of these techniques, used to great effect, is RAG, or Retrieval-Augmented Generation, which lets you search for and send fragments of your private knowledge to the LLM for synthesis. It's great, but it also suffers from a handful of problems that are easy to look up (just ask ChatGPT). I'll talk more about RAG in another article.

Agents are another architectural technique with incredible promise. An agent is simply a software architecture. It's a way to organize code that follows a basic pattern, with infinite variations depending on what you want to do. It's like the idea of a cake. A cake has certain properties and processes that make all cakes the same, but there is an infinite number of cakes you can make by adjusting the recipe, ingredients, and baking technique.

Definition of an agent

The term "agent" is loosely defined right now. But I like Anthropic's definition, which basically says that an agent has some level of autonomy to make its own decisions.

Agents are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks

This would exclude systems that simply follow a predefined workflow. It doesn't mean an agent can't include a workflow; it just means it isn't constructed solely of one.

Workflow

A basic but powerful aspect of agents is that they are usually constrained to a workflow. In other words, we predefine the steps of a process rather than leaving it all up to the LLM. A simple workflow might just be a straight line (do step 1, then step 2, then step 3). But a complex workflow might have conditional logic and might depend on output from tools or other agents along the way. This combination of a well-defined workflow coupled with the ability for the LLM to make its own decisions along the way is what has everyone salivating over the possibilities.
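The pattern above can be sketched in a few lines of Python. Everything here is illustrative: `call_llm()` is a hypothetical stand-in for a real LLM client, stubbed out so the sketch runs on its own. The point is that the conditional logic lives in ordinary code, not in the model.

```python
def call_llm(prompt: str) -> str:
    """Stub for a real LLM API call; returns a canned response."""
    return f"[LLM response to: {prompt[:40]}]"

def run_workflow(request: str) -> str:
    # Step 1: ask for a plan (always runs).
    plan = call_llm(f"Make a plan for: {request}")
    # Step 2: conditional branching is handled by code, not the LLM.
    if "itinerary" in request.lower():
        draft = call_llm(f"Follow this plan to build an itinerary: {plan}")
    else:
        draft = call_llm(f"Follow this plan: {plan}")
    # Step 3: polish the draft (always runs).
    return call_llm(f"Polish this draft: {draft}")
```

In a real system, each step would be a network call to a model provider, and the branch condition might itself come from a tool or a classifier.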

In general, agents tend to have three phases: Planning, Execution, and Reflection. The agency of an agent is that it can do its own planning, it can decide how to execute, and it can measure its own success by reflecting on its output. It's this autonomy that makes agents both powerful and frail.

Planning

A simple agent

Let's look at a simple agent to get an understanding of how they work.

The simplest agent has two phases: a planning phase and an execution phase. When the user submits a request, the LLM deconstructs it and makes a plan. Let's say your request is:

I want to go to Paris for a week. Create an itinerary for me.

If I send that request directly to the LLM I'll get a long response with a seven day plan filled with typical tourist destinations.

Not bad. But it's making a lot of assumptions and it's not very precise. Like, what is the weather currently like and how should I dress? How would this change if I were a vegetarian? What are the budgetary concerns? How much would a flight cost? I could keep working on my prompt, but at the end of the session, I probably wouldn't have a very actionable itinerary.

Let's try this again, but instead of asking the LLM for a response, let's ask it to make a plan to make a response. Here's my modified prompt:

Here is a request:
'''
I want to go to Paris for a week. Create an itinerary.
'''
Create a plan for how you would accomplish this request. Include questions you would need to answer to complete the request successfully.

This time, the output will be much more thoughtful. Instead of trying to create the itinerary directly, it would spit out the process it would use to create the itinerary. What we're doing here is asking the LLM to create its own chain-of-thought plan.

Asking the LLM to create a plan makes it think more realistically about the problem. This is one of the key aspects of what makes an agent powerful. In this simple example we aren't giving the AI much help in creating the plan. But imagine if we gave the AI some expert knowledge from our organizations to help with planning any given task.
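Mechanically, the plan-first trick is just string assembly: wrap the raw request in a prompt that asks for a plan instead of an answer. The function name `build_planning_prompt` is my own invention; the prompt text is the one from the example above.

```python
def build_planning_prompt(request: str) -> str:
    """Wrap a raw user request in a plan-first prompt (sketch)."""
    return (
        "Here is a request:\n"
        f"'''\n{request}\n'''\n"
        "Create a plan for how you would accomplish this request. "
        "Include questions you would need to answer to complete "
        "the request successfully."
    )
```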

Execution

Now we can ask the LLM to follow its own plan and ask us a bunch of questions about our preferences and desires, which it can combine with its general knowledge to create an amazing itinerary for us.

Tool use

In the simple form above, an agent isn't much more than a fancy prompting technique. The power really opens up when we include tools. To an agent, a tool is a black box that takes input in some form and promises to return output in some form. A tool can be anything! Some examples:

  • A search engine that takes a query and returns results
  • A calculator that takes an expression and returns an answer
  • An API that accepts an API request and returns a response
  • A RAG system that takes a question and returns an answer with references
  • Another agent that takes a command and returns an answer

When we decide to use tools, we give the agent a menu of tools it can use along with a description of what each tool does. Now when the agent makes its plan, it knows it has these capabilities available to it at execution time and can plan accordingly.
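One simple way to represent that menu, as a sketch: each tool is a plain function paired with a description that gets pasted into the prompt. The tool names and canned outputs here are invented for illustration.

```python
def check_weather(city: str) -> str:
    """Hypothetical weather tool (canned output for the sketch)."""
    return f"Historical weather for {city}: mild, occasional rain."

def search_flights(route: str) -> str:
    """Hypothetical flight-search tool (canned output for the sketch)."""
    return f"Flights for {route}: from $450 round trip."

# The menu: tool name -> (callable, description the LLM can read).
TOOLS = {
    "weather": (check_weather, "Check historical weather patterns for a city"),
    "flights": (search_flights, "Check airfare schedules and prices"),
}

def tool_menu() -> str:
    """Render the menu that gets pasted into the planning prompt."""
    return "\n".join(f"{name}: {desc}" for name, (_, desc) in TOOLS.items())
```

Real agent frameworks formalize this same idea with schemas (parameter types, required fields) so the model knows exactly how to call each tool.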

Let's update our trip-planning agent with some tools: a weather app, a search engine, and TripAdvisor's API for flight pricing, hotel pricing, and restaurant recommendations.

Here is a request:
'''
I want to go to Paris for a week. Create an itinerary.
'''
Create a plan for how you would accomplish this request. Include questions you would need to answer to complete the request successfully.

You have the following tools at your disposal:

Weather: A tool to check historical weather patterns
Flights: A tool to check airfare schedules and prices
Hotels: A tool to check hotel availability and prices
etc.

Now when the agent executes on its plan, it can actually look up the expected weather, search for current art exhibitions, and find up-to-date flight and hotel pricing. An even more advanced agent might be able to reserve or purchase those tickets and hotels for you.
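At execution time, the program has to translate the model's "use tool X with input Y" request into an actual function call. Here's a minimal dispatch sketch, assuming the model returns a small JSON object — a convention I'm inventing for this example; real frameworks each define their own tool-call format.

```python
import json

def check_weather(city: str) -> str:
    """Hypothetical weather tool (canned output for the sketch)."""
    return f"Forecast for {city}: 18C, light rain."

TOOLS = {"weather": check_weather}

def dispatch(llm_output: str) -> str:
    """Parse the model's tool request and run the matching function."""
    request = json.loads(llm_output)  # e.g. {"tool": "weather", "input": "Paris"}
    tool = TOOLS[request["tool"]]
    return tool(request["input"])
```

In production you would also handle unknown tool names and malformed JSON, since the model can (and will) occasionally produce both.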

Human in the loop

You can see in our last example how the agent starts to have a lot of autonomy. But just like a real (human) agent, we would probably want it to check in with us on the big decisions. Agents can be programmed to have a human validate or accept certain actions. This can happen at the beginning (signing off on the plan), along the workflow (approving intermediate decisions), or at the end (giving sign-off before the purchase).
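A sketch of such a gate: the `approve` callback stands in for whatever real channel (a UI prompt, a Slack message) collects the human's decision.

```python
def run_with_approval(plan: str, approve) -> str:
    """Execute a plan only after a human signs off on it (sketch)."""
    if not approve(f"Proposed plan:\n{plan}\nApprove?"):
        return "Aborted: plan rejected by human reviewer."
    # In a real agent, this is where tool calls and purchases happen.
    return f"Executing approved plan: {plan}"
```

The same gate can be dropped anywhere in the workflow — before the plan, before a purchase, or before any irreversible action.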

Composing agents

I mentioned earlier that an agent can use another agent as one of its tools. It doesn't take much imagination to see how powerful this could be. Almost any business task you could imagine could be performed in an agentic way if it were properly decomposed into agents, sub-agents, and tools. This is why people are excited. We've seen that LLMs can write code, create UX designs, analyze data, translate languages, do reporting, etc. If we could build agents for all of our small tasks and chain them together to accomplish large tasks, the only thing left to do would be to sit on the beach, finish our trashy novel, and order another Mai Tai.
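Composition can be surprisingly plain in code: a sub-agent is just a callable that a parent agent treats as one of its tools. The agents here are stubs invented for illustration.

```python
def research_agent(task: str) -> str:
    """Hypothetical sub-agent: gathers background material."""
    return f"Research notes on: {task}"

def writing_agent(task: str, tools) -> str:
    """Parent agent: delegates research to a sub-agent, then writes."""
    notes = tools["research"](task)  # the sub-agent looks like any other tool
    return f"Report based on [{notes}]"

result = writing_agent("Paris travel trends", {"research": research_agent})
```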

Reflection

Here's a dirty little secret that every AI developer knows very well: LLMs are not very good at reasoning. They seem like they can reason because they usually offer responses that make sense, but when pressed to do true logical thinking, they usually perform poorly.

One way to improve their output is to use reflection. This technique asks the LLM to look at its own output to see if it actually fulfills its assignment. If it doesn't, try again (and again). There's more nuance to it, but that's basically it—a brute force approach to answering a question. We can add a reflection step anywhere along the workflow to ensure that the agent is doing its job. It would look something like this:

Once you have created the itinerary, review it to ensure it meets the requested goals. If not, update the itinerary.

The agent would literally evaluate the answer with regard to the original question (which it likely forgot in the ensuing machinations) and ask itself... did I answer the question well?
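A reflection loop might look like the sketch below, again with a hypothetical `call_llm()` stub. The stub always "passes" its own review so the example terminates; a real model would sometimes fail the review and trigger a revision.

```python
def call_llm(prompt: str) -> str:
    """Stub for a real LLM call; always 'passes' review here."""
    if prompt.startswith("Review"):
        return "PASS"
    return f"[draft for: {prompt}]"

def answer_with_reflection(request: str, max_tries: int = 3) -> str:
    """Draft, self-review against the original request, retry on failure."""
    draft = call_llm(request)
    for _ in range(max_tries):
        verdict = call_llm(f"Review this draft against the request "
                           f"'{request}'. Reply PASS or FAIL: {draft}")
        if verdict == "PASS":
            return draft
        draft = call_llm(f"Revise: {draft}")  # try again
    return draft  # give up and return the best attempt
```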

Weaknesses

As with everything AI, however, the hype is way ahead of reality. Agentic workflows are real. They are being implemented successfully as we speak (including at Machine & Partners), but they are far from realizing the dream (or nightmare) of complete automation. The main problem with agents, especially regarding practical implementation, is their complexity. Performing all these actions requires a lot of LLM calls, and that means a lot can go wrong.

If something goes haywire in the workflow, the whole job is broken, just like if a conveyor belt breaks in the middle of a mayonnaise factory. It's a mess. LLMs are still relatively unpredictable compared with the traditional automation tools we have grown used to. Even the idea of automation implies a perfect result every time, and that's just not how agents perform today. They tend to do well when the workflow is very clear, the decision-making criteria are very well-defined, and the reasoning is performed largely by programming logic rather than the LLM.

Emerging reasoning models like OpenAI's o1 and o3 have actually been trained to, well, reason better. But they still don't do actual reasoning. They use a combination of training examples, reflection, and iteration (basically multiple tries at the problem) to increase their performance. Despite all the hype out there, we have to learn to accept that language models are never going to deliver bullet-proof "thinking." In fact, recent studies suggest that truly reliable logical reasoning may be out of reach given their architecture. The question everyone in the AI world is investigating right now is: How good is good enough?

Does this mean agents are bullshit? No! Agents are a powerful architectural concept with a lot of knobs and dials you can twiddle to dial in their performance. But they aren't a magic bullet, they don't "just work," and they aren't easy to implement for non-trivial tasks if you want very high quality output.

In Summary

This has been a high-level overview of the basic workings of an agent. In practice, agents can be incredibly complicated, with lots of variation in how workflows and processing are directed. What I hope you take away from this article is that agents aren't magic.

By combining what LLMs do really well (interpret language, perform simple reasoning), with what traditional programming does really well (execute well-defined logic), we can build on the strengths and overcome the weaknesses of both.

Agents are really cool. But don't get misled by the YouTubers who build some "powerful" agent in 30 minutes that can replace your marketing department. Likewise, all those people who are talking about the possibilities of agents are doing just that—talking. There are without a doubt places in your business where an agent could provide heretofore unachievable efficiencies. But as the team at Machine & Partners knows well, designing, building, and validating that agent won't be a cakewalk.

about the author

Ed is a partner at Machine & Partners. He spends way too much of his free time trying to keep up with the news and advancements in AI. The rest of the time he's playing tennis, driving his teenage daughter around, or cooking with his therapist wife.
