AI agents. Worth the frothy hype?

Everybody is talking about agents. Here’s what you need to know to have a reasonable handle on what this technology really is and what it means for the present and future of business.


It seems like you can't throw a rock nowadays without hitting someone talking about how agents are going to solve all of our problems. But what is an agent, really? How might you start looking at your workflows through an agentic lens? And what other nifty words can you string together at after-work meetups to impress your colleagues?

Framing the problem

When it comes to reliability, consistency, accuracy, and predictability, LLMs kind of suck. It's the main reason everyone's high hopes and wide-eyed predictions haven't come to pass and why many businesses seem to be frustrated at their inability to quickly implement and profit from AI. The core issue is that the amazing power of LLMs is also their weakness. They contain nearly all of the world's knowledge, yet it's only accessible through the terribly imprecise and ambiguous interface of human language. In other words, prompting. The most advanced applications in the world at some point need to submit human language to an LLM and cross their fingers that the LLM will be able to return a decent output.

The second problem is a lack of capabilities. An LLM is just a language model. In fact, it's so good at language tasks that it makes us assume it would be capable in other ways. However, LLMs are notoriously bad at math and logic. They also don't "know" anything outside of their training data (like what happened this morning in the news, or recent stock market movements, or proprietary info like the contents of your CRM). So they are incredibly smart in some ways and frustratingly dumb in others.

To compensate for this lack of precision, ability, access, and controllability, clever software engineers have come up with programming techniques that wrap around the AI to augment and utilize the LLMs' strengths.

One of these techniques, used to great effect, is RAG, or Retrieval-Augmented Generation, which lets you search for and send fragments of your private knowledge to the LLM for synthesis. It's great, but it also suffers from a handful of problems that are easy to look up (just ask ChatGPT). I'll talk more about RAG in another article.

Agents are another architectural technique with incredible promise. An agent is simply a software architecture. It's a way to organize code that follows a basic pattern, with infinite variations depending on what you want to do. It's like the idea of a cake. A cake has certain properties and processes that make all cakes the same, but there is an infinite number of cakes you can make by adjusting the recipe, ingredients, and baking technique.

Definition of an agent

The term "agent" is loosely defined right now. But I like Anthropic's definition, which basically says that an agent has some level of autonomy to make its own decisions.

Agents are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks

This would exclude systems that simply follow a predefined workflow. It doesn't mean an agent can't include a workflow; it just means it isn't constructed solely of one.

Workflow

A basic but powerful aspect of agents is that they are usually constrained to a workflow. In other words, we predefine the steps of a process rather than leaving it all up to the LLM. A simple workflow might just be a straight line (do step 1, then step 2, then step 3). But a complex workflow might have conditional logic and might depend on output from tools or other agents along the way. This combination of a well-defined workflow coupled with the ability for the LLM to make its own decisions along the way is what has everyone salivating over the possibilities.
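The pattern above can be sketched in a few lines of Python. Everything here is illustrative: `call_llm()` is a hypothetical stand-in for a real LLM client, stubbed out so the sketch runs on its own. The point is that the conditional logic lives in ordinary code, not in the model.

```python
def call_llm(prompt: str) -> str:
    """Stub for a real LLM API call; returns a canned response."""
    return f"[LLM response to: {prompt[:40]}]"

def run_workflow(request: str) -> str:
    # Step 1: ask for a plan (always runs).
    plan = call_llm(f"Make a plan for: {request}")
    # Step 2: conditional branching is handled by code, not the LLM.
    if "itinerary" in request.lower():
        draft = call_llm(f"Follow this plan to build an itinerary: {plan}")
    else:
        draft = call_llm(f"Follow this plan: {plan}")
    # Step 3: polish the draft (always runs).
    return call_llm(f"Polish this draft: {draft}")
```

In a real system, each step would be a network call to a model provider, and the branch condition might itself come from a tool or a classifier.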

In general, agents tend to have three phases: Planning, Execution, and Reflection. The agency of an agent is that it can do its own planning, it can decide how to execute, and it can measure its own success by reflecting on its output. It's this autonomy that makes agents both powerful and frail.

Planning

A simple agent

Let's look at a simple agent to get an understanding of how they work.

The simplest agent has two phases: a planning phase and an execution phase. When the user submits a request, the LLM deconstructs it and makes a plan. Let's say your request is:

I want to go to Paris for a week. Create an itinerary for me.

If I send that request directly to the LLM I'll get a long response with a seven day plan filled with typical tourist destinations.

Not bad. But it's making a lot of assumptions and it's not very precise. Like, what is the weather currently like and how should I dress? How would this change if I were a vegetarian? What are the budgetary concerns? How much would a flight cost? I could keep working on my prompt, but at the end of the session, I probably wouldn't have a very actionable itinerary.

Let's try this again, but instead of asking the LLM for a response, let's ask it to make a plan to make a response. Here's my modified prompt:

Here is a request:
'''
I want to go to Paris for a week. Create an itinerary.
'''
Create a plan for how you would accomplish this request. Include questions you would need to answer to complete the request successfully.

This time, the output will be much more thoughtful. Instead of trying to create the itinerary directly, it would spit out the process it would use to create the itinerary. What we're doing here is asking the LLM to create its own chain-of-thought plan.

Asking the LLM to create a plan makes it think more realistically about the problem. This is one of the key aspects of what makes an agent powerful. In this simple example we aren't giving the AI much help in creating the plan. But imagine if we gave the AI some expert knowledge from our organizations to help with planning any given task.
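Mechanically, the plan-first trick is just string assembly: wrap the raw request in a prompt that asks for a plan instead of an answer. The function name `build_planning_prompt` is my own invention; the prompt text is the one from the example above.

```python
def build_planning_prompt(request: str) -> str:
    """Wrap a raw user request in a plan-first prompt (sketch)."""
    return (
        "Here is a request:\n"
        f"'''\n{request}\n'''\n"
        "Create a plan for how you would accomplish this request. "
        "Include questions you would need to answer to complete "
        "the request successfully."
    )
```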

Execution

Now we can ask the LLM to follow its own plan and ask us a bunch of questions about our preferences and desires, which it can combine with its general knowledge to create an amazing itinerary for us.

Tool use

In the simple form above, an agent isn't much more than a fancy prompting technique. The power really opens up when we include tools. To an agent, a tool is a black box that takes input in some form and promises to return output in some form. A tool can be anything! Some examples:

  • A search engine that takes a query and returns results
  • A calculator that takes an expression and returns an answer
  • An API that accepts an API request and returns a response
  • A RAG system that takes a question and returns an answer with references
  • Another agent that takes a command and returns an answer

When we decide to use tools, we give the agent a menu of tools it can use along with a description of what each tool does. Now when the agent makes its plan, it knows it has these capabilities available to it at execution time and can plan accordingly.
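One simple way to represent that menu, as a sketch: each tool is a plain function paired with a description that gets pasted into the prompt. The tool names and canned outputs here are invented for illustration.

```python
def check_weather(city: str) -> str:
    """Hypothetical weather tool (canned output for the sketch)."""
    return f"Historical weather for {city}: mild, occasional rain."

def search_flights(route: str) -> str:
    """Hypothetical flight-search tool (canned output for the sketch)."""
    return f"Flights for {route}: from $450 round trip."

# The menu: tool name -> (callable, description the LLM can read).
TOOLS = {
    "weather": (check_weather, "Check historical weather patterns for a city"),
    "flights": (search_flights, "Check airfare schedules and prices"),
}

def tool_menu() -> str:
    """Render the menu that gets pasted into the planning prompt."""
    return "\n".join(f"{name}: {desc}" for name, (_, desc) in TOOLS.items())
```

Real agent frameworks formalize this same idea with schemas (parameter types, required fields) so the model knows exactly how to call each tool.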

Let's update our trip-planning agent with some tools: a weather app, a search engine, and TripAdvisor's API for flight pricing, hotel pricing, and restaurant recommendations.

Here is a request:
'''
I want to go to Paris for a week. Create an itinerary.
'''
Create a plan for how you would accomplish this request. Include questions you would need to answer to complete the request successfully.

You have the following tools at your disposal:

Weather: A tool to check historical weather patterns
Flights: A tool to check airfare schedules and prices
Hotels: A tool to check hotel availability and prices
etc.

Now when the agent executes on its plan, it can actually look up the expected weather, search for current art exhibitions, and find up-to-date flight and hotel pricing. An even more advanced agent might be able to reserve or purchase those tickets and hotels for you.
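At execution time, the program has to translate the model's "use tool X with input Y" request into an actual function call. Here's a minimal dispatch sketch, assuming the model returns a small JSON object — a convention I'm inventing for this example; real frameworks each define their own tool-call format.

```python
import json

def check_weather(city: str) -> str:
    """Hypothetical weather tool (canned output for the sketch)."""
    return f"Forecast for {city}: 18C, light rain."

TOOLS = {"weather": check_weather}

def dispatch(llm_output: str) -> str:
    """Parse the model's tool request and run the matching function."""
    request = json.loads(llm_output)  # e.g. {"tool": "weather", "input": "Paris"}
    tool = TOOLS[request["tool"]]
    return tool(request["input"])
```

In production you would also handle unknown tool names and malformed JSON, since the model can (and will) occasionally produce both.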

Human in the loop

You can see in our last example how the agent starts to have a lot of autonomy. But just like a real (human) agent, we would probably want it to check in with us on the big decisions. Agents can be programmed to have a human validate or accept certain actions. This can happen at the beginning (signing off on the plan), along the workflow (approving intermediate decisions), or at the end (giving sign-off before the purchase).
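A sketch of such a gate: the `approve` callback stands in for whatever real channel (a UI prompt, a Slack message) collects the human's decision.

```python
def run_with_approval(plan: str, approve) -> str:
    """Execute a plan only after a human signs off on it (sketch)."""
    if not approve(f"Proposed plan:\n{plan}\nApprove?"):
        return "Aborted: plan rejected by human reviewer."
    # In a real agent, this is where tool calls and purchases happen.
    return f"Executing approved plan: {plan}"
```

The same gate can be dropped anywhere in the workflow — before the plan, before a purchase, or before any irreversible action.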

Composing agents

I mentioned earlier that an agent can use another agent as one of its tools. It doesn't take much imagination to see how powerful this could be. Almost any business task you could imagine could be performed in an agentic way if it were properly decomposed into agents, sub-agents, and tools. This is why people are excited. We've seen that LLMs can write code, create UX designs, analyze data, translate languages, do reporting, etc. If we could build agents for all of our small tasks and chain them together to accomplish large tasks, the only thing left to do would be to sit on the beach, finish our trashy novel, and order another Mai Tai.
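Composition can be surprisingly plain in code: a sub-agent is just a callable that a parent agent treats as one of its tools. The agents here are stubs invented for illustration.

```python
def research_agent(task: str) -> str:
    """Hypothetical sub-agent: gathers background material."""
    return f"Research notes on: {task}"

def writing_agent(task: str, tools) -> str:
    """Parent agent: delegates research to a sub-agent, then writes."""
    notes = tools["research"](task)  # the sub-agent looks like any other tool
    return f"Report based on [{notes}]"

result = writing_agent("Paris travel trends", {"research": research_agent})
```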

Reflection

Here's a dirty little secret that every AI developer knows very well: LLMs are not very good at reasoning. They seem like they can reason because they usually offer responses that make sense, but when pressed to do true logical thinking, they usually perform poorly.

One way to improve their output is to use reflection. This technique asks the LLM to look at its own output to see if it actually fulfills its assignment. If it doesn't, try again (and again). There's more nuance to it, but that's basically it—a brute force approach to answering a question. We can add a reflection step anywhere along the workflow to ensure that the agent is doing its job. It would look something like this:

Once you have created the itinerary, review it to ensure it meets the requested goals. If not, update the itinerary.

The agent would literally evaluate the answer with regard to the original question (which it likely forgot in the ensuing machinations) and ask itself... did I answer the question well?
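A reflection loop might look like the sketch below, again with a hypothetical `call_llm()` stub. The stub always "passes" its own review so the example terminates; a real model would sometimes fail the review and trigger a revision.

```python
def call_llm(prompt: str) -> str:
    """Stub for a real LLM call; always 'passes' review here."""
    if prompt.startswith("Review"):
        return "PASS"
    return f"[draft for: {prompt}]"

def answer_with_reflection(request: str, max_tries: int = 3) -> str:
    """Draft, self-review against the original request, retry on failure."""
    draft = call_llm(request)
    for _ in range(max_tries):
        verdict = call_llm(f"Review this draft against the request "
                           f"'{request}'. Reply PASS or FAIL: {draft}")
        if verdict == "PASS":
            return draft
        draft = call_llm(f"Revise: {draft}")  # try again
    return draft  # give up and return the best attempt
```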

Weaknesses

As with everything AI, however, the hype is way ahead of reality. Agentic workflows are real. They are being implemented successfully as we speak (including at Machine & Partners), but they are far from realizing the dream (or nightmare) of complete automation. The main problem with agents, especially regarding practical implementation, is their complexity. Performing all these actions requires a lot of LLM calls, and that means a lot can go wrong.

If something goes haywire in the workflow, the whole job is broken, just like if a conveyor belt breaks in the middle of a mayonnaise factory. It's a mess. LLMs are still relatively unpredictable compared with the traditional automation tools we have grown used to. Even the idea of automation implies a perfect result every time, and that's just not how agents perform today. They tend to do well when the workflow is very clear, the decision-making criteria are very well-defined, and the reasoning is performed largely by programming logic rather than the LLM.

Emerging reasoning models like OpenAI's o1 and o3 have actually been trained to, well, reason better. But they still don't do actual reasoning. They use a combination of training examples, reflection, and iteration (basically multiple tries at the problem) to increase their performance. Despite all the hype out there, we have to learn to accept that language models are never going to deliver bullet-proof "thinking." In fact, recent studies suggest that truly reliable logical reasoning may be out of reach given their architecture. The question everyone in the AI world is investigating right now is: How good is good enough?

Does this mean agents are bullshit? No! Agents are a powerful architectural concept with a lot of knobs and dials you can twiddle to dial in their performance. But they aren't a magic bullet, they don't "just work," and they aren't easy to implement for non-trivial tasks if you want very high quality output.

In Summary

This has been a high-level overview of the basic workings of an agent. In practice, agents can be incredibly complicated, with lots of variation in how workflows and processing are directed. What I hope you take away from this article is that agents aren't magic.

By combining what LLMs do really well (interpret language, perform simple reasoning), with what traditional programming does really well (execute well-defined logic), we can build on the strengths and overcome the weaknesses of both.

Agents are really cool. But don't get misled by the YouTubers who build some "powerful" agent in 30 minutes that can replace your marketing department. Likewise, all those people who are talking about the possibilities of agents are doing just that—talking. There are without a doubt places in your business where an agent could provide heretofore unachievable efficiencies. But as the team at Machine & Partners knows well, designing, building, and validating that agent won't be a cakewalk.

about the author

Ed is a partner at Machine & Partners. He spends way too much of his free time trying to keep up with the news and advancements in AI. The rest of the time he's playing tennis, driving his teenage daughter around, or cooking with his therapist wife.
