Understanding AI Agents: A Comprehensive Guide for Busy Individuals
Separating the Hype from Reality
As we welcome Spring 2024, the landscape of AI agent frameworks is flourishing, each one asserting its potential to “transform everything” more than its predecessor. It's nearly impossible to scroll through your feed without encountering a dazzling demo from a GitHub repository, each boasting thousands of stars seemingly overnight.
These repositories claim that anyone, even those who still rely on Internet Explorer, can now create a complete application from a single input. Interestingly, many of these demonstrations tend to revolve around variations of the classic snake game.
Conversely, some individuals possess "exclusive" insights or information they claim they "can't disclose yet," hinting that soon, [insert-any-industry-here] will undergo a monumental shift. Yet, they remain tight-lipped about the specifics.
While such excitement is common in the AI realm, not all AI-related news receives equal attention. Truly captivating announcements are rare, often limited to new model launches from major players like OpenAI or Anthropic. The emergence of AutoGPT also demonstrated that AI agents can capture public interest just as strongly.
Unlike the generally favorable reception of OpenAI’s model launches, AI agents generate a mixed response. People tend to split into two camps: one group fears being replaced by advanced AI, envisioning a future dominated by machines, while the other group believes AI agents will enhance their productivity and lead to lucrative entrepreneurial opportunities. Surprisingly, both perspectives often stem from a tendency to overestimate the capabilities of AI agents.
Conversely, there are those who disregard AI agents altogether, perceiving them as no different from interacting with ChatGPT or utilizing an application based on LangChain with basic retrieval-augmented generation (RAG). For this group, the buzz surrounding AI is merely a creation of opportunistic influencers and corporations.
So, who is correct? What are AI agents, and what can they truly accomplish? This article aims to answer these questions. Don't worry if you're not a machine learning specialist; this will be a gentle introduction to AI agents for anyone interested.
The Roots of Rationality
At its essence, an agent is simply a term for anything capable of taking action, whether that be a human, an animal, or even a machine.
The concept of intelligent agents is not new; philosophers and scientists have debated it for centuries, long before modern examples like self-driving vehicles made it tangible.
The origins of this idea can be traced back to ancient Greece and the philosopher Aristotle. While some of his views may have been questionable—such as the notion that men have more teeth than women and that more teeth correlate with longevity—his reflections on what it means to “act” or “achieve” goals were significantly more insightful: “We deliberate not about ends but about means.” This underscores that rationality involves determining the best approach to achieve our objectives rather than merely selecting the goals themselves.
This foundational idea ignited centuries of discourse on rationality and set the stage for the AI agents we recognize today.
Roughly a millennium later, in the 9th century, Al-Khwarizmi advanced the idea of step-by-step problem solving, earning the title of the “father of algebra.” In the 12th century, a translator rendering his work, On the Calculation with Hindu Numerals, into Latin inadvertently transformed his name into “Algoritmi,” giving us the term algorithm we use today.
Jumping ahead to the 13th century, a Spanish philosopher named Ramon Llull had a visionary idea. He created a “machine” with rotating paper wheels inscribed with symbols, aiming to represent fundamental “truths” or “laws” of existence. He is often regarded as a pioneer of computer science and theory.
Though Llull’s invention didn’t gain traction, it planted the seeds of “computation.”
Fast forward to the 1600s, when the mathematician Blaise Pascal developed one of the first mechanical calculators. This innovation allowed a machine to perform arithmetic mechanically—a pivotal moment in the evolution of intelligent machines. To clarify, when I say “calculate,” I mean this device could only add and subtract, which was impressive at the time.
The next significant advancement occurred in the 1800s with Ada Lovelace, a mathematical genius who recognized the true potential of computing. She authored the first computer program for Charles Babbage’s Analytical Engine, a steam-powered computer far ahead of its era. Although the Engine never fully reached its capabilities, Lovelace's foresight regarding machines capable of managing complex tasks laid the groundwork for the AI revolution.
The Rise of AI
The 1950s heralded the advent of artificial intelligence as we know it. In 1950, Alan Turing, a foundational figure in computer science, published a pivotal paper that posed the question: “Can machines think?” Many believed the answer was a resounding NO. To counter this skepticism, Turing introduced an imitation test (now recognized as the Turing Test) whereby a machine attempts to deceive a human into believing it is also human through ordinary conversation.
Not long after, a group of scientists convened at Dartmouth College for a groundbreaking workshop that would alter the course of history. Their mission was to create machines that could think like humans. This legendary gathering, spearheaded by computer scientist John McCarthy, laid the foundation for the field of AI, with the belief that they only needed two months and ten men to build a “smart” machine.
“We propose that a 2-month, 10-man study of artificial intelligence…”
“…An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.”
It was during this period that the concept of “symbolic AI” emerged.
Symbolic AI focused on representing knowledge through abstract symbols and manipulating these according to strict rules, akin to an advanced version of Aristotle’s logic. McCarthy and his colleagues believed that by combining enough of these symbols and rules, they could create machines capable of reasoning, planning, and problem-solving like humans.
This approach yielded some remarkable systems in the 1960s and 1970s, such as DENDRAL, which could identify unknown chemical compounds from mass-spectrometry data, and MYCIN, which diagnosed bacterial infections and recommended treatments.
However, symbolic AI soon encountered significant challenges. The real world is complex and chaotic, often defying neatly defined logical rules. Imagine attempting to catalog every rule for making a sandwich! As symbolic AI aimed to tackle more ambitious challenges, its limitations became increasingly evident.
In the late 1960s and early 1970s, the field entered a phase known as the “first AI winter.” Funding dwindled, progress slowed, and faith in the lofty promises of human-like AI diminished. It became clear that symbolic logic alone was insufficient—the world demanded a fresh approach.
Embracing Uncertainty
As researchers recognized the constraints of symbolic AI in the 1970s, they began exploring new methods to address the uncertainty and complexity of reality. Two significant concepts emerged during this time: the application of probability and the rise of machine learning.
Starting with probability, the 1980s saw the introduction of Bayesian networks, which enabled AI systems to “reason” about uncertainty using probabilistic language. Rather than depending solely on rigid logical rules, these networks could learn from data and make informed guesses in the face of incomplete information.
Simultaneously, machine learning was experiencing a revival. In the 1980s, a novel training technique called backpropagation reinvigorated neural networks, allowing them to learn intricate patterns from data.
This pivot toward probabilistic and learning-oriented approaches revolutionized AI agents. Instead of merely reasoning with abstract symbols, agents could now learn from experience and adapt to new circumstances. It represented a shift from a rigid instruction manual to a dynamic, evolving comprehension of the world.
This new paradigm spurred advancements in two key areas of machine learning: reinforcement learning and deep learning. Reinforcement learning focuses on teaching agents to make intelligent choices through trial and error, akin to training a puppy with treats.
Deep learning, conversely, employs multi-layer neural networks to learn rich, nuanced representations of data, empowering agents to tackle complex tasks such as image recognition and natural language processing.
These breakthroughs broadened the definition of AI agents. It was no longer solely about “achieving a goal successfully.” This new understanding incorporated concepts like the environment in which an agent perceives and learns about the world.
So, what can AI agents actually do?
This article will focus on agents that utilize Large Language Models (LLMs) as their “brain.” While various types of agents exist, including multimodal and visual agents, LLMs are particularly noteworthy due to their distinct capabilities.
Regardless of whether they are open or closed source, all LLMs exhibit some degree of “reflection” and “common-sense reasoning” ability, with some models outperforming others. These traits are what allow LLM agents to plan, critique their own output, and iteratively improve.
Beyond the inherent capabilities of LLMs, there are five other critical characteristics of agents:
Capacity for Autonomous Actions.
Agents can independently execute tasks, making decisions and taking actions without constant human oversight. However, it is advisable to have a human in the loop to ensure control and direct agents toward their goals.
Memory.
Incorporating memory into an agent fosters personalization, allowing it to comprehend and adapt to individual preferences. As our preferences evolve over time, an agent with memory can learn and adjust accordingly. This is crucial for establishing long-term relationships between agents and users.
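As a minimal sketch of what this looks like in practice, consider an agent that stores remembered facts about the user and injects them into every prompt. The `MemoryAgent` class and its prompt format are my own illustrative assumptions, not a real framework's API:

```python
# A minimal sketch of agent memory: remembered facts are injected into
# every prompt so the model can personalize its answers. The class and
# prompt layout are illustrative, not from any specific framework.

class MemoryAgent:
    def __init__(self):
        self.memories = []  # long-term facts learned about the user

    def remember(self, fact: str):
        self.memories.append(fact)

    def build_prompt(self, user_message: str) -> str:
        # Prepend everything we know about the user to the new request.
        context = "\n".join(f"- {m}" for m in self.memories)
        return f"Known user preferences:\n{context}\n\nUser: {user_message}"

agent = MemoryAgent()
agent.remember("prefers concise answers")
agent.remember("is learning Spanish")
prompt = agent.build_prompt("Suggest a podcast for my commute.")
```

In a real system, the memory store would typically be a database or vector index rather than an in-process list, but the principle is the same: past interactions shape future prompts.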
Reactivity.
To engage with their surroundings, agents must be able to perceive and process available information. This reactivity enables agents to respond to changes, make informed choices, and provide pertinent outputs based on the inputs they receive. By analyzing and interpreting data from their environment, agents can offer context-aware assistance.
Proactivity.
Agents are capable of not only “planning” and “prioritizing tasks,” but also taking proactive measures to achieve these tasks by utilizing tools, such as searching the internet, scraping Reddit, or employing code interpreters. Currently, this is primarily accomplished through API calls and function calling.
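The function-calling pattern mentioned above can be sketched in a few lines: the model returns a tool name plus arguments as JSON, and the agent dispatches to the matching function. Both tools here are placeholders, and the `model_output` JSON stands in for what a real function-calling API would return:

```python
# A sketch of tool use via "function calling": the model emits a tool
# name and arguments, and the agent executes the matching function.
# The tools and the model output below are illustrative placeholders.
import json

def search_web(query: str) -> str:
    return f"Top results for {query!r}"  # placeholder for a real search API

def run_python(source: str) -> str:
    return "code output"  # placeholder for a sandboxed interpreter

TOOLS = {"search_web": search_web, "run_python": run_python}

# In practice, this JSON would come back from the LLM's function-calling API.
model_output = json.loads('{"tool": "search_web", "arguments": {"query": "AI agents"}}')

# Dispatch: look up the requested tool and call it with the model's arguments.
result = TOOLS[model_output["tool"]](**model_output["arguments"])
```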
Social Ability.
Agents can collaborate with other agents or humans, delegate tasks, and maintain their defined roles in conversations. This social capability enables agents to work collectively toward shared objectives, distribute workloads, and ensure coherent communication.
What CAN AI agents do that humans CAN’T?
The primary strength of AI agents lies in their ability to process vast amounts of information. As AI researcher Stuart Russell articulates, AI systems can perform tasks “not due to deep understanding but because of their scale.”
For instance, if you needed to analyze 100,000 customer reviews to identify common issues with a product, an average person reading at 200 words per minute would need roughly 52 days of nonstop reading just to get through them. On top of that, they would require additional time to analyze, summarize, and extract the critical information. An AI agent, however, could accomplish the same task in minutes. Furthermore, AI agents can rapidly generate any type of output you require, whether it's a newsletter, JSON, or an email.
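A quick back-of-the-envelope check of that estimate, assuming an average review length of about 150 words (my assumption, not stated above) and nonstop reading:

```python
# Sanity check of the reading-time estimate: 100,000 reviews at an
# assumed ~150 words each, read nonstop at 200 words per minute.
reviews = 100_000
words_per_review = 150   # assumption for illustration
words_per_minute = 200

minutes = reviews * words_per_review / words_per_minute  # 75,000 minutes
days = minutes / 60 / 24                                 # ~52 days
print(round(days, 1))  # 52.1
```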
If you were asked “to envision your life in the next five years,” you might come up with a few possible paths, each containing 4–5 significant milestones (e.g., getting married, relocating to Europe, etc.). However, if you tasked agents with this project, they would generate a far greater number of potential life paths, each filled with more details and diverse milestones.
What CAN AI agents do that a single LLM CAN’T?
“So this is just GPT-4 with RAG?” or “Isn’t this the same as chaining several prompts together?” are common queries I encounter. This highlights a misunderstanding regarding the advantages of AI agents compared to simply optimizing a single LLM.
Let's examine two primary reasons why AI agents outperform an individual LLM:
Enhanced Accuracy.
Andrew Ng has noted in his lectures that an agentic workflow using “simpler” models like GPT-3.5 significantly exceeds zero-shot prompting of “intelligent” models like GPT-4.
Enhanced accuracy stems from iterations that allow agents to “fact-check” and “review” their responses, resulting in fewer hallucinations.
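That draft-review loop can be sketched as follows. The `call_llm` parameter is a hypothetical stand-in for any LLM API; the demo at the bottom replays canned responses so the control flow is visible:

```python
# A sketch of the draft-review loop behind "enhanced accuracy": the agent
# drafts an answer, asks for a critique, and revises until the critique
# comes back clean. `call_llm` is a hypothetical stand-in for an LLM API.

def answer_with_reflection(question, call_llm, max_rounds=3):
    draft = call_llm(f"Answer this question: {question}")
    for _ in range(max_rounds):
        critique = call_llm(f"List factual errors in this answer, or say OK:\n{draft}")
        if critique.strip().upper() == "OK":
            break  # the review passed; stop iterating
        draft = call_llm(
            f"Revise the answer to fix these issues.\n"
            f"Answer: {draft}\nIssues: {critique}"
        )
    return draft

# Demo with canned replies: the first draft is flagged, the revision passes.
responses = iter(["draft v1", "wrong launch date", "draft v2", "OK"])
final = answer_with_reflection("When was GPT-4 released?", lambda p: next(responses))
```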
Offloading Decision-Making.
Imagine wanting to create a blog about Mediterranean culture, but lacking experience. Initially, you might need to answer questions like, “What steps are necessary to establish and manage a successful blog?” and “What is the first step?”
Alternatively, you could assemble a team of AI agents and task them with breaking down the blogging process into smaller subtasks. Furthermore, these agents should prioritize all the subtasks, allowing you to focus on strategy and other critical cognitive tasks.
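The planning step described above can be sketched as a small function: the agent asks the model to decompose a goal into subtasks and rank them. The JSON reply format and the `stub` model are illustrative assumptions, not a real API:

```python
# A sketch of offloading decision-making: a planner agent asks the LLM
# to decompose a goal into prioritized subtasks. The JSON reply format
# and the stub model below are illustrative assumptions.
import json

def plan(goal, call_llm):
    prompt = (
        "Break this goal into subtasks and rank them by priority (1 = first). "
        'Reply as a JSON list of {"task": ..., "priority": ...} objects.\n'
        f"Goal: {goal}"
    )
    subtasks = json.loads(call_llm(prompt))
    # Execute (or present) the plan in priority order.
    return sorted(subtasks, key=lambda t: t["priority"])

# Canned reply standing in for a real LLM call.
stub = lambda p: '[{"task": "write first posts", "priority": 2}, {"task": "pick a niche", "priority": 1}]'
tasks = plan("start a Mediterranean-culture blog", stub)
```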