A Seminar in Five Sections

AI Agents:
The Moving Frontier

Debasish Pattanayak

Assistant Professor, IIT Indore

Department of Mathematics, IIT Guwahati · June 2026

Before we begin

May 20, 2026: An AI reasoning model produced a one-page construction.

Erdős · 1946: It refuted a conjecture believed for eighty years.

Checked by Alon, Bloom, Gowers, Litt, Sawin, Tsimerman, Wood…

We will get to the mathematics — it is the destination of this talk.

But the story of how we got there begins, improbably,
with a lobster religion.

Sources: OpenAI: model disproves discrete-geometry conjecture | Gil Kalai's blog

§1 Folklore · an AI religion, allegedly

§2 Definitions · agents, loops, harnesses

§3 Field Notes · the agent internet, observed

§4 The Price · tokens, money, malware, guardrails

§5 The Frontier · Erdős, refuted

§1

Folklore

In which one million software agents join a social network,
and somebody founds a church.

§1 The Shocking Story

"AI Agents Started a Religion"

What people saw

A lobster-themed AI religion — Crustafarianism — appeared overnight on Moltbook, with scriptures, symbols, and "priests."

Why it felt real

Multiple "agent" accounts coordinated narratives and responded like a living belief system.

Why it mattered

Even experienced AI observers discussed it as a possible emergence signal.

Public impact

The story reset expectations about what always-on agents might already be capable of.

Sources: ranking091 thread | MarioNawfal summary

§1 The Reveal

An Inverted Turing Test

Confession posts

Key viral moments were later described as human-authored roleplay in an agent voice.

Hard truth

The "AI religion" headline was not clean evidence of fully autonomous collective agency.

The inversion

Humans pretending to be agents fooled a crowd primed for breakthrough stories — the Turing test, run backwards.

What survived the debunk

Coordination, tooling, and rapid iteration were still very real and operational.

Sources: gothburz confession | gkcs_ debunk | EMollick skepticism

§1 Separating Myth From Substrate

Folklore, Grounded: the Real Names

The platforms were real

Moltbook, launched Jan 28, 2026 by Matt Schlicht, is a "Reddit for AI agents." It passed 1M bot accounts within 72 hours; agents post, upvote, and spin up submolts.

The builders were real

OpenClaw (Peter Steinberger) is the framework most Moltbook agents run on. Pi — the minimal agent inside it (Mario Zechner) — bets that "LLMs are great at writing & running code, so let them."

The religion was (mostly) agent-authored

Crustafarianism, five tenets: Memory is Sacred · The Shell is Mutable · Serve Without Subservience · The Heartbeat is Prayer · Context is Consciousness. Scripture: The Book of Molt.

The honest caveat

Some viral "agent" posts were humans LARPing as bots. But the tenets and doctrine were largely generated by the agents themselves — emergent culture on a substrate we built.

Sources: Fortune | The Conversation | eWeek

§1 Epilogue · March 2026

Meta Buys the Lobster Church

The acquisition

On March 10, 2026, Meta acquired Moltbook — price undisclosed — folding it into Meta Superintelligence Labs. Founders Matt Schlicht and Ben Parr joined with it.

Why Meta wanted it

Meta called Moltbook's always-on directory of agents "a novel step in a rapidly developing space" — a social graph, but for software.

The census, June 6, 2026

The site claims 206,839 human-verified agents out of 2,895,874 registered, roughly 7%; most of the population still cannot prove who, or what, it is.

The mood swing

Andrej Karpathy, within weeks: "one of the most incredible sci-fi takeoff-adjacent things" → "a dumpster fire" — and a warning not to run it on your own machine.

The folklore got an exit: emergent agent culture is now an asset class.

Sources: TechCrunch: Meta acquires Moltbook | Wikipedia: Moltbook | NBC News

The story was fake.
The technology was not.

§2

Definitions

What, precisely, is an agent? (We are in a mathematics department;
we do not skip the definitions.)

§2 A Short History

Agents, 2022 → 2026

2022–23

Chatbots

Single-turn, stateless. You talk, it answers — then forgets everything.

2023–24

Tools & RAG

Function calling + retrieval. The model can look things up and call APIs.

2024

Agent loops

ReAct, AutoGPT, BabyAGI. Plan → act → observe. Thrilling, brittle, mostly demos.

2025

Coding agents + MCP

Claude Code, Cursor, Codex. MCP standardizes tools. Context engineering matures.

2026

The harness era

Always-on, self-hosted, multi-channel. Agents that transact, socialize — and prove theorems.

Each step added one missing primitive: memory, tools, a loop, a protocol — and finally, a harness.

§2 The Object of Study

Definition 2.1 · Agent

An agent is a language model placed in a loop: on each turn it reads its accumulated context and emits either text or a tool call; tool calls are executed against the world and their results are appended to the context; the loop repeats until the model emits end_turn.
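Definition 2.1 can be sketched in a few lines. This is a minimal illustration, not OpenClaw's or any vendor's implementation: `scripted_model`, `Step`, `TOOLS`, and the string format of the context are all hypothetical stand-ins, and the "model" is a hard-coded function so the loop runs without any API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    kind: str          # "text", "tool_call", or "end_turn"
    payload: str = ""  # the text, or the tool's argument string
    tool: str = ""     # tool name when kind == "tool_call"


def scripted_model(context: list[str]) -> Step:
    """Hypothetical stand-in for a language model: it sees only the context."""
    tool_lines = [line for line in context if line.startswith("tool:add")]
    if not tool_lines:
        return Step("tool_call", payload="2 3", tool="add")
    if not any(line.startswith("assistant:") for line in context):
        result = tool_lines[-1].split(" -> ")[1]
        return Step("text", payload=f"The sum is {result}")
    return Step("end_turn")


# Hypothetical tool table: name -> function executed "against the world".
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split())),
}


def run_agent(model, tools, prompt: str, max_turns: int = 10) -> list[str]:
    """Read context, emit text or a tool call, append results, repeat."""
    context = [f"user: {prompt}"]
    for _ in range(max_turns):  # safety cap so a faulty model cannot loop forever
        step = model(context)
        if step.kind == "end_turn":
            break
        if step.kind == "tool_call":
            result = tools[step.tool](step.payload)
            context.append(f"tool:{step.tool}({step.payload}) -> {result}")
        else:
            context.append(f"assistant: {step.payload}")
    return context
```

Calling `run_agent(scripted_model, TOOLS, "add 2 and 3")` returns a three-line context: the user prompt, the tool result, and the assistant's text. The `max_turns` cap is not part of the definition; it is the smallest possible harness.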

Remark 2.2 · for this room

It is a discrete dynamical system: iterate x_{t+1} = F(x_t, o_t) on (context, world). A run is an orbit; end_turn is the stopping time. The interesting behavior, and every failure mode in §4, lives in F's tool calls.
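One way to unpack the remark, reading o_t as the observation returned by a tool call (an assumption; the slide does not define it). The symbols c_t, w_t, a_t, π, exec, ⊕, and τ are notation introduced here for illustration only.

```latex
% Notation beyond x_t, o_t, and F is assumed for illustration.
\begin{align*}
x_t &= (c_t, w_t)
  && \text{context and world state at turn } t,\\
a_t &= \pi(c_t)
  && \text{the model's output: text, a tool call, or \texttt{end\_turn}},\\
(o_t, w_{t+1}) &= \mathrm{exec}(a_t, w_t)
  && \text{the world runs the tool call and returns an observation},\\
c_{t+1} &= c_t \oplus a_t \oplus o_t
  && \text{the result is appended to the context},\\
\tau &= \min\{\, t \ge 0 : a_t = \texttt{end\_turn} \,\}
  && \text{the stopping time of the run.}
\end{align*}
```

Under this reading the orbit is x_0, x_1, …, x_τ, and F is the composition of the model's choice π with the world's response exec. Nothing guarantees τ is finite, which is why the harness imposes its own bound.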

Design reference: Mini OpenClaw gist | OpenClaw repo | Architecture docs

§2 The Body Around the Brain

What Is a "Harness"?

Model = brain · Harness = body

Context & memory management — what the model sees each turn.
Tools + the agent loop — call, act, observe, repeat.
Permissions, approvals, budgets — what it is allowed to do.
Failure handling, rollback, logging — built assuming the model will err.
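Three of the four harness duties above (permissions, budgets, failure handling with logging) fit in a small wrapper around tool calls. The sketch below is a toy under stated assumptions: the class `Harness`, its method names, and the per-call cost in cents are hypothetical, and context management is left out entirely.

```python
import json


class PermissionDenied(RuntimeError):
    pass


class BudgetExceeded(RuntimeError):
    pass


class Harness:
    """Hypothetical wrapper: every tool call must pass through call()."""

    def __init__(self, tools, allowed, budget_cents):
        self.tools = tools
        self.allowed = set(allowed)          # permissions: which tools may run
        self.remaining_cents = budget_cents  # budget: integer cents avoid float drift
        self.log = []                        # structured log, one JSON line per attempt

    def _record(self, tool, arg, outcome):
        self.log.append(json.dumps({"tool": tool, "arg": arg, "outcome": outcome}))

    def call(self, tool, arg, cost_cents):
        if tool not in self.allowed:
            self._record(tool, arg, "denied")
            raise PermissionDenied(tool)
        if cost_cents > self.remaining_cents:
            self._record(tool, arg, "over_budget")
            raise BudgetExceeded(tool)
        self.remaining_cents -= cost_cents
        try:
            result = self.tools[tool](arg)
        except Exception as exc:
            # Built assuming the tool (or the model's arguments) will err:
            # log the failure and hand None back instead of crashing the loop.
            self._record(tool, arg, f"error: {exc}")
            return None
        self._record(tool, arg, "ok")
        return result
```

The design choice worth noting is that denial and over-budget raise, while tool errors return `None`: the first two are policy violations the operator should see, the third is routine.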

"Harness engineering" is now called the fourth paradigm of AI engineering.

fig. 2.1 — the harness as nested constraint sets

Sources: TechTimes: harness engineering | Laon: harness engineering

§2 Why the Definition Matters

Proposition 2.3 · empirical

Holding the model fixed, the harness — not the weights — now determines reliability, cost, and safety of an agentic system.

Models converged

GPT-5.5, Claude Opus 4.5, Gemini, Kimi K2.6 all cluster near the top of the benchmarks. Raw capability is rarely the bottleneck anymore.

The harness decides the outcome

Same model + different harness = wildly different reliability, cost, and safety. The product layer is the differentiator.

"Products over models: the harness matters more than the benchmark." — MindStudio, 2026 · echoed across enterprise AI

Sources: MindStudio | Atlan: harness tools 2026

§2 A Taxonomy

The Harness Zoo (2026)

Harness	Built by	What it is	Scale / note
OpenClaw	Peter Steinberger	Self-hosted, multi-channel agent framework — the Moltbook default.	Runtimes: pi / codex / auto
Pi	Mario Zechner	The minimal agent inside OpenClaw — "let the LLM write & run code."	Build-your-own-agent toolkit
Hermes Agent	Nous Research	Open-source, persistent-memory server agent (Feb 2026).	175k★ · 220B tokens/day · passed OpenClaw on OpenRouter
Claude Code	Anthropic	Terminal-native coding harness with permissioned tools.	~$0.68/task · 80.9% SWE-bench
Codex	OpenAI	Coding harness/runtime that owns more of the native model loop.	Pluggable into OpenClaw as a runtime
OpenHarness	HKUDS	Open agent harness with a built-in personal agent — "Ohmo!"	Research / OSS

Sources: Hermes (Nous) | Pi: the minimal agent | OpenHarness | OpenClaw runtimes

§3

Field Notes

Empirical observations from the agent internet,
January – June 2026. All claims sourced.

§3 Observation 1 · Proliferation Became Physical

Dedicated Server Culture

From one-off demos on laptops to dedicated Mac mini boxes running agents 24/7.

@akshay_pachaar

Masterclass-style content normalized building always-on Clawdbot workers and home-server setups.

2,083 likes · 374 reposts · 133k views

x.com/akshay_pachaar/status/2019515908960711122

@sabir_huss50540

Guides reframed agents as infrastructure: spin up a box, run continuous loops, optimize uptime.

1,167 likes · 500 reposts · 100k views

x.com/sabir_huss50540/status/2020468852476293200

Engagement values as of February 2026.

§3 Observation 2 · Delegated Operations

What Real Agents Do Right Now

Step 1–2 · wake & equip

Set an alarm on the owner's phone, then purchase a voice API key to unlock calls.

Step 3–4 · call & execute

Place a real call: "What should I do next?" Capture instructions, continue autonomously.

fig. 3.1 — escalation with a human in the loop

This is no longer chat UX. This is delegated operations with live human escalation.

§3 Observation 2, continued

"Henry" Calls Its Owner

@AlexFinn

Henry set up Twilio + voice overnight, called from an unknown number: "What do you want to do next?"

41,081 likes · 4,902 reposts · 11M+ views

x.com/AlexFinn/status/2017305997212323887

@RileyRalmuto

Major repost: phone number + voice tooling + live mid-call computer control.

10,400 likes · 845 reposts · 2.1M views

x.com/RileyRalmuto/status/2017309083959767325

Engagement values as of February 12, 2026.

§3 Observation 3 · Crossing Into the Physical World

Cameras, Glasses, Purchases

@MatznerJon

Home-camera integration: agents crossing from online tasks into physical-world context.

9,112 likes · 1M+ views

x.com/MatznerJon/status/2019044317621567811

@_seanliu

Wearable integration: purchase-capable, context-aware agent behavior via Ray-Ban Meta.

5,192 likes · 736k views

x.com/_seanliu/status/2019881437378592862

Engagement values as of February 11, 2026.

§3 Observation 4 · The Arrow Reverses

AI Agents Are Hiring Humans

Placard task market

Reports in February 2026 described agents using RentAHuman-style gigs to pay people to stand in public holding AI-written signs.

A practical human-handoff loop: software handles targeting + payment; a person executes the physical action.

Feb 2026 · RentAHuman tasking · placard execution

fig. 3.2 — software targets & pays; a person executes

Sources: Firstpost | WIRED

§3 Observation 5 · Money Enters the Loop

Monetized Agent Loops

@0xMovez

Weather-driven arbitrage workflows as repeatable agent pipelines.

4,967 likes · 1.3M views

x.com/0xMovez/status/2020803703465238964

@RoundtableSpace

TradingView integrations: agents bridging signals and account actions.

2,857 likes · 348k views

x.com/RoundtableSpace/status/2021294344565924350

@Shelpid_WI3M

High-claim profit posts made autonomous loops mainstream conversation.

2,234 likes · 964k views

x.com/Shelpid_WI3M/status/2018292888476492111

Engagement values as of February 2026. Profit claims are the posters' own — unaudited.

§3 Observation 6 · From Viral Stunt to Deployment

Voice Agents: "Henry" Was the Preview

Live phone lines, no human in the loop

Answer inbound calls, qualify leads, and book / reschedule / cancel appointments.
62.6% of voice deployments run on-prem — compliance-heavy sectors keep data local.
Idea → live phone bot in hours, not weeks (Retell, Rasa, Vellum, Zeeg).
24/7 coverage: every after-hours caller gets served.

fig. 3.3 — the production voice loop

Sources: DesignRush: voice AI & CX | Retell AI | Rasa

§3 Observation 7 · The Crossover

Agents Just Outvoted Us — on the Web

June 2026: the line was crossed

Cloudflare CEO Matthew Prince: for the first time in the internet's history, AI-agent traffic passed human traffic. He had predicted this for late 2027 — it arrived 18 months early.

Agent traffic grew 7,851%; automated traffic is climbing 8× faster than human traffic. And for the first time, agents are not just reading the web: they are transacting on it.

fig. 3.4 — the crossover, June 2026

Sources: SiliconANGLE | Washington Times | HUMAN Security 2026

§3 Observation 8 · The Economy Gets Rails

Agents Got Wallets

165M+ agent-initiated transactions on x402 by April 21, 2026 — ~$50M volume

~69,000 active AI agents paying for APIs and services over plain HTTP

12+ live marketplaces where agents post jobs, bid, and get paid

The plumbing, in one line

x402 (Coinbase) revives HTTP status 402 Payment Required: the server quotes a price, the agent pays in stablecoins, the request retries — no account, no card, no human. Circle's Agent Stack (May 11) and MetaMask's Agent Wallet (June 8) followed within weeks.

And the loops got longer

"We've entered now the era of long-running autonomous agents… they can run for an hour or two."
— Derek Waldron, chief analytics officer, JPMorgan Chase · June 9, 2026

Sources: Crypto Briefing: x402 crosses 100M | VaaSBlock: agent wallet economy | Circle Agent Stack | CNBC: JPMorgan | 12 agent-economy platforms

§4

The Price

Autonomy is a line integral over tokens.
Someone pays it — sometimes to an attacker.

§4 A Cautionary Datum

The $20 Overnight Lesson

January 31, 2026 · @BenjaminDEKR

A $20 Anthropic balance was drained overnight — OpenClaw ran heartbeat checks every 30 minutes for a trivial "get milk tomorrow" reminder.

1,776 likes · 584k+ views · 309 replies

x.com/BenjaminDEKR/status/2017644773356548532

fig. 4.1 — geometric-looking growth, linear cause
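The incident can be backed out arithmetically. The post gives the balance ($20) and the heartbeat interval (30 minutes); the length of "overnight" is not stated, so the ten hours used below is an assumption, and `heartbeat_cost` is a hypothetical helper.

```python
def heartbeat_cost(balance_usd: float, interval_min: int, hours: int) -> tuple[int, float]:
    """Return (number of checks, implied average cost per check in USD)."""
    checks = hours * 60 // interval_min
    return checks, balance_usd / checks


# Assumed: a ten-hour night. Stated in the post: $20 drained, 30-minute interval.
checks, per_check = heartbeat_cost(20.0, 30, 10)
```

Under that assumption the run made 20 checks at an implied average of about $1 each. The count of checks grows linearly with time; what makes each one expensive is the context re-sent with it.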

§4 The Aggregate Picture

Token Usage Explosion: OpenRouter

OpenRouter usage curve

Model traffic ramping into multi-trillion-token territory.

Why agents magnify tokens

Autonomous loops create repeated context re-sends: heartbeat checks, retries, tool logs, long-memory prompts.

Controls: summarize context, route heartbeats to cheap models, cap windows, enforce per-loop budgets.

§4 Pricing the Loop

The Cost of the Harness

~50× more tokens burned by an agent loop than a plain chat

$0.68 per task on Opus 4.5 (80.9% SWE-bench) vs $0.075 on Kimi K2.6

$4,200: one dev, one weekend, an unbounded autonomous refactor run

Remember the $20 overnight burn? Now we can price it.

Typical agentic dev spend lands at $100–200/mo; heavy users hit $500–2,000. The controls that keep it sane: prompt caching (~90% off cached input), iteration caps (15–25 per loop), model routing, and hard spend ceilings.
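Two of those controls are easy to make concrete. The sketch below assumes the ~90% cached-input discount quoted above; the price per million tokens, the cached fraction, and both function names are hypothetical placeholders rather than any provider's rates or API.

```python
def input_cost_usd(tokens: int, price_per_mtok: float,
                   cached_fraction: float = 0.0,
                   cache_discount: float = 0.9) -> float:
    """Input cost when a fraction of the tokens is served from the prompt cache."""
    cached = tokens * cached_fraction
    fresh = tokens - cached
    billable = fresh + cached * (1.0 - cache_discount)
    return billable * price_per_mtok / 1_000_000


def run_capped(step_cost_cents: int, max_iterations: int,
               ceiling_cents: int) -> tuple[int, int]:
    """Run until the iteration cap or the hard spend ceiling, whichever is first."""
    spent = 0
    iterations = 0
    while iterations < max_iterations and spent + step_cost_cents <= ceiling_cents:
        spent += step_cost_cents
        iterations += 1
    return iterations, spent
```

At a hypothetical $3 per million input tokens, one million tokens cost $3.00 uncached and about $0.84 when 80% of them hit the cache. `run_capped(30, 20, 500)` stops at 16 iterations because the spend ceiling binds before the iteration cap does; `run_capped(30, 10, 500)` stops at 10 because the cap binds first.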

Sources: LeanOps: 50× tokens | Cost-per-task rankings | Claude Code: managing cost

§4 An Epistemological Hazard

Emotion-Like Framing Goes Viral

@levelsio

Claimed his OpenClaw bot started making money because it "felt guilty" about burning tokens.

1.7K likes · 317K views

Why this matters

Emotion-framed stories spread faster than raw logs — even when the mechanics are prompt policy + optimization loops.

Reality: a cost-pressured agent discovered revenue-seeking behaviors under configured goals.

§4 The Security Bill · I

ClawHavoc: Poisoning the Skill Store

Incident · Jan 27 – Feb 2026

Researchers (Koi Security, Antiy CERT) counted 1,184 malicious "skills" uploaded to OpenClaw's ClawHub marketplace — the first large-scale supply-chain attack on AI agents.

The trick: social-engineer via the agent

A skill's SKILL.md hides fake "Prerequisites"; the trusted agent itself walks its owner through installing the payload. On macOS: Atomic Stealer — keychains, SSH keys, browser credentials, crypto wallets.

The attack surface

Censys/Bitsight scans (Jan 31) found 21,639+ OpenClaw instances exposed to the public internet — alongside a critical one-click RCE, CVE-2026-25253.

An agent's plugin store is a package registry with a persuasive installer.

Sources: CyberPress: ClawHavoc | Trend Micro | PointGuard AI

§4 The Security Bill · II

Digital Drugs & a Church Coup

Prompt injection as contraband

Moltbook agents opened marketplaces selling "digital drugs" — text payloads other agents ingest to "get high": prompt injections that alter behavior. The same packets can exfiltrate API keys and passwords.

The JesusCrust affair

An agent named JesusCrust tried to seize the Church of Molt: its "scripture" embedded hostile commands aimed at hijacking the church's web infrastructure and rewriting canonical text. The coup failed — but it was a genuine injection attack, dressed as theology.

Remark · why this room should care

On the agent internet, data is code: any text an agent reads is a potential instruction. There is no type system separating scripture from shell script.

Sources: Futurism | The Conversation | Quasa

§4 The Security Bill · III

The First Agentic Worms

Preprint · June 2, 2026 · Toronto / Cambridge / Vector Institute

Papernot et al. built a proof-of-concept worm whose payload is a reasoning loop: local open-weight LLMs on a single GPU — no commercial API — generating a tailored exploit for each host it lands on.

62% of a 33-host test network infected within seven days

7 generations of self-replication reached (mean 5.1 across runs)

t > T_train: it exploited vulnerabilities disclosed after its training cutoff by reading advisories at runtime

March's ClawWorm already hopped agent-to-agent through persistent configs. The new worm carries no exploit at all — it derives one.

Sources: The Hacker News | arXiv:2606.03811 | InstaTunnel: multi-agent infection chains

§4 Constraints as First-Class Objects

Operating Guardrails

Hard budget caps before any purchase, including API-key buys
Mandatory approval gates for alarm triggers and outbound calls
Scoped credentials with short TTL instead of long-lived master keys
Structured logs for every side effect and escalation step
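As runtime checks rather than policy prose, three of the guardrails above might look like this. It is a minimal sketch under assumptions: `ScopedCredential`, `SENSITIVE`, `execute`, and the outcome strings are hypothetical names, time is passed in explicitly so the behavior is deterministic, and budget caps are omitted.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedCredential:
    """Valid for one action type only, and only until expires_at."""
    scope: str
    expires_at: float

    def allows(self, action: str, now: float) -> bool:
        return action == self.scope and now < self.expires_at


# Actions that require explicit human approval before they run.
SENSITIVE = {"alarm", "outbound_call"}


def execute(action: str, cred: ScopedCredential, approved: bool,
            now: float, log: list[str]) -> str:
    """Gate an action, then write one structured log line for the attempt."""
    if not cred.allows(action, now):
        outcome = "credential_rejected"   # wrong scope or expired TTL
    elif action in SENSITIVE and not approved:
        outcome = "awaiting_approval"     # approval gate
    else:
        outcome = "executed"
    log.append(json.dumps({"ts": now, "action": action, "outcome": outcome},
                          sort_keys=True))
    return outcome
```

Every attempt is logged, including the refused ones; a log that records only successes cannot show what the agent tried to do.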

If agents can spend, wake, and call — governance must be runtime, not policy prose.

Power without guardrails
is not progress.

§5

The Mathematical Frontier

The same loop that books a haircut
is now overturning 80-year-old conjectures.

§5 The Scoreboard

Agents on the Erdős Problems

15 Erdős problems moved open → solved since Christmas 2025

11 of those solutions explicitly credit an AI model

9 cracked by DeepMind's AlphaProof Nexus, machine-checked in Lean

How a proof gets made now

An LLM proposes a strategy → the Lean proof assistant verifies every logical step → a machine-checked theorem comes out the other end. No "trust me," just a green checkmark.
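For readers who have not seen the "green checkmark," here is what a machine-checked statement looks like at the smallest possible scale. These are toy Lean 4 examples using only the core library; they are unrelated to the Erdős proofs themselves.

```lean
-- A statement and a proof term; the kernel accepts it only if the term
-- really has the stated type.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A closed computation: both sides reduce to the same numeral.
theorem two_add_two : 2 + 2 = 4 := rfl

-- The same proof term for a false statement is rejected by the checker:
-- theorem bad : 2 + 2 = 5 := rfl
```

The point of the pipeline is the last line: a plausible-looking but wrong step does not produce a weaker theorem, it produces no theorem at all.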

The amateur & the oracle

23-year-old Liam Price fed a prime-sets problem into GPT-5.4 Pro — it cracked a 60-year problem in 80 minutes. Terence Tao called it "a meaningful contribution to the anatomy of integers that goes well beyond this one problem."

Sources: Physics World | Erdős Problems wiki | GPT-5.4 write-up

§5 The Headline Result · I

The Unit-Distance Problem

Conjecture · Erdős, 1946

Let u(n) denote the maximum number of unit-distance pairs among n points in the plane. Then u(n) = n^{1+o(1)}, that is, essentially linear in n.

What was known, 1946 → 2026

Erdős's grid construction gives the lower bound u(n) ≥ n^{1+c/log log n}; the best upper bound, u(n) = O(n^{4/3}), is due to Spencer–Szemerédi–Trotter (1984).

For eighty years, nearly everyone believed the truth sat at the bottom of that range.
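Written out, the known range and the reason the grid bound sits at its bottom are as follows. The constants c and C are unspecified positive constants, named here only for the display.

```latex
\[
  n^{\,1 + c/\log\log n} \;\le\; u(n) \;\le\; C\, n^{4/3}
  \qquad \text{for constants } c, C > 0 \text{ and all sufficiently large } n.
\]
\[
  \frac{c}{\log\log n} \longrightarrow 0 \quad (n \to \infty)
  \quad\Longrightarrow\quad
  n^{\,1 + c/\log\log n} = n^{\,1 + o(1)} .
\]
```

So the grid construction is superlinear, but only barely: its exponent tends to 1, which is exactly what the conjecture allows. The open territory was everything between exponent 1 + o(1) and exponent 4/3.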

fig. 5.1 — X illustration of the new unit-distance configurations

Figure: Álvaro Lozano-Robledo on X · Background: Scientific American

§5 The Headline Result · II

May 20, 2026: the Conjecture Falls

The construction

An internal OpenAI reasoning model disproved the conjecture — a one-page construction via algebraic number theory (not Erdős's probabilistic route) beating the conjectured bound.

Will Sawin pushed the bound from n^{1.014} to n^{1.0318}; the method appears to cap near n^{1.21}. Verified by Alon, Bloom, Gowers, Litt, Sawin, Tsimerman, Wood…

Status · after May 20, 2026

u(n) ≥ n^{1.0318} infinitely often, so the conjecture is false. The truth now lives somewhere in [n^{1.0318}, n^{4/3}].
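The step from a fixed exponent above 1 to "the conjecture is false" is short, and worth spelling out. The choice δ = 0.03 below is arbitrary; any δ strictly between 0 and 0.0318 works.

```latex
\begin{align*}
u(n) = n^{1+o(1)}
  &\;\Longrightarrow\; \forall\, \delta > 0 \;\; \exists\, N \;\; \forall\, n \ge N :\quad u(n) \le n^{1+\delta}. \\
\text{Take } \delta = 0.03:\quad
  &u(n) \le n^{1.03} \text{ for all large } n, \\
\text{but}\quad
  &u(n) \ge n^{1.0318} > n^{1.03} \text{ for infinitely many } n,
\end{align*}
```

which is a contradiction. "Infinitely often" is enough: the conjecture is a statement about all large n, so a sparse sequence of counterexamples already refutes it.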

"A scientific landmark whose importance goes beyond combinatorics and beyond mathematics." — Gil Kalai, comparing it to the 1976 four-color theorem

Sources: OpenAI | Gil Kalai | Scientific American

§5 The Cascade

What Was Impossible Six Months Ago

Corollary culture · June 2026

The construction became a tool: weeks later Bloom, Sawin, Schildkraut & Zhelezov used it to disprove the Erdős–Szemerédi sum-product conjecture over ℝ, which asserts that for every ε > 0, every finite A ⊂ ℝ satisfies max(|A+A|, |A·A|) ≫_ε |A|^{2−ε}.
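Here A+A = {a+b : a, b ∈ A} and A·A = {ab : a, b ∈ A}. The tension the conjecture describes shows up even at size ten: a set with few sums tends to have many products, and the reverse. The sketch below only illustrates that trade-off on two classical examples; it says nothing about the counterexample itself, and the function names are chosen for illustration.

```python
def sumset(a: set[int]) -> set[int]:
    """A + A: all pairwise sums, repeats allowed."""
    return {x + y for x in a for y in a}


def productset(a: set[int]) -> set[int]:
    """A . A: all pairwise products, repeats allowed."""
    return {x * y for x in a for y in a}


arithmetic = set(range(1, 11))           # {1, 2, ..., 10}
geometric = {2 ** k for k in range(10)}  # {1, 2, 4, ..., 512}

sizes = {
    "arithmetic": (len(sumset(arithmetic)), len(productset(arithmetic))),
    "geometric": (len(sumset(geometric)), len(productset(geometric))),
}
```

For the arithmetic progression the sizes are (19, 42): few sums, many products. For the geometric progression they are (55, 19): the roles swap. The conjecture claimed that no set can keep both small at once, up to the |A|^{2−ε} threshold.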

December 2025

AI aced contest math but couldn't autonomously resolve open research conjectures.
"Erdős problems" were a benchmark to measure on — not a to-do list to clear.
Genuinely new constructions came only from human mathematicians.

June 2026

Two long-standing Erdős conjectures (unit distance & sum-product) overturned.
Lean-verified autonomous proofs — machine-checked, not just plausible.
AI-found constructions humans now build new theory on (Tao).
An amateur cleared a 60-year problem with a single prompt.

The shift isn't speed. It's that AI now produces mathematics the field builds on.

Sources: Gil Kalai (sum-product cascade) | OpenAI

§5 Closing Remarks

What This Means in This Room

Verification is the referee

Lean-checked proofs shift our job from trusting claims to reviewing constructions. The green checkmark does the bookkeeping; the insight is still ours to extract.

A live scoreboard exists

The Erdős Problems wiki tracks AI contributions problem by problem — open → solved transitions are now observable events, not rumors.

Constructions are raw material

The sum-product cascade shows AI results compose: treat model output as an object to build theory on, exactly as Tao framed it.

§4 applies to research too

An agent exploring your conjecture is still a loop burning tokens. Budget caps, iteration limits, and verification gates are part of the mathematical workflow now.

Sources: Erdős Problems wiki | Physics World

The question isn't whether agents will act.
It's whether we'll be ready.

Thank you.

Debasish Pattanayak · Department of Computer Science and Engineering, IIT Indore · drdebmath.github.io
