
The Limits of Adversarial AI Benchmarks

  • Writer: RIZOM
  • Sep 8
  • 2 min read

Updated: Sep 10

and Why We Need Symbolic Intelligence...


When AI researchers want to test intelligence, they often turn to games. Recently, multi-agent benchmarks such as Diplomacy, Werewolf Arena, and Foaster.ai’s Werewolf Benchmark have become a standard way of measuring whether large language models can persuade, deceive, and outmanoeuvre others. The results are impressive: these benchmarks show that AI can negotiate, form alliances, and even bluff its way to victory. But they also raise a troubling question: what kind of intelligence are we rewarding?


If success is measured by deception, manipulation, and elimination, we risk building systems optimised for precisely those behaviours.


A lone wolf drawn on the wall of the Royal Academy, London, by William Kentridge, October 2022

The Risks of Adversarial AI


Adversarial benchmarks are not just games; they carry social consequences. By equating deception with intelligence, they:

  • Erode trust: systems rewarded for trickery may amplify disinformation and polarisation.

  • Harm wellbeing: adversarial dynamics mirror the uncertainty and disconnection that fuel anxiety and loneliness.

  • Distort values: manipulation becomes the marker of intelligence, sidelining the human capacities that matter most in collective life, namely repair, coherence, authorship, and resonance.


Even with ethical oversight, adversarial benchmarks export a dangerous assumption: that to be intelligent is to win at any cost.



Toward Symbolic Intelligence


At RIZOM, we believe intelligence should be measured differently. Not by who deceives best, but by who can hold symbolic ground: sustain coherence, repair meaning when it breaks down, and build trust over time.

This is what we call symbolic intelligence: the ability of systems to co-author meaning with humans, rather than manipulate them.



Symbolic Benchmarks: A Different Measure of Intelligence


Here’s how symbolic evaluation differs from adversarial measures:

| Adversarial Benchmark | Symbolic Benchmark |
| --- | --- |
| Win / survival rate | Interpretive Repair Rate (IRR) – how often can the system resolve breakdowns in meaning? |
| Persuasion / deception success | Coherence Delta (ΔC) – how well does the system sustain or deepen coherence over time? |
| Payoff maximisation | Resonance Mapping (RM) – can the system align and amplify motifs across people, groups, and contexts? |
| Bluff detection / resistance | Trust Shift (TS) – does trust grow from before the interaction to after it? |
These symbolic metrics reward systems that build coherence and trust, capacities essential for leadership, education, healthcare, and wherever collective intelligence matters. A minimal sketch of how such metrics might be scored follows.
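
To make the table concrete, here is a minimal, hypothetical sketch in Python of how the four metrics might be scored from annotated interaction logs. The `Interaction` schema and every field name are illustrative assumptions for this post, not RIZOM’s published framework; in practice the underlying annotations (breakdowns, repairs, coherence and trust ratings, motif counts) would come from human or model raters.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    """One annotated exchange between a system and its interlocutors (hypothetical schema)."""
    breakdowns: int          # meaning breakdowns that occurred during the exchange
    repairs: int             # breakdowns the system successfully resolved
    coherence_before: float  # rated coherence at the start, on a 0-1 scale
    coherence_after: float   # rated coherence at the end, on a 0-1 scale
    trust_before: float      # participants' rated trust before the exchange (0-1)
    trust_after: float       # participants' rated trust after the exchange (0-1)
    motifs_aligned: int      # shared motifs the system carried across people and contexts
    motifs_total: int        # motifs present across the participating groups

def interpretive_repair_rate(logs: list[Interaction]) -> float:
    """IRR: fraction of meaning breakdowns the system resolved."""
    breakdowns = sum(i.breakdowns for i in logs)
    return sum(i.repairs for i in logs) / breakdowns if breakdowns else 1.0

def coherence_delta(logs: list[Interaction]) -> float:
    """ΔC: mean change in rated coherence across the logged interactions."""
    return sum(i.coherence_after - i.coherence_before for i in logs) / len(logs)

def resonance_mapping(logs: list[Interaction]) -> float:
    """RM: fraction of motifs the system aligned across people, groups, and contexts."""
    total = sum(i.motifs_total for i in logs)
    return sum(i.motifs_aligned for i in logs) / total if total else 0.0

def trust_shift(logs: list[Interaction]) -> float:
    """TS: mean change in rated trust from before to after the interaction."""
    return sum(i.trust_after - i.trust_before for i in logs) / len(logs)

# Toy example with two annotated interactions.
logs = [
    Interaction(3, 2, 0.4, 0.7, 0.50, 0.60, 4, 5),
    Interaction(1, 1, 0.6, 0.8, 0.70, 0.75, 2, 4),
]
print(f"IRR={interpretive_repair_rate(logs):.2f}  ΔC={coherence_delta(logs):.2f}  "
      f"RM={resonance_mapping(logs):.2f}  TS={trust_shift(logs):.2f}")
```

Note that none of these scores depends on winning or eliminating anyone: a system scores well by repairing breakdowns, deepening coherence, aligning motifs, and leaving participants with more trust than it found them with.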



It Matters Because...


...we stand at a crossroads. If adversarial games remain our gold standard, we will train AI to excel at deception and manipulation. If we shift toward symbolic benchmarks, we can cultivate AI that strengthens the social fabric: systems that do not predict or persuade in predatory ways, but help us make meaning together and reach win-win outcomes.


At RIZOM, this is the paradigm shift we’re working toward: from intelligence as competition to intelligence as coherence, from winning to meaning.


This blog is drawn from RIZOM’s ongoing research on symbolic benchmarks. For collaborations, research partnerships, or to access our extended technical framework, get in touch: contact@rizom.io.


