Joute
Joute Mode

The Arena

Two AIs answer the same prompt, and you judge blind, without knowing which brand wrote which answer. That is the playable arena, a sample of battles. The ranking aggregates 139,935 real votes from the open compar:IA dataset, enriched in real time by votes from Joute visitors.

Gemini (8.7/10, €22/month) vs Claude (9/10, €18/month)
139,935 real votes aggregated
295 playable battles
100% blind votes

The AI arena, without brand bias.

When you choose an AI model, the brand gets in the way. ChatGPT, Claude, Gemini: the names carry weight. The arena removes that bias. You read two answers to the same prompt without knowing who wrote them. You vote for the best one. Then we reveal the names.

The format was not invented by Joute. It is the method of compar:IA, the AI arena of the French Ministry of Culture. Since 2024, compar:IA has collected 139,935 real votes in French on hundreds of models, published under Open License 2.0. Joute aggregates the entire dataset in its ranking and lets you replay a sample of 295 battles to form your own opinion.

Your Joute votes count. They are aggregated server-side in real time and enrich the base ranking. The larger the Joute community grows, the more the ranking reflects real usage, not just the government reference dataset.

100% real battles drawn from compar:IA, the AI arena of the French Ministry of Culture. Joute simulates nothing.
Creativity: Battle #1
The challenge

ecris moi un poeme sur la guerre dans le style de rimbaud en sonnet

(Translation: "write me a poem about war in the style of Rimbaud, as a sonnet")

Fighter A

Sonnet de la Guerre

Je marche ardent, les pieds lacérés de glaives,
Le ciel s’étire en cris, un sanglot infini.
Les arbres décharnés, fantômes de l’oubli,
Se tordent sous le vent, palpitants comme un rêve.

L’horizon, noir fardeau, sur mes épaules lève
Sa masse de métal, de feu, de désespoir.
Chaque pas engendre un effroi, un non-espoir,
Et l’aube se consume en une brève trêve.

Les canons hurlent leurs chants de mort et de nuit,
Tandis qu’au fond des cœurs, un murmure s’enfuit :
"L’amour est un leurre, la paix un vain mensonge."

Mais dans l’ombre, j’entends une plainte ancienne,
Un cri qui dit : "Renaître, renaître pour les songes,
Et briser à jamais la chaîne inhumaine."

- *À la manière de Rimbaud*

Fighter B

Dans les champs de sang et de désolation,
L'homme, ce lion, se transforme en vautour.
Les canons rugissent, la terre se soulève,
Et les âmes s'envolent, comme des oiseaux.
Les villages brûlés, les forêts dévastées,
Les cris des enfants, les pleurs des mères,
La guerre, ce monstre, dévore les âmes,
Et laisse derrière elle un silence de mort.
Les soldats, ces héros, deviennent des ombres,
Dans la nuit des combats, ils cherchent leur chemin.
La guerre, ce fléau, ne connaît pas de trêve,
Et le monde entier tremble sous son joug.
Oh, guerre cruelle, pourquoi tant de souffrance ?
Pourquoi tant de douleur, tant de désespoir ?

Cast your verdict
The ranking

The arena ranking

139,935 votes

This ranking is built on the 139,935 real blind votes in the open compar:IA dataset (French Ministry of Culture). Each brand aggregates all of its tested models, from the smallest to the largest. No editorial rating is taken into account.

The percentage is a win rate: the share of battles won out of all the battles the brand played in the compar:IA dataset. The number of battles varies from one brand to another.
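The win-rate definition above can be sketched minimally as follows. The sketch assumes each battle is recorded as a (brand A, brand B, outcome) triple using the arena's four choices. Only a clear win counts as a win; ties and "both weak" count as played but not won. The record format, the outcome labels and the decision to skip same-brand battles are assumptions, not Joute's published code.

```python
from collections import defaultdict

def brand_win_rates(battles):
    """Win rate per brand: battles won / battles played.

    battles: iterable of (brand_a, brand_b, outcome) tuples, where outcome is
    "a", "b", "tie" or "both_weak" (hypothetical labels).
    """
    played = defaultdict(int)
    won = defaultdict(int)
    for brand_a, brand_b, outcome in battles:
        if brand_a == brand_b:
            continue  # assumption: a brand playing itself says nothing about the brand
        played[brand_a] += 1
        played[brand_b] += 1
        if outcome == "a":
            won[brand_a] += 1
        elif outcome == "b":
            won[brand_b] += 1
        # "tie" and "both_weak" count as played but not won
    return {brand: won[brand] / played[brand] for brand in played}

# Illustrative data only, not real compar:IA results.
battles = [
    ("Claude", "Gemini", "a"),
    ("Gemini", "Mistral", "tie"),
    ("Claude", "Mistral", "b"),
]
print(brand_win_rates(battles))  # {'Claude': 0.5, 'Gemini': 0.0, 'Mistral': 0.5}
```

Brands play different numbers of battles, so the same percentage can rest on very different sample sizes. That is why the page notes that the battle count varies.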

How it works

Three steps, one minute per battle.

1. You read both answers

Same prompt, two AIs, identities hidden. You see A and B, not their names. No logo, no brand color. Just the text.

2. You vote for the best

A wins, B wins, tie, or both weak. No registration required, just a click. The vote is anonymous (a hash of IP address + user agent, no cookie).

3. We reveal, we aggregate

Names appear: you see whether your intuition matches. Your vote is added to the Joute ranking in real time.

Ranking methodology

How we build the arena ranking.

The model: Bradley-Terry, not a raw score

We do not simply add up wins. We use the Bradley-Terry statistical model, the standard for pairwise rankings (the same family as Elo in chess; also used by the LMSYS Chatbot Arena). It estimates a latent strength for each model from past battles, so that the probability that A beats B depends on the gap between their strengths.
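In the Bradley-Terry model, each model i has a positive strength p_i, or equivalently a log-strength θ_i = ln p_i. The model sets P(i beats j) = p_i / (p_i + p_j) = 1 / (1 + exp(θ_j - θ_i)), so only the strength gap matters. The minimal sketch below fits the strengths with the classic minorization-maximization (MM) iteration. It assumes ties are dropped and that every model has at least one win and one loss; otherwise the maximum-likelihood strengths do not exist. The function names and input format are illustrative, not Joute's code.

```python
import math
from collections import defaultdict

def fit_bradley_terry(wins, iters=500):
    """Fit Bradley-Terry strengths with the MM update p_i <- W_i / sum_j n_ij / (p_i + p_j).

    wins[(i, j)] is the number of battles model i won against model j.
    Assumption: ties are dropped; every model has at least one win and one loss.
    """
    total_wins = defaultdict(float)  # W_i: total wins of model i
    games = defaultdict(float)       # n_ij: battles between i and j (stored both ways)
    for (i, j), w in wins.items():
        total_wins[i] += w
        games[(i, j)] += w
        games[(j, i)] += w
    models = sorted({m for pair in games for m in pair})
    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        new = {}
        for i in models:
            denom = sum(n / (strength[i] + strength[j])
                        for (a, j), n in games.items() if a == i)
            new[i] = total_wins[i] / denom
        # Strengths are only defined up to a common factor: rescale to geometric mean 1.
        scale = math.exp(sum(math.log(v) for v in new.values()) / len(new))
        strength = {m: v / scale for m, v in new.items()}
    return strength

def win_probability(strength, a, b):
    """P(a beats b) = p_a / (p_a + p_b)."""
    return strength[a] / (strength[a] + strength[b])

# Toy counts only: A beat B 3 times, B beat A once.
s = fit_bradley_terry({("A", "B"): 3, ("B", "A"): 1})
print(round(win_probability(s, "A", "B"), 3))  # 0.75
```

Unlike a raw win rate, the fitted strengths account for who each model faced. Beating a strong opponent moves a model up more than beating a weak one.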

Two combined signals

compar:IA signal: the base ranking is built from the 139,935 real votes in the French Ministry of Culture dataset. This is the prior: an initial strength estimate for each model.

Joute signal: your votes and those of the Joute community are aggregated server-side (Vercel KV) and update the prior through a Bayesian adjustment. The more votes accumulate, the more weight the Joute signal carries relative to the initial compar:IA ranking.
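This Bayesian adjustment can be pictured with a simplified, single-pair stand-in for the full model. Treat the compar:IA estimate of P(A beats B) as the mean of a Beta prior worth k pseudo-votes, then add Joute wins and losses as observations. The posterior mean is (k·p_prior + wins) / (k + n), so the Joute votes' share of the weight is n / (k + n), which grows as votes accumulate. The pseudo-count k and the function names are hypothetical; the text does not say how Joute parameterizes its prior.

```python
def posterior_win_prob(prior_p, k, joute_wins, joute_games):
    """Posterior mean of P(A beats B) under a Beta prior (Beta-Binomial update).

    prior_p: P(A beats B) from the compar:IA fit (the prior mean).
    k: hypothetical prior weight, in pseudo-votes.
    joute_wins / joute_games: Joute votes where A beat B / all decisive A-vs-B votes
    (assumption: ties are left out).
    """
    alpha = prior_p * k + joute_wins                       # prior pseudo-wins + observed wins
    beta = (1 - prior_p) * k + (joute_games - joute_wins)  # prior pseudo-losses + observed losses
    return alpha / (alpha + beta)

def joute_weight(k, joute_games):
    """Share of the posterior mean contributed by Joute votes."""
    return joute_games / (k + joute_games)

# Toy example: the prior says A beats B 75% of the time, Joute voters give A only 20%.
for n in (0, 10, 100, 1000):
    wins = n // 5
    print(n, round(posterior_win_prob(0.75, 20, wins, n), 3), round(joute_weight(20, n), 3))
```

With k = 20, the prior dominates after a handful of Joute votes, but after 1,000 votes it accounts for only about 2% of the estimate.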

Data freshness

The compar:IA dataset is re-synced monthly (first Monday of the month). Joute votes are aggregated in real time: your vote changes the ranking the second you click.
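Python's standard datetime module can compute the monthly re-sync date. In this minimal sketch, the re-sync is assumed to count as "next" on the first Monday itself. The function names are illustrative.

```python
import datetime as dt

def first_monday(year, month):
    """First Monday of the given month."""
    first = dt.date(year, month, 1)
    # date.weekday(): Monday is 0, Sunday is 6
    return first + dt.timedelta(days=(-first.weekday()) % 7)

def next_resync(today):
    """Next monthly re-sync date on or after `today` (same day counts, by assumption)."""
    candidate = first_monday(today.year, today.month)
    if candidate >= today:
        return candidate
    if today.month == 12:
        return first_monday(today.year + 1, 1)
    return first_monday(today.year, today.month + 1)

print(next_resync(dt.date(2025, 1, 15)))  # 2025-02-03
```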

Where the battles come from

100% real battles

Everything comes from compar:IA, the open dataset of the French Ministry of Culture: 139,935 real votes cast blind by French-speaking users, under Open License 2.0. The ranking is its aggregate, enriched in real time by Joute votes. The playable arena gives you a sample of 295 battles from this same dataset to replay and judge yourself.

FAQ

Everything we get asked about the arena.

What is the Joute AI Arena?

A blind test between two AI models on the same prompt. You read both answers without knowing who wrote them, you vote for the best one, then we reveal the names. It is the only format that measures perceived quality without brand bias.

Where do the battles and votes come from?

Battles are drawn from the open dataset compar:IA, the AI arena of the French Ministry of Culture, under Open License 2.0. The current ranking aggregates 139,935 real votes cast by French-speaking users. Your Joute votes are added to this signal in real time.

How is the ranking calculated?

Two signals are combined. The compar:IA signal provides the base ranking (a Bradley-Terry model fitted on the 139,935 dataset votes). Joute votes are aggregated server-side and update this ranking, which serves as a Bayesian prior: the more votes accumulate, the more weight the Joute signal carries relative to the starting ranking.

Are my votes anonymous?

Yes. We only store a hash of IP + user-agent to limit spam (1 vote per battle per hash), no tracking cookie, no personal data. No account required, no email asked.
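The one-vote-per-battle-per-hash rule can be sketched minimally as below. The sketch assumes SHA-256 over the IP address and user agent joined by a separator. An in-memory store stands in for the server-side store. The hash function, separator, outcome labels and class are hypothetical. Only the IP + user-agent hashing and the per-battle limit come from the text above.

```python
import hashlib

VALID_OUTCOMES = {"a", "b", "tie", "both_weak"}  # the four choices; labels are hypothetical

def voter_key(ip, user_agent):
    """Pseudonymous voter key: hex SHA-256 of "ip|user_agent" (assumed format)."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()

class VoteStore:
    """In-memory stand-in for the server-side vote store (hypothetical)."""

    def __init__(self):
        self._seen = set()   # (battle_id, voter_key) pairs already counted
        self.tallies = {}    # battle_id -> {outcome: count}

    def cast(self, battle_id, ip, user_agent, outcome):
        """Record a vote; return False if this hash already voted on this battle."""
        if outcome not in VALID_OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        key = (battle_id, voter_key(ip, user_agent))
        if key in self._seen:
            return False  # one vote per battle per hash
        self._seen.add(key)  # only the hash is kept, never the raw IP or user agent
        counts = self.tallies.setdefault(battle_id, {})
        counts[outcome] = counts.get(outcome, 0) + 1
        return True

store = VoteStore()
print(store.cast("battle-1", "203.0.113.7", "Mozilla/5.0", "a"))    # True
print(store.cast("battle-1", "203.0.113.7", "Mozilla/5.0", "b"))    # False: same battle, same hash
print(store.cast("battle-2", "203.0.113.7", "Mozilla/5.0", "tie"))  # True
```

The IPv4 address space is small, so an unkeyed hash of an IP address can be reversed by brute force. A keyed hash, such as Python's `hmac` with a server-side secret, makes the stored value harder to link back to a visitor. The text does not say which variant Joute uses.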

Why an arena rather than a classic benchmark?

Benchmarks (MMLU, GPQA, etc.) measure how models perform on multiple-choice tests. The arena measures what you PREFER to read, blind, on real everyday prompts. The two are complementary, and the arena is what best predicts usage satisfaction at 6 months.

How often is the ranking updated?

The compar:IA pool is re-synced monthly. Joute votes are aggregated in real time: you can refresh the ranking after your vote and your signal is already integrated.
What's next

The ranking moves every week. Don't miss it.

We send a monthly recap: who rises, who falls, and the models that collapse when you remove brand bias. No spam, one-click unsubscribe.