Joute Mode

The Arena

Two AIs battle on the same prompt. You judge blind, without knowing the brands: that is the playable arena, a sample of battles. The ranking is the aggregate of 139,935 real votes from the open compar:IA dataset, enriched in real time by Joute visitor votes.

Gemini: 8.7/10, €22/month

Claude: 9/10, €18/month

139,935

real votes aggregated

295

playable battles

100%

blind vote

The AI arena, without brand bias.

When you choose your AI model, brand gets in the way. ChatGPT, Claude, Gemini: the names carry weight. The arena breaks that bias. You read two answers to the same prompt, without knowing who wrote them. You vote for the best. Then we reveal.

The format was not invented by Joute. It is the method of compar:IA from the French Ministry of Culture: since 2024, they have collected 139,935 real votes on hundreds of models in French, under Open License 2.0. Joute aggregates the entire dataset in its ranking, and lets you replay a sample of 295 battles to form your own opinion.

Your Joute votes count. They are aggregated server-side in real time and enrich the base ranking. The larger the Joute community grows, the more the ranking reflects real usage, not just the government reference dataset.

100% real battles drawn from compar:IA, the AI arena of the French Ministry of Culture. Joute simulates nothing.

Creativity: Battle no. 1

The challenge

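fais moi un poéme sur le temps pourris du pas-de-calais (original French prompt, in English: "write me a poem about the rotten weather of the Pas-de-Calais")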

Contender A

"Le Temps Pourri du Pas-de-Calais"

Le ciel est lourd, comme un vieux drap mouillé,
Pesant sur les toits, sur les champs noyés.
Le vent siffle bas, un chant désolé,
Et la pluie en pleurs ronge les pavés.

Les falaises grises, usées par les ans,
Se dressent encore, mais pour combien de temps ?
Le sel de la mer, âpre et tenace,
Mord les murs, les cœurs, les rêves las.

Les corons noirs, sous la brume qui traîne,
Gardent l’écho des coups de grisou.
Le temps s’étire, lent, sans haine,
Comme un mineur qui n’a plus de feu.

Et nous, debout dans ce jour sans lumière,
On attend l’éclair, le soleil qui perce.
Mais le Pas-de-Calais, terre de misère,
Nous offre en partage… son temps qui pourrit.

- *Pour ceux qui savent que le vent du Nord*
*Emporte les jours, mais jamais l’espoir.*

Contender B

Le ciel est lourd, couleur de cendres,
Une brume épaisse étouffe les champs.
Le vent se lève, cruel et moqueur,
Il glace les os et mord le cœur.

La pluie dessine sur les carreaux
De longs serpents gris, lents et si faux.
Elle tambourine sur les toits,
Comme un spleen ancien, toujours là, sournois.

Les pavés luisants, noirs et tristes,
Reflettent l’heure qui persiste.
Le soleil n’est qu’un souvenir,
Une vieille image à l’avenir.

Mais dans ce deuil de gris et d’eau,
Sous ce manteau de froid halo,
Reste une force, âpre et tenace,
Celle du Nord, qui jamais ne passe.

Car sous ce temps qui semble maudit,
Le cœur du Nord jamais ne mentit.
Il brave l’orage, le crachin rasoir,
Et garde en lui un drôle d’espoir.

Alors laissons gronder la tempête,
Avec ses rafales de mal de tête.
Ici, le mauvais temps est roi,
Mais c’est notre chose, à nous, chez moi.

Give your verdict

The ranking

The arena ranking

139,935 votes

The ranking is built on the 139,935 real blind votes of the open compar:IA dataset (French Ministry of Culture). Each brand aggregates all of its tested models, from the smallest to the largest. No editorial score is factored in.

1. Gemini: 41% (20,744 / 50,177 battles)
2. DeepSeek: 39% (6,367 / 16,518 battles)
3. Claude: 35% (5,433 / 15,405 battles)
4. Grok: 35% (1,636 / 4,677 battles)
5. Le Chat: 34% (17,048 / 50,695 battles)
6. ChatGPT: 31% (11,259 / 36,509 battles)

The percentage is a win rate: the share of battles won out of all battles the brand played in the compar:IA dataset. The number of battles varies from one brand to another.
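To make the win-rate definition concrete, here is a minimal Python sketch that recomputes the percentages from the counts listed in the ranking. The counts are copied from this page; the names `RESULTS`, `win_rate`, and `ranking` are illustrative and not part of any Joute or compar:IA code.

```python
# (battles won, battles played) per brand, copied from the ranking on this page.
RESULTS = {
    "Gemini": (20_744, 50_177),
    "DeepSeek": (6_367, 16_518),
    "Claude": (5_433, 15_405),
    "Grok": (1_636, 4_677),
    "Le Chat": (17_048, 50_695),
    "ChatGPT": (11_259, 36_509),
}


def win_rate(won: int, played: int) -> float:
    """Share of battles won out of all battles played."""
    if played <= 0:
        raise ValueError("a brand needs at least one battle")
    return won / played


def ranking(results: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Brands sorted by win rate, best first."""
    rows = [(brand, win_rate(won, played)) for brand, (won, played) in results.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for rank, (brand, rate) in enumerate(ranking(RESULTS), start=1):
    print(f"{rank}. {brand}: {rate:.0%}")
```

Claude and Grok both round to 35%, so the order between them depends on the unrounded rates.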

How it works

Three steps, one minute per battle.

You read both answers

Same prompt, two AIs, identities hidden. You see A and B, not their names. No logo, no brand color. Just the text.

You vote for the best

A wins, B wins, tie, or both weak. No registration required, just a click. The vote is anonymous: we keep only a hash of your IP address and user-agent, and set no cookie.

We reveal, we aggregate

Names appear: you see whether your intuition matches. Your vote is added to the Joute ranking in real time.

Ranking methodology

How we build the arena ranking.

The model: Bradley-Terry, not a raw score

We do not simply add up wins. We use the Bradley-Terry statistical model, a standard for building rankings from pairwise comparisons (it is closely related to Elo ratings in chess and is used by LMSYS Chatbot Arena). It computes a latent strength for each model, such that the probability that A beats B is strength(A) / (strength(A) + strength(B)), fitted to the outcomes of past battles.
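As an illustration of how a Bradley-Terry fit works, here is a minimal Python sketch using the classic iterative (minorization-maximization) update. It is not Joute's actual code: the function names, the toy counts, and the model labels are hypothetical, and the sketch assumes every model has at least one win and one loss and ignores ties.

```python
def fit_bradley_terry(wins: dict[tuple[str, str], int], iterations: int = 200) -> dict[str, float]:
    """Estimate latent strengths; wins[(a, b)] is how often a beat b."""
    models = sorted({model for pair in wins for model in pair})
    strength = dict.fromkeys(models, 1.0)
    for _ in range(iterations):
        updated = {}
        for i in models:
            won = sum(n for (winner, _), n in wins.items() if winner == i)
            # Battles between i and j, weighted by the current strengths.
            denom = sum(
                (wins.get((i, j), 0) + wins.get((j, i), 0)) / (strength[i] + strength[j])
                for j in models
                if j != i
            )
            updated[i] = won / denom if denom else strength[i]
        # Strengths are only defined up to scale, so normalise their mean to 1.
        total = sum(updated.values())
        strength = {m: s * len(models) / total for m, s in updated.items()}
    return strength


def win_probability(strength: dict[str, float], a: str, b: str) -> float:
    """Bradley-Terry probability that a beats b."""
    return strength[a] / (strength[a] + strength[b])


# Hypothetical toy data: ten battles per pair of models.
toy_wins = {
    ("model_a", "model_b"): 7, ("model_b", "model_a"): 3,
    ("model_b", "model_c"): 6, ("model_c", "model_b"): 4,
    ("model_a", "model_c"): 8, ("model_c", "model_a"): 2,
}
strength = fit_bradley_terry(toy_wins)
for name in sorted(strength, key=strength.get, reverse=True):
    print(name, round(strength[name], 3))
```

Unlike a raw win count, the fitted strengths account for who each model faced: beating a strong opponent moves a model up more than beating a weak one.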

Two combined signals

compar:IA signal: the base ranking is sourced from the 139,935 real votes in the French Ministry of Culture dataset. This is the prior: a known strength for each model.

Joute signal: your votes and those of the Joute community are aggregated server-side (Vercel KV) and update the prior through Bayesian updating. The more votes accumulate, the more the Joute signal weighs against the initial compar:IA ranking.
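The page does not spell out the exact update rule. One simple way to picture a prior that fades as votes accumulate is a Beta-Binomial update on a single pairing, sketched below in Python. Everything here is an assumption made for illustration: the function names, the `prior_weight` of 50 pseudo-votes, and the choice to work on one pairing rather than on the full Bradley-Terry model.

```python
def blended_win_probability(
    prior_probability: float,
    joute_wins: int,
    joute_battles: int,
    prior_weight: float = 50.0,  # hypothetical: the prior counts as 50 pseudo-votes
) -> float:
    """Posterior mean of 'A beats B' after adding Joute votes to the prior."""
    if not 0.0 <= prior_probability <= 1.0:
        raise ValueError("prior_probability must be between 0 and 1")
    if not 0 <= joute_wins <= joute_battles:
        raise ValueError("joute_wins must be between 0 and joute_battles")
    return (prior_weight * prior_probability + joute_wins) / (prior_weight + joute_battles)


def joute_share(joute_battles: int, prior_weight: float = 50.0) -> float:
    """Fraction of the blended estimate that comes from Joute votes."""
    return joute_battles / (prior_weight + joute_battles)


# With no Joute votes the estimate is the prior; it drifts as votes arrive.
for battles in (0, 10, 50, 500):
    wins = round(0.8 * battles)  # hypothetical: A wins 80% of Joute battles
    estimate = blended_win_probability(0.6, wins, battles)
    print(battles, round(estimate, 3), round(joute_share(battles), 3))
```

In this sketch, `prior_weight` sets how many Joute votes it takes before the community signal counts as much as the compar:IA prior.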

Data freshness

The compar:IA dataset is re-synced monthly (first Monday of the month). Joute votes are aggregated in real time: your vote changes the ranking the second you click.
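For reference, the "first Monday of the month" rule can be computed with Python's standard `datetime` module. This is a small sketch; the function names `first_monday` and `next_resync` are illustrative and do not describe how Joute actually schedules its sync.

```python
from datetime import date, timedelta


def first_monday(year: int, month: int) -> date:
    """Date of the first Monday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Sunday is 6.
    offset = (7 - first.weekday()) % 7
    return first + timedelta(days=offset)


def next_resync(today: date) -> date:
    """First Monday on or after today, rolling over to next month if needed."""
    candidate = first_monday(today.year, today.month)
    if candidate >= today:
        return candidate
    if today.month == 12:
        return first_monday(today.year + 1, 1)
    return first_monday(today.year, today.month + 1)


print(next_resync(date(2024, 12, 15)))
```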

See the full Joute method →

Where the battles come from

100% real confrontations

Everything comes from compar:IA, the open dataset of the French Ministry of Culture: 139,935 real votes cast blind by French-speaking users, under Open License 2.0. The ranking is its aggregate, enriched in real time by Joute votes. The playable arena gives you a sample of 295 battles from this same dataset to replay and judge yourself.

FAQ

Everything we get asked about the arena.

What is the Joute AI Arena?

A blind test between two AI models on the same prompt. You read both answers without knowing who wrote them, you vote for the best one, then we reveal the names. It is the only format that measures perceived quality without brand bias.

Where do the battles and votes come from?

Battles are drawn from the open dataset compar:IA, the AI arena of the French Ministry of Culture, under Open License 2.0. The current ranking aggregates 139,935 real votes cast by French-speaking users. Your Joute votes are added to this signal in real time.

How is the ranking calculated?

Two signals are combined. The compar:IA signal gives the base ranking (Bradley-Terry model fitted on the 139,935 dataset votes) and serves as a Bayesian prior. Joute votes are aggregated server-side and update that prior: the more votes accumulate, the more the Joute signal weighs against the starting ranking.

Are my votes anonymous?

Yes. We only store a hash of IP + user-agent to limit spam (1 vote per battle per hash), no tracking cookie, no personal data. No account required, no email asked.
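A minimal Python sketch of the "one vote per battle per hash" rule follows. The details are assumptions made for illustration: SHA-256, the salt, the in-memory set, and the names `voter_hash` and `VoteStore` are not taken from Joute's code, which this page only describes as server-side aggregation.

```python
import hashlib

VALID_CHOICES = {"A", "B", "tie", "both_weak"}


def voter_hash(ip: str, user_agent: str, salt: str) -> str:
    """Hash of IP + user-agent; the raw values are never stored."""
    payload = f"{salt}|{ip}|{user_agent}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class VoteStore:
    """In-memory stand-in for a server-side store (hypothetical)."""

    def __init__(self, salt: str = "demo-salt") -> None:
        self._salt = salt
        self._seen: set[tuple[int, str]] = set()
        self.votes: list[tuple[int, str]] = []

    def cast(self, battle_id: int, choice: str, ip: str, user_agent: str) -> bool:
        """Record the vote; return False if this hash already voted on the battle."""
        if choice not in VALID_CHOICES:
            raise ValueError(f"unknown choice: {choice}")
        key = (battle_id, voter_hash(ip, user_agent, self._salt))
        if key in self._seen:
            return False
        self._seen.add(key)
        self.votes.append((battle_id, choice))
        return True


store = VoteStore()
print(store.cast(1, "A", "192.0.2.1", "ExampleBrowser/1.0"))  # first vote is accepted
print(store.cast(1, "B", "192.0.2.1", "ExampleBrowser/1.0"))  # same hash, same battle
```

The stored vote carries only the battle and the choice; the hash is used solely to spot repeat votes.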

Why an arena rather than a classic benchmark?

Benchmarks (MMLU, GPQA, etc.) measure what models know how to answer in multiple-choice tests. The arena measures what you PREFER to read, blind, on real everyday prompts. It is complementary, and it is what best predicts usage satisfaction at 6 months.

How often is the ranking updated?

The compar:IA pool is re-synced monthly. Joute votes are aggregated in real time: you can refresh the ranking after your vote and your signal is already integrated.

What's next

The ranking evolves every week. Don't miss it.

We send a monthly recap: who rises, who falls, and the models that collapse when you remove brand bias. No spam, one-click unsubscribe.

Subscribe to the monthly recap →

Compare 2 AIs in detail