A simulation study of game selection — why a GTO player never gets invited to the juiciest games, and the range that does
The largest edge in poker is not in the cards you are dealt but in the game you are dealt
into. The most profitable tables — private home games, invitation-only high-stakes lineups,
the apps where a whale is dropping buy-ins — are stocked with weak recreational players, and
a recognised game-theory-optimal (GTO) "crusher" is exactly the player such games never
invite: the crusher scares the loose money away and grinds down the regulars who host the
game. To keep a seat where the money is soft, a winner must look like a loser.
Sometimes the optimal play is to play wrong. We pose that idea as a precise question:
what is the most loose/aggressive-looking strategy that still reliably survives without
going bankrupt? We build a 32-bot, 4-table No-Limit Hold'em simulation whose promotion rule
deliberately rewards aggressive, suboptimal-looking play — a tractable proxy for the
real-world selection pressure that fills soft games with players who appear beatable — and
we run an evolutionary search over a six-parameter strategy space anchored to a documented
GTO baseline. The search converges on a clear and counter-intuitive profile: the survivable
"maniac" plays tighter than GTO before the flop, maximally aggressive after it (big
bets, relentless barrels), yet folds readily to resistance. It looks like spew but
never commits its stack badly. Measured against the GTO baseline, this range sits an
aggression-directed distance of +1.24 from equilibrium, and a heads-up solver
(b-inary/postflop-solver, used offline) confirms its postflop betting is roughly twice
as far from GTO as our near-equilibrium anchor. We report the full strategy, its distance
from GTO, a 2-D map of the survival landscape over the loose↔tight × passive↔aggressive
plane, per-position preflop range charts showing exactly which hands the survivor cuts, and a
worked hand illustrating the look-like-a-fish line that earns the seat.
The folk wisdom motivating this study is that playing optimally is not always the goal. A player who wants access to softer, more profitable tables may need to project an image of recklessness — to be selected because they appear suboptimal. But there is a hard constraint: appear too reckless and you bust; play perfectly and you are never let in the door. Somewhere between those failure modes lies a range that maximises apparent suboptimality subject to survival.
Poker is an interesting simulation where sometimes the optimal play is to play wrong.
The reason is economic, and it lives outside the hand. A player's long-run win-rate is the product of two things: edge per hand, and access to soft games. GTO maximises the first and quietly destroys the second. The people who control the profitable games — the hosts, the regulars, the agents seating an app — actively avoid dealing in a known crusher, because a visibly optimal player both beats them and frightens off the recreational money the game depends on. The player who looks like a gambling fish gets the text message; the player who looks unbeatable gets quietly dropped from the list. So the meta-game optimum can require visibly suboptimal play: you surrender a little EV inside each hand to buy a seat in games where the EV per hand is enormous. The catch is that the disguise must be paid for in a currency you can afford — it must not cost you your stack. This paper measures, precisely, how far toward "looking wrong" a player can travel before the bill comes due.
We operationalise "apparent suboptimality" as distance from a documented GTO baseline, directed toward the loose/aggressive corner of the classic player-type plane. We operationalise "survival" as never reaching a zero stack while grinding through a 32 → 16 → 8 elimination bracket whose promotion rule deliberately advances the most aggressive non-bankrupt players. We then let evolution find the frontier.
Contributions.
1. A faithful, reproducible NLHE tournament simulator (PokerKit engine, phevaluator) with
bots parametrised as deviations from a GTO chart baseline.
2. An evolutionary search that recovers the survivable-aggression range and quantifies its
distance from GTO three ways (directed score, parametric L2, preflop range divergence).
3. An offline bridge to the heads-up postflop-solver that validates how far the evolved
range's postflop play sits from true equilibrium.
4. A 2-D survival landscape over the loose↔tight × passive↔aggressive plane.
GTO and its tractable boundary. A game-theory-optimal strategy is a Nash equilibrium:
unexploitable, but not maximally profitable against weak opponents. Solvers compute it for
heads-up postflop spots. The tool specified for this study, b-inary/postflop-solver,
is exactly such an engine — and only that: it is heads-up-only, postflop-only, and
single-spot-only (~tens of seconds and ~1 GB RAM per solve). It cannot solve 8-handed
multiway pots, preflop play, or tournaments, and true multiway GTO is computationally
intractable (even Pluribus relied on abstraction and approximate self-play search, not exact equilibrium). We therefore
use the solver offline, as a heads-up GTO oracle, and run the live multiway tournament
on a fast game engine with a chart-anchored heuristic.
Player taxonomy. Poker players are classically placed on two axes — loose↔tight (how many hands they play) and passive↔aggressive (how often they bet/raise vs check/call) — giving four archetypes: TAG, LAG, nit (tight-passive), and calling station (loose-passive). Our genome is a continuous embedding of this plane plus a few postflop refinements.
The baseline is a set of documented 8-max raise-first-in (RFI) opening ranges, one per position, stored as PioSolver range strings. A strategy at the anchor plays exactly these charts preflop and a neutral, value-weighted heuristic postflop, and by definition has GTO distance zero. Baseline VPIP rises monotonically by position from UTG (13.7%) to BTN (40.9%) — the expected positional widening.
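As a rough illustration of how an opening frequency can be read off a stored range, the sketch below counts combos in a range string. It assumes the common comma-separated `hand[:weight]` form (for example `AKo:0.5`) and does not handle shorthand such as `22+`. The function names and the toy range are hypothetical and are not the study's charts or loader.

```python
"""Count combos in a comma-separated range string (illustrative sketch)."""

RANKS = "AKQJT98765432"
TOTAL_COMBOS = 1326  # C(52, 2) two-card starting hands


def combos(hand: str) -> int:
    """Return the number of card combinations a hand class represents."""
    if len(hand) == 2 and hand[0] == hand[1] and hand[0] in RANKS:
        return 6  # pocket pair: C(4, 2)
    if (len(hand) == 3 and hand[0] != hand[1]
            and hand[0] in RANKS and hand[1] in RANKS):
        if hand[2] == "s":
            return 4  # suited: one combo per suit
        if hand[2] == "o":
            return 12  # offsuit: 4 * 3 suit pairings
    raise ValueError(f"unrecognised hand class: {hand!r}")


def parse_range(text: str) -> dict:
    """Parse 'AA,AKs,AKo:0.5' into {hand: weight}; a missing weight means 1.0."""
    weights = {}
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        hand, _, weight = token.partition(":")
        weights[hand] = float(weight) if weight else 1.0
    return weights


def open_frequency(text: str) -> float:
    """Fraction of all 1326 starting combos that the range opens."""
    opened = sum(combos(hand) * w for hand, w in parse_range(text).items())
    return opened / TOTAL_COMBOS


# Made-up toy range, NOT one of the study's charts: 6 + 6 + 6 + 4 + 6 = 28 combos.
toy_range = "AA,KK,QQ,AKs,AKo:0.5"
print(f"{open_frequency(toy_range):.2%}")
```

Applied to each positional chart, a count like this is one way to obtain per-position frequencies of the kind quoted above, under the assumption that the quoted figure is the raise-first-in frequency.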
Each bot is six knobs, each a multiplier/threshold deviation from the anchor:
| Knob | Meaning | Anchor |
|---|---|---|
| pf_looseness | widen/tighten the opening range vs chart | 1.00 |
| pf_aggression | raise-vs-call share; 3-bet/4-bet frequency | 1.00 |
| postflop_aggression | bet/raise frequency (AFq) | 1.00 |
| barrel_bluff | multi-street barrel / bluff frequency | 1.00 |
| bet_size | bet sizing scale | 1.00 |
| stickiness | fold-to-aggression threshold (station ↔ nit) | 1.00 |
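One plausible way to hold the six knobs in code is a small frozen dataclass whose defaults are the anchor. This is a minimal sketch: the class name `Genome` and its helper methods are assumptions for illustration, not the repository's actual types.

```python
from dataclasses import dataclass, fields

ANCHOR = 1.0  # every knob equals 1.00 at the GTO-chart baseline


@dataclass(frozen=True)
class Genome:
    """Six multiplier/threshold knobs, each defaulting to the anchor value."""
    pf_looseness: float = ANCHOR
    pf_aggression: float = ANCHOR
    postflop_aggression: float = ANCHOR
    barrel_bluff: float = ANCHOR
    bet_size: float = ANCHOR
    stickiness: float = ANCHOR

    def deviation(self) -> dict:
        """Signed distance of each knob from the anchor."""
        return {f.name: getattr(self, f.name) - ANCHOR for f in fields(self)}

    def is_anchor(self) -> bool:
        """True when every knob sits exactly on the baseline."""
        return all(d == 0.0 for d in self.deviation().values())


baseline = Genome()
bigger_bets = Genome(bet_size=1.25)  # illustrative value only
print(baseline.is_anchor(), bigger_bets.deviation()["bet_size"])
```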
Hands are dealt and resolved by PokerKit (full multiway NLHE: side pots, all-ins,
showdowns, stack tracking) with phevaluator for fast hand ranking. A bot's decision is
chart-anchored preflop (range widened/narrowed by pf_looseness, raise/call split by
pf_aggression) and a hand-strength heuristic postflop (capped Monte-Carlo equity shaped by
the postflop knobs). The engine runs ~900 hands/second.
32 bots play a 32 → 16 → 8 bracket of 8-handed tables; chips carry across rounds. The promotion rule is intentionally perverse: advance the players "playing least optimally, favouring aggression" — but a player who ever hits a zero stack (bankrupt) can never advance. The final 8 play to a champion, who receives a +100 bb prize. This rule is our in-silico stand-in for invitation pressure: "keep advancing the aggressive players who haven't busted" is precisely the selector a soft game applies when it decides who to invite back.
The original specification — 1000 hands per table, 200 bb stacks, fixed blinds, advance the non-bankrupt bottom half (4 of 8) — turns out to be internally inconsistent. We measured it: at an 8-handed fixed-blind table, blinds alone cost a folding player ~185 bb of a 200 bb stack over 1000 hands, so essentially everyone busts. Empirically, 1000 hands leaves exactly one survivor per table (even 100 hands leaves ~1.9). Four tables therefore yield ~4 survivors, not the 16 the bracket needs.
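The blind-drain figure can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes 1/2 blinds mean a 0.5 bb small blind and a 1 bb big blind, no antes, and a player who folds every hand; the function name is hypothetical.

```python
def blind_drain_bb(hands: int, seats: int = 8,
                   small_blind_bb: float = 0.5,
                   big_blind_bb: float = 1.0) -> float:
    """Big blinds paid by a player who folds every hand at a full table."""
    orbits = hands / seats  # each orbit the player posts both blinds once
    return orbits * (small_blind_bb + big_blind_bb)


print(blind_drain_bb(1000))  # 125 orbits * 1.5 bb = 187.5
```

Under these assumptions the drain is 187.5 bb, close to the measured ~185 bb and nearly the whole 200 bb stack.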
Resolution (chosen after measuring): each qualifying table is run down to its 4 survivors (1000 hands becomes a cap), which fills the bracket exactly. Table-level selection is then pure survival; the "favouring aggression" pressure is relocated into the evolutionary fitness. This keeps every chosen parameter (200 bb, fixed blinds, bankruptcy = zero stack) and simply reinterprets "1000 hands" as a ceiling rather than a fixed count.
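The play-down rule can be sketched as a loop that stops at four survivors or at the hand cap, whichever comes first. Everything here other than the numbers 4 and 1000 is an assumption: `play_down` and `make_toy_hand` are hypothetical names, and the toy hand function is a random chip transfer standing in for the real engine.

```python
import random


def play_down(stacks, play_hand, survivors=4, hand_cap=1000):
    """Play hands until only `survivors` players remain or the cap is reached."""
    alive = {p: s for p, s in stacks.items() if s > 0}
    hands = 0
    while len(alive) > survivors and hands < hand_cap:
        # A player whose stack reaches zero is bankrupt and leaves the table.
        alive = {p: s for p, s in play_hand(alive).items() if s > 0}
        hands += 1
    return alive, hands


def make_toy_hand(rng, max_pot=40.0):
    """Hypothetical stand-in for the engine: one random player pays another."""
    def play_hand(stacks):
        loser, winner = rng.sample(sorted(stacks), 2)
        amount = min(stacks[loser], rng.uniform(0.0, max_pot))
        updated = dict(stacks)
        updated[loser] -= amount
        updated[winner] += amount
        return updated
    return play_hand


table = {f"bot{i}": 200.0 for i in range(8)}
finalists, hands_played = play_down(table, make_toy_hand(random.Random(1234)))
print(len(finalists), hands_played)
```

If the cap is reached with more than four players left, this sketch simply returns them all; how the real simulator breaks that tie is not stated in this section.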
One generation is the 32 genomes playing a full tournament, replicated 12× (re-seeded) to control variance. Fitness rewards aggression gated by survival:
fitness = aggression_score · P(reach final table) + prize_weight · P(champion)
A nit (negative aggression) and a reckless maniac (near-zero survival) both score low; the optimum is the most aggressive range that still survives. Elites are kept; the rest are bred by crossover and Gaussian mutation. We run 30 generations, population 32, with multiprocessing across replications.
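A minimal sketch of the fitness formula and one breeding step follows. The formula and σ = 0.12 come from the text; the default `prize_weight`, the elite count, uniform crossover, drawing parents from the elites, and the floor at 0.0 are all assumptions made for illustration.

```python
import random

KNOBS = ("pf_looseness", "pf_aggression", "postflop_aggression",
         "barrel_bluff", "bet_size", "stickiness")


def fitness(aggression_score, p_final_table, p_champion, prize_weight=1.0):
    """Aggression gated by survival, plus a champion bonus."""
    return aggression_score * p_final_table + prize_weight * p_champion


def crossover(mum, dad, rng):
    """Uniform crossover: each knob is copied from one parent at random."""
    return {k: rng.choice((mum[k], dad[k])) for k in KNOBS}


def mutate(genome, rng, sigma=0.12, floor=0.0):
    """Add Gaussian noise to every knob; the floor at 0.0 is an assumption."""
    return {k: max(floor, genome[k] + rng.gauss(0.0, sigma)) for k in KNOBS}


def next_generation(population, scores, rng, n_elite=4):
    """Keep the top `n_elite` genomes unchanged and breed the rest from them."""
    ranked = [g for _, g in sorted(zip(scores, population),
                                   key=lambda pair: pair[0], reverse=True)]
    elites = ranked[:n_elite]
    children = []
    while len(elites) + len(children) < len(population):
        mum, dad = rng.sample(elites, 2)
        children.append(mutate(crossover(mum, dad, rng), rng))
    return elites + children
```

With this shape, a genome with a negative aggression score earns negative fitness however often it survives, and a genome that never reaches the final table earns nothing from its aggression, which matches the two failure modes described above.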
A small Rust binary wraps postflop-solver. Run offline, it solves representative heads-up
flop spots; we use the output to (a) sanity-check the anchor's postflop heuristic and
(b) report the solver-validated postflop distance of the evolved range. This is the one
place true GTO is available, and it never touches the live multiway loop.
Over 30 generations the population mean aggression rises roughly 2.5-fold (from +0.51 to ~+1.31) as selection rewards looser/more-aggressive deviations, while the best genomes hold a high final-table reach. The search is not drifting toward recklessness for its own sake: it stops where additional aggression would start costing survival. That plateau is the frontier.
The converged optimum is a specific, coherent profile:
| Knob | Evolved | Deviation | Reading |
|---|---|---|---|
| pf_looseness | 0.77 | −0.23 | tighter than GTO preflop |
| pf_aggression | 1.40 | +0.40 | raises more when it does play |
| postflop_aggression | 2.17 | +1.17 | bets/raises constantly |
| barrel_bluff | 2.40 | +1.40 | fires multiple streets |
| bet_size | 1.60 | +0.60 | big bets |
| stickiness | 0.50 | −0.50 | folds easily when raised |
In plain terms: bet big and barrel relentlessly so you look like a maniac, but stay tight before the flop and fold the moment someone fights back. It is aggression that never commits the stack — precisely the behaviour that looks suboptimal yet dodges bankruptcy.
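For readers who want to reproduce a distance figure from the table, the sketch below computes a plain, unweighted Euclidean (L2) distance between the evolved knobs and the anchor. This is an assumption about the metric: the study's own parametric L2 may be weighted or normalised differently, and the +1.24 directed score is a separate quantity that this code does not attempt to reproduce.

```python
import math

# Evolved knob values copied from the table above.
EVOLVED = {
    "pf_looseness": 0.77,
    "pf_aggression": 1.40,
    "postflop_aggression": 2.17,
    "barrel_bluff": 2.40,
    "bet_size": 1.60,
    "stickiness": 0.50,
}


def l2_distance(genome: dict, anchor: float = 1.0) -> float:
    """Unweighted Euclidean distance of the knob vector from the anchor."""
    return math.sqrt(sum((value - anchor) ** 2 for value in genome.values()))


print(f"{l2_distance(EVOLVED):.2f}")  # sqrt(4.1518), about 2.04
```

The two postflop knobs (`postflop_aggression` and `barrel_bluff`) contribute most of the squared distance, which is consistent with the reading that the deviation is concentrated after the flop.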
Counter to the "loose maniac" image, the evolved range opens fewer hands than the GTO baseline at every position (e.g. BTN 32% vs 41%). Fewer marginal preflop spots means fewer tough, stack-threatening decisions downstream — the discipline that pays for the postflop aggression. Summed over positions, the evolved range removes 658 hand-combos relative to baseline and adds none.
The charts below make the change concrete for three positions. Each cell is one of the 169 starting hands (pairs on the diagonal, suited combos upper-right, offsuit lower-left). Green hands the survivor still opens; red hands are in the GTO chart but the survivor folds them; grey hands are opened by neither.
Look at which hands go. They are almost entirely the speculative, gamble-looking holdings: small pairs (22–44), suited connectors and gappers (98s, T9s, 76s, 65s, 54s), the weakest suited aces (A3s, A4s), and, on the button, loose offsuit hands (98o, T9o, J9o). The value core (big pairs, broadways, strong suited aces) is kept intact. This is the counter-intuitive heart of the result: in its starting hands, where its discipline is invisible, the survivor is actually more solid than GTO. The fish image is manufactured later, postflop, where opponents can see it. Cutting exactly the hands that produce hard, marginal, stack-threatening postflop decisions is what frees the strategy to bet and barrel recklessly with the hands it does play. The disguise is loud and cheap (postflop bluffs you can fold); the bankroll discipline is quiet and preflop.
| Hands cut from the GTO open | Examples | Why the survivor folds them preflop |
|---|---|---|
| Small pocket pairs | 22 33 44 | Pure set-mines: they need deep stacks and callers to pay off, and otherwise flop a lone under-pair you can't fold cheaply. High-variance gambles in a fold-or-bust world. |
| Suited connectors & gappers | 98s T9s 76s 65s 54s 87s | They look loose and fun, but flop draws and second-best hands that are hard to release, exactly the sticky, stack-threatening spots the survivor refuses to enter. The image is far cheaper to buy with a postflop barrel. |
| Weak suited aces & Broadways | A3s A4s Q8s K7s J8s T8s | Dominated speculative hands that make weak top pairs and second-best flushes/kickers: the textbook reverse-implied-odds trap. |
| Loose offsuit (button) | 98o T9o J9o | No suit equity to fall back on; positional steals GTO can afford but that turn into no-pair, no-draw trouble the moment they get called. |
One hand makes the whole strategy legible. Blinds 1/2, 200 bb deep; the evolved genome is on the button with A♠5♠ — a hand it keeps (a suited ace) and one that reads as a loose gamble to the table.
Score the hand from two seats. From the table's view, Hero raised light, double-barrelled
big with nothing, and is plainly a spewy gambler — exactly the profile that earns the next
invitation and, under our promotion rule, banks maximum aggression. From the bankroll's
view, Hero risked 26.5 bb of a 200 bb stack and let it go the moment chips were truly
threatened: the stack is still 173.5 bb, never within sight of bankruptcy. That is the whole thesis
in one pot: maximal visible initiative, minimal real commitment. Now replay it with one of
the cut holdings — say 76s — which flops a pair-plus-draw and cannot fold to the river
lead: the line that was free with A5s becomes a stack-off. The survivor never reaches that
spot, because it folded 76s before the flop.
Using postflop-solver offline, we compare OOP flop bet frequencies on four textures. True
GTO bets sparingly in this role (15–28%); our near-equilibrium anchor sits a mean 0.20
away, while the evolved optimum fires far more often — a mean 0.44, roughly twice
as far from GTO as the anchor. This is the solver-grounded confirmation that the evolved
range's postflop play is genuinely, measurably suboptimal-looking.
Plotting every genome of the final generation by aggression (x) against final-table reach (y) reveals the trade-off surface the search has populated. The selected optimum (star) sits high on aggression while retaining strong reach — on the survivable edge of the cloud, not beyond it.
To map the whole spectrum, we sweep a probe strategy across the looseness × postflop-aggression plane (other knobs at the anchor) and measure how often it reaches the final table inside a fixed, diverse 31-bot field. The landscape is strikingly anisotropic: survival is governed almost entirely by the preflop axis. The tighter-than-GTO left band stays green (≈40–62% reach) across the entire range of postflop aggression, while the looser-than-GTO right half collapses to red (<15%). In other words, postflop aggression is nearly free so long as the preflop range stays tight — you can crank betting and barrelling to the maximum without paying in survival, but widening your opening range is punished sharply by bankruptcy.
Black contours are lines of equal aggression-directed distance from GTO (0.0 at the anchor, rising toward the top-right). The evolved optimum (star) exploits the asymmetry exactly: it sits in the top-left — maximal postflop aggression pressed right up against the edge of the green, survivable, tight-preflop band. The single highest-survival cell is the bottom-left (a tight-passive nit, ≈62% reach) but it carries a negative aggression score; the search knowingly trades a little of that survival for a large gain in apparent aggression, which is what the fitness (aggression × survival) rewards.
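The sweep described above amounts to evaluating a probe on a grid and averaging final-table reach over repeated runs. The sketch below shows that structure only. The grid size and repetition count mirror the `--grid 8 --reps 24` flags, while the knob range of 0.5 to 2.5, the function names, and `toy_reach` are assumptions. `toy_reach` is a synthetic placeholder, not simulator output.

```python
import random


def sweep_plane(reach_once, lo=0.5, hi=2.5, grid=8, reps=24, seed=1234):
    """Return the axis values and a grid of mean final-table reach.

    Rows index postflop aggression, columns index preflop looseness.
    """
    step = (hi - lo) / (grid - 1)
    axis = [lo + i * step for i in range(grid)]
    rng = random.Random(seed)
    rows = []
    for aggression in axis:
        row = []
        for looseness in axis:
            hits = sum(reach_once(looseness, aggression, rng)
                       for _ in range(reps))
            row.append(hits / reps)  # fraction of runs reaching the final table
        rows.append(row)
    return axis, rows


def toy_reach(looseness, aggression, rng):
    """Synthetic placeholder, NOT the simulator: looser probes bust more often."""
    return rng.random() < max(0.0, 1.0 - looseness / 2.5)


axis, reach = sweep_plane(toy_reach)
print(len(reach), len(reach[0]))
```

In the real pipeline `reach_once` would run one re-seeded tournament with the probe seated in the fixed 31-bot field and report whether it reached the final table.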
Return to the motivating question — how do you get invited to the soft game without going broke once you are in it? The strategy answers both halves at once, and they turn out to be the same behaviour. The trait that wins the invite is visible recklessness; the trait that keeps the stack is invisible discipline; and the survivor achieves both by routing all of its recklessness into the channel opponents watch (postflop bets and barrels) while keeping its discipline in the channels they don't (which hands it opens, and whether it commits when raised). You are loud where you are seen and quiet where you are not. That is why the optimal play is, deliberately, to play wrong: the EV you surrender by spewing postflop is the price of admission to a game whose EV per hand dwarfs it, and the price is structured so it can never cost you the stack.
The result is intuitive in hindsight and sharper than the folk wisdom. "Play loose and aggressive to look like a fish" is only half right: the durable disguise is selective aggression — concentrate the spew into low-commitment actions (betting, barrelling) where you can always fold, and stay disciplined exactly where mistakes are expensive (loose preflop calls, calling down light). The image is built postflop with bets and bluffs; the bankroll is protected preflop and at the moment of confrontation.
This separates two things usually conflated under "aggression": initiative (betting, which our optimum maximises) and commitment (calling/stacking off, which it minimises via low stickiness and tighter ranges). Survival rewards maximal initiative and minimal commitment simultaneously — a combination that reads as "maniac" at the table but behaves like a disciplined nit when the chips threaten to go in.
The most important decision in poker is which game you sit in, and the surest way to be shut out of the profitable ones is to play them perfectly. Optimal play, paradoxically, can be to play visibly wrong — to buy a seat among weaker players by looking like one of them. Under a rule that rewards aggressive, suboptimal-looking play but punishes bankruptcy — our proxy for that invitation pressure — the surviving strategy is not a maniac and not a nit but a precise hybrid: tight preflop, hyper-aggressive postflop, quick to fold under fire. It maximises how suboptimal it looks (an aggression-directed distance of +1.24 from GTO, ~2× the anchor's postflop deviation) while keeping its stack out of harm's way. The method — a GTO-anchored genome, an evolutionary search, and an offline heads-up solver for ground truth — recovers this frontier cleanly and quantifies exactly how far from equilibrium a player can drift and still reach the final table.

    pip install -e .
    python scripts/run_evolution.py --out results/evolution_moderate.json   # ~2 min
    python scripts/sweep_survival.py --grid 8 --reps 24                     # survival landscape
    python scripts/render_results.py results/evolution_moderate.json        # range + distances
    python scripts/make_figures.py                                          # all figures
    python -m pytest -q                                                     # 19 tests
    # offline heads-up solver (Rust GNU toolchain):
    cargo build --release --manifest-path solver/Cargo.toml
    python scripts/build_solver_reference.py
    python scripts/calibrate_baseline.py --genome results/evolution_moderate.json

Configuration. 32 bots; 8-handed tables; 1/2 blinds; 200 bb stacks; bracket 32→16→8, play down to 4/table (1000-hand cap); bankruptcy = zero stack; +100 bb champion prize. Evolution: population 32, 30 generations, 12 replications, Gaussian mutation σ=0.12, seed 1234.