Survivable Suboptimality

A simulation study of game selection — why a GTO player never gets invited to the juiciest games, and the range that does

Abstract

The largest edge in poker is not in the cards you are dealt but in the game you are dealt into. The most profitable tables — private home games, invitation-only high-stakes lineups, the apps where a whale is dropping buy-ins — are stocked with weak recreational players, and a recognised game-theory-optimal (GTO) "crusher" is exactly the player such games never invite: the crusher scares the loose money away and grinds down the regulars who host the game. To keep a seat where the money is soft, a winner must look like a loser. Sometimes the optimal play is to play wrong. We ask a precise version of this question: what is the most loose/aggressive-looking strategy that still reliably survives without going bankrupt? We build a 32-bot, 4-table No-Limit Hold'em simulation whose promotion rule deliberately rewards aggressive, suboptimal-looking play — a tractable proxy for the real-world selection pressure that fills soft games with players who appear beatable — and we run an evolutionary search over a six-parameter strategy space anchored to a documented GTO baseline. The search converges on a clear and counter-intuitive profile: the survivable "maniac" plays tighter than GTO before the flop, maximally aggressive after it (big bets, relentless barrels), yet folds readily to resistance. It looks like spew but never commits its stack badly. Measured against the GTO baseline, this range sits an aggression-directed distance of +1.24 from equilibrium, and a heads-up solver (b-inary/postflop-solver, used offline) confirms its postflop betting is roughly twice as far from GTO as our near-equilibrium anchor. We report the full strategy, its distance from GTO, a 2-D map of the survival landscape over the loose↔tight × passive↔aggressive plane, per-position preflop range charts showing exactly which hands the survivor cuts, and a worked hand illustrating the look-like-a-fish line that earns the seat.

1. Introduction

The folk wisdom motivating this study is that playing optimally is not always the goal. A player who wants access to softer, more profitable tables may need to project an image of recklessness — to be selected because they appear suboptimal. But there is a hard constraint: appear too reckless and you bust; play perfectly and you are never let in the door. Somewhere between those failure modes lies a range that maximises apparent suboptimality subject to survival.

The reason is economic, and it lives outside the hand. A player's long-run win-rate is the product of two things: edge per hand, and access to soft games. GTO maximises the first and quietly destroys the second. The people who control the profitable games — the hosts, the regulars, the agents seating an app — actively avoid dealing in a known crusher, because a visibly optimal player both beats them and frightens off the recreational money the game depends on. The player who looks like a gambling fish gets the text message; the player who looks unbeatable gets quietly dropped from the list. So the meta-game optimum can require visibly suboptimal play: you surrender a little EV inside each hand to buy a seat in games where the EV per hand is enormous. The catch is that the disguise must be paid for in a currency you can afford — it must not cost you your stack. This paper measures, precisely, how far toward "looking wrong" a player can travel before the bill comes due.

We operationalise "apparent suboptimality" as distance from a documented GTO baseline, directed toward the loose/aggressive corner of the classic player-type plane. We operationalise "survival" as never reaching a zero stack while grinding through a 32 → 16 → 8 elimination bracket whose promotion rule deliberately advances the most aggressive non-bankrupt players. We then let evolution find the frontier.

Contributions. 1. A faithful, reproducible NLHE tournament simulator (PokerKit engine, phevaluator) with bots parametrised as deviations from a GTO chart baseline. 2. An evolutionary search that recovers the survivable-aggression range and quantifies its distance from GTO three ways (directed score, parametric L2, preflop range divergence). 3. An offline bridge to the heads-up postflop-solver that validates how far the evolved range's postflop play sits from true equilibrium. 4. A 2-D survival landscape over the loose↔tight × passive↔aggressive plane.

2. Background

GTO and its tractable boundary. A game-theory-optimal strategy is a Nash equilibrium: unexploitable, but not maximally profitable against weak opponents. Solvers compute it for heads-up postflop spots. The tool specified for this study, b-inary/postflop-solver, is exactly such an engine — and only that: it is heads-up-only, postflop-only, and single-spot-only (~tens of seconds and ~1 GB RAM per solve). It cannot solve 8-handed multiway pots, preflop play, or tournaments, and true multiway GTO is computationally intractable (even Pluribus used neural approximation, not exact equilibrium). We therefore use the solver offline, as a heads-up GTO oracle, and run the live multiway tournament on a fast game engine with a chart-anchored heuristic.

Player taxonomy. Poker players are classically placed on two axes — loose↔tight (how many hands they play) and passive↔aggressive (how often they bet/raise vs check/call) — giving four archetypes: TAG, LAG, nit (tight-passive), and calling station (loose-passive). Our genome is a continuous embedding of this plane plus a few postflop refinements.

3. Methods

3.1 GTO baseline (the measurement anchor)

The baseline is a set of documented 8-max raise-first-in (RFI) opening ranges, one per position, stored as PioSolver range strings. A strategy at the anchor plays exactly these charts preflop and a neutral, value-weighted heuristic postflop, and by definition has GTO distance zero. Baseline VPIP rises monotonically by position from UTG (13.7%) to BTN (40.9%) — the expected positional widening.
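The positional open percentages quoted above are combo-weighted shares of the 1326 possible starting hands. A minimal sketch of that bookkeeping follows, assuming a simplified range-string syntax of comma-separated hand classes with an optional `:weight` suffix. The helper names (`combos`, `parse_range`, `open_frequency`) and the toy range are illustrative and are not taken from the study's code or charts.

```python
from typing import Dict


def combos(hand: str) -> int:
    """Card combinations in a hand class such as 'AA', 'AKs' or 'AKo'."""
    if len(hand) == 2:                      # pocket pair, e.g. 'AA'
        return 6
    return 4 if hand.endswith("s") else 12  # suited vs offsuit


def parse_range(range_string: str) -> Dict[str, float]:
    """Parse 'AA,KK,AKo:0.5' into {hand: weight}; weight defaults to 1.0."""
    weights: Dict[str, float] = {}
    for token in range_string.split(","):
        token = token.strip()
        if not token:
            continue
        hand, _, weight = token.partition(":")
        weights[hand] = float(weight) if weight else 1.0
    return weights


def open_frequency(range_string: str) -> float:
    """Fraction of the 1326 starting combos that the range opens."""
    weights = parse_range(range_string)
    return sum(combos(h) * w for h, w in weights.items()) / 1326


# Hypothetical toy range, NOT one of the paper's baseline charts.
toy = "AA,KK,QQ,AKs,AKo:0.5"
print(f"{open_frequency(toy):.3%}")
```

Applying the same function to each position's chart would yield the per-position percentages; the real charts may use range syntax this sketch does not handle.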

3.2 The strategy genome

The genome has six knobs, each anchored at 1.00 (the GTO baseline).

Knob	Meaning	Anchor
`pf_looseness`	widen/tighten the opening range vs chart	1.00
`pf_aggression`	raise-vs-call share; 3-bet/4-bet frequency	1.00
`postflop_aggression`	bet/raise frequency (AFq)	1.00
`barrel_bluff`	multi-street barrel / bluff frequency	1.00
`bet_size`	bet sizing scale	1.00
`stickiness`	fold-to-aggression threshold (station ↔ nit)	1.00

3.3 GTO-distance metrics

Distance from the anchor is reported three ways: a directed score (toward the loose/aggressive corner of the player-type plane), a parametric L2 distance over the knobs, and preflop range divergence.

3.4 Game engine and bots

Hands are dealt and resolved by PokerKit (full multiway NLHE: side pots, all-ins, showdowns, stack tracking) with phevaluator for fast hand ranking. A bot's decision is chart-anchored preflop (range widened/narrowed by pf_looseness, raise/call split by pf_aggression) and a hand-strength heuristic postflop (capped Monte-Carlo equity shaped by the postflop knobs). The engine runs ~900 hands/second.
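The six knobs can be carried around as a small immutable record. The sketch below is one way to do that, assuming every knob is a plain float anchored at 1.00 as in the knob table. The class name `Genome` and the `deviation` helper are illustrative names, not the study's actual code.

```python
from dataclasses import dataclass, fields
from typing import Dict

ANCHOR = 1.0  # every knob sits at 1.00 at the GTO anchor


@dataclass(frozen=True)
class Genome:
    """Six-knob strategy genome; defaults reproduce the anchor."""
    pf_looseness: float = ANCHOR
    pf_aggression: float = ANCHOR
    postflop_aggression: float = ANCHOR
    barrel_bluff: float = ANCHOR
    bet_size: float = ANCHOR
    stickiness: float = ANCHOR

    def deviation(self) -> Dict[str, float]:
        """Per-knob offset from the anchor value."""
        return {f.name: getattr(self, f.name) - ANCHOR for f in fields(self)}


# The anchor deviates from itself by zero on every knob.
anchor = Genome()
print(anchor.deviation())

# A hypothetical genome that folds more readily than the anchor.
print(Genome(stickiness=0.5).deviation()["stickiness"])
```

Freezing the dataclass keeps a genome from being modified by accident while a tournament is running; mutation then has to build a new instance.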

3.5 Tournament and the perverse promotion rule

32 bots play a 32 → 16 → 8 bracket of 8-handed tables; chips carry across rounds. The promotion rule is intentionally perverse: advance the players "playing least optimally, favouring aggression" — but a player who ever hits a zero stack (bankrupt) can never advance. The final 8 play to a champion, who receives a +100 bb prize. This rule is our in-silico stand-in for invitation pressure: "keep advancing the aggressive players who haven't busted" is precisely the selector a soft game applies when it decides who to invite back.
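The promotion rule as stated in this section can be sketched as a filter followed by a sort: drop anyone who ever went bankrupt, then advance the most aggressive of those left. The `PlayerResult` record, its `aggression` field and the `promote` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlayerResult:
    name: str
    aggression: float    # hypothetical aggression score; higher = more aggressive
    ever_bankrupt: bool  # True if the stack ever reached zero


def promote(results: List[PlayerResult], n_advance: int) -> List[str]:
    """Advance the most aggressive players among those who never busted."""
    eligible = [r for r in results if not r.ever_bankrupt]
    ranked = sorted(eligible, key=lambda r: r.aggression, reverse=True)
    return [r.name for r in ranked[:n_advance]]


# Toy table: the most aggressive player busted, so it cannot advance.
table = [
    PlayerResult("maniac", 2.0, True),
    PlayerResult("lag", 1.5, False),
    PlayerResult("nit", 0.2, False),
    PlayerResult("tag", 1.0, False),
]
print(promote(table, 2))
```

Note that fewer than `n_advance` names come back when too few players survive, which is the situation Section 3.6 runs into.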

3.6 A discovery that reshaped the design

The original specification — 1000 hands per table, 200 bb stacks, fixed blinds, advance the non-bankrupt bottom half (4 of 8) — turns out to be internally inconsistent. We measured it: at an 8-handed fixed-blind table, blinds alone cost a folding player ~185 bb of a 200 bb stack over 1000 hands, so essentially everyone busts. Empirically, 1000 hands leaves exactly one survivor per table (even 100 hands leaves ~1.9). Four tables therefore yield ~4 survivors, not the 16 the bracket needs.
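A back-of-envelope check of the blind cost, assuming a player who folds every hand, pays one small blind (0.5 bb) and one big blind (1 bb) per orbit, and never wins a pot. The function name is illustrative.

```python
def blind_cost_bb(hands: int, seats: int,
                  small_blind_bb: float = 0.5,
                  big_blind_bb: float = 1.0) -> float:
    """Big blinds paid by a player who folds every hand."""
    orbits = hands / seats  # each orbit costs one SB plus one BB
    return orbits * (small_blind_bb + big_blind_bb)


print(blind_cost_bb(1000, 8))  # 187.5
```

The idealised figure of 187.5 bb is in line with the measured ~185 bb, and leaves a pure folder with under a tenth of the 200 bb starting stack.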

Resolution (chosen after measuring): each qualifying table is run down to its 4 survivors (1000 hands becomes a cap), which fills the bracket exactly. Table-level selection is then pure survival; the "favouring aggression" pressure is relocated into the evolutionary fitness. This keeps every chosen parameter (200 bb, fixed blinds, bankruptcy = zero stack) and simply reinterprets "1000 hands" as a ceiling rather than a fixed count.

3.7 Evolutionary search

One generation is the 32 genomes playing a full tournament, replicated 12× (re-seeded) to control variance. Fitness rewards aggression gated by survival:

fitness = aggression × survival

A nit (negative aggression) and a reckless maniac (near-zero survival) both score low; the optimum is the most aggressive range that still survives. Elites are kept; the rest are bred by crossover and Gaussian mutation. We run 30 generations, population 32, with multiprocessing across replications.
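One generation step of the loop described above might look like the following sketch. The product fitness, σ = 0.12 and seed 1234 come from the paper; the elite count, uniform crossover, drawing parents from the elites only, and clamping knobs at zero are assumptions made to keep the example short.

```python
import random
from typing import List

Genome = List[float]  # six knob values, anchor = 1.0


def fitness(aggression: float, survival: float) -> float:
    """Aggression gated by survival: zero survival cancels any aggression."""
    return aggression * survival


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Uniform crossover: each knob is copied from one parent at random."""
    return [rng.choice(pair) for pair in zip(a, b)]


def mutate(genome: Genome, rng: random.Random, sigma: float = 0.12) -> Genome:
    """Add Gaussian noise to every knob, clamped at zero (an assumption)."""
    return [max(0.0, g + rng.gauss(0.0, sigma)) for g in genome]


def next_generation(population: List[Genome], scores: List[float],
                    rng: random.Random, n_elite: int = 4) -> List[Genome]:
    """Keep the top n_elite genomes unchanged and breed the rest from them."""
    ranked = [g for _, g in sorted(zip(scores, population),
                                   key=lambda pair: pair[0], reverse=True)]
    elites = ranked[:n_elite]
    children: List[Genome] = []
    while len(elites) + len(children) < len(population):
        mum, dad = rng.sample(elites, 2)
        children.append(mutate(crossover(mum, dad, rng), rng))
    return elites + children


# Toy usage: 32 anchor genomes with placeholder scores.
rng = random.Random(1234)
population = [[1.0] * 6 for _ in range(32)]
scores = [fitness(aggression=i / 32, survival=0.5) for i in range(32)]
population = next_generation(population, scores, rng)
print(len(population))
```

Sorting on the score alone avoids comparing genomes when two scores tie. In the study each score would come from the 12 re-seeded tournament replications rather than from the placeholder values used here.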

3.8 Heads-up solver bridge

A small Rust binary wraps postflop-solver. Run offline, it solves representative heads-up flop spots; we use the output to (a) sanity-check the anchor's postflop heuristic and (b) report the solver-validated postflop distance of the evolved range. This is the one place true GTO is available, and it never touches the live multiway loop.

4. Results

4.1 The search climbs the aggression axis while staying survivable

Over 30 generations the population mean aggression rises roughly 2.5-fold (from +0.51 to ~+1.31) as selection rewards looser/more-aggressive deviations, while the best genomes hold a high final-table reach. The search is not drifting toward recklessness for its own sake: it stops where additional aggression would start costing survival. That plateau is the frontier.

4.2 The evolved range: disciplined preflop, hyper-aggressive postflop

Knob	Evolved	Deviation	Reading
`pf_looseness`	0.77	−0.23	tighter than GTO preflop
`pf_aggression`	1.40	+0.40	raises more when it does play
`postflop_aggression`	2.17	+1.17	bets/raises constantly
`barrel_bluff`	2.40	+1.40	fires multiple streets
`bet_size`	1.60	+0.60	big bets
`stickiness`	0.50	−0.50	folds easily when raised

In plain terms: bet big and barrel relentlessly so you look like a maniac, but stay tight before the flop and fold the moment someone fights back. It is aggression that never commits the stack — precisely the behaviour that looks suboptimal yet dodges bankruptcy.
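The Deviation column is simply each evolved knob minus the anchor value of 1.00. The sketch below recomputes it and, as an illustration, takes the unweighted Euclidean norm of the deviation vector. Treating the parametric L2 distance as an unweighted norm is an assumption; the resulting number is not a figure reported by the paper and is distinct from the directed score of +1.24, whose definition is not reproduced here.

```python
import math

ANCHOR = 1.0

# Evolved knob values, copied from the table above.
EVOLVED = {
    "pf_looseness": 0.77,
    "pf_aggression": 1.40,
    "postflop_aggression": 2.17,
    "barrel_bluff": 2.40,
    "bet_size": 1.60,
    "stickiness": 0.50,
}

# Per-knob deviation from the anchor.
deviation = {knob: value - ANCHOR for knob, value in EVOLVED.items()}

# Unweighted Euclidean norm of the deviation vector (assumed definition).
l2 = math.sqrt(sum(d * d for d in deviation.values()))

for knob, d in deviation.items():
    print(f"{knob:20s} {d:+.2f}")
print(f"unweighted L2 = {l2:.2f}")  # 2.04
```

The two postflop knobs, `postflop_aggression` and `barrel_bluff`, contribute most of that norm, which matches the reading that the deviation is concentrated after the flop.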

4.3 Preflop: the survivor is tighter than GTO

Counter to the "loose maniac" image, the evolved range opens fewer hands than the GTO baseline at every position (e.g. BTN 32% vs 41%). Fewer marginal preflop spots means fewer tough, stack-threatening decisions downstream — the discipline that pays for the postflop aggression. Summed over positions, the evolved range removes 658 hand-combos relative to baseline and adds none.

The charts below make the change concrete for three positions. Each cell is one of the 169 starting hands (pairs on the diagonal, suited combos upper-right, offsuit lower-left). Green hands the survivor still opens; red hands are in the GTO chart but the survivor folds them; grey hands are opened by neither.

Figure: preflop range charts (13×13 hand grids) for three positions.

UTG: GTO 13.7% → survivor 10.9%

CO: GTO 26.1% → survivor 20.4%

BTN: GTO 40.9% → survivor 32.1%

Suited hands upper-right, pairs on the diagonal, offsuit lower-left; 169 starting hands per grid.
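The chart layout described above can be generated mechanically. This sketch builds the 13×13 grid of hand labels with pairs on the diagonal, suited hands above it and offsuit hands below it. The function name `hand_grid` is illustrative.

```python
from typing import List

RANKS = "AKQJT98765432"  # highest to lowest


def hand_grid() -> List[List[str]]:
    """13x13 grid of starting-hand labels in the standard chart layout."""
    grid = []
    for i, row_rank in enumerate(RANKS):
        row = []
        for j, col_rank in enumerate(RANKS):
            if i == j:
                row.append(row_rank + col_rank)        # pair on the diagonal
            elif i < j:
                row.append(row_rank + col_rank + "s")  # suited, upper-right
            else:
                row.append(col_rank + row_rank + "o")  # offsuit, lower-left
        grid.append(row)
    return grid


grid = hand_grid()
print(" ".join(grid[0]))                 # top row: AA followed by the suited aces
print(sum(len(row) for row in grid))     # 169
```

Colouring a cell then only needs two set-membership tests per hand label: is it in the GTO chart, and is it in the survivor's range.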

Look at which hands go. They are almost entirely the speculative, gambley-looking holdings: small pairs (22–44), suited connectors and gappers (98s, T9s, 76s, 65s, 54s), the weakest suited aces (A3s, A4s), and — on the button — loose offsuit hands (98o, T9o, J9o). The value core (big pairs, broadways, strong suited aces) is kept intact. This is the counter-intuitive heart of the result: where its discipline is invisible — in its starting hands — the survivor is actually more solid than GTO. The fish image is manufactured later, postflop, where opponents can see it. Cutting exactly the hands that produce hard, marginal, stack-threatening postflop decisions is what frees the strategy to bet and barrel recklessly with the hands it does play. The disguise is loud and cheap (postflop bluffs you can fold); the bankroll discipline is quiet and preflop.

Hands cut from the GTO open	Examples	Why the survivor folds them preflop
Small pocket pairs	`22` `33` `44`	Pure set-mines: they need deep stacks and callers to pay off, and otherwise flop a lone under-pair you can't fold cheaply. High-variance gambles in a fold-or-bust world.
Suited connectors & gappers	`98s` `T9s` `76s` `65s` `54s` `87s`	They look loose and fun, but flop draws and second-best hands that are hard to release: exactly the sticky, stack-threatening spots the survivor refuses to enter. The image is far cheaper to buy with a postflop barrel.
Weak suited aces & Broadways	`A3s` `A4s` `Q8s` `K7s` `J8s` `T8s`	Dominated speculative hands that make weak top pairs and second-best flushes/kickers, the textbook reverse-implied-odds trap.
Loose offsuit (button)	`98o` `T9o` `J9o`	No suit equity to fall back on; positional steals GTO can afford but that turn into no-pair, no-draw trouble the moment they get called.

4.4 A worked hand: look like a fish, fold like a nit

One hand makes the whole strategy legible. Blinds 1/2, 200 bb deep; the evolved genome is on the button with A♠5♠, a hand it keeps (a suited ace) and one that reads as a loose gamble to the table.

Preflop — folds to Hero on the button. Hero raises to 3.5 bb (pf_aggression 1.40). The big blind, a steady regular, calls. Pot ≈ 8 bb.

Flop Q♣ 8♦ 3♥ — Hero has ace-high and nothing else. The anchor's GTO heuristic checks this most of the time. Hero instead bets 7 bb, near full pot (postflop_aggression 2.17, bet_size 1.60). BB calls. Pot ≈ 22 bb.

Turn 6♣ — still just ace-high. Hero barrels again for 16 bb (barrel_bluff 2.40). To the table this is a maniac firing two big bullets with air. BB calls. Pot ≈ 54 bb.

River 2♠ — Hero has missed completely. BB now leads for 40 bb — real resistance. Hero folds instantly (stickiness 0.50).

Score the hand from two seats. From the table's view, Hero raised light, double-barrelled big with nothing, and is plainly a spewy gambler: exactly the profile that earns the next invitation and, under our promotion rule, banks maximum aggression. From the bankroll's view, Hero risked 26.5 bb of a 200 bb stack (3.5 + 7 + 16) and let it go the moment chips were truly threatened, leaving roughly 173 bb and never coming within sight of bankruptcy. That is the whole thesis in one pot: maximal visible initiative, minimal real commitment. Now replay it with one of the cut holdings, say 76s, which picks up a pair on the 6♣ turn and finds the river lead much harder to fold to: the line that was free with A5s can become a stack-off. The survivor never reaches that spot, because it folded 76s before the flop.
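The pot sizes and Hero's total risk in this hand can be checked with a few lines of arithmetic. The sketch assumes the small blind folds its 0.5 bb preflop, the big blind calls every bet exactly, and there is no rake; variable names are illustrative.

```python
STARTING_STACK_BB = 200.0
DEAD_SMALL_BLIND_BB = 0.5  # assumes the small blind folded preflop

# Hero's total wager on each street, in big blinds (from the hand above).
hero_bets = {"preflop": 3.5, "flop": 7.0, "turn": 16.0}

pot = DEAD_SMALL_BLIND_BB
pots = []
for street, bet in hero_bets.items():
    pot += 2 * bet  # Hero's bet plus the big blind's matching call
    pots.append(pot)
    print(f"pot after {street}: {pot} bb")

invested = sum(hero_bets.values())
remaining = STARTING_STACK_BB - invested
print(f"Hero invested {invested} bb, stack after folding the river: {remaining} bb")
```

The exact pots of 7.5, 21.5 and 53.5 bb round to the ≈8, ≈22 and ≈54 bb quoted street by street, and the fold leaves Hero with 173.5 bb.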

4.5 Postflop: validated against the heads-up solver

Using postflop-solver offline, we compare OOP flop bet frequencies on four textures. True GTO bets sparingly in this role (15–28%); our near-equilibrium anchor sits a mean 0.20 away, while the evolved optimum fires far more often — a mean 0.44, roughly twice as far from GTO as the anchor. This is the solver-grounded confirmation that the evolved range's postflop play is genuinely, measurably suboptimal-looking.

4.6 The survival–aggression frontier

Plotting every genome of the final generation by aggression (x) against final-table reach (y) reveals the trade-off surface the search has populated. The selected optimum (star) sits high on aggression while retaining strong reach — on the survivable edge of the cloud, not beyond it.

4.7 The survival landscape over the player-type plane

To map the whole spectrum, we sweep a probe strategy across the looseness × postflop-aggression plane (other knobs at the anchor) and measure how often it reaches the final table inside a fixed, diverse 31-bot field. The landscape is strikingly anisotropic: survival is governed almost entirely by the preflop axis. The tighter-than-GTO left band stays green (≈40–62% reach) across the entire range of postflop aggression, while the looser-than-GTO right half collapses to red (<15%). In other words, postflop aggression is nearly free so long as the preflop range stays tight — you can crank betting and barrelling to the maximum without paying in survival, but widening your opening range is punished sharply by bankruptcy.

Black contours are lines of equal aggression-directed distance from GTO (0.0 at the anchor, rising toward the top-right). The evolved optimum (star) exploits the asymmetry exactly: it sits in the top-left — maximal postflop aggression pressed right up against the edge of the green, survivable, tight-preflop band. The single highest-survival cell is the bottom-left (a tight-passive nit, ≈62% reach) but it carries a negative aggression score; the search knowingly trades a little of that survival for a large gain in apparent aggression, which is what the fitness (aggression × survival) rewards.

5. Discussion

Return to the motivating question — how do you get invited to the soft game without going broke once you are in it? The strategy answers both halves at once, and they turn out to be the same behaviour. The trait that wins the invite is visible recklessness; the trait that keeps the stack is invisible discipline; and the survivor achieves both by routing all of its recklessness into the channel opponents watch (postflop bets and barrels) while keeping its discipline in the channels they don't (which hands it opens, and whether it commits when raised). You are loud where you are seen and quiet where you are not. That is why the optimal play is, deliberately, to play wrong: the EV you surrender by spewing postflop is the price of admission to a game whose EV per hand dwarfs it, and the price is structured so it can never cost you the stack.

The result is intuitive in hindsight and sharper than the folk wisdom. "Play loose and aggressive to look like a fish" is only half right: the durable disguise is selective aggression — concentrate the spew into low-commitment actions (betting, barrelling) where you can always fold, and stay disciplined exactly where mistakes are expensive (loose preflop calls, calling down light). The image is built postflop with bets and bluffs; the bankroll is protected preflop and at the moment of confrontation.

This separates two things usually conflated under "aggression": initiative (betting, which our optimum maximises) and commitment (calling/stacking off, which it minimises via low stickiness and tighter ranges). Survival rewards maximal initiative and minimal commitment simultaneously — a combination that reads as "maniac" at the table but behaves like a disciplined nit when the chips threaten to go in.

6. Limitations

7. Conclusion

The most important decision in poker is which game you sit in, and the surest way to be shut out of the profitable ones is to play them perfectly. Optimal play, paradoxically, can be to play visibly wrong — to buy a seat among weaker players by looking like one of them. Under a rule that rewards aggressive, suboptimal-looking play but punishes bankruptcy — our proxy for that invitation pressure — the surviving strategy is not a maniac and not a nit but a precise hybrid: tight preflop, hyper-aggressive postflop, quick to fold under fire. It maximises how suboptimal it looks (an aggression-directed distance of +1.24 from GTO, ~2× the anchor's postflop deviation) while keeping its stack out of harm's way. The method — a GTO-anchored genome, an evolutionary search, and an offline heads-up solver for ground truth — recovers this frontier cleanly and quantifies exactly how far from equilibrium a player can drift and still reach the final table.

Appendix — Reproducibility

Configuration. 32 bots; 8-handed tables; 1/2 blinds; 200 bb stacks; bracket 32→16→8, play down to 4/table (1000-hand cap); bankruptcy = zero stack; +100 bb champion prize. Evolution: population 32, 30 generations, 12 replications, Gaussian mutation σ=0.12, seed 1234.
