Simpson’s Paradox in Trading: A Deep Dive

The Monthly Review Meeting

The portfolio manager scans the dashboard intently. He is pressed for time: bonus-season deadlines loom, and the desk’s $2B in assets under management is under scrutiny from investors. Overall win rates by trader light up the screen. These are the numbers for his two top traders, and the numbers that matter. Daniel averages 74.5% across 110 trades, John 71.8% across 110. The math looks plain: Daniel, a top-quartile performer and client-facing star, deserves the bonus, while John gets a PIP, or worse. ‘John’s slippage doesn’t just hurt win rates. It guts that hard-won reward per unit of risk and kills our Sharpe,’ snaps the head of execution. ‘74.5% vs. 71.8%? That’s 270 basis points of edge in volatile flow, worth millions in P&L if executed clean.’

This article illustrates Simpson’s paradox in trading through a performance review that rewards the wrong trader once results are aggregated.

His colleagues exchange knowing glances. Fire John, keep Daniel. Case closed. Consensus hardens: John’s the weak link ahead of the Q1 vol spike. But the portfolio manager pauses. Something tells him to break the numbers down by market regime. He pulls up the segmented data, and the story flips.

The Hidden Split

Markets aren’t uniform; they split into regimes: Calm and Volatile. We assume these regimes differ structurally in difficulty, with volatile trades having lower baseline win rates, and that assignment into regimes is not random. Daniel cherry-picks calm trades over volatile ones. He’s the safe pair of hands, the guy who delivers consistent wins without drama. John favors volatile trades and takes them far more often. He’s the fireman, the one clients call at 3 AM when markets break, because he thrives in the storm. John takes the hard cases because he has proven he can handle them, while Daniel sticks to easy waters.

Now, their performance within regimes:

Regime	Daniel (Apparent top performer)	John (Crisis specialist)
Calm (Easy)	81/100 = 81%	9/10 = 90%
Volatile (Hard)	1/10 = 10%	70/100 = 70%
Overall	82/110 = 74.5%	79/110 = 71.8%

John outperforms Daniel within both regimes: 90% to 81% in Calm, and 70% to 10% in Volatile. Yet overall? Daniel wins. How does the fireman lose to the safe player?
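The comparison in the table can be checked mechanically. The sketch below is one possible way to do it, not the desk's actual tooling: the dictionary layout and the helper names (`regime_rates`, `overall_rate`, `is_reversal`) are assumptions introduced for illustration, and only the win and trade counts come from the table.

```python
from fractions import Fraction

# (wins, trades) per trader and regime, copied from the table above.
RESULTS = {
    "Daniel": {"Calm": (81, 100), "Volatile": (1, 10)},
    "John": {"Calm": (9, 10), "Volatile": (70, 100)},
}


def regime_rates(trader):
    """Exact win rate within each regime for one trader."""
    return {regime: Fraction(w, n) for regime, (w, n) in RESULTS[trader].items()}


def overall_rate(trader):
    """Pooled win rate: total wins over total trades, ignoring regime."""
    wins = sum(w for w, _ in RESULTS[trader].values())
    trades = sum(n for _, n in RESULTS[trader].values())
    return Fraction(wins, trades)


def is_reversal(better, worse):
    """True if `better` wins every regime but loses the pooled comparison."""
    better_rates, worse_rates = regime_rates(better), regime_rates(worse)
    wins_each_regime = all(better_rates[r] > worse_rates[r] for r in better_rates)
    return wins_each_regime and overall_rate(better) < overall_rate(worse)


if __name__ == "__main__":
    for name in RESULTS:
        rates = {r: f"{float(v):.0%}" for r, v in regime_rates(name).items()}
        print(name, rates, f"overall {float(overall_rate(name)):.1%}")
    print("Simpson reversal (John vs. Daniel):", is_reversal("John", "Daniel"))
```

Exact fractions are used so that the comparison does not depend on rounding; the percentages are only formatted at print time.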

The Paradox Revealed

This is Simpson’s paradox: John, the hard-cases specialist, is better everywhere. But aggregate all trades into one bucket, and Daniel gets the upper hand. Why?

The aggregate result reflects assignment effects rather than pure skill. About 91% of Daniel’s volume (100 of 110 trades) sits in the Calm regime: easy Band-Aid cases in predictable waters. About 91% of John’s volume (100 of 110 trades) sits in the Volatile regime: heart surgeries in a storm, where wins are much harder to come by.

The overall win rate is a weighted average of conditional performance, where the weights reflect regime exposure rather than skill. John’s 70% in Volatile beats Daniel’s 10% there, but because nearly all of John’s trades are hard ones, that 70% dominates his total and pulls it below Daniel’s. Daniel’s 81% in Calm looks shiny because Calm is almost all he trades, and those are easy waters.

Imagine the boardroom: ‘John’s dragging us down.’ Fire him, and your crisis specialist walks out just when clients panic and markets crack. Promote Daniel, and you reward regime-shopping. The desk loses its edge in chaos, all because one number lied. It is the same trap as the Dr. Hibbert vs. Dr. Nick textbook illustration of Simpson’s paradox: the expert takes on more hard cases, and that tanks the aggregate.

The Math Engine: Weighted Averages

A trader’s overall win rate is not a simple 50/50 average of his Calm and Volatile win rates. It is a weighted average of his win rate in each regime, where the weights are the share of his trades that fall in that regime.

Daniel spends almost all his time within the calm regime: about 91% of his trades are band‑aid cases. His 81% success rate there dominates his record. The few volatile attempts he makes, with only 10% success, barely move the needle, so his overall number lands at 74.5%.

John spends the bulk of his time and effort in the opposite world. Only 9% of his trades are in the calm regime, where he wins 90% of the time. The other 91% are heart-surgery trades in the volatile regime, where his success rate is 70%. That mix drags his overall number down to 71.8%.

A good way to imagine it is as two cocktails. Daniel’s glass is almost all sweet mixer (easy trades) with a dash of strong spirit (hard trades). John’s glass is mostly spirit. If you only taste the final drink, Daniel’s seems smoother, but that says more about the recipe than the bartender’s skill.

John’s volatile exposure anchors his total to hard regime reality. Daniel’s calm skew inflates his. Aggregation is not neutral: it mechanically reflects differences in regime exposure.
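The weighted-average mechanics can be sketched in a few lines. The function name `weighted_win_rate` and the "swapped mix" counterfactual are assumptions added for illustration; the counterfactual also assumes, hypothetically, that each trader's within-regime win rate would stay the same if his volume in that regime changed, which the article's data cannot confirm.

```python
def weighted_win_rate(rates, exposure):
    """Overall win rate = sum over regimes of (win rate * share of trades)."""
    total = sum(exposure.values())
    return sum(rates[r] * exposure[r] / total for r in rates)


# Within-regime win rates and trade counts from the table.
RATES = {
    "Daniel": {"Calm": 0.81, "Volatile": 0.10},
    "John": {"Calm": 0.90, "Volatile": 0.70},
}
TRADES = {
    "Daniel": {"Calm": 100, "Volatile": 10},
    "John": {"Calm": 10, "Volatile": 100},
}

if __name__ == "__main__":
    # Each trader on his own regime mix: reproduces the dashboard numbers.
    for name in RATES:
        actual = weighted_win_rate(RATES[name], TRADES[name])
        print(f"{name} on own mix: {actual:.1%}")

    # Hypothetical: keep each trader's win rates, swap their regime exposure.
    john_swapped = weighted_win_rate(RATES["John"], TRADES["Daniel"])
    daniel_swapped = weighted_win_rate(RATES["Daniel"], TRADES["John"])
    print(f"John on Daniel's mix: {john_swapped:.1%}")
    print(f"Daniel on John's mix: {daniel_swapped:.1%}")
```

On their own mixes the function returns 74.5% for Daniel and 71.8% for John. Under the swapped mix it gives about 88.2% for John and about 16.5% for Daniel, which suggests the dashboard ranking is driven by exposure rather than by the win rates themselves.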

Not Promoting Cherry-Pickers

Always segment first: report hit rates and PnL by regime (Calm/Volatile), ticket size (small/large), and product (liquid/illiquid). Aggregates can mislead because they hide who actually creates value where it matters.

Condition before concluding: John’s 70% win rate in Volatile is alpha in a crisis. Daniel’s 81% in Calm is safety on good days only. Don’t judge a firefighter by sunny-weather performance.

Reward specialists: John’s 70% in chaos far exceeds Daniel’s 10%. Fire him, and your desk loses its 3 AM crisis trader when clients need block fills most. Promote Daniel, and you reward the guy who avoids the hard work.

Relationship to gambler’s ruin: Just as a small per-bet disadvantage compounds over many bets in the gambler’s ruin problem, unconditioned metrics can compound into systematically bad personnel decisions.
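The "segment first" rule above can be sketched as a small grouping routine over a trade log. Everything about the log format here is hypothetical: the field names `trader`, `regime` and `won`, and the helper `build_log` that rebuilds trade-level records from the table counts, are assumptions for illustration rather than a real desk schema.

```python
from collections import defaultdict


def segment_win_rates(trades, keys=("trader", "regime")):
    """Group trade records by `keys`; return {group: (win rate, trade count)}."""
    tally = defaultdict(lambda: [0, 0])  # group -> [wins, trades]
    for trade in trades:
        group = tuple(trade[k] for k in keys)
        tally[group][0] += 1 if trade["won"] else 0
        tally[group][1] += 1
    return {group: (wins / n, n) for group, (wins, n) in tally.items()}


def build_log():
    """Rebuild a trade-level log that matches the article's table counts."""
    counts = {
        ("Daniel", "Calm"): (81, 100),
        ("Daniel", "Volatile"): (1, 10),
        ("John", "Calm"): (9, 10),
        ("John", "Volatile"): (70, 100),
    }
    log = []
    for (trader, regime), (wins, n) in counts.items():
        for i in range(n):
            log.append({"trader": trader, "regime": regime, "won": i < wins})
    return log


if __name__ == "__main__":
    log = build_log()
    # Segmented view: one row per (trader, regime).
    for group, (rate, n) in sorted(segment_win_rates(log).items()):
        print(group, f"{rate:.0%} of {n} trades")
    # Same function, coarser key: the aggregate the dashboard showed.
    for group, (rate, n) in sorted(segment_win_rates(log, keys=("trader",)).items()):
        print(group, f"{rate:.1%} of {n} trades")
```

Returning the trade count next to each rate is a deliberate choice: it makes thin cells visible, such as the 10-trade segments in this example, so a reviewer can see how much weight each conditional win rate deserves.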

The Numbers Don’t Lie — Aggregates Do

A = Trade wins

B = Daniel traded it

B^c = John traded it

C = Volatile regime

C^c = Calm regime

Simpson’s Paradox:

\begin{align*} & P(A \mid B, C) < P(A \mid B^c, C) \rightarrow \text{Daniel’s Volatile (10\%)} < \text{John’s Volatile (70\%)} \\ & P(A \mid B, C^c) < P(A \mid B^c, C^c) \rightarrow \text{Daniel’s Calm (81\%)} < \text{John’s Calm (90\%)} \\ & \text{but} \\ & P(A \mid B) > P(A \mid B^c) \rightarrow \text{Daniel’s Overall (74.5\%)} > \text{John’s Overall (71.8\%)} \end{align*}

Law of total probability:

\begin{align*} P(A \mid B) &= P(A \mid B, C)P(C \mid B) + P(A \mid B, C^c)P(C^c \mid B) \\ &= (0.10)\left(\tfrac{10}{110}\right) + (0.81)\left(\tfrac{100}{110}\right) = \tfrac{82}{110} \approx 0.745 \\[10pt] P(A \mid B^c) &= P(A \mid B^c, C)P(C \mid B^c) + P(A \mid B^c, C^c)P(C^c \mid B^c) \\ &= (0.70)\left(\tfrac{100}{110}\right) + (0.90)\left(\tfrac{10}{110}\right) = \tfrac{79}{110} \approx 0.718 \end{align*}

Weights explain the flip: P(C∣B) = 10/110 ≈ 0.09 (Daniel rarely trades Volatile), while P(C∣B^c) = 100/110 ≈ 0.91 (John mostly trades Volatile). Daniel’s record is weighted heavily toward easy Calm trades; John’s toward hard Volatile ones.
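To see that the flip comes from the weights and not from the conditional win rates, one can hold the regime mix fixed. The following is a short sketch under an added assumption: w is a hypothetical common share of Volatile trades given to both traders (it does not appear in the original data), and P_w denotes the overall win rate each trader would have under that shared mix, with his within-regime win rates unchanged.

\begin{align*} P_w(A \mid B^c) - P_w(A \mid B) &= w\,[P(A \mid B^c, C) - P(A \mid B, C)] + (1 - w)\,[P(A \mid B^c, C^c) - P(A \mid B, C^c)] \\ &= w\,(0.70 - 0.10) + (1 - w)\,(0.90 - 0.81) \\ &= 0.09 + 0.51\,w > 0 \quad \text{for all } 0 \le w \le 1 \end{align*}

Under any shared mix John leads by at least 9 percentage points, so the reversal in the aggregate is only possible because P(C∣B) and P(C∣B^c) differ.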

Simpson’s paradox isn’t just textbook math. It’s why trading desks may fire their best traders and promote tourists. Next time a dashboard shows clear winners, ask: conditioned on what? The market doesn’t reward aggregates, but those who see through them. Left unconditioned, headline metrics can compound into costly decisions.
