A number I need to take back
Back in April I wrote a post about rebuilding my crypto strategy. The headline result: a 12-month backtest where my v2 rules returned +35.2% while buy-and-hold lost money. I was proud of it. I put it on a public dashboard.
This week I sat back down — this time with Opus 4.8 — to do something I should have done before publishing that number: try to break it.
By the end of the session I'd confirmed the strategy is good. I'd also confirmed that the +35.2% on my dashboard was wrong, that I'd misread which version of the strategy was actually running live, and that the single most important rule in the whole system was one I'd barely written about. So let me correct the record — because the true story is better than the marketing version.
The first thing a strategy you believe in deserves is an honest attempt to destroy it. If it survives, you've earned the right to trust it. If it doesn't, you just saved yourself from betting on a number that was never real.
The rule that actually matters
The April rebuild had a lot of moving parts — rotation-on-sell, a contagion regime gate, deploy-to-target rebuys. But the rule doing the real work is simpler and more boring than any of those: only put cash back into the market when Bitcoin is trading above its 200-day moving average. When BTC is below that line, sells go to cash and the book *stays* in cash.
That's it. A single trend filter. I call it the macro gate, and on paper it's the difference between a strategy that compounds and one that bleeds.
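As a rough sketch of the rule in code (illustrative only, not the live system): the function names, the "stay closed when history is too short" default, and the synthetic prices below are all assumptions made for the example.

```python
from typing import Optional, Sequence


def simple_moving_average(closes: Sequence[float], window: int) -> Optional[float]:
    """Mean of the last `window` closes, or None if history is too short."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(closes) < window:
        return None
    return sum(closes[-window:]) / window


def macro_gate_open(btc_closes: Sequence[float], window: int = 200) -> bool:
    """True when the latest BTC close sits above its `window`-day average."""
    sma = simple_moving_average(btc_closes, window)
    if sma is None:
        # Assumption: with too little history, treat the gate as closed.
        return False
    return btc_closes[-1] > sma


def route_sale_proceeds(btc_closes: Sequence[float]) -> str:
    """Decide where cash from a sell goes under the gate (hypothetical helper)."""
    return "redeploy" if macro_gate_open(btc_closes) else "hold_cash"


# Synthetic prices, not real BTC data.
uptrend = [100.0 + day for day in range(250)]
downtrend = [350.0 - day for day in range(250)]
print(route_sale_proceeds(uptrend))    # redeploy
print(route_sale_proceeds(downtrend))  # hold_cash
```

The gate only answers one question, whether cash may go back in. Everything else in the strategy (rotation, rebuys) would sit behind that check.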
Here's the comparison that reframed everything. Same v2 rules, the only difference being whether that one gate is switched on:
V2 without the macro gate
Rotation and rebuys fire in every market. Over the 12-month window it returned -2.3% with a -27% max drawdown. It kept redeploying cash into a downtrend — lots of activity, no edge. This is the version I *thought* was my +35% strategy. It wasn't.
V2 with the macro gate (live)
Identical rules, but cash only redeploys when BTC is above its 200-day average. +16.7% return, -13% max drawdown, Sharpe ~1.05 — best return, lowest drawdown, best risk-adjusted result of everything I tested. This is what's actually running.
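For readers who want to check figures like these on their own equity curve, here is one common way to compute max drawdown and a Sharpe ratio. It is a sketch under stated assumptions: daily equity values, 365 periods per year (crypto trades every day), and a risk-free rate of zero. The exact conventions behind the numbers above may differ.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a negative fraction (e.g. -0.13)."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def period_returns(equity: Sequence[float]) -> List[float]:
    """Simple return from each point on the curve to the next."""
    return [later / earlier - 1.0 for earlier, later in zip(equity, equity[1:])]


def sharpe_ratio(equity: Sequence[float], periods_per_year: int = 365) -> float:
    """Annualized mean/stdev of per-period returns; risk-free rate assumed 0."""
    returns = period_returns(equity)
    if len(returns) < 2:
        return 0.0
    spread = stdev(returns)
    if spread == 0:
        return 0.0
    return mean(returns) / spread * math.sqrt(periods_per_year)


# Made-up curve: peaks at 120, falls to 90, so the drawdown is 90/120 - 1.
curve = [100.0, 120.0, 90.0, 110.0]
print(round(max_drawdown(curve), 4))  # -0.25
```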
The honest single-window numbers across the board: buy-and-hold -44.8%, my original v1 rules +15.4%, ungated v2 -2.3%, and the gated v2 that's live +16.7%. The dashboard has been updated to show those real figures. The old +35% was an artifact of an earlier data window and some hardcoded values that never got refreshed. That's on me, and it's fixed.
How Opus 4.8 made me prove it instead of believe it
A single backtest over a single year is one of the most seductive lies in quant work. You ran it once, it looked great, you shipped it. But a 12-month window is *one* path through history. Maybe the strategy is brilliant. Maybe it just happened to fit the exact sequence of ups and downs that occurred.
So instead of admiring the number, we tried to figure out whether it would hold up in markets that *didn't* happen. Two tests.
Test one — does the edge depend on a magic parameter? I'd set the gate at the 200-day average. Is 200 special, or did I tune it to win? We swept it from 150 days all the way to 250. The results barely moved. The edge isn't balanced on a knife — it's a property of the trend filter itself, not a number I curve-fit.
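A parameter sweep of this kind can be sketched as below. This is a toy single-asset stand-in, not the full v2 rules: `gated_total_return` and `sweep_gate_window` are hypothetical names, the price series is synthetic, and the 25-day step is an arbitrary choice for the example.

```python
from typing import Dict, Iterable, Sequence


def gated_total_return(closes: Sequence[float], window: int, start: int) -> float:
    """Toy backtest: hold on day t only if the close on day t-1 was above
    its `window`-day average, otherwise sit in cash for that day."""
    if start < window:
        raise ValueError("start must be >= window so the average is defined")
    equity = 1.0
    for t in range(start, len(closes)):
        trailing = closes[t - window:t]  # closes known before day t (no lookahead)
        average = sum(trailing) / window
        if closes[t - 1] > average:
            equity *= closes[t] / closes[t - 1]
    return equity - 1.0


def sweep_gate_window(closes: Sequence[float], windows: Iterable[int]) -> Dict[int, float]:
    """Run the toy backtest for each window over the same evaluation period."""
    windows = list(windows)
    start = max(windows)  # every window starts trading on the same day
    return {w: gated_total_return(closes, w, start) for w in windows}


# Synthetic series: 300 days up, then 200 days down. Not real market data.
up = [100.0 * 1.002 ** d for d in range(300)]
down = [up[-1] * 0.997 ** d for d in range(1, 201)]
for window, total in sweep_gate_window(up + down, range(150, 251, 25)).items():
    print(window, round(total, 4))
```

Starting every window on the same day matters: otherwise a shorter window would get extra trading days and the comparison would not be like for like.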
Test two — the one that actually convinced me. We built a Monte Carlo harness: chop the real year of price history into contiguous blocks, then reshuffle those blocks into hundreds of synthetic alternate histories — bull-heavy ones, bear-heavy ones, choppy ones — preserving how assets move together. Then run the gated strategy and the ungated strategy head-to-head on 400 of those reshuffled worlds, three different ways.
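The reshuffling step can be sketched as a simple block bootstrap. Assumptions in this example: each row holds one day of returns for every asset (moving whole rows is what keeps cross-asset co-movement intact), blocks are 30 days long, and blocks are drawn with replacement so some worlds come out bull-heavy and others bear-heavy. The names, block length, and seed are illustrative, not taken from the actual harness.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]  # one day of returns, one entry per asset


def split_into_blocks(rows: Sequence[Row], block_len: int) -> List[List[Row]]:
    """Cut the history into contiguous full-length blocks; a short tail is dropped."""
    if block_len <= 0:
        raise ValueError("block_len must be positive")
    n_full = len(rows) // block_len
    return [list(rows[i * block_len:(i + 1) * block_len]) for i in range(n_full)]


def synthetic_history(
    rows: Sequence[Row],
    block_len: int,
    rng: random.Random,
    with_replacement: bool = True,
) -> List[Row]:
    """Stitch blocks back together in a random order to form one alternate path."""
    blocks = split_into_blocks(rows, block_len)
    if with_replacement:
        picked = [rng.choice(blocks) for _ in blocks]
    else:
        picked = rng.sample(blocks, len(blocks))  # pure reshuffle, each block once
    return [row for block in picked for row in block]


def make_worlds(
    rows: Sequence[Row], block_len: int, n_worlds: int, seed: int = 0
) -> List[List[Row]]:
    """Generate `n_worlds` synthetic histories from one real history."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    return [synthetic_history(rows, block_len, rng) for _ in range(n_worlds)]


# Synthetic two-asset return rows, purely for illustration.
rows = [(0.01 * (d % 5 - 2), 0.02 * (d % 5 - 2)) for d in range(360)]
worlds = make_worlds(rows, block_len=30, n_worlds=400, seed=42)
print(len(worlds), len(worlds[0]))  # 400 360
```

Keeping the blocks contiguous preserves short-term structure such as trends and volatility clusters inside each block, which shuffling individual days would destroy.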
The gated version won in 77–85% of them. It roughly halved the median drawdown — about -18% versus -30%. And it protected the bad tail: in the worst 5% of outcomes, the gate's floor was meaningfully higher than going ungated. It wasn't winning on the one history that happened to occur. It was winning across the space of histories that *could* have occurred. That's the difference between luck and an actual effect.
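Those three summary statistics (win rate, median drawdown, worst-5% outcome) can be computed from paired per-world results roughly like this. The helper names and the nearest-rank percentile definition are assumptions for the sketch, and the sample numbers are made up, not results from the session.

```python
import math
from statistics import median
from typing import Dict, Sequence


def win_rate(candidate: Sequence[float], baseline: Sequence[float]) -> float:
    """Fraction of worlds where the candidate's return beats the baseline's."""
    if len(candidate) != len(baseline) or not candidate:
        raise ValueError("need two non-empty result lists of equal length")
    wins = sum(1 for c, b in zip(candidate, baseline) if c > b)
    return wins / len(candidate)


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def compare(
    gated_returns: Sequence[float],
    ungated_returns: Sequence[float],
    gated_drawdowns: Sequence[float],
    ungated_drawdowns: Sequence[float],
) -> Dict[str, float]:
    """Head-to-head summary across the same set of synthetic worlds."""
    return {
        "gated_win_rate": win_rate(gated_returns, ungated_returns),
        "gated_median_drawdown": median(gated_drawdowns),
        "ungated_median_drawdown": median(ungated_drawdowns),
        "gated_p5_return": percentile(gated_returns, 5),
        "ungated_p5_return": percentile(ungated_returns, 5),
    }


# Made-up results for four worlds, only to show the call shape.
summary = compare(
    gated_returns=[0.10, 0.20, -0.10, 0.05],
    ungated_returns=[0.00, 0.30, -0.20, 0.00],
    gated_drawdowns=[-0.10, -0.20, -0.15, -0.12],
    ungated_drawdowns=[-0.25, -0.30, -0.35, -0.28],
)
print(summary["gated_win_rate"])  # 0.75
```

The comparison is paired on purpose: both strategies run on the same world, so a win means the gate helped on that exact path rather than on an easier one.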
One backtest tells you what happened. Four hundred reshuffled backtests start to tell you what's *true*. The gate isn't a story about last year — it survives in markets that never existed.
The most uncomfortable line in the whole session
While we were at it, Opus 4.8 caught something I'd gotten plainly wrong. In my own backtest code, the label "V2 (live)" was attached to the ungated -2.3% run, and I had been using that label to describe my live strategy. For a few minutes I genuinely believed the thing trading my money was the *worst* non-buy-and-hold result on the board.
It wasn't. The strategy I actually deployed back in June was already macro-gated. The label was stale; the deployed system was the good one. I'd confused my own counterfactual for my production code. We traced it, corrected the labels, and verified against the live account.
And the live account told the real story better than any backtest could. Right now Bitcoin is about 20% below its 200-day average. The gate is closed. So my strategy is sitting in 100% cash — and that is not a malfunction. That is the entire point. The alpha isn't a clever trade. It's the discipline to do nothing while everyone else is catching knives.
What I'm not going to pretend
Here's the caveat I'd want a reader to hold onto, because leaving it out would repeat exactly the mistake I just corrected.
All of this is validated on a single year of data — and a brutally bearish one. My data source caps free history at 365 days, and that year was mostly a downtrend, which means even my 400 reshuffled worlds are mostly bearish worlds. The gate's superpower is sitting out bear markets, and I've now tested that six ways from Sunday. What I *haven't* meaningfully stress-tested is how it behaves in a sustained bull — specifically the whipsaw risk where the price keeps poking above and below the 200-day line and the strategy re-enters late after a sharp V-shaped recovery. The honest expected cost there is some lag. I'll find out when Bitcoin reclaims that line, and I'll write up whatever actually happens — good or bad.
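One simple way to put a number on that whipsaw risk is to count how often the gate flips state over a price series. This is a sketch with hypothetical helper names and tiny synthetic inputs (a 3-day window instead of 200, to keep the example readable), not a test I have run on real data.

```python
from typing import List, Sequence


def gate_states(closes: Sequence[float], window: int) -> List[bool]:
    """Gate state for each day that has a full `window` of history:
    True when that day's close is above its trailing average."""
    if window <= 0:
        raise ValueError("window must be positive")
    states = []
    for t in range(window - 1, len(closes)):
        trailing = closes[t - window + 1:t + 1]
        states.append(closes[t] > sum(trailing) / window)
    return states


def count_gate_flips(states: Sequence[bool]) -> int:
    """Number of times the gate switches between open and closed."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


# Synthetic prices: one choppy series, one steady trend.
choppy = [10.0, 10.0, 10.0, 11.0, 9.0, 11.0, 9.0]
trending = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(count_gate_flips(gate_states(choppy, 3)))    # 4
print(count_gate_flips(gate_states(trending, 3)))  # 0
```

A high flip count over a short stretch is the pattern to watch for: each flip is a chance to exit low and re-enter higher.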
The actual lesson
The code didn't change much this week. What changed was the amount of evidence behind it, and the accuracy of what I was claiming about it.
In April the story was "I built a strategy that returns 35%." This week the story became "I built a strategy whose edge is a single trend filter, it returns about 17% with roughly half the drawdown of the ungated version, it survives hundreds of simulated markets, it's currently and correctly sitting in cash, and here's exactly where it might disappoint me." The second story is less flashy and far more useful, both to me and to anyone reading.
That's the part of working with these models I keep coming back to. The leverage isn't just that they help you build faster. It's that, used well, they make you harder to fool — including by yourself.
The live numbers update three times a day on the dashboard. If you've done robustness testing on a personal strategy and found a result that embarrassed you into a better version of it, I want to hear it — drop it in the comments, or reach out through the contact page.