The bug that wasn't a bug
Last weekend I spent a day with Opus 4.6 building an AI-driven crypto paper-trading strategy. It shipped. Seven sell strategies, three rebuy strategies, a Supabase-backed dashboard, automated emails, the works. It ran 3× daily on a $20K fund. I was genuinely proud.
Tonight I sat down with Opus 4.7 and the first thing I noticed, scrolling through the trade log: Strategy 5 (AVAX Narrative Shift) had fired 9 times in 3 days, trimming AVAX from 269 units to 20 — a 92.5% liquidation. Right as Bitwise launched the first Avalanche ETF and AVAX was printing a fresh 52-week high on its Galaxy Score.
The strategy wasn't broken. It was doing exactly what I told it to do. That was the problem.
The gap between "shipped working code" and "this is a strategy I'd trust with real money" turned out to be an afternoon of honest analysis — something the weekend build never had.
What Opus 4.6 built over the weekend
The v1 strategy was fine as a prototype. LunarCrush sentiment, Alpaca paper trades, Supabase snapshots, a dashboard hosted via a publish MCP, iMessage alerts. Every piece worked. The architecture was clean enough to hand off.
What it *didn't* have:
- Any backtest. Parameter values (25% portfolio stop, 10% trailing stop, 50% sentiment drop) were picked because they sounded reasonable. That's it.
- A cooldown on Strategy 4 (portfolio drawdown trim). Once the -15% threshold broke, the rule would fire *every single run* until recovery — potentially shredding a position in a normal dip.
- Rebuy symmetry. The sell rules were aggressive. The rebuy rules required sentiment recovery that might never happen on the exact assets we sold. Cash accumulated. It stayed in cash.
- Verified inputs. Strategy 7 (LINK Treasury Dump) pointed at two Ethereum addresses. When I actually queried them tonight, one held ~27 LINK (~$257) and the other held nothing. They weren't the Chainlink treasury. The rule could never fire. Nobody noticed.
None of these are "bugs" in the compile-time sense. The code ran. Trades executed. The snapshot wrote. The email sent. You could review every function and sign off.
What Opus 4.7 did tonight
I asked a simple question: *can we validate these rules against historical data?* That one prompt changed everything.
Within the hour we had:
- A Python backtest pulling 12 months of daily OHLC from CoinGecko for all 6 assets
- Three simulated strategies: buy-and-hold, v1 rules as-written, v2 rules as-proposed
- Per-strategy trade counts, Sharpe ratios, max drawdown dates, equity curves
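The scoring side of that backtest is simple enough to sketch. Here's a minimal, self-contained version of the two headline metrics it reports, max drawdown and annualized Sharpe, run against a toy equity curve standing in for the real CoinGecko data (the function names and numbers here are mine, not the production script's):

```python
import math

def max_drawdown(equity):
    """Largest peak-to-trough decline in an equity curve, as a fraction."""
    peak, worst = equity[0], 0.0
    for v in equity:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst

def sharpe(daily_returns, periods=365):
    """Annualized Sharpe ratio, risk-free rate assumed zero."""
    mean = sum(daily_returns) / len(daily_returns)
    var = sum((r - mean) ** 2 for r in daily_returns) / len(daily_returns)
    std = math.sqrt(var)
    return (mean / std) * math.sqrt(periods) if std else 0.0

# Toy equity curve: rally, crash, partial recovery
equity = [100, 110, 120, 90, 80, 95, 105]
rets = [equity[i] / equity[i - 1] - 1 for i in range(1, len(equity))]
print(round(max_drawdown(equity), 3))  # 0.333 (peak 120, trough 80)
```

The same two functions score all three simulated strategies, which is what makes the HODL-vs-V1-vs-V2 comparison apples-to-apples.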
The first run was humbling. HODL with v2 weights returned -2.3% over the year with a 60.7% drawdown. V1 returned +29.5% with only a 12.7% drawdown — which sounds great until you notice the trade log: *it stopped out of every position by mid-summer and sat in 100% cash for the remaining 10 months.* V1 "won" by liquidating once and never re-entering. That's not a strategy. That's survivorship bias.
Opus 4.6 (weekend build)
Shipped a working end-to-end system in one session. Picked reasonable defaults. Did not backtest, did not question whether the rule parameters actually produced good outcomes, and did not cross-check external inputs like on-chain treasury addresses.
Opus 4.7 (tonight)
Proactively proposed the backtest before writing the v2 rules. Caught survivorship bias in v1 on the first run. Caught a bug in its *own* v2 implementation (Strategy 4 re-firing daily) on the second run. Iterated four times before it was willing to lock in parameter values. Dropped Strategy 7 when verification showed the treasury addresses were wrong rather than implementing it "as specified."
The specific fixes that only a backtest could surface
Strategy 4 (Portfolio Drawdown) re-fires. Rule as written: "If portfolio down >15% from peak, reduce all positions by 25%." In the first v2 backtest, this rule fired 187 times in 365 days. Each bad week, positions got trimmed by 25%, then 25% of what was left the next day, then 25% of what was left the day after. Death by a thousand 25% cuts. The fix: re-arm only when a new all-time peak is reached, not on partial recoveries. Fires dropped from 187 to 10.
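The re-arm logic is small. A minimal sketch, assuming a single `state` dict tracks the peak and whether the rule is armed (the names and numbers are illustrative, not the production code):

```python
def drawdown_trim(value, state, threshold=0.15, trim=0.25):
    """Fire the drawdown trim at most once per peak cycle.

    Re-arms only when equity makes a new all-time peak, so a slow
    recovery can't trigger repeated 25% cuts on the way back up.
    """
    if value > state["peak"]:
        state["peak"] = value
        state["armed"] = True          # new peak: rule may fire again
    dd = 1 - value / state["peak"]
    if state["armed"] and dd > threshold:
        state["armed"] = False         # disarm until the next new peak
        return trim                    # fraction of each position to sell
    return 0.0

state = {"peak": 100.0, "armed": True}
fires = [drawdown_trim(v, state) for v in [100, 84, 80, 82, 84, 86, 101, 83]]
print(fires)  # fires once at 84, re-arms at the new peak 101, fires again at 83
```

The v1 version effectively re-armed every day the portfolio was below peak, which is exactly the 187-fires failure mode.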
The "dump to cash" trap. When a trailing stop fires, the v1 rule sells to cash and waits for a "sentiment recovery" rebuy — which for some assets might never come. In a multi-month crypto slump, cash accumulates and doesn't get redeployed. V2's fix: rotate proceeds into other assets scored by 7-day momentum × under-allocation, *unless* a regime gate fires. Net-zero rebalancing instead of cash dumps.
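The rotation score is just the two factors multiplied together. A sketch under that assumption, with made-up prices and weights (the asset names and numbers are illustrative):

```python
def rotation_scores(prices_7d_ago, prices_now, weights_now, targets):
    """Score each asset by 7-day momentum x under-allocation.

    Proceeds from a stop-out go to the highest-scoring asset
    rather than sitting in cash.
    """
    scores = {}
    for asset in targets:
        momentum = prices_now[asset] / prices_7d_ago[asset] - 1
        under = max(0.0, targets[asset] - weights_now[asset])
        scores[asset] = momentum * under
    return scores

scores = rotation_scores(
    prices_7d_ago={"BTC": 100.0, "SOL": 100.0},
    prices_now={"BTC": 105.0, "SOL": 112.0},
    weights_now={"BTC": 0.30, "SOL": 0.10},
    targets={"BTC": 0.35, "SOL": 0.25},
)
best = max(scores, key=scores.get)
print(best)  # SOL: stronger momentum and further under target
```

The `max(0.0, ...)` clamp matters: an over-allocated asset scores zero, so rotation never piles into a position that's already above target.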
Contagion regime detection. The classic "risk-off" signal (BTC below 200-day SMA, portfolio drawdown >10%) is slow. By the time both conditions are true, you've already taken the pain. V2 adds a faster signal: 3+ trailing-stop fires in the trailing 14 days → flip to risk-off for 30 days. In the backtest this single heuristic cut max drawdown from -31% (v2 without the gate) to -13.4%.
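The gate itself is a few lines of bookkeeping. A minimal sketch with hypothetical dates and a simplified state dict (the real implementation presumably persists this in Supabase):

```python
from datetime import date, timedelta

def update_regime(stop_fire_dates, today, state,
                  window_days=14, fires_to_trip=3, risk_off_days=30):
    """Fast risk-off gate: 3+ trailing-stop fires inside a 14-day
    window flips the portfolio to risk-off for 30 days."""
    recent = [d for d in stop_fire_dates if (today - d).days <= window_days]
    if len(recent) >= fires_to_trip:
        state["risk_off_until"] = today + timedelta(days=risk_off_days)
    until = state.get("risk_off_until")
    return until is not None and today <= until

state = {}
fires = [date(2025, 6, 1), date(2025, 6, 5), date(2025, 6, 9)]
print(update_regime(fires, date(2025, 6, 10), state))  # True: gate trips
print(update_regime(fires, date(2025, 7, 15), state))  # False: 30 days elapsed
```

Because the gate keys off the strategy's own stop fires rather than an external indicator, it reacts as fast as the stops do, which is the whole point.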
Rebuy 4: Deploy-to-Target. A new rule that says: if you're sitting on idle cash and any asset is >5 percentage points under its target weight, buy up to $500/day into the most under-weight one. Solves the post-sell-off "frozen in cash" problem with simple DCA.
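Stated as code, the rule is nearly a one-liner. A sketch using the $500/day and 5-point parameters from the rule, with illustrative weights:

```python
def deploy_to_target(cash, weights_now, targets, portfolio_value,
                     gap_pct=0.05, daily_cap=500.0):
    """Deploy idle cash toward the most under-weight asset.

    Buys up to daily_cap dollars into whichever asset is more than
    gap_pct under its target weight; returns (asset, dollars) or None.
    """
    gaps = {a: targets[a] - weights_now.get(a, 0.0) for a in targets}
    asset, gap = max(gaps.items(), key=lambda kv: kv[1])
    if gap <= gap_pct or cash <= 0:
        return None
    dollars = min(daily_cap, cash, gap * portfolio_value)
    return asset, round(dollars, 2)

order = deploy_to_target(
    cash=2_000.0,
    weights_now={"ETH": 0.10, "AVAX": 0.18},
    targets={"ETH": 0.20, "AVAX": 0.20},
    portfolio_value=100_000.0,
)
print(order)  # ('ETH', 500.0): ETH is 10 points under target, capped at $500/day
```

The daily cap is what makes this DCA rather than a lump-sum re-entry: after a sell-off, cash bleeds back into the market over days instead of all at once.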
The numbers
12-month backtest results

| Strategy | Return | Max drawdown | Sharpe | Notes |
|----------|--------|--------------|--------|-------|
| HODL | -2.3% | -60.7% | 0.28 | |
| V1 rules | +27.8% | -12.7% | 1.07 | ended in 100% cash |
| V2 rules | +35.2% | -13.4% | 1.28 | stayed actively deployed through recovery windows |
V2 wins on every metric that matters and it does so by *trading more thoughtfully* — 81 trades vs V1's 38 — not by avoiding the market.
What I actually learned about AI-assisted engineering
The code Opus 4.6 wrote was fine. I'd happily review it in a PR. It shipped. It did what the spec said.
But 4.6 never *questioned the spec*. It didn't ask "are these parameters right?" It didn't say "this treasury address doesn't look like a real Chainlink wallet, should we verify?" It didn't propose a backtest. It just implemented what I described.
4.7 did those things without prompting. It flagged Strategy 5's production behavior as an anomaly worth investigating. It proposed the rotation layer *before* I asked for it. When its own v2 backtest showed 625 trades, it said "that's too noisy, let me find the bug" instead of shipping it. When the contagion gate turned out to be load-bearing, it ran the numbers three different ways to confirm.
The deliverable difference looks modest: a better-performing strategy and a cleaner dashboard. The methodological difference is enormous: one model delivers a plausible implementation of what you asked for, the other delivers a *validated* version of what you actually wanted.
The weekend build worked. Tonight's build earned the right to manage real money. Same person giving the prompts. Different model proposing the work.
What's next
The new strategy is live on the full ~$101K Alpaca equity as of tonight. The v1 tracking tables are preserved via a `strategy_version` column — same database, same schema, queryable for side-by-side comparisons later. The v2 dashboard lives at /dashboard. The old one gets a deprecation banner and a forward link.
Two things on the follow-up list:
- Port the monitor logic from the scheduled Claude task to a Vercel cron that calls Alpaca's REST API directly. The Claude scheduled task is great for research/analysis but overkill for deterministic rule execution.
- Wait 7 days for the `strategy_outcomes` table to start auto-evaluating each trade. Every sell gets checked against a 7-day counterfactual — right or wrong. Over time the system learns which rules are earning their keep and which aren't.
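The counterfactual check is simple to state precisely. Here's a sketch of what one evaluation row might compute, assuming the real `strategy_outcomes` logic compares the sell price to the price 7 days later (the function and its fields are illustrative, not the actual schema):

```python
def grade_sell(sell_price, price_7d_later, fees_pct=0.0):
    """7-day counterfactual check for a sell: the sell was 'right'
    if the asset was cheaper a week later (we avoided downside),
    net of an optional fee hurdle."""
    avoided = (sell_price - price_7d_later) / sell_price
    return {"correct": avoided > fees_pct,
            "avoided_move_pct": round(avoided * 100, 2)}

print(grade_sell(40.0, 34.0))  # sold at 40, dropped to 34: correct, avoided 15%
print(grade_sell(40.0, 46.0))  # sold at 40, rallied to 46: wrong call
```

Aggregating these per rule is what lets the system eventually say "Strategy 3 is right 70% of the time, Strategy 6 is a coin flip."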
If you're curious about the live numbers, the dashboard updates 3× daily.