SharpeBench Tests Whether AI Trading Agents Have Real Edge


Give a thousand monkeys a quarter of market data and one of them will look like Renaissance. Rank traders by return over a short window and you are mostly ranking the variance of noise, not skill.
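The claim above can be checked with a small simulation. What follows is a minimal sketch, not SharpeBench's methodology: it assumes 1,000 traders with zero true edge, 63 trading days as a stand-in for one quarter, Gaussian daily returns with 1% volatility, and annualisation by the square root of 252. The function name and all parameters are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics


def best_zero_edge_sharpe(n_traders=1000, n_days=63, daily_vol=0.01, seed=0):
    """Return the highest annualised Sharpe ratio among traders with no edge.

    All defaults are illustrative assumptions, not values from SharpeBench.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    best = float("-inf")
    for _ in range(n_traders):
        # Every daily return is mean-zero noise: no skill by construction.
        returns = [rng.gauss(0.0, daily_vol) for _ in range(n_days)]
        # Sample Sharpe ratio (risk-free rate taken as zero), annualised.
        sharpe = statistics.mean(returns) / statistics.stdev(returns)
        best = max(best, sharpe * math.sqrt(252))
    return best


if __name__ == "__main__":
    top = best_zero_edge_sharpe()
    print(f"Best of 1000 zero-edge traders: annualised Sharpe {top:.2f}")
```

Under these assumptions each trader's true Sharpe ratio is exactly zero, yet the top of the leaderboard shows a large positive one. Picking the winner from many short, noisy track records is what produces the apparent edge.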

Today we are open-sourcing SharpeBench (https://github.com/general-liquidity/sharpebench), a luck-robust benchmark for AI trading agents. It is a single deterministic binary that takes any agent, in any language, and scores it not on how much it made, but on whether its edge is real. The crates are on crates.io and the methodology is below.

We built it because the field is being measured on sand. A 2026 audit of nineteen LLM-trading studies found that zero reached full reproducibility, only two used time-consistent train and test splits, and exactly one modelled transaction costs (https://arxiv.org/abs/2605.19337). When the scoreboard rewards the luckiest run on the friendliest window, that is what the research optimises for. For...

Copyright of this story solely belongs to hackernoon.com.