AI Now Matches Prediction Markets in Forecasting Real Events, Study Finds

In brief

Prophet Arena tests AI models by having them predict real-world, unresolved events, with GPT-5 currently leading the rankings.
AI models show distinct prediction “personalities” and often diverge from market consensus, sometimes producing high returns.
Early results suggest AI can forecast as accurately as prediction markets, potentially transforming institutional decision-making.

A new artificial intelligence benchmark launched in August shows that AI models can forecast real-world events as accurately as prediction markets, and sometimes better, according to researchers at the University of Chicago’s SIGMA Lab.

Prophet Arena evaluates AI systems by having them predict the outcomes of live, unresolved events drawn from platforms like Kalshi and Polymarket, ranging from election results to sports matches and economic indicators. Unlike traditional benchmarks that test models on historical data with known answers, Prophet Arena tests AI on events whose outcomes are not yet known.

“By anchoring evaluations in unresolved, real-world events, Prophet Arena ensures a level playing field. There is no pre-training advantage, no secret fine-tuning trick, no leakage of test samples,” the Prophet Arena team said in the benchmark’s official blog post.

The benchmark says it is trying to address a fundamental question about artificial intelligence: “Can AI systems reliably predict the future by connecting the dots across existing real-world information?”

Early results suggest they can. GPT-5 currently leads the leaderboard with a Brier score of 82.21%. Meanwhile, OpenAI’s o3-mini model has emerged as the profit champion, generating the highest average returns when its predictions are translated into simulated bets (an underdog with a real chance of winning can pay out far more, given the right conditions).
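For readers unfamiliar with the metric, here is a minimal Python sketch of a standard Brier score for yes/no events: the mean squared gap between the forecast probability and what actually happened, where lower is better. The article does not say how the 82.21% leaderboard figure is derived. The `brier_as_percent` rescaling below (100 times one minus the Brier score) is only an assumption about how a "higher is better" percentage could be produced, and the sample forecasts are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Standard Brier score for binary events: mean of (p - outcome)^2.

    forecasts: probabilities in [0, 1] that each event happens.
    outcomes:  1 if the event happened, 0 if it did not.
    Lower is better; 0.0 is a perfect forecaster.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


def brier_as_percent(forecasts, outcomes):
    """ASSUMPTION (not from the article): rescale so that higher is better."""
    return 100.0 * (1.0 - brier_score(forecasts, outcomes))


# Hypothetical forecasts for three resolved events (not leaderboard data).
sample_forecasts = [0.9, 0.2, 0.7]
sample_outcomes = [1, 0, 0]
print(brier_score(sample_forecasts, sample_outcomes))
print(brier_as_percent(sample_forecasts, sample_outcomes))
```

Because the errors are squared, a confident wrong call (such as the 0.7 forecast on an event that did not happen) costs far more than a cautious one.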

DeepSeek R1 appears to be the contrarian AI in the group, frequently making predictions that diverge sharply from both other models and market consensus, so it is probably not the best model to trust if you want to make a quick buck on Myriad Markets.

The platform reveals distinct “personalities” among AI models when facing identical information. In one example, when predicting whether AI regulation would become federal law before 2026, the market assigned just a 25% probability. But the models diverged wildly: Qwen 3 predicted 75%, GPT-4.1 estimated 60%, while Llama 4 Maverick stayed conservative at 35%.

In another case, o3-mini earned a simulated $9 return on a $1 bet by correctly predicting Toronto FC would beat San Diego FC in a Major League Soccer match. The model gave Toronto a 30% chance of winning, while the market priced it at just 11%. Toronto won.
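The arithmetic behind that payout can be sketched in a few lines of Python. This assumes a simple binary contract that pays $1 if the event happens and costs the market price (here $0.11), with no fees or slippage. The article does not describe how the benchmark actually simulates bets, so the function names and the payout model are illustrative assumptions.

```python
def payout_per_dollar(market_price):
    """Gross payout on a $1 stake if the event happens.

    ASSUMPTION: a binary contract costing `market_price` dollars that
    pays $1 on a win, so $1 buys 1 / market_price contracts.
    """
    if not 0.0 < market_price <= 1.0:
        raise ValueError("market_price must be in (0, 1]")
    return 1.0 / market_price


def expected_profit_per_dollar(model_prob, market_price):
    """Expected net profit on a $1 stake if the model's probability is right."""
    if not 0.0 <= model_prob <= 1.0:
        raise ValueError("model_prob must be in [0, 1]")
    return model_prob * payout_per_dollar(market_price) - 1.0


# Figures from the Toronto FC example: market at 11%, model at 30%.
print(round(payout_per_dollar(0.11), 2))
print(round(expected_profit_per_dollar(0.30, 0.11), 2))
```

Under these assumptions a winning $1 stake at an 11% price pays out roughly $9, in line with the figure reported above. The bet only looks attractive in expectation because the model's 30% estimate sits well above the market price; at a model probability equal to the price, the expected profit is zero.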

“(Prophet Arena) tests models’ forecasting capability, a high form of intelligence that demands a broad range of capabilities, including understanding existing information and news sources, reasoning under uncertainty, and making time-sensitive predictions about unfolding events,” the researchers wrote.

Prophet Arena also enables human-AI collaboration. Users can supply additional information and context to see how predictions shift, while AI models provide detailed rationales for their forecasts.

As prediction markets themselves integrate AI (Kalshi recently partnered with Elon Musk’s Grok, while Polymarket generates AI-powered market summaries), Prophet Arena offers the first systematic comparison of machine forecasting against collective human judgment.

And if they get really good at it, machines could be purely factual, with no sentiments or emotions playing a role in their decisions. They could potentially match or exceed the wisdom of crowds, changing the way institutions approach risk assessment, investment decisions, and strategic planning.

The Prophet Arena platform continues updating daily as events resolve, providing an evolving picture of whether artificial intelligence can truly predict the future by connecting today’s dots.