SurePred is an automated bot that detects and executes arbitrage opportunities between two prediction market platforms: Kalshi and Polymarket. The goal is to find situations where you can bet on both sides of the same event (yes and no) on different platforms, guaranteeing a profit regardless of the final outcome of the event.
What are Prediction Markets?
Prediction markets are platforms where people can buy and sell "contracts" on the outcome of future events. They function like financial markets, but instead of stocks, predictions are traded.
Example
Imagine a market about: "Will the Federal Reserve raise interest rates at the next meeting?"
- If you believe yes, you buy a "Yes" contract for, say, $0.40
- If you believe no, you buy a "No" contract for $0.60
When the event resolves:
- If the Fed raises rates → the "Yes" contract is worth $1.00, the "No" is worth $0.00
- If the Fed does not raise rates → the "Yes" contract is worth $0.00, the "No" is worth $1.00
What is Arbitrage?
Arbitrage is a strategy that takes advantage of price differences between markets to obtain a guaranteed profit, regardless of the outcome.
Arbitrage Example
Suppose we have the same event on two platforms:
Event: "Will Jeannette Jara win the 1st round of the 2025 Chilean presidential election?"
| Platform | YES Price | NO Price |
|---|---|---|
| Polymarket | $0.45 | $0.57 |
| Kalshi | $0.52 | $0.46 |
Arbitrage strategy:
- Buy YES on Polymarket: $0.45
- Buy NO on Kalshi: $0.46
- Total cost: $0.45 + $0.46 = $0.91
Possible results:
- If Jara wins → we win $1.00 from Polymarket (YES) → Profit = $1.00 - $0.91 = $0.09 (9.89% ROI)
- If Jara does not win → we win $1.00 from Kalshi (NO) → Profit = $1.00 - $0.91 = $0.09 (9.89% ROI)
We earn the same regardless of the outcome.
Arbitrage formula
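For arbitrage to exist, the sum of the complementary prices must be less than 1:
$$P_{\text{YES}}^{\text{Polymarket}} + P_{\text{NO}}^{\text{Kalshi}} < 1$$
or
$$P_{\text{NO}}^{\text{Polymarket}} + P_{\text{YES}}^{\text{Kalshi}} < 1$$
The ROI (Return on Investment) is calculated as:
$$\text{ROI} = \frac{1 - (p_1 + p_2)}{p_1 + p_2} \times 100\%$$
Where $p_1$ and $p_2$ are the prices of the complementary positions.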
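The arbitrage condition and the ROI calculation can be sketched in Python. This is a minimal illustration, not SurePred's actual code: `Quote`, `roi_percent`, and `best_arbitrage` are hypothetical names, prices are assumed to be ask prices in dollars, and fees and slippage are ignored.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Tuple


@dataclass(frozen=True)
class Quote:
    """YES/NO ask prices for one market on one platform (hypothetical type)."""
    yes: Decimal
    no: Decimal


def roi_percent(p1: Decimal, p2: Decimal) -> Decimal:
    """ROI (%) of buying two complementary positions priced p1 and p2."""
    cost = p1 + p2
    return (Decimal(1) - cost) / cost * 100


def best_arbitrage(poly: Quote, kalshi: Quote) -> Optional[Tuple[str, Decimal]]:
    """Return (direction, ROI %) for the better direction, or None if neither works."""
    directions = [
        ("YES on Polymarket + NO on Kalshi", poly.yes, kalshi.no),
        ("NO on Polymarket + YES on Kalshi", poly.no, kalshi.yes),
    ]
    # Arbitrage exists only when the two complementary prices sum to less than 1.
    profitable = [
        (name, roi_percent(p1, p2))
        for name, p1, p2 in directions
        if 0 < p1 + p2 < 1
    ]
    return max(profitable, key=lambda item: item[1], default=None)


if __name__ == "__main__":
    polymarket = Quote(yes=Decimal("0.45"), no=Decimal("0.57"))
    kalshi = Quote(yes=Decimal("0.52"), no=Decimal("0.46"))
    print(best_arbitrage(polymarket, kalshi))
```

With the Jara quotes from the table, only the first direction costs less than 1 (0.45 + 0.46 = 0.91), so it is the one returned, with an ROI of about 9.89%. `Decimal` is used instead of `float` so that cent-level prices add up exactly.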
How much money to invest in each platform?
To execute the arbitrage, we need to buy the same number of contracts on both platforms. Each contract pays $1.00 if it wins.
Step 1: Calculate the maximum contracts possible
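Given a total budget $B$ split evenly between the two platforms, the maximum number of contracts is limited by the most expensive platform:
$$N_{\max} = \left\lfloor \frac{B/2}{\max(p_1, p_2)} \right\rfloor$$
Here $p_1$ and $p_2$ are the prices of the two complementary positions.
Step 2: Calculate the money to invest in each platform
$$I_1 = N_{\max} \cdot p_1, \qquad I_2 = N_{\max} \cdot p_2$$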
Example with a budget of $100:
Suppose we have $50 on each platform with the prices from the previous example:
- YES Price Polymarket: $0.45
- NO Price Kalshi: $0.46
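Maximum contracts:
$$N_{\max} = \left\lfloor \frac{50}{0.46} \right\rfloor = \lfloor 108.69\ldots \rfloor = 108$$
Investment per platform:
- Polymarket: 108 × $0.45 = $48.60
- Kalshi: 108 × $0.46 = $49.68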
- Total invested: $98.28
Guaranteed profit:
- Payout upon resolution: 108 × $1.00 = $108.00
- Net profit: $108.00 - $98.28 = $9.72 (9.89% ROI)
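The sizing steps above can be reproduced with a short Python sketch. It assumes, as in the example, that the total budget is split evenly between the two platforms; the function name `size_position` and the keys of the returned dictionary are hypothetical, and fees are not modeled.

```python
from decimal import Decimal
from typing import Dict


def size_position(total_budget: Decimal, p1: Decimal, p2: Decimal) -> Dict[str, Decimal]:
    """Size an equal-contract hedge, assuming the budget is split evenly."""
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("prices must be strictly between 0 and 1")
    per_platform = total_budget / 2
    # The pricier leg determines how many whole contracts half the budget buys.
    contracts = per_platform // max(p1, p2)
    invest_1 = contracts * p1
    invest_2 = contracts * p2
    total = invest_1 + invest_2
    payout = contracts * Decimal("1.00")  # exactly one leg pays $1.00 per contract
    profit = payout - total
    roi = profit / total * 100 if total else Decimal(0)
    return {
        "contracts": contracts,
        "invest_1": invest_1,
        "invest_2": invest_2,
        "total": total,
        "payout": payout,
        "profit": profit,
        "roi_percent": roi,
    }


if __name__ == "__main__":
    result = size_position(Decimal("100"), Decimal("0.45"), Decimal("0.46"))
    for key, value in result.items():
        print(f"{key}: {value:.2f}")
```

For a budget of $100 and prices of $0.45 and $0.46, this yields 108 contracts, $98.28 invested, and a $9.72 profit, matching the figures in the worked example.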
The problem: finding market pairs between platforms
One of the biggest challenges of the project is identifying when two markets on different platforms refer to the same event. Platforms do not use the same identifiers or the same wording.
Problem example
The same political event can appear as:
- Polymarket: "Will Jeannette Jara win the 1st round of the 2025 Chilean presidential election?"
- Kalshi: "Will Jeannette Jara win the 2025 Chile Presidential election first round?"
Or an economic event:
- Polymarket: "Will the Fed raise interest rates in December 2025?"
- Kalshi: "Federal Reserve December 2025 rate decision: Raise?"
The questions are similar but not identical, which complicates automatic matching.
Current solution: embeddings + cosine similarity
To solve the matching problem, the system uses TF-IDF (Term Frequency-Inverse Document Frequency) to convert text questions into numerical vectors, and then calculates the cosine similarity between them.
What is TF-IDF?
TF-IDF is a natural language processing technique that converts text into numerical vectors, assigning more weight to words that are:
- Frequent in the document (TF - Term Frequency)
- Infrequent in the set of documents (IDF - Inverse Document Frequency)
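$$\text{tfidf}(t, d) = \text{tf}(t, d) \times \log\left(\frac{N}{\text{df}(t)}\right)$$
Where:
- $\text{tf}(t, d)$ = frequency of term $t$ in document $d$
- $N$ = total number of documents
- $\text{df}(t)$ = number of documents containing term $t$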
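To make the weighting concrete, here is a minimal pure-Python sketch of the textbook TF-IDF definition (raw term count times the logarithm of the inverse document frequency). It is an illustration only: the `tokenize` and `tfidf_vectors` helpers are hypothetical, the tokenizer is deliberately naive, and the project may well rely on a library that uses a smoothed variant of the formula.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters and digits (naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents: List[str]) -> List[Dict[str, float]]:
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        vectors.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors


if __name__ == "__main__":
    questions = [
        "Will the Fed raise interest rates in December 2025?",
        "Federal Reserve December 2025 rate decision: Raise?",
        "Will Jeannette Jara win the 2025 Chile Presidential election first round?",
    ]
    for vector in tfidf_vectors(questions):
        print(sorted(vector.items(), key=lambda kv: -kv[1])[:3])
```

Note that with this unsmoothed formula a term that appears in every document (such as "2025" in the sample questions) gets a weight of exactly zero, which is the intended effect: it carries no information for telling markets apart.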
What is cosine similarity?
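Once we have the TF-IDF vectors, we measure how similar they are by calculating the cosine of the angle between them:
$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$
Where:
- $A \cdot B$ = dot product of the vectors
- $\|A\|$ = magnitude (norm) of vector $A$
- The result is in the range $[0, 1]$, because TF-IDF weights are never negative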
Interpretation:
- 1.0 = The texts are identical
- 0.7-1.0 = High similarity, probably the same event
- 0.4-0.7 = Medium similarity, requires review
- 0.0-0.4 = Low similarity, probably different events
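The similarity measure and the interpretation bands can be sketched as follows. The vectors are represented as sparse `{term: weight}` dictionaries, which is an assumption made for this illustration; `cosine_similarity` and `classify` are hypothetical helper names. Assigning the boundary values 0.7 and 0.4 to the higher band is also an assumption, since the ranges above overlap at their edges.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def classify(similarity: float) -> str:
    """Map a similarity score to the interpretation bands listed above."""
    if similarity >= 0.7:
        return "probably the same event"
    if similarity >= 0.4:
        return "requires review"
    return "probably different events"


if __name__ == "__main__":
    fed_poly = {"fed": 1.0, "raise": 1.0, "december": 0.5}
    fed_kalshi = {"fed": 1.0, "raise": 1.0, "decision": 0.5}
    jara = {"jara": 1.0, "election": 1.0}
    for other in (fed_kalshi, jara):
        score = cosine_similarity(fed_poly, other)
        print(f"{score:.3f} -> {classify(score)}")
```

Two vectors with no terms in common always score 0.0, regardless of their weights, because every product in the dot product is zero.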
Example
[Interactive 3D plot of market vectors, not reproduced here.] The plot illustrates how cosine similarity works. In reality, the space has thousands of dimensions, but only three are shown to make the example visible.
Vectors that are close together (small angle, high cosine) represent similar markets, while distant vectors represent different markets.
Current process
Currently, markets are matched semi-automatically:
- Market download: All available markets are fetched from both platforms
- Automatic matching: The TF-IDF + cosine similarity algorithm proposes candidate pairs
- Manual review: Candidate pairs are reviewed by hand and only the correct ones are confirmed
- JSON saving: Validated pairs are stored in `monitoring_markets.json`
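The matching and saving steps might look roughly like the sketch below. Everything here is an assumption made for illustration: the function names, the candidate dictionary layout, and the layout of `monitoring_markets.json` are hypothetical, and the similarity function is passed in as a parameter. The `token_overlap` helper is a simple Jaccard stand-in used only to keep the example self-contained; it is not the TF-IDF cosine score the project uses.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def find_candidates(
    poly_questions: List[str],
    kalshi_questions: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.4,
) -> List[Dict[str, object]]:
    """Pair each Polymarket question with its most similar Kalshi question."""
    candidates = []
    for pq in poly_questions:
        best = max(kalshi_questions, key=lambda kq: similarity(pq, kq), default=None)
        if best is None:
            continue  # no Kalshi markets to compare against
        score = similarity(pq, best)
        if score >= threshold:
            candidates.append({"polymarket": pq, "kalshi": best, "score": round(score, 3)})
    # Highest-scoring candidates first, so a reviewer sees the likeliest matches on top.
    return sorted(candidates, key=lambda c: c["score"], reverse=True)


def save_confirmed(pairs: List[Dict[str, object]], path: str = "monitoring_markets.json") -> None:
    """Persist pairs that a human reviewer has confirmed (file layout is assumed)."""
    Path(path).write_text(json.dumps(pairs, indent=2), encoding="utf-8")


def token_overlap(a: str, b: str) -> float:
    """Stand-in similarity (Jaccard token overlap), NOT the TF-IDF cosine score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0
```

Keeping the manual review between `find_candidates` and `save_confirmed` reflects the semi-automatic workflow: the algorithm only proposes pairs, and a wrong pairing would turn a supposed hedge into two unrelated bets.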
Real-time price monitoring
Once the markets are matched, the system monitors their prices in real time.
Streaming architecture
graph LR
P[Polymarket<br>WebSocket] -->|Market Data| M[MarketPairMonitor]
K[Kalshi<br>WebSocket] -->|Market Data| M
subgraph Stream[Stream Processing]
M -->|Updates| DF[DataFrame<br>with prices]
DF -->|Data Stream| AC[ArbitrageCalculator<br>Detects arbitrage]
end
AC -->|Signal| AT[ArbitrageTrader<br>Executes orders]
style P fill:#2d2d2d,stroke:#fff,stroke-width:2px
style K fill:#2d2d2d,stroke:#fff,stroke-width:2px
style M fill:#1a1a1a,stroke:#3b82f6,stroke-width:2px
style DF fill:#1a1a1a,stroke:#3b82f6,stroke-width:2px
style AC fill:#1a1a1a,stroke:#10b981,stroke-width:2px
style AT fill:#1a1a1a,stroke:#f59e0b,stroke-width:2px
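The data flow in the diagram can be sketched with three small Python classes. Only the component names (`MarketPairMonitor`, `ArbitrageCalculator`, `ArbitrageTrader`) come from the diagram; their methods and fields are assumptions, a plain dictionary stands in for the DataFrame, a list of tuples stands in for the WebSocket feeds, and the trader only records signals instead of placing orders.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

Leg = Tuple[str, str]  # (platform, side), e.g. ("polymarket", "yes")


@dataclass
class MarketPairMonitor:
    """Keeps the latest price per (platform, side) for one matched market pair."""
    prices: Dict[Leg, float] = field(default_factory=dict)

    def update(self, platform: str, side: str, price: float) -> None:
        self.prices[(platform, side)] = price


class ArbitrageCalculator:
    """Checks both complementary directions after every price update."""
    DIRECTIONS = (
        (("polymarket", "yes"), ("kalshi", "no")),
        (("polymarket", "no"), ("kalshi", "yes")),
    )

    def __init__(self, min_roi_percent: float) -> None:
        self.min_roi_percent = min_roi_percent

    def check(self, monitor: MarketPairMonitor) -> Optional[dict]:
        best = None
        for leg_a, leg_b in self.DIRECTIONS:
            if leg_a not in monitor.prices or leg_b not in monitor.prices:
                continue  # still waiting for a price on one of the legs
            cost = monitor.prices[leg_a] + monitor.prices[leg_b]
            if not 0 < cost < 1:
                continue
            roi = (1 - cost) / cost * 100
            if roi >= self.min_roi_percent and (best is None or roi > best["roi_percent"]):
                best = {"legs": (leg_a, leg_b), "cost": cost, "roi_percent": roi}
        return best


class ArbitrageTrader:
    """Placeholder trader: records signals rather than sending real orders."""
    def __init__(self) -> None:
        self.executed: List[dict] = []

    def execute(self, signal: dict) -> None:
        self.executed.append(signal)


def run_stream(updates: Iterable[Tuple[str, str, float]], min_roi_percent: float = 1.0) -> List[dict]:
    """Feed simulated (platform, side, price) updates through the pipeline."""
    monitor, trader = MarketPairMonitor(), ArbitrageTrader()
    calculator = ArbitrageCalculator(min_roi_percent)
    for platform, side, price in updates:
        monitor.update(platform, side, price)
        signal = calculator.check(monitor)
        if signal is not None:
            trader.execute(signal)
    return trader.executed
```

In this sketch a signal fires on every update for which an opportunity is still present, so a real trader would need to deduplicate signals or track open positions before sending orders.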
Bot Configuration: config.json
The bot's behavior is controlled by the `config.json` file.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| `mode` | string | `"betting"`: automatically executes orders. `"listening"`: only monitors and saves opportunities without executing. `"off"`: bot inactive. |
| `max_absolute_bet` | number | Maximum money ($) to invest on each side of a market |
| `min_absolute_bet` | number | Minimum money ($) required to execute an operation |
| `min_ROI_per_bet` | number | Minimum ROI (%) required to consider an opportunity |
| `max_end_days` | integer | Only consider markets that end within the next N days |
| `debug_stop_first_bet` | boolean | If true, the bot stops after the first operation (for testing) |
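As an illustration of how these parameters fit together, the sketch below validates a configuration against the types in the table. The parameter names and allowed modes come from the table; the values in `EXAMPLE_CONFIG` are made up for the example, and the rules that every key is required and that `min_absolute_bet` must not exceed `max_absolute_bet` are assumptions, not documented behavior of the bot.

```python
import json
from typing import Any, Dict

ALLOWED_MODES = {"betting", "listening", "off"}

# Illustrative values only; they are not the project's defaults.
EXAMPLE_CONFIG = """
{
  "mode": "listening",
  "max_absolute_bet": 50,
  "min_absolute_bet": 5,
  "min_ROI_per_bet": 2.0,
  "max_end_days": 30,
  "debug_stop_first_bet": true
}
"""


def validate_config(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Check types and basic ranges; raise ValueError on the first problem."""
    if cfg.get("mode") not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    for key in ("max_absolute_bet", "min_absolute_bet", "min_ROI_per_bet"):
        value = cfg.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value < 0:
            raise ValueError(f"{key} must be a non-negative number")
    if cfg["min_absolute_bet"] > cfg["max_absolute_bet"]:
        raise ValueError("min_absolute_bet must not exceed max_absolute_bet")
    days = cfg.get("max_end_days")
    if isinstance(days, bool) or not isinstance(days, int) or days <= 0:
        raise ValueError("max_end_days must be a positive integer")
    if not isinstance(cfg.get("debug_stop_first_bet"), bool):
        raise ValueError("debug_stop_first_bet must be a boolean")
    return cfg


def load_config(text: str) -> Dict[str, Any]:
    """Parse JSON text and validate it."""
    return validate_config(json.loads(text))


if __name__ == "__main__":
    print(load_config(EXAMPLE_CONFIG))
```

Starting in `"listening"` mode with `debug_stop_first_bet` enabled is a cautious way to try a new configuration, since the bot then only records opportunities.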