Arbitrage in prediction markets
In Progress

Python Prediction-Markets

SurePred is an automated bot that detects and executes arbitrage opportunities between two prediction market platforms: Kalshi and Polymarket. The goal is to find situations where you can bet on both sides of the same event (yes and no) on different platforms, guaranteeing a profit regardless of the final outcome of the event.

What are Prediction Markets?

Prediction markets are platforms where people can buy and sell "contracts" on the outcome of future events. They function like financial markets, but instead of stocks, predictions are traded.

Example

Imagine a market about: "Will the Federal Reserve raise interest rates at the next meeting?"

  • If you believe yes, you buy a "Yes" contract for, say, $0.40
  • If you believe no, you buy a "No" contract for $0.60

When the event resolves:

  • If the Fed raises rates → the "Yes" contract is worth $1.00, the "No" is worth $0.00
  • If the Fed does not raise rates → the "Yes" contract is worth $0.00, the "No" is worth $1.00

What is Arbitrage?

Arbitrage is a strategy that takes advantage of price differences between markets to obtain a guaranteed profit, regardless of the outcome.

Arbitrage Example

Suppose we have the same event on two platforms:

Event: "Will Jeannette Jara win the 1st round of the 2025 Chilean presidential election?"

| Platform | YES Price | NO Price |
| --- | --- | --- |
| Polymarket | $0.45 | $0.57 |
| Kalshi | $0.52 | $0.46 |

Arbitrage strategy:

  • Buy YES on Polymarket: $0.45
  • Buy NO on Kalshi: $0.46
  • Total cost: $0.45 + $0.46 = $0.91

Possible results:

  • If Jara wins → we win $1.00 from Polymarket (YES) → Profit = $1.00 - $0.91 = $0.09 (9.89% ROI)
  • If Jara does not win → we win $1.00 from Kalshi (NO) → Profit = $1.00 - $0.91 = $0.09 (9.89% ROI)

We earn the same regardless of the outcome.

Arbitrage formula

For arbitrage to exist, the sum of the complementary prices must be less than 1:

$$P_{yes}^{Polymarket} + P_{no}^{Kalshi} < 1$$

or

$$P_{no}^{Polymarket} + P_{yes}^{Kalshi} < 1$$

The ROI (Return on Investment) is calculated as:

$$ROI = \frac{1 - (P_1 + P_2)}{P_1 + P_2} \times 100\%$$

Where $P_1$ and $P_2$ are the prices of the complementary positions.
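
As a minimal sketch of the two conditions and the ROI formula above, the following checks both cross-platform combinations and reports the cheaper one. The function name `find_arbitrage` and its return format are illustrative assumptions, not part of SurePred, and the sketch (like the formulas) ignores fees and slippage.

```python
def find_arbitrage(poly_yes, poly_no, kalshi_yes, kalshi_no):
    """Return the cheapest cross-platform hedge, or None if it costs $1 or more."""
    # Each candidate pairs a YES on one platform with a NO on the other.
    candidates = [
        ("YES on Polymarket + NO on Kalshi", poly_yes + kalshi_no),
        ("NO on Polymarket + YES on Kalshi", poly_no + kalshi_yes),
    ]
    legs, cost = min(candidates, key=lambda c: c[1])
    if cost >= 1.0:
        return None  # the $1.00 payout would not cover the cost
    roi_pct = (1.0 - cost) / cost * 100.0
    return {"legs": legs, "cost": cost, "roi_pct": roi_pct}


# Prices from the Jara example above.
opportunity = find_arbitrage(poly_yes=0.45, poly_no=0.57, kalshi_yes=0.52, kalshi_no=0.46)
print(opportunity["legs"], round(opportunity["cost"], 2), round(opportunity["roi_pct"], 2))
```

With the example prices, the first combination costs $0.91 and the second $1.09, so only the first qualifies.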

How much money to invest in each platform?

To execute the arbitrage, we need to buy the same number of contracts on both platforms. Each contract pays $1.00 if it wins.

Step 1: Calculate the maximum contracts possible

Given the budget available on each platform, $B_{Polymarket}$ and $B_{Kalshi}$, the number of contracts is capped by the platform where the budget buys fewer contracts:

$$N_{contracts} = \left\lfloor \min\left(\frac{B_{Polymarket}}{P_{yes}^{Polymarket}}, \frac{B_{Kalshi}}{P_{no}^{Kalshi}}\right) \right\rfloor$$

Step 2: Calculate the money to invest in each platform

$$Money_{Polymarket} = N_{contracts} \times P_{yes}^{Polymarket}$$

$$Money_{Kalshi} = N_{contracts} \times P_{no}^{Kalshi}$$

Example with a budget of $100:

Suppose we have $50 on each platform with the prices from the previous example:

  • YES Price Polymarket: $0.45
  • NO Price Kalshi: $0.46

Maximum contracts:

$$N = \left\lfloor \min\left(\frac{50}{0.45}, \frac{50}{0.46}\right) \right\rfloor = \left\lfloor \min(111.1, 108.7) \right\rfloor = 108$$

Investment per platform:

  • Polymarket: $108 \times 0.45 = \$48.60$
  • Kalshi: $108 \times 0.46 = \$49.68$
  • Total invested: $98.28

Guaranteed profit:

  • Payout upon resolution: $108 \times 1.00 = \$108.00$
  • Net profit: $108.00 - $98.28 = $9.72 (9.89% ROI)
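
The two sizing steps can be sketched in a few lines. This is an illustration rather than the bot's actual code: `size_position` is a hypothetical helper, and it assumes prices and budgets are given in whole cents so that integer arithmetic avoids floating-point rounding in the floor division.

```python
def size_position(budget_a_cents, price_a_cents, budget_b_cents, price_b_cents):
    """Size an equal-contract position across two platforms (amounts in cents)."""
    # Step 1: the platform whose budget buys fewer contracts sets the cap.
    contracts = min(budget_a_cents // price_a_cents, budget_b_cents // price_b_cents)
    # Step 2: money to commit on each platform.
    cost_a = contracts * price_a_cents
    cost_b = contracts * price_b_cents
    total = cost_a + cost_b
    payout = contracts * 100  # each winning contract pays $1.00
    profit = payout - total
    roi_pct = profit / total * 100.0 if total else 0.0
    return {
        "contracts": contracts,
        "cost_a": cost_a,
        "cost_b": cost_b,
        "total": total,
        "profit": profit,
        "roi_pct": roi_pct,
    }


# $50 on each platform, YES at $0.45 on Polymarket and NO at $0.46 on Kalshi.
position = size_position(5000, 45, 5000, 46)
print(position["contracts"], position["total"] / 100, position["profit"] / 100)
```

The result matches the worked example: 108 contracts, $98.28 invested, $9.72 profit.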

The problem: finding market pairs between platforms

One of the biggest challenges of the project is identifying when two markets on different platforms refer to the same event. Platforms do not use the same identifiers or the same wording.

Problem example

The same political event can appear as:

  • Polymarket: "Will Jeannette Jara win the 1st round of the 2025 Chilean presidential election?"
  • Kalshi: "Will Jeannette Jara win the 2025 Chile Presidential election first round?"

Or an economic event:

  • Polymarket: "Will the Fed raise interest rates in December 2025?"
  • Kalshi: "Federal Reserve December 2025 rate decision: Raise?"

The questions are similar but not identical, which complicates automatic matching.


Current solution: TF-IDF vectors + cosine similarity

To solve the matching problem, the system uses TF-IDF (Term Frequency-Inverse Document Frequency) to convert text questions into numerical vectors, and then calculates the cosine similarity between them.

What is TF-IDF?

TF-IDF is a natural language processing technique that converts text into numerical vectors, assigning more weight to words that are:

  1. Frequent in the document (TF - Term Frequency)
  2. Infrequent in the set of documents (IDF - Inverse Document Frequency)

$$TF\text{-}IDF(t, d) = TF(t, d) \times IDF(t)$$

Where:

  • $TF(t, d)$ = frequency of term $t$ in document $d$
  • $IDF(t) = \log\left(\frac{N}{df(t)}\right)$
  • $N$ = total number of documents
  • $df(t)$ = number of documents containing term $t$
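
To make the definitions concrete, here is a small standard-library sketch that applies them literally (raw term counts for TF, unsmoothed natural-log IDF). The helper names `tokenize` and `tfidf_vectors` and the simple tokenizer are assumptions for illustration. Library implementations often use smoothed or normalized variants, so their numbers can differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # df(t): number of documents that contain term t at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # TF(t, d): raw count of t in d
        vectors.append({t: count * math.log(n_docs / df[t]) for t, count in tf.items()})
    return vectors


questions = [
    "Will the Fed raise interest rates in December 2025?",
    "Federal Reserve December 2025 rate decision: Raise?",
    "Will Jeannette Jara win the 1st round of the 2025 Chilean presidential election?",
]
vectors = tfidf_vectors(questions)
print(vectors[2]["jara"], vectors[0]["2025"])
```

A term such as "2025", which appears in every question, gets weight 0, while a term unique to one question, such as "jara", gets the largest IDF.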

What is cosine similarity?

Once we have the TF-IDF vectors, we measure how similar they are by calculating the cosine of the angle between them:

$$\text{similarity}(\vec{A}, \vec{B}) = \cos(\theta) = \frac{\vec{A} \cdot \vec{B}}{||\vec{A}|| \times ||\vec{B}||}$$

Where:

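  • $\vec{A} \cdot \vec{B}$ = dot product of the vectors
  • $||\vec{A}||$ = magnitude (norm) of vector $\vec{A}$
  • Because TF-IDF weights are never negative, the result is in the range $[0, 1]$ (for arbitrary vectors the cosine ranges over $[-1, 1]$)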

Interpretation:

  • 1.0 = The texts are identical
  • 0.7-1.0 = High similarity, probably the same event
  • 0.4-0.7 = Medium similarity, requires review
  • 0.0-0.4 = Low similarity, probably different events
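
The formula and the interpretation bands can be sketched as follows, using sparse `{term: weight}` dictionaries as vectors. The function names and the toy weights are illustrative assumptions. Since the bands above share their endpoints, this sketch assigns a score of exactly 0.7 or 0.4 to the higher band.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def interpret(score):
    """Map a similarity score to the bands listed above."""
    if score >= 0.7:
        return "high"
    if score >= 0.4:
        return "medium"
    return "low"


# Toy vectors with made-up unit weights.
jara_a = {"jara": 1.0, "round": 1.0, "chile": 1.0}
jara_b = {"jara": 1.0, "round": 1.0, "election": 1.0}
fed = {"fed": 1.0, "rates": 1.0}

score_close = cosine_similarity(jara_a, jara_b)
score_far = cosine_similarity(jara_a, fed)
print(round(score_close, 3), interpret(score_close), interpret(score_far))
```

The two Jara vectors share two of three terms, giving a cosine of 2/3, while the Fed vector shares none and scores 0.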

Example

[Interactive 3D plot: market question vectors shown in three dimensions]

The plot illustrates how cosine similarity works. The real vector space has thousands of dimensions; only three are shown so that the example can be visualized.

Vectors that are close together (small angle, high cosine) represent similar markets, while vectors that are far apart represent different markets.

Current process

Market matching is currently semi-automatic:

  1. Market download: All available markets are fetched from both platforms
  2. Automatic matching: The TF-IDF + cosine similarity algorithm proposes candidate pairs
  3. Manual review: Candidate pairs are checked by hand and the correct ones are confirmed
  4. Saving to JSON: Validated pairs are stored in monitoring_markets.json
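
Steps 2 to 4 could look roughly like the sketch below. Everything in it is an assumption made for illustration: the field names, the `confirmed` flag, the 0.4 cut-off, and the similarity scores are hypothetical, and the real schema of monitoring_markets.json is not described here. The example writes to a temporary directory so it has no side effects.

```python
import json
import tempfile
from pathlib import Path


def select_candidates(scored_pairs, min_similarity=0.4):
    """Step 2 output: keep pairs worth a manual look, best score first."""
    keep = [p for p in scored_pairs if p["similarity"] >= min_similarity]
    return sorted(keep, key=lambda p: p["similarity"], reverse=True)


def save_validated(pairs, path):
    """Step 4: store only the pairs that a reviewer marked as confirmed."""
    confirmed = [p for p in pairs if p.get("confirmed")]
    Path(path).write_text(json.dumps(confirmed, indent=2), encoding="utf-8")
    return confirmed


# Hypothetical scores; not measured values.
scored = [
    {"polymarket": "Will the Fed raise interest rates in December 2025?",
     "kalshi": "Federal Reserve December 2025 rate decision: Raise?",
     "similarity": 0.55},
    {"polymarket": "Will Jeannette Jara win the 1st round of the 2025 Chilean presidential election?",
     "kalshi": "Will Jeannette Jara win the 2025 Chile Presidential election first round?",
     "similarity": 0.82},
    {"polymarket": "Will the Fed raise interest rates in December 2025?",
     "kalshi": "Will Jeannette Jara win the 2025 Chile Presidential election first round?",
     "similarity": 0.12},
]

candidates = select_candidates(scored)
candidates[0]["confirmed"] = True  # step 3: simulated manual confirmation

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "monitoring_markets.json"
    saved = save_validated(candidates, out_path)
    reloaded = json.loads(out_path.read_text(encoding="utf-8"))

print(len(candidates), len(saved))
```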

Real-time price monitoring

Once markets are matched, the system monitors their prices in real time.

Streaming architecture

graph LR
    P[Polymarket<br>WebSocket] -->|Market Data| M[MarketPairMonitor]
    K[Kalshi<br>WebSocket] -->|Market Data| M
    
    subgraph Stream[Stream Processing]
        M -->|Updates| DF[DataFrame<br>with prices]
        DF -->|Data Stream| AC[ArbitrageCalculator<br>Detects arbitrage]
    end
    
    AC -->|Signal| AT[ArbitrageTrader<br>Executes orders]
    
    style P fill:#2d2d2d,stroke:#fff,stroke-width:2px
    style K fill:#2d2d2d,stroke:#fff,stroke-width:2px
    style M fill:#1a1a1a,stroke:#3b82f6,stroke-width:2px
    style DF fill:#1a1a1a,stroke:#3b82f6,stroke-width:2px
    style AC fill:#1a1a1a,stroke:#10b981,stroke-width:2px
    style AT fill:#1a1a1a,stroke:#f59e0b,stroke-width:2px
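
The data flow in the diagram can be sketched without any network code. The class names come from the diagram, but their methods and internals here are assumptions. A plain dictionary stands in for the DataFrame, a hard-coded list stands in for the two WebSocket feeds, and the trader only records signals instead of placing orders.

```python
class MarketPairMonitor:
    """Keeps the latest quote per (platform, side) for one matched pair."""

    def __init__(self):
        self.prices = {}

    def update(self, platform, side, price):
        self.prices[(platform, side)] = price
        return dict(self.prices)  # snapshot handed to the calculator


class ArbitrageCalculator:
    """Checks both cross-platform combinations on every snapshot."""

    COMBINATIONS = [
        (("polymarket", "yes"), ("kalshi", "no")),
        (("polymarket", "no"), ("kalshi", "yes")),
    ]

    def __init__(self, min_roi_pct=0.0):
        self.min_roi_pct = min_roi_pct

    def check(self, prices):
        for leg_a, leg_b in self.COMBINATIONS:
            if leg_a in prices and leg_b in prices:
                cost = prices[leg_a] + prices[leg_b]
                if cost < 1.0:
                    roi_pct = (1.0 - cost) / cost * 100.0
                    if roi_pct >= self.min_roi_pct:
                        return {"buy": [leg_a, leg_b], "cost": cost, "roi_pct": roi_pct}
        return None


class ArbitrageTrader:
    """Placeholder: records signals where a real trader would place orders."""

    def __init__(self):
        self.signals = []

    def execute(self, signal):
        self.signals.append(signal)


monitor, calculator, trader = MarketPairMonitor(), ArbitrageCalculator(), ArbitrageTrader()
feed = [  # simulated stream of price updates
    ("polymarket", "yes", 0.45),
    ("kalshi", "yes", 0.52),
    ("polymarket", "no", 0.57),
    ("kalshi", "no", 0.46),
]
for platform, side, price in feed:
    signal = calculator.check(monitor.update(platform, side, price))
    if signal is not None:
        trader.execute(signal)

print(len(trader.signals))
```

Only the last update completes a combination that costs less than $1.00, so the trader receives a single signal.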

Bot Configuration: config.json

The bot's behavior is controlled by the config.json file:

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| mode | string | "betting": automatically executes orders; "listening": only monitors and saves opportunities without executing; "off": bot inactive |
| max_absolute_bet | number | Maximum money ($) to invest on each side of a market |
| min_absolute_bet | number | Minimum money ($) required to execute an operation |
| min_ROI_per_bet | number | Minimum ROI (%) required to consider an opportunity |
| max_end_days | integer | Only consider markets that end within the next N days |
| debug_stop_first_bet | boolean | If true, the bot stops after the first operation (for testing) |
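
As an illustration of how these parameters might interact, the sketch below parses an example configuration and decides what to do with one opportunity. The parameter names come from the table, but the values are made up, and the helpers `load_config` and `decide` encode one possible reading of the descriptions (for example, capping the stake at max_absolute_bet before comparing it with min_absolute_bet), not necessarily the bot's actual logic.

```python
import json

# Hypothetical values for illustration only.
EXAMPLE_CONFIG = """
{
  "mode": "listening",
  "max_absolute_bet": 50,
  "min_absolute_bet": 5,
  "min_ROI_per_bet": 2.0,
  "max_end_days": 30,
  "debug_stop_first_bet": true
}
"""

VALID_MODES = {"betting", "listening", "off"}


def load_config(text):
    """Parse the JSON text and reject obviously inconsistent settings."""
    config = json.loads(text)
    if config["mode"] not in VALID_MODES:
        raise ValueError(f"unknown mode: {config['mode']!r}")
    if config["min_absolute_bet"] > config["max_absolute_bet"]:
        raise ValueError("min_absolute_bet exceeds max_absolute_bet")
    return config


def decide(config, roi_pct, available_per_side, days_to_end):
    """Return 'skip', 'record', or 'execute' for one opportunity."""
    if config["mode"] == "off":
        return "skip"
    if roi_pct < config["min_ROI_per_bet"]:
        return "skip"
    if days_to_end > config["max_end_days"]:
        return "skip"
    stake = min(available_per_side, config["max_absolute_bet"])
    if stake < config["min_absolute_bet"]:
        return "skip"
    return "execute" if config["mode"] == "betting" else "record"


config = load_config(EXAMPLE_CONFIG)
print(decide(config, roi_pct=9.89, available_per_side=48.60, days_to_end=10))
```

In "listening" mode a qualifying opportunity is only recorded; the same opportunity would be executed with "mode": "betting".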