
Historical Replay

Starting from April 18, 2026, we save everything our Data Stream emits, every hour. Perfect for backtesting strategies, replaying market conditions, research, or anything else you can think of.

How it works

Every hour we flush all incoming events to a compressed archive named after the UTC hour it covers:

https://replay.pumpapi.io/YEAR/MONTH/DAY/HOUR.jsonl.zst

For example, the events between 2026-04-18 01:00:00 UTC and 2026-04-18 02:00:00 UTC live at:

https://replay.pumpapi.io/2026/04/18/01.jsonl.zst
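The URL scheme maps directly onto a strftime format string. A small helper, using the same pattern as the full example further down:

```python
from datetime import datetime, timezone

def archive_url(hour_dt: datetime) -> str:
    """Build the archive URL for the hour starting at hour_dt (UTC)."""
    return f"https://replay.pumpapi.io/{hour_dt:%Y/%m/%d/%H}.jsonl.zst"

url = archive_url(datetime(2026, 4, 18, 1, tzinfo=timezone.utc))
print(url)  # https://replay.pumpapi.io/2026/04/18/01.jsonl.zst
```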

We use Zstandard (zstd) compression so downloads are as fast as possible:

  • ~400 MB per hour compressed
  • ~2 GB in memory after decompression
  • ~1 second to decompress one hour

Each line in the decompressed file is a single JSON event — the exact same format you get from the live Data Stream.

Browsing available archives

You can browse what's saved directly in your browser:

  • https://replay.pumpapi.io/2026/ — list all months in 2026
  • https://replay.pumpapi.io/2026/04/ — list all days in April 2026
  • https://replay.pumpapi.io/2026/04/18/ — list all hourly files for April 18
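If you want to discover archives programmatically rather than click through, you can scrape the index pages. This sketch assumes the day index is a plain HTML listing whose links end in `NN.jsonl.zst` — verify that against the real pages before relying on it:

```python
import re

def extract_hours(index_html: str) -> list:
    """Pull the hourly archive names out of a day index page.

    Assumes the page contains links ending in NN.jsonl.zst; the
    regex deduplicates hours that appear in both href and link text.
    """
    return sorted(set(re.findall(r"(\d{2})\.jsonl\.zst", index_html)))

sample = '<a href="00.jsonl.zst">00.jsonl.zst</a> <a href="01.jsonl.zst">01.jsonl.zst</a>'
print(extract_hours(sample))  # ['00', '01']
```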

Example: Replay the last N hours

The snippet below takes an HOURS variable and streams every event from that window. For example, if it's currently 12:45 UTC and HOURS = 2, you'll replay every event between 10:00 and 12:00 UTC.

import asyncio
from datetime import datetime, timezone, timedelta
import orjson as json  # or use the standard json module (orjson is faster)
import aiohttp
import zstandard as zstd

HOURS = 2           # last 2 hours
ALLOW_GAPS = False  # if True, silently skip hours with no archive

async def fetch(session, hour_dt):
    url = f"https://replay.pumpapi.io/{hour_dt:%Y/%m/%d/%H}.jsonl.zst"
    print(f"[fetch] {url}")
    async with session.get(url) as r:
        if r.status == 404:
            if not ALLOW_GAPS:
                raise RuntimeError(f"missing: {url}")
            return None
        r.raise_for_status()
        return await r.read()

async def main():
    # Truncate to the top of the current hour, then walk back HOURS hours,
    # oldest first.
    now = datetime.now(timezone.utc).replace(minute=0, second=0, microsecond=0)
    hours = [now - timedelta(hours=i) for i in range(HOURS, 0, -1)]
    dctx = zstd.ZstdDecompressor()
    # An explicit ClientTimeout() disables aiohttp's default 5-minute total
    # timeout, which a ~400 MB download can exceed on slow connections.
    async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout()) as session:
        for hour_dt in hours:
            compressed = await fetch(session, hour_dt)
            if compressed is None:
                continue
            print("downloaded")
            decompressed = dctx.decompress(compressed)
            print("decompressed")
            for line in decompressed.splitlines():
                event = json.loads(line.decode())
                print(event)

if __name__ == "__main__":
    asyncio.run(main())

Backtesting

When you backtest on replay data, remember that a huge portion of transactions on AMMs like pump.fun are sent through Jito Bundles. A bundle must be treated as one big atomic transaction — you cannot insert your own transaction in the middle of one.

Bundles are not explicitly labeled in the event stream, but you can detect them heuristically:

  • Every event has a millisecond-precision timestamp. Transactions inside the same bundle are executed simultaneously, so their timestamps are almost identical — typically within 1–3 ms of each other.
  • Bundled transactions usually interact with the same token.

A solid rule of thumb: treat any group of transactions that hit the same token within ~3 ms as a single bundled event. To be certain, you can cross-check any suspicious cluster on https://explorer.jito.wtf/.
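The rule of thumb above can be sketched as a single grouping pass over time-sorted events. Note that `token` and `ts` are illustrative field names here, not the stream's actual schema:

```python
def cluster_bundles(events, window_ms=3):
    """Group consecutive same-token events whose timestamps sit within
    window_ms of the previous event -- a rough proxy for a Jito bundle.

    `events` are dicts with illustrative `token` and `ts` (millisecond)
    keys; map these onto the real event schema before using this.
    """
    clusters = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        last = clusters[-1] if clusters else None
        if (last is not None
                and last[0]["token"] == ev["token"]
                and ev["ts"] - last[-1]["ts"] <= window_ms):
            last.append(ev)        # extend the current suspected bundle
        else:
            clusters.append([ev])  # start a new group
    return clusters

events = [
    {"token": "A", "ts": 100},
    {"token": "A", "ts": 101},
    {"token": "A", "ts": 102},  # three A-trades within 3 ms -> one bundle
    {"token": "B", "ts": 250},
    {"token": "A", "ts": 400},  # same token, but far outside the window
]
print([len(c) for c in cluster_bundles(events)])  # [3, 1, 1]
```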

Also, add a slightly larger-than-usual latency buffer to your simulation. Real execution will always be a bit slower than replay, and being conservative here prevents your backtest from looking more profitable than reality.
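One simple way to apply that buffer, with illustrative numbers (measure your real latency against the live stream and pick your own margin): when your strategy decides to act at time t, only allow the simulated fill at t plus the buffer.

```python
OBSERVED_LATENCY_MS = 120  # example: what you measured against the live stream
SAFETY_MARGIN_MS = 80      # example: extra pessimism for the backtest

def earliest_fill_ms(decision_ts_ms: int) -> int:
    """Earliest timestamp at which a simulated order may execute."""
    return decision_ts_ms + OBSERVED_LATENCY_MS + SAFETY_MARGIN_MS

print(earliest_fill_ms(1_000))  # 1200
```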

localTimestamp

In the replay archives you'll encounter an extra field that does not exist in the live Data Stream: localTimestamp. It's the time at which our replay server in Frankfurt am Main (Germany) received the transaction — which may be slightly later than what you'd observe on your end in real time. You generally don't need it for backtesting purposes, but it can be useful for checking whether your own server had good latency at a given moment.
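For that latency check, one sketch: log the time your own server received each event live, then diff it against localTimestamp from the archive. localTimestamp is the real archive field; the receive time you compare against is whatever you recorded yourself:

```python
def latency_vs_frankfurt_ms(my_receive_ts_ms: int, event: dict) -> int:
    """Positive result: your server saw this event later than our
    Frankfurt replay server did; negative: earlier."""
    return my_receive_ts_ms - event["localTimestamp"]

event = {"localTimestamp": 1765929600151}
print(latency_vs_frankfurt_ms(1765929600180, event))  # 29
```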


Need help? Join our Telegram group.