Twenty years of academic research underlies this work. This page covers the academic history, the mathematics, and how we adapted classical proof-of-storage schemes to IPFS’s content-addressed block model.
Looking for the interactive intro? Start with the warehouse game →
A dishonest storage provider who received your file can compute its SHA-256 hash once, store just those 32 bytes, and then delete the actual data. Every time you ask “do you have my file?” they answer correctly — because they computed the hash before deleting. They need O(1) storage to maintain the illusion of O(n) storage.
Any verification scheme based on a fixed, predictable question fails this way. The solution is to make challenges unpredictable — chosen after the data is committed — so the prover must hold the actual blocks to answer them. No amount of precomputation helps if the question changes every time.
This is why proof-of-storage schemes are interactive(or use pseudorandom challenges tied to a fresh seed): the prover can’t know in advance which blocks will be asked about, so they must keep all of them.
Proves the storage provider holds your data at the moment of the challenge. O(1) or O(log n) proof size. Unlimited challenges. Does notenable file recovery from audit logs — it’s a detection system, not a backup.
A strictly stronger guarantee: by accumulating enough challenge-response pairs, a third party can reconstruct the full file — even if the provider is misbehaving. Detection plus recoverability.
Twenty years of cryptography research from multiple independent groups underlies this work. These papers tackle different problems — some are theoretical foundations, others are practical constructions — and Pinion’s implementation draws directly from them.
The papers above were designed with traditional file storage in mind — fixed-size blocks, sequential integer indices. The question for Pinion is: how do you apply these proofs to IPFS, where blocks aren’t numbered integers but 256-bit content hashes arranged in a Merkle DAG?
The papers above describe files as flat arrays of fixed-size blocks. Block 1, block 2, block 3 — sequential integer indices. The protocols are designed around this model: you can challenge block 47, compute its tag, and the index is part of the tag computation. Simple, but rigid.
IPFS stores content differently. Data is organized as a Merkle DAG — a directed acyclic graph of content-addressed blocks, each identified by its CID (a cryptographic hash of its contents). A file is a root CID; walk the DAG and you discover every leaf block. Block identifiers aren’t sequential integers — they’re 256-bit hashes. And two different files that share content share the same block, identified by the same CID. Deduplication is structural.
To use proof-of-storage on IPFS data, we re-imagine the DAG’s blocks as entries in a very large sparse array indexed by CID bytes. Not every element is populated — the CID space is astronomically large. But for any given root, walking the DAG gives us the complete set of populated entries.
Not every proof-of-storage protocol handles sparse, arbitrary block IDs. The schemes that require sequential integer indices for their tag computation can’t be used directly. Pinion uses only protocols where the block identifier can be arbitrary bytes — fed into a PRF or hash — so that CID bytes work as naturally as sequential integers.
Pinion wraps all five schemes behind a single Go interface called the line protocol:
Tagger → TagBlocks(store) → []Tag Challenger → Challenge(blockIDs) → (challenge, validator) Prover → Prove(challenge, blockStore) → proof Validator → Verify(challenge, proof) → bool
Switching protocols requires changing only the string you pass to POST /challenge-key. Everything else — how you generate challenges, submit them, and verify responses — is identical. The interface is protocol-agnostic by design.
Each audit round starts with a fresh random 32-byte seed S. Everything about the challenge — which blocks to sample, and with what coefficients — is derived deterministically from that seed using a pseudorandom function (PRF):
The same seed produces the same challenge every time — so both the client (who generates the challenge) and the server (who constructs the proof) can independently compute which blocks are involved. No shared state. No coordination beyond passing the seed. Fresh seed each round means fresh, unpredictable challenge each round.
Because block IDs are arbitrary bytes, the PRF ranking works identically whether you’re challenging 3 blocks or 3 million. The IPFS sparse-array representation slots in naturally: CID bytes become the items being ranked, and the top-c selection is your challenge set.
Here’s the key theorem. If a fraction p of your blocks are sampled each round, and each round is independent, then after k rounds your confidence that no corruption has gone undetected is:
C(k) = 1 − (1 − p)kprobability of detecting corruption in at least one of k rounds, given p = fraction of blocks sampled per round
Each round is independent — like asking about a different random page each time. The probability of missingcorruption multiplies: if you have a 10% chance of catching it each round, after 10 rounds you’ve had 10 independent shots at it, and the chance of missing all of them is only 0.910≈ 35%. After 44 rounds at 10% sampling, you’re above 99% confidence.
The exponential convergence means you don’t need to download the file to trust it. You just need to keep asking — and the math does the rest.
Pinion implements all five schemes. The right choice depends on what properties you need:
| Protocol | Type | Key Type | Proof Size | Max Challenges | Verified Updates | Public Verification |
|---|---|---|---|---|---|---|
| Ateniese | PDP | RSA private | O(1) | Unlimited | No | No |
| Erway | PDP + Updates | RSA public | O(log n) | Unlimited | Yes | Yes |
| SW-Private | POR | Symmetric PRF | O(S) field elements | Unlimited | No | No |
| SW-Public | POR | EC keypair (BN256) | O(S) group elements | Unlimited | No | Yes (pairings) |
| BJO | POR | Symmetric PRF | O(1) per round | Bounded (Q sentinels) | No | No |
High-probability detection of corruption or loss, with O(1) or O(log n) proof sizes. Does not enable file recovery from challenge logs. Best for continuous monitoring — you get strong statistical confidence with minimal overhead.
A stricter guarantee: not only can you detect corruption, but a third party can recover the full file by accumulating enough challenge-response pairs. If Pinion ever misbehaves, the audit log becomes a recovery mechanism.
Here’s the part that distinguishes a proof system from a promise: the POST /prover/prove endpoint requires no authentication. Anyone who obtains a client_setup and a list of block_ids can issue a challenge and verify the response locally — no Pinion involvement in the verification step.
Any Pinion account holder can share their key_id with you. Call GET /prover/api/v1/setup?key_id=<id> and you get back everything you need to construct and verify a challenge.
# 1. Get the client setup and block list
curl https://<env>.pinion.build/prover/api/v1/setup?key_id=<key_id>
# 2. Derive a challenge from a fresh random seed (client-side, offline)
# → produces: challenge bytes + a local validator
# 3. Submit the challenge (no authentication needed)
curl -X POST https://<env>.pinion.build/prover/prove \
-H "Content-Type: application/json" \
-d '{"key_id": "<key_id>", "roots": [], "challenge": <base64>}'
# 4. Verify the proof locally against your validator
# → true = data intact | false = tampering detectedWe encourage you to try to catch us.
Not because we think you will — but because a proof system you can audit is more trustworthy than one you have to believe. You don’t need to trust our infrastructure, our team, or our uptime page. You need to trust the mathematics, and the mathematics doesn’t require faith.
Our storage-proofs benchmark site shows how each protocol implementation compares to theoretical predictions from the papers — detection probability curves measured against simulated corruption scenarios, timing data for setup, prove, and verify operations, and extraction success rates across varying file sizes. The curves you see there are the same formula as the graph above, measured rather than modeled.
“We didn’t invent these proofs. Ateniese, Erway, Juels, Kaliski, Shacham, Waters, Bowers, Oprea, Cash, Küpçü, Wichs, Dziembowski, Faust, Pietrzak, Fisch, and Groth did. We implemented them faithfully, tested them against the papers, and gave you the API.”
The rest is mathematics.