Storage Proofs/The Cryptography

Storage Proofs: The Cryptography

Twenty years of academic research underlies this work. This page covers the academic history, the mathematics, and how we adapted classical proof-of-storage schemes to IPFS’s content-addressed block model.

Looking for the interactive intro? Start with the warehouse game →

Why a Checksum Isn’t Enough

The “store-only-the-hash” attack

A dishonest storage provider who received your file can compute its SHA-256 hash once, store just those 32 bytes, and then delete the actual data. Every time you ask “do you have my file?” they answer correctly — because they computed the hash before deleting. They need O(1) storage to maintain the illusion of O(n) storage.

Any verification scheme based on a fixed, predictable question fails this way. The solution is to make challenges unpredictable — chosen after the data is committed — so the prover must hold the actual blocks to answer them. No amount of precomputation helps if the question changes every time.

This is why proof-of-storage schemes are interactive(or use pseudorandom challenges tied to a fresh seed): the prover can’t know in advance which blocks will be asked about, so they must keep all of them.

PDP — Provable Data Possession

Proves the storage provider holds your data at the moment of the challenge. O(1) or O(log n) proof size. Unlimited challenges. Does notenable file recovery from audit logs — it’s a detection system, not a backup.

POR — Proof of Retrievability

A strictly stronger guarantee: by accumulating enough challenge-response pairs, a third party can reconstruct the full file — even if the provider is misbehaving. Detection plus recoverability.

Key Papers

Twenty years of cryptography research from multiple independent groups underlies this work. These papers tackle different problems — some are theoretical foundations, others are practical constructions — and Pinion’s implementation draws directly from them.

PDP2007
Provable Data Possession at Untrusted Stores
Ateniese, Burns, Curtmola, Herring, Kissner, Peterson & Song
First scalable PDP — O(1) proof size, unlimited challenges, homomorphic tag aggregation
POR2007
PORs: Proofs of Retrievability for Large Files
Juels & Kaliski
First proof of retrievability — encrypted sentinels + Reed-Solomon codes enable full file recovery from challenge logs
POR2008
Compact Proofs of Retrievability
Shacham & Waters
Unlimited-challenge POR using linear algebra over Z_p; public-key variant uses bilinear pairings for third-party verification
PDP+Updates2009
Dynamic Provable Data Possession
Erway, Küpçü, Papamanthou & Tamassia
Adds cryptographically verifiable updates (insert/modify/delete) via authenticated skip lists; also supports public verification
POR2009
Proofs of Retrievability: Theory and Implementation
Bowers, Juels & Oprea
Combines SW's linear inner code with JK's RS outer code — strongest provable extraction guarantee below a corruption threshold
Dynamic POR2013
Dynamic Proofs of Retrievability via Oblivious RAM
Cash, Küpçü & Wichs
Replaces Erway's skip lists with ORAM — cleaner dynamic POR whose security reduces to well-studied ORAM assumptions rather than bespoke authenticated structures
Dynamic POR2013
Practical Dynamic Proofs of Retrievability
Shi, Stefanov & Papamanthou
Implementation-focused companion to Cash et al. — concrete constructions with measured overhead, showing dynamic POR is deployable, not just theoretically possible
PoSpace2015
Proofs of Space
Dziembowski, Faust, Kolmogorov & Pietrzak
Proves storage of random data (not a specific file) via graph pebbling — foundational for Chia's blockchain consensus; storage capacity rather than file possession
PoRep2019
PoReps: Proofs of Space on Useful Data
Fisch (Protocol Labs)
Key distinction from classical POR: the prover must be the first to encode — preventing outsourcing the encoding. Encoding binds data to a specific prover. Foundation of Filecoin's proof system.
ZK-SNARK2019–present
Winning PoSt & Window PoSt (Groth16 over PoRep)
Protocol Labs / Filecoin
Wraps Groth16 SNARKs (2016) around PoRep — the storage proof itself becomes a SNARK, enabling constant-size on-chain verification regardless of data volume

From Theory to IPFS: Blocks, Tags, and Sparse Arrays

The papers above were designed with traditional file storage in mind — fixed-size blocks, sequential integer indices. The question for Pinion is: how do you apply these proofs to IPFS, where blocks aren’t numbered integers but 256-bit content hashes arranged in a Merkle DAG?

Standard proof-of-storage blocks

The papers above describe files as flat arrays of fixed-size blocks. Block 1, block 2, block 3 — sequential integer indices. The protocols are designed around this model: you can challenge block 47, compute its tag, and the index is part of the tag computation. Simple, but rigid.

IPFS blocks

IPFS stores content differently. Data is organized as a Merkle DAG — a directed acyclic graph of content-addressed blocks, each identified by its CID (a cryptographic hash of its contents). A file is a root CID; walk the DAG and you discover every leaf block. Block identifiers aren’t sequential integers — they’re 256-bit hashes. And two different files that share content share the same block, identified by the same CID. Deduplication is structural.

To use proof-of-storage on IPFS data, we re-imagine the DAG’s blocks as entries in a very large sparse array indexed by CID bytes. Not every element is populated — the CID space is astronomically large. But for any given root, walking the DAG gives us the complete set of populated entries.

Not every proof-of-storage protocol handles sparse, arbitrary block IDs. The schemes that require sequential integer indices for their tag computation can’t be used directly. Pinion uses only protocols where the block identifier can be arbitrary bytes — fed into a PRF or hash — so that CID bytes work as naturally as sequential integers.

The Line Protocol: one interface, five implementations

Pinion wraps all five schemes behind a single Go interface called the line protocol:

Tagger    → TagBlocks(store)              → []Tag
Challenger → Challenge(blockIDs)           → (challenge, validator)
Prover    → Prove(challenge, blockStore)  → proof
Validator  → Verify(challenge, proof)     → bool

Switching protocols requires changing only the string you pass to POST /challenge-key. Everything else — how you generate challenges, submit them, and verify responses — is identical. The interface is protocol-agnostic by design.

The Seed Trick: Stateless, Unpredictable Challenges

Each audit round starts with a fresh random 32-byte seed S. Everything about the challenge — which blocks to sample, and with what coefficients — is derived deterministically from that seed using a pseudorandom function (PRF):

Seed S32 random bytes (client-generated)HMAC-SHA256(S, "indices")rank each block IDHMAC-SHA256(S, "coeffs")derive coefficientstop c → challenged block IDsν₁, ν₂, …, νc coefficients

The same seed produces the same challenge every time — so both the client (who generates the challenge) and the server (who constructs the proof) can independently compute which blocks are involved. No shared state. No coordination beyond passing the seed. Fresh seed each round means fresh, unpredictable challenge each round.

Because block IDs are arbitrary bytes, the PRF ranking works identically whether you’re challenging 3 blocks or 3 million. The IPFS sparse-array representation slots in naturally: CID bytes become the items being ranked, and the top-c selection is your challenge set.

The Math of Catching a Lie

Here’s the key theorem. If a fraction p of your blocks are sampled each round, and each round is independent, then after k rounds your confidence that no corruption has gone undetected is:

C(k) = 1 − (1 − p)k

probability of detecting corruption in at least one of k rounds, given p = fraction of blocks sampled per round

Detection probability grows exponentially — even modest sampling rates converge to certainty within dozens of rounds
90%95%0%25%50%75%100%01020304050607080audit rounds (k)P(detect corruption)
p=5% — 95% confidence in 59 roundsp=10% — 95% confidence in 29 roundsp=20% — 95% confidence in 14 roundsp=50% — 95% confidence in 5 rounds

Each round is independent — like asking about a different random page each time. The probability of missingcorruption multiplies: if you have a 10% chance of catching it each round, after 10 rounds you’ve had 10 independent shots at it, and the chance of missing all of them is only 0.910≈ 35%. After 44 rounds at 10% sampling, you’re above 99% confidence.

The exponential convergence means you don’t need to download the file to trust it. You just need to keep asking — and the math does the rest.

Choosing a Protocol

Pinion implements all five schemes. The right choice depends on what properties you need:

ProtocolTypeKey TypeProof SizeMax ChallengesVerified UpdatesPublic Verification
AteniesePDPRSA privateO(1)UnlimitedNoNo
ErwayPDP + UpdatesRSA publicO(log n)UnlimitedYesYes
SW-PrivatePORSymmetric PRFO(S) field elementsUnlimitedNoNo
SW-PublicPOREC keypair (BN256)O(S) group elementsUnlimitedNoYes (pairings)
BJOPORSymmetric PRFO(1) per roundBounded (Q sentinels)NoNo

PDP (Provable Data Possession)

High-probability detection of corruption or loss, with O(1) or O(log n) proof sizes. Does not enable file recovery from challenge logs. Best for continuous monitoring — you get strong statistical confidence with minimal overhead.

POR (Proof of Retrievability)

A stricter guarantee: not only can you detect corruption, but a third party can recover the full file by accumulating enough challenge-response pairs. If Pinion ever misbehaves, the audit log becomes a recovery mechanism.

Challenge Us

Here’s the part that distinguishes a proof system from a promise: the POST /prover/prove endpoint requires no authentication. Anyone who obtains a client_setup and a list of block_ids can issue a challenge and verify the response locally — no Pinion involvement in the verification step.

Any Pinion account holder can share their key_id with you. Call GET /prover/api/v1/setup?key_id=<id> and you get back everything you need to construct and verify a challenge.

# 1. Get the client setup and block list
curl https://<env>.pinion.build/prover/api/v1/setup?key_id=<key_id>

# 2. Derive a challenge from a fresh random seed (client-side, offline)
#    → produces: challenge bytes + a local validator

# 3. Submit the challenge (no authentication needed)
curl -X POST https://<env>.pinion.build/prover/prove \
  -H "Content-Type: application/json" \
  -d '{"key_id": "<key_id>", "roots": [], "challenge": <base64>}'

# 4. Verify the proof locally against your validator
#    → true = data intact | false = tampering detected

We encourage you to try to catch us.

Not because we think you will — but because a proof system you can audit is more trustworthy than one you have to believe. You don’t need to trust our infrastructure, our team, or our uptime page. You need to trust the mathematics, and the mathematics doesn’t require faith.

Our storage-proofs benchmark site shows how each protocol implementation compares to theoretical predictions from the papers — detection probability curves measured against simulated corruption scenarios, timing data for setup, prove, and verify operations, and extraction success rates across varying file sizes. The curves you see there are the same formula as the graph above, measured rather than modeled.

For auditors and compliance teams: The public verification endpoint and the open challenge protocol mean third-party auditors can independently verify storage without any access to your account or private keys. Challenge-response logs can satisfy SLA enforcement requirements or be submitted as evidence in dispute resolution — all cryptographically verifiable without trusting Pinion as the witness.

“We didn’t invent these proofs. Ateniese, Erway, Juels, Kaliski, Shacham, Waters, Bowers, Oprea, Cash, Küpçü, Wichs, Dziembowski, Faust, Pietrzak, Fisch, and Groth did. We implemented them faithfully, tested them against the papers, and gave you the API.”

The rest is mathematics.