Pinion

Why a Checksum Isn’t Enough

The “store-only-the-hash” attack

A dishonest storage provider who received your file can compute its SHA-256 hash once, store just those 32 bytes, and then delete the actual data. Every time you ask “do you have my file?” they answer correctly — because they computed the hash before deleting. They need O(1) storage to maintain the illusion of O(n) storage.

Any verification scheme based on a fixed, predictable question fails this way. The solution is to make challenges unpredictable — chosen after the data is committed — so the prover must hold the actual blocks to answer them. No amount of precomputation helps if the question changes every time.

This is why proof-of-storage schemes are interactive(or use pseudorandom challenges tied to a fresh seed): the prover can’t know in advance which blocks will be asked about, so they must keep all of them.

PDP — Provable Data Possession

Proves the storage provider holds your data at the moment of the challenge. O(1) or O(log n) proof size. Unlimited challenges. Does notenable file recovery from audit logs — it’s a detection system, not a backup.

POR — Proof of Retrievability

A strictly stronger guarantee: by accumulating enough challenge-response pairs, a third party can reconstruct the full file — even if the provider is misbehaving. Detection plus recoverability.

Key Papers

Twenty years of cryptography research from multiple independent groups underlies this work. These papers tackle different problems — some are theoretical foundations, others are practical constructions — and Pinion’s implementation draws directly from them.

PDP2007

Provable Data Possession at Untrusted Stores

Ateniese, Burns, Curtmola, Herring, Kissner, Peterson & Song

First scalable PDP — O(1) proof size, unlimited challenges, homomorphic tag aggregation

POR2007

PORs: Proofs of Retrievability for Large Files

Juels & Kaliski

First proof of retrievability — encrypted sentinels + Reed-Solomon codes enable full file recovery from challenge logs

POR2008

Compact Proofs of Retrievability

Shacham & Waters

Unlimited-challenge POR using linear algebra over Z_p; public-key variant uses bilinear pairings for third-party verification

PDP+Updates2009

Dynamic Provable Data Possession

Erway, Küpçü, Papamanthou & Tamassia

Adds cryptographically verifiable updates (insert/modify/delete) via authenticated skip lists; also supports public verification

POR2009

Proofs of Retrievability: Theory and Implementation

Bowers, Juels & Oprea

Combines SW's linear inner code with JK's RS outer code — strongest provable extraction guarantee below a corruption threshold

Dynamic POR2013

Dynamic Proofs of Retrievability via Oblivious RAM

Cash, Küpçü & Wichs

Replaces Erway's skip lists with ORAM — cleaner dynamic POR whose security reduces to well-studied ORAM assumptions rather than bespoke authenticated structures

Dynamic POR2013

Practical Dynamic Proofs of Retrievability

Shi, Stefanov & Papamanthou

Implementation-focused companion to Cash et al. — concrete constructions with measured overhead, showing dynamic POR is deployable, not just theoretically possible

PoSpace2015

Proofs of Space

Dziembowski, Faust, Kolmogorov & Pietrzak

Proves storage of random data (not a specific file) via graph pebbling — foundational for Chia's blockchain consensus; storage capacity rather than file possession

PoRep2019

PoReps: Proofs of Space on Useful Data

Fisch (Protocol Labs)

Key distinction from classical POR: the prover must be the first to encode — preventing outsourcing the encoding. Encoding binds data to a specific prover. Foundation of Filecoin's proof system.

ZK-SNARK2019–present

Winning PoSt & Window PoSt (Groth16 over PoRep)

Protocol Labs / Filecoin

Wraps Groth16 SNARKs (2016) around PoRep — the storage proof itself becomes a SNARK, enabling constant-size on-chain verification regardless of data volume

From Theory to IPFS: Blocks, Tags, and Sparse Arrays

The papers above were designed with traditional file storage in mind — fixed-size blocks, sequential integer indices. The question for Pinion is: how do you apply these proofs to IPFS, where blocks aren’t numbered integers but 256-bit content hashes arranged in a Merkle DAG?

Standard proof-of-storage blocks

The papers above describe files as flat arrays of fixed-size blocks. Block 1, block 2, block 3 — sequential integer indices. The protocols are designed around this model: you can challenge block 47, compute its tag, and the index is part of the tag computation. Simple, but rigid.

IPFS blocks

IPFS stores content differently. Data is organized as a Merkle DAG — a directed acyclic graph of content-addressed blocks, each identified by its CID (a cryptographic hash of its contents). A file is a root CID; walk the DAG and you discover every leaf block. Block identifiers aren’t sequential integers — they’re 256-bit hashes. And two different files that share content share the same block, identified by the same CID. Deduplication is structural.

To use proof-of-storage on IPFS data, we re-imagine the DAG’s blocks as entries in a very large sparse array indexed by CID bytes. Not every element is populated — the CID space is astronomically large. But for any given root, walking the DAG gives us the complete set of populated entries.

Not every proof-of-storage protocol handles sparse, arbitrary block IDs. The schemes that require sequential integer indices for their tag computation can’t be used directly. Pinion uses only protocols where the block identifier can be arbitrary bytes — fed into a PRF or hash — so that CID bytes work as naturally as sequential integers.

The Line Protocol: one interface, five implementations

Pinion wraps all five schemes behind a single Go interface called the line protocol:

Tagger    → TagBlocks(store)              → []Tag
Challenger → Challenge(blockIDs)           → (challenge, validator)
Prover    → Prove(challenge, blockStore)  → proof
Validator  → Verify(challenge, proof)     → bool

Switching protocols requires changing only the string you pass to POST /challenge-key. Everything else — how you generate challenges, submit them, and verify responses — is identical. The interface is protocol-agnostic by design.

The Seed Trick: Stateless, Unpredictable Challenges

Each audit round starts with a fresh random 32-byte seed S. Everything about the challenge — which blocks to sample, and with what coefficients — is derived deterministically from that seed using a pseudorandom function (PRF):

The same seed produces the same challenge every time — so both the client (who generates the challenge) and the server (who constructs the proof) can independently compute which blocks are involved. No shared state. No coordination beyond passing the seed. Fresh seed each round means fresh, unpredictable challenge each round.

Because block IDs are arbitrary bytes, the PRF ranking works identically whether you’re challenging 3 blocks or 3 million. The IPFS sparse-array representation slots in naturally: CID bytes become the items being ranked, and the top-c selection is your challenge set.

The Math of Catching a Lie

Here’s the key theorem. If a fraction p of your blocks are sampled each round, and each round is independent, then after k rounds your confidence that no corruption has gone undetected is:

C(k) = 1 − (1 − p)^k

probability of detecting corruption in at least one of k rounds, given p = fraction of blocks sampled per round

Detection probability grows exponentially — even modest sampling rates converge to certainty within dozens of rounds

p=5% — 95% confidence in 59 roundsp=10% — 95% confidence in 29 roundsp=20% — 95% confidence in 14 roundsp=50% — 95% confidence in 5 rounds

Each round is independent — like asking about a different random page each time. The probability of missingcorruption multiplies: if you have a 10% chance of catching it each round, after 10 rounds you’ve had 10 independent shots at it, and the chance of missing all of them is only 0.9¹⁰≈ 35%. After 44 rounds at 10% sampling, you’re above 99% confidence.

The exponential convergence means you don’t need to download the file to trust it. You just need to keep asking — and the math does the rest.

Choosing a Protocol

Pinion implements all five schemes. The right choice depends on what properties you need:

Protocol	Type	Key Type	Proof Size	Max Challenges	Verified Updates	Public Verification
Ateniese	PDP	RSA private	O(1)	Unlimited	No	No
Erway	PDP + Updates	RSA public	O(log n)	Unlimited	Yes	Yes
SW-Private	POR	Symmetric PRF	O(S) field elements	Unlimited	No	No
SW-Public	POR	EC keypair (BN256)	O(S) group elements	Unlimited	No	Yes (pairings)
BJO	POR	Symmetric PRF	O(1) per round	Bounded (Q sentinels)	No	No

PDP (Provable Data Possession)

High-probability detection of corruption or loss, with O(1) or O(log n) proof sizes. Does not enable file recovery from challenge logs. Best for continuous monitoring — you get strong statistical confidence with minimal overhead.

POR (Proof of Retrievability)

A stricter guarantee: not only can you detect corruption, but a third party can recover the full file by accumulating enough challenge-response pairs. If Pinion ever misbehaves, the audit log becomes a recovery mechanism.

Challenge Us

Here’s the part that distinguishes a proof system from a promise: the POST /prover/prove endpoint requires no authentication. Anyone who obtains a client_setup and a list of block_ids can issue a challenge and verify the response locally — no Pinion involvement in the verification step.

Any Pinion account holder can share their key_id with you. Call GET /prover/api/v1/setup?key_id=<id> and you get back everything you need to construct and verify a challenge.

# 1. Get the client setup and block list
curl https://<env>.pinion.build/prover/api/v1/setup?key_id=<key_id>

# 2. Derive a challenge from a fresh random seed (client-side, offline)
#    → produces: challenge bytes + a local validator

# 3. Submit the challenge (no authentication needed)
curl -X POST https://<env>.pinion.build/prover/prove \
  -H "Content-Type: application/json" \
  -d '{"key_id": "<key_id>", "roots": [], "challenge": <base64>}'

# 4. Verify the proof locally against your validator
#    → true = data intact | false = tampering detected

We encourage you to try to catch us.

Not because we think you will — but because a proof system you can audit is more trustworthy than one you have to believe. You don’t need to trust our infrastructure, our team, or our uptime page. You need to trust the mathematics, and the mathematics doesn’t require faith.

Our storage-proofs benchmark site shows how each protocol implementation compares to theoretical predictions from the papers — detection probability curves measured against simulated corruption scenarios, timing data for setup, prove, and verify operations, and extraction success rates across varying file sizes. The curves you see there are the same formula as the graph above, measured rather than modeled.

For auditors and compliance teams: The public verification endpoint and the open challenge protocol mean third-party auditors can independently verify storage without any access to your account or private keys. Challenge-response logs can satisfy SLA enforcement requirements or be submitted as evidence in dispute resolution — all cryptographically verifiable without trusting Pinion as the witness.

Get started with the Prover API →Open Dashboard →

“We didn’t invent these proofs. Ateniese, Erway, Juels, Kaliski, Shacham, Waters, Bowers, Oprea, Cash, Küpçü, Wichs, Dziembowski, Faust, Pietrzak, Fisch, and Groth did. We implemented them faithfully, tested them against the papers, and gave you the API.”
The rest is mathematics.

Storage Proofs: The Cryptography