How do you prove a matrix multiply
without revealing the matrices?

This interactive guide walks you through the proving stack of a real zero-knowledge system from the Field Extension Institute: how a simple claim "A × B = C" gets transformed through four elegant protocols into a compact, verifiable proof.

Freivalds' Check Multilinear Extension Sumcheck Protocol Linear Claims
The core idea: Checking a full n×n matrix multiply is O(n³). But we can probabilistically verify it in O(n²) using random challenges. Each step in this pipeline further compresses the proof until we're left with a handful of "linear claims" — simple statements that a committed polynomial evaluates to a specific value at a specific point.

How to use this guide

Each section has three parts:
The idea — what this step does and why
Interactive sandbox — play with real numbers
Knowledge check — test if you really got it

Step 1 of 4

Freivalds' Check

Turn an O(n³) matrix multiplication check into an O(n²) probabilistic one using a single random vector.

Have: matrices A, B, and a claimed C
Goal: check A×B = C without redoing the O(n³) multiplication

The Problem

A prover claims A × B = C. The naive check is to redo the multiplication — but that's exactly as expensive as computing it yourself. We want the verifier to do less work.

The Trick (one-sided)

Pick a random vector r. Compute A × (B × r) and compare it to C × r. If A × B = C, these must be equal. If not, they'll differ with overwhelming probability.

A × (B × r)  =?  C × r
Why it works: Note that A(Br) − Cr = (AB − C)r. If AB = C, this is always zero. If AB ≠ C, then some row of (AB−C) is nonzero — say it's [a1, a2, ...] with at least one ai ≠ 0. That row dotted with r gives a1r1 + a2r2 + ... — a nonzero equation in the random values. The chance that random numbers accidentally satisfy a nonzero equation is at most 1/|field|.

What's a "field"? We work over a finite field — think integers modulo a large prime p, where addition and multiplication wrap around. All the math on this page happens in such a field. Larger field = lower error probability.

Going further (two-sided)

One random vector collapses the columns. But we can also collapse the rows by multiplying on the left by a random row vector uT:

uT · A · B · v  =?  uT · C · v

Now both sides are scalars, not vectors. The full matrix equation has been compressed to a single number comparison. This two-sided version is what the real system uses — the sandbox below demonstrates both. In the next section, we'll see how polynomials give us an elegant way to generate these random vectors.

Try it: Freivalds' Check on 2×2 Matrices

Edit any matrix cell. The random vector auto-generates. Watch both sides compute.

A
×
B
=
C (claimed)

One-Sided Check: compare vectors

v (random)
A(Bv) = ?
Cv = ?
Compute to check...

Two-Sided Check: compare scalars (what the real system uses)

Dot the results above with a second random vector u — both sides collapse to a single number.

u (random)
uT · A(Bv) = ?
uT · Cv = ?
Compute to check...

Two random vectors → two dimensions collapsed → scalar comparison.

Try this: In the sandbox above, click "Compute real C = A×B" to get the correct C. Then manually edit one entry of C, changing it by just 1 (e.g., 19 → 20). Now click "New random vectors" several times. How often does Freivalds catch the error?

Before revealing the answer: does the size of the error matter? Would a bigger change be caught more reliably?

Freivalds catches it with overwhelming probability (at least 1 − 1/|field|), regardless of how small the error is. Even changing one entry by 1 makes (AB - C) a nonzero matrix. The dot product of a nonzero row with a random vector gives a nonzero result with probability ≥ 1 - 1/|field|. Error size and location are irrelevant — the random projection treats a tiny error the same as a completely wrong matrix.
Step 2 of 4

Multilinear Extension

Have: Freivalds reduces A×B=C to a scalar check using random vectors
Goal: represent matrix data as polynomials so we can use sumcheck

Freivalds gave us a cheap way to check A×B = C. But the next protocol (sumcheck) only works on polynomials, not matrices. So we need a way to turn matrix entries into a polynomial — one that preserves the matrix multiply identity.

Give every entry a binary address

The sumcheck protocol (Section 3) works by peeling off one binary variable at a time. So we need our data indexed by binary variables, not row/column numbers.

A 2×4 matrix has 8 entries. The row index is 0 or 1 (1 bit). The column index is 0–3 (2 bits). So each entry has a 3-bit binary address: (row, col1, col0). For example, row 1, column 2 (binary 10) has address (1, 1, 0).

Now think of those 3 bits as coordinates in a cube — the 8 entries sit at the 8 corners of a unit cube {0,1}3. This is sometimes called the "Boolean hypercube."

Why not just use the raw values?

We have 8 values at 8 binary corners. But sumcheck needs more than a lookup table — each round, the verifier picks a random challenge like r=0.6 and needs to know "what is the function at x0=0.6?" A bare table of values has no answer for non-binary inputs.

The fix: extend the table to a polynomial that (a) matches the table at all binary corners, and (b) can be evaluated at any input, including the verifier's random challenges.

The Multilinear Extension (MLE)

The MLE is the unique polynomial with the lowest possible degree that does this. "Multilinear" means degree ≤ 1 in each variable — no x2 terms, just products of xi and (1-xi). It's unique because 2k values determine exactly 2k coefficients in the multilinear basis.

The formula

One term per corner. Each term is: (the value at that corner) × (a selector that "picks" it):

MLE(x0,x1,x2) = f(0,0,0) · (1-x0)(1-x1)(1-x2) + f(0,0,1) · (1-x0)(1-x1)·x2 + f(0,1,0) · (1-x0)·x1·(1-x2) + f(0,1,1) · (1-x0)·x1·x2 + f(1,0,0) · x0·(1-x1)(1-x2) + f(1,0,1) · x0·(1-x1)·x2 + f(1,1,0) · x0·x1·(1-x2) + f(1,1,1) · x0·x1·x2

Green = constants (the 8 data values). Pink = selectors built from the variables x0, x1, x2. The pattern: where the address bit is 0, use (1-xi); where it's 1, use xi.

Why the selectors work: At corner (1,0,1), its selector is x0·(1-x1)·x2 = 1·1·1 = 1. Every other selector has at least one mismatched factor (e.g. f(0,0,0)'s selector has (1-x0)=0), so they vanish. At a non-binary point like (0.3, 0.7, 0.5), no selector is exactly 0 or 1 — you get a weighted blend of all 8 values.

Why this connects to Freivalds

At binary inputs, the MLE reads exact entries. At non-binary inputs, it computes weighted combinations — and those weights are exactly Freivalds' random vector.

Consider a 2-row matrix. Evaluate its MLE at x0=0.4 (the row variable). The Lagrange weights are: row 0 gets (1−0.4) = 0.6, row 1 gets 0.4.

MLE(0.4, j) = 0.6·row0[j] + 0.4·row1[j]  =  [0.6, 0.4] · M

That's multiplying by the vector u = [0.6, 0.4]. Choosing a random x0 chooses a random u — exactly what Freivalds does.

Try it: MLE with 3 variables (8 values)

Edit the 8 values, then drag sliders to evaluate the MLE at any continuous point in [0,1]3.

As a 2×4 Matrix (row = x0, col = x1x2)

Think of 3 variables as: 1 for the row index (x0), 2 for the column index (x1, x2). Edit the matrix to change the MLE.

x1=0,x2=0 x1=0,x2=1 x1=1,x2=0 x1=1,x2=1
x0=0
x0=1

Fixing x0 selects a row. Fixing x1,x2 selects a column. Fixing all 3 reads one entry.

As Hypercube Corners

Same 8 values, indexed by 3 binary variables:

x0=0: f(0,0,0)=2   f(0,0,1)=5   f(0,1,0)=3   f(0,1,1)=1
x0=1: f(1,0,0)=4   f(1,0,1)=7   f(1,1,0)=6   f(1,1,1)=8

The matrix view and hypercube view are the same data. MLE turns this into a polynomial you can evaluate anywhere in [0,1]3.

x0 = 0.00
x1 = 0.00
x2 = 0.00
MLE(0.00, 0.00, 0.00) = ?

Hypercube Evaluation Table

Each row is a vertex of {0,1}3. The weight shows how much each vertex contributes at your chosen point.

x0x1x2fLagrange weightContribution
Try this: Set all sliders to 0 or 1 — exactly one weight becomes 1.000 (you're reading a single value). Now set all to 0.50: you get the average of all 8 values. The MLE smoothly interpolates. Try setting just x0=1 and leaving the others at 0: you read f(1,0,0). This is "fixing one variable."

Try this: In the sandbox, set x0=1.00, x1=0.00, x2=0.00. Look at the Lagrange weight column — which vertex has weight 1.000? What MLE value do you get?

Now set all three sliders to 0.50 and look at the weights. What are they, and what does the MLE value equal in terms of the 8 original values?

At (1, 0, 0): only vertex (1,0,0) has weight 1.000 — you're reading a single value directly. At (0.5, 0.5, 0.5): every vertex has weight 1/8 = 0.125 — the MLE value is the average of all 8 values. This is the key insight: MLE evaluation computes a weighted combination where the Lagrange weights act as the coefficients. At binary points, one weight is 1 and the rest are 0 (lookup). At non-binary points, you get a blend — and those blend weights are exactly the "random vector" from Freivalds.

Connecting the Pieces: From Freivalds to a Polynomial Sum

We just saw that MLE evaluation at a random point is the same as multiplying by a random vector. The two-sided Freivalds check from Section 1 uses two random vectors (u for rows, v for columns) to collapse A×B = C to a scalar: uT·A·B·v = uT·C·v. With MLEs, we do exactly this — but the "random vectors" come from Lagrange weights at random evaluation points.

The question is: what does this look like for a larger matrix where the random vectors have more than 2 entries?

Scaling up: 4×4 matrix

A 4×4 matrix has 16 entries and its MLE has 4 variables: x0, x1 for the row (4 rows = 22) and x2, x3 for the column (4 cols = 22).

To "collapse the rows," we pick two random field elements crow = (r0, r1) and fix x0=r0, x1=r1. The Lagrange weights at (r0, r1) give the 4-element "random vector" u:

Try it: drag the sliders to see how different (r0, r1) produce different "random vectors":

r0 = 0.30
r1 = 0.70
u = [0.21, 0.49, 0.09, 0.21]

Row 0 gets 21%, row 1 gets 49%, row 2 gets 9%, row 3 gets 21%

Similarly, ccol = (r2, r3) fixes the column variables, producing another 4-element "random vector" v. Both dimensions are now pinned — exactly like uT·M·v, but expressed as MLE evaluation.

After pinning the row and column dimensions, the inner dimension variables remain free. But why does a sum over the inner dimension appear? Recall how matrix multiplication works:

C[i, j] = ∑k A[i, k] · B[k, j]     (definition of matrix multiply)

The MLE of C inherits this identity: for any row/column values,

MLEC(xrow, xcol) = ∑k∈{0,1}m MLEA(xrow, k) · MLEB(k, xcol)

This holds because both sides are multilinear polynomials that agree at every binary input (where they reduce to the matrix multiply definition). By uniqueness of the MLE, they must be equal everywhere.

Now fix xrow = crow and xcol = ccol. What remains is a sum over the inner dimension:

j∈{0,1}k MLEA(crow, j) · MLEB(j, ccol)  =  MLEC(crow, ccol)

This is a dot product over the inner dimension — exactly what sumcheck proves.

Wait — isn't this sum cheap to just compute directly? For n×n matrices, this sum has n terms. But getting here wasn't free: pinning the row and column dimensions required computing Lagrange-weighted combinations across all n² matrix entries. So the prover still does O(n²) work overall. The question is: why bother with this sum formulation instead of just doing Freivalds directly? The answer comes in the next section — the sumcheck protocol lets a verifier check this sum using only a handful of small messages, without touching the matrices at all. That asymmetry (expensive prover, cheap verifier) is the whole point.

(In the implementation, crow and ccol are called c1 and c3 — c2 comes later as the sumcheck challenges for the inner dimension variables. In the implementation, pinning is done by scalar_mle_fix_variable, applied once per variable.)

Step 3 of 4

The Sumcheck Protocol

Prove that a polynomial sums to a claimed value over the Boolean hypercube, one variable at a time, using only small messages.

The Claim

Have: a sum ∑ MLEA·MLEB = MLEC over 2k terms
Goal: replace that exponential-size sum with a logarithmic-length transcript

From the previous section, Freivalds + MLE gave us:

j∈{0,1}k MLEA(crow,j) · MLEB(j,ccol)  =  MLEC(crow,ccol)

The right side is a single scalar (already computed). The left side sums 2k terms — one for each binary assignment to the inner dimension j. Checking all terms is expensive. Sumcheck lets the verifier check using only k rounds.

How It Works

Call the left side of the claim f(j) = MLEA(crow, j) · MLEB(j, ccol). We need to prove j f(j) = S where S = MLEC(crow, ccol). The sum has 2k terms. The idea: peel off one variable of j at a time.

Round 1: Instead of just asserting "the sum is S," the prover shows how the sum breaks down across the first variable j0. The prover sends a polynomial p1(X) — computed by summing f over all remaining j-variables, leaving j0 free as X. The verifier checks: p1(0) + p1(1) = S. This must hold because summing over j0 ∈ {0,1} means adding the X=0 and X=1 cases.

The random challenge: The verifier now knows that S = p1(0) + p1(1), but doesn't know if p1 itself is correct. It could ask the prover to justify both p1(0) and p1(1) as separate sub-claims — but that creates two problems to verify instead of one, doubling the work each round. Instead, the verifier picks a random r0 and asks: "justify p1(r0)." This collapses both halves into a single sub-claim. If the prover sent a fake polynomial (one rigged so p(0)+p(1) = S but otherwise wrong), the value at a random point will almost certainly be wrong too — the lie can't hide at a random spot.

Rounds 2 through k: Same pattern. Each round, the prover sends a polynomial showing how the sum breaks down across the next j-variable. The verifier checks consistency with the previous round's carry-forward, then picks a new random challenge. Each round pins one more variable of j.

After k rounds: Every j-variable has been pinned to a random value (r0, r1, …, rk−1). The last round's polynomial was evaluated at rk−1, giving a single number. To finish, the verifier must check that this number actually equals f(r0, r1, …, rk−1) — the function at the fully-pinned point. But f = MLEA · MLEB, so the verifier needs both MLEA(crow, r0, r1, …) and MLEB(r0, r1, …, ccol) to multiply them and compare. The problem: these MLEs are private — the verifier doesn't have the matrices. So the prover claims values for each, and justifying those claims is the job of the next section.

Degree Matters

Each term in the sum is MLEA × MLEB — a product of two multilinear polynomials. Since each is degree 1 in j, the product is degree 2.

In the real system, a third polynomial called the mask MLE is also multiplied in. It's simply [1,1,...,1, 0,...,0] — it selects only "real" rows and zeroes out some extra rows that exist for cryptographic reasons (explained in the next section — for now, just know it adds one more degree). With the mask, the total degree per variable becomes 3.

This means each round, the prover sends 4 numbers (coefficients of a degree-3 polynomial) to the verifier. That's it — regardless of matrix size.

Fiat-Shamir: In practice there's no live verifier sending random challenges. Instead, each challenge is computed by hashing all the public messages so far — commitments, then round 1's polynomial, then round 2's, etc. (This is the Fiat-Shamir transform.) The "so far" is critical: each challenge includes the round polynomial the prover just sent. If all challenges were derived from the commitments alone, the prover would know every ri upfront. For each round, they could construct a separate fake polynomial: they know what p(0) + p(1) must equal (the target from the previous round) and what p(ri) must chain to (the next round's target) — just define the polynomial at those points and interpolate. One fake polynomial per round, bridging from a false sum down to the true final value. Every check passes. By including each round's polynomial in the hash, the prover can't see ri until after sending round i's polynomial, so they can't construct these fakes.
At scale: For a 4096×4096 matmul, the Freivalds/MLE step (Section 2) already pinned the row and column dimensions using crow and ccol. Sumcheck only runs over the inner dimension, but the mask doubles the dimension (h2 real entries + h2 padding zeros), adding one variable. That's log2(4096) + 1 = 13 rounds, each sending 4 evaluation points (degree 3 from the mask). Total sumcheck proof: 52 field elements (13 × 4), regardless of whether the matrices have 16 million entries.
Sumcheck complexity (general): Sumcheck over k variables (hypercube size n = 2k) with max degree d per variable:
Prover work: O(d · n). The prover maintains an evaluation table that halves each round: n + n/2 + n/4 + … = 2n, times d+1 evaluation points per round.
Proof size: O(k · d) field elements — one degree-d polynomial (d+1 values) per round, for k rounds.
Verifier work: O(k · d) — check d+1 values per round, k rounds.

Why is degree a multiplier, not an exponent? You might expect degree-d to mean nd work — like symbolically expanding a product of d polynomials with n terms each. But the prover never does symbolic expansion. Instead, it keeps a separate evaluation table for each factor (e.g. one table for A′, one for B′, one for mask). To evaluate f = A′ · B′ · mask at a point, the prover looks up each factor and multiplies — that's O(d) per entry, not O(nd). The degree only determines how many points to evaluate at (d+1 per round), which is why it enters as a constant factor. (Same idea as NTT/FFT: working in evaluation form makes polynomial multiplication pointwise and cheap, whereas expanding symbolic coefficients would be quadratic or worse.)

In the matmul case: n = inner dimension, d = 3, giving ~52 proof field elements and ~8n prover operations for the sumcheck portion.

Try it: Sumcheck on a 3-variable polynomial

In the protocol above, f(j) = MLEA(crow, j) · MLEB(j, ccol) and the prover must show ∑ f(j) = S. Here we use a simpler f with 8 values on {0,1}3 to demonstrate the mechanics. The protocol proves their sum in 3 rounds, one variable at a time. Watch the values reduce: 8 → 4 → 2 → 1.

Notice: the sandbox shows f as a table of 8 values at binary points, not a symbolic expression. This is evaluation form — the same representation the real prover uses (as discussed above). But how does the prover evaluate at non-binary points like r0=0.4? It uses the multilinear property: since f is degree 1 in each variable, interpolating between the x0=0 and x0=1 entries is a simple weighted average — f(0.4, b1, b2) = 0.6 · f(0, b1, b2) + 0.4 · f(1, b1, b2). That's what the "Reduce" step in each round does: pairwise interpolation, O(1) per entry, halving the table. For the degree-3 case (f = A′ · B′ · mask), the prover keeps separate tables for each factor — each one is still multilinear, so each gets reduced independently by cheap pairwise interpolation. The factors are only multiplied together pointwise when computing the round polynomial's evaluation points.

The sandbox uses small decimals (r=0.17) for readability. In the real protocol, challenges are large integers in the finite field (mod a prime q ≈ 231) — something like r=2384729384729. The interpolation formula is the same: (1−r) · f(0) + r · f(1), just computed with modular arithmetic instead of floats.

This sandbox's f is degree 1 per variable (round polynomials are linear, 2 coefficients each). The actual matmul f = MLEA · MLEB · mask is degree 3 (round polynomials are cubic, 4 coefficients each). Same protocol — the degree-3 worked example below the sandbox shows what that looks like.

True sum over {0,1}³ = ?
What sum will you claim?

What a degree-3 round actually looks like

The sandbox above uses a simple sum (degree 1). In the real matmul proof, each round's polynomial comes from a product of three MLEs: A'(j) · B'(j) · mask(j). Here's how the prover computes one round — using evaluation form, not symbolic expansion:

The prover has three evaluation tables. For the current variable X: A': A'(0) = 3 A'(1) = 5 A'(2) = 7 A'(3) = 9 B': B'(0) = 5 B'(1) = 6 B'(2) = 7 B'(3) = 8 mask: m(0) = 1 m(1) = 0 m(2) = -1 m(3) = -2 (mask = [1, 0] at binary points, MLE is 1-X, evaluated beyond {0,1}) Multiply pointwise at each X: p(0) = 3 · 5 · 1 = 15 p(1) = 5 · 6 · 0 = 0 p(2) = 7 · 7 · -1 = -49 p(3) = 9 · 8 · -2 = -144 4 values sent to verifier: [15, 0, -49, -144]

Check: p(0) + p(1) = 15 + 0 = 15 = claimed sum. The verifier reconstructs the degree-3 polynomial from these 4 points, picks a random r, evaluates p(r), and uses it as the target for the next round. No symbolic expansion needed — just table lookups and pointwise multiplication.

Try this: In the sandbox above, change the claimed sum to something false (e.g. 46), click Start, and try to get through all 3 rounds. Try two strategies: (1) rig round 1, then go honest. (2) rig every round. What happens in each case?

Strategy 1 (rig then honest): You must rig round 1 to match the false claim. But when you go honest in round 2, the real data's sum doesn't match the inflated target from the rigged round 1. Caught at the consistency check.

Strategy 2 (rig every round): You can craft fake polynomials that pass every consistency check. But at the claim opening, the fake chain's final value doesn't match the committed polynomial's real evaluation. You'd need to swap in a different polynomial — but the commitment binds you to the real one.

Both paths are dead ends. The protocol has no gap a cheater can exploit.
Step 4 of 4

Linear Claims & Folding

The universal primitive: everything reduces to "this committed polynomial evaluates to this value at this point."

What's a Linear Claim?

Have: evaluation claims about MLEA, MLEB, MLEC at specific points + commitments
Goal: compress thousands of claims into one verifiable check

Sumcheck left us with claims that MLEA and MLEB evaluate to specific values at specific points. How does the verifier check those without seeing the polynomials?

Before the protocol starts, the prover publishes a commitment to each polynomial. The commitment is binding (the prover can't later swap in a different polynomial) and hiding (it reveals nothing about the polynomial's contents). Now each claim has structure:

commit(f) = cm  ∧  MLE(f)(r) = val

The commitment cm is public (locked in before any challenges were derived). The polynomial f is private. This is the atomic unit of proof.

Ajtai Commitments (simplified)

Compute commit(f) = A · f mod q, where A is a large public random matrix, q is a prime, and f must have small bounded entries (e.g. INT8 matrix values in [-128, 127]).

Binding: Since A has far more columns than rows, infinitely many vectors map to the same commitment — finding some f′ with A · f′ = A · f mod q is trivially easy (underdetermined linear system, solve with Gaussian elimination). But those solutions have huge entries — random values mod q. A valid f′ needs small entries (it represents real matrix data, bounded integers). Finding two different short-entry vectors with the same commitment is the Shortest Integer Solution (SIS) problem — believed to be computationally hard for random lattices. Solutions with large entries are everywhere; solutions with small entries are nearly impossible to find.

Hiding: Without blinding, small matrix entries make commitments vulnerable in two ways:

Guessing: INT8 entries have only 256 possible values each. Like an unsalted hash, an attacker could enumerate plausible f vectors and check which one produces the given commitment. Blinding rows act like a salt — they add high-entropy randomness that makes enumeration infeasible.
Solving: The same matrix A is used for all commitments. Without blinding, seeing multiple commitments (A·f1, A·f2, …) with known-small entries gives a system of equations an attacker could try to solve. Blinding rows add independent randomness to each commitment, breaking this.

The fix: before committing, the prover appends ~2240 random entries to f itself. The committed vector is now [real data | random blinding], and the randomness dominates. The mask MLE from Section 3 zeroes out the blinding entries during sumcheck so they don't affect the proved computation.

Additive: A · (f + g) = A · f + A · g — matrix multiplication distributes. So commit(f + g) = commit(f) + commit(g). This is what makes folding possible.

How to check a single claim (conceptual)

In the simplest case, the prover reveals f (the polynomial with its blinding entries). The verifier does two checks:

Commitment check: compute A · f mod q and confirm it matches the published cm. If the prover tried to swap in a different f′, it would need small entries AND the same commitment — which is the SIS problem we just described.
Evaluation check: evaluate MLE(f)(r) and confirm it equals the claimed val.

The prover can't pick an f designed to pass at a convenient evaluation point, because the evaluation point r is determined by earlier transcript challenges (Fiat-Shamir): change the polynomial → change the commitment → change the hash → change r. You can't choose a polynomial to match a point that depends on the polynomial you chose.

The problem with revealing

Checking one claim this way is sound — but revealing f leaks the matrix data (destroying zero-knowledge), and with thousands of claims the proof becomes enormous.

Folding: Many Claims → One

The solution to both: fold many claims into one, so only a single folded combination is ever revealed — hiding all individual polynomials and keeping the proof small.

SymphonyFold compresses many claims into one using a random linear combination. The commitment's additive homomorphism makes this work:

commit(f + α·g) = commit(f) + α·commit(g)

The verifier picks random weights and combines commitments, points, and values. If any original claim was false, the combined claim is false with overwhelming probability — the false claim would need to perfectly cancel in the random combination, which requires guessing α (probability 1/|field|).

The binding property survives folding: the verifier computes the folded commitment itself (cmfolded = cm1 + α·cm2 + …) from the original public commitments, so the same binding guarantee applies to the combined polynomial. SymphonyFold does this in batches of 32.

Different evaluation points?

There's a subtlety. The folded commitment is cm1 + α·cm2 — a linear combination. For this to be a valid claim, the folded value must be the same linear combination of evaluations: f(r*) + α·g(r*). But that only works if both polynomials are evaluated at the same point r*.

The original claims have different points — claim 1 knows f(0.3) = 5 and claim 2 knows h(0.7) = 9 (where f and h are different committed polynomials). How do we get both evaluated at a common r*?

Point alignment: sumcheck on a line

For each polynomial, the prover has the old claim (e.g. f(0.3) = 5) and needs to provide a new claim at the common point (f(r*) = ?). The trick: turn this into a sumcheck problem.

Restrict f to the line between the old point and the new point: g(t) = f(0.3·(1−t) + r*·t). Now g(0) = f(0.3) = 5 (known) and g(1) = f(r*) (what we need). The two evaluation points {0, 1} form a 1-dimensional hypercube — the lowest dimension possible — and you already know how to verify a sum over a hypercube.

The prover sends g (a small univariate polynomial — low-degree since f is multilinear). The verifier reads off g(0) and g(1) from it:

• Checks g(0) = 5 — must match the known claim from sumcheck
• Reads g(1) as f(r*) — the new evaluation needed for folding
• Picks random t. The prover evaluates g(t) and provides it as a fresh linear claim (f at a random point on the line through the old and new evaluation points)

If the prover lied about g (made it pass the g(0)=5 check but with a wrong g(1)), the random evaluation g(t) will almost certainly be wrong — same principle as sumcheck. It's a 1-round sumcheck on a 1-dimensional hypercube.

After doing this for each polynomial, all claims are at the common point r* and folding proceeds normally:

commit: cmf + α·cmh   linear combo of commitments point: r*   one shared point value: f(r*) + α·h(r*)   same linear combo of evaluations

Note: per-polynomial line-restrictions produce fresh claims as overhead (one per polynomial). The actual SymphonyFold avoids this by processing all claims in a single "unifying sumcheck" that resolves alignment and folding in one pass. The line trick above captures the core idea; the batching is what makes it efficient at scale. The toy sandbox below only demonstrates the commitment homomorphism.

Try this: In the sandbox, try all three buttons: (1) "Fold" — honest, everything passes. (2) "Tamper with polynomial" — changes f2 without updating the commitment. Which check catches it? (3) "Lie about value" — keeps the correct polynomial but claims a wrong evaluation value. Which check catches it?

Tampering with the polynomial is caught by the commitment check — the recomputed commitment doesn't match. Lying about the value is caught by the evaluation check — the polynomial evaluates to a different value than what was claimed. Two different attacks, two different checks, both needed.

The final check: how it all confirms the result

After all folding, one claim remains. The verifier has independently computed two things from public data (commitments + Fiat-Shamir random weights):

• The expected commitment: cmfolded = cm1 + α1·cm2 + α2·cm3 + …
• The expected value: vfolded = v1 + α1·v2 + α2·v3 + … (same weights, same linear combination, at the shared point r*)

Now the prover reveals the folded polynomial ffolded. The verifier checks:

commit(ffolded) = cmfolded? — confirms the revealed polynomial matches the committed matrices
MLE(ffolded)(r*) = vfolded? — confirms the evaluation is correct

If both pass, the verifier is convinced: the original matrices A, B, C satisfied A×B = C. The chain holds because each step was locked in by the previous one — commitments determined the Freivalds challenges, which determined the sumcheck challenges, which determined the evaluation points, which determined the folding weights. This is the only moment polynomial data is revealed, and it's a folded combination — not any original matrix.

Note: the actual SymphonyFold verification is more involved than this sketch. After the unifying sumcheck aligns evaluation points, there are additional steps (a monomial/projection proof and an auxiliary witness check) before the final claim is verified. The final claim is on a helper commitment derived during folding, not directly on the original cmA/cmB/cmC. The conceptual picture (many claims compress to one via homomorphism + random challenges) is accurate, but the intermediate machinery is more complex than "just add commitments."

Try it: Claim Folding with Toy Commitments

Toy additive "commitments" (sum mod 97). Two claims at the same evaluation point (after point-alignment) fold into one via random α.

Claim 1

Polynomial f1: [3, 7, 2, 5]
Commitment: 17 (sum mod 97)
Eval point r*: (0.3, 0.7)
Value f1(r*): ?

Claim 2

Polynomial f2: [1, 9, 4, 6]
Commitment: 20 (sum mod 97)
Eval point r*: (0.3, 0.7)
Value f2(r*): ?
Putting It All Together

End-to-End: Trace a Real Matmul Proof

Follow concrete numbers through every step of the pipeline for a 2×2 matmul.

Live Pipeline Tracer

Click "Next Step" to advance. Each step shows real computed values.

Step 0 / 6
What makes this zero-knowledge? The verifier sees: commitments (opaque lattice vectors), Fiat-Shamir challenges (deterministic from public data), sumcheck round polynomials (small univariates), and evaluation claims. The actual matrix entries, model weights, and intermediate activations are never revealed.

Try this: Step through all 6 pipeline steps. As you go, make a list of everything the verifier receives. Then ask yourself: from that list, could you recover any entry of matrix A?

At minimum: commitments, sumcheck round polynomials, evaluation claims, and folding data. The actual proof transcript is larger than this toy example suggests — SymphonyFold adds helper commitments, a unifying sumcheck transcript, projection data, and an auxiliary witness — but it's still a compact set of algebraic objects, not matrix entries. No entry of A, B, or C is ever revealed. The final revealed polynomial is a folded combination — you can't disentangle the individual matrices from it.

What's Not Covered Here

This guide covers the core matmul proving flow. Important topics omitted for clarity:

Ajtai commitment internals — lattice-based hash using ring polynomial arithmetic (Zq[x])

Limb decomposition — large values split into smaller "limbs" to stay within commitment bounds (_linear_comb_limbs)

Batch processingadd_matmul(As, B, Cs, ...) processes multiple activations with a shared weight matrix

Ring vs field element pathsreduce_to_linear_felt (field, degree 3) vs reduce_to_linear_relt (ring, degree 4)

Other operation proofs — SoftmaxClaim, LookupClaim, TranslationClaim, HadamardClaim (same linear-claim output)

Multi-GPU distribution — NCCL P2P dispatch to worker GPUs, streaming_worker eager folding