The specification chapter promised that all the difficult floating-point
machinery lives at the mesh's edge, and that inside the mesh, arithmetic
is plain integers. This chapter starts making the edge honest. The full
story of the boundary — converting standard floats into blocks and back,
with rounding — needs a bit-level model of the float format and comes
later. What we can nail down now is the part with no rounding in it at
all: once numbers are mantissas on a shared scale, the hardware's way
of adding them — eight at a time, in a tree, four trees to a block —
computes exactly the sum a mathematician would write. Not
approximately: exactly, for every input.
That claim sounds too obvious to prove until you remember what it is
really saying: reordering and regrouping additions is harmless. For
integers it is; for floating point it famously is not (float addition
is not associative, which is why summing the same numbers in a different
order gives a different answer — and why reproducibility is so hard for
conventional accelerators). The mosaic's determinism story leans on
integer arithmetic precisely here, so here is where the theorem belongs.
namespaceTilessa/-- The sum f 0 + f 1 + ⋯ + f (n-1), taken in order. -/defisum:Nat→(Nat→Int)→Int|0,_=>0|n+1,f=>isumnf+fn/-- Splitting a sum at any point changes nothing. -/theoremisum_shift(mn:Nat)(f:Nat→Int):isum(m+n)f=isummf+isumn(funk=>f(m+k)):=m:Natn:Natf:Nat→Int⊢ isum(m+n)f=isummf+isumnfunk=>f(m+k)inductionnwithm:Natf:Nat→Int⊢ isum(m+0)f=isummf+isum0funk=>f(m+k)All goals completed! 🐙m:Natf:Nat→Intn:Natih:isum(m+n)f=isummf+isumnfunk=>f(m+k)⊢ isum(m+(n+1))f=isummf+isum(n+1)funk=>f(m+k)m:Natf:Nat→Intn:Natih:isum(m+n)f=isummf+isumnfunk=>f(m+k)⊢ (isummf+isumnfunk=>f(m+k))+f(m+n)=isummf+((isumnfunk=>f(m+k))+f(m+n));All goals completed! 🐙
The hardware sums a block of 32 products as four trees of eight, then
combines the four partial results. The theorem: that regrouping is the
identity.
A block-float dot product multiplies mantissas and fixes up the shared
scales once at the end. The justification is that scales distribute out
of the whole sum — again exactly, because everything is integers. We
state it for arbitrary scale factors s and t (in use they are powers
of two, but the theorem does not care):
Together: a block dot product done the silicon's way — integer products,
four trees of eight, one scale fix-up — equals the exact mathematical
answer, every time, independent of order, wiring, or timing. This is the
sentence "Tilessa's math is reproducible" turned into two theorems.
The previous chapter's mosaic carried one word as a proof of life. Here
is the real daily work: a streamed dot product. One element plays the
boundary's role, streaming mantissas eastward; its neighbour holds
weights in its scratchpad and multiply-accumulates each arriving value —
recv, ldw, mac, four times over — then reads out the accumulator.
Two programs, two machines, a stream of words across mailboxes, and an
answer that blockSum_exact and scaled_dot say is the only answer
possible. When the boundary converters arrive in full — floats in,
blocks out — everything below the conversion is already spoken for.