Tilessa

5. The Smart Edge🔗

The specification chapter promised that all the difficult floating-point machinery lives at the mesh's edge, and that inside the mesh, arithmetic is plain integers. This chapter starts making the edge honest. The full story of the boundary — converting standard floats into blocks and back, with rounding — needs a bit-level model of the float format and comes later. What we can nail down now is the part with no rounding in it at all: once numbers are mantissas on a shared scale, the hardware's way of adding them — eight at a time, in a tree, four trees to a block — computes exactly the sum a mathematician would write. Not approximately: exactly, for every input.

That claim sounds too obvious to prove until you remember what it is really saying: reordering and regrouping additions is harmless. For integers it is; for floating point it famously is not (float addition is not associative, which is why summing the same numbers in a different order gives a different answer — and why reproducibility is so hard for conventional accelerators). The mosaic's determinism story leans on integer arithmetic precisely here, so here is where the theorem belongs.

5.1. Sums, the way hardware takes them🔗

namespace Tilessa /-- The sum f 0 + f 1 + ⋯ + f (n-1), taken in order. -/ def isum : Nat (Nat Int) Int | 0, _ => 0 | n + 1, f => isum n f + f n /-- Splitting a sum at any point changes nothing. -/ theorem isum_shift (m n : Nat) (f : Nat Int) : isum (m + n) f = isum m f + isum n (fun k => f (m + k)) := m:Natn:Natf:Nat Intisum (m + n) f = isum m f + isum n fun k => f (m + k) induction n with m:Natf:Nat Intisum (m + 0) f = isum m f + isum 0 fun k => f (m + k) All goals completed! 🐙 m:Natf:Nat Intn:Natih:isum (m + n) f = isum m f + isum n fun k => f (m + k)isum (m + (n + 1)) f = isum m f + isum (n + 1) fun k => f (m + k) m:Natf:Nat Intn:Natih:isum (m + n) f = isum m f + isum n fun k => f (m + k)(isum m f + isum n fun k => f (m + k)) + f (m + n) = isum m f + ((isum n fun k => f (m + k)) + f (m + n)); All goals completed! 🐙

The hardware sums a block of 32 products as four trees of eight, then combines the four partial results. The theorem: that regrouping is the identity.

/-- One 8-input hardware tree, rooted at position `base`. -/ def hwSum8At (base : Nat) (f : Nat Int) : Int := isum 8 (fun k => f (base + k)) /-- A 32-element block, the way the silicon adds it. -/ def blockSum (f : Nat Int) : Int := (hwSum8At 0 f + hwSum8At 8 f) + (hwSum8At 16 f + hwSum8At 24 f) /-- The silicon's grouping is exact: four trees of eight, combined, equal the straight sum of thirty-two. -/ theorem blockSum_exact (f : Nat Int) : blockSum f = isum 32 f := f:Nat IntblockSum f = isum 32 f f:Nat Inth8:isum 16 f = isum 8 f + isum 8 fun k => f (8 + k)blockSum f = isum 32 f f:Nat Inth8:isum 16 f = isum 8 f + isum 8 fun k => f (8 + k)h16:isum 24 f = isum 16 f + isum 8 fun k => f (16 + k)blockSum f = isum 32 f f:Nat Inth8:isum 16 f = isum 8 f + isum 8 fun k => f (8 + k)h16:isum 24 f = isum 16 f + isum 8 fun k => f (16 + k)h24:isum 32 f = isum 24 f + isum 8 fun k => f (24 + k)blockSum f = isum 32 f f:Nat Inth8:isum 16 f = isum 8 f + isum 8 fun k => f (8 + k)h16:isum 24 f = isum 16 f + isum 8 fun k => f (16 + k)h24:isum 32 f = isum 24 f + isum 8 fun k => f (24 + k)h0:(isum 8 fun k => f (0 + k)) = isum 8 fblockSum f = isum 32 f f:Nat Inth8:isum 16 f = isum 8 f + isum 8 fun k => f (8 + k)h16:isum 24 f = isum 16 f + isum 8 fun k => f (16 + k)h24:isum 32 f = isum 24 f + isum 8 fun k => f (24 + k)h0:(isum 8 fun k => f (0 + k)) = isum 8 f((isum 8 fun k => f (0 + k)) + isum 8 fun k => f (8 + k)) + ((isum 8 fun k => f (16 + k)) + isum 8 fun k => f (24 + k)) = isum 32 f All goals completed! 🐙

5.2. Scales come out of the sum🔗

A block-float dot product multiplies mantissas and fixes up the shared scales once at the end. The justification is that scales distribute out of the whole sum — again exactly, because everything is integers. We state it for arbitrary scale factors s and t (in use they are powers of two, but the theorem does not care):

theorem isum_congr (n : Nat) {f g : Nat Int} (h : k, k < n f k = g k) : isum n f = isum n g := n:Natf:Nat Intg:Nat Inth: (k : Nat), k < n f k = g kisum n f = isum n g induction n with f:Nat Intg:Nat Inth: (k : Nat), k < 0 f k = g kisum 0 f = isum 0 g All goals completed! 🐙 f:Nat Intg:Nat Intn:Natih:(∀ (k : Nat), k < n f k = g k) isum n f = isum n gh: (k : Nat), k < n + 1 f k = g kisum (n + 1) f = isum (n + 1) g f:Nat Intg:Nat Intn:Natih:(∀ (k : Nat), k < n f k = g k) isum n f = isum n gh: (k : Nat), k < n + 1 f k = g kisum n f + f n = isum n g + g n All goals completed! 🐙 theorem isum_mul (c : Int) (n : Nat) (f : Nat Int) : isum n (fun k => c * f k) = c * isum n f := c:Intn:Natf:Nat Int(isum n fun k => c * f k) = c * isum n f induction n with c:Intf:Nat Int(isum 0 fun k => c * f k) = c * isum 0 f All goals completed! 🐙 c:Intf:Nat Intn:Natih:(isum n fun k => c * f k) = c * isum n f(isum (n + 1) fun k => c * f k) = c * isum (n + 1) f All goals completed! 🐙 /-- Pull both operands' scales out of a dot product. -/ theorem scaled_dot (s t : Int) (a b : Nat Int) : isum 32 (fun i => (s * a i) * (t * b i)) = (s * t) * isum 32 (fun i => a i * b i) := s:Intt:Inta:Nat Intb:Nat Int(isum 32 fun i => s * a i * (t * b i)) = s * t * isum 32 fun i => a i * b i s:Intt:Inta:Nat Intb:Nat Inthpt: (k : Nat), k < 32 s * a k * (t * b k) = s * t * (a k * b k)(isum 32 fun i => s * a i * (t * b i)) = s * t * isum 32 fun i => a i * b i All goals completed! 🐙

Together: a block dot product done the silicon's way — integer products, four trees of eight, one scale fix-up — equals the exact mathematical answer, every time, independent of order, wiring, or timing. This is the sentence "Tilessa's math is reproducible" turned into two theorems.

5.3. The mesh earns its keep🔗

The previous chapter's mosaic carried one word as a proof of life. Here is the real daily work: a streamed dot product. One element plays the boundary's role, streaming mantissas eastward; its neighbour holds weights in its scratchpad and multiply-accumulates each arriving value — recv, ldw, mac, four times over — then reads out the accumulator.

/-- Stream 2, 3, 5, 7 to the east, then stop. -/ def streamProg : Program := fun pc => match pc.toNat with | 0 => .li 1 2 | 1 => .send .east 1 | 2 => .li 1 3 | 3 => .send .east 1 | 4 => .li 1 5 | 5 => .send .east 1 | 6 => .li 1 7 | 7 => .send .east 1 | _ => .halt /-- Hold weights 11, 13, 17, 19; MAC each arriving mantissa. -/ def macProg : Program := fun pc => match pc.toNat with | 0 => .li 1 11 | 1 => .stw 1 0 | 2 => .li 1 13 | 3 => .stw 1 1 | 4 => .li 1 17 | 5 => .stw 1 2 | 6 => .li 1 19 | 7 => .stw 1 3 | 8 => .macz | 9 => .recv .west 1 | 10 => .ldw 2 0 | 11 => .mac 1 2 | 12 => .recv .west 1 | 13 => .ldw 2 1 | 14 => .mac 1 2 | 15 => .recv .west 1 | 16 => .ldw 2 2 | 17 => .mac 1 2 | 18 => .recv .west 1 | 19 => .ldw 2 3 | 20 => .mac 1 2 | 21 => .rdacc 3 | _ => .halt def dotProgs : Fin 2 Program | 0 => streamProg | 1 => macProg /-- ⟨2,3,5,7⟩ · ⟨11,13,17,19⟩ = 279, computed by two machines talking through mailboxes, checked on every build. -/ example : let fin := mrunSnap pairWiring dotProgs 64 (MeshState.init (Fin 2) tessera) (fin.elem 1).regs 3 = 279 fin.halted 0 = true fin.halted 1 = true := let fin := mrunSnap pairWiring dotProgs 64 (MeshState.init (Fin 2) tessera); (fin.elem 1).regs 3 = 279 fin.halted 0 = true fin.halted 1 = true All goals completed! 🐙

Two programs, two machines, a stream of words across mailboxes, and an answer that blockSum_exact and scaled_dot say is the only answer possible. When the boundary converters arrive in full — floats in, blocks out — everything below the conversion is already spoken for.

end Tilessa