5. The Smart Edge🔗

The specification chapter promised that all the difficult floating-point machinery lives at the mesh's edge, and that inside the mesh, arithmetic is plain integers. This chapter starts making the edge honest. The full story of the boundary — converting standard floats into blocks and back, with rounding — needs a bit-level model of the float format and comes later. What we can nail down now is the part with no rounding in it at all: once numbers are mantissas on a shared scale, the hardware's way of adding them — eight at a time, in a tree, four trees to a block — computes exactly the sum a mathematician would write. Not approximately: exactly, for every input.

That claim sounds too obvious to prove until you remember what it is really saying: reordering and regrouping additions is harmless. For integers it is; for floating point it famously is not (float addition is not associative, which is why summing the same numbers in a different order gives a different answer — and why reproducibility is so hard for conventional accelerators). The mosaic's determinism story leans on integer arithmetic precisely here, so here is where the theorem belongs.

5.1. Sums, the way hardware takes them🔗

namespace Tilessa

/-- The sum f 0 + f 1 + ⋯ + f (n-1), taken in order. -/
def isum : Nat → (Nat → Int) → Int
  | 0,     _ => 0
  | n + 1, f => isum n f + f n

/-- Splitting a sum at any point changes nothing. -/
theorem isum_shift (m n : Nat) (f : Nat → Int) :
    isum (m + n) f
      = isum m f + isum n (fun k => f (m + k)) := bym:Natn:Natf:Nat → Int⊢ isum (m + n) f = isum m f + isum n fun k => f (m + k)
  induction n with
  | zero =>zerom:Natf:Nat → Int⊢ isum (m + 0) f = isum m f + isum 0 fun k => f (m + k) simp [isum]All goals completed! 🐙
  | succ n ih =>succm:Natf:Nat → Intn:Natih:isum (m + n) f = isum m f + isum n fun k => f (m + k)⊢ isum (m + (n + 1)) f = isum m f + isum (n + 1) fun k => f (m + k) simp [isum, ih]succm:Natf:Nat → Intn:Natih:isum (m + n) f = isum m f + isum n fun k => f (m + k)⊢ (isum m f + isum n fun k => f (m + k)) + f (m + n) = isum m f + ((isum n fun k => f (m + k)) + f (m + n)); omegaAll goals completed! 🐙

The hardware sums a block of 32 products as four trees of eight, then combines the four partial results. The theorem: that regrouping is the identity.

/-- One 8-input hardware tree, rooted at position `base`. -/
def hwSum8At (base : Nat) (f : Nat → Int) : Int :=
  isum 8 (fun k => f (base + k))

/-- A 32-element block, the way the silicon adds it. -/
def blockSum (f : Nat → Int) : Int :=
  (hwSum8At 0 f + hwSum8At 8 f)
    + (hwSum8At 16 f + hwSum8At 24 f)

/-- The silicon's grouping is exact: four trees of eight, combined,
equal the straight sum of thirty-two. -/
theorem blockSum_exact (f : Nat → Int) :
    blockSum f = isum 32 f := byf:Nat → Int⊢ blockSum f = isum 32 f
  have h8 : isum 16 f
      = isum 8 f + isum 8 (fun k => f (8 + k)) :=
    isum_shift 8 8 ff:Nat → Inth8:isum 16 f = isum 8 f + isum 8 fun k => f (8 + k)⊢ blockSum f = isum 32 f
  have h16 : isum 24 f
      = isum 16 f + isum 8 (fun k => f (16 + k)) :=
    isum_shift 16 8 ff:Nat → Inth8:isum 16 f = isum 8 f + isum 8 fun k => f (8 + k)h16:isum 24 f = isum 16 f + isum 8 fun k => f (16 + k)⊢ blockSum f = isum 32 f
  have h24 : isum 32 f
      = isum 24 f + isum 8 (fun k => f (24 + k)) :=
    isum_shift 24 8 ff:Nat → Inth8:isum 16 f = isum 8 f + isum 8 fun k => f (8 + k)h16:isum 24 f = isum 16 f + isum 8 fun k => f (16 + k)h24:isum 32 f = isum 24 f + isum 8 fun k => f (24 + k)⊢ blockSum f = isum 32 f
  have h0 : isum 8 (fun k => f (0 + k)) = isum 8 f :=
    congrArg (isum 8) (funext fun k => byf:Nat → Inth8:isum 16 f = isum 8 f + isum 8 fun k => f (8 + k)h16:isum 24 f = isum 16 f + isum 8 fun k => f (16 + k)h24:isum 32 f = isum 24 f + isum 8 fun k => f (24 + k)k:Nat⊢ f (0 + k) = f k rw [Nat.zero_addf:Nat → Inth8:isum 16 f = isum 8 f + isum 8 fun k => f (8 + k)h16:isum 24 f = isum 16 f + isum 8 fun k => f (16 + k)h24:isum 32 f = isum 24 f + isum 8 fun k => f (24 + k)k:Nat⊢ f k = f k]All goals completed! 🐙)f:Nat → Inth8:isum 16 f = isum 8 f + isum 8 fun k => f (8 + k)h16:isum 24 f = isum 16 f + isum 8 fun k => f (16 + k)h24:isum 32 f = isum 24 f + isum 8 fun k => f (24 + k)h0:(isum 8 fun k => f (0 + k)) = isum 8 f⊢ blockSum f = isum 32 f
  simp only [blockSum, hwSum8At]f:Nat → Inth8:isum 16 f = isum 8 f + isum 8 fun k => f (8 + k)h16:isum 24 f = isum 16 f + isum 8 fun k => f (16 + k)h24:isum 32 f = isum 24 f + isum 8 fun k => f (24 + k)h0:(isum 8 fun k => f (0 + k)) = isum 8 f⊢ ((isum 8 fun k => f (0 + k)) + isum 8 fun k => f (8 + k)) +
    ((isum 8 fun k => f (16 + k)) + isum 8 fun k => f (24 + k)) =
  isum 32 f
  omegaAll goals completed! 🐙

5.2. Scales come out of the sum🔗

A block-float dot product multiplies mantissas and fixes up the shared scales once at the end. The justification is that scales distribute out of the whole sum — again exactly, because everything is integers. We state it for arbitrary scale factors s and t (in use they are powers of two, but the theorem does not care):

theorem isum_congr (n : Nat) {f g : Nat → Int}
    (h : ∀ k, k < n → f k = g k) : isum n f = isum n g := byn:Natf:Nat → Intg:Nat → Inth:∀ (k : Nat), k < n → f k = g k⊢ isum n f = isum n g
  induction n with
  | zero =>zerof:Nat → Intg:Nat → Inth:∀ (k : Nat), k < 0 → f k = g k⊢ isum 0 f = isum 0 g rflAll goals completed! 🐙
  | succ n ih =>succf:Nat → Intg:Nat → Intn:Natih:(∀ (k : Nat), k < n → f k = g k) → isum n f = isum n gh:∀ (k : Nat), k < n + 1 → f k = g k⊢ isum (n + 1) f = isum (n + 1) g
    simp only [isum]succf:Nat → Intg:Nat → Intn:Natih:(∀ (k : Nat), k < n → f k = g k) → isum n f = isum n gh:∀ (k : Nat), k < n + 1 → f k = g k⊢ isum n f + f n = isum n g + g n
    rw [ih (fun k hk => h k (Nat.lt_succ_of_lt hk)),succf:Nat → Intg:Nat → Intn:Natih:(∀ (k : Nat), k < n → f k = g k) → isum n f = isum n gh:∀ (k : Nat), k < n + 1 → f k = g k⊢ isum n g + f n = isum n g + g n
        h n (Nat.lt_succ_self n)succf:Nat → Intg:Nat → Intn:Natih:(∀ (k : Nat), k < n → f k = g k) → isum n f = isum n gh:∀ (k : Nat), k < n + 1 → f k = g k⊢ isum n g + g n = isum n g + g n]All goals completed! 🐙

theorem isum_mul (c : Int) (n : Nat) (f : Nat → Int) :
    isum n (fun k => c * f k) = c * isum n f := byc:Intn:Natf:Nat → Int⊢ (isum n fun k => c * f k) = c * isum n f
  induction n with
  | zero =>zeroc:Intf:Nat → Int⊢ (isum 0 fun k => c * f k) = c * isum 0 f simp [isum]All goals completed! 🐙
  | succ n ih =>succc:Intf:Nat → Intn:Natih:(isum n fun k => c * f k) = c * isum n f⊢ (isum (n + 1) fun k => c * f k) = c * isum (n + 1) f simp [isum, ih, Int.mul_add]All goals completed! 🐙

/-- Pull both operands' scales out of a dot product. -/
theorem scaled_dot (s t : Int) (a b : Nat → Int) :
    isum 32 (fun i => (s * a i) * (t * b i))
      = (s * t) * isum 32 (fun i => a i * b i) := bys:Intt:Inta:Nat → Intb:Nat → Int⊢ (isum 32 fun i => s * a i * (t * b i)) = s * t * isum 32 fun i => a i * b i
  have hpt : ∀ k, k < 32 →
      (s * a k) * (t * b k) = (s * t) * (a k * b k) := by
    intro k _s:Intt:Inta:Nat → Intb:Nat → Intk:Nata✝:k < 32⊢ s * a k * (t * b k) = s * t * (a k * b k)
    calc (s * a k) * (t * b k)
        = s * (a k * (t * b k)) := bys:Intt:Inta:Nat → Intb:Nat → Intk:Nata✝:k < 32⊢ s * a k * (t * b k) = s * (a k * (t * b k)) rw [Int.mul_assocs:Intt:Inta:Nat → Intb:Nat → Intk:Nata✝:k < 32⊢ s * (a k * (t * b k)) = s * (a k * (t * b k))]All goals completed! 🐙
      _ = s * ((a k * t) * b k) := bys:Intt:Inta:Nat → Intb:Nat → Intk:Nata✝:k < 32⊢ s * (a k * (t * b k)) = s * (a k * t * b k) rw [Int.mul_assocs:Intt:Inta:Nat → Intb:Nat → Intk:Nata✝:k < 32⊢ s * (a k * (t * b k)) = s * (a k * (t * b k))]All goals completed! 🐙
      _ = s * ((t * a k) * b k) := bys:Intt:Inta:Nat → Intb:Nat → Intk:Nata✝:k < 32⊢ s * (a k * t * b k) = s * (t * a k * b k) rw [Int.mul_comm (a k) ts:Intt:Inta:Nat → Intb:Nat → Intk:Nata✝:k < 32⊢ s * (t * a k * b k) = s * (t * a k * b k)]All goals completed! 🐙
      _ = s * (t * (a k * b k)) := bys:Intt:Inta:Nat → Intb:Nat → Intk:Nata✝:k < 32⊢ s * (t * a k * b k) = s * (t * (a k * b k)) rw [Int.mul_assocs:Intt:Inta:Nat → Intb:Nat → Intk:Nata✝:k < 32⊢ s * (t * (a k * b k)) = s * (t * (a k * b k))]All goals completed! 🐙
      _ = (s * t) * (a k * b k) := bys:Intt:Inta:Nat → Intb:Nat → Intk:Nata✝:k < 32⊢ s * (t * (a k * b k)) = s * t * (a k * b k) rw [Int.mul_assocs:Intt:Inta:Nat → Intb:Nat → Intk:Nata✝:k < 32⊢ s * (t * (a k * b k)) = s * (t * (a k * b k))]s:Intt:Inta:Nat → Intb:Nat → Inthpt:∀ (k : Nat), k < 32 → s * a k * (t * b k) = s * t * (a k * b k)⊢ (isum 32 fun i => s * a i * (t * b i)) = s * t * isum 32 fun i => a i * b i
  rw [isum_congr 32 hpt,s:Intt:Inta:Nat → Intb:Nat → Inthpt:∀ (k : Nat), k < 32 → s * a k * (t * b k) = s * t * (a k * b k)⊢ (isum 32 fun k => s * t * (a k * b k)) = s * t * isum 32 fun i => a i * b i isum_muls:Intt:Inta:Nat → Intb:Nat → Inthpt:∀ (k : Nat), k < 32 → s * a k * (t * b k) = s * t * (a k * b k)⊢ (s * t * isum 32 fun k => a k * b k) = s * t * isum 32 fun i => a i * b i]All goals completed! 🐙

Together: a block dot product done the silicon's way — integer products, four trees of eight, one scale fix-up — equals the exact mathematical answer, every time, independent of order, wiring, or timing. This is the sentence "Tilessa's math is reproducible" turned into two theorems.

5.3. The mesh earns its keep🔗

The previous chapter's mosaic carried one word as a proof of life. Here is the real daily work: a streamed dot product. One element plays the boundary's role, streaming mantissas eastward; its neighbour holds weights in its scratchpad and multiply-accumulates each arriving value — recv, ldw, mac, four times over — then reads out the accumulator.

/-- Stream 2, 3, 5, 7 to the east, then stop. -/
def streamProg : Program := fun pc =>
  match pc.toNat with
  | 0 => .li 1 2  | 1 => .send .east 1
  | 2 => .li 1 3  | 3 => .send .east 1
  | 4 => .li 1 5  | 5 => .send .east 1
  | 6 => .li 1 7  | 7 => .send .east 1
  | _ => .halt

/-- Hold weights 11, 13, 17, 19; MAC each arriving mantissa. -/
def macProg : Program := fun pc =>
  match pc.toNat with
  | 0 => .li 1 11 | 1 => .stw 1 0
  | 2 => .li 1 13 | 3 => .stw 1 1
  | 4 => .li 1 17 | 5 => .stw 1 2
  | 6 => .li 1 19 | 7 => .stw 1 3
  | 8 => .macz
  | 9  => .recv .west 1 | 10 => .ldw 2 0 | 11 => .mac 1 2
  | 12 => .recv .west 1 | 13 => .ldw 2 1 | 14 => .mac 1 2
  | 15 => .recv .west 1 | 16 => .ldw 2 2 | 17 => .mac 1 2
  | 18 => .recv .west 1 | 19 => .ldw 2 3 | 20 => .mac 1 2
  | 21 => .rdacc 3
  | _ => .halt

def dotProgs : Fin 2 → Program
  | 0 => streamProg
  | 1 => macProg

/-- ⟨2,3,5,7⟩ · ⟨11,13,17,19⟩ = 279, computed by two machines
talking through mailboxes, checked on every build. -/
example :
    let fin := mrunSnap pairWiring dotProgs 64
      (MeshState.init (Fin 2) tessera)
    (fin.elem 1).regs 3 = 279
      ∧ fin.halted 0 = true ∧ fin.halted 1 = true := by⊢ let fin := mrunSnap pairWiring dotProgs 64 (MeshState.init (Fin 2) tessera);
(fin.elem 1).regs 3 = 279 ∧ fin.halted 0 = true ∧ fin.halted 1 = true
  native_decideAll goals completed! 🐙

Two programs, two machines, a stream of words across mailboxes, and an answer that blockSum_exact and scaled_dot say is the only answer possible. When the boundary converters arrive in full — floats in, blocks out — everything below the conversion is already spoken for.

end Tilessa