johnpfeiffer
  • Home
  • Engineering (People) Managers
  • John Likes
  • Software Engineer Favorites
  • Categories
  • Tags
  • Archives

AI opportunities need improved Spec Driven Development with TLA+

Contents

  • Ambiguity in the Era of AI
  • Spec Driven Design
    • SDD Gaps
  • Human Intent Matters
    • Imperative vs Declarative
  • An evening with TLA+
    • Learning to harness the invariant approach
      • Multiple Fixes - governance
    • Learning about Weakness
  • Summary
  • Appendix - Learning TLA+
    • Background on TLA+
    • Writing a simple spec
      • Breaking Bad
      • Checking the spec - enforcing invariants
      • The fix - ask an LLM

Ambiguity in the Era of AI

"Record who fed her cat food"

Is it a human eating cat food, or is it a human feeding a cat? Record with video, or write it down?

English is infamously imprecise - which is a challenge when writing software requirements.

Throughout my software career, I've lived through so many telephone games:

Customer/User -> Support -> Product Manager -> Engineer

LLMs are not a silver bullet, they amplify existing systems.

Ambiguity and imprecision become an LLM over-engineered ProductRequirementDocument (so many paragraphs of such long sentences, absolutely!), followed by AI Agents that auto-generate an exponential number of (obfuscated, contradictory) JIRA tickets.

A waste of time, tokens (dollars!) - and worse, finite human attention.

Using an electric car, to pull a combustion engine, that is pulling a wooden cart... with a horse sitting in the cart.

Spec Driven Design

An increasingly popular approach to constrain and leverage LLMs in code generation is to be more precise in specifying what we want: https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html

This practical application has strong foundations in computer science theory:

Shannon's "information theory" and Hamming codes provide precise terminology and clear thought allowing mathemtical quantification of signal-to-noise ratios and error rates in communication.

Shannon Weaver Model of Communication

SDD Gaps

Spec-Driven Design does not have a mechanism to distill english into unambiguous precision, nor does it have a method of enforcing behavior/implementation.

Practically: copy pasting the each absurdly wrong auto-generated jira ticket (ok - your Agent has MCP so it ingests them automatically) will never be "correct", it will not deliver value, and worst of all...

Building the wrong thing quickly will waste everyone's time and patience.

Cue super hero: Leslie Lamport (Turing Award Winner and creator of TLA+)

One reason for modeling a system is to check if it does what we want it to

What if you time-traveled decades before LLMs and created a precise language that provides deterministic strong guarantees about software?

Human Intent Matters

Now that (with LLMs and coding agents) writing and changing code is easy, let's tackle the really hard part: writing specifications.

Extracting the correct invariants takes deep focus - but it is truly the unique human role to provide intent.

Here's what invariants look like for a real project — a "software professional personality" quiz game I'm building:

INV-001: There is at least 1 personality type
INV-002: Every personality type has at least 1 source link
INV-003: Each question has at least 1 answer
INV-004: Each question references a source
INV-005: Every question modifies at least one score
INV-006: Each personality type must be reachable

These can be verified across all possible states, and the last one is critical: ensuring there is a path of answers that can lead to every personality type.

Imperative vs Declarative

Infrastructure as Code (IaaC) already went through a similar process of shifting from a series of How steps to simply stating "it is this".

Imperative: "Start a server, run updates, install the firewall, install the nginx webserver, copy configuration..."

Declarative: "A server exists with a single network port exposed, running a webserver configured via nginx.conf"

Applying this declarative mindset to an improved Specification workflow:

  • Humans state the invariants
  • Software (LLMs) translate english invariants into a formal specification
  • A deterministic model checker verifies the specification for logical correctness
  • then leverage the best of Spec-Driven Development to write an implementation and execution plan
  • Agentic coding (with harness and skills) implement
  • [Verify via automated tests and auditing log statements]

An evening with TLA+

I've wanted to learn TLA+ for years but got stuck on academic formalism, the dense mathematical syntax, and the complex toolchain.

This time I paired with an LLM: learning and building, up and running in an hour.

I gave Claude my six English invariants and asked it to generate a TLA+ specification, absolutely!

Codex parsed the spec and even ran the model checker for me. Success!

Learning to harness the invariant approach

AI Over-Engineering happened: the eager LLM generated a spec with sixteen constants including Bayesian posterior probabilities, expected information gain calculations, and an adaptive question engine. Features I hadn't really specified yet.

It's a QA best practice, especially with new tools/techniques, to deliberately break something and see the expected failure. This helps avoid vacuous or tautology tests - things that do nothing but "pass/succeed".

(Thanks to an LLM suggestion of exactly where/what to change)... I changed the test data so a question contributed zero points to any score, INV-005 says every question must modify at least one score, so this should have failed.

Finished computing initial states: 0 distinct states generated
Model checking completed. No error has been found.
✔ Success

Zero states generated - no error has been found - subtle and silent failures are gotchas when working with LLMs.

Multiple Fixes - governance

I added three rules to my AGENTS.md to strengthen governance and enforce simplicity:

1. Every predicate in a TLA+ spec traces to a numbered INV-xxx in INVARIANTS.md. No extra predicates.
2. The model file contains only literal test data.
   If a value depends on other constants, define it as an operator in the spec, not a constant in the model.
3. A TLC run that generates 0 states is a failure. Report it as an error and diagnose.

The AI re-read the rules, looked at its own output, and flagged itself:

"The TLA+ model file is not literal-only. CReachableOutcomes == CPersonalities is a derived value, not literal model data."

After the LLM cleaned up and regenerated the TLA files, the model shrank to pure literals.

Then to deliberately break it I zeroed out all scores for one question and re-ran:

uv run tla tlc RandsPersonalityGameInvariantCheck.tla

The invariant of INV005 is equal to FALSE

So I successfully forced an error - and revealed a design bug.

Learning about Weakness

INV-005: Every question modifies at least one score

Thinking through what's wrong with the logical model I read the "test cases" and realized that if the first question's score was very high for a personality, then following questions that only changed the score a tiny bit would never alter the final outcome.

The user would (unwittingly) answer a lot of questions for no appreciable difference.

So I had to improve my spec

INV-005: Every question can change the outcome
For each question, there exist answer choices that produce different final personality types.

Summary

TLA+ is a powerful tool that gives you more predictability and certainty, before you unleash the chaos of LLMs.

LLMs "raise the floor" so anyone can use invariants to build reliable systems (instead of leaping headfirst into vibe coding an unmaintainable buggy mess). The system should leverage information theory, not be a victim of the noise.

And LLMs are not only translators, they are tutors to help you improve on the unique human role of specifying the Intent.

Of course, also "trust but verify": Nothing tested so no errors found. Reward hacking. Over-engineering. This all means the human is ultimately the judge - less time spent on the How and more on "is this the right What"?

Appendix - Learning TLA+

Background on TLA+

"Temporal Logic of Actions", TLA+ is a (formal specification ) language for modeling software, based on temporal logic and set theory.

  • https://lamport.azurewebsites.net/tla/high-level-view.html
  • https://www.wired.com/2013/01/code-bugs-programming-why-we-need-specs/

Basically: Initial states -> Allowed transitions -> Properties to verify

It lets you describe:

  • States of a system
  • Transitions between states
  • Safety properties ("nothing bad happens")
  • Liveness properties ("something good eventually happens")

Tools can then check the specification and mathematically verify its properties.

TLA+ is used in industry by AWS, Microsoft, and other places requiring extreme reliability.

It’s especially popular for consensus algorithms, distributed protocols, locking systems, etc.

  • https://lamport.azurewebsites.net/tla/industrial-use.html
  • https://lamport.azurewebsites.net/tla/formal-methods-amazon.pdf
  • https://cacm.acm.org/research/how-amazon-web-services-uses-formal-methods/
  • https://conf.tlapl.us/2019/yannickwelsch/ (Elasticsearch)

Writing a simple spec

What if you wanted to ensure a bank account never went below 0...

---- MODULE BankAccount ----
EXTENDS Naturals
VARIABLES balance

Init == balance = 0
Next == UNCHANGED balance
INV001 == balance >= 0
====
  • "Naturals" is the library for numbers
  • Initially balance is 0
  • Next state is to do nothing
  • The invariant is the balance has to be non-negative

Breaking Bad

Hey, a feature request: Decrement the bank account...

---- MODULE BankAccount ----
EXTENDS Naturals
VARIABLES balance

Init == balance = 0

Decrement == balance' = balance - 1
Next == Decrement

INV001 == balance >= 0
====

Decrement describes that balance' has the new state.

And when Next occurs, we decrement: is the Invariant still true?

Checking the spec - enforcing invariants

Please excuse the yak shaving...

https://seths.blog/2005/03/dont_shave_that/

Tooling: TLC is a Model checker

brew install openjdk


java -version

uv init --bare
uv add --dev tlaplus-cli
uv run tla --version
uv run tla tools install

export PATH="/opt/homebrew/opt/openjdk/bin:$PATH"
uv run tla check-java

So after all that...

BankAccount.cfg

INIT Init
NEXT Next
INVARIANT INV001

uv run tla tlc BankAccount.tla

Running TLC (TLC2 Version 2.19 of 08 August 2024 (rev: 5a47802)) on BankAccount.tla ...
Starting SANY...
Parsing file /Users/.../BankAccount.tla
Semantic processing of module Naturals
Semantic processing of module BankAccount
SANY finished.
Finished computing initial states: 1 distinct state generated
Invariant INV001 is violated.


Finished in 245ms
✖ Error: Model checking failed with 1 error(s).

The human has to think logically about happened

JAVA_HOME=/opt/homebrew/opt/openjdk uv run tla check-java (~/.zprofile or ~/.bashrc) export PATH="/opt/homebrew/opt/openjdk/bin:$PATH"

The fix - ask an LLM

Paste the spec and the TLC error into your favorite LLM and ask for the simplest fix.

---- MODULE BankAccount ----
EXTENDS Naturals
VARIABLES balance

Init == balance = 0


Decrement == balance > 0 /\ balance' = balance - 1
Next == Decrement \/ UNCHANGED balance

INV001 == balance >= 0
====
  • A logical AND is represented with /\ so the balance has to be greater than 0 AND then subtract
  • A logical OR is represented with \/ so the Next step will either decrement or not change the balance

Therefore we have fixed our model to not decrement when the balance is 0.


  • « AI Natives do not question the gifts from the gods

Published

May 2, 2026

Category

ai

~1591 words

Tags

  • ai 10
  • llm 4
  • qa 2
  • sdd 1
  • tla 1