From Natural Language to Logic Machines Through Formal Constraints.
Recent developments in the training of Large Language Models (LLMs) have seen the rise of prover models, with deepseek-r1-prover being the pioneer in the domain, and OpenAI quickly following with o3-pro, and even the latest GPT-5-Thinking. These models are highly optimized for logic and proof through Chain-of-Thought (CoT) enabled Reinforcement Learning on hard math problems. While this approach has gathered good results, especially with smaller models developed by companies such as PleIAs or Moonshot AI, it has several fundamental flaws that decrease both accuracy and efficiency.
Under the current training paradigm, LLMs have to internalize three very different notions at the same time:
This structure makes the whole pipeline very hard to debug. Failures on benchmarks and evaluations can come from either syntax errors, logical inconsistencies (what Wittgenstein would call unsinnig1), or outright limitations of the models’s “intelligence”.
Some companies, like .txt (dottxt.co) are developing regex-based sampling solutions to fix the grammar aspect, in order to improve tool integration and augment reliability. This has proven successful, and has shown good results with a significant decrease in compute cost in some applications, but it doesnt go far enough.
For one, this approach only covers inference, and doesn’t integrate in training pipeline. And secondly, it it doesnt solve the main problem of logic.
At Unsuspicious Industries, we want to go several steps further in the process. In this post, we are introducing Proposition 7, our constrained generation system [WIP] that unifies typed programming, proof building, reasoning and natural language2.
Philosophy is the practice of asking questions. Mathematics is the branch of philosophy in which answers can be checked for truth or falsity.
Intelligence is best understood as the capacity to ask good questions, in other words, to make interesting choices. “Interesting” is not the same as “true”; it resists definition, though we recognize it when we see it3.
Being good at mathematics is less about the mechanics of proof (we can trivially prove statements, it’s just that most statements are boring) and more about choosing what to prove, and when, to advance a goal.
To build intelligent machines, we should optimize for interesting-ness: a coherent model of what is meaningful versus trivial, coupled with a search that stays within the space of statements that are true.
Humans naturally think with vague language that often outruns strict truth conditions; that makes sense in a messy world. In formal reasoning, the tradeoff is different. Some of what we find interesting is also absurd: we call it mysticism4. Mysticism is cool, and we dedicate a lot of time to it, through art, music, literature, and even love, but it’s not something machines can/should do.
In mathematics, our thinking systems must balance interesting-ness with formality: express which questions ought to be asked while respecting which questions can be asked. This is the role of constraint generation.
Our aim is simple: combine the broad priors of language models, as rough approximations of what is interesting, with formal frameworks that admit only true paths, so we can optimize directly for our real target: intelligence, expressed through taste5.
In such a setup, “hallucination” loses its meaning: truth is enforced by the formal layer, while open-ended association remains contained in latent representations, or alternative channels. By itself this is not enough, and freely pretrained models may stall, so Proposition 7 combines it with a training pipeline that applies a custom loss to formally verified outputs to shape intelligence.
In short, mathematics is less about method than about envisioning possibilities and choosing the ones that matter. It is creativity more than computation, and Proposition 7 is a way for machines to cultivate that creativity, build intuition without the noise of unconstrained natural language.
The engine and the core calculus are the same surface: define what well‑formed looks like and only valid steps exist. Grammar + typing = one contract. A rule simply licenses a next move and anything unlicensed is unrepresentable.
Sketch (rule shape)
Production:
NonTerminal(rule) ::= parts[b] ...
Typing:
premise_1, ..., premise_k
------------------------ (rule)
judgment
Premise forms (closed):
Γ[...] ⊢ e : τ # typing
x ∈ Γ # context membership
τ₁ = τ₂ # convertibility / equality
predicate(args) # side condition (fresh, occurs_check, ...)
Minimal ingredients: a binding pattern, one inference line, premises from the four shapes, and an equality notion (plus optional fast predicates).
Meta‑typing (how we stay system‑agnostic):
The current prototype already exercises this: intersection / union / negation / refinements / universes sit in one rule file; removing them is deleting lines, not recompiling logic. That illustrates the contract: expressiveness lives in data.
Generation loop:
Meta benefit (why it matters):
Translation is done in a multi-step routine: external proof/program → neutral IR → elaborated into active rules
One‑line philosophy:
Using an instantiated core (e.g., a dependent fragment when needed), we can translate structured mathematical proofs and selected formalizable argumentative structures into derivations guaranteed correct by construction. This extends to typed program derivations, enabling generation of formally verified code and mitigating security / robustness issues in unconstrained LLM synthesis.
All outputs are valid; learning only ranks among valid choices. This is the key to Proposition 7: we can focus on what is interesting, not what is true.
Signals (non-exclusive)
Model sketch
Training phases
Difficulties
Unification-guided rule selection:
Preference learning over valid trajectories:
Algorithm sketch (taste-guided constrained generation)
Given:
Algorithm P7-TrainStep:
Invariant (correctness by construction):
Reward decomposition (example):
These questions represent the core research challenges that would need to be addressed in developing P7 from proposition to working system.
In Wittgenstein’s Tractatus Logico-Philosophicus, unsinnig (nonsensical) refers to propositions that violate logical grammar and are therefore meaningless, distinct from sinnlos (senseless) which refers to tautologies and contradictions that are logically well-formed but say nothing about the world. See propositions 4.461-4.4611. ↩
The name references Wittgenstein’s final proposition in the Tractus: “Whereof one cannot speak, thereof one must be silent” (Wovon man nicht sprechen kann, darüber muß man schweigen). Our system aims to clearly delineate what can be formally expressed versus what belongs to the realm of the unsayable (mystical). ↩
This echoes the aesthetic theory of mathematical beauty found in Hardy’s A Mathematician’s Apology and Poincaré’s writings on mathematical intuition, where elegance and surprise are recognized as fundamental but indefinable qualities. ↩
Here we use “mysticism” in the sense of Wittgenstein’s later work, particularly the Philosophical Investigations, where he discusses the limits of what can be meaningfully said versus what can only be shown or experienced. ↩
The notion of “taste” as a form of intelligence draws from Kant’s Critique of Judgment, where aesthetic judgment is seen as a bridge between understanding and reason, involving both subjective experience and claims to universal validity. ↩
This paradox echoes the concept of “creative constraints” found in artistic traditions, like the formal restrictions of sonnets or George Perec’s La disparition (a book he wrote without using the letter “e”), where limitations paradoxically enable rather than restrict creative expression. ↩
Dead-ends in formal reasoning exemplify Wittgenstein’s sinnlos: they are logically well-formed and syntactically valid, but contribute nothing meaningful toward the proof goal. Unlike unsinnig statements that violate logical grammar, sinnlos paths waste computational resources on formally correct but purposeless exploration. By construction (eliminating the unsinnig), P7 can devote all learning capacity to the harder problem of recognizing and avoiding the sinnlos. ↩