Paul Kronlund-Drouault · 9 min read

Proposition 7

Constrained Generation for Intelligent Reasoning Machines

Grammar, Logic and Meaning

Recent developments in the training of Large Language Models (LLMs) have seen the rise of prover models, with deepseek-r1-prover pioneering the domain and OpenAI quickly following with o3-pro and, most recently, GPT-5-Thinking. These models are highly optimized for logic and proof through Chain-of-Thought (CoT) enabled Reinforcement Learning on hard math problems. While this approach has produced good results, especially with smaller models developed by companies such as PleIAs or Moonshot AI, it has several fundamental flaws that decrease both accuracy and efficiency.

Under the current training paradigm, LLMs have to internalize three very different notions at the same time:

  • Grammar
  • Logic
  • Meaning/Purpose/Creativity ?

This structure makes the whole pipeline very hard to debug. Failures on benchmarks and evaluations can come from syntax errors, logical inconsistencies (what Wittgenstein would call unsinnig1), or outright limitations of the models’ “intelligence”.

Some companies, like .txt (dottxt.co), are developing regex-based sampling solutions to fix the grammar aspect, in order to improve tool integration and reliability. This has proven successful, showing good results with a significant decrease in compute cost in some applications, but it doesn’t go far enough.

For one, this approach only covers inference and doesn’t integrate into the training pipeline. Second, it doesn’t solve the main problem: logic.
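
To make the idea concrete, here is a minimal sketch of regex-constrained sampling (a toy with a character-level vocabulary and a hand-written prefix pattern; real systems like .txt’s compile the regex to an automaton and work over the tokenizer’s vocabulary): any token whose emission would leave the language gets its logit set to negative infinity before the next token is picked.

```python
import math
import re

def mask_logits(logits, vocab, prefix, prefix_pattern):
    """Set logits of tokens that would break the pattern to -inf.

    `prefix_pattern` must accept every valid prefix of the target
    language (written by hand here; real systems derive it from a DFA).
    """
    masked = []
    for logit, token in zip(logits, vocab):
        if re.fullmatch(prefix_pattern, prefix + token):
            masked.append(logit)
        else:
            masked.append(-math.inf)
    return masked

def greedy_constrained(vocab, logits_fn, prefix_pattern, max_steps=8):
    """Greedy decoding that can never leave the constrained language."""
    out = ""
    for _ in range(max_steps):
        logits = mask_logits(logits_fn(out), vocab, out, prefix_pattern)
        best = max(range(len(vocab)), key=lambda i: logits[i])
        if logits[best] == -math.inf:  # no admissible continuation left
            break
        out += vocab[best]
    return out
```

For instance, with the pattern `[0-9]*` (the prefixes of `[0-9]+`), a model that prefers the token `x` is simply never allowed to emit it, and decoding stays inside the digit language.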

Proposition 7 (P7)

At Unsuspicious Industries, we want to go several steps further in the process. In this post, we are introducing Proposition 7, our constrained generation system [WIP] that unifies typed programming, proof building, reasoning and natural language2.

Philosophy

Philosophy is the practice of asking questions. Mathematics is the branch of philosophy in which answers can be checked for truth or falsity.

Intelligence is best understood as the capacity to ask good questions, in other words, to make interesting choices. “Interesting” is not the same as “true”; it resists definition, though we recognize it when we see it3.

Being good at mathematics is less about the mechanics of proof (we can trivially prove statements, it’s just that most statements are boring) and more about choosing what to prove, and when, to advance a goal.

To build intelligent machines, we should optimize for interesting-ness: a coherent model of what is meaningful versus trivial, coupled with a search that stays within the space of statements that are true.

Humans naturally think with vague language that often outruns strict truth conditions, and it makes sense in a messy world. In formal reasoning, the tradeoff is different. Some of what we find interesting is also absurd: we can call it mysticism4. Mysticism is cool, and we dedicate a lot of time to it, through art, music, literature, and even love, but it’s not something machines can/should do.

In mathematics, our thinking systems must balance interesting-ness with formality: express which questions should be asked while respecting which questions can be asked.

With constrained generation our aim is simple: combine the broad priors of language models, as rough approximations of what is interesting, with formal frameworks that admit only true paths, so we can optimize directly for our real target: intelligence, expressed through taste5.

In such a setup, “hallucination” loses its meaning: truth is enforced by the formal layer, while open-ended association remains contained in latent representations, or alternative channels. By itself this is not enough, and freely pretrained models may stall, so Proposition 7 combines it with a training pipeline that applies a custom loss to formally verified outputs to shape intelligence, and an entropy-based sampling system.
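
As a sketch of what “a custom loss applied to formally verified outputs” could look like (one possible reading, not the final P7 design): sample derivations, run each through the verifier, and let only the verified ones contribute to the negative log-likelihood.

```python
import math

def verified_nll_loss(sequences, logprob_fn, verify_fn):
    """Average NLL over sampled sequences, counting only those the
    formal verifier accepts. `verify_fn` returns True/False here;
    a soft reward in [0, 1] would slot in the same way.
    """
    total, kept = 0.0, 0
    for seq in sequences:
        if not verify_fn(seq):  # unverified samples carry no gradient
            continue
        total += -sum(logprob_fn(tok, seq[:i]) for i, tok in enumerate(seq))
        kept += 1
    return total / kept if kept else 0.0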

For me, mathematics is less about method than about envisioning possibilities and choosing the ones that matter. It is creativity before computation, and Proposition 7 is a way for machines to develop that creativity and build intuition without the noise of unconstrained natural language.

System Overview

  • Constraint Engine: unified grammar + typing/proof rules that produce only valid moves/ASTs within a formal system.
  • Syntax independence: the formally constrained sampler is made to be syntax independent and to work over runtime-specified context grammars with attached typing rules.
  • Constrained pretrained LLM: ranks valid moves (translated to tokens) using learned criteria of intelligence: elegance, generality, reuse, surprise.
  • Mode-switching sampler: a natural-language “reasoning” mode for explanations, creative flow, and mystic inspiration and a formal reasoning mode for structured proof generation and logic reasoning.

Constraint Engine (System-Agnostic)

The engine and the core calculus are the same surface: define what well‑formed looks like and only valid steps exist. A rule simply licenses a next move and anything unlicensed is unrepresentable.

Sketch (rule shape)

Production:
  NonTerminal(rule) ::= parts[b] ...
Typing:
  premise_1, ..., premise_k
  ------------------------ (rule)
  judgment

Premise forms (closed):
  Γ[...] ⊢ e : τ         # typing
  x ∈ Γ                 # context membership
  τ₁ = τ₂               # convertibility / equality
  predicate(args)       # side condition (fresh, occurs_check, ...)

Minimal ingredients: a binding pattern, one inference line, premises from the four shapes, and an equality notion (plus optional fast predicates).
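
These four premise shapes can be carried literally as data. A hypothetical encoding (names are illustrative, not the prototype’s actual rule-file format):

```python
from dataclasses import dataclass

# The four closed premise forms from the rule sketch above.
@dataclass(frozen=True)
class Typing:               # Γ[...] ⊢ e : τ
    ctx: str
    term: str
    type_: str

@dataclass(frozen=True)
class InContext:            # x ∈ Γ
    var: str
    ctx: str

@dataclass(frozen=True)
class Convertible:          # τ₁ = τ₂
    left: str
    right: str

@dataclass(frozen=True)
class SideCondition:        # predicate(args)
    name: str
    args: tuple

@dataclass(frozen=True)
class Rule:
    name: str
    premises: tuple         # any mix of the four forms above
    conclusion: Typing

# Example: the usual variable rule  x ∈ Γ  ⊢  x : Γ(x)
var_rule = Rule(
    name="var",
    premises=(InContext("x", "Γ"),),
    conclusion=Typing("Γ", "x", "lookup(Γ, x)"),
)
```

Because rules are plain values, they can be serialized, diffed, and meta-checked before the engine ever uses them.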

Meta‑typing (how we stay system‑agnostic):

  • Two levels: object level (terms / proofs) and meta level (the rule file). The rule file itself is parsed and checked before being admitted; rules are data.
  • Each rule undergoes a meta pass:
    1. Binding hygiene / scope sanity,
    2. Premise shape validation (only four primitive premise forms),
    3. Well‑scoped metas resolved or flagged,
    4. Side predicates whitelisted & sandboxed.
  • Adding features can be done by supplying new constructor symbols alongside rules that relate them. If the meta checker accepts the file, the engine can generate with them without code changes.
  • Reflection boundary: the system never executes object rules at meta time; it simulates them structurally to guarantee they cannot smuggle imperative effects (soundness guard). This is, roughly, a form of abstract interpretation.
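
The meta pass can be sketched as a pure filter over rule files: reject any premise outside the four primitive shapes, and any side predicate that is not whitelisted (a simplified illustration; the real checker also handles binding hygiene and meta-variable scoping):

```python
ALLOWED_SHAPES = {"typing", "in_context", "convertible", "side_condition"}
SAFE_PREDICATES = {"fresh", "occurs_check"}  # whitelist; anything else is rejected

def meta_check(rule):
    """Admit a rule (a plain dict here, shape illustrative) only if every
    premise uses one of the four primitive forms and every side predicate
    is whitelisted. Rules are data; nothing is ever executed."""
    for premise in rule.get("premises", []):
        shape = premise.get("shape")
        if shape not in ALLOWED_SHAPES:
            return False
        if shape == "side_condition" and premise.get("name") not in SAFE_PREDICATES:
            return False
    return True
```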

The current prototype already exercises this: intersection, union, negation, refinements, and universes sit in one rule file; removing them is deleting lines, not recompiling logic. This way, sampling constraints can be dynamic and best adapted to the problem you’re working with, meaning we can easily switch from constraining in one language to another at runtime.

Generation loop:

  1. unify with rule conclusions (up to declared conversion)
  2. silently drop rules whose instantiated premises fail
  3. emit admissible refinements (rule id + bindings). Invalid states never materialize.
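
The three steps above can be sketched as follows (a toy where `unify` and `premises_hold` are injected as callbacks; the real engine unifies up to declared conversion):

```python
def admissible_moves(goal, rules, unify, premises_hold):
    """One step of the generation loop: for each rule, try to unify its
    conclusion with the goal; keep only instantiations whose premises
    all hold. Invalid states are never emitted."""
    moves = []
    for rule in rules:
        bindings = unify(goal, rule["conclusion"])  # None on failure
        if bindings is None:
            continue
        if not premises_hold(rule["premises"], bindings):
            continue  # silently dropped, as in step 2
        moves.append((rule["name"], bindings))
    return moves
```

With a trivial `unify` (exact match, empty bindings) and boolean premises, only rules whose conclusion matches the goal and whose premises hold survive into the move list; the sampler then only ever ranks these.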

The Mysticism Paradox

  • Constraints create freedom: removing the correctness burden frees the model to explore style, and allows for more creative outputs through mechanisms like temperature or entropy-based sampling.
  • Form enables expression: like meter in poetry, rules focus creativity6.
  • Interesting-ness resists definition but emerges from examples and use.

Architecture

  • Constraint Engine
    • Grammar builder: typed AST only.
    • Type/proof checker: real-time premise satisfaction, unification.
    • Proof verifier: derivation attached to each node.
  • Language Core
    • parameterized core calculus (rule-file specified).
  • Taste Learner
    • Inputs: Mixed formal/informal context.
    • Outputs: tokens interpreted as a value over moves; sampling balances exploration vs. elegance.
  • Modal sampling policy
    • Constrained formal path for logic reasoning and output
    • Free-language sampling for explanations, more open (vague) reasoning, creativity, and mysticism.
    • The model can still switch between modes of reasoning, based on variables such as entropy. This might also help with dead-ends in search space.
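
A minimal sketch of entropy-driven mode switching (the threshold of 1 nat is an arbitrary placeholder; finding good triggers is precisely one of the open questions below):

```python
import math

def entropy(probs):
    """Shannon entropy, in nats, of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def choose_mode(probs, threshold=1.0):
    """Switch to free natural-language reasoning when the model is
    uncertain (high entropy: a likely dead end or abstraction gap),
    and stay in the constrained formal mode when it is confident.
    The threshold is a tunable free parameter, not a derived value."""
    return "natural" if entropy(probs) > threshold else "formal"
```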

Open Questions

  • Exploration vs exploitation: scheduling diversity when all branches are already valid.
  • Expressiveness vs decidability: managing search with intersections, unions, negation, refinements before adding heavier features (inductives, quantifiers).
  • Refinement predicates: trusted boundary or verified? sandboxing / SMT side-check vs string predicate.
  • Normalization / equality: how far to push conversion (β, δ, η, simplifications) without exploding unification cost.
  • Entropy-driven mode switching: optimal triggers that surface genuine abstraction needs rather than noise.
  • Encoding logic symbols in tokens: Should our system use language-like tokens for logical symbols (e.g., “forall”, “exists”) or should it use a more compact representation? How does this affect the model’s ability to learn and generalize? Maybe special tokens?
  • Beyond math: Could it be possible to extend the system to other domains, with different context-bound constraints? How would we adapt the grammar and typing rules to handle different types of reasoning, such as legal reasoning or scientific hypothesis testing?

These questions represent the core research challenges that would need to be addressed in developing P7 from proposition to working system.

Practical Takeaways

  • Practitioners: Use constraints to eliminate invalid states; devote model capacity to style and strategy.
  • Decision-makers: Expect better efficiency (less wasted compute) and higher perceived quality (elegance, reuse) when correctness is guaranteed by construction.
  • Researchers: Treat interesting-ness as a learnable value function over valid moves; compare signals (structural, graph, human preference).

References

Philosophy & Foundations of Mathematics

  • Wittgenstein, L. (1921). Tractatus Logico-Philosophicus.
  • Wittgenstein, L. (1953). Philosophical Investigations.
  • Hardy, G. H. (1940). A Mathematician’s Apology.
  • Poincaré, H. (1905). Science and Hypothesis.
  • Kant, I. (1790). Critique of Judgment.

Core Type Theory & Foundations

  • Martin-Löf, P. (1984). Intuitionistic Type Theory.
  • Girard, J.-Y. (1972). Interprétation fonctionnelle… (System F / cut elimination).
  • Reynolds, J. C. (1974). Towards a theory of type structure. (Parametric polymorphism).
  • Barendregt, H. (1992). Lambda calculi with types.
  • Wadler, P. (2015). Propositions as Types.

Constrained Generation, Types & Synthesis

  • Muendler, N., Antonov, N., & Vechev, M. (2025). Type-aware constraint generation for LLM code synthesis (ETH Zurich).
  • Willard, B., & Louf, R. (2023). Efficient guided generation for large language models.
  • Beurer-Kellner, L., Fischer, M., & Vechev, M. (2022). Prompting is Programming.
  • Polikarpova, Kuraj, & Solar-Lezama (2016). Refinement type synthesis.
  • Knuth, D. E. (1968). Semantics of context-free languages.
  • Pierce, B. C., & Turner, D. N. (2000). Local type inference.

Notes

Footnotes

  1. In Wittgenstein’s Tractatus Logico-Philosophicus, unsinnig (nonsensical) refers to propositions that violate logical grammar and are therefore meaningless, distinct from sinnlos (senseless) which refers to tautologies and contradictions that are logically well-formed but say nothing about the world. See propositions 4.461-4.4611.

  2. The name references Wittgenstein’s final proposition in the Tractatus: “Whereof one cannot speak, thereof one must be silent” (Wovon man nicht sprechen kann, darüber muß man schweigen). Our system aims to clearly delineate what can be formally expressed versus what belongs to the realm of the unsayable (mystical).

  3. This echoes the aesthetic theory of mathematical beauty found in Hardy’s A Mathematician’s Apology and Poincaré’s writings on mathematical intuition, where elegance and surprise are recognized as fundamental but indefinable qualities.

  4. Here we use “mysticism” in the sense of Wittgenstein’s later work, particularly the Philosophical Investigations, where he discusses the limits of what can be meaningfully said versus what can only be shown or experienced.

  5. The notion of “taste” as a form of intelligence draws from Kant’s Critique of Judgment, where aesthetic judgment is seen as a bridge between understanding and reason, involving both subjective experience and claims to universal validity.

  6. This paradox echoes the concept of “creative constraints” found in artistic traditions, like the formal restrictions of sonnets or George Perec’s La disparition (a book he wrote without using the letter “e”), where limitations paradoxically enable rather than restrict creative expression.
