Vitalik: What is the key to the next phase of Ethereum?

By: rootdata|2026/05/20 11:10:28

ETH

drake

lean-protocol

Author: Vitalik Buterin

Translated by: Jiahua, ChainCatcher

Special thanks to Yoichi Hirai, Justin Drake, Nadim Kobeissi, and Alex Hicks for their feedback and review.

In the past few months, a new programming paradigm has rapidly gained favor in the forefront of Ethereum's development circles and many other corners of computing: writing code directly in very low-level languages (such as EVM bytecode, assembly language) or Lean, and using automatically verifiable mathematical proofs written in Lean to validate its correctness.

If done correctly, this not only has the potential to produce extremely efficient code but is also much safer than previous programming methods. Yoichi Hirai calls this the "ultimate form of software development."

This article will attempt to unveil the underlying principles, explore what formal verification of software can achieve, and identify its weaknesses and limitations in Ethereum and other fields.

What is Formal Verification?

Formal verification refers to the process of writing proofs for mathematical theorems in a way that can be automatically checked. To provide a relatively simple yet interesting example, let’s consider a basic theorem about the Fibonacci sequence: every third number is even, while the others are odd.

1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 ...

One simple way to prove this is through mathematical induction, moving forward three steps at a time.

First is the base case. Let F1 = F2 = 1, F3 = 2. By observation, we see that the statement ("Fi is even when i is a multiple of 3, otherwise it is odd") holds true before x = 3.

Next is the inductive case. Assume the statement holds true before 3k+3, meaning we already know the parity of F3k+1, F3k+2, and F3k+3 is odd, odd, and even, respectively. We can calculate the parity of the next group of three numbers:

F3k+4 = F3k+2 + F3k+3 = odd + even = odd
F3k+5 = F3k+3 + F3k+4 = even + odd = odd
F3k+6 = F3k+4 + F3k+5 = odd + odd = even

Thus, from knowing the statement holds true before 3k+3, we derive that the statement holds true before 3k+6. We can repeatedly apply this reasoning, assuring ourselves that this rule holds for all integers.

This argument is sufficient to convince humans. However, what if you want to prove something a hundred times more complex, and you want to be absolutely certain that you haven't made a mistake? Well, you can provide a proof that a computer can be convinced by.

Here is how it is presented:

-- Fibonacci with fib 0 = 0, fib 1 = 1, fib 2 = 1 (indices offset by 1)
def fib : Nat → Nat
  | 0     => 0
  | 1     => 1
  | n + 2 => fib (n + 1) + fib n

-- Claim: fib (3k+1) is odd, fib (3k+2) is odd, fib (3k+3) is even.
-- Equivalently: every third Fibonacci number starting from fib 3 is even.
-- We prove all three at once by induction on k, since each case
-- of the next block is built from the previous block.
theorem fib_triple (k : Nat) :
    fib (3 * k + 1) % 2 = 1 ∧
    fib (3 * k + 2) % 2 = 1 ∧
    fib (3 * k + 3) % 2 = 0 := by
  induction k with
  | zero => decide
  | succ k ih =>
    -- Rewrite the new indices into the form (something) + 2 so fib unfolds.
    refine ⟨?_, ?_, ?_⟩
    · show (fib (3 * k + 3) + fib (3 * k + 2)) % 2 = 1
      omega
    · show (fib (3 * k + 3) + fib (3 * k + 2) + fib (3 * k + 3)) % 2 = 1
      omega
    · show (fib (3 * k + 3) + fib (3 * k + 2) + fib (3 * k + 3)
      + (fib (3 * k + 3) + fib (3 * k + 2))) % 2 = 0
      omega

This is the same reasoning logic, but expressed in Lean. Lean is a programming language commonly used for writing and verifying mathematical proofs.

This looks different from the "human" proof given above for good reason: what is intuitive to a computer (in the traditional sense of "computer," meaning a "deterministic" program composed of if/then statements, rather than large language models) is fundamentally different from what is intuitive to humans.

In the proof above, you did not emphasize the fact that fib(3k+4) = fib(3k+3) + fib(3k+2), but rather emphasized that fib(3k+3) + fib(3k+2) is odd, while a strategy in Lean called omega automatically combines this with its knowledge of the definition of fib(3k+4).

In more complex proofs, you sometimes have to explicitly state which mathematical law allows you to take the current step, and sometimes you have to use obscure names like Prod.mk.inj.

On the other hand, you can expand huge polynomial expressions in a single step and prove their validity with just a single line expression like "omega" or "ring."

This non-intuitive and cumbersome nature largely explains why, despite machine-verifiable proofs existing for nearly 60 years, the field remains niche. However, on the other hand, due to the rapid development of artificial intelligence, many previously impossible things are now quickly becoming possible.

When Mathematical Proofs Start Guarding Code

So far, you might be thinking: well, computers can verify proofs of mathematical theorems, so we can finally determine which crazy new conclusions about primes are true and which are just errors in hundred-page PDF papers.

Maybe we can even figure out whether Shinichi Mochizuki's views on the ABC conjecture are correct!

But setting aside curiosity, so what?

There are many possible answers. But one answer that is very important to me is verifying the correctness of computer programs, especially those that perform cryptographic or security-related tasks.

After all, computer programs are mathematical objects, so proving that a computer program runs in a certain way is itself a mathematical theorem.

For example, suppose you want to prove whether an encrypted communication software like Signal is truly secure. You can write down what "secure" means mathematically in this context.

At a high level, what you are proving is that, assuming certain cryptographic assumptions hold, only those who possess the private key can know any information about the content of the message. In reality, there are many different security properties that are crucial.

It turns out that there is indeed a team trying to figure out exactly this issue! One of their security theorems looks like this:

theorem passive_secrecy_le_ddh
    (g : G)
    (adv : PassiveAdversary G SK) :
    passiveSecrecyAdvantage (F := F) g adv ≤
    ProbComp.boolDistAdvantage
      (DiffieHellman.ddhExpReal (F := F) g (ddhReduction adv))
      (DiffieHellman.ddhExpRand (F := F) g (ddhReduction adv))

Here is a summary of its meaning from Leanstral:

The passive_secrecy_le_ddh theorem is a compact reduction showing that the passive message confidentiality of X3DH is at least as hard as the DDH assumption in the random oracle model. If an adversary can break the passive message confidentiality of X3DH, then they can also break DDH.

Since we assume that DDH is hard to break, X3DH is also secure against passive attacks. This theorem proves that if an adversary can passively observe Signal's key exchange messages, they cannot distinguish the session key it produces from a random key with better than negligible probability.

If you combine this with a correct proof of the AES encryption implementation, you get a proof that the encryption of the Signal protocol is secure against passive attackers.

Similar projects have also proven that implementations of TLS and other parts of browser-internal cryptography are secure.

If you perform end-to-end complete formal verification, you are proving not just that some theoretical description of the protocol is secure, but that the specific code running by users is also secure in practice.

From the user's perspective, this greatly enhances trustlessness: to fully trust the code, you do not need to check the entire codebase; you only need to check the statements about it that have been proven.

Now, there are some important caveats to keep in mind, especially regarding what the critically important word "secure" actually means.

It is easy to forget to prove those truly important statements. It is easy to find that sometimes the statements to be proven are no simpler to describe than the code itself.

It is easy to inadvertently introduce assumptions into the proof that ultimately do not hold. It is also easy to decide that only one part of the system truly needs to be formally proven, only to be hit by serious vulnerabilities in other parts (even hardware).

Even the Lean implementation itself may have bugs. But before we discuss all these annoying details, let’s first delve into the utopia that could arise from correctly and ideally completing formal verification.

Formal Verification Born for Security

Bugs in computer code are terrifying.

When you put cryptocurrency into immutable chain smart contracts, and North Korea can automatically drain all your funds when a bug appears in the code and you have no recourse, bugs in code become even more terrifying.

When all this is wrapped in zero-knowledge proofs, bugs become even more terrifying because if someone manages to hack into the zero-knowledge proof system, they can extract all the money, and we have no idea what went wrong (worse, we don’t even know when it went wrong).

When we have powerful AI models, like Claude Mythos two years from now, that can automatically discover these bugs, bugs in code become even more terrifying.

Some people's reaction to this reality is to advocate for abandoning the fundamental idea of smart contracts, even believing that the internet cannot be a domain where defenders can have an asymmetric advantage over attackers.

Some quotes:

To harden a system, you need to spend more tokens than the attacker uses to exploit the vulnerabilities.

And:

Our industry is built on deterministic code. Writing it, testing it, deploying it, being confident it runs, but in my experience, this contract is breaking.

Among the top operators of truly AI-native companies, the codebase has become something you "trust" to run, and you can no longer specify its success probability precisely.

Worse still, some people believe the only solution is to abandon open source.

For cybersecurity, this would be a bleak future. Especially for those of us who care about the decentralization and freedom of the internet, this is an extremely pessimistic outlook.

The entire cypherpunk spirit is fundamentally built on the idea that on the internet, defenders have the advantage, and building a digital "castle" (whether through encryption, signatures, or proofs) is much easier than destroying one.

If we lose this, then internet security can only come from economies of scale, from hunting potential attackers worldwide, and, more broadly, can only be a choice between domination and destruction.

I disagree; I have a more optimistic vision for the future of cybersecurity.

I believe the challenges posed by powerful AI vulnerability-finding capabilities are severe, but they are a transitional challenge. Once the dust settles and we reach a new equilibrium, we will have an environment that is more favorable to defenders than in the past.

Mozilla agrees with my view. To quote them:

You may need to readjust the priority of everything else and devote sustained and focused energy to this task, but there is light at the end of the tunnel.

We are very proud of how our team is rising to this challenge, and others will too. Our work is not done, but we have weathered the storm and can glimpse a future that is not just barely keeping up but is much better.

Defenders finally have the opportunity to win decisively. ... Defects are limited, and we are entering a world where we can finally find them all.

Now, if you search for the words "formal" and "verification" in Mozilla's post using Ctrl+F, you will find zero matches. The positive future of cybersecurity does not entirely depend on formal verification or any other single technology.

What does it depend on? Essentially, this chart:

Trend of CVE vulnerabilities over time

For decades, many technologies have contributed to the decline in the number of vulnerabilities:

Type systems
Memory-safe languages
Improvements in software architecture (including sandboxing, permission control, and more broadly distinguishing "trusted computing base" from "other code")
Better testing methods
An ever-expanding knowledge base about secure and insecure coding patterns
An increasing number of pre-written and audited software libraries

AI-assisted formal verification should not be seen as a brand new paradigm but rather as a powerful accelerator of trends and paradigms that are already progressing.

Formal verification is not a panacea. But it is particularly well-suited for situations where the goal is much simpler than the implementation. This is especially true for some extremely complex and tricky technologies we will need to deploy in Ethereum's next major iteration: quantum-resistant signatures, STARKs, consensus algorithms, and ZK-EVMs.

STARK is a very complex piece of software. But the core security properties it implements are easy to understand and formalize: if you see a hash H pointing to program P, input x, and output y, then either (i) the hash algorithm used in STARK has been broken, or (ii) P(x) = y.

Thus we have the Arklib project, which is trying to create a fully formally verified STARK implementation (see VCV-io, which provides the foundational oracle computing infrastructure for formal verification of various other cryptographic protocols, many of which are dependencies of STARK).

More ambitiously, there is evm-asm: a project to build a fully formally verified entire EVM implementation.

The security properties here are not as straightforward: essentially, the goal is to prove its equivalence to another EVM implementation written in Lean, although that implementation can be written to maximize intuitiveness and readability without considering specific runtime efficiency.

It is possible that we will get ten EVM implementations, all provably equivalent, and they all happen to contain the same fatal flaw that allows an attacker to drain all ETH from addresses they are not authorized to access.

But this is much less likely than the possibility of such flaws existing in some current EVM implementation. Another security property that we only realized the importance of after painful lessons, namely resistance to DoS attacks, is also easy to formalize.

Two other important areas are:

Byzantine fault-tolerant consensus. Here, formalizing all expected security properties is equally difficult, but given how common bugs have been, it is worth a try. Thus, we have ongoing Lean implementations and proofs of consensus protocols in Lean.
Smart contract programming languages: see formal verification in Vyper and Verity.

In all these cases, one of the huge added values brought by formal verification is that these proofs are truly end-to-end. Typically, the most annoying bugs are interaction bugs that lurk at the interface of two subsystems considered independently.

For humans, reasoning about the entire system end-to-end is too difficult. But automated rule-checking systems can do it.

Formal Verification Born for Efficiency

Let’s take another look at evm-asm. This is an EVM implementation. But it is an EVM implementation written directly in RISC-V assembly.

Genuine.

Here is the ADD opcode:

import EvmAsm.Rv64.Program
namespace EvmAsm.Evm64
open EvmAsm.Rv64

/-- 256-bit EVM ADD: binary, pops 2, pushes 1.
    Limb 0: LD, LD, ADD, SLTU (carry), SD (5 instructions).
    Limbs 1-3: LD, LD, ADD, SLTU (carry1), ADD (carryIn), SLTU (carry2), OR (carryOut), SD (8 each).
    Then ADDI sp, sp, 32.
    Registers: x12=sp, x7=acc, x6=operand, x5=carry, x11=carry1. -/
def evm_add : Program :=
  -- Limb 0 (5 instructions)
  LD .x7 .x12 0 ;; LD .x6 .x12 32 ;;
  ADD .x7 .x7 .x6 ;; SLTU .x5 .x7 .x6 ;; SD .x12 .x7 32 ;;
  
  -- Limb 1 (8 instructions)
  LD .x7 .x12 8 ;; LD .x6 .x12 40 ;;
  ADD .x7 .x7 .x6 ;; SLTU .x11 .x7 .x6 ;;
  ADD .x7 .x7 .x5 ;; SLTU .x6 .x7 .x5 ;;
  OR' .x5 .x11 .x6 ;; SD .x12 .x7 40 ;;
  
  -- Limb 2 (8 instructions)
  LD .x7 .x12 16 ;; LD .x6 .x12 48 ;;
  ADD .x7 .x7 .x6 ;; SLTU .x11 .x7 .x6 ;;
  ADD .x7 .x7 .x5 ;; SLTU .x6 .x7 .x5 ;;
  OR' .x5 .x11 .x6 ;; SD .x12 .x7 48 ;;
  
  -- Limb 3 (8 instructions)
  LD .x7 .x12 24 ;; LD .x6 .x12 56 ;;
  ADD .x7 .x7 .x6 ;; SLTU .x11 .x7 .x6 ;;
  ADD .x7 .x7 .x5 ;; SLTU .x6 .x7 .x5 ;;
  OR' .x5 .x11 .x6 ;; SD .x12 .x7 56 ;;
  
  -- sp adjustment
  ADDI .x12 .x12 32
end EvmAsm.Evm64

The choice of RISC-V is because the ZK-EVM provers being built typically operate by proving RISC-V and compiling Ethereum clients to RISC-V. Therefore, if you have an EVM implementation written directly in RISC-V, this should be the fastest implementation you can get.

RISC-V can also be simulated very efficiently on ordinary computers (and there are RISC-V laptops available on the market).

Of course, to truly achieve end-to-end, you must formally verify the implementation of RISC-V itself (or the arithmetic of the prover), but don’t worry, work in this area already exists.

Writing code directly in assembly is something we used to do fifty years ago. Since then, we have abandoned this practice in favor of writing code in high-level languages.

High-level languages compromise on efficiency, but in exchange, they allow for much faster coding and, more importantly, much faster understanding of others' code, which is essential for security.

With the combination of formal verification and artificial intelligence, we have the opportunity to "go back to the future."

Specifically, we can have AI write assembly code and then write a formal proof to verify that the assembly code has the desired properties.

At a minimum, the desired properties can simply be perfect equivalence to an implementation that has been optimized for readability and written in some human-friendly high-level language.

We no longer need a single code object to balance readability and efficiency; instead, we have two independent objects: one (the assembly implementation) optimized solely for efficiency while considering the demands of its specific execution environment; the other (the security statement or high-level language implementation) optimized solely for readability, and then we prove the equivalence between the two through mathematical proof.

Users can (automatically) verify that proof once, and from then on, they only need to run the fast version.

This approach is incredibly powerful, and there is a reason why Yoichi Hirai calls it the "ultimate form of software development."

Formal Verification is Not a Silver Bullet

In the fields of cryptography and computer science, there is a tradition almost as old as the history of formal methods itself: the tradition of criticizing formal methods (or more broadly, the reliance on "proofs").

These writings are filled with practical cases. Let’s start with handwritten proofs from the early simple cryptography era, citing the criticisms of Menezes and Koblitz from 2004:

In 1979, Rabin proposed a cryptographic function that was "provably" secure in some sense, meaning it had a reductionist security property.

The reductionist security statement indicates that anyone who can find the message m from the ciphertext y must also be able to factor n. ... Shortly after Rabin proposed his encryption scheme, Rivest pointed out that ironically, this very feature that grants it additional security would lead to a complete collapse if faced with an attacker known as "chosen ciphertext."

That is, if the attacker can somehow trick Alice into decrypting their chosen ciphertext, then the attacker can follow the same steps that Sam used in the previous paragraph to factor n.

Menezes and Koblitz then provided more examples. The common pattern is that designs aimed at making encryption protocols more "provable" often make them less "natural," making them more likely to fail in ways the designers never even considered.

Now, let’s return to machine-verifiable proofs and code. Here is a 2011 paper that discovered vulnerabilities in a formally verified C compiler: paper:

The second CompCert problem we found manifests in two bugs that lead to the generation of the following code: stwu r1, -44432(r1) where a large PowerPC stack frame is being allocated.

The problem is that the 16-bit displacement field overflowed. CompCert's PPC semantics did not specify a limit on the width of this immediate value; they assumed the assembler would catch out-of-range values.

There is also a 2022 paper:

In CompCert-KVX, commit e2618b31 fixed a bug: the "nand" instruction was printed as "and"; "nand" was only used in the rare ~ (a & b) pattern. This bug was discovered by compiling randomly generated programs.

And today, in 2026, here is how Nadim Kobeissi describes vulnerabilities in formally verified software in Cryspen:

In November 2025, Filippo Valsorda independently reported that libcrux-ml-dsa v0.0.3 produced different public keys and signatures on different platforms given the same deterministic input.

The bug existed in the internal wrapping function _vxarq_u64, which implemented the XAR operation used in the Keccak-f permutation of SHA-3. The fallback mechanism passed incorrect parameters to the shift operation, corrupting the SHA-3 digest on ARM64 platforms without hardware SHA-3 support.

This falls under type I failure: the internal function was marked, but the entire NEON backend did not complete a proof of runtime safety or correctness.

And:

The libcrux-psq library implements a post-quantum pre-shared key protocol. In the decrypt_out method, the AES-GCM 128 decryption path calls .unwrap() on the decryption result instead of propagating errors. A malformed ciphertext can crash the process.

All four of these issues fall into one of the following two categories:

Cases where only part of the code was verified (because verifying the rest was too difficult), resulting in the discovery that the unverified code had more vulnerabilities than the authors imagined (and in more deadly ways).
Cases where the authors forgot to specify key properties that needed to be proven.

Nadim's article includes a classification of failure modes in formal verification; he also provides other types of failure modes (for example, another major case is "the formal specification itself is wrong, or the proof contains false statements quietly accepted by the constructed system").

Finally, we can look at failures of formal verification at the boundary of software and hardware. A common issue here is verifying resistance to side-channel attacks.

Even if you have perfectly secure cryptographic forms to protect your messages, if someone a few meters away can capture fluctuations in electrical signals and extract your private key after hundreds of thousands of encryptions, you are still insecure.

This is an article about "differential power analysis," a well-understood example of such techniques: article.

Differential power analysis is a common type of side-channel attack. Source: Wikipedia

There have always been attempts to prove security against such attackers. However, any such proof requires some mathematical model of the attacker that allows you to prove security against it.

Sometimes a "d-probing model" is used: we assume that the number of locations the attacker can query in the circuit has a known limit. However, some forms of leakage are not captured by this model.

As observed in this article, a common issue is transient leakage: if you can observe a signal that depends not only on the value at a certain location but also on how that value changes, this is often enough to recover the information you need from two values (the old and new values) rather than just one value.

This article provides classifications of other forms of leakage.

For decades, these criticisms of formal verification have helped improve formal verification. Compared to the past, we are now better at guarding against such issues. But even today, it is not perfect.

Looking at the big picture, there is a main thread here. Formal verification is powerful.

But no matter how marketing terms make formal verification sound like it gives you "provable correctness," the so-called "provable correctness" fundamentally does not prove that software (or hardware) is "correct."

By most human understandings, "correct" means something like: "the behavior of things aligns with the user's understanding of the developer's intent."

And "secure" means something like: "the behavior of things does not violate the user's expectations and does not do things detrimental to the user's interests."

In both cases, correctness and security boil down to a comparison between mathematical objects and human intentions or expectations.

Human intentions and expectations are mathematically complex objects in their own right; after all, the human brain is part of the universe, following physical laws that can be simulated if you have enough computational power.

But they are incredibly complex mathematical objects that neither computers nor we ourselves can understand or even read.

For all practical intents and purposes, they are black boxes; we only have any understanding of our intentions and expectations because each of us has years of experience observing our thoughts and inferring the thoughts of others.

And because we cannot stuff raw human intentions into a computer, formal verification cannot prove a comparison with human intentions.

Therefore, "provable correctness" and "provable security" do not actually prove the "correctness" and "security" that we humans understand. Nothing can do that unless we can fully simulate the human brain.

So What Is It Good For?

I tend to view test suites, type systems, and formal verification as different implementations of the same underlying approach to programming language safety (which may also be the only reasonable approach).

They are all about redundantly specifying our intentions in different ways and then automatically checking whether these different specifications are compatible with each other.

Take this Python code as an example:

def fib(n: int) -> int:
    if n < 0:
        raise Exception("Negative values not supported")
    elif 0 <= n < 2:
        return n
    else:
        return fib(n-1) + fib(n-2)

if __name__ == '__main__':
    assert [fib(i) for i in range(10)] == [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
    assert fib(15) == 610

Here, you express your intentions in three different ways:

Explicitly, by implementing the Fibonacci formula in code
Implicitly, through the type system (specifying that inputs, outputs, and intermediate steps in recursion are all integers)
Through the "sample package" method: test cases

Running the file will check the formula against the samples. The type checker can verify whether the types are compatible: adding two integers is a compliant operation and will produce another integer.

Type systems are often a good way to check work in physics: if you are calculating acceleration but end up with an answer in meters/second instead of meters/second², you know you made a mistake.

And test cases are an instance of the "sample package" definition, which is often a more natural way for humans to handle concepts than direct explicit definitions.

The more different ways you can specify your intentions, ideally in ways that require you to think differently about the problem, the more likely you are to actually express what you truly want once all these expressions are proven to be compatible with each other.

Secure programming is about expressing your intentions in multiple different ways and then automatically verifying whether all these expressions are compatible with each other.

Formal verification allows you to extend this approach further. Through formal verification, you can specify your intentions in an almost infinite number of different redundant ways, and the program can only be validated if they are all compatible.

You can specify a highly optimized implementation and a very inefficient but human-readable implementation and verify that they match. You can ask your ten friends to provide a list of mathematical properties they think your program should have and then check if it passes them all.

If it doesn’t pass, find out whether the program is wrong or the mathematical properties are incorrectly specified. And you can use AI to perform all these operations extremely efficiently.

So How Do I Get Started?

Realistically, you won’t be writing proofs yourself. The reason formal methods have never gained popularity is that most people cannot figure out how to write these obscure things. Can you tell me what the following code means?

  /-- Helper: pointwise ≤ at the foldl level with an accumulator. -/
private theorem foldl_acc_le (ds1 ds2 : List Nat) (w : Nat) (a b : Nat) (hAcc : a ≤ b)
    (hLE : Forall₂ (· ≤ ·) ds1 ds2) :
    List.foldl (λ acc d => acc * w + d) a ds1 ≤
    List.foldl (λ acc d => acc * w + d) b ds2 := by
  match ds1, ds2, hLE with
  | [], [], .nil => exact hAcc
  | d1::ds1', d2::ds2', .cons hd htl =>
    simp [List.foldl]
    refine foldl_acc_le ds1' ds2' w (a * w + d1) (b * w + d2) ?_ htl
    exact Nat.add_le_add (Nat.mul_le_mul hAcc (Nat.le_refl _)) hd

(If you are wondering, this is one of many sub-lemmas in the proof of a specific security statement for a variant of SPHINCS signatures.

Specifically, the statement is: unless a hash collision occurs, the signature of a message generated from one hash digest (dig1) will require a higher value at least somewhere on a hash ladder than the signature of any other message, thus containing information that cannot be computed from that other signature.)

You don’t need to manually write code and proofs; you just need to let AI write programs for you (whether directly in Lean or for speed in assembly language) and prove any desired properties in the process.

The advantage of this task is that it is self-validating, so you don’t need to supervise it; you just let the AI run continuously for several hours.

The worst outcome is that it spins its wheels without making progress (or, as my leanstral once did, it replaces the statement it was asked to prove to lighten its workload).

The only thing you need to check in the end is whether the statements it proved meet your requirements.

In the case of the SPHINCS signature variant, this is the final statement:

theorem wots_fullDigits_incomparable
    {dig1 dig2 : List Nat} {w l1 l2 : Nat}
    (hw : 0 < w)
    (hLen1 : dig1.length = l1) (hLen2 : dig2.length = l1)
    (hBound1 : ∀ d ∈ dig1, d < w) (hBound2 : ∀ d ∈ dig2, d < w)
    (hL2suff : l1 * (w - 1) < w ^ l2)
    (hNeq : dig1 ≠ dig2) :
    ¬ Forall₂ (· ≤ ·) (wotsFullDigits dig1 w l1 l2) (wotsFullDigits dig2 w l1 l2) ∧
    ¬ Forall₂ (· ≤ ·) (wotsFullDigits dig2 w l1 l2) (wotsFullDigits dig1 w l1 l2)

This is actually on the edge of being barely readable:

If the numbers generated from one hash digest (dig1) are not equal to those generated from another hash digest (dig2)

Then neither of the following two conditions holds:

For all numbers, the numbers from dig1 <= the numbers from dig2
For all numbers, the numbers from dig2 <= the numbers from dig1

In the "extended numbers" (wotsFullDigits) generated by adding checksums. That is, in the extension of dig1, there will inevitably be places where the numbers are higher, while in other places, the numbers in the extension of dig2 will be higher.

In terms of using large language models to write proofs, I find both Claude and Deepseek 4 Pro capable. Leanstral is a smaller open-source weight model specifically fine-tuned for writing Lean, and it is a promising alternative.

It has 119B parameters, activating 6B per token, and you can run it locally, although it is slower (about 15 tok/sec on my laptop). According to benchmarks, Leanstral outperforms much larger general models:

Based on my current personal experience, it is slightly less effective than Deepseek 4 Pro but still very effective.

Formal verification cannot solve all our problems.

However, if we want the model of internet security to no longer be built on trusting a few powerful organizations, we must turn to trusting code, which includes trusting code even in the face of powerful AI adversaries.

AI-assisted formal verification has taken us a solid step toward achieving this goal.

Like blockchain and ZK-SNARKs, artificial intelligence and formal verification are also highly complementary technologies.

Blockchain grants you open verifiability and censorship resistance at the cost of privacy and scalability, while ZK-SNARKs return privacy and scalability to you (in fact, even more than you had before).

Artificial intelligence grants you the ability to write vast amounts of code at the cost of accuracy, while formal verification returns accuracy to you (in fact, even more than you had before).

By default, AI will spawn a large amount of extremely hasty code, and the number of bugs will increase.

In fact, in some cases, tolerating an increase in bugs is the correct trade-off: if the bugs are minor, then even software with bugs is better than no software at all.

But here, cybersecurity has an optimistic future: software will (continue to) split into "unsafe edge parts" around a "secure core."

Unsafe edge parts will run in sandboxes, granted only the minimum permissions necessary to complete their tasks.

The secure core will manage everything. If the secure core crashes, everything will crash, including your personal data, your money, and so on. But if an unsafe edge part crashes, the secure core can still protect you.

When it comes to the secure core, we cannot allow buggy code to proliferate. We will take radical action to keep the secure core small and even shrink it further.

Instead, we will invest all the extra performance brought by AI into making the secure core more secure, enabling it to bear the extremely high trust burdens we place on it in a highly digital society.

The kernel of an operating system (or at least part of it) will become such a secure core.

Ethereum will be another.

Hopefully, at least for all non-performance-intensive computations, the hardware you use will become a third.

Systems related to the Internet of Things will be the fourth.

At least among these secure cores, the old adage "bugs are inevitable; you can only try to find them before the attacker does" will be disproven, replaced by a more hopeful world where you will achieve true security.

But if you are willing to hand over your assets and data to software that is poorly written and may accidentally swallow them into a black hole, well, you certainly have that freedom too.

-- Price

Popular coins

Latest Crypto News

15:42

Data: The current whale position on the Hyperliquid platform is 4.278 billion USD, with a long-short position ratio of 0.97

According to Coinglass data, the current whale holdings on the Hyperliquid platform amount to $4.278 billion, with long positions at $2.106 billion, accounting for 49.23% of the holdings, and short positions at $2.172 billion, accounting for 50.77%. The profit and loss for long positions is -$4.4909...

HYPE

15:42

Data: A certain whale shorted 375,000 HYPE, currently facing an unrealized loss of 1.989 million USD

According to on-chain analyst Ai Yi (@ai_9684xtpa), a large holder has been shorting 375,000 HYPE since May 15, with an entry average price of $43.298. As the price of HYPE broke through $48, this position currently has an unrealized loss of $1.989 million, with a position size of approximately $18....

15:42

Indonesian central bank governor: The Federal Reserve may not cut interest rates and could raise them next year

According to Jinshi reports, the governor of Indonesia's central bank stated that some central banks have begun to raise interest rates. He pointed out that the Federal Reserve may not cut interest rates this year and may even raise rates next year.

15:42

The euro stablecoin project Qivalis has received support from 37 banks

According to the Financial Times, Amsterdam-based Qivalis has secured support from 37 European banks for its yet-to-be-launched stablecoin, including BNP Paribas, ING Group, and UniCredit. The newly added 25 banks include ING Bank, Banco di San Paolo, and Rabobank.As executives and policymakers incr...

15:42

Ju.com Foundation strategically invests $10 million in SuiXi, leading the "AI + Blockchain" e-commerce sector

Ju.com Foundation announced a formal strategic investment of $10 million in the blockchain value e-commerce platform SuiXi, further enhancing the integration of "AI + blockchain" applications and the layout of Web3 consumer scenarios.This round of funding will focus on upgrading on-chain traceabilit...

Vitalik: What is the key to the next phase of Ethereum?

What is Formal Verification?

When Mathematical Proofs Start Guarding Code

Formal Verification Born for Security

Formal Verification Born for Efficiency

Formal Verification is Not a Silver Bullet

So What Is It Good For?

So How Do I Get Started?

-- Price

You may also like

Recovering cryptocurrency assets is a lucrative business that quietly makes a fortune

Gemini 3.5 is here! Tonight, Google personally eliminates Google

Duan Yongping establishes a position in a cryptocurrency company for the first time: Why Circle?

Interlace: A global leader in Agentic Payment and stablecoin infrastructure platform, building the next generation of digital financial foundation

Morning Report | Musk's xAI launches Skills; Duan Yongping to first build position in Circle in Q1 2026; Polymarket partners with Nasdaq to launch prediction market

Dialogue with Lead Bank Founder Jackie: American Banks Re-embrace Crypto

Vitalik: What we need to do is not to fight against AI, but to create a sanctuary

Morning News | VanEck and Grayscale submitted BNB ETF amendments on the same day; BlackRock discusses investing billions of dollars in SpaceX's IPO; Michael Saylor releases Bitcoin Tracker information again

Crypto ETF Weekly | Last week, the net outflow of Bitcoin spot ETFs in the United States was $995 million; the net outflow of Ethereum spot ETFs in the United States was $255 million

This Week's News Preview | The Federal Reserve Releases the Last FOMC Minutes of the "Powell Era"

Blockchain Capital Partner: Most people's understanding of on-chain economy is narrow

The ambition of "one account trading global assets": How does CoinUp.io break down asset barriers to become an industry dark horse?

How long will it take for the GPU futures market when computing power is commoditized?

Harvard University loses $150 million in cryptocurrency! Has completely liquidated Ethereum and significantly reduced its Bitcoin ETF positions

BNB Chain releases a research report exploring the migration path of BSC to post-quantum cryptography

After the number of developers was halved: Crypto is not dead, it has just handed over talent to AI

"JUST 6th Anniversary x GasFree Super Carnival Month" is here: Enjoy "0" Gas transfer freedom and share a prize pool of 10,000 USDT

Morning News | AEON completes $8 million Pre-Seed round financing led by YZi Labs; Goldman Sachs liquidates XRP and Solana ETF holdings in Q1; Strategy increased its holdings by 24,869 BTC last week

Recovering cryptocurrency assets is a lucrative business that quietly makes a fortune

Gemini 3.5 is here! Tonight, Google personally eliminates Google

Duan Yongping establishes a position in a cryptocurrency company for the first time: Why Circle?

Interlace: A global leader in Agentic Payment and stablecoin infrastructure platform, building the next generation of digital financial foundation

Morning Report | Musk's xAI launches Skills; Duan Yongping to first build position in Circle in Q1 2026; Polymarket partners with Nasdaq to launch prediction market

Dialogue with Lead Bank Founder Jackie: American Banks Re-embrace Crypto

Contents

Popular coins

Latest Crypto News

Data: The current whale position on the Hyperliquid platform is 4.278 billion USD, with a long-short position ratio of 0.97

Data: A certain whale shorted 375,000 HYPE, currently facing an unrealized loss of 1.989 million USD

Indonesian central bank governor: The Federal Reserve may not cut interest rates and could raise them next year

The euro stablecoin project Qivalis has received support from 37 banks

Ju.com Foundation strategically invests $10 million in SuiXi, leading the "AI + Blockchain" e-commerce sector