Scan the QR code to download the mobile APP

Select language and region

In-depth Analysis of Two ZK Vulnerabilities

IntermediateJun 05, 2024

This article provides an in-depth analysis of two potential vulnerabilities in Zero-Knowledge Proof (ZKP) systems: the "Load8 Data Injection Attack" and the "Forgery Return Attack." The article details the technical specifics of these vulnerabilities, how they can be exploited, and the methods for fixing them. Additionally, it discusses the lessons learned from discovering these vulnerabilities during the auditing and formal verification processes of ZK systems and suggests best practices for ensuring the security of ZK systems.

Blockchain Technology

In-depth Analysis of Two ZK Vulnerabilities

In a previous article, we discussed the advanced formal verification of Zero-Knowledge Proof (ZKP) systems: how to verify a ZK instruction. By formally verifying each zkWasm instruction, we can fully ensure the technical security and correctness of the entire zkWasm circuit. In this article, we will focus on the perspective of vulnerability discovery, analyzing specific vulnerabilities identified during the auditing and verification processes, and the lessons learned from them. For a general discussion on advanced formal verification of ZKP blockchains, please refer to the article on advanced formal verification of ZKP blockchains.

Before discussing ZK vulnerabilities, let’s first understand how CertiK performs ZK formal verification. For complex systems like the ZK Virtual Machine (zkVM), the first step in formal verification (FV) is to clearly define what needs to be verified and its properties. This requires a comprehensive review of the ZK system’s design, code implementation, and testing setup. This process overlaps with regular audits but differs in that, after the review, the verification goals and properties must be established. At CertiK, we refer to this as audit-oriented verification. Auditing and verification work are usually integrated. For zkWasm, we conducted both auditing and formal verification simultaneously.

What Is a ZK Vulnerability?

The core feature of Zero-Knowledge Proof (ZKP) systems is that they allow the transfer of a brief encrypted proof of computations executed offline or privately (such as blockchain transactions) to a ZKP verifier. The verifier checks and confirms the proof to ensure that the computation occurred as claimed. In this context, a ZK vulnerability would enable a hacker to submit forged ZK proofs for false transactions and have them accepted by the ZKP verifier.

For a zkVM prover, the ZK proof process involves running a program, generating execution records for each step, and converting these execution records into a set of numerical tables (a process known as “arithmetization”). These numbers must satisfy a set of constraints (the “circuit”), which include the relationships between specific table cells, fixed constants, database lookup constraints between tables, and polynomial equations (or “gates”) that each pair of adjacent table rows must satisfy. On-chain verification can confirm the existence of a table that meets all constraints without revealing the specific numbers within the table.

Arithmetization Table in zkWasm

The accuracy of each constraint is crucial. Any error in a constraint, whether it is too weak or missing, could allow a hacker to submit misleading proof. These tables may appear to represent a valid execution of a smart contract but do not in reality. Compared to traditional VMs, the opacity of zkVM transactions amplifies these vulnerabilities. In non-ZK chains, the details of transaction computations are publicly recorded on the blockchain; however, zkVMs do not store these details on-chain. This lack of transparency makes it difficult to determine the specifics of an attack or even if an attack has occurred.

The ZK circuit that enforces the execution rules of zkVM instructions is extremely complex. For zkWasm, the implementation of its ZK circuit involves over 6,000 lines of Rust code and hundreds of constraints. This complexity often means that there could be multiple vulnerabilities waiting to be discovered.

zkWasm Circuit Architecture

Indeed, through the auditing and formal verification of zkWasm, we have discovered several such vulnerabilities. Below, we will discuss two representative examples and examine the differences between them.

Code Vulnerability: Load8 Data Injection Attack

The first vulnerability involves the Load8 instruction in zkWasm. In zkWasm, heap memory reads are accomplished using a set of LoadN instructions, where N is the size of the data to be loaded. For example, Load64 should read 64-bit data from a zkWasm memory address. Load8 should read 8-bit data (i.e., one byte) from memory and pad it with zeros to create a 64-bit value. Internally, zkWasm represents memory as an array of 64-bit bytes, so it needs to “select” a part of the memory array. This is done using four intermediate variables (u16_cells), which together form the complete 64-bit load value.

The constraints for these LoadN instructions are defined as follows:

This constraint is divided into three cases: Load32, Load16, and Load8. Load64 has no constraints because memory units are exactly 64 bits. For the Load32 case, the code ensures that the high 4 bytes (32 bits) in the memory unit must be zero.

For the Load16 case, the code ensures that the high 6 bytes (48 bits) in the memory unit must be zero.

For the Load8 case, it should require that the high 7 bytes (56 bits) in the memory unit be zero. Unfortunately, this is not the case in the code.

As you can see, only the high 9 to 16 bits are restricted to zero. The other high 48 bits can be any value and still pass as “read from memory.”

Exploiting this vulnerability, a hacker could tamper with a legitimate execution sequence’s ZK proof, causing the Load8 instruction to load these unexpected bytes, resulting in data corruption. Moreover, through careful arrangement of surrounding code and data, it might trigger false executions and transfers, leading to data and asset theft. These forged transactions could pass the checks of zkWasm checkers and be incorrectly recognized as legitimate transactions by the blockchain.

Fixing this vulnerability is actually quite simple.

This vulnerability represents a class of ZK vulnerabilities called “code vulnerabilities” because they originate from the code’s implementation and can be easily fixed through minor local code modifications. As you might agree, these vulnerabilities are also relatively easier for people to identify.

Design Vulnerability: Forged Return Attack

Let’s take a look at another vulnerability, this time concerning the invocation and return of zkWasm. Invocation and return are fundamental VM instructions that allow one running context (e.g., a function) to call another and resume the execution of the calling context after the callee completes its execution. Each invocation expects a return later on. The dynamic data tracked by zkWasm during the lifecycle of invocation and return is known as the “call frame.” Since zkWasm executes instructions sequentially, all call frames can be ordered based on their occurrence during runtime. Below is an example of invocation/return code running on zkWasm.

Users can call the buy_token() function to purchase tokens (possibly by payment or transfer of other valuable items). One of its core steps is to increase the token account balance by 1 by calling the add_token() function. Since the ZK prover itself does not support the call frame data structure, the Execution Table (E-Table) and Jump Table (J-Table) are needed to record and track the complete history of these call frames.

The above figure illustrates the process of buy_token() calling add_token() and returning from add_token() to buy_token(). It can be seen that the token account balance increases by 1. In the Execution Table, each execution step occupies one row, including the current call frame number being executed, the current context function name (for illustration purposes only), the number of the current running instruction within the function, and the current instruction stored in the table (for illustration purposes only). In the Jump Table, each call frame occupies one row, storing the number of its caller frame, the caller function context name (for illustration purposes only), and the next instruction position of the caller frame (so that the frame can return). In both tables, there is a “jops” column that tracks whether the current instruction is a call/return (in the Execution Table) and the total number of call/return instructions for that frame (in the Jump Table).

As expected, each call should have a corresponding return, and each frame should have only one call and one return. As shown in the figure above, the “jops” value for the first frame in the Jump Table is 2, corresponding to the first and third rows in the Execution Table, where the “jops” value is 1. Everything looks normal at the moment.

However, there is an issue here: while one call and one return will increase the “jops” count for the frame to 2, two calls or two returns will also result in a count of 2. Having two calls or two returns per frame may seem absurd, but it’s important to remember that this is exactly what a hacker would try to do by breaking expectations.

You might be feeling excited now, but have we really found the problem?

It turns out that two calls are not a problem because the constraints of the Execution Table and the Jump Table prevent two calls from being encoded into the same row of a frame because each call generates a new frame number, i.e., the current call frame number plus 1.

However, the situation is not so fortunate for two returns: since no new frame is created upon return, a hacker could indeed obtain the Execution Table and Jump Table of a legitimate execution sequence and inject forged returns (and corresponding frames). For example, the previous Execution Table and Jump Table example of buy_token() calling add_token() could be tampered with by a hacker to the following scenario:

The hacker injected two forged returns between the original call and return in the Execution Table and added a new forged frame row in the Jump Table (the original return and subsequent instruction execution steps in the Execution Table need to be incremented by 4). Since the “jops” count for each row in the Jump Table is 2, the constraints are satisfied, and the zkWasm proof checker would accept the “proof” of this forged execution sequence. As seen from the figure, the token account balance increases 3 times instead of 1. Therefore, the hacker could obtain 3 tokens for the price of 1.

There are various ways to address this issue. One obvious approach is to separately track calls and returns and ensure that each frame has exactly one call and one return.

You may have noticed that so far we haven’t shown a single line of code for this vulnerability. The primary reason is that there is no problematic line of code; the code implementation perfectly aligns with the table and constraint designs. The issue lies within the design itself, and so does the fix.

You might think this vulnerability should be obvious, but in reality, it’s not. This is because there is a gap between “two calls or two returns will also result in a ‘jops’ count of 2” and “two returns are actually possible.” The latter requires a detailed and comprehensive analysis of various constraints in the Execution Table and the Jump Table, making it challenging to perform complete informal reasoning.

Comparison of Two Vulnerabilities

The “Load8 Data Injection Vulnerability” and the “Forgery Return Vulnerability” both have the potential to allow hackers to manipulate ZK proofs, create false transactions, deceive proof checkers, and engage in theft or hijacking. However, their nature and the way they were discovered are quite different.

The “Load8 Data Injection Vulnerability” was discovered during the audit of zkWasm. This was no easy task, as we had to review hundreds of constraints in over 6,000 lines of Rust code and dozens of zkWasm instructions. Despite this, the vulnerability was relatively easy to detect and confirm. Since this vulnerability was fixed before formal verification began, it was not encountered during the verification process. If this vulnerability had not been discovered during the audit, we could expect it to be found during the verification of the Load8 instruction.

The “Forgery Return Vulnerability” was discovered during formal verification after the audit. One reason we failed to discover it during the audit is that zkWasm has a mechanism similar to “jops” called “mops,” which tracks memory access instructions corresponding to historical data for each memory unit during zkWasm runtime. The constraints on mops counts are indeed correct because it only tracks one type of memory instruction, memory writes, and each memory unit’s historical data is immutable and written only once (mops count is 1). But even if we had noticed this potential vulnerability during the audit, we would still have been unable to easily confirm or rule out every possible scenario without rigorous formal reasoning on all relevant constraints, as there is actually no line of code that is incorrect.

In summary, these two vulnerabilities belong to the categories of “code vulnerabilities” and “design vulnerabilities,” respectively. Code vulnerabilities are relatively straightforward, easier to discover (faulty code), and easier to reason about and confirm. Design vulnerabilities can be very subtle, more difficult to discover (no “faulty” code), and harder to reason about and confirm.

Best Practices for Discovering ZK Vulnerabilities

Based on our experience auditing and formally verifying zkVM and other ZK and non-ZK chains, here are some suggestions on how to best protect ZK systems:

Checking the code and design

As mentioned earlier, both the code and design of ZK can contain vulnerabilities. Both types of vulnerabilities can potentially compromise the integrity of the ZK system, so they must be addressed before the system is put into operation. One issue with ZK systems, compared to non-ZK systems, is that any attacks are harder to detect and analyze because their computational details are not publicly available or retained on the chain. As a result, people may be aware that a hacker attack has occurred, but they may not know the technical details of how it happened. This makes the cost of any ZK vulnerability very high. Consequently, the value of ensuring the security of ZK systems in advance is also very high.

Conduct audits and formal verification

The two vulnerabilities we discussed here were discovered through auditing and formal verification. Some might assume that formal verification alone obviates the need for auditing since all vulnerabilities would be detected through formal verification. However, our recommendation is to perform both. As explained at the beginning of this article, high-quality formal verification work begins with a thorough review, inspection, and informal reasoning about the code and design, which overlaps with auditing. Additionally, finding and resolving simpler vulnerabilities during auditing can simplify and streamline the formal verification process.

If you’re conducting both an audit and formal verification for a ZK system, the optimal approach is to perform both simultaneously. This enables auditors and formal verification engineers to collaborate efficiently, potentially discovering more vulnerabilities as high-quality auditing inputs are required for formal verification.

If your ZK project has already undergone auditing (kudos) or multiple audits (big kudos), our suggestion is to perform formal verification on the circuit based on the previous audit results. Our experience with auditing and formal verification in projects like zkVM and others, both ZK and non-ZK, has repeatedly shown that formal verification often captures vulnerabilities missed during auditing. Given the nature of ZKP, while ZK systems should offer better blockchain security and scalability than non-ZK solutions, the criticality of their security and correctness is much higher than traditional non-ZK systems. Therefore, the value of high-quality formal verification for ZK systems far exceeds that of non-ZK systems.

Ensure the security of circuits and smart contracts

ZK applications typically consist of two components: circuits and smart contracts. For applications based on zkVM, there are universal zkVM circuits and smart contract applications. For applications not based on zkVM, there are application-specific ZK circuits along with corresponding smart contracts deployed on the L1 chain or on the other end of a bridge.

Our audit and formal verification efforts for zkWasm only involve the zkWasm circuit. However, from the perspective of overall security for ZK applications, it’s crucial to audit and formally verify their smart contracts as well. After all, it would be regrettable if, after investing significant effort into ensuring circuit security, laxity in smart contract scrutiny led to application compromise.

Two types of smart contracts deserve particular attention. The first type directly handles ZK proofs. Though they may not be large in scale, their risk is exceptionally high. The second type comprises medium to large-scale smart contracts running on top of zkVM. We know that these contracts can sometimes be highly complex, and the most valuable ones should undergo auditing and verification, especially since their execution details are not visible on-chain. Fortunately, after years of development, formal verification for smart contracts is now practical and prepared for appropriate high-value targets.

Let’s summarize the impact of formal verification (FV) on ZK systems and their components with the following points.

Statement:

This article is reproduced from [panewslab], the copyright belongs to the original author [CertiK], if you have any objections to the reprint, please contact the Gate Learn team, and the team will handle it as soon as possible according to relevant procedures.
Disclaimer: The views and opinions expressed in this article represent only the author’s personal views and do not constitute any investment advice.
Other language versions of the article are translated by the Gate Learn team and are not mentioned in Gate.io, the translated article may not be reproduced, distributed or plagiarized.

What Is a ZK Vulnerability?

Code Vulnerability: Load8 Data Injection Attack

Design Vulnerability: Forged Return Attack

Comparison of Two Vulnerabilities

Best Practices for Discovering ZK Vulnerabilities

In-depth Analysis of Two ZK Vulnerabilities

IntermediateJun 05, 2024

Blockchain Technology

What Is a ZK Vulnerability?

Code Vulnerability: Load8 Data Injection Attack

Design Vulnerability: Forged Return Attack

Comparison of Two Vulnerabilities

Best Practices for Discovering ZK Vulnerabilities

What Is a ZK Vulnerability?

Arithmetization Table in zkWasm

zkWasm Circuit Architecture

Code Vulnerability: Load8 Data Injection Attack

The constraints for these LoadN instructions are defined as follows:

For the Load16 case, the code ensures that the high 6 bytes (48 bits) in the memory unit must be zero.

For the Load8 case, it should require that the high 7 bytes (56 bits) in the memory unit be zero. Unfortunately, this is not the case in the code.

As you can see, only the high 9 to 16 bits are restricted to zero. The other high 48 bits can be any value and still pass as “read from memory.”

Fixing this vulnerability is actually quite simple.

Design Vulnerability: Forged Return Attack

You might be feeling excited now, but have we really found the problem?

There are various ways to address this issue. One obvious approach is to separately track calls and returns and ensure that each frame has exactly one call and one return.

Comparison of Two Vulnerabilities

Best Practices for Discovering ZK Vulnerabilities

Based on our experience auditing and formally verifying zkVM and other ZK and non-ZK chains, here are some suggestions on how to best protect ZK systems: