# Concretely Efficient Input Transformation Based Zero-Knowledge Argument System for Arbitrary Circuits 

Frank Y.C. Lu<br>YinYao Inc.


#### Abstract

We introduce a new efficient, transparent, interactive zeroknowledge argument system that is based on the new input transformation concept that we will introduce in this paper. The core of this concept is a mechanism that converts input parameters into a format that can be processed directly by the circuit so that the circuit output can be verified through direct computation of the circuit. In the default setting, our protocol only requires the prover to use vector commitment to commit to the square root of the polynomial degree $\left(\sqrt{p_{d}}\right)$ the circuit generates. Our benchmark result shows our approach can significantly improve both prover runtime and verifier runtime performance over state-of-the-art by over one order of magnitude while keeping the communication cost comparable with that of the state-of-the-art. Our approach also allows our protocol to be memory-efficient without forcing it to require a designated verifier. The theoretical memory cost of our protocol is $O(b)$, where $b$ is a parameter set by the user. Lowering the $b$ value will result in better prover runtime performance at the expense of higher communication cost. Our benchmark result shows the prover speed of our protocol is at least comparable to that of state-of-the-art VOLE-based protocols, but with much smaller proof size and the significant advantage of being non-interactive at the same time.


## 1 Introduction

Ever since the discoveries of interactive proofs (IPs) [23] and probabilistically checkable proofs (PCPs) [6 [5] 4] 33 in the late last century, there has been a tremendous amount of research in the area of proof systems. More recently, the rise of blockchain and Web3 has finally triggered real-world interest in zeroknowledge systems.

Due to the expensive computation cost in the setup phase of earlier SNARKs (Succinct Non-Interactive Argument of Knowledge), the industry developed protocols that have the structured reference string (SRS) be constructible in a "universal and updatable" fashion. The first such universal SNARK was in Groth et al. [24, and Maller et al. improved the SRS size from quadratic to linear in Sonic [27]. More recently developed protocols such as PLONK [21, MARLIN [16] are universal fully-succinct SNARK with significantly improved prover
runtime compared to the fully-succinct Sonic. However, many of these universal succinct SNARKs systems require trusted setup, and the prover run-time of these protocols is prohibitively expensive even with the latest improvements such as HyperPlonk [15, usually takes over 100 seconds on a single-threaded CPU for a circuit with over $2^{20}$ constraints.

Other classes of protocols including the Goldwasser, Kalai, and Rothblum (GKR) class such as Hyrax [32, Virgo 37]; MPC-in-the-head class of Kushilevitz, Ostrovsky, and Sahai such as ZKBoo [22] and Ligero/Ligero++ [1] 9] offer efficient prover runtimes that are at least one order of magnitude more efficient than pairing-based SNARKs. However, these protocols are largely ignored by the industry (e.g., the blockchain community) due to their expensive verifier runtime and high communication cost (hundreds of KBs) compared to fully succinct SNARK and STARK 8 protocols.

NIZKs such as SpartanNIZK [30] and later Lakonia 31] seem to offer a much more balanced approach, where they offer efficient prover runtime (6-18 seconds single thread) and competitive communication costs for large circuits while not being layer dependent. However, the downside of these protocols is that their verifier performance is still expensive, usually in the $400+\mathrm{ms}$ range on a singlethreaded CPU.

Another category of protocols emphasizes on memory-efficiency such as garbled circuits [26] [20] [25] and Vector Oblivious Linear Evaluation (VOLE) protocols [11] [29] [13] [12] 36] [33] [7] 35] generally offer better prover performance. However, their verifier runtimes are just as expensive and generally require a designated verifier with very expensive communication cost.

Our aim is to create a new transparent zero-knowledge protocol that offers great flexibility to optimize and the best overall performance while free of significant performance shortcoming in any area. Specifically, we want to keep the communication cost comparable to those of the state-of-the-art and greatly improve both the prover runtime and the verifier runtime over those of the state-of-the-art. Finally, we also want our new system to be memory-efficient without requiring a designated verifier like that of VOLE protocols.

## 2 Summary of Contributions

Our approach is to design a new class of protocols that allows verifiers to validate circuit outputs by directly examining circuit inputs without going through some intermediate translation phase. In our protocol, circuit inputs in the Pedersen commitment form are converted to linear polynomials in $\mathbb{F}_{q}$ so that verifiers can use standard arithmetic operations in $\mathbb{Z}_{q}$ to just "execute" the evaluating circuit to get its output. Our protocol does not require trusted setup and depends only on discrete logarithm assumption.

There were past attempts that somewhat enabled verifiers to "validate" each multiplication gate on its own, such as Cramer and Damgård [17] and more recent designated-verifier (which is a limitation itself) VOLE (LPZK in particular) [19] 35] [18] 34 based protocols. In these older strategies, each multiplica-
tion gate computation is actually not computed but "confirmed" by the verifier using transcripts tied to each multiplication gate. As a result, the communication/verifier costs of these earlier protocols are generally linear in the number of multiplication gates in a circuit.

On the other hand, the input transformation technique introduced by our protocol allows verifiers to use transformed inputs to directly (one operation for each operation, like we do with clear text data) compute the circuit (important), and the verifier computed output is still bound to the challenge $x$. This is a first and brings us three direct benefits; 1) After "computing" the whole circuit using transformed inputs, the verifier can now validate sub-linear sized proof transcripts in sub-linear runtime. 2) Since the whole circuit is linearly/directly computed by the verifier, we can break a large circuit into several smaller subcircuits, minimizing the memory footprint to that of a sub-circuit. 3) Because the circuit is linearly "computed" by both the prover and the verifier, it gives the developer the power to significantly reduce the size of the circuit by combining "other" protocols in the middle and bypassing the "inactive" part of the circuit logic.

In our protocol, we begin by transforming each committed input parameter in $\mathbb{G}$ into its linear polynomial form in $\mathbb{Z}_{q}$. For simple, circuit $a_{1}{ }^{d}+a_{2}{ }^{d}+a_{3}{ }^{d}=$ $r$ takes inputs $a_{1}, a_{2}, a_{3}$ and outputs $r$. In our protocol, inputs $a_{1}, a_{2}, a_{3}$ and output $r$ are committed by the prover using Pedersen commitment. The prover then provides the transformed inputs $a_{1}, a_{2}, a_{3}$ in the linear polynomial form $a_{1}^{\prime}, a_{2}^{\prime}, a_{3}^{\prime} \in \mathbb{Z}_{q}$ s.t. $a_{i}^{\prime}=a_{i}+x \alpha_{i} \in \mathbb{Z}_{q}$ ( $\alpha_{i}$ is its blinding key and $x$ is the challenge generated during runtime). Since the transformed inputs are in $\mathbb{F}_{q}$, the verifier can plug these values directly into the circuit and just "execute" them to get the output $o$ e.g. $a_{1}^{\prime d}+a_{2}^{\prime d}+a_{3}^{\prime d}=o$. The circuit output $o \in \mathbb{Z}_{p}$ is the evaluation at point $x$ of a degree $d$ polynomial s.t. $f(x)=o$. The constant term of this polynomial is the circuit output $r$ and all other $d$ coefficients are its blinding keys. If the prover can prove 1) it knows all coefficients of the output polynomial before the evaluation point is given (e.g., using a polynomial commitment) 2) all input transformations are legit, then we say the proof is legit.

The output polynomial in the example above has a degree of $d$ because the transformed inputs (linear polynomials) are of degree 1. Taking to its $d$ th power will give a polynomial with a degree of $d$. So if the circuit is something like $a_{1}{ }^{3}+a_{2}{ }^{3}+a_{3}{ }^{3}+, \ldots,+a_{t}{ }^{3}=o$, the degree of the output polynomial is 3 regardless of the value of $t$. Throughout our paper, we use the symbol $p_{d}$ (short for "polynomial degree") to denote the degree of the final polynomial that leads to the circuit output. Precisely, it is actually one less than the degree of the output polynomial (e.g. if the degree of the output polynomial is 3 , then $p_{d}=2$ ). $p_{d}$ is different from "multiplication depth" or the 'total number of multiplications in a circuit"". For example, for a circuit $a_{1}{ }^{3} \cdot a_{2}{ }^{5}+a_{3}{ }^{6}=r, p_{d}=7\left(a_{1}{ }^{3} \cdot a_{2}{ }^{5}=\right.$ a polynomial of degree 8 ), which is bigger than the multiplication depth (5) but smaller than the total number of multiplications (12).

High performance Unlike other zero-knowledge protocols depends on polynomial commitment, the result of evaluation point $x$ is computed by the verifier (through direct computation of verified inputs in $\mathbb{Z}_{q}$ ) not sent by the prover, so it has to be accurate. In addition, the constant term and degree 1 term coefficients are also committed by the verifier. This allows us to bypass the expensive polynomial commitment evaluation protocols and only commits to $O(b)$ coefficients, where $b$ is a parameter set by the user that tells the protocol where to "reset" a degree $d$ polynomial back to a linear polynomial (degree 1 ), a technique used to slow the growth of polynomial degree. In our benchmarking, we set $b=\sqrt{p_{d}}$.

Specifically, when processing a high-depth circuit of $2^{20}$ sequential multiplication gates $\left(p_{d}=n\right)$ with 20 inputs on a single CPU thread, the performance of our protocol is: 0.7 seconds for the prover runtime cost; 8 milliseconds for the verifier runtime cost; and 39 kilobytes for the communication cost. To the best of our knowledge, our protocol offers the best prover/verifier runtime performance in the literature (transparent/non-interactive/high-depth protocols) by a large margin.

On the memory side, the theoretical memory cost of our protocol is $O(b)$. This makes our protocol extremely attractive because VOLE-based memoryefficient protocols generally require one round of interaction and are extremely expensive in terms of verifier runtime cost and communication cost.

Same Format for Circuit Inputs and Outputs Having both inputs and output(s) in the same format (Pedersen Commitment) allows the output(s) of one circuit be directly reused as inputs to other circuit. This is a really useful feature in practice when verifying data in a publicly accessible/verifiable data store (not limited to blockchain). e.g. allows many participants to continuously manage/update a datastore as long as they can prove these updates were correctly computed from existing data. While other zero-knowledge protocols may be able to support such feature in theory by mapping witnesses to some publicly accessible committed/encrypted data, it does not come naturally and will require additional cost that is not accounted for.

Zero-Knowledgeness Under our protocol, both inputs and outputs are in the same formats: Pedersen commitment in $\mathbb{G}$ (standard format) and linear polynomial in $\mathbb{F}_{q}$ (transformed format). Since data are perfectly hidden under both formats, results of direct computation are automatically perfectly hidden. Therefore, our protocol is zero-knowledge as long as we can ensure transcripts used for transformations between two formats are zero-knowledge.

We introduce our protocol in an interactive setting where all verifier challenges are random field elements. In practice, we assume the Fiat-Shamir heuristic is applied to our protocol to obtain a non-interactive zero-knowledge argument in the random oracle model.

## 3 Preliminaries

### 3.1 Assumption

Definition 1. (Discrete Logarithmic Relation) For all PPT adversaries $\mathcal{A}$ and for all $n \geq 2$ there exists a negligible function $\boldsymbol{n e g l}(\lambda)$ s.t.

$$
\operatorname{Pr}\left[\left.\begin{array}{c}
\mathbb{G}=\operatorname{Setup}\left(1^{\lambda}\right), g_{0}, . ., g_{n-1} \stackrel{\mathbb{S}}{\leftarrow} \mathbb{G} \mid c \\
a_{0}, \ldots, a_{n-1} \neq 0 \\
\leftarrow \mathcal{A}\left(\mathbb{G}, g_{0}, \ldots, g_{n-1}\right)
\end{array} \right\rvert\, \wedge \prod_{i=0}^{n-1} g_{i}^{a_{i}}=1\right] \leq \boldsymbol{n e g l}(\lambda)
$$

The Discrete Logarithmic Relation assumption states that an adversary can't find a non-trivial relation between the randomly chosen group elements $g_{0}, \ldots, g_{n-1} \in$ $\mathbb{G}^{n}$, and that $\prod_{i=0}^{n-1} g_{i}^{a_{i}}=1$ is a non-trivial discrete log relation among $g_{0}, \ldots, g_{n-1}$. Please note the generators we use in this paper are $g, h, u \in \mathbb{G}$.

### 3.2 Zero-Knowledge Argument of Knowledge

Interactive arguments are interactive proofs in which security holds only against computationally bounded provers. In an interactive argument of knowledge for a relation $\mathcal{R}$, a prover convinces a verifier that it knows a witness $w$ for a statement $x$ s.t. $(x, w) \in \mathcal{R}$ without revealing the witness itself to the verifier.

Let $(\mathcal{P}, \mathcal{V})$ denote a pair of PPT interactive algorithms, and Setup denotes a non-interactive setup algorithm that outputs public parameters $p p$ given a security parameter $\lambda$. Let $\langle\mathcal{P}(p p, x, w), \mathcal{V}(p p, x)\rangle$ denote the output of $\mathcal{V}$ on input $x$ after its interaction with $\mathcal{P}$, who has knowledge of witness $w$. The triple $(\operatorname{Setup}, \mathcal{P}, \mathcal{V})$ is called an argument for relation $\mathcal{R}$ if for all non-uniform PPT adversaries $\mathcal{A}$ it satisfies completeness, soundness, and zero-knowledge definitions defined below:

Definition 2. (Perfect Completeness) The triple (Setup, $\mathcal{P}, \mathcal{V})$ satisfies perfect completeness if for all PPT $\mathcal{A}$ :

$$
\operatorname{Pr}\left[\begin{array}{c|c}
(p p, x, w) \notin \mathcal{R} \text { or } & p p \leftarrow \boldsymbol{\operatorname { S e t u p }}\left(1^{\lambda}\right) \\
\langle\mathcal{P}(p p, x, w), \mathcal{V}(p p, x)\rangle=1 & (x, w) \leftarrow \mathcal{A}(p p)
\end{array}\right]=1
$$

The soundness notion we consider in this work is computational witness-extended emulation.

Definition 3. (Computational Witness-Extended Emulation or CWEE) Given a public-coin interactive argument tuple (Setup, $\mathcal{P}, \mathcal{V})$ and arbitrary prover algorithm $\mathcal{P}^{*}$, let Recorder ( $\mathcal{P}^{*}, p p, x, s$ ) denote the message transcript between $\mathcal{P}^{*}$ and $\mathcal{V}$ on shared input $x$, initial prover state $s$, and $p p$ generated by Setup. Furthermore, let $\mathcal{E} \operatorname{Recorder}\left(\mathcal{P}^{*}, p p, x, s\right)$ denote a machine $\mathcal{E}$ with a transcript oracle for this interaction that can rewind to any round and run again with fresh
verifier randomness. The tuple (Setup, $\mathcal{P}, \mathcal{V}$ ) has CWEE if for every deterministic polynomial time $\mathcal{P}^{*}$ there exists an expected polynomial time emulator $\mathcal{E}$ s.t. for all non-uniform polynomial time adversaries $\mathcal{A}$ the following holds:

$$
\left.\left.\begin{array}{c|c}
\mid \operatorname{Pr}[\mathcal{A}(\operatorname{tr})=1 & \begin{array}{c}
p p \leftarrow \operatorname{Setup}\left(1^{\lambda}\right) \\
(x, s) \leftarrow \mathcal{A}(p p)
\end{array} \\
\operatorname{tr} \leftarrow \operatorname{Recorder}\left(\mathcal{P}^{*}, p p, x, s\right)
\end{array}\right]-\quad \begin{array}{c}
\operatorname{Rp\leftarrow \operatorname {Setup}(1^{\lambda })} \\
\operatorname{Pr}\left[\left.\begin{array}{c}
\mathcal{A}(\operatorname{tr})=1 \wedge \\
\operatorname{tr} \operatorname{accepting} \\
\Longrightarrow(x, w) \in \mathcal{R}
\end{array} \right\rvert\,(\operatorname{tr}, w) \leftarrow \mathcal{E}^{\text {Recorder }\left(\mathcal{P}^{*}, p p, x, s\right)}(p p, x)\right.
\end{array}\right] \leq \boldsymbol{n e g l}(\lambda) .
$$

Informally, if an adversary can produce an argument that satisfies the verifier with some probability, then there exists an emulator producing an identically distributed argument with the same probability, as well as a witness. The zeroknowledge property requires that the verifier doesn't learn anything about the witness from its interaction with an honest prover.

Definition 4. (Perfect Special Honest Verifier Zero Knowledge for Interactive Arguments) An interactive proof is (Setup $\mathcal{P}, \mathcal{V})$ is a perfect special honest verifier zero knowledge (PHVZK) argument of knowledge for $\mathcal{R}$ if there exists a probabilistic polynomial time simulator $\mathcal{S}$ such that all interactive adversaries $\mathcal{A}$ have the following property for every $(x, w, \sigma) \leftarrow \mathcal{A}(p p) \wedge(p p, x, w) \in \mathcal{R}$, where $\sigma$ stands for verifier's public coin randomness for challenges

$$
\begin{array}{r}
\operatorname{Pr}\left[\begin{array}{c|c}
\mathcal{A}(\operatorname{tr})=1 & \left.\begin{array}{c}
p p \leftarrow \operatorname{Setup}\left(1^{\lambda}\right), \\
\operatorname{tr} \leftarrow\langle\mathcal{P}(p p, x, w), \mathcal{V}(p p, x)\rangle
\end{array}\right]= \\
\operatorname{Pr}\left[\begin{array}{c|c}
\mathcal{A}(\operatorname{tr})=1 & p p \leftarrow \operatorname{Setup}\left(1^{\lambda}\right), \\
\operatorname{tr} \leftarrow \mathcal{S}(p p, x, \sigma)
\end{array}\right]
\end{array} .\right.
\end{array}
$$

Above property states that the adversary chooses a distribution over statements $x$ and witnesses $w$ but is not able to distinguish between the simulated transcripts and the honestly generated transcripts for a valid statement/witnesses pair, and that the simulator has access to the randomness used by the verifier.

Definition 5. (Public Coin) All messages sent from $\mathcal{V}$ to $\mathcal{P}$ are chosen uniformly at random and independently of $\mathcal{P}$ 's messages.

### 3.3 Zero Knowledge Proof of Discrete Logarithm

For a prover to prove it has the knowledge of a discrete logarithmic $\kappa$ of some group element $s=g^{\kappa} \in \mathbb{G}$. We define the relation for this protocol as $\mathcal{R}_{P o D}=$ $\left\{(h, s ; \kappa): s=g^{\kappa}\right\}$. We also define two functions (ProveDL, Verify $D L$ ) for provers and verifiers to create and verify proof transcripts:

- Prove $\boldsymbol{D L}(g, \kappa) \rightarrow t r_{\kappa}$ generates the proof transcript $t r_{\kappa}$, where $\kappa$ is the witness.
- Verify $\boldsymbol{D L}\left(g, s, t r_{\kappa}\right) \rightarrow b \in\{0,1\}$ takes a proof transcript $t r_{\kappa}$ and a pair of group elements with discrete $\log$ relation $\left(g, s \in \mathbb{G} \wedge s=h^{\kappa}\right)$, and outputs true if the knowledge of the relation is verified, false otherwise.

In this paper, we assume the underlying implementation of the proof of discrete logarithm protocol is Schnorr's protocol [28]. We know for a fact that Schnorr's protocol has perfect completeness, special honest verifier zero knowledge, and computational witness-extended emulation.

### 3.4 Notations

Let $\mathbb{G}$ denote any type of secure cyclic group of prime order $p$, and let $\mathbb{Z}_{p}$ denote an integer field modulo $p$. Group elements other than generators are denoted by capital letters. e.g., $C=u_{1}^{a_{1}} u_{2}^{a_{2}} \ldots u_{n}^{a_{n}} \in \mathbb{G}$ is a commitment committed to a vector $\vec{a}$ denoted by a capital letter, and $B \in \mathbb{G}$ is a random group element also denoted by a capital letter. For generators used as base points to compute other group elements in our protocol, such as $\vec{g}, h \in \mathbb{G}$, we use lower case letters to denote them. Greek letters are used to label hidden key values. e.g., $v$ is the blinding key for Pedersen commitment $P$ on generator $h \in \mathbb{G}$ s.t. $P=g^{a} h^{v}$. Finally, we use standard vector notation $\vec{v}$ to denote vectors. i.e., $\vec{a} \in \mathbb{Z}_{p}^{n}$ is a list of $n$ values $a_{i}$ for $i=\{1,2, \ldots, n\}$ in $\mathbb{Z}_{p}$.

We write $\mathcal{R}=\{($ Public Inputs $;$ Witnesses $):$ Relation $\}$ to denote the relation $\mathcal{R}$ using the specified public inputs and witnesses.

## 4 Protocol for Arbitrary Circuits

We first define the relation for the base version of our protocol. For $l$ input parameters, let $\mathcal{C}_{\mathbb{F}}$ represent the set of arbitrary arithmetic circuits in $\mathbb{F}$, there exists a zero knowledge argument for the relation:

$$
\begin{align*}
& \left\{\left(g, h, u, \vec{P}, R \in \mathbb{G}, E \in \mathcal{C}_{\mathbb{F}} ; \vec{a}, \vec{v}, r, \phi \in \mathbb{Z}_{p}\right): E(\vec{a})=r\right.  \tag{1}\\
& \left.\wedge P_{i}=g^{a_{i}} h^{v_{i}} \wedge R=g^{r} h^{\phi}\right\}
\end{align*}
$$

$g, h, u$ are initial public parameters $p p$ generated during setup. The above relation states that each input parameter to a circuit is represented by a commitment $P_{i}$ in $\mathbb{G}$, which hides each input value $a_{i}$ with a blinding key $v_{i} . r$ is the output of circuit $E$ computed from inputs $\vec{a}$, which is also a committed value $R \in \mathbb{G}$ with blinding key $\phi$.

The main idea behind the "input transformation" concept is the process of transforming committed inputs in $\mathbb{G}$ to linear polynomials in $\mathbb{F}$, where the verifier can perform addition and multiplication operations "as is". For an input commitment $P_{i}$ s.t. $P=g^{a_{i}} h^{v_{i}} \in \mathbb{G}$ where $a_{i}$ is the input value and $v_{i}$ is its
blinding key, we create a corresponding value in the linear polynomial format $a_{i}{ }^{\prime} \in \mathbb{Z}_{q}$ :

$$
\begin{equation*}
a_{i}{ }^{\prime}=a_{i}+x \cdot \alpha_{i} \in \mathbb{Z}_{q} \tag{2}
\end{equation*}
$$

$x$ is the challenge provided by the verifier during runtime, and the blinding key of each input is replaced by a random $\alpha_{i}$ s.t. $\alpha_{i} \neq v_{i}$. Likewise, the circuit output commitment $R=g^{r} h^{\phi} \in \mathbb{G}$ also has a matching linear polynomial in $\mathbb{Z}_{q}$ with blinding key $\epsilon$.

$$
\begin{equation*}
r^{\prime}=r+x \cdot \epsilon \in \mathbb{Z}_{q} \tag{3}
\end{equation*}
$$

Where $r^{\prime}$ is "directly computed" from a list of $a_{i}^{\prime}(i=\{1, \ldots, l\})$ by the verifier. Since inputs represented by linear polynomials are in $\mathbb{Z}_{q}$, verifiers can perform arithmetic operations on them just as they do on decrypted numbers. The output value of a circuit evaluation is now a polynomial with $p_{d}+1$ degrees evaluated at point $x$. The constant term of the output polynomial is the circuit output $r$, and the coefficient of the degree one term is the blinding key $\epsilon$ of the circuit output.

In the next two sub-sections, we explain our protocol in two steps: In the first step, we introduce a sub-protocol (Protocol InputMapping) that allows the prover to prove each input in $\mathbb{G}$ is correctly transformed to that in $\mathbb{Z}_{q}$; In the second step, we introduce the full protocol (Protocol AriCircuit) that uses the aforementioned sub-protocol to validate transformed inputs and prove the circuit output is correctly computed from circuit inputs as relation 1 states.

Note that we use two prime group orders $p, q$ in our protocol. This is because all transformed inputs are in field $\mathbb{Z}_{q}$ and their commitments are in group $\mathbb{G}$ of order $p$. In the default setting, $q$ is a 61-bit number, which is significantly smaller than 253 -bit $p$ (curve 25519). Therefore, we must ensure operations in $\mathbb{G}$ do not make committed values (including blinding keys) overflow $p$, because that would distort the committed values once we convert them back to linear polynomials in $\mathbb{Z}_{q}$.

### 4.1 The Sub-Protocol for Linear Polynomial to Pedersen Commitment Mapping Validation

A sub-protocol that validates committed inputs in $\mathbb{G}$ is correctly mapped to transformed inputs in $\mathbb{Z}_{q}$. The relation we try to prove for this sub-protocol is:

$$
\begin{equation*}
\left\{\left(g, h, \vec{P} \in \mathbb{G}, \vec{a}^{\prime} \in \mathbb{Z}_{q} ; \vec{a}, \vec{\alpha} \in \mathbb{Z}_{q}, \vec{v} \in \mathbb{Z}_{p}\right): P_{i}=g^{a_{i}} h^{v_{i}} \wedge a_{i}^{\prime}=a_{i}+X \alpha_{i}\right\} \tag{4}
\end{equation*}
$$

$X$ is the challenge generated during runtime, so $\vec{a}^{\prime}$ is only available during runtime. The relation above says that for any commitment $P_{i}$ of value $a_{i}$ and blinding key $v_{i}$, the prover will provide $a_{i}^{\prime}$ during runtime that s.t. $a_{i}^{\prime}=a_{i}+x \alpha_{i}$ for some random public coin challenge $x$. Here we also limit $\vec{a} \in \mathbb{Z}_{q}$ instead of $\mathbb{Z}_{p}$ in relation 1 . This is justified if $q$ is large enough (e.g. over 32-bits).

An important requirement for input transformation is that we want to transform the hidden value in Pedersen commitment to a prime field that is friendly
to NTT. When multiplying two polynomials of degree $p_{d}$, the trivial approach to compute the output polynomial's coefficients would require a runtime cost of $O\left(p_{d}{ }^{2}\right)$, whereas the NTT based approach would reduce that to $O\left(p_{d} \log p_{d}\right)$. NTT requires a prime modulo $q$ s.t. $q=r \cdot 2^{k}+1$ to be the prime order of the group, where $r$ and $k$ are arbitrary constants, so we need to pick a prime $q$ that is NTT friendly. Also note that the prime $q$ is expected to be smaller than $p$ because the smaller the $q$ value in bits, the lower the communication cost.

Setup Phase Before the random challenge $x$ is available, the prover commits to two sets of group elements. The first element set are blinding elements used to hide data from other transcripts, and the second element set are commitments to blinding key $\vec{\alpha}$, the new blinding keys of $\vec{a}^{\prime}$ in $\mathbb{Z}_{q}$.

On the blinding element, the prover first randomly generates blinding keys $\vec{\omega} \in \mathbb{Z}_{p}^{l}$ and commits them using generator $u$ and the blinding key of input commitments $\vec{P}$.

$$
\begin{equation*}
S_{i}=g^{\omega_{i} \cdot q} u^{v_{i}} \in \mathbb{G} \quad i=\{1, \ldots, l\} \tag{5}
\end{equation*}
$$

Instead of committing the new blinding keys $\vec{\alpha}$, the prover commits to the difference between each $v_{i}$ and $\alpha_{i}$. This will prevent the dishonest prover from adjusting the value of $\vec{\alpha}$ once $x$ is known, and will also be used to link the blinding keys of each $a_{i}^{\prime}$ and $P_{i}$.

$$
\begin{equation*}
T_{i}=g^{v_{i}-\alpha_{i}} \in \mathbb{G} \quad i=\{1, \ldots, l\} \tag{6}
\end{equation*}
$$

The setup phase of the protocol is detailed below, it is called before the random challenge $x$ is generated.

$$
\begin{array}{lr}
\text { Input }:\left(; \vec{a}, \vec{\alpha}, \vec{\omega} \in \mathbb{Z}_{q}, \vec{v} \in \mathbb{Z}_{p}\right) & \\
\qquad \begin{array}{lr}
\mathcal{P}^{\prime} \text { s input }:(; \vec{a}, \vec{\alpha}, \vec{\omega}, \vec{v}) ; & \\
\mathcal{P} \text { compute }: & i=\{1, \ldots, l\} \\
S_{i}=g^{\omega_{i} \cdot q} u^{v_{i}} \in \mathbb{G} & i=\{1, \ldots, l\} \\
T_{i}=g^{v_{i}-\alpha_{i}} \in \mathbb{G} & \\
\mathcal{P} \rightarrow \mathcal{V}: \vec{S}, \vec{T} &
\end{array}
\end{array}
$$

Protocol InputMapping - Setup

Once the setup phase completes, the prover then sends $S_{i}, T_{i}$ for $i=\{1, . ., l\}$ to the verifier.

Validation Phase After the random challenge $x$ is generated, the prover computes $\vec{a}^{\prime}$ and sends them to the verifier. Next, the protocol checks the mapping between transformed inputs in $\mathbb{Z}_{q}$ to those in group $\mathbb{G}$.

The prover and then computes $e_{i}$ for each input $a_{i}$ s.t.:

$$
\begin{equation*}
e_{i}=\left(\left(x \alpha_{i} \bmod q\right)-x \alpha_{i}\right) \cdot x+\omega_{i} \cdot q \in \mathbb{Z}_{p q} \quad i=\{1, \ldots, l\} \tag{7}
\end{equation*}
$$

We label $e_{i} \in \mathbb{Z}_{p q}$ because the upper bound of $e_{i}$ is $p \cdot q$. This is because the size of the first part of $e_{i}\left(\left(\left(x \alpha_{i} \bmod q\right)-x \alpha_{i}\right) \cdot x\right)$ is very small (around $3|q|$ or $3 * 61=183$ bits) compared to the blinding element $w q(|p|+|q|>313$ bits $)$, so the chance of $e_{i}>p q$ is negligible, and the prover can just re-generate another $\omega_{i}$ if needed.
$e_{i}$ doesn't break the zero-knowledge requirement since it does not leak any information to the verifier. This is because the first part of $e_{i}:\left(\left(x \alpha_{i} \bmod q\right)-\right.$ $\left.x \alpha_{i}\right) \cdot x$ is a multiple of $q$, and is equivalent to $(s \cdot q) \cdot x$ for some $s \in \mathbb{Z}_{q}$. This implies $e_{i}=(s \cdot x+\omega) \cdot q$ for some randomly chosen $\omega$. Since $\omega \in \mathbb{Z}_{p}$ is significantly larger than $s \cdot x(|s \cdot x|=2|q|)$, it can perfectly hide $s \cdot x$. In our benchmarking, we use curve 25519 in our implementation, where $p$ is a $2^{253}$ bit number. For $|q|=2^{61}$ and $|s \cdot x|=2^{122}$, there is only a negligible chance $\left(\frac{q^{2}}{p} \approx 2^{-131}\right)$ that $\omega \in \mathbb{Z}_{p}$ does not perfectly hide $s$ (even if such case happens, the prover can just randomly choose another $\omega$ ).

The reason for creating $e_{i} \in \mathbb{Z}_{p q}$ is that we want the verifier to subtract out the blinding element $\left(x \alpha_{i} \bmod q\right)$ from $a_{i}^{\prime}\left(\right.$ e.g., $a_{i}^{\prime} \cdot x-e_{i}=\left(a_{i}+x \alpha_{i}\right) \cdot x-\omega_{i} q=$ $\left.x\left(a_{i}+x \alpha_{i}\right)-\omega_{i} q\right)$, assuming there is no overflow. The verifier can take out the remaining blinding element $\omega_{i} q$ and replace $x \alpha_{i}$ with Pedersen inputs' blinding key $x v_{i}$ using the previously committed values $S_{i}, T_{i}$.

$$
\begin{equation*}
g^{a_{i}^{\prime} \cdot x-e_{i}} \cdot T_{i}^{x^{2}} \cdot S_{i}=\left(g^{x}\right)^{a_{i}+x v_{i}} u^{v_{i}} \in \mathbb{G} \quad i=\{1, \ldots, l\} \tag{8}
\end{equation*}
$$

With $\left(g^{x}\right)^{a_{i}+x v_{i}}$ available, the verifier can trivially divide each $P_{i}$ and take their sum with powers of $k$ to get $P K_{v_{t}}$.

$$
\begin{equation*}
P K_{v_{t}}=\prod_{i=1}^{l}\left(\frac{P_{i}^{x}}{g^{a_{i}^{\prime} \cdot x-e_{i}} \cdot T_{i}^{x^{2}} \cdot S_{i}}\right)^{k^{i}} \in \mathbb{G} \tag{9}
\end{equation*}
$$

$P K_{v_{t}}=\left(h^{x} /\left(g^{x^{2}} u\right)\right)^{v_{t}}$. The verifier can confirm the correctness of the transformation if the prover can prove the knowledge of $v_{t}$ on generator $h^{x} /\left(g^{x^{2}} u\right) \in \mathbb{G}$.

Finally, the verifier needs to make sure $e_{i}$ doesn't alter the value of $a_{i}$. This can be done by taking the modulus $q$ of $e_{i}$ and checking if it returns 0 . This is trivial to understand since $a_{i}^{\prime}$ is in $\mathbb{Z}_{q}$. If $e_{i}$ is a multiple of $q$ then it is obvious that it cannot alter the value of $a_{i}$.

$$
\begin{equation*}
\text { if }\left(e_{i} \bmod q\right) \stackrel{?}{=} 0, \text { then continue } \tag{10}
\end{equation*}
$$

This test also implies the transformation process explained in this section is sound since the soundness of equation 9 is trivial to prove except for a negligible probability.

For example, if $a_{i}^{\prime *}=a_{i}^{*}+x \alpha_{i}=a_{i}+\delta+x \alpha_{i}$. Knowing that $e_{i}$ must be a multiple of $q$ for equation 10 to be true, we have $a_{i}^{*} \cdot x-e_{i}=\left(a_{i}+x \alpha_{i}\right) \cdot x-\omega_{i} q=$
$x\left(a_{i}+x \alpha_{i}\right)+x \delta$. In order for equality 8 to be true, the left side of the equality 8 must offset $x \delta$ using committed values $T_{i}$ and $S_{i}$, s.t. $x \delta$ will be removed after applying challenge $x$ to these elements (i.e., $T_{i}^{x^{2}} \cdot S_{i}$, note exponents are different powers of $x$ ), which only happens for a negligible chance of $1 / q$ when the dishonest prover successfully guessed $x$ correctly.

We have so far skipped the overflow problem. If $a_{i}+\left(x \alpha_{i} \bmod q\right)>q$, then we will have an overflow problem in equation 89 when computing $a_{i}^{\prime} \cdot x-e_{i}$. To get around this, the prover simply needs to check if $a_{i}+\left(x \alpha_{i}\right) \bmod q$ overflows $q$, and subtract $q \cdot x$ from $e_{i}$ if that's the case.

$$
\begin{equation*}
\text { if } a_{i}+\left(x \alpha_{i} \bmod q\right) \geq q, \text { then } e_{i}=e_{i}-q \cdot x \quad i=\{1, \ldots, l\} \tag{11}
\end{equation*}
$$

This adjustment does not break the zero-knowledgeness of $e_{i}$ since $x \in \mathbb{Z}_{q}$ is significantly smaller than $e_{i}$ 's blinding key $\omega_{i} \in \mathbb{Z}_{p}\left(\frac{q}{p}=2^{-192}\right)$, so subtracting $x \cdot q$ from $e_{i}$ 's blinding term $\omega_{i} \cdot q$ is perfectly unnoticeable except for a negligible chance of $2^{-192}$.

The validation part of the input-mapping sub-protocol is defined as follows:

$$
\begin{aligned}
& \text { Input }:\left(\vec{P}, \vec{S}, \vec{T} \in \mathbb{G}, \vec{a}^{\prime} \in \mathbb{Z}_{q} ; \vec{a}, \vec{\alpha}, \vec{\omega} \in \mathbb{Z}_{q}, \vec{v} \in \mathbb{Z}_{p}\right) \\
& \mathcal{P}^{\prime} \text { s input }:(\vec{P}, \vec{S}, \vec{T} ; \vec{a}, \vec{\alpha}, \vec{\omega}, \vec{v}) ; \mathcal{V}^{\prime} \text { s input }:(\vec{P}, \vec{S}, \vec{T}) \\
& \mathcal{P} \text { compute : } \\
& e_{i}=\left(\left(x \alpha_{i} \bmod q\right)-x \alpha_{i}\right) x+\omega_{i} q ; \quad i=\{1, \ldots, l\} \\
& \text { if } a_{i}+\left(x \alpha_{i} \bmod q\right)>q \text {, } \\
& \text { then } e_{i}=e_{i}-q \cdot x \quad i=\{1, \ldots, l\} \\
& \mathcal{P} \rightarrow \mathcal{V}: \vec{e}, \vec{a}^{\prime} \\
& \mathcal{V} \rightarrow \mathcal{P}: k \stackrel{\$}{\leftarrow} \mathbb{Z}_{p} \\
& \mathcal{P} \text { compute: } \\
& v_{t}=\sum_{i=1}^{l} v_{i} k^{i} \in \mathbb{Z}_{p} \\
& t r_{v_{t}}=\operatorname{Prove} D L\left(\left(h^{x} /\left(g^{x} u\right)\right), v_{t}\right) \\
& \mathcal{P} \rightarrow \mathcal{V}: \operatorname{tr}_{v_{t}} \\
& \mathcal{V} \text { verify inputs : } \\
& \text { if }\left(e_{i} \bmod q\right) \stackrel{?}{=} 0, \text { then continue; } \quad i=\{1, \ldots, l\} \\
& \text { else reject } \\
& P K_{v_{t}}=\prod_{i=1}^{l}\left(\frac{P_{i}^{x}}{g^{a_{i}^{\prime} \cdot x-e_{i}} \cdot T_{i}^{x^{2}} \cdot S_{i}}\right)^{k^{i}} \in \mathbb{G} \\
& \text { if Verify } D L\left(\left(h^{x} /\left(g^{x^{2}} u\right)\right), P K_{v_{t}}, \operatorname{tr}_{v_{t}}\right) \text {, then accept } \\
& \text { else reject }
\end{aligned}
$$

Protocol for InputMapping - Verify

Theorem 1. (The Input-Mapping Protocol). The proof system presented in this section has perfect completeness, PHVZK, and CWEE.

Proof. The perfect completeness of protocol InputMapping Validation is trivial to observe.

To prove PHVZK for relation 4 , we define a simulator $\mathcal{S}_{\text {input }}$. To start, simulator $\mathcal{S}_{\text {input }}$ randomly generates $\vec{S}, \vec{T}$ and sends them to the verifier. After receiving challenge $x$ from the verifier, the simulator first generates a set of linear polynomials $\vec{a}^{\prime}$, and then simulates the proof transcripts proving the mapping between committed values $\vec{P}$ and $\vec{a}^{\prime}$ it generated.

The simulator $\mathcal{S}_{\text {input }}$ randomly generates and sends $\vec{e}$ according to the protocol specification ( $e_{i}$ is generated by first randomly generating a value $v_{i} \in \mathbb{Z}_{p}$ and multiplying it by $q$ s.t. $e_{i}=v_{i} \cdot q$ ) and sends them to the verifier. When challenge $k$ is received, the simulator $\mathcal{S}_{\text {input }}$ calls simulator $\mathcal{S}_{\text {dlog }}$ to generate transcript $t r_{v_{t}}$ and send it to the verifier.

The verifier then follows the protocol to compute $P K_{v_{t}}$ using transcripts $\vec{S}, \vec{T}, \vec{a}^{\prime}, \vec{e}$ and calls the Verify $D L$ function to test it against the input transcript $\operatorname{tr}_{v_{t}}$, We already know for a fact that there exists a simulator $\mathcal{S}_{d l o g}$ that can simulate witnesses for any discrete-log relation, and that simulators $\mathcal{S}_{\text {input }}$ and $\mathcal{S}_{\text {dlog }}$ choose all proof elements and challenges according to the randomness supplied by the adversary from their respective domains or compute them directly as described in the protocol. Since all elements in proof transcripts are either independently randomly distributed or their relationship is fully defined by the verification equations, we can conclude that protocol InputMapping is PHVZK.

To prove CWEE, we construct an extractor $\mathcal{X}$ that also uses extractor $\mathcal{X}_{\text {dlog }}$ to extract witnesses from proof of knowledge transcripts (which we know exist for a fact). To start, the extractor $\mathcal{X}$ interacts with the prover and receives $\vec{S}, \vec{T}$ from the prover. The extractor $\mathcal{X}$ then generates a challenge $x_{1}$ and forwards it to the prover. After receiving $\vec{e}_{1}, \vec{a}_{1}^{\prime}$, the extractor rewinds and repeats this step with another challenge $x_{2}$ to retrieve $\vec{e}_{2}, \vec{a}_{2}^{\prime}$.

After receiving transcripts $\vec{S}, \vec{T}$ and transformed inputs $\vec{a}^{\prime}$ from the prover, the extractor generates $k_{1}$ and then follows the protocol to get $t r_{v_{t 1}}$ (from the prover), $P K_{v_{t 1}}$. The extractor $\mathcal{X}$ then calls the extractor $\mathcal{X}_{d l o g}$ to retrieve $v_{t 1}$ from generator $h^{x} /\left(g^{x^{2}} u\right)$. The extractor then rewinds and repeats this step $l$ times to retrieve $v_{t 2}, \ldots, v_{t l+1}$. Through interpolation, the extractor retrieves witnesses $v_{i}$ for all $i$ in $\{1, \ldots, l\}$. Since we know for a fact that $e_{i}$ cannot alter $a_{i}$ and committed values $\vec{P}, \vec{S}, \vec{T}$ all applied to different powers of $x$, a cannot be altered except for a negligible probability or we find a non-trivial relationship between generators $g, h, u$.

Using any two different challenges $x_{i}, x_{i+1}$ we mentioned earlier, the extractor gets $\vec{a}_{1}^{\prime}$ and $\vec{a}_{2}^{\prime}$ from the prover, which we can trivially retrieve $\vec{\alpha}$ for all $i=$ $\{1, \ldots, l\}$ using the equation below.

$$
\begin{equation*}
a_{1_{i}}^{\prime}-a_{2_{i}}^{\prime}=\alpha_{i}\left(x_{1}-x_{2}\right) \tag{12}
\end{equation*}
$$

With $\vec{a}, \vec{\alpha}$ extracted, we can also extract $\omega$ from equation 7. Plugging witnesses $\vec{a}, \vec{\alpha}, \vec{v}, \vec{\omega}$ into generators $g, h, u$, we can re-write the left and right sides of equation 9 to:

$$
\begin{equation*}
\left(h^{x} /\left(g^{x^{2}} u\right)\right)^{\sum_{i}^{l} v_{i} \cdot k_{i}}=\prod_{i=1}^{l}\left(\frac{g^{a_{i} \cdot x} h^{v_{i} \cdot x}}{g^{a_{i}^{\prime} \cdot x-e_{i}+\left(v_{i}-\alpha_{i}\right) \cdot x^{2}+\omega_{i} \cdot q} \cdot u^{v_{i}}}\right)^{k^{i}} \in \mathbb{G} \tag{13}
\end{equation*}
$$

Take out challenge $k_{i}$, for each $i$ th input we have:

$$
\frac{h^{x \cdot v_{i}}}{g^{x^{2} v_{i}} u^{v_{i}}}=\frac{g^{a_{i} \cdot x} h^{v_{i} \cdot x}}{g^{a_{i}^{\prime} \cdot x-e_{i}+\left(v_{i}-\alpha_{i}\right) \cdot x^{2}+\omega_{i} \cdot q} \cdot u^{v_{i}}} \in \mathbb{G}
$$

This implies generator $g$ 's exponent in the nominator of the right-hand side element must cancel out, which automatically implies the $g$ 's exponent in the denominator of the right hand side element must be $a_{i} \cdot x+x^{2} v_{i}$. Since we know $e_{i}$ cannot alter $a_{i} \in \mathbb{Z}_{q}$ because $e_{i} \bmod q=0$ and $a_{i}^{\prime} \cdot x=a_{i} \cdot x+\alpha \cdot x^{2}$, we can trivially observe that no other value besides $a_{i}$ in $g$ 's exponent in the denominator $\left(\left(a_{i}^{\prime}\right) \cdot x-e_{i}+\left(v_{i}-\alpha_{i}\right) \cdot x^{2}+\left(\omega_{i} \cdot q\right)\right)$ is multiplied by the single power of $x$. This implies the equality above must be true for a computationally bounded prover except for a negligible probability of $\frac{1}{q}$ (adversary guessed $x$ correctly), or we find a non-trivial relationship between generators $g, h, u$, and this satisfies our CWEE definition.

### 4.2 The Complete Protocol

To prove the circuit output is correctly computed from transformed inputs $\vec{a}^{\prime}$, the prover needs to show it knows all coefficients of the output polynomial. For example, for a simple circuit that just outputs the sum of two inputs, the prover needs to show it knows the constant term $r$ and the coefficient of the degree 1 term $\epsilon$ of the output polynomial :

$$
\begin{equation*}
o=a_{1}^{\prime}+a_{2}^{\prime}=r+x \cdot \epsilon \tag{14}
\end{equation*}
$$

Computing the output polynomial of the addition operation is the same as adding two polynomials, where $r=\left(a_{1}+a_{2}\right)$ and the blinding key is $\epsilon=$ $\left(\alpha_{1}+\alpha_{2}\right)$. Likewise, multiplying two inputs $a_{1}{ }^{\prime}, a_{2}{ }^{\prime}$ is the same as multiplying two polynomials:

$$
\begin{equation*}
o=a_{1}{ }^{\prime} \cdot a_{2}{ }^{\prime}=r+x \cdot \epsilon+x^{2} \cdot \tau \tag{15}
\end{equation*}
$$

Where $r=a_{1} \cdot a_{2}, \epsilon=a_{2} \alpha_{1}+a_{1} \alpha_{2}$, and $\tau=\alpha_{1} \cdot \alpha_{2}$. We use the label " $o$ " to represent the circuit output, which is equivalent to the output polynomial evaluated at a point $X$. The degree of the polynomial will increase after each multiplication operation, so the efficiency will drop as the circuit polynomial degree $\left(p_{d}\right)$ increases.

To get the linear polynomial we need from the raw output $o$, the verifier needs to subtract out terms with degrees higher than one. In the multiplication circuit above, the verifier needs to eliminate the term of degree 2 to get the linear
polynomial. To do so, the prover commits to $\tau$ before the challenge $x$ is known. When the challenge $x$ is available, the prover sends the evaluation value $y$ to the verifier and proofs to prove $f(x)=x^{2} \tau=y$. The verifier can subtract $y$ from $o$ to get the output in linear polynomial form defined in equation 3 .

$$
\begin{equation*}
r^{\prime}=o-y \tag{16}
\end{equation*}
$$

This is a kind of 'bootstrapping'. We call value $y$ a "breaker". Breaker(s) subtracts all "noises" (polynomial terms of degree higher than one) from the raw output $o$.

In most arithmetic circuits with both addition/subtraction and multiplication/division operations, the polynomial degree is likely smaller than the number of multiplications. This is because every time we add two output wires, their polynomials get merged (e.g. $a_{1}^{7}+a_{2} \cdot a_{3}^{6}+a_{4}^{7}$ only outputs a polynomial of degree 8 , but 21 multiplications are performed). However, there are cases where the $p_{d}$ value of a circuit exceeds the total number of multiplications of the circuit and therefore significantly degrade the performance of the protocol.

What we need is 1 ) a mechanism to allow the prover to repeatedly reset the polynomial degree back to 1 so that the penalty of high degree polynomials can be avoided, 2) a mechanism to efficiently commit to $p_{d}$ coefficients.

Breaking a large circuit into $\boldsymbol{p}_{\boldsymbol{d}}$ smaller circuits We can break the circuit we are evaluating into $p_{d}$ sub-circuits and then batch evaluate $p_{d}$ circuits at once. So when the polynomial degree of a sub-circuit $i$ reaches degree $b+1$, we can reset it to a linear polynomial of degree 1 by evaluating the sub-circuit and then use its output as an input to the next sub-circuit (Figure 1).

$$
\begin{gathered}
o_{1} \\
o_{2} \\
\cdot \\
\cdot \\
\cdot \\
o_{p_{d}}
\end{gathered}=\left(\begin{array}{ccccccc}
r_{1} & \epsilon_{1} & \tau_{1,1} & \tau_{1,2} & \cdot & \cdot & \tau_{1, b} \\
r_{2} & \epsilon_{2} & \tau_{2,1} & \tau_{2,2} & \cdot & \cdot & \tau_{2, b} \\
\cdot & \cdot & \cdot & \cdot & & \cdot & \\
\cdot & \cdot & \cdot & & \cdot & \cdot & \\
r_{p_{d}} & \epsilon_{p_{d}} x & \tau_{p_{d}, 1} & \tau_{p_{d}, 2} & \cdot & \cdot & \tau_{p_{d}, b}
\end{array}\right)\left(\begin{array}{c}
1 \\
x \\
\cdot \\
\cdot \\
x^{b+1}
\end{array}\right)
$$

Figure 1
We evaluate each sub-circuit $i=\left\{1, \ldots, p_{d}\right\}$ in the same way we evaluate the full circuit. The output of a sub-circuit is also in the linear polynomial form $r_{i}^{\prime}=r_{i}+x \cdot \epsilon_{i}$ like that of the full circuit 3 . Since the output of each sub-circuit is in the same linear polynomial format as inputs to the circuit, they can be reused as inputs to subsequent sub-circuits.

Sub-circuits are arranged according to the computation order of the full circuit. If outputs of sub-circuits are correct, then the output of the final sub-circuit must also be the output of the full circuit. Under this setup, the output of each sub-circuit $r_{i}^{\prime}$ for $i=\left\{1, \ldots, p_{d}\right\}$ is computed by subtracting the breaker $y_{i}$ of the sub-circuit from its raw output $o_{i}$.

$$
\begin{equation*}
r_{i}^{\prime}=o_{i}-y_{i} \quad \text { for } \quad i=\left\{1, \ldots, p_{d}\right\} \tag{17}
\end{equation*}
$$

Each $o_{i}$ is the output of each sub-circuit at evaluation point $x$, and each breaker $y_{i}$ is the evaluation output of that sub-polynomial minus the constant term ( $r$, sub-circuit output) and the degree 1 term ( $\epsilon$, the blinding key of they sub-circuit output), see Figure 2.

$$
\begin{gathered}
o_{1} \\
o_{2} \\
\cdot \\
\cdot \\
o_{p_{d}}
\end{gathered}=\underbrace{\left(\begin{array}{cc}
r_{1} & \epsilon_{1} \\
r_{2} & \epsilon_{2} \\
\cdot & \cdot \\
\cdot & \cdot \\
r_{p_{d}} & \epsilon_{p_{d}}
\end{array}\right)\binom{1}{x}}_{\text {outputs }}+\underbrace{\left(\begin{array}{ccccc}
\tau_{1,1} & \tau_{1,2} & \cdot & \cdot & \tau_{1, b} \\
\tau_{2,1} & \tau_{2,2} & \cdot & \cdot & \tau_{2, b} \\
\cdot & \cdot & \cdot & & \cdot \\
\cdot & \cdot & & \cdot & \cdot \\
\tau_{p_{d}, 1} & \tau_{p_{d}, 2} & \cdot & \cdot & \tau_{p_{d}, b}
\end{array}\right)\left(\begin{array}{c}
x^{2} \\
x^{3} \\
\cdot \\
\cdot \\
x^{b+1}
\end{array}\right)}_{\text {breakers }}
$$

Figure 2
By proving the knowledge of all coefficients of each sub-circuit (e.g., the prover and the verifier engage to evaluate the sub-polynomial defined in each row of Figure 2), we know each sub-circuit output $r_{i}^{\prime}$ is correct.

It is worth noticing that the prover can set breakers anywhere on the circuit at any time depending on the application. For simplicity, we assume all breaker are set at $b+1$ degrees, where $b=p_{d} / p_{d}$.

Make group exponentiation operations sub-linear to the $\boldsymbol{p}_{\boldsymbol{d}}$ value Using polynomial evaluation protocol to evaluate each sub-circuit would be expensive. In our case, the verifiers already have the evaluation result $o_{i}$ of each polynomial (sub-circuit) available using direct computation of inputs, and the coefficients for both constant and degree 1 terms are committed values (output of the subcircuit). We only need to make sure the prover cannot commit to some altered circuit output (e.g. $r_{i}^{*}=r_{i}-\delta_{i}$ for some arbitrary $\delta_{i}$ ) while the altered breaker ( $y_{i}^{\prime *}=y_{i}^{\prime}+\delta_{i}$ s.t. $r_{i}^{*}=o_{i}-y^{\prime *}$ ) can still be computed from "other" coefficients (degree 2 to degree $b$ coefficients) of the sub-circuit.

We use two challenges $x, w$ to allow the prover to "commit" to "other" coefficients of $1, \ldots, p_{d}$ polynomials in batches without using expensive ECC operations.

First, we commit to the outputs of sub-circuits. Instead of using Pedersen commitments to commit on each output as we do with the circuit output $R$, we use two vector commitments to batch commit the outputs of sub-circuits. Since the ECC group we use has an order $p$, we need to define $r_{p_{i}}=r_{i}+x \cdot \epsilon_{i} \in \mathbb{Z}_{p}$ s.t. $r_{p_{i}} \bmod q=r_{i}^{\prime}$. The prover will send this value to the verifier.

Since $r_{p_{i}}$ is a raw number with approximately $2|q|$ bits, anyone can easily extract witnesses $r_{i}, \epsilon_{i}$ with $x$. This is why we need a blinding key $\mu_{i}$ that multiplies the order of the smaller group $q$ to create a blinding value. In our benchmarking test, we set $\mu_{i} 60$-bits longer than $q$, so it will perfectly hide $r_{p_{i}}$ except for a negligible chance of $2^{-60}$ (even in such a case, only insignificant information may be leaked, in which case the prover can randomly select another $\mu_{i}$ ).

$$
\begin{equation*}
\mu_{i} \stackrel{\$}{\leftarrow} \mathbb{Z}_{m} \quad i=\{1, \ldots, l\} \tag{18}
\end{equation*}
$$

$$
\begin{array}{rr}
r_{p_{i}}^{\prime}=r_{i}+x \cdot \epsilon_{i}+\mu_{i} \cdot q \in \mathbb{Z}_{p} & i=\left\{1, \ldots, p_{d}\right\} \\
R_{t}=\prod_{i=1}^{p_{d}} u_{i}^{r_{i}+\mu_{i} \cdot q} \in \mathbb{G}, \quad E_{t}=\prod_{i=1}^{p_{d}} u_{i}^{\epsilon_{i}} \in \mathbb{G} \tag{20}
\end{array}
$$

Once $\bmod q$ is applied to $r_{p_{i}}^{\prime}$, this blinding value will disappear in $r_{i}^{\prime}$.

$$
\begin{equation*}
r_{i}^{\prime}=r_{p i} \bmod q \in \mathbb{Z}_{q} \quad i=\left\{1, \ldots, p_{d}\right\} \tag{21}
\end{equation*}
$$

After challenge $x$ is known, the prover sends $\overrightarrow{r_{p}}{ }^{\prime}$ to the verifier. The verifier uses the following equality to check if the set ${\overrightarrow{r_{p}}}^{\prime}$ matches the committed value and then converts them to $\vec{r}^{\prime}$.

$$
\begin{equation*}
\prod_{i=1}^{p_{d}} g_{i}^{r_{i}}=\prod_{i=1}^{p_{d}} R_{i} \cdot E_{t}^{x} \tag{22}
\end{equation*}
$$

The result $o_{i}$ of each polynomial evaluation at point $x$ is computed by the verifier directly using inputs and circuit logic. Since $r_{i}^{\prime}$ is a committed value, this implies each breaker $y_{i}^{\prime}$ is also a committed value because it is computed directly from the evaluation result $o_{i}$ and the committed output $r_{i}^{\prime}$.

$$
\begin{equation*}
y_{i}^{\prime}=o_{i}-r_{i}^{\prime} \quad \text { for } \quad i=\left\{1, \ldots, p_{d}\right\} \tag{23}
\end{equation*}
$$

We can observe that if the prover makes the coefficients $\tau_{i, 1}, \ldots, \tau_{i, b}$ "committed" before challenge $x$ is known, then we know $y_{i}^{\prime}=\tau_{i, 1} x^{2}+, \ldots,+\tau_{i, b} x^{b+1}$ except with negligible probability at most $\frac{q}{b}$ (because of possible existence of polynomial roots). If the breaker is correctly computed, the output $r_{i}^{\prime}$ must also be correctly computed.

We use a challenge $w$ that's made available after the prover commits to $R_{t}, E_{t}$ and before the challenge $x$ is known to allow the prover to hide and pass all coefficients for each degree in one single element $z_{j}$.

$$
\begin{equation*}
z_{j}=\sum_{i=1}^{p_{d}}\left(\tau_{i, j} \cdot w^{i}\right) \in \mathbb{Z}_{q} \quad j=\{1, \ldots, b\} \tag{24}
\end{equation*}
$$

The verifier also verifies $\vec{y}^{\prime}$ as a single element. The update matrix that the verifier now needs to verify is depicted in Figure 3.

$$
\begin{gathered}
y_{1}^{\prime} w \\
y_{2}^{\prime} w^{2} \\
\cdot \\
\cdot \\
y_{p_{d}}^{\prime} w^{p_{d}}
\end{gathered}=\left(\begin{array}{ccccc}
\tau_{1,1} w & \tau_{1,2} w & \cdot & \cdot & \tau_{1, b} w \\
\tau_{2,1} w^{2} & \tau_{2,2} w^{2} & \cdot & \cdot & \tau_{2, b} w^{2} \\
\cdot & \cdot & \cdot & & \cdot \\
\cdot & \cdot & & \cdot & \cdot \\
\tau_{p_{d}, 1} w^{p_{d}} & \tau_{p_{d}, 2} w^{p_{d}} & \cdot & \cdot & \tau_{p_{d}, b} w^{p_{d}}
\end{array}\right)\left(\begin{array}{c}
x^{2} \\
x^{3} \\
\cdot \\
\cdot \\
x^{b+1}
\end{array}\right)
$$

Figure 3
Challenge $x$ is only made available after $\vec{z}$ are known. The verifier validates $\vec{y}^{\prime}$
by multiplying each $y_{i}^{\prime}$ by $w^{i}$ and each $z_{j}$ by $x^{j+2}$, their difference must be equal to $0 \in \mathbb{Z}_{q}$.

$$
\begin{equation*}
\sum_{j=1}^{b} z_{j} \cdot x^{j+1}-\sum_{i}^{p_{d}} y_{i} \cdot w^{i} \bmod q=0 \tag{25}
\end{equation*}
$$

To show why this is sound for all sub-circuit outputs. Let's say $r_{i}^{*}=r_{i}-\delta_{i}$ for some arbitrary $\delta_{i}$ (some of it may be 0 ), the dishonest prover needs to find a set $\vec{\Delta}$ before challenge $x$ is known s.t.

$$
\begin{equation*}
\sum_{j=1}^{b}\left(z_{j}+\Delta_{j}\right) \cdot x^{j+1}=\sum_{i=1}^{p_{d}}\left(y_{i}+\delta_{i}\right) \cdot w^{i} \in \mathbb{Z}_{q} \tag{26}
\end{equation*}
$$

The equality above can be rewritten as the equality below, which clearly shows such $\vec{\Delta}$ cannot be found without prior knowledge of challenge $x$.

$$
\begin{equation*}
\sum_{j=1}^{b} \Delta_{j} \cdot x^{j+1}=\sum_{i=1}^{p_{d}} \delta_{i} \cdot w^{i} \in \mathbb{Z}_{q} \tag{27}
\end{equation*}
$$

We now have all the necessary pieces to describe our main protocol for arithmetic circuits.

### 4.3 The Complete Protocol For Arithmetic Circuits

We define two more functions for our protocol definition. function
computeSubCircuitKeys is used by the prover to compute the keys of each sub-circuit or "row" in Figure 1, and function
computeSubCircuit is used by the verifier to compute the value of a subcircuit at the evaluation point $x$ :

1. function computeSubCircuitKeys ${ }_{i}$ ("input values", "input keys"): for $i=$ $\left\{1, . ., p_{d}\right\}$, it takes input values $\vec{a}$ and keys $\vec{\alpha}$ to evaluate the sub-circuit and outputs $r_{i}, \epsilon_{i}, \vec{\tau}_{i}$ (coefficients of $o_{i}$ ).
2. function computeSubCircuit ${ }_{i}$ ("inputs in linear polynomial form", "output from the previous computeSubCircuit function"): for $i=\left\{1, . ., p_{d}\right\}$, it trivially computes the result $o_{i}$ from the inputs to the sub-circuit.

For example, if the logic of the $i$ th sub-circuit is to return the product of $l$ inputs, then the computeSubCircuit ${ }_{i}$ function simply performs $o_{i}=a_{1} \times$ $a_{2} \times, \ldots, \times a_{l}$. Since $a_{1}, \ldots, a_{l}$ are linear polynomials evaluated at point $\mathrm{X}, o_{i}$ is the evaluation of the output polynomial at point X , and the computeSubCircuitKeys ${ }_{i}$ function computes all coefficients of the output polynomial. We are now ready to introduce the complete version of our protocol - Protocol AriCircuit - as follows:

$$
\begin{align*}
& \text { Input }:\left(\vec{P}, R \in \mathbb{G} ; \vec{a}, \vec{v}, r, \phi \in \mathbb{Z}_{p}\right)  \tag{28}\\
& \quad \mathcal{P}^{\prime} \text { s input }:(\vec{P}, R ; \vec{a}, \vec{v}, r, \phi) ; \mathcal{V}^{\prime} \text { s input }:(\vec{P}, R)  \tag{29}\\
& \quad \mathcal{P} \text { compute }: \tag{30}
\end{align*}
$$

$$
\begin{align*}
& \mu_{i} \stackrel{\S}{\leftarrow} \mathbb{Z}_{m},  \tag{31}\\
& i=\left\{1, \ldots, p_{d}\right\} \\
& \alpha_{i} \stackrel{\&}{\leftarrow} \mathbb{Z}_{q} \text {, }  \tag{32}\\
& i=\{1, \ldots, l\} \\
& \text { for } i=1, \ldots, p_{d} \quad\{  \tag{33}\\
& r_{i}, \epsilon_{i}, \overrightarrow{\vec{r}}_{i}=\text { computeSubCircuitKeys }_{i}\left(\vec{a}^{\prime}, \vec{\alpha}^{\prime}, r_{i}, \epsilon_{i}, \vec{\tau}_{i}\right) ;  \tag{34}\\
& \left.\epsilon_{i}=\left(\epsilon_{i}-\mu_{i}\right) \bmod p \quad\right\} / / \text { note } r_{p_{d}}=r  \tag{35}\\
& R_{t}=\prod_{i=1}^{p_{d}} u_{i}^{r_{i}+\mu_{i} \cdot q} \in \mathbb{G}, \quad E_{t}=\prod_{i=1}^{p_{d}} u_{i}^{\epsilon_{i}} \in \mathbb{G}  \tag{36}\\
& \text { InputMapping-Setup }\left(; \vec{a}\left\|r_{p_{d}}, \vec{\alpha}\right\| \epsilon, \vec{v} \| \phi\right) \rightarrow: \vec{S}, \vec{T}  \tag{37}\\
& \mathcal{P} \rightarrow \mathcal{V}: R_{t}, E_{t}, \vec{S}, \vec{T}  \tag{38}\\
& \mathcal{V} \rightarrow \mathcal{P}: \quad w \stackrel{\S}{\leftarrow} \mathbb{Z}_{p}  \tag{39}\\
& \mathcal{P} \text { compute : }  \tag{40}\\
& z_{j}=\sum_{i=1}^{p_{d}}\left(\tau_{i, j} \cdot w^{i}\right) \in \mathbb{Z}_{q} \quad j=\{1, \ldots, b\}  \tag{41}\\
& \mathcal{P} \rightarrow \mathcal{V}: \vec{z}  \tag{42}\\
& \mathcal{V} \rightarrow \mathcal{P}: \quad x \stackrel{\oiint}{\leftarrow} \mathbb{Z}_{p}  \tag{43}\\
& \mathcal{P} \text { compute : }  \tag{44}\\
& a_{i}^{\prime}=a_{i}+x \cdot \alpha_{i} \in \mathbb{Z}_{q} \quad i=\{1, \ldots, l\}  \tag{45}\\
& r_{p_{i}}^{\prime}=r_{i}+\mu_{i} \cdot q+x \cdot \epsilon_{i} \in \mathbb{Z}_{p} \quad i=\{1, \ldots, l\}  \tag{46}\\
& \mathcal{P} \rightarrow \mathcal{V}: \vec{a}^{\prime}, \vec{r}_{p}^{\prime}  \tag{47}\\
& \mathcal{V} \text { verify final output: }  \tag{48}\\
& r_{i}^{\prime}=r_{p_{i}}^{\prime} \bmod q \in \mathbb{Z}_{q} \quad i=\left\{1, \ldots, p_{d}\right\}  \tag{49}\\
& \text { for } i=1, \ldots, p_{d} \quad\{  \tag{50}\\
& o_{i}=\text { computeSubCircuit }_{i}\left(\vec{a}^{\prime}| | \vec{r}^{\prime}\right) \in \mathbb{Z}_{p}  \tag{51}\\
& \left.y_{i}^{\prime}=o_{i}-r_{i}^{\prime} \in \mathbb{Z}_{q}\right\}  \tag{52}\\
& \text { if }\left(\prod_{i=1}^{p_{d}} g_{i}^{r_{i}} \stackrel{?}{=} \prod_{i=1}^{p_{d}} R_{i} \cdot E_{t}^{x}\right)  \tag{53}\\
& \wedge\left(\left(\sum_{j=1}^{b} z_{j} \cdot x^{j+1}-\sum_{i}^{p_{d}} y_{i} \cdot w^{i}\right) \bmod q=0\right) \text { then continue }  \tag{54}\\
& \text { else reject }  \tag{55}\\
& \text { if InputMapping-Verify }\left(\vec{P}\left\|R, \vec{S}, \vec{T}, \vec{a}^{\prime}\right\| r_{p_{d}}^{\prime} ; \vec{a}\|r, \vec{\alpha}\| \epsilon, \vec{v} \| \phi\right)  \tag{56}\\
& \text { then continue }  \tag{57}\\
& \text { else reject } \tag{58}
\end{align*}
$$

Protocol AriCircuit

Note that a $b$ degree polynomial can have at most $b$ roots, and therefore there is a negligible probability of $b / p$ that an attacker can play with coefficients to pass the validation test of an altered output.

Theorem 2. (Protocol AriCircuit). The proof system presented in this section has perfect completeness, PHVZK, and CWEE.

Proof. The perfect completeness of protocol AriCircuit is trivial to see.
To prove PHVZK for relation 1, we define a simulator $\mathcal{S}$. Simulator $\mathcal{S}$ calls on simulators $\mathcal{S}_{\text {input }}$ defined earlier to generate transcripts and simulate interactions in the InputMapping sub-protocol used in our protocol.

We have already shown $\mathcal{S}_{\text {input }}$ can simulate all interactions needed in subprotocol InputMapping. We now show how $\mathcal{S}$ generates the rest of the transcripts according to the randomness supplied by the adversary from their respective domains or computes them directly as described in the protocol.

Simulator $\mathcal{S}$ randomly generates random input witnesses $\vec{a}^{*}, \vec{\alpha}^{*}$ and computes circuit output witnesses $\vec{r}^{*}, \vec{\epsilon}^{*}, \vec{\tau}^{*}$ and creates commitments $R_{t}, E_{t}$ accordingly to the protocol specification and sends them to the verifier. The simulator then follows the protocol to create $\vec{z}$ with challenge $w$ and sends them to the verifier. After receiving challenge $x$ from the verifier, the simulator computes $\vec{a}^{\prime *}, r_{p_{d}}{ }^{\prime *}$ according to the protocol specification. Note that after randomly generating input witnesses $\vec{a}^{*}, \vec{\alpha}^{*}$, the simulator $\mathcal{S}$ just followed the protocol specification to create all other transcripts needed. The rewinding only takes place in the simulator $\mathcal{S}_{\text {input }}$. This implies that if the input-mapping process is PHVZK, then the whole protocol is PHVZK.

Since all elements in proof transcripts are either independently randomly distributed or their relationship is fully defined by the verification equations, we can conclude that the protocol AriCircuit is PHVZK.

To prove CWEE, we define extractor $\mathcal{X}$ that calls on extractors $\mathcal{X}_{\text {input }}$ defined earlier to extract witnesses for the two sub-protocols used in the protocol AriCircuit.

We already know $\mathcal{X}$ can extract $\vec{a}, \vec{\alpha}, \vec{v}$ and $\vec{r}, \vec{\epsilon}$ using extractor $\mathcal{X}_{\text {input }}$ from committed transcripts $\vec{S}, \vec{T}$. Using input witnesses, we can use the function computeSubCircuitKeys to compute circuit witnesses $\vec{r}, \vec{\epsilon}, \vec{\tau}$. Next, we show how these witnesses must match ones extracted from circuit transcripts $\overrightarrow{r_{p}}{ }^{\prime}, \vec{z}, R_{t}, E_{t}$.

First, the extractor $\mathcal{X}$ extracts the same circuit witnesses from transcripts $\vec{z}$ that still satisfy the relation 25 with transcript $\vec{y}^{\prime}$. The extractor $\mathcal{X}$ generates $p_{d}+1$ challenges $\vec{w}$ and forwards them to the prover. After receiving circuit evaluation transcripts $\vec{\tau}_{1}$ computed from the first challenge $w_{1}$, the extractor rewinds and repeats this step $p_{d}$ times to generate challenges $w_{2}, \ldots, w_{p_{d}+1}$ and retrieve witnesses $\vec{\tau}_{2}, \ldots, \vec{\tau}_{p_{d}+1}$ from interpolation. Rearrange equality 25 and substitute $\vec{z}$ for $\vec{\tau}$ as specified in 24 we get the following equality:

$$
\begin{equation*}
\sum_{j=1}^{b}\left(\sum_{i=1}^{p_{d}} \tau_{i, j} w^{i}\right) \cdot x^{j+1}=\sum_{i}^{p_{d}} y_{i}^{\prime} \cdot w^{i} \in \mathbb{Z}_{q} \tag{59}
\end{equation*}
$$

Second, the extractor $\mathcal{X}$ extracts $\vec{r}, \vec{\epsilon}$ from transcripts ${\overrightarrow{r_{p}}}^{\prime}$ that satisfy the condition held in the committed values in $R_{t}, E_{t}$. The extractor $\mathcal{X}$ can use a pair of challenges $x_{1}, x_{2}$ to trivially extract $\vec{r}, \vec{\epsilon}$ from ${\overrightarrow{r_{p}}}^{\prime}$, and they trivially satisfy the condition held by committed values in $R_{t}, E_{t}$ or else the equality test TBD will not stand. With $\vec{r}, \vec{\epsilon}$ extracted, the equality 59 is updated to following:

$$
\begin{equation*}
\sum_{j=1}^{b}\left(\sum_{i=1}^{p_{d}} \tau_{i, j} w^{i}\right) \cdot x^{j+1}=\sum_{i}^{p_{d}}\left(o_{i}-\left(r_{i}+x \cdot \epsilon_{i}\right)\right) \cdot w^{i} \in \mathbb{Z}_{q} \tag{60}
\end{equation*}
$$

Knowing that $\vec{o}$ must be true since it was directly computed by the verifier and that $\vec{r}, \vec{\epsilon}$ match the committed values, the coefficients $\vec{\tau}$ extracted must satisfy equation 24 for equality above to be true for any pairs of challenges $w, x$ except for a negligible probability.

Finally, we check if the extracted circuit witnesses $\vec{r}, \vec{\epsilon}, \vec{\tau}$ extracted from circuit transcripts match those computed from input witnesses $\vec{a}, \vec{\alpha}$ using computeSubCircuitKeys functions. Since the computed from computeSubCircuitKeys functions also need to satisfy equality 60 for the same evaluation result $\vec{o}$ and match the commitments for $\vec{r}, \vec{\epsilon}$ given any pairs of challenge $w, x$. The witnesses computed from $\vec{a}, \vec{\alpha}$ must match these extracted from circuit transcripts except for a negligible probability, or else we find a non-trivial discrete log relationship between generators $g, h$ (for input witnesses).

### 4.4 Customized Sub-Circuit and The Use of Range Proof for Comparison Operations

One of the primary reasons for using a boolean circuit over an arithmetic circuit in the real world (there are no real-world applications of trying to prove a hash) is the ability to perform comparison operations ( $>,<, \geq, \leq,=$ ). Our design allows the use of customized circuit(s) to embed range-proof protocols to evaluate comparison operations inside the arithmetic circuit being processed. This way, there will be fewer needs for expensive boolean circuits and/or the expensive process of decomposing/recomposing integers to bits within a circuit.

The idea is similar to that of "custom gates" found in SNARKs protocols in principle but very different in implementation. In general, the design goal of a customized circuit is to utilize existing algorithms/protocols to handle operations that would otherwise be expensive in our protocol (or any other protocol).

In particular, we can use range proof inside an arithmetic circuit to handle all comparison operations and decimal point reductions. This is huge in practice because either using a boolean circuit directly or converting to/from a boolean circuit inside an arithmetic circuit is expensive.

For example, to prove $a_{1}>a_{2}$ ( or $P_{1}>P_{2}$ ), the prover can do the following:

1. Commit to their difference $C=g^{c} h^{v}$ s.t. $c=a_{1}-a_{2}$ (or compute $C$ from $P_{1}, P_{2}$ using additive homomorphism e.g. $C=P_{1} / P_{2}$ ).
2. Call protocol InputMapping to check $c^{\prime}=a_{1}^{\prime}-a_{2}^{\prime} \in \mathbb{Z}_{q} \operatorname{maps} C \in \mathbb{G}$;
3. Use a range-proof protocol to prove $C>0$. If returns true, then we know $a_{1}>a_{2}$.

An example usage is as follows:

$$
\begin{aligned}
& \text { computeSubCircuit : }\left(\vec{a}^{\prime} \in \mathbb{Z}^{q}, \vec{C} \in \mathbb{G}\right) \\
& c_{1}^{\prime}=a_{1}^{\prime}-a_{2}^{\prime} \in \mathbb{Z}_{q} \\
& \text { if Protocol RangeProof }\left(C_{1}, 0\right) \\
& c_{2}^{\prime}=a_{3}^{\prime}-a_{4}^{\prime} \in \mathbb{Z}_{q} \\
& \text { if Protocol RangeProof }\left(C_{2}, 0\right) \\
& \quad o=\text { do something } \\
& \text { else } \\
& \quad o=\text { do something else } \\
& \text { else } \\
& \quad o=\text { do something } \\
& \text { if Protocol InputMapping }\left(C_{1}\left\|C_{2}, c_{1}^{\prime}\right\| c_{2}^{\prime}\right) \\
& \text { else reject }
\end{aligned}
$$

Function computeSubCircuit (Customized)
In the computeSubCircuit function defined above, the circuit first tests if $a_{1}^{\prime}>a_{2}^{\prime}$ and then tests if $a_{3}^{\prime}>a_{4}^{\prime}$. Before the function returns $o$, it batch checks the mapping between each $c_{i}^{\prime}$ and $C_{i}$ pair. In practice, all calls to range proof should also be batch verified at the end of the function.

We can bypass the "inactive" part of the circuit (similar to that of suBlonk (ePrint)). For example, if the first range proof returns false (e.g., $a_{1}<a_{2}$ ), then both the prover and the verifier can bypass the two "else" parts of the circuit above. However, it is worth noticing that using a customized sub-circuit bypassing parts of the circuit may leak information about data to attackers, so one must use such a strategy with extreme caution.

We believe combining the arithmetic circuit and range-proof protocols is the most efficient way to run a zero-knowledge test in the real world. This is much more powerful than it seems. Besides handling comparison operators, one can also use embedded range proof to verify floating point operations (perform multiplication/division operations as full integer operations, then remove decimal places by proving their range). We believe it will allow us to use arithmetic circuits to handle almost all types of business logic that would otherwise require boolean circuits.

It is also not necessary that all breakers $y_{1}, \ldots, y_{m}$ have the same $b$ value. For example, if some value $a_{i}$ is taken to its 100 th power ( $a_{i}^{\prime 100}$, degree term $p_{d}=100$ ) and will be used as inputs in multiple places of a circuit, then it would be wise to use a breaker to cut its degree to 1 (in linear polynomial form $\left.a_{i}^{100}+X \alpha_{i}\right)$ before being used as inputs in other places of a circuit.

### 4.5 Memory Efficiency

The memory consumption cost of our protocol is $O\left(p_{d}\right)$. However, our design approach allows us to improve the memory consumption cost of our protocol to $O\left(b+2 p_{d}\right)$. We only need to do two thing to achieve a memory cost of $O\left(b+2 p_{d}\right)$ : First, delete $\vec{\tau}$ from memory in line 34; Second, recompute $\vec{\tau}_{i}$ after challenge $x$ is available so that $\vec{y}^{\prime}$ can be computed (in line 41 ).

$$
\begin{array}{ll}
\text { for } & i=1, \ldots, m\{ \\
& r_{i}, \epsilon_{i}, \vec{\tau}_{i}=\text { computeSubCircuitKeys }{ }_{i}\left(\vec{a}^{\prime}, \vec{\alpha}^{\prime}\right) ; \\
& \epsilon_{i}=\left(\epsilon_{i}-\mu_{i}\right) \bmod q \in \mathbb{Z}_{q} ; \\
& \vec{a}^{\prime}=\vec{a}\left\|r_{i}, \quad \vec{\alpha}^{\prime}=\vec{\alpha}\right\| \epsilon_{i} ; \\
& z_{j}=z_{j}+\sum_{i=1}^{p_{d}}\left(\tau_{i, j} \cdot w^{i}\right) \in \mathbb{Z}_{q} \quad j=\{1, \ldots, b\} ; \\
\} &
\end{array}
$$

Since we have $b=p=\sqrt{p_{d}}$ in the default setting, the memory cost is approximately $O\left(3 \sqrt{p_{d}}\right)$.

### 4.6 Cost Analysis

The prover runtime of our protocol is dominated by $O\left(\iota p_{d} \log p_{d}+l\right)$ field operations and $O(b+l)$ group exponentiations. We set $b=\sqrt{p_{d}}$ in our benchmark testing, so the cost of group exponentiations grow sub-linearly to the total polynomial degree generated by the circuit. The value of $\iota$ depends on how the circuit is wired. For the sequential multiplication test case that we benchmark against, $\iota=m \sum_{i=1}^{\log b} i$; the verifier runtime is dominated by $O(n+l)$ field operations and $O(m+b+l)$ group exponentiations; and the communication cost is dominated by $O(b+l)$ group elements and $O(m+l)$ field elements.

Our protocol is also natively faster than its asymptotic cost indicates because group exp. operations of our protocol operate mostly in $q$ (61-bits), which is significantly smaller than $p$ in ECC. This gives us approximately 2.5 X performance gains when performing multi-exponentiations over standard ECC multiplications. Although the verifier runtime is technically linear, it is so efficient to the point that it is close to SNARKs with trusted setup. This is because all the asymptotically slow operations are performed at field level (many papers consider this free).

It is important to note that $p_{d} \neq n$. For some arithmetic circuits, the total polynomial degree $\left(p_{d}\right)$ can be even smaller than the number of multiplications. This is because every time we perform the "addition" operation on two output wires, two polynomials get merged (e.g. circuit $a_{1}^{8}+a_{2}^{3} \cdot a_{3}^{5}+a_{4}^{7}+a_{5}^{3}=r$ only outputs a polynomial of degree 8 , but 27 multiplications are performed). On the other hand, if $p_{d}$ can be bigger than the total number of gates, we may need to set breakers more aggressively to cut the $p_{d}$ value down to an acceptable level, preferably lower than $n$.

It is also worth noting that it is not necessary to set all breakers at some fixed point $b$. For example, if we know several output wires with high degrees are going to get merged through addition operations, we may want to save the breaker until they are merged together.

## 5 Performance Comparison

We compare the performance of our protocol to some of the most popular transparent zero-knowledge protocols for which open source codes are available. Our test runs are performed on an $\operatorname{Intel}(\mathrm{R})$ Core(TM) i7-9750H CPU @ 2.60 Ghz . All tests are run on a single CPU thread. Our test code is a non-interactive implementation (using Fiat-Shamir heuristic). For group operations, we use the curve25519-dalek implementation, and Pippenger acceleration is applied to all sum-of-product group operations. For field operations, we use the Montgomery algorithm to accelerate modular multiplications on the prime $q$.

We compare our protocol against Hyrax, Ligero, Aurora, and Spartan-NIZK. These protocols were chosen because they are the most representative of popular zero-knowledge protocols and can be verified with open source code. In particular, Aurora outperforms STARK in all key parameters (prover/verifier runtime, proof size), and the NIZK version of Spartan offers the most balanced performance across all performance parameters. We do not consider SNARKs because they are hardly efficient after switching to transparent mode.

We didn't consider protocols that depend on circuit depth, such as GKRbased protocols, because they cannot handle $2^{20}$ sequential multiplications.

Spartan ++ and Lakonia are two more recent developments that we didn't include in our benchmark testing but are worth mentioning. The improvement of Spartan++ over SpartanNIZK is marginal, and the performance of Lakonia is largely comparable to that of SpartanNIZK. (The prover performance of SpartanNIZK is approximately 3X more efficient, and the verifier performance is 1.5 X more efficient than that of Lakonia, while Lakonia is 4X more efficient than SpartanNIZK in proof size).

We set the number of inputs to our protocol to 20 integers. The circuit we use performs $n$ sequential multiplications on $l$ inputs. This is because we want to demonstrate that our protocol can handle high-depth circuits (e.g., $p_{d}=n$ ). If we run a shallow circuit where $p_{d}$ number is small, the benchmark result will likely be significantly better. For example, if we have a circuit that computes the sum-of-products of inputs where $n=2^{20}$ (e.g., $\sum_{i=1}^{n} a_{i} b_{i}=a_{1} b_{1}+$ $a_{2} b_{2} \ldots+a_{n} b_{n}$ ), the prover will complete proving such a shallow circuit (one million multiplications and one million sums) in under 50 milliseconds for such a shallow circuit with $p_{d}=1$ (group exponentiation cost is approaching zero and NTT is not even necessary for field operations).

As we mentioned earlier, every time an "addition or subtraction" operation is performed, the $p_{d}$ value of two input wires merges into one. e.g., if the first input wire $a_{1}^{\prime 7}$ has $p_{d}=7$ and the second input wire ${a_{1}^{\prime}}^{6}$ has $p_{d}=6$, their sum is an output wire with $p_{d}=7$. In other words, the more addition/subtraction
operations evenly distributed in a circuit, the smaller the $p_{d}$ value compared to $n$ and gets closer to the value of the multiplication depth of the circuit. The sum-of-products circuit explained earlier has a multiplication depth of 1 .

There are special cases where in some sub-circuits we have $p_{d}$ value grow faster than the number of multiplications performed (e.g., $\left.o=\left(\left(\left(a_{1}^{\prime}\right)^{2}\right)^{2}\right)^{2}\right)$. In such cases, we may want to place breakers more aggressively to better handle the situation. This can be done by either setting fixed $b<\sqrt{p_{d}}$ or placing more breakers inside these uncommon sub-circuits. In a real-world situation where there are both multiplication and addition operations, it is unlikely that $p_{d}$ is bigger than $n$.

To maximize the advantage of the NTT algorithm in computing sequential multiplications, we process each segment (subCircuit ${ }_{i}$ ) of our circuit in binary tree format to represent layers we would see in the real world. Such tuning will likely not be required in real-world applications since large circuits are usually layered and multiplication gates should be somewhat balanced out across layers already.

### 5.1 Benchmark at different at balanced $b$ value

We set $b=p_{d}=\sqrt{p_{d}}$ to get a more balanced result. Alternatively, one can set $b$ smaller to get better prover runtime performance in exchange for more expensive verifier and communication costs (section 5.2). This is because doing so will 1) may significantly cut down the polynomial degree $p_{d}$ value of the circuit. 2) evade expensive NTT computations at high degrees and better leverage Pippenger acceleration in computing $R_{t}, E_{t}$, which will continuously improve group exponentiation operations before peaking out at around $m=2^{14}$.

Table 1. Prover performance comparison (seconds)

| Circuit size | $2^{10}$ | $2^{12}$ | $2^{14}$ | $2^{16}$ | $2^{18}$ | $2^{20}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Hyrax | 1 | 2.8 | 9 | 36 | 117 | 486 |
| Ligero | 0.1 | 0.4 | 1.6 | 4 | 17 | 69 |
| Aurora | 0.5 | 1.6 | 6.5 | 27 | 116 | 485 |
| SpartanNIZK | 0.02 | 0.05 | 0.16 | 0.6 | 1.7 | 6.2 |
| This work | 0.004 | 0.005 | 0.01 | 0.03 | 0.13 | 0.57 |

Table 1 shows that as the circuit size gets bigger, the prover performance of our protocol is becoming increasingly more efficient than all the other protocols we are comparing against. This is because the cost associated with the number of inputs to the circuit is fixed ( 20 inputs), and its impact relative to the cost of evaluating the whole circuit gradually declines as the circuit size gets bigger (the same effect will also apply to verifier runtime and proof size benchmarks below). To the best of our knowledge, our protocol offers the best prover performance in the non-interactive setting in the literature.

Table 2. Proof size comparison (kilobytes)

| Circuit size | $2^{10}$ | $2^{12}$ | $2^{14}$ | $2^{16}$ | $2^{18}$ | $2^{20}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Hyrax | 14 | 17 | 21 | 28 | 38 | 58 |
| Ligero | 546 | 1,076 | 2,100 | 5,788 | 10,527 | 19,828 |
| Aurora | 477 | 610 | 810 | 1,069 | 1,315 | 1,603 |
| SpartanNIZK | 9 | 12 | 15 | 21 | 30 | 48 |
| This work | 3 | 4 | 7 | 11 | 19 | 36 |

Table 2 shows that the communication cost of our protocol dominates that of Ligero and Aurora, while being largely comparable to SpartanNIZK and Hyrax. The fixed cost of one additional input is 112 bytes, you can add or subtract as many inputs as needed to get the communication cost that fits your scenario rather than take the default $l=20$. Please note that other protocols also incur comparable costs when they map more witnesses to some precommitted/encrypted value in the public data store.

Table 3. Verifier performance comparison (milliseconds)

| Circuit size | $2^{10}$ | $2^{12}$ | $2^{14}$ | $2^{16}$ | $2^{18}$ | $2^{20}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Hyrax | 206 | 253 | 331 | 594 | 1.6 s | 8.1 s |
| Ligero | 50 | 179 | 700 | 2 s | 7.5 s | 33 s |
| Aurora | 192 | 590 | 2 s | 7.2 s | 29.8 s | 118 s |
| SpartanNIZK | 7 | 11 | 17 | 36 | 103 | 387 |
| This work | 1.4 | 1.4 | 1.5 | 1.8 | 3 | 12.5 |

Table 3 demonstrates that our protocol achieves a significant improvement of over one order of magnitude in verifier runtime compared to other protocols.

The NTT friendly prime number $q$ we used for our benchmark testing is 1945555039024054273, a 61-bit prime that implies the soundness error will be at most $\frac{b}{q}=2^{-51}$ in our test case ( $b=2^{10}$ when $p_{d}=2^{20}$ and $q$ is a 61-bit prime), which is good enough for many applications and ones where one interaction is allowed.

When a soundness error of $2^{-51}$ is not enough, the dumb and straight-forward way is to run the whole protocol twice to get a soundness error of at least $\left(\frac{b}{q}\right)^{2}=2^{-102}$. This will almost double the cost of everything (the prover only needs to compute vector commitment $R_{t}$ once), but our protocol will still claim the title of the fastest prover and verifier runtime in the literature by a wide margin. The more advanced way is to use a bigger $q$ prime. For example, a 90 -bit $q$ prime will comfortably increase the soundness error to at least $2^{-80}$, just that there would be a lot of engineering work to get an efficient NTT and modular arithmetic implementation at a higher bit value. e.g., at 128-bit, which will require optimization at the assembly level for which no open source code is
available at the moment, unlike that for 64 -bit (come with the CPU) and 256-bit (optimized over the years because of ECC).

It is worth noticing that input transformation costs can be shared across multiple circuits if the inputs are reused as inputs to other circuits, which may lead to further reductions in communication costs in the real world.

### 5.2 Benchmark at different $b$ value in memory efficient setting

One of the biggest advantages of our protocol is that it provides a blueprint for developing a fast, memory-efficient, non-interactive zero-knowledge protocol. The theoretical memory cost is $O\left(b+2 p_{d}\right)$. Using the memory-efficient implementation mentioned in Section 4.5, we get the results listed in Table 4.

Table 4. Performance comparison in memory efficient setting

| b | Prover | Verifier | Communication Cost |
| :---: | :---: | :---: | :---: |
| $2^{10}$ | 1.14 s | 11 ms | 55 KB |
| $2^{9}$ | 1.03 s | 11 ms | 55 KB |
| $2^{8}$ | 0.93 s | 18 ms | 106 KB |
| $2^{7}$ | 0.83 s | 27 ms | 208 KB |
| $2^{6}$ | 0.74 s | 50 ms | 414 KB |
| $2^{5}$ | 0.70 s | 100 ms | 827 KB |
| $2^{4}$ | 0.72 s | 201 ms | 1.6 MB |
| $2^{3}$ | 0.91 s | 397 ms | 3.3 MB |

The prover performance shown in Table 5 shows our protocol peaks at around 1.43 million multiplications per second when $b=2^{5}$, then it starts to increase again afterwards. This is because the cost of computing the vector commitments $R_{t}, E_{t}$ gets increasingly expensive as the number of sub-circuits ( $m$ ) increases. Therefore, the cost of performing group exponentiation operations committing $\vec{r}, \vec{\epsilon}$ is getting more expensive relative to the shrinking cost of field operations computing $\vec{r}, \vec{\epsilon}, \vec{\tau}$ as $b$ decreases.

However, table 4 is a little bit deceiving because the smaller the $b$ may also lead to a smaller polynomial degree $p_{d}$ value of a circuit (e.g., when some output of a sub-circuit with a high degree is used multiple times to multiply other values of a circuit.).

Regardless, the benchmark numbers shown in Table 4 compare well against top-of-the line VOLE-based protocols (shown in Table 5; reported numbers are copied directly from their paper [35), given that our protocol is non-interactive and offers a significantly smaller proof size without requiring pre-setup like that of Ant-Man.

Note that there are other techniques for improving prover memory: Commit-and-prove to glue sub-circuits together (Lunar/Eclipse [14] 2]), streaming SNARKs (Gemini [10]). However, the reported constructions of these protocols require

Table 5. Performance of VOLE protocols (Arithmetic Circuit)

| Protocol | Size | Speed | Non-Interactive |
| :---: | :---: | :---: | :---: |
| Wolverine | 4 | 0.66 M | No |
| Mac'n'Cheese | 3 | 0.4 M | No |
| QuickSilver | 1 | 4.8 M | No |
| This work | $\frac{1}{b}$ | $\geq 1.43 \mathrm{M}$ | Yes |

trusted setup (non-transparent), and the prover runtime cost of these protocols is magnitudes more expensive.

### 5.3 Boolean Circuit

Our protocol is optimized for arithmetic circuits, and we believe the best way to use our protocol in the real world is to run an arithmetic circuit with embedded range-proof to perform comparison and floating point operations as explained in Section 4.3. We believe this joined approach can eliminate the need to use boolean circuits in most use cases (except for things like proof of hash, which we doubt has any real-world application).

## 6 Acknowledgments

Special thanks to Danli Xie and Jam Jia for helping out on the coding and making the code more efficient.

## References

1. Ames, S., Hazay, C., Ishai, Y., Venkitasubramaniam, M.: Ligero: Lightweight sublinear arguments without a trusted setup. In: Thuraisingham, B.M., Evans, D., Malkin, T., Xu, D. (eds.) ACM CCS 2017. pp. 2087-2104. ACM Press, Dallas, TX, USA (Oct 31 - Nov 2, 2017). https://doi.org/10.1145/3133956.3134104
2. Aranha, D.F., Bennedsen, E.M., Campanelli, M., Ganesh, C., Orlandi, C., Takahashi, A.: ECLIPSE: Enhanced compiling method for pedersen-committed zkSNARK engines. In: Hanaoka, G., Shikata, J., Watanabe, Y. (eds.) PKC 2022, Part I. LNCS, vol. 13177, pp. 584-614. Springer, Heidelberg, Germany, Virtual Event (Mar 8-11, 2022). https://doi.org/10.1007/978-3-030-97121-2 $2_{2}$
3. Arora, S., Lund, C., Motwani, R., Sudan, M., Szegedy, M.: Proof verification and hardness of approximation problems. In: 33rd FOCS. pp. 1423. IEEE Computer Society Press, Pittsburgh, PA, USA (Oct 24-27, 1992). https://doi.org/10.1109/SFCS.1992.267823
4. Arora, S., Safra, S.: Probabilistic checking of proofs; A new characterization of NP. In: 33rd FOCS. pp. 2-13. IEEE Computer Society Press, Pittsburgh, PA, USA (Oct 24-27, 1992). https://doi.org/10.1109/SFCS.1992.267824
5. Babai, L., Fortnow, L., Levin, L.A., Szegedy, M.: Checking computations in polylogarithmic time. In: 23rd ACM STOC. pp. 21-31. ACM Press, New Orleans, LA, USA (May 6-8, 1991). https://doi.org/10.1145/103418.103428
6. Babai, L., Fortnow, L., Lund, C.: Non-deterministic exponential time has two-prover interactive protocols. In: 31st FOCS. pp. 16-25. IEEE Computer Society Press, St. Louis, MO, USA (Oct 22-24, 1990). https://doi.org/10.1109/FSCS.1990.89520
7. Baum, C., Malozemoff, A.J., Rosen, M.B., Scholl, P.: Mac'n'cheese: Zeroknowledge proofs for boolean and arithmetic circuits with nested disjunctions. In: Malkin, T., Peikert, C. (eds.) CRYPTO 2021, Part IV. LNCS, vol. 12828, pp. 92-122. Springer, Heidelberg, Germany, Virtual Event (Aug 16-20, 2021). https://doi.org/10.1007/978-3-030-84259-84
8. Ben-Sasson, E., Bentov, I., Horesh, Y., Riabzev, M.: Scalable zero knowledge with no trusted setup. In: Boldyreva, A., Micciancio, D. (eds.) CRYPTO 2019, Part III. LNCS, vol. 11694, pp. 701-732. Springer, Heidelberg, Germany, Santa Barbara, CA, USA (Aug 18-22, 2019). https://doi.org/10.1007/978-3-030-26954-823
9. Bhadauria, R., Fang, Z., Hazay, C., Venkitasubramaniam, M., Xie, T., Zhang, Y.: Ligero++: A new optimized sublinear IOP. In: Ligatti, J., Ou, X., Katz, J., Vigna, G. (eds.) ACM CCS 2020. pp. 2025-2038. ACM Press, Virtual Event, USA (Nov 9-13, 2020). https://doi.org/10.1145/3372297.3417893
10. Bootle, J., Chiesa, A., Hu, Y., Orrù, M.: Gemini: Elastic SNARKs for diverse environments. In: Dunkelman, O., Dziembowski, S. (eds.) EUROCRYPT 2022, Part II. LNCS, vol. 13276, pp. 427-457. Springer, Heidelberg, Germany, Trondheim, Norway (May 30 - Jun 3, 2022). https://doi.org/10.1007/978-3-031-07085-315
11. Boyle, E., Couteau, G., Gilboa, N., Ishai, Y.: Compressing vector OLE. In: Lie, D., Mannan, M., Backes, M., Wang, X. (eds.) ACM CCS 2018. pp. 896-912. ACM Press, Toronto, ON, Canada (Oct 15-19, 2018). https://doi.org/10.1145/3243734.3243868
12. Boyle, E., Couteau, G., Gilboa, N., Ishai, Y., Kohl, L., Rindal, P., Scholl, P.: Efficient two-round OT extension and silent non-interactive secure computation. In: Cavallaro, L., Kinder, J., Wang, X., Katz, J. (eds.) ACM CCS 2019. pp. 291-308. ACM Press, London, UK (Nov 11-15, 2019). https://doi.org/10.1145/3319535.3354255
13. Boyle, E., Couteau, G., Gilboa, N., Ishai, Y., Kohl, L., Scholl, P.: Efficient pseudorandom correlation generators: Silent OT extension and more. In: Boldyreva, A., Micciancio, D. (eds.) CRYPTO 2019, Part III. LNCS, vol. 11694, pp. 489518. Springer, Heidelberg, Germany, Santa Barbara, CA, USA (Aug 18-22, 2019). https://doi.org/10.1007/978-3-030-26954-816
14. Campanelli, M., Faonio, A., Fiore, D., Querol, A., Rodríguez, H.: Lunar: A toolbox for more efficient universal and updatable zkSNARKs and commit-and-prove extensions. In: Tibouchi, M., Wang, H. (eds.) ASIACRYPT 2021, Part III. LNCS, vol. 13092, pp. 3-33. Springer, Heidelberg, Germany, Singapore (Dec 6-10, 2021). https://doi.org/10.1007/978-3-030-92078-4
15. Chen, B., Bünz, B., Boneh, D., Zhang, Z.: HyperPlonk: Plonk with linear-time prover and high-degree custom gates. In: Hazay, C., Stam, M. (eds.) EUROCRYPT 2023, Part II. LNCS, vol. 14005, pp. 499-530. Springer, Heidelberg, Germany, Lyon, France (Apr 23-27, 2023). https://doi.org/10.1007/978-3-031-30617417
16. Chiesa, A., Hu, Y., Maller, M., Mishra, P., Vesely, P., Ward, N.P.: Marlin: Preprocessing zkSNARKs with universal and updatable SRS. In: Canteaut, A., Ishai, Y. (eds.) EUROCRYPT 2020, Part I. LNCS, vol. 12105, pp. 738-768. Springer, Heidelberg, Germany, Zagreb, Croatia (May 10-14, 2020). https://doi.org/10.1007/978-3-030-45721-1 $1_{2} 6$
17. Cramer, R., Damgård, I.: Zero-knowledge proofs for finite field arithmetic; or: Can zero-knowledge be for free? In: Krawczyk, H. (ed.) CRYPTO'98. LNCS, vol. 1462, pp. 424-441. Springer, Heidelberg, Germany, Santa Barbara, CA, USA (Aug 23-27, 1998). https://doi.org/10.1007/BFb0055745
18. Dittmer, S., Ishai, Y., Lu, S., Ostrovsky, R.: Improving line-point zero knowledge: Two multiplications for the price of one. In: Yin, H., Stavrou, A., Cremers, C., Shi, E. (eds.) ACM CCS 2022. pp. 829-841. ACM Press, Los Angeles, CA, USA (Nov 7-11, 2022). https://doi.org/10.1145/3548606.3559385
19. Dittmer, S., Ishai, Y., Ostrovsky, R.: Line-point zero knowledge and its applications. Cryptology ePrint Archive, Report 2020/1446 (2020), https://eprint. iacr.org/2020/1446
20. Frederiksen, T.K., Nielsen, J.B., Orlandi, C.: Privacy-free garbled circuits with applications to efficient zero-knowledge. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015, Part II. LNCS, vol. 9057, pp. 191-219. Springer, Heidelberg, Germany, Sofia, Bulgaria (Apr 26-30, 2015). https://doi.org/10.1007/978-3-662-46803-67
21. Gabizon, A., Williamson, Z.J., Ciobotaru, O.: PLONK: Permutations over lagrange-bases for oecumenical noninteractive arguments of knowledge. Cryptology ePrint Archive, Report 2019/953 (2019), https://eprint.iacr.org/2019/953
22. Giacomelli, I., Madsen, J., Orlandi, C.: ZKBoo: Faster zero-knowledge for Boolean circuits. In: Holz, T., Savage, S. (eds.) USENIX Security 2016. pp. 1069-1083. USENIX Association, Austin, TX, USA (Aug 10-12, 2016)
23. Goldwasser, S., Micali, S., Rackoff, C.: The knowledge complexity of interactive proof-systems (extended abstract). In: 17th ACM STOC. pp. 291-304. ACM Press, Providence, RI, USA (May 6-8, 1985). https://doi.org/10.1145/22145.22178
24. Groth, J., Kohlweiss, M., Maller, M., Meiklejohn, S., Miers, I.: Updatable and universal common reference strings with applications to zk-SNARKs. In: Shacham, H., Boldyreva, A. (eds.) CRYPTO 2018, Part III. LNCS, vol. 10993, pp. 698728. Springer, Heidelberg, Germany, Santa Barbara, CA, USA (Aug 19-23, 2018). https://doi.org/10.1007/978-3-319-96878-0 $0_{2} 4$
25. Heath, D., Kolesnikov, V.: Stacked garbling for disjunctive zero-knowledge proofs. In: Canteaut, A., Ishai, Y. (eds.) EUROCRYPT 2020, Part III. LNCS, vol. 12107, pp. 569-598. Springer, Heidelberg, Germany, Zagreb, Croatia (May 10-14, 2020). https://doi.org/10.1007/978-3-030-45727-319
26. Kiayias, A., Tang, Q.: How to keep a secret: leakage deterring publickey cryptosystems. In: Sadeghi, A.R., Gligor, V.D., Yung, M. (eds.) ACM CCS 2013. pp. 943-954. ACM Press, Berlin, Germany (Nov 4-8, 2013). https://doi.org/10.1145/2508859.2516691
27. Maller, M., Bowe, S., Kohlweiss, M., Meiklejohn, S.: Sonic: Zero-knowledge SNARKs from linear-size universal and updatable structured reference strings. In: Cavallaro, L., Kinder, J., Wang, X., Katz, J. (eds.) ACM CCS 2019. pp. 2111-2128. ACM Press, London, UK (Nov 11-15, 2019). https://doi.org/10.1145/3319535.3339817
28. Schnorr, C.P.: Efficient identification and signatures for smart cards. In: Brassard, G. (ed.) CRYPTO'89. LNCS, vol. 435, pp. 239-252. Springer, Heidelberg, Germany, Santa Barbara, CA, USA (Aug 20-24, 1990). https://doi.org/10.1007/0-387-34805-0 2
29. Schoppmann, P., Gascón, A., Reichert, L., Raykova, M.: Distributed vector-OLE: Improved constructions and implementation. In: Cavallaro, L., Kinder, J., Wang, X., Katz, J. (eds.) ACM CCS 2019. pp. 1055-1072. ACM Press, London, UK (Nov 11-15, 2019). https://doi.org/10.1145/3319535.3363228
30. Setty, S.: Spartan: Efficient and general-purpose zkSNARKs without trusted setup. In: Micciancio, D., Ristenpart, T. (eds.) CRYPTO 2020, Part III. LNCS, vol. 12172, pp. 704-737. Springer, Heidelberg, Germany, Santa Barbara, CA, USA (Aug 17-21, 2020). https://doi.org/10.1007/978-3-030-56877-1 25
31. Setty, S., Lee, J.: Quarks: Quadruple-efficient transparent zkSNARKs. Cryptology ePrint Archive, Report 2020/1275 (2020), https://eprint.iacr.org/2020/1275
32. Wahby, R.S., Tzialla, I., shelat, a., Thaler, J., Walfish, M.: Doubly-efficient zkSNARKs without trusted setup. Cryptology ePrint Archive, Report 2017/1132 (2017), https://eprint.iacr.org/2017/1132
33. Weng, C., Yang, K., Katz, J., Wang, X.: Wolverine: Fast, scalable, and communication-efficient zero-knowledge proofs for boolean and arithmetic circuits. In: 2021 IEEE Symposium on Security and Privacy. pp. 1074-1091. IEEE Computer Society Press, San Francisco, CA, USA (May 24-27, 2021). https://doi.org/10.1109/SP40001.2021.00056
34. Weng, C., Yang, K., Yang, Z., Xie, X., Wang, X.: AntMan: Interactive zeroknowledge proofs with sublinear communication. In: Yin, H., Stavrou, A., Cremers, C., Shi, E. (eds.) ACM CCS 2022. pp. 2901-2914. ACM Press, Los Angeles, CA, USA (Nov 7-11, 2022). https://doi.org/10.1145/3548606.3560667
35. Yang, K., Sarkar, P., Weng, C., Wang, X.: QuickSilver: Efficient and affordable zero-knowledge proofs for circuits and polynomials over any field. In: Vigna, G., Shi, E. (eds.) ACM CCS 2021. pp. 2986-3001. ACM Press, Virtual Event, Republic of Korea (Nov 15-19, 2021). https://doi.org/10.1145/3460120.3484556
36. Yang, K., Weng, C., Lan, X., Zhang, J., Wang, X.: Ferret: Fast extension for correlated OT with small communication. In: Ligatti, J., Ou, X., Katz, J., Vigna, G. (eds.) ACM CCS 2020. pp. 1607-1626. ACM Press, Virtual Event, USA (Nov 913, 2020). https://doi.org/10.1145/3372297.3417276
37. Zhang, J., Xie, T., Zhang, Y., Song, D.: Transparent polynomial delegation and its applications to zero knowledge proof. In: 2020 IEEE Symposium on Security and Privacy. pp. 859-876. IEEE Computer Society Press, San Francisco, CA, USA (May 18-21, 2020). https://doi.org/10.1109/SP40000.2020.00052
