$\widetilde{O}$ptimal Adaptively Secure Hash-based Asynchronous Common Subset
Asynchronous multiparty computation (AMPC) requires an input agreement phase where all participants have a consistent view of the set of private inputs. While the input agreement problem can be precisely addressed by a Byzantine fault-tolerant consensus known as Asynchronous Common Subset (ACS), existing ACS constructions with potential post-quantum security have a large $\widetilde{\mathcal{O}}(n^3)$ communication complexity for a network of $n$ nodes. This poses a bottleneck for AMPC in the same setting. In contrast, ACS has optimal constructions with quadratic communication complexity based on bilinear map assumptions.
In this paper, we bridge this gap by introducing a nearly optimal ACS, which relies solely on the black-box use of collision-resistant hash functions. It exhibits $\widetilde{\mathcal{O}}(n^2)$ communication complexity, expected constant round complexity, and security against adaptive adversaries who can corrupt up to $n/3$ nodes and perform ``after-the-fact-removal'' attacks.
At the core of our new ACS is the first nearly optimal hash-based Multi-valued Validated Byzantine Agreement (MVBA).
To reduce cubic communication while avoiding heavy cryptographic tools, we introduce a new design paradigm, with several new components. We define and analyze our MVBA and components within the UC-framework, facilitating their modular use in broader applications, particularly in AMPC.
Do Not Disturb a Sleeping Falcon: Floating-Point Error Sensitivity of the Falcon Sampler and Its Consequences
Falcon is one of the three post-quantum signature schemes already selected by NIST for standardization. It is the most compact among them, and offers excellent efficiency and security. However, it is based on a complex algorithm for lattice discrete Gaussian sampling which presents a number of implementation challenges. In particular, it relies on (possibly emulated) floating-point arithmetic, which is often regarded as a cause for concern, and has been leveraged in, e.g., side-channel analysis. The extent to which Falcon's use of floating-point arithmetic can cause security issues has yet to be thoroughly explored in the literature.
In this paper, we contribute to filling this gap by identifying a way in which Falcon's lattice discrete Gaussian sampler, due to specific design choices, is singularly sensitive to floating-point errors. In the presence of small floating-point discrepancies (which can occur in various ways, including the use of the two almost but not quite equivalent signing procedures ``dynamic'' and ``tree'' exposed by the Falcon API), we find that, when called twice on the same input, the Falcon sampler has a small but significant chance (on the order of once in a few thousand calls) of outputting two different lattice points with a very structured difference that immediately reveals the secret key. This is in contrast to other lattice Gaussian sampling algorithms like Peikert's sampler and Prest's hybrid sampler, which are stable with respect to small floating-point errors.
Correctly generated Falcon signatures include a salt that should in principle prevent the sampler from ever being called on the same input twice. In that sense, our observation has little impact on the security of Falcon signatures per se (beyond echoing warnings about the dangers of repeated randomness). On the other hand, it is critical for derandomized variants of Falcon, which have been proposed for use in numerous settings, including identity-based encryption, SNARK-friendly signatures, and sublinear signature aggregation. For all these settings, small floating-point discrepancies have a chance of resulting in full private key exposure, even when using the slower, integer-based emulated floating-point arithmetic of Falcon's reference implementation.
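As a toy illustration (not Falcon's actual sampler), the sketch below models a derandomized sampler as a Bernoulli draw whose bias is computed by two slightly different code paths; the discrepancy eps is exaggerated for visibility, and all names are hypothetical. With the coins fixed, the two paths occasionally disagree on identical inputs, mirroring the ``dynamic'' vs. ``tree'' divergence.

```python
import random

# Toy model: a Bernoulli "sampler" whose bias is computed by two mathematically
# equivalent code paths that differ by a tiny floating-point discrepancy eps.
# With derandomized coins (as in derandomized Falcon variants), the two paths
# occasionally disagree on the output.
random.seed(0)
eps = 1e-4            # exaggerated stand-in for a floating-point discrepancy
trials, diffs = 100_000, 0
for _ in range(trials):
    p = random.random()       # the probability both paths are meant to compute
    coin = random.random()    # fixed coin: both paths see the same randomness
    out_a = coin < p          # code path A
    out_b = coin < p + eps    # code path B, off by eps
    if out_a != out_b:
        diffs += 1
print(f"{diffs}/{trials} calls disagree")   # roughly eps * trials, i.e. ~10
```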
Subliminal Encrypted Multi-Maps and Black-Box Leakage Absorption
We propose a dynamic, low-latency encrypted multi-map (EMM) that operates in two modes: low-leakage mode, which reveals minimal information such as data size, expected response length, and query arrival rate; and subliminal mode, which reveals only the data size while hiding metadata including query and update times, the number of operations executed, and even whether an operation was executed at all---albeit at the cost of full correctness. We achieve this by exploiting a tradeoff between leakage and latency, a previously underexplored resource in EMM design. In low-leakage mode, our construction improves upon existing work both asymptotically and empirically: it achieves optimal server-side storage, as well as communication and computational complexity that is independent of the maximum response length. In subliminal mode, it is the first construction to hide metadata.
To analyze the latency and client-side storage of our construction, we utilize queuing theory and introduce a new queuing model, which may be of independent interest. To examine its metadata-hiding properties, we extend standard security definitions to account for metadata and prove a surprising result: if a scheme is subliminal in that it hides the execution of its operations, then it absorbs the leakage of any scheme that makes black-box use of it without sending additional messages. In other words, if a scheme is subliminal, then any scheme that makes black-box use of it will also be subliminal.
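As a toy illustration of the subliminal idea (a sketch under assumed parameters, not the paper's construction): the client emits one operation at a fixed rate, real if a query is pending and dummy otherwise, so the server's view is independent of the actual workload; the price is queuing delay.

```python
import random
from collections import deque

# Toy simulation (parameters hypothetical): real queries arrive as a Poisson
# process with rate lam, but the client sends one operation every 1/mu seconds
# regardless of the workload, so the server learns nothing about query times.
random.seed(42)
lam, mu, horizon = 8.0, 10.0, 10_000.0   # arrival rate, send rate, seconds

t, arrivals = 0.0, deque()
while t < horizon:
    t += random.expovariate(lam)
    arrivals.append(t)

latencies, step, t = [], 1.0 / mu, 1.0 / mu
while t < horizon:
    if arrivals and arrivals[0] <= t:    # serve the oldest pending real query
        latencies.append(t - arrivals.popleft())
    # otherwise a dummy operation is sent; the server cannot tell the difference
    t += step

print(f"mean latency {sum(latencies)/len(latencies)*1e3:.1f} ms "
      f"at utilisation {lam/mu}")
```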
We implement and evaluate our construction, demonstrating that our empirical results align with our theoretical analysis and that the scheme achieves a median query latency below $10$ milliseconds, making it practical for some applications.
CountCrypt: Quantum Cryptography between QCMA and PP
We construct a quantum oracle relative to which $\mathbf{BQP}=\mathbf{QCMA}$ but quantum-computation-classical-communication (QCCC) key exchange, QCCC commitments, and two-round quantum key distribution exist. We also construct an oracle relative to which $\mathbf{BQP}=\mathbf{QMA}$, but quantum lightning (a stronger variant of quantum money) exists. This extends previous work by Kretschmer [Kretschmer, TQC22], which showed that there is a quantum oracle relative to which $\mathbf{BQP}=\mathbf{QMA}$ but pseudorandom state generators (a quantum variant of pseudorandom generators) exist.
We also show that QCCC key exchange, QCCC commitments, and two-round quantum key distribution can all be used to build one-way puzzles. One-way puzzles are a version of "quantum samplable" one-wayness and are an intermediate primitive between pseudorandom state generators and EFI pairs, the minimal quantum primitive. In particular, one-way puzzles cannot exist if $\mathbf{BQP}=\mathbf{PP}$.
Our results together imply that aside from pseudorandom state generators, there is a large class of quantum cryptographic primitives which can exist even if $\mathbf{BQP} = \mathbf{QCMA}$, but are broken if $\mathbf{BQP} = \mathbf{PP}$. Furthermore, one-way puzzles are a minimal primitive for this class. We denote this class "CountCrypt".
State of the art of HFE variants: Is it possible to repair HFE with appropriate perturbations?
HFE (which stands for Hidden Field Equations) belongs to multivariate cryptography and was designed by Jacques Patarin in 1996 as a public-key trapdoor suitable for encryption or signature. This original basic version is unfortunately known to have a super-polynomial attack, but, as imagined since the beginning, it comes with various variants, which one can describe as combinations of “modifiers”.

In this work, we first present the state of the art of these HFE modifiers, along with their effect on the complexity of the main cryptanalysis techniques against HFE-based schemes. This then allows us to identify a combination of two modifiers that has not yet been explored and may still be secure with efficient parameters. Based on our analysis, we propose a new signature scheme that offers extremely short signature sizes, with reasonable public key sizes and performance. In particular, we rely on the classical Feistel-Patarin technique to reduce signature sizes below two times the security parameter.
Dumbo-MPC: Efficient Fully Asynchronous MPC with Optimal Resilience
Fully asynchronous multi-party computation (AMPC) has superior robustness in realizing privacy and guaranteed output delivery (G.O.D.) against asynchronous adversaries that can arbitrarily delay communications. However, none of the existing protocols are truly practical, as they either have sub-optimal resilience, incur cumbersome communication cost, or suffer from an online phase with extra cryptographic overhead. The only attempted implementation---HoneyBadgerMPC (hbMPC)---merely ensures G.O.D. in some implausible optimistic cases due to a non-robust offline pre-processing phase.
We propose Dumbo-MPC, a concretely efficient AMPC-as-a-service design with G.O.D. in all phases and optimal resilience against $t<n/3$ malicious parties (where $n$ is the total number of parties). Like hbMPC, Dumbo-MPC has a robust (almost) information-theoretic online phase that can efficiently perform online computations, given pre-processed multiplication triples. To achieve G.O.D. in all phases, we design a novel dual-mode offline protocol that can robustly pre-process multiplication triples in asynchrony. The offline phase features $O(n)$ per-triple communication in the optimistic case, followed by a fully asynchronous fallback to a pessimistic path to securely restore G.O.D. in the bad case. To efficiently implement the pessimistic path, we devise a concretely efficient zk-proof for the product relationship of secret shares over compact KZG polynomial commitments, which enables us to reduce the degree of two secret shares' product from $2t$ to $t$ and could be of independent interest.
We also implement and extensively evaluate Dumbo-MPC (particularly its offline phase) in varying network settings with up to 31 AWS servers. To our knowledge, we provide the first implementation of AMPC with all-phase G.O.D. A recent asynchronous triple generation protocol from Groth and Shoup (GS23) is also implemented and experimentally compared. When $n = 31$, Dumbo-MPC generates 94 triples/sec (almost twice that of GS23) in the pessimistic case and 349 triples/sec (6x that of GS23) in the good case, so that 31 parties require only 2-8 min to prepare a private Vickrey auction with 100 bidders, or 10-36 min for a mixing network of $2^{10}$ inputs.
From One-Time to Two-Round Reusable Multi-Signatures without Nested Forking
Multi-signature schemes are gaining significant interest due to their blockchain applications. Of particular interest are two-round schemes in the plain public-key model that offer key aggregation, and whose security is based on the hardness of the DLOG problem. Unfortunately, despite substantial recent progress, the security proofs of the proposed schemes provide rather insufficient concrete guarantees (especially for 256-bit groups). This frustrating situation has so far been approached either by relying on the security of seemingly stronger assumptions or by considering restricted classes of attackers (e.g., algebraic attackers, which are assumed to provide an algebraic justification of each group element that they produce).
We present a complementing approach by constructing multi-signature schemes that satisfy two relaxed notions of security, whose applicability nevertheless ranges from serving as drop-in replacements to enabling expressive smart contract validation procedures. Our first notion, one-time unforgeability, extends the analogous single-signer notion by considering attackers that obtain a single signature for some message and set of signers of their choice. We construct a non-interactive one-time scheme based on any ring-homomorphic one-way function, admitting efficient instantiations based on the DLOG and RSA assumptions. Aggregated verification keys and signatures consist of two group elements and a single group element, respectively, and our security proof consists of a single application of the forking lemma (thus avoiding the substantial security loss exhibited by the proposed two-round schemes). Additionally, we demonstrate that our scheme naturally extends to a $t$-time scheme, where aggregated verification keys consist of $t+1$ group elements, while aggregated signatures still consist of a single group element.
Our second notion, single-set unforgeability, considers attackers that obtain any polynomial number of signatures but are restricted to a single set of signers of their choice. We transform any non-interactive one-time scheme into a two-round single-set scheme via a novel forking-free construction that extends the seminal Naor-Yung tree-based approach to the multi-signer setting. Aggregated verification keys are essentially identical to those of the underlying one-time scheme, and the length of aggregated signatures is determined by that of the underlying scheme while scaling linearly with the length of messages (noting that long messages can always be hashed using a collision-resistant function). Instantiated with our one-time scheme, we obtain aggregated verification keys and signatures whose lengths are completely independent of the number of signers.
Free-XOR Gate Bootstrapping
The FHEW-like gate bootstrapping framework operates in a 2-bit plaintext space, where logic gates such as NAND, XOR, and AND are implemented by adding two ciphertexts and extracting the most significant bit. However, each gate operation requires bootstrapping, with a primary cost of one blind rotation, which is expensive when processing circuit operations for applications. We propose a novel Free-XOR gate bootstrapping framework based on a single-bit plaintext space, in which the XOR operation is realized by simply adding two ciphertexts, resulting in an almost free computational cost. To form a minimal complete set of logical operations, we design an algorithm for the AND gate within this framework. The AND gate in our Free-XOR gate bootstrapping costs two blind rotations. However, by utilizing a single-bit plaintext space to enhance noise tolerance and swapping some operations of the bootstrapping process, we can adopt a more compact parameter setting, which in turn accelerates the blind rotation. We propose an instantiation of the NTRU-based AND gate operation, which requires two blind rotations. Despite the additional rotation, the overall computational cost is marginally lower than that of the state-of-the-art gate bootstrapping scheme LLW+ [TCHES24], which uses only a single blind rotation. In addition, our approach achieves a significant reduction in key size, down to 3.3 times the size of LLW+ [TCHES24].
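A minimal sketch of the free XOR itself, using a toy LWE scheme with a single-bit plaintext encoded as $m \cdot q/2$ (insecure demo parameters, not the paper's NTRU instantiation): adding two ciphertexts adds the encodings modulo $q$, which is exactly XOR of the plaintexts.

```python
import random

q, n = 2**15, 32          # toy parameters, far too small for security
random.seed(0)
s = [random.randrange(q) for _ in range(n)]

def enc(m):               # m in {0,1}, encoded in the top bit as m * q/2
    a = [random.randrange(q) for _ in range(n)]
    e = random.randint(-4, 4)                 # small noise
    b = (sum(ai*si for ai, si in zip(a, s)) + e + m*(q//2)) % q
    return a, b

def add(c1, c2):          # ciphertext addition = homomorphic XOR of plaintexts
    a = [(x + y) % q for x, y in zip(c1[0], c2[0])]
    return a, (c1[1] + c2[1]) % q

def dec(c):
    a, b = c
    v = (b - sum(ai*si for ai, si in zip(a, s))) % q
    return int(q//4 <= v < 3*q//4)            # nearest multiple of q/2

print(dec(add(enc(1), enc(1))))   # 1 XOR 1 = 0, no bootstrapping needed
```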
Secure and efficient transciphering for FHE-based MPC
Transciphering (or Hybrid-Homomorphic Encryption, HHE) is an established technique for avoiding ciphertext expansion in HE applications, saving communication and storage resources. Recently, it has also been shown to be a fundamental component in the practical construction of HE-based multi-party computation (MPC) protocols, being used both for input data and intermediary results (Smart, IMACC 2023). In these protocols, however, ciphers are used with keys that are jointly generated by multiple (possibly malicious) parties, which may require additional security assumptions that have so far been overlooked in the HHE literature. In this paper, we formalize this issue as a problem of security against related-key attacks (RKA) and provide efficient solutions for it. We start by presenting an efficient method for homomorphically evaluating Mixed-Filter-Permutator (MFP) ciphers in leveled mode, enabling speedups of up to thousands of times compared to previous literature. For the multi-party scenario, we focus specifically on the Margrethe cipher (Hoffmann et al., INDOCRYPT 2023). We show that, contrary to other commonly used HHE ciphers (e.g. FLIP), Margrethe is out-of-the-box secure for any protocols that allow malicious parties to learn up to two related key streams, enabling security for the vast majority of static MPC protocols. For other cases, we quantify the loss of security based on the number of related key streams (which often depends on the number of malicious parties and the specific protocol). Performance-wise, our implementation of Margrethe takes just 3.9 ms to transcipher 4-bit messages, being significantly faster than the state of the art in terms of latency.
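To show the transciphering data flow itself (a mock, not the paper's Margrethe evaluation): the client uploads a cheap symmetric ciphertext plus an FHE encryption of the symmetric key, and the server homomorphically strips the keystream. Here a wrapper class stands in for real FHE ciphertexts, and a hash-based keystream stands in for an MFP cipher.

```python
import hashlib

# MockFHE just wraps its plaintext so the homomorphic data flow can be shown;
# "homomorphic evaluation" of the toy cipher is simulated on the wrapped values.
class MockFHE:
    def __init__(self, v): self.v = v
    def __xor__(self, other):
        o = other.v if isinstance(other, MockFHE) else other
        return MockFHE(self.v ^ o)

def keystream(key: bytes, n: int) -> bytes:
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

# client: cheap symmetric encryption, plus an FHE encryption of the key
key, msg = bytes(16), b"secret message"
sym_ct = bytes(m ^ k for m, k in zip(msg, keystream(key, len(msg))))
fhe_key = [MockFHE(b) for b in key]

# server: re-derives the keystream *under encryption* (simulated here by
# unwrapping) and strips it, obtaining FHE(msg) without seeing msg or key
ks = keystream(bytes(c.v for c in fhe_key), len(msg))
fhe_msg = [MockFHE(c) ^ k for c, k in zip(sym_ct, ks)]
print(bytes(c.v for c in fhe_msg))   # b'secret message' (client-side view)
```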
Secure Computation with Parallel Calls to 2-ary Functions
Reductions are the workhorses of cryptography. They allow constructions of complex cryptographic primitives from simple building blocks. A prominent example is the non-interactive reduction from securely computing a ``complex'' function $f$ to securely computing a ``simple'' function $g$ via randomized encodings.
Prior work equated simplicity with functions of small degree. In this work, we consider a different notion of simplicity where we require $g$ to only take inputs from a small number of parties. In other words, we want the arity of $g$ to be as small as possible.
In more detail, we consider the problem of reducing secure computation of arbitrary functions to secure computation of functions with arity two (two is the minimal arity required to compute non-trivial functions). Specifically, we want to compute a function $f$ via a protocol that makes parallel calls to 2-ary functions. We want this protocol to be secure against malicious adversaries that could corrupt an arbitrary number of parties. We obtain the following results:
- Negative Result: We show that there exists a degree-2 polynomial $p$ such that no protocol that makes parallel calls to 2-ary functions can compute $p$ with statistical security with abort.
- Positive Results: We give two ways to bypass the above impossibility result.
1. Weakening the Security Notion. We show that every degree-2 polynomial can be computed with statistical privacy with knowledge of outputs (PwKO) by making parallel calls to 2-ary functions. Privacy with knowledge of outputs is weaker than security with abort.
2. Computational Security. We prove that for every function $f$, there exists a protocol for computing $f$ that makes parallel calls to 2-ary functions and achieves security with abort against computationally-bounded adversaries. The security of this protocol relies on the existence of semi-honest secure oblivious transfer.
- Applications: We give connections between this problem and the task of reducing the encoding complexity of Multiparty Randomized Encodings (MPRE) (Applebaum, Brakerski, and Tsabary, TCC 2018). Specifically, we show that under standard computational assumptions, there exists an MPRE where the encoder can be implemented by an $\mathrm{NC}^0$ circuit with constant fan-out.
- Extensions: We explore this problem in the honest majority setting and give similar results assuming one-way functions. We also show that if the parties have access to 3-ary functions then we can construct a computationally secure protocol in the dishonest majority setting assuming one-way functions.
Does quantum lattice sieving require quantum RAM?
In this paper, we study the requirement for quantum random access memory (QRAM) in quantum lattice sieving, a fundamental algorithm for lattice-based cryptanalysis.
First, we obtain a lower bound on the cost of quantum lattice sieving with a bounded size QRAM. We do so in a new query model encompassing a wide range of lattice sieving algorithms similar to those in the classical sieving lower bound by Kirshanova and Laarhoven [CRYPTO 21]. This implies that, under reasonable assumptions, quantum speedups in lattice sieving require the use of QRAM. In particular, no quantum speedup is possible without QRAM.
Second, we investigate the trade-off between the size of QRAM and the quantum speedup. We obtain a new interpolation between classical and quantum lattice sieving. Moreover, we show that further improvements require a novel way to use the QRAM by proving the optimality of some subroutines. An important caveat is that this trade-off requires a strong assumption on the efficient replacement of QRAM data, indicating that even speedups with a small QRAM are already challenging.
Finally, we provide a circuit for quantum lattice sieving without using QRAM. Our circuit has a better depth complexity than the best classical algorithms but requires an exponential amount of qubits. To the best of our knowledge, this is the first quantum speedup for lattice sieving without QRAM in the standard quantum circuit model. We explain why this circuit does not contradict our lower bound, which considers the query complexity.
HADES: Range-Filtered Private Aggregation on Public Data
In aggregation queries, predicate parameters often reveal user intent. Protecting these parameters is critical for user privacy, regardless of whether the database is public or private. While most existing works focus on private data settings, we address a public data setting where the server has access to the database. Current solutions for this setting either require additional setups (e.g., noncolluding servers, hardware enclaves) or are inefficient for practical workloads. Furthermore, they often do not support range predicates or boolean combinations commonly seen in real-world use cases.
To address these limitations, we build HADES, a fully homomorphic encryption (FHE) based private aggregation system for public data that supports point and range predicates as well as boolean combinations. Our one-round HADES protocol efficiently generates predicate indicators by leveraging the plaintext form of public data records. It introduces a novel elementwise-mapping operation and an optimized reduction algorithm, achieving latency efficiency within a limited noise budget. Our highly scalable, multi-threaded implementation improves performance over previous one-round FHE solutions by 204x to 6574x on end-to-end TPC-H queries, reducing aggregation time on 1M records from 15 hours to 38 seconds.
Computational Analysis of Plausibly Post-Quantum-Secure Recursive Arguments of Knowledge
With the recent standardization of post-quantum cryptographic algorithms, research efforts have largely remained centered on public key exchange and encryption schemes. Argument systems, which allow a party to efficiently argue the correctness of a computation, have received comparatively little attention regarding their quantum-resilient design. These computational integrity frameworks often rely on cryptographic assumptions, such as pairings or group operations, which are vulnerable to quantum attacks. In this work, we present a fully implemented post-quantum secure argument system that compresses unbounded computation into a constant-sized space: a prover which can argue the truth of a computation of any size, and a verifier which can verify correctness in constant time. This work extends the utility of computational integrity statements into the quantum domain. We provide real-world performance metrics demonstrating that post-quantum secure argument systems not only exist but can outperform classical systems in both efficiency and scalability, making such systems an attractive choice for practical deployment.
On pairing-friendly 2-cycles and SNARK-friendly 2-chains of elliptic curves containing a curve from a prime-order family
Cryptographic protocols such as zkSNARKs use 2-cycles of elliptic curves for efficiency, often relying on pairing computations. However, 2-cycles of pairing-friendly curves are hard to find, and the only known cases consist of an MNT4 and an MNT6 curve. In this work, we prove that a 2-cycle containing an MNT3 curve cannot be pairing-friendly. For other curve families, we have a similar result for cryptographically attractive field sizes. Thus we cannot hope to find new pairing-friendly 2-cycles using the current methods.
Furthermore, we show that there are no SNARK-friendly 2-chains of elliptic curves from combinations of MNT, Freeman and BN curves of reasonable size, except for the (MNT4, MNT6) chains.
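For context, the known (MNT4, MNT6) cycles can be checked symbolically from the standard MNT parametrisations (one common trace choice shown); this sketch is illustrative and not from the paper.

```python
from sympy import symbols, expand

# Standard MNT parametrisations (Miyaji-Nakabayashi-Takano), one trace choice:
#   MNT4:  q4 = x^2 + x + 1,  t4 = x + 1   =>  r4 = q4 + 1 - t4 = x^2 + 1
#   MNT6:  q6 = 4y^2 + 1,     t6 = 2y + 1  =>  r6 = 4y^2 - 2y + 1
x, y = symbols("x y")
q4, r4 = x**2 + x + 1, x**2 + 1
q6, r6 = 4*y**2 + 1, 4*y**2 - 2*y + 1

# a 2-cycle needs the base field of each curve to equal the order of the other;
# both conditions hold simultaneously under the substitution x = -2y
print(expand(q4.subs(x, -2*y) - r6),   # 0
      expand(r4.subs(x, -2*y) - q6))   # 0
```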
Revisiting the Robustness of (R/M)LWR under Polynomial Moduli and its Applications
This work conducts a comprehensive investigation into the entropic hardness of (R/M)LWR under a polynomial modulus. In particular, we establish the hardness of (M)LWR for general entropic secret distributions from (Module) LWE assumptions, based on a new, conceptually simple framework called rounding lossiness. By combining this hardness result with a trapdoor inversion algorithm having asymptotically the most compact parameters, we obtain a compact lossy trapdoor function (LTF) with improved efficiency. Extending our LTF with other techniques, we can derive a compact all-but-many LTF and a PKE scheme secure against selective opening and chosen ciphertext attacks, solely based on (Module) LWE assumptions with a polynomial modulus. Additionally, we show a search-to-decision reduction for RLWR with Gaussian secrets, using a new R\'enyi Divergence-based analysis.
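For reference, an LWR sample replaces LWE's sampled error with deterministic rounding; a toy sketch with insecure demo parameters:

```python
import random

# Toy (insecure) LWR sample: b = round((p/q) * <a, s>) mod p, so the "error"
# is the deterministic rounding offset rather than a freshly sampled Gaussian
random.seed(0)
q, p, n = 2**12, 2**8, 16
s = [random.randrange(q) for _ in range(n)]

def lwr_sample():
    a = [random.randrange(q) for _ in range(n)]
    inner = sum(ai * si for ai, si in zip(a, s)) % q
    return a, round(p * inner / q) % p

a, b = lwr_sample()
inner = sum(ai * si for ai, si in zip(a, s)) % q
print(b, p * inner / q)   # b is the unrounded value snapped to an integer
```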
Discrete Gaussians Modulo Sub-Lattices: New Leftover Hash Lemmas for Discrete Gaussians
The Leftover Hash Lemma (LHL) is a powerful tool for extracting randomness from an entropic distribution, with numerous applications in cryptography. LHLs for discrete Gaussians have been explored in both integer settings by Gentry et al. (GPV, STOC'08) and algebraic ring settings by Lyubashevsky et al. (LPR, Eurocrypt'13). However, the existing LHLs for discrete Gaussians have two main limitations: they require the Gaussian parameter to be larger than certain smoothing parameters, and they cannot handle cases where fixed and arbitrary information is leaked.
In this work, we present new LHLs for discrete Gaussians in both integer and ring settings. Our results show that the Gaussian parameter can be improved by a factor of $\omega(\sqrt{\log\lambda}) $ and $ O(\sqrt{N}) $ compared to the regularity lemmas of GPV and LPR, respectively, under similar parameter choices such as the dimension and ring. Furthermore, our new LHLs can be applied to leaked discrete Gaussians, and the result can be used to establish asymptotic hardness of the extended MLWE assumptions, addressing an open question in recent works by Lyubashevsky et al. (LNP, Crypto'22). Our central techniques involve new fine-grained analyses of the min-entropy in discrete Gaussians modulo sublattices and should be of interest.
Full Key-Recovery Cubic-Time Template Attack on Classic McEliece Decapsulation
Classic McEliece is one of the three code-based candidates in the fourth round of the NIST post-quantum cryptography standardization process, in the Key Encapsulation Mechanism category. As such, its decapsulation algorithm is used to recover the session key associated with a ciphertext using the private key. In this article, we propose a new side-channel attack on the syndrome computation in the decapsulation algorithm that recovers the private key, which consists of the private Goppa polynomial $g$ and the permuted support $\mathcal{L}$. The attack relies on both practical and theoretical contributions: the side-channel distinguisher can accurately discriminate elements of the permuted support $\mathcal{L}$ while relying only on a standard noisy Hamming weight leakage assumption, and there exists a cubic-time algorithm that uses this information to recover the private Goppa polynomial $g$. Compared with previous work targeting the Classic McEliece private key, this drastically improves both on the assumptions made in the attacker model and on the overall efficiency of the key-recovery algorithm. We have carried out the attack in practice on a microcontroller target running the reference implementation of Classic McEliece, and make the full attack source code available.
A notion on S-boxes for a partial resistance to some integral attacks
In two recent papers, we introduced and studied the notion of $k$th-order sum-freedom of a vectorial function $F:\mathbb F_2^n\to \mathbb F_2^m$. This notion generalizes that of almost perfect nonlinearity (which corresponds to $k=2$) and has some relation with the resistance to integral attacks of those block ciphers using $F$ as a substitution box (S-box), by preventing the propagation of the division property of $k$-dimensional affine spaces. In the present paper, we show that this notion, which is rarely satisfied by vectorial functions, can be weakened while retaining the property that the S-boxes do not propagate the division property of $k$-dimensional affine spaces. This leads us to the property that we name $k$th-order $t$-degree-sum-freedom, whose strength decreases when $t$ increases, and which coincides with $k$th-order sum-freedom when $t=1$. The condition for $k$th-order $t$-degree-sum-freedom is that, for every $k$-dimensional affine space $A$, there exists a non-negative integer $j$ of 2-weight at most $t$ such that $\sum_{x\in A}(F(x))^j\neq 0$.

We show, for a general $k$th-order $t$-degree-sum-free function $F$, that $t$ can always be taken smaller than or equal to $\min(k,m)$ under some reasonable condition on $F$, and that it is larger than or equal to $\frac k{\deg(F)}$, where $\deg(F)$ is the algebraic degree of $F$. We also show two other lower bounds: one, often tighter, by means of the algebraic degree of the compositional inverse of $F$ when $F$ is a permutation, and another (valid for every vectorial function) by means of the algebraic degree of the indicator of the graph of the function.

We study examples for $k=2$ (the case in which $t=1$ corresponds to APNness), showing that finding $j$ of 2-weight 2 can be challenging, and we begin the study of power functions, for which we prove upper bounds. We study in particular the multiplicative inverse function (used as an S-box in the AES), for which we characterize the $k$th-order $t$-degree-sum-freedom by the coefficients of the subspace polynomials of $k$-dimensional vector subspaces (deducing the exact value of $t$ when $k$ divides $n$), and we extend to $k$th-order $t$-degree-sum-freedom the result that the multiplicative inverse function is $k$th-order sum-free if and only if it is $(n-k)$th-order sum-free.
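The definition is easy to test exhaustively in small dimension. The sketch below (illustrative, not the paper's code) computes, for $k=2$ and the inverse S-box over $\mathbb F_2^4$, the smallest $t$ such that every 2-dimensional affine space admits an exponent $j$ of 2-weight at most $t$ with nonzero power sum.

```python
from itertools import combinations

MOD = 0b10011                        # GF(2^4) modulus x^4 + x + 1

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= MOD
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

F = [gf_pow(v, 14) for v in range(16)]   # inverse S-box x -> x^(2^4 - 2)

# all 2-dimensional affine spaces {a, a+d1, a+d2, a+d1+d2}, deduplicated
spaces = {frozenset({a, a ^ d1, a ^ d2, a ^ d1 ^ d2})
          for d1, d2 in combinations(range(1, 16), 2) for a in range(16)}

def weight_needed(A):
    # least 2-weight of an exponent j with sum_{x in A} F(x)^j != 0
    # (field addition in characteristic 2 is XOR)
    sums = {j: 0 for j in range(1, 16)}
    for v in A:
        for j in sums:
            sums[j] ^= gf_pow(F[v], j)
    return min((bin(j).count("1") for j, s in sums.items() if s), default=99)

t = max(weight_needed(A) for A in spaces)
print(f"2nd-order t-degree-sum-free with t = {t}   (t = 1 would mean APN)")
```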
On the practicality of quantum sieving algorithms for the shortest vector problem
One of the main candidates of post-quantum cryptography is lattice-based cryptography. Its cryptographic security against quantum attackers is based on the worst-case hardness of lattice problems like the shortest vector problem (SVP), which asks to find the shortest non-zero vector in an integer lattice. Asymptotic quantum speedups for solving SVP are known and rely on Grover's search. However, to assess the security of lattice-based cryptography against these Grover-like quantum speedups, it is necessary to carry out a precise resource estimation beyond asymptotic scalings. In this work, we perform a careful analysis of the resources required to implement several sieving algorithms aided by Grover's search for dimensions of cryptographic interest. To this end, we take into account fixed-point quantum arithmetic operations, non-asymptotic Grover's search, the cost of using quantum random access memory (QRAM), different physical architectures, and quantum error correction. We find that even under very optimistic assumptions like circuit-level noise of $10^{-5}$, code cycles of 100 ns, reaction time of 1 $\mu$s, and using state-of-the-art arithmetic circuits and quantum error-correction protocols, the best sieving algorithms require $\approx 10^{13}$ physical qubits and $\approx 10^{31}$ years to solve SVP on a lattice of dimension 400, which is roughly the dimension for minimally secure post-quantum cryptographic standards currently being proposed by NIST. We estimate that a 6-GHz-clock-rate single-core classical computer would take roughly the same amount of time to solve the same problem. We conclude that there is currently little to no quantum speedup in the dimensions of cryptographic interest, and that realising a considerable quantum speedup using quantum sieving algorithms would require significant breakthroughs in theoretical protocols and hardware development.
A Framework for Group Action-Based Multi-Signatures and Applications to LESS, MEDS, and ALTEQ
A multi-signature scheme allows a list of signers to sign a common message. They are widely used in scenarios where the same message must be signed and transmitted by $N$ users, and, instead of concatenating $N$ individual signatures, employing a multi-signature can reduce the data to be sent.
In recent years there have been numerous practical proposals in the discrete logarithm setting, such as MuSig2 (CRYPTO'21) for the Schnorr signature. Recently, these attempts have been extended to post-quantum assumptions, with lattice-based proposals such as MuSig-L (CRYPTO'22).
Given the growth of group action-based signatures, a natural question is whether a multi-signature can be built on the same models. In this work, we present the first construction of such a primitive relying on group action assumptions. We obtain a 3-round scheme achieving concurrent security in the ROM.
Moreover, we instantiate it using the three group action-based candidates in NIST's additional call for post-quantum signatures, namely LESS, MEDS, and ALTEQ, obtaining a good compression rate for different parameter sets.
A Note on Security Definitions for Secret Sharing with Certified Deletion
Bartusek and Raizes (CRYPTO 2024) proposed two security definitions for secret sharing, no-signaling certified deletion and adaptive certified deletion. We prove that adaptive certified deletion does not imply no-signaling certified deletion.
Homomorphic Encryption with Authority
Fully homomorphic encryption enables computations over encrypted data, which allows privacy-preserving services to be held between a server and a client. However, real-world applications demand practical considerations, especially concerning public safety and legal investigations. Existing FHE schemes focus solely on privacy, neglecting the societal risks posed by criminal activities utilizing privacy-preserving services. This paper introduces Homomorphic Encryption with Authority (HEwA), a novel framework that balances data privacy with public safety by incorporating an "authority" party. The proposed HEwA system operates in two phases: a normal phase, where client data privacy is protected, and an investigative phase, where the authority, a legally authorized entity such as a government agency, exerts the right to recover a suspicious client's data. We formalize the security model for HEwA, ensuring that client privacy is protected during the normal phase while enabling authorities to recover encrypted data in the investigative phase. As a concrete example, we design an efficient HEwA system solely based on the CKKS homomorphic encryption scheme, which supports approximate computations over real-number data, making it highly suitable for AI applications such as secure genomic analysis. We further provide rigorous security proofs. This new approach addresses the tension between privacy and public safety in cloud services, paving the way for responsible use of homomorphic encryption in practice.
Revisiting Products of the Form $X$ Times a Linearized Polynomial $L(X)$
For a $q$-polynomial $L$ over a finite field $\mathbb{F}_{q^n}$, we characterize the differential spectrum of the function $f_L\colon \mathbb{F}_{q^n} \rightarrow \mathbb{F}_{q^n}, x \mapsto x \cdot L(x)$ and show that, for $n \leq 5$, it is completely determined by the image of the rational function $r_L \colon \mathbb{F}_{q^n}^* \rightarrow \mathbb{F}_{q^n}, x \mapsto L(x)/x$. This result follows from the classification of the pairs $(L,M)$ of $q$-polynomials in $\mathbb{F}_{q^n}[X]$, $n \leq 5$, for which $r_L$ and $r_M$ have the same image, obtained in [B. Csajbok, G. Marino, and O. Polverino. A Carlitz type result for linearized polynomials. Ars Math. Contemp., 16(2):585–608, 2019]. For the case of $n>5$, we pose an open question on the dimensions of the kernels of $x \mapsto L(x) - ax$ for $a \in \mathbb{F}_{q^n}$.
We further present a link between functions $f_L$ of differential uniformity bounded above by $q$ and scattered $q$-polynomials and show that, for odd values of $q$, we can construct CCZ-inequivalent functions $f_M$ with bounded differential uniformity from a given function $f_L$ fulfilling certain properties.
Revocable Encryption, Programs, and More: The Case of Multi-Copy Security
Fundamental principles of quantum mechanics have inspired many new research directions, particularly in quantum cryptography. One such principle is quantum no-cloning, which has led to the emerging field of revocable cryptography. Roughly speaking, in a revocable cryptographic primitive, a cryptographic object (such as a ciphertext or program) is represented as a quantum state in such a way that surrendering it effectively translates into losing the capability to use this cryptographic object. All of the revocable cryptographic systems studied so far have a major drawback: the recipient only receives one copy of the quantum state. Worse yet, the schemes become completely insecure if the recipient receives many identical copies of the same quantum state---a property that is clearly much more desirable in practice.

While multi-copy security has been extensively studied for a number of other quantum cryptographic primitives, it has so far received little treatment in the context of unclonable primitives. Our work, for the first time, shows the feasibility of revocable primitives, such as revocable encryption and revocable programs, which satisfy multi-copy security in oracle models. This suggests that the stronger notion of multi-copy security is within reach in unclonable cryptography more generally, and could therefore lead to a new research direction in the field.
Circular Insecure Encryption: from Long Cycles to Short Cycles
We prove that the existence of a CPA-secure encryption scheme that is insecure in the presence of key cycles of length $n$ implies the existence of such a scheme for key cycles of any length less than $n$. Equivalently, if every encryption scheme in a class is $n$-circular secure and this class is closed under our construction, then every encryption scheme in this class is $n'$-circular secure for $n' > n$.
GAPP: Generic Aggregation of Polynomial Protocols
We propose a generic framework called GAPP for aggregation of polynomial protocols. This allows proving $n$ instances of a polynomial protocol using a single aggregate proof that has $O(\log n)$ size, and can be verified using $O(\log^2 n)$ operations. The satisfiability of several univariate polynomial identities over a domain is reduced to the satisfiability of a single bivariate polynomial identity over a related domain, where the bivariate polynomials interpolate a batch of univariate polynomials over the domain. We construct an information-theoretic protocol for proving the satisfiability of the bivariate polynomial identity, which is then compiled using any bivariate polynomial commitment scheme (PCS) to yield an argument of knowledge for the aggregation relation. GAPP can be applied to several popular SNARKs over bilinear groups that are modeled as polynomial protocols in a black-box way.
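The univariate-to-bivariate reduction can be sketched concretely (a toy with made-up instances and domains, not the paper's protocol): interpolate each batch of univariate polynomials in a fresh variable $Y$, and check the single bivariate identity on the product domain.

```python
from sympy import symbols, expand

X, Y = symbols("X Y")
H, V = [1, 2, 3, 4], [0, 1]          # evaluation domain; one Y-point per instance

# two instances of the same polynomial identity a_i * b_i - c_i = 0 over H
a = [X + 1, 2*X]
b = [X - 1, X + 3]
c = [expand(ai * bi) for ai, bi in zip(a, b)]   # satisfied by construction

def batch(polys):
    # Lagrange interpolation in Y: the result restricts to polys[i] at Y = V[i]
    out = 0
    for i, poly in enumerate(polys):
        basis = 1
        for j, vj in enumerate(V):
            if j != i:
                basis *= (Y - vj) / (V[i] - vj)
        out += poly * basis
    return expand(out)

A, B, C = batch(a), batch(b), batch(c)
Z = expand(A * B - C)
# Z vanishes on H x V  <=>  every instance identity vanishes on H
print(all(Z.subs({X: h, Y: v}) == 0 for h in H for v in V))   # True
```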
We present a new bivariate polynomial commitment scheme, bPCLB, with succinct verification that yields an efficient instantiation of GAPP. In addition, the prover performs only a sublinear number of cryptographic operations in the evaluation proof. Towards constructing bPCLB, we show a new folding technique that we call Lagrangian folding. The bivariate PCS bPCLB and the Lagrangian folding scheme are of independent interest. We implement bPCLB and experimentally validate the practical efficiency of our GAPP instantiation. For the popular PLONK proof system, we achieve $25$-$30\%$ faster proof generation than the naive baseline of generating $n$ separate PLONK proofs. Compared to all existing aggregation schemes that incur additional prover overheads on top of the baseline, we achieve significantly more efficient proving, while retaining succinct verification.
We demonstrate the versatility of our GAPP framework by outlining applications of practical interest: tuple lookups that significantly outperform existing lookup arguments in terms of prover overheads, and proofs for non-uniform computation with à la carte prover cost.
Blind zkSNARKs for Private Proof Delegation and Verifiable Computation over Encrypted Data
In this paper, we show for the first time that it is practical to privately delegate proof generation of zkSNARKs to a single server for computations of up to $2^{20}$ R1CS constraints. We achieve this by computing zkSNARK proof generation over homomorphic ciphertexts, an approach we call blind zkSNARKs. We formalize the concept of blind proofs, analyze their cryptographic properties and show that the resulting blind zkSNARKs remain sound when compiled using BCS compilation. Our work follows the framework proposed by Garg et al. (Crypto’24) and improves the instantiation presented by Aranha et al. (Asiacrypt’24), which implements only the FRI subprotocol. By delegating proof generation, we are able to reduce client computation time from 10 minutes to mere seconds, while server computation time remains limited to 20 minutes. We also propose a practical construction for vCOED supporting constraint sizes four orders of magnitude larger than the current state-of-the-art verifiable FHE-based approaches. These results are achieved by optimizing Fractal for the GBFV homomorphic encryption scheme, including a novel method for making homomorphic NTT evaluation packing-friendly by computing it in two dimensions. Furthermore, we make the proofs publicly verifiable by appending a zero-knowledge Proof of Decryption (PoD). We propose a new construction for PoDs optimized for low proof generation time, exploiting modulus and ring switching in GBFV and using the Schwartz-Zippel lemma for proof batching; these techniques might be of independent interest. Finally, we implement the latter protocol in C and report on execution time and proof sizes.
Unclonable Functional Encryption
In a functional encryption (FE) scheme, a user that holds a ciphertext and a function-key can learn the result of applying the function to the plaintext message. Security requires that the user does not learn anything beyond the function evaluation. We extend this notion to the quantum setting by providing definitions and a construction for a quantum functional encryption (QFE) scheme which allows for the evaluation of polynomially-sized circuits on arbitrary quantum messages. Our construction is built upon quantum garbled circuits [BY22].

We also investigate the relationship of QFE to the seemingly unrelated notion of unclonable encryption (UE) and find that any QFE scheme universally achieves the property of unclonable functional encryption (UFE). In particular, we assume the existence of an unclonable encryption scheme with quantum decryption keys, which was recently constructed by [AKY24]. Our UFE guarantees that two parties cannot simultaneously recover the correct function outputs using two independently sampled function secret keys. As an application, we give the first construction for public-key UE with variable decryption keys.

Lastly, we establish a connection between quantum indistinguishability obfuscation (qiO) and quantum functional encryption (QFE), showing that any multi-input indistinguishability-secure quantum functional encryption scheme unconditionally implies the existence of qiO.
Toward Optimal-Complexity Hash-Based Asynchronous MVBA with Optimal Resilience
Multi-valued validated Byzantine agreement (MVBA), a fundamental primitive of distributed computing, allows $n$ processes to agree on a valid $\ell$-bit value, despite $t$ faulty processes behaving maliciously. Among hash-based solutions for the asynchronous setting with adaptive faults, the state-of-the-art HMVBA protocol achieves optimal $O(n^2)$ message complexity, (near-)optimal $O(n \ell + n^2 \lambda \log n)$ bit complexity, and optimal $O(1)$ time complexity. However, it only tolerates up to $t < \frac15 n$ adaptive failures. In contrast, the best known optimally resilient protocol, FIN-MVBA, exchanges $O(n^3)$ messages and $O(n^2 \ell + n^3 \lambda)$ bits. This highlights a fundamental question: can a hash-based protocol be designed for the asynchronous setting with adaptive faults that simultaneously achieves both optimal complexity and optimal resilience?
In this paper, we take a significant step toward answering the question. Namely, we introduce Reducer, an MVBA protocol that retains HMVBA's complexity while improving its resilience to $t < \frac14 n$. Like HMVBA and FIN-MVBA, Reducer relies exclusively on collision-resistant hash functions. A key innovation in Reducer's design is its internal use of strong multi-valued Byzantine agreement (SMBA), a variant of strong consensus we introduce and construct, which ensures agreement on a correct process's proposal. To further advance resilience toward the optimal one-third bound, we then propose Reducer++, an MVBA protocol that tolerates up to $t < (\frac13 - \epsilon)n$ adaptive failures, for any fixed constant $\epsilon > 0$. Unlike Reducer, Reducer++ does not rely on SMBA. Instead, it employs a novel approach involving hash functions modeled as random oracles to ensure termination. Reducer++ maintains constant time complexity, quadratic message complexity, and quasi-quadratic bit complexity, with constants dependent on $\epsilon$.
Another L makes it better? Lagrange meets LLL and may improve BKZ pre-processing
We present a new variant of the LLL lattice reduction algorithm, inspired by Lagrange's notion of pair-wise reduction, called L4. Like LLL, our algorithm runs in time polynomial in the dimension of the input lattice, as well as in $\log M$, where $M$ is an upper bound on the norm of the longest vector of the input basis.
We experimentally compared the norm of the first basis vector obtained with LLL and L4 up to dimension 200. On average, we obtain vectors that are up to $16\%$ shorter. We also used our algorithm as a pre-processing step for the BKZ lattice reduction algorithm with blocksize 24. In practice, up to dimension 140, this allows us to reduce the norm of the shortest basis vector on average by $3\%$, while the runtime does not increase significantly. In $10\%$ of our tests, the whole process was even faster.
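For background, the sketch below shows Lagrange (Gauss) pair-wise reduction in dimension 2, the classical notion the authors take as inspiration; it is illustrative, not the L4 algorithm itself.

```python
# Minimal sketch of Lagrange (Gauss) pair-wise reduction of a 2-dimensional
# integer lattice basis: repeatedly shorten the longer vector by the shorter
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lagrange_reduce(b1, b2):
    if dot(b1, b1) > dot(b2, b2):
        b1, b2 = b2, b1
    while True:
        m = round(dot(b1, b2) / dot(b1, b1))  # nearest-integer projection coefficient
        b2 = [x - m * y for x, y in zip(b2, b1)]
        if dot(b2, b2) >= dot(b1, b1):        # no further pair-wise improvement
            return b1, b2
        b1, b2 = b2, b1

print(lagrange_reduce([1, 0], [137, 1]))      # ([1, 0], [0, 1])
```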
Sunfish: Reading Ledgers with Sparse Nodes
The increased throughput offered by modern blockchains, such as Sui, Aptos, and Solana, enables processing thousands of transactions per second, but it also introduces higher costs for decentralized application (dApp) developers who need to track and verify changes in the state of their application. This is true because dApp developers run full nodes, which download and re-execute every transaction to track the global state of the chain. However, this becomes prohibitively expensive for high-throughput chains due to high bandwidth, computational, and storage requirements. A common alternative is to use light nodes. However, light nodes only verify the inclusion of a set of transactions and have no guarantees that the set is complete, i.e., that includes all relevant transactions. Under a dishonest majority, light nodes can also be tricked into accepting invalid transactions.
To bridge the gap between full and light nodes, we propose and formalize a new type of blockchain node: the sparse node. A sparse node tracks only a subset of the blockchain’s state: it verifies that the received set of transactions touching the substate is complete, and re-executes those transactions to assess their validity. A sparse node retains important security properties even under adversarial majorities, and requires an amount of resources proportional to the number of transactions in the substate and to the size of the substate itself.
We further present Sunfish, an instantiation of a sparse node protocol. Our analysis and evaluation show that Sunfish reduces the bandwidth consumption and, in turn, the computational and storage resources, of real blockchain applications by several orders of magnitude when compared to a full node.
Information Set Decoding for Ring-Linear Codes
Information set decoding (ISD) algorithms currently offer the most powerful tool to solve the two archetypal problems of coding theory, namely the Codeword Finding Problem and the Syndrome Decoding Problem. Traditionally, ISD has primarily been studied for linear codes over finite fields equipped with the Hamming metric.
However, recently, other possibilities have also been explored. These algorithms have been adapted to different ambient spaces and metrics, such as the rank metric or the Lee metric over $\mathbb Z_m$.
In this paper, we show that it is possible to leverage the ring structure to construct more efficient decoding algorithms than those obtained by simply adapting ISD. In particular, we describe a framework that can be applied to any additive metric including Hamming and Lee, and that can be adapted to the case of the rank metric, providing algorithms to solve the two aforementioned problems, along with their average computational costs.
Commutative Cryptanalysis as a Generalization of Differential Cryptanalysis
Recently, Baudrin et al. analyzed a special case of Wagner's commutative diagram cryptanalysis, referred to as commutative cryptanalysis. For a family $(E_k)_k$ of permutations on a finite vector space $G$, commutative cryptanalysis exploits the existence of affine permutations $A,B \colon G \rightarrow G$, $I \notin \{A,B\}$ such that $E_k \circ A (x) = B \circ E_k(x)$ holds with high probability, taken over inputs $x$, for a significantly large set of weak keys $k$. Several attacks against symmetric cryptographic primitives can be formulated within the framework of commutative cryptanalysis, most importantly differential attacks, as well as rotational and rotational-differential attacks. Besides, the notion of $c$-differentials on S-boxes can be analyzed as a special case within this framework.
We discuss the relations between a general notion of commutative cryptanalysis, with $A$ and $B$ being arbitrary functions over a finite Abelian group, and differential cryptanalysis, both from the view of conducting an attack on a symmetric cryptographic primitive, as well as from the view of a theoretical study of cryptographic S-boxes.
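The special case of differentials is easy to see concretely: with $A(x) = x \oplus \alpha$ and $B(y) = y \oplus \beta$, the commutation probability equals the differential probability of $\alpha \to \beta$ through the S-box. A toy check with an illustrative 4-bit S-box:

```python
# Pr_x[ S(A(x)) == B(S(x)) ] for translations A, B is exactly the differential
# probability of alpha -> beta through S; the S-box below is illustrative only
S = [0x6, 0x4, 0xC, 0x5, 0x0, 0x7, 0x2, 0xE,
     0x1, 0xF, 0x3, 0xD, 0x8, 0xA, 0x9, 0xB]

def commutation_prob(alpha, beta):
    return sum(S[x ^ alpha] == S[x] ^ beta for x in range(16)) / 16

best = max(((a, b, commutation_prob(a, b)) for a in range(1, 16)
            for b in range(16)), key=lambda t: t[2])
print(best)   # the strongest differential, phrased as a commuting pair (A, B)
```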
Batch Range Proof: How to Make Threshold ECDSA More Efficient
With the demand from cryptocurrencies, threshold ECDSA has recently regained popularity. So far, several methods have been proposed to construct threshold ECDSA, including the use of OT and homomorphic encryption (HE). Due to the mismatch between the plaintext space and the signature space, threshold ECDSA based on HE schemes such as Paillier and Joye-Libert (JL) encryption always requires zero-knowledge range proofs. However, the overhead of these range proofs constitutes a major portion of the total cost.
In this paper, we propose efficient batch range proofs to improve the efficiency of threshold ECDSA. At the heart of our efficiency improvement is a new technical tool called the Multi-Dimension Forking Lemma, a generalization of the well-known general forking lemma [Bellare and Neven, CCS 2006]. Based on our new tool, we construct efficient batch range proofs for Paillier and JL encryptions, and use them to give batch multiplication-to-addition (MtA) protocols, which are crucial to most threshold ECDSA protocols. Our constructions improve the prior Paillier-based MtA by a factor of 2 and the prior JL-based MtA by a factor of 3, in both computation and bandwidth, in an amortized way. Our batch MtA can be used to improve the efficiency of most Paillier- and JL-based threshold ECDSA protocols. As three typical examples, our benchmarking results show:
-- We improve the Paillier-based CGGMP20 [Canetti et al., CCS 2020] in bandwidth by a factor of 2.1 to 2.4, and in computation by a factor of 1.5 to 1.7.
-- By implementing threshold ECDSA with the batch JL MtA of XAL+23 [Xue et al., CCS 2023] and our batch JL MtA, respectively, our batch construction improves theirs in bandwidth by a factor of 2.0 to 2.29, and in computation by a factor of 1.88 to 2.09.
-- When replacing OT-based MtA in DKLs24 [Doerner et al., S$\&$P 2024] with our Paillier-based batch MtA, we improve the bandwidth efficiency by $7.8\times$ at the cost of $5.7\times$ slower computation.
The Sting Framework: Proving the Existence of Superclass Adversaries
We introduce superclass accountability, a new notion of accountability for security protocols. Classical notions of accountability typically aim to identify specific adversarial players whose violation of adversarial assumptions has caused a security failure. Superclass accountability describes a different goal: to prove the existence of adversaries capable of violating security assumptions.
We develop a protocol design approach for realizing superclass accountability called the sting framework (SF). Unlike classical accountability, SF can be used for a broad range of applications without making protocol modifications and even when security failures aren’t attributable to particular players.
SF generates proofs of existence for superclass adversaries that are publicly verifiable, making SF a promising springboard for reporting by whistleblowers, high-trust bug-bounty programs, and so forth.
We describe how to use SF to prove the existence of adversaries capable of breaching the confidentiality of practical applications that include Tor, block-building infrastructure in web3, ad auctions, and private contact discovery---as well as the integrity of fair-transaction-ordering systems. We report on two end-to-end SF systems we have constructed---for Tor and block-building---and on experiments with those systems.
Testing Robustness of Homomorphically Encrypted Split Model LLMs
Large language models (LLMs) have recently transformed many industries, enhancing content generation, customer service agents, data analysis and even software generation. These applications are often hosted on remote servers to protect the neural-network model IP; however, this raises concerns about the privacy of input queries. Fully Homomorphic Encryption (FHE), an encryption technique that allows for computations on private data, has been proposed as a solution to the challenge. Nevertheless, due to the increased size of LLMs and the computational overheads of FHE, today's practical FHE LLMs are implemented using a split-model approach. Here, a user sends their FHE encrypted data to the server to run an encrypted attention head layer; then the server returns the result of the layer for the user to run the rest of the model locally. By employing this method, the server maintains part of their model IP, and the user still gets to perform private LLM inference. In this work, we evaluate the neural-network model IP protections of single-layer split-model LLMs, and demonstrate a novel attack vector that makes it easy for a user to extract the neural-network model IP from the server, bypassing the claimed protections for encrypted computation. In our analysis, we demonstrate the feasibility of this attack, and discuss potential mitigations.
Provable Security Analysis of Butterfly Key Mechanism Protocol in IEEE 1609.2.1 Standard
The paper provides the first provable security analysis of the Butterfly Key Mechanism (BKM) protocol from IEEE 1609.2.1 standard. The BKM protocol specifies a novel approach for efficiently requesting multiple certificates for use in vehicle-to-everything (V2X) communication. We define the main security goals of BKM, such as vehicle privacy and communication authenticity. We prove that the BKM protocol, with small modifications, meets those security goals. We also propose a way to significantly improve the protocol's efficiency without sacrificing security.
Proteus: A Fully Homomorphic Authenticated Transciphering Protocol
Fully Homomorphic Encryption (FHE) is a powerful technology that allows a cloud server to perform computations directly on ciphertexts. To overcome the overhead of sending and storing large FHE ciphertexts, the concept of FHE transciphering was introduced, allowing symmetric key encrypted ciphertexts to be transformed into FHE ciphertexts by deploying symmetric key decryption homomorphically. However, existing FHE transciphering schemes remain unauthenticated and malleable, allowing attackers to manipulate data and remain undetected. This work introduces Proteus, a new methodology for authenticated transciphering, which enables oblivious access control, preventing users from downloading unauthenticated or malicious data. Our protocol implementation adopts ASCON, NIST's new standard for lightweight cryptography, to enable homomorphic hashing and authenticated transciphering. Our ASCON transcipher is paired with the TFHE encryption scheme, which is well suited to perform encrypted rotation and bitwise operations. We evaluate our approach with a variety of real-life privacy-preserving applications, including URL phishing detection, private content moderation of hate speech, and biometric authentication.
New Strategies for Bootstrapping Large-Error Ciphertext in Large-Precision FHEW/TFHE Cryptosystem
Bootstrapping is the core task in fully homomorphic encryption. It refreshes encrypted data so as to support an unlimited number of levels of homomorphic computation. The FHEW/TFHE cryptosystem provides the fastest bootstrapping machinery, in addition to its unique homomorphic evaluation functionality. In 2021, the problem of large-precision bootstrapping was investigated in the literature, with fast algorithms proposed and implemented. A strategy common to all these algorithms is to homomorphically decompose the plaintext into blocks from the tail up, while bootstrapping the blocks sequentially.
This paper proposes two new strategies to improve the efficiency of large-precision plaintext bootstrapping. Both strategies are based on a new design of continuous nega-cyclic functions with varying resolution, enabling accurate computation through blockwise approximate computing. To minimize the approximation error in each block, optimizations are proposed based on rigorous error estimation, and are illustrated by error bounds in power-of-two binomial representation.
The first strategy is to perform homomorphic approximate decomposition of the input plaintext from the head on. Compared with the tail-up approach, the head-on approach asymptotically reduces the number of blocks by up to half, while reducing the final refreshed error by up to $1-1/\sqrt{2}\approx 29.3\%$.
The second strategy extends the head-on approach from large-precision plaintext bootstrapping to large-error reduction. It can be applied directly to the input ciphertext for the purpose of plaintext bootstrapping; it can also be applied after plaintext bootstrapping to further reduce the refreshed error.
Two algorithms based on the above two strategies are proposed, together with some variants combining the tail-up approach. The tail-up approach is completely re-developed for optimal block-size control based on careful error analysis, and a corresponding algorithm is proposed. All the algorithms are implemented in PALISADE, and experiments on real data show that the new strategies improve the speed of large-precision plaintext bootstrapping by a factor of up to 7.
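A plaintext analogue clarifies the difference between the two decomposition orders (the paper performs these homomorphically under FHEW/TFHE; the block size and rounding rule below are illustrative, not the paper's exact parameters):

    B = 16  # block size: 4 bits per block (illustrative)

    def tail_up(m, blocks):
        out = []
        for _ in range(blocks):
            out.append(m % B)          # peel the least significant block first
            m //= B
        return out

    def head_on(m, blocks):
        out = []
        for i in reversed(range(blocks)):
            hi = round(m / B**i)       # approximate the most significant block first
            out.append(hi)
            m -= hi * B**i             # the residual is centered around 0, hence
                                       # smaller in magnitude, which is what
                                       # shrinks the refreshed error
        return out

    m = 46321
    print(tail_up(m, 4))   # [1, 15, 4, 11]: m = 1 + 15*16 + 4*16^2 + 11*16^3
    print(head_on(m, 4))   # [11, 5, -1, 1]: m = 11*16^3 + 5*16^2 - 16 + 1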
Multi-party Setup Ceremony for Generating Tokamak zk-SNARK Parameters
This document provides a specification guide for the Multi-party Computation (MPC) setup ceremony for the Tokamak zk-SNARK scheme. It begins by revisiting the MMORPG protocol proposed in BGM17 for Groth16 setup generation, which leverages a random beacon to ensure public randomness. Additionally, it explores the alternative design approach presented in the ``Snarky Ceremonies'' paper KMSV21, which removes the need for a random beacon. The document includes detailed pseudocode and a workflow for each stage of parameter generation in the Tokamak zk-SNARK protocol.
Tokamak zk-SNARK employs a universal setup through sub-circuits, which allows for CRS reuse across multiple circuits. This approach reduces the need for repeated trusted setups and emphasizes efficiency in verifier preprocessing. The document also provides pseudocode for the various types of parameter generation during the MPC setup, including parameters like the Powers of $\tau$, circuit-specific parameters, and different types of mappings, across both the random-beacon and non-random-beacon approaches. The pseudocode clarifies the protocol's step-by-step process, from the computation of shared parameters to the verification of correctness.
Finally, the document presents a sketch of a security analysis of both protocols, relying on the Algebraic Group Model (AGM) and the Random Oracle Model (ROM) to prove knowledge soundness and security of the generated CRS. The analysis considers potential attacks and demonstrates that, even without a random beacon, the setup remains secure under the assumptions of these models.
Statistical Layered MPC
The seminal work of Rabin and Ben-Or (STOC'89) showed that the problem of secure $n$-party computation can be solved for $t<n/2$ corruptions with guaranteed output delivery and statistical security. This holds in the traditional static model where the set of parties is fixed throughout the entire protocol execution.
The need to better capture the dynamics of large-scale and long-lived computations, where compromised parties may recover and the set of parties can change over time, has sparked renewed interest in the proactive security model of Ostrovsky and Yung (PODC'91). This abstraction, where the adversary may periodically uncorrupt parties and corrupt a new set of parties, is taken a step further in the more recent YOSO and Fluid MPC models (CRYPTO'21), which additionally allow disjoint sets of parties to participate in each round. Previous solutions with guaranteed output delivery and statistical security only tolerate $t<n/3$ corruptions, or assume a random corruption pattern plus non-standard communication models.
Matching the Rabin and Ben-Or bound in these settings remains an open problem.
In this work, we settle this question in the unifying Layered MPC abstraction recently introduced by David et al. (CRYPTO'23). In this model, the interaction pattern is defined by a layered acyclic graph, where each party sends secret messages and broadcast messages only to parties in the very next layer. We complete the feasibility landscape of layered MPC by extending the Rabin and Ben-Or result to this setting. Our results imply maximally proactive MPC with statistical security in the honest-majority setting.
The Role of Message-Bound Signatures for the Beyond UnForgeability Features and Weak Keys
In the present work, we establish a new relationship among the Beyond UnForgeability Features (BUFF) introduced by Cremers et al. (SP'21), where the BUFF notions were shown to be independent of one another. On the other hand, the analysis by Aulbach et al. (PQCrypto'24) reveals that one of the BUFF notions, message-bound signatures (MBS), is achieved by most schemes. BUFF security can be achieved generically via the BUFF transform, which attains all the beyond-unforgeability features. The BUFF transform works by signing a hash of the public key and the message (rather than just the message), and appending this hash value to the signature. The need for appending the hash comes from the intuitive notion of weak keys, which verify all message-signature pairs. We explain that MBS security effectively rules out the possibility of weak keys. This opens the possibility of a more efficient transform to achieve BUFF. We show that this transform, first introduced by Pornin and Stern (ACNS'05), indeed suffices to achieve BUFF security if the original signature scheme satisfies MBS. Only in the malicious setting of exclusive ownership do we present an attack on UOV, even after applying the PS-3 transform.
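A minimal sketch of the two transforms over an arbitrary base scheme (the sign/verify callables are placeholders; our reading of the Pornin--Stern variant as "sign H(pk || m) without appending the hash" follows the description above):

    import hashlib

    def H(pk: bytes, m: bytes) -> bytes:
        return hashlib.sha256(pk + m).digest()

    # BUFF transform: sign H(pk || m) and append the hash to the signature.
    def buff_sign(sign, sk, pk, m):
        h = H(pk, m)
        return (sign(sk, h), h)

    def buff_verify(verify, pk, m, sig):
        s, h = sig
        return h == H(pk, m) and verify(pk, h, s)

    # Lighter Pornin--Stern-style transform: same signed value, no appended
    # hash, hence shorter signatures; sufficient when the base scheme has MBS.
    def ps_sign(sign, sk, pk, m):
        return sign(sk, H(pk, m))

    def ps_verify(verify, pk, m, s):
        return verify(pk, H(pk, m), s)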
Modelings for generic PoK and Applications: Shorter SD and PKP based Signatures
The Multi-Party Computation in the Head (MPCitH) paradigm has proven to be a versatile tool for designing proofs of knowledge (PoK) based on a variety of computationally hard problems. For instance, many post-quantum signatures have been designed from MPC-based proofs combined with the Fiat-Shamir transformation. Over the years, MPCitH has evolved significantly, with developments based on techniques such as threshold computing and other optimizations. Recently, Vector Oblivious Linear Evaluation (VOLE) and the VOLE-in-the-Head framework have spurred further advances. In this work, we introduce three VOLE-friendly modelings for generic and communication-efficient PoK to prove knowledge of secret witnesses in the form of elementary vectors, vectors of Hamming weight at most $\omega$, and permutation matrices. Remarkably, these modelings scale logarithmically with respect to the original witness sizes. Specifically, our modeling for elementary vectors of size $n$ transforms the witness size to $\mathcal{O}(\log_2(n))$; in the case of vectors of size $n$ and Hamming weight at most $\omega$, the reduced witness is of size $\mathcal{O}\left(\omega \log_2(n)\right)$; and our modeling for permutation matrices of size $n \times n$ results in an (equivalent) witness of size $\mathcal{O}(n\log_2(n))$, which leads to small proofs in practice. To achieve this, we consider modelings with higher multiplicative depth $d > 2$. Even if this increases the proof size, we show that our approach compares favorably with existing proofs. Such design choices were mostly overlooked in previous comparable works, perhaps because, prior to the VOLEitH framework, multiplications were often emulated with Beaver's triples, causing the proof size to scale poorly with $d$. We also provide several applications of our modelings, namely i) post-quantum signature schemes based on the $\mathsf{SD}$ (Syndrome Decoding) problem and $\mathsf{PKP}$ (Permuted Kernel Problem), ii) PoK of secret keys for code-based key encapsulation mechanisms (KEMs), and iii) ring signatures from $\mathsf{SD}$ and $\mathsf{PKP}$. Our signatures based on $\mathsf{SD}$ over $\mathbb{F}_2$ and $\mathsf{PKP}$ feature sizes of $3.9$ kB and $3.6$ kB for the NIST-I security level, which is respectively $26\%$ and $38\%$ shorter than state-of-the-art alternatives. Our PoK of the secret keys of BIKE and HQC are half the size of similar PoK for Kyber. In addition, we obtain the smallest ring signature based on $\mathsf{SD}$ and the first ring signature based on $\mathsf{PKP}$.
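To see why such logarithmic modelings are possible, consider the simplest case: a length-$n$ elementary (one-hot) vector is fully determined by the $\log_2(n)$ bits of its index, and each entry is a product of those bits, a relation of the kind a VOLE-based proof can check. A plaintext illustration (not the paper's exact constraint system):

    def bits(x, k):
        return [(x >> j) & 1 for j in range(k)]

    def elementary_from_bits(b, n):
        # entry i equals prod_j (b_j if bit j of i is set, else 1 - b_j)
        v = []
        for i in range(n):
            prod = 1
            for j, bj in enumerate(b):
                prod *= bj if (i >> j) & 1 else 1 - bj
            v.append(prod)
        return v

    n, idx = 8, 5
    witness = bits(idx, 3)   # 3 bits instead of 8 field elements
    assert elementary_from_bits(witness, n) == [0, 0, 0, 0, 0, 1, 0, 0]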
Overlapped Bootstrapping for FHEW/TFHE and Its Application to SHA3
Homomorphic Encryption (HE) enables operations on encrypted data without requiring decryption, thus allowing for secure handling of confidential data within smart contracts. Among the known HE schemes, FHEW and TFHE are particularly notable for use in smart contracts due to their lightweight nature and support for arbitrary logical gates. In contrast, other HE schemes often require several gigabytes of keys and are limited to supporting only addition and multiplication. As a result, there has been significant work implementing smart contract functionalities over HE, broadening the potential applications of blockchain technology. However, a significant drawback of the FHEW/TFHE schemes is the need for bootstrapping after the execution of each binary gate. While bootstrapping reduces noise in the ciphertext, it also becomes a performance bottleneck due to its computational complexity.
In this work, we propose an efficient new bootstrapping method for FHEW/TFHE that takes advantage of the flexible scaling factors of encrypted data. The proposed method is particularly beneficial in circuits with consecutive XOR gates. Moreover, we implement Keccak using FHEW/TFHE, as it is one of the most important functions in smart contracts. Our experimental results demonstrate that the proposed method reduces the runtime of Keccak over HE by 42%. Additionally, the proposed method does not require additional keys or parameter sets from the key-generating party and can be adopted by the computing party without the need for any extra information.
Computationally Efficient Asynchronous MPC with Linear Communication and Low Additive Overhead
We explore the setting of asynchronous multi-party computation (AMPC) with optimal resilience $n=3t+1$, and develop an efficient protocol that optimizes both communication and computation.
The recent work by Goyal, Liu-Zhang, and Song [CRYPTO'24] was the first to achieve AMPC with amortized linear communication cost without using computationally heavy public-key cryptography. However, its $\mathcal{O}(n^{14})$ additive communication overhead renders it impractical for most real-world applications.
It is possible to reduce the communication overhead significantly by leveraging cryptographic tools such as homomorphic commitments, public-key cryptography, or zero-knowledge proofs; however, the corresponding AMPC protocols introduce a computation overhead of $\Omega(nC)$ public-key cryptographic operations, which becomes a bottleneck as $n$ grows. Overall, achieving AMPC with linear communication complexity, low additive communication overhead, and low computation overhead remains an open challenge.
In this work, we resolve this efficiency challenge by utilizing the random oracle model. By relying solely on computationally efficient primitives such as random-oracle hashing and symmetric-key cryptography, our protocol is not only efficient in terms of computation and communication overhead but also post-quantum secure. For a circuit with $C$ multiplication gates, our protocol achieves $\mathcal{O}(Cn)$ total communication, i.e., $\mathcal{O}(n)$ per multiplication gate, with an additive overhead of $\mathcal{O}(n^4)$ communication. In terms of computation, our protocol only introduces an additive overhead of $\mathcal{O}(n^5)$ hash computations, independent of the circuit size.
DMM: Distributed Matrix Mechanism for Differentially-Private Federated Learning using Packed Secret Sharing
Federated Learning (FL) has recently gained significant traction in both industry and academia. In FL, a machine learning model is trained using data from various end-users arranged in committees across several rounds. Since such data can often be sensitive, a primary challenge in FL is providing privacy while still retaining utility of the model. Differential Privacy (DP) has become the main measure of privacy in the FL setting. DP comes in two flavors: central and local. In the former, a centralized server is trusted to receive the users' raw gradients from a training step, and then perturb their aggregation with some noise before releasing the next version of the model. In the latter (more private) setting, noise is applied on users' local devices, and only the aggregation of users' noisy gradients is revealed, even to the server. Great strides have been made in improving the privacy-utility trade-off in the central DP setting by utilizing the so-called \emph{matrix mechanism}. However, progress has mostly stalled in the local DP setting. In this work, we introduce the \emph{distributed} matrix mechanism to achieve the best of both worlds: local DP, and the better privacy-utility trade-off of the matrix mechanism. We accomplish this by proposing a cryptographic protocol that securely transfers sensitive values across rounds, making use of \emph{packed secret sharing}. This protocol accommodates the dynamic participation of users per training round required by FL, including those that may drop out from the computation. We provide experiments showing that our mechanism indeed significantly improves the privacy-utility trade-off of FL models compared to previous local DP mechanisms, with little added overhead.
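For readers unfamiliar with the underlying primitive, a minimal packed Shamir sharing over a toy prime field is sketched below (illustrative parameters only): $k$ secrets sit at $k$ fixed evaluation points of one polynomial of degree $t+k-1$, so a single sharing carries a whole vector while any $t$ shares reveal nothing.

    import random

    P = 2**61 - 1  # toy prime field

    def pack_share(secrets, t, n):
        k = len(secrets)
        # secret i sits at x = -(i+1); party j receives the evaluation at x = j
        xs = [(-(i + 1)) % P for i in range(k)]
        ys = list(secrets)
        for r in range(t):                  # t random points fix the remaining
            xs.append(n + 1 + r)            # degrees of freedom and hide the
            ys.append(random.randrange(P))  # secrets from any t parties
        shares = []
        for j in range(1, n + 1):           # Lagrange-evaluate at x = j
            acc = 0
            for a in range(len(xs)):
                num, den = 1, 1
                for b in range(len(xs)):
                    if a != b:
                        num = num * ((j - xs[b]) % P) % P
                        den = den * ((xs[a] - xs[b]) % P) % P
                acc = (acc + ys[a] * num * pow(den, P - 2, P)) % P
            shares.append(acc)
        return shares

    shares = pack_share([7, 11, 13], t=2, n=8)  # one sharing, three secrets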
Consensus on SNARK pre-processed circuit polynomials
This paper addresses verifiable consensus on pre-processed circuit polynomials for succinct non-interactive arguments of knowledge (SNARKs). More specifically, we focus on parts of circuits, referred to as wire maps, which may change based on program inputs or the statements being argued. Preparing commitments to wire maps in advance is essential for certain SNARK protocols to maintain their succinctness, but it can be costly. SNARK verifiers can alternatively consider receiving wire maps from untrusted parties.
We propose a consensus protocol that reaches consensus on wire maps using a majority rule. The protocol can operate on a distributed, irreversible, and transparent server, such as a blockchain. Our analysis shows that while the protocol requires over 50\% honest participants to remain robust against collusive attacks, it enables consensus on wire maps with a low and fixed verification complexity per communication, even in adversarial settings. The protocol guarantees consensus completion within a time frame ranging from a few hours to several days, depending on the wire map degree and the honest participant proportion.
Technically, our protocol leverages a directed acyclic graph (DAG) structure to represent conflicting wire maps among the untrusted deliverers. Wire maps are decomposed into low-degree polynomials, forming vertices and edges of this DAG. The consensus participants, or deliverers, collaboratively manage this DAG by submitting edges to branches they support. The protocol then returns a commitment to the wire map that is written in the first fully grown branch. The protocol's computational efficiency is derived from an interactive commit-prove-verify scheme that enables efficient validation of submitted edges.
Our analysis implies that the protocol provides a practical solution for achieving secure consensus on SNARK wire maps in environments with a dynamic proportion of honest participants. Additionally, we introduce a tunable parameter $N$ that allows the protocol to minimize cost and time to consensus while maintaining a desired level of security.
A Hidden-Bits Approach to Statistical ZAPs from LWE
We give a new approach for constructing statistical ZAP arguments (a two-message public-coin statistically witness indistinguishable argument) from quasi-polynomial hardness of the learning with errors (LWE) assumption with a polynomial modulus-to-noise ratio. Previously, all ZAP arguments from lattice-based assumptions relied on correlation-intractable hash functions. In this work, we present the first construction of a ZAP from LWE via the classic hidden-bits paradigm. Our construction matches previous lattice-based schemes by being public-coin and satisfying statistical witness indistinguishability. Moreover, our construction is the first lattice-based ZAP that is fully black-box in the use of cryptography. Previous lattice-based ZAPs based on correlation-intractable hash functions all made non-black-box use of cryptography.
Composability in Watermarking Schemes
Software watermarking allows for embedding a mark into a piece of code, such that any attempt to remove the mark will render the code useless. Provably secure watermarking schemes currently seem limited to programs computing various cryptographic operations, such as evaluating pseudorandom functions (PRFs), signing messages, or decrypting ciphertexts (the latter often going by the name ``traitor tracing''). Moreover, each of these watermarking schemes has an ad-hoc construction of its own.
We observe, however, that many cryptographic objects are used as building blocks in larger protocols. We ask: just as we can compose building blocks to obtain larger protocols, can we compose watermarking schemes for the building blocks to obtain watermarking schemes for the larger protocols? We give an affirmative answer to this question, by precisely formulating a set of requirements that allow for composing watermarking schemes. We use our formulation to derive a number of applications.
zkFFT: Extending Halo2 with Vector Commitments & More
This paper introduces zkFFT, a novel zero-knowledge argument designed to efficiently generate proofs for FFT (Fast Fourier Transform) relations. Our approach enables verifying that one committed vector is the FFT of another, addressing an efficiency need in general-purpose non-interactive zero-knowledge proof systems where the proof relation takes vector commitments as inputs.
We present a concrete enhancement to the Halo2 proving system, demonstrating how zkFFT optimizes proofs in scenarios where the proof relation includes one or more vector commitments. Specifically, zkFFT incorporates streamlined logic within Halo2 and similar systems, augmenting proof and verification complexity by only $O(\log N)$, where $N$ is the vector size. This represents a substantial improvement over conventional approaches, which often necessitate specific circuit extensions to validate the integrity of vector commitments and their corresponding private values in the arithmetic framework of the proof relation. The proposed zkFFT method supports multiple vector commitments with only a logarithmic increase in extension costs, making it highly scalable. This capability is pivotal for practical applications involving multiple pre-committed values within proof statements.
Apart from Halo2, our technique can be adapted to any other zero-knowledge proof system that relies on an arithmetization in which each column is treated as the evaluation of a polynomial over a specified domain, the polynomial is computed via FFT, and the result is committed to using a polynomial commitment scheme based on inner-product arguments. Together with efficient lookup and permutation arguments, zkFFT will streamline and significantly optimize the generation of zero-knowledge proofs for arbitrary relations.
Beyond the applications in augmenting zero-knowledge proof systems, we believe that the formalized zkFFT argument can be of independent interest.
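The relation being proven is simply the linear DFT equation between the two committed vectors; stated over plaintext vectors (here over the complex numbers for illustration, using numpy's FFT convention; the argument itself works over the proof system's field):

    import numpy as np

    n = 8
    x = np.random.default_rng(1).integers(0, 10, n).astype(complex)
    y = np.fft.fft(x)                       # claim: y is the FFT of x
    omega = np.exp(-2j * np.pi / n)         # primitive n-th root of unity
    V = omega ** (np.arange(n)[:, None] * np.arange(n)[None, :])  # DFT matrix
    assert np.allclose(V @ x, y)            # y_i = sum_j omega^(i*j) * x_j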
A Note on the Hint in the Dilithium Digital Signature Scheme
In the Dilithium digital signature scheme, there is an inherent tradeoff between the length of the public key and the length of the signature. The coefficients of the main part of the public key, the vector $\mathbf{t}$, are compressed (in a lossy manner), or "quantized", during the key-generation procedure, in order to save on the public-key size. That is, the coefficients are divided by some fixed denominator, and only the quotients are published. However, this results in some "skew" during the verification process, and to fix it, a special signature-dependent "hint" is computed during the signing process. Roughly speaking, stronger compression of $\mathbf{t}$ results in the hint carrying more information, consequently increasing the signature length. Prior to the hint computation, a test is performed to check whether a proper hint can indeed be composed to fix this skew; if the test fails, the signing process is rerun with a different seed for the (pseudo-)randomness. However, in this short report we observe that this test is not performed optimally: the test checks a sufficient condition for the hint to work, but not a necessary one. We suggest a new, refined test that results in a lower probability for the signing iteration to fail. The new test exhibits some improvement (in terms of expected running time) in certain configurations characterized by a shorter public key at the expense of a slightly longer signature. We note that the change does not imply any change in the security of the algorithm.
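The compression in question is the standard power-of-two rounding; a toy version (the drop parameter matches Dilithium's $d = 13$, the rest is illustrative) shows where the skew comes from:

    D = 13  # number of dropped bits (Dilithium uses d = 13)

    def power2round(t):
        # split t = t1 * 2^D + t0 with a centered remainder t0
        t0 = t % (1 << D)
        if t0 > 1 << (D - 1):
            t0 -= 1 << D
        return (t - t0) >> D, t0

    t = 123456
    t1, t0 = power2round(t)
    assert t == t1 * (1 << D) + t0
    # Only t1 is published: t0 is the dropped "skew" that the signer's hint
    # must compensate for, and the test discussed above decides whether a
    # valid hint exists for the current signing attempt.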
Instance Compression, Revisited
Collision-resistant hashing (CRH) is a cornerstone of cryptographic protocols. However, despite decades of research, no construction of a CRH based solely on one-way functions has been found. Moreover, there are black-box limitations that separate these two primitives.
Harnik and Naor [HN10] overcame this black-box barrier by introducing the notion of instance compression. Instance compression reduces large NP instances to a size that depends on their witness size while preserving the "correctness" of the instance relative to the language. Shortly thereafter, Fortnow and Santhanam showed that efficient instance compression algorithms are unlikely to exist (as the polynomial hierarchy would collapse). Bronfman and Rothblum defined a computational analog of instance compression, which they called computational instance compression (CIC), and gave a construction of CIC under standard assumptions. Unfortunately, this notion is not strong enough to replace instance compression in Harnik and Naor's CRH construction.
In this work, we revisit the notion of computational instance compression and ask what the "correct" notion for CIC is, in the sense that it is sufficiently strong to achieve useful cryptographic primitives while remaining consistent with common assumptions. First, we give a natural strengthening of the CIC definition that serves as a direct substitute for the instance compression scheme in the Harnik--Naor construction. However, we show that even this notion is unlikely to exist.
We then identify a notion of CIC that gives new hope for constructing CRH from one-way functions via instance compression. We observe that this notion is achievable under standard assumptions and, by revisiting the Harnik--Naor proof, demonstrate that it is sufficiently strong to achieve CRH. In fact, we show that our CIC notion is existentially equivalent to CRH.
Beyond Minicrypt, Harnik and Naor showed that a strengthening of instance compression can be used to construct OT and public-key encryption. We rule out the computational analog of this stronger notion by showing that it contradicts the existence of incompressible public-key encryption, which was recently constructed under standard assumptions.
High-Throughput Three-Party DPFs with Applications to ORAM and Digital Currencies
Distributed point functions (DPFs) are increasingly becoming a foundational tool, with applications in both application-specific and general secure computation.
While two-party DPF constructions with satisfactory performance are readily available for those applications, three-party constructions lag behind in both security and efficiency.
In this paper we close this gap and propose the first three-party DPF construction that matches the state-of-the-art two-party DPF on all metrics.
Namely, it is secure against a malicious adversary corrupting both the dealer and one out of the three evaluators, its function shares are of the same size, and its evaluation takes the same time, as in the best two-party DPF.
Compared to the state-of-the-art three-party DPF, our construction enjoys a $40-120\times$ smaller function-share size and shorter evaluation time, for function domains of $2^{16}-2^{40}$, respectively.
Apart from DPFs as a stand-alone tool, our construction finds immediate applications to private information retrieval (PIR), private information writing (PIW), and oblivious RAM (ORAM).
To further showcase its applicability, we design and implement an ORAM with access policy, an extension to ORAMs in which a policy is checked before accessing the underlying database.
The policy we plug in is one suitable for account-based digital currencies, and in particular for central bank digital currencies (CBDCs).
Our protocol offers the first design and implementation of a large scale privacy-preserving account-based digital currency. While previous works supported anonymity sets of 64-256 clients and less than 10 transactions per second (tps), our protocol supports anonymity sets in the millions, performing $\{500,200,58\}$ tps for anonymity sets of $\{2^{16},2^{18},2^{20}\}$, respectively.
Toward that application, we introduce a new primitive called an updatable DPF, which enables direct computation of a dot product between a DPF and a vector; we believe that the updatable DPF and the new dot-product protocol will find interest in other applications.
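The usefulness of the DPF dot product rests on a simple identity: if $f_0, f_1$ are additive shares of the point function ($f_0(x) + f_1(x) = \beta$ if $x = \alpha$, else $0$), then the parties' local inner products with a vector $v$ are additive shares of $\beta \cdot v[\alpha]$. A naive strawman sharing over the integers (linear-size; a real DPF compresses it, and the paper's updatable DPF supports this computation directly):

    import random

    def naive_dpf_gen(alpha, beta, n):
        f0 = [random.randrange(2**32) for _ in range(n)]
        f1 = [-x for x in f0]
        f1[alpha] += beta          # shares now sum to the point function
        return f0, f1

    n, alpha, beta = 16, 9, 1
    v = list(range(100, 100 + n))                # database vector
    f0, f1 = naive_dpf_gen(alpha, beta, n)
    share0 = sum(a * b for a, b in zip(f0, v))   # local to party 0
    share1 = sum(a * b for a, b in zip(f1, v))   # local to party 1
    assert share0 + share1 == beta * v[alpha]    # v[9], without revealing alpha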
Securely Computing One-Sided Matching Markets
Top trading cycles (TTC) is a famous algorithm for trading indivisible goods between a set of agents such that all agents are as happy as possible about the outcome. In this paper, we present a protocol for executing TTC in a privacy-preserving way. To the best of our knowledge, it is the first of its kind. As a technical contribution of independent interest, we suggest a new algorithm for determining all nodes in a functional graph that are on a cycle. The algorithm is particularly well suited for secure implementation in that it requires no branching and no random memory access. Finally, we report on a prototype implementation of the protocol based on somewhat homomorphic encryption.
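A plaintext illustration of the kind of fact such an algorithm can exploit (this is not the paper's algorithm, only the underlying observation): in a functional graph $f$ on $n$ nodes, the image of the $n$-fold iterate of $f$ is exactly the set of nodes lying on a cycle, and an iterate of order $\geq n$ can be built with about $\log_2(n)$ self-compositions, without any data-dependent branching.

    def cycle_nodes(f):
        n = len(f)
        g = list(f)
        k = 1
        while k < n:                       # after i rounds, g = f composed 2^i times
            g = [g[g[v]] for v in range(n)]
            k *= 2
        return sorted(set(g))              # image of f^(>= n) = nodes on cycles

    f = [1, 2, 0, 2, 5, 4]                 # 0->1->2->0 cycle, 4<->5 cycle, 3 is a tail
    print(cycle_nodes(f))                  # [0, 1, 2, 4, 5]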
Optimal Early Termination for Dishonest Majority Broadcast
Deterministic broadcast protocols among $n$ parties tolerating $t$ corruptions require $\min\{f+2, t+1\}$ rounds, where $f \le t$ is the actual number of corruptions in an execution of the protocol. We provide the first protocol which is optimally resilient, adaptively secure, and asymptotically matches this lower bound for any $t<(1-\varepsilon)n$. By contrast, the best known algorithm in this regime by Loss and Nielsen (EUROCRYPT'24) always requires $O(\min\{f^2, t\})$ rounds. Our main technical tool is a generalization of the notion of polarizer introduced by Loss and Nielsen, which allows parties to obtain transferable cryptographic evidence of missing messages with fewer rounds of interaction.
Secure Stateful Aggregation: A Practical Protocol with Applications in Differentially-Private Federated Learning
Recent advances in differentially private federated learning (DPFL) algorithms have found that using correlated noise across the rounds of federated learning (DP-FTRL) yields provably and empirically better accuracy than using independent noise (DP-SGD). While DP-SGD is well-suited to federated learning with a single untrusted central server using lightweight secure aggregation protocols, secure aggregation is not conducive to implementing modern DP-FTRL techniques without assuming a trusted central server. DP-FTRL based approaches have already seen widespread deployment in industry, albeit with a trusted central curator who provides and applies the correlated noise.
To realize a fully private, single untrusted server DP-FTRL federated learning protocol, we introduce secure stateful aggregation: a simple append-only data structure that allows for the private storage of aggregate values and the reading of linear functions of the aggregates. Assuming Ring Learning with Errors, we provide a lightweight and scalable realization of this protocol for high-dimensional data in a new security/resource model, federated MPC, where a powerful persistent server interacts with weak, ephemeral clients. We observe that secure stateful aggregation suffices for realizing DP-FTRL-based private federated learning: improving DPFL utility guarantees over the state of the art while maintaining privacy with an untrusted central party. Our approach has minimal overhead relative to existing techniques which do not yield comparable utility. The secure stateful aggregation primitive and the federated MPC paradigm may be of interest for other practical applications.
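In plaintext form, the functionality being securely realized is just the following interface (the class and method names are ours, for illustration; the protocol's point is that the stored aggregates stay hidden and only the requested linear functions are released):

    class StatefulAggregator:
        def __init__(self, dim):
            self.dim = dim
            self.aggregates = []                    # append-only state

        def aggregate_round(self, client_vectors):
            agg = [sum(col) for col in zip(*client_vectors)]
            self.aggregates.append(agg)             # store, do not reveal

        def read_linear(self, coeffs):
            # release sum_t coeffs[t] * aggregates[t], e.g. the correlated-noise
            # linear combinations that DP-FTRL needs across rounds
            out = [0.0] * self.dim
            for c, agg in zip(coeffs, self.aggregates):
                for i in range(self.dim):
                    out[i] += c * agg[i]
            return out

    s = StatefulAggregator(dim=2)
    s.aggregate_round([[1.0, 2.0], [3.0, 4.0]])
    s.aggregate_round([[0.5, 0.5]])
    print(s.read_linear([1.0, -1.0]))               # [3.5, 5.5]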
$\Sigma$-Check: Compressed $\Sigma$-protocol Theory from Sum-check
The theory of compressed $\Sigma$-protocols [AC20, ACF21] provides a standardized framework for creating efficient $\Sigma$-protocols. This method involves two main phases: first, amortization, which combines multiple instances that satisfy a homomorphic relation into a single instance; and second, Bulletproofs compression [BBB+18], which minimizes communication overhead to a logarithmic scale during the verification of the combined instance. For high-degree polynomial (non-homomorphic) relations, a circuit-based linearization technique is used to transform each instance into a linear relation, resulting in a protocol with at least linear complexity.
In this paper, we provide a direct method to extend the compressed $\Sigma$-protocol theory to polynomial relations, named $\Sigma$-Check. One major contribution of $\Sigma$-Check is to eliminate the linear cost associated with linearization in amortization. To achieve this, we employ a sum-check during this phase to ensure a logarithmic communication cost. To the best of our knowledge, $\Sigma$-Check is the first work to achieve a logarithmic amortization for polynomial relations. Nevertheless, without linearization, the amortized relation may not be linear, which hinders us from using Bulletproofs compression. To overcome this problem, we employ another sum-check during the compression phase to effectively manage high-degree relations. Additionally, we propose several variants of our techniques and adapt them for arithmetic circuit relations. We also demonstrate the practicality of our compressed $\Sigma$-protocol theory through applications such as binary proofs, range proofs, and partial knowledge proofs. Our basic protocols are initially based on the Discrete Logarithm (DL) assumption, and we have further extended these to incorporate the Strong-RSA assumption and the Generalized Discrete Logarithm Representation (GDLR) assumption. Our work expands the scope of compressed $\Sigma$-protocol theory and provides a robust foundation for real-world cryptographic applications.
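Since both phases lean on the sum-check protocol, we recall its round structure in a bare-bones form (a toy multilinear instance over a small prime field; this illustrates the primitive only, not the paper's protocols):

    import random

    P = 2**31 - 1

    def sumcheck(g, v):
        # g: multilinear function of v field elements; prover claims the sum
        # of g over the Boolean hypercube {0,1}^v
        claim = sum(g([(i >> j) & 1 for j in range(v)]) for i in range(2**v)) % P
        fixed = []
        for rnd in range(v):
            # prover's round polynomial is linear; send its values at 0 and 1
            def partial(t):
                rest = v - rnd - 1
                return sum(g(fixed + [t] + [(i >> j) & 1 for j in range(rest)])
                           for i in range(2**rest)) % P
            p0, p1 = partial(0), partial(1)
            assert (p0 + p1) % P == claim        # verifier's consistency check
            r = random.randrange(P)              # verifier's challenge
            claim = (p0 + r * (p1 - p0)) % P     # round polynomial evaluated at r
            fixed.append(r)
        assert g(fixed) % P == claim             # final oracle query to g

    sumcheck(lambda x: (x[0] * x[1] + 3 * x[2]) % P, 3)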
AD-MPC: Fully Asynchronous Dynamic MPC with Guaranteed Output Delivery
Traditional secure multiparty computation (MPC) protocols presuppose a fixed set of participants throughout the computational process. To address this limitation, Fluid MPC [CRYPTO 2021] presents a dynamic MPC model that allows parties to dynamically join or exit during circuit evaluation. However, existing dynamic MPC protocols can guarantee safety but not liveness within asynchronous networks. This paper introduces ΠAD-MPC, a fully asynchronous dynamic MPC protocol. ΠAD-MPC ensures both safety and liveness with optimal resilience, tolerating t corrupted participants out of n=3t+1. To achieve this, we develop a novel asynchronous transfer protocol ΠTrans and a preprocessing protocol ΠAprep specifically tailored to dynamic environments. In contrast to most dynamic MPC protocols, which achieve security with abort in synchronous networks, ΠAD-MPC guarantees output delivery in asynchronous networks with optimal resilience, thus enhancing robustness. We provide a formal security proof of ΠAD-MPC under the Universal Composability (UC) framework. Furthermore, an extensive evaluation involving up to 20 geographically distributed nodes demonstrates the protocol's practical performance and its ability to reliably deliver outputs in asynchronous dynamic settings. Compared to the state-of-the-art Fluid MPC, ΠAD-MPC achieves comparable performance while offering significantly enhanced security guarantees.
How to Construct Random Unitaries
The existence of pseudorandom unitaries (PRUs)---efficient quantum circuits that are computationally indistinguishable from Haar-random unitaries---has been a central open question, with significant implications for cryptography, complexity theory, and fundamental physics. In this work, we close this question by proving that PRUs exist, assuming that any quantum-secure one-way function exists. We establish this result for both (1) the standard notion of PRUs, which are secure against any efficient adversary that makes queries to the unitary $U$, and (2) a stronger notion of PRUs, which are secure even against adversaries that can query both the unitary $U$ and its inverse $U^\dagger$. In the process, we prove that any algorithm that makes queries to a Haar-random unitary can be efficiently simulated on a quantum computer, up to inverse-exponential trace distance.
One-Shot Native Proofs of Non-Native Operations in Incrementally Verifiable Computations
Proving non-native operations is still a bottleneck in existing incrementally verifiable computations. Prior attempts to solve this issue either simply improve the efficiency of proofs of non-native operations or require folding instances in each curve of a cycle. This paper shows how to avoid in-circuit proofs of non-native operations altogether in the incremental steps, recording them instead in some auxiliary proof information. These operations are proved natively at the end of the computation, at the cost of only a small constant number (four or five) of non-native field multiplications to go from a non-native operation record to a native one. To formalise the security guarantees of the scheme, the paper introduces the concept of incrementally verifiable computation with auxiliary proof information, a relaxation of the standard notion of incrementally verifiable computation. Knowledge-soundness now guarantees the correctness of a computation only if the piece of information attached to a proof is valid. This new primitive is thus only to be used if there is an efficient mechanism to verify the validity of that information. This relaxation is exactly what enables a construction which does not require in-circuit proofs of non-native operations during the incremental part of the computation. Instantiated in the Plonk arithmetisation, the construction leads to savings in circuit-gate count (compared to standard folding-based constructions) of at least one order of magnitude, and up to a factor of 50.
Towards Practical Oblivious Map
Oblivious map (OMAP) is an important component in encrypted databases, utilized to safeguard against the server inferring sensitive information about a client's encrypted key-value store based on access patterns. Despite its widespread usage and importance, existing OMAP solutions face practical challenges, including the need for a large number of interaction rounds between the client and server, as well as substantial communication bandwidth requirements. For example, the state-of-the-art protocol named OMIX++ (VLDB 2024) still requires $O(\log{n})$ interaction rounds and $O(\log^2{n})$ communication bandwidth per access, where $n$ denotes the total number of key-value pairs stored.
In this work, we introduce more practical and efficient OMAP constructions. Consistent with all prior OMAPs, our constructions adapt only tree-based Oblivious RAM (ORAM) and oblivious data structures (ODS) to achieve OMAP, for enhanced practicality. In complexity, our approach needs $O(\log n/\log\log n)+O(\log\lambda)$ interaction rounds and $O(\log^2 n/\log\log n)+O(\log\lambda\log n)$ communication bandwidth per data access, where $\lambda$ is the security parameter. This new complexity results from our two main contributions. First, unlike prior works that rely solely on search trees, we design a novel framework for OMAP that combines a hash table with search trees. Second, we propose a more efficient tree-based ORAM named DAORAM, which is of significant independent interest. This newly developed ORAM noticeably accelerates our constructions, as it supports obliviously accessing hash tables much more efficiently. We implement both our proposed constructions and prior methods to experimentally demonstrate that our constructions substantially outperform prior methods in terms of efficiency.
Multiplying Polynomials without Powerful Multiplication Instructions (Long Paper)
We improve the performance of the lattice-based cryptosystem Dilithium on Cortex-M3, a platform with expensive multiplications. Our contribution is two-fold: (i) We generalize Barrett multiplication and show that the resulting shape-independent modular multiplication performs comparably to long multiplication on some platforms without special hardware when precomputation is free. We call a modular multiplication “shape-independent” if its correctness and efficiency depend only on the magnitude of the moduli and not on their shapes. This was unknown in the literature, even though modular multiplication has been studied for more than 40 years; shape-independent modular multiplications in the literature often perform several times slower than long multiplications, even ignoring the cost of precomputation. (ii) We show that polynomial multiplications based on the Nussbaumer fast Fourier transform and Toom–Cook over $\mathbb{Z}_{2^k}$ perform best when modular multiplications are expensive and $k$ is not very close to the arithmetic precision.
For practical evaluation, we implement assembly programs for the polynomial arithmetic used in the digital signature Dilithium on Cortex-M3. For the modular multiplications in Dilithium, our generalized Barrett multiplications are 1.92 times faster than the state-of-the-art assembly-optimized Montgomery multiplications, leading to 1.38−1.51 times faster Dilithium NTT/iNTT. Along with the improvement in accumulating products, the core polynomial arithmetic matrix-vector multiplications are 1.71−1.77 times faster. We further apply the FFT-based polynomial multiplications over $\mathbb{Z}_{2^k}$ to the challenge polynomial multiplication $c t_0$, leading to 1.31 times faster computation for $c t_0$.
We additionally apply these ideas to Saber on Cortex-M3 and demonstrate their benefit to Dilithium and Saber in our 8-bit AVR environment. For Saber on Cortex-M3, we show that matrix-vector multiplications with FFT-based polynomial multiplications over $\mathbb{Z}_{2^k}$ are 1.42−1.46 times faster than ones with NTT-based polynomial multiplications over NTT-friendly coefficient rings. When moving to a platform with smaller arithmetic precision, such as 8-bit AVR, we improve the matrix-vector multiplication of Dilithium with our Barrett-based NTT/iNTT by a factor of 1.87−1.89. As for Saber in our 8-bit AVR environment, we show that matrix-vector multiplications with NTT-based polynomial multiplications over NTT-friendly coefficient rings are faster than polynomial multiplications over $\mathbb{Z}_{2^k}$, due to the large $k$ in Saber.
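For reference, textbook Barrett modular multiplication, which the paper generalizes, replaces the division in $a \cdot b \bmod q$ with a multiply and a shift using a one-time precomputation (parameters below are illustrative):

    Q = 8380417                  # Dilithium's prime, as an example modulus
    K = 62
    R = (1 << K) // Q            # one-time precomputation of ~2^K / Q

    def barrett_mulmod(a, b):
        t = a * b
        quo = (t * R) >> K       # approximates floor(t / Q), off by at most 1
        r = t - quo * Q          # congruent to a*b mod Q, lies in [0, 2Q)
        return r - Q if r >= Q else r   # single conditional correction

    assert barrett_mulmod(123456, 654321) == (123456 * 654321) % Q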
SIMD-style Sorting of Integer Sequence in RLWE Ciphertext
This article discusses fully homomorphic encryption and homomorphic sorting. Homomorphic encryption is an encryption technique that allows operations to be performed on ciphertexts, such that the decrypted result equals the result of performing the same operations on the plaintexts. Homomorphic sorting is an important problem in homomorphic encryption. There is a substantial body of work on homomorphic sorting, but in these works each integer of a sequence is encrypted in a separate ciphertext; sorting sequences of integers encrypted in a single ciphertext remains largely unstudied. This paper addresses the sorting problem by utilizing Single Instruction Multiple Data (SIMD) technology to provide new algorithms with improved computational efficiency. The content includes the following aspects.
For plaintexts encrypted word-wise, this paper studies sorting an integer sequence stored in one or multiple ciphertexts, and proposes a new SIMD-style homomorphic sorting algorithm. In terms of theoretical complexity, compared with three existing sorting algorithms, namely homomorphic sorting by polynomial computation over a finite field, by TFHE bootstrapping, and by Liu-Wang parallel bootstrapping, the new algorithm achieves a speedup of $O((\log n)^2)$, $O(n(\log n)^3)$, and $O((\log n)^4)$, respectively, for sorting a plaintext integer sequence of length $n$. In experiments, the new algorithm is 1.7-9.2 times faster than the three sorting algorithms.
The third situation involves sorting multiple shorter sequences simultaneously, all of which can be stored in a single ciphertext. For this situation, this paper proposes a method for calculating the ord function, and uses this method to provide a new sorting algorithm. In terms of theoretical complexity, if the total number of numbers to be sorted is $n$ and there are $n^r$ numbers in each sequence, the new algorithm is faster than the three existing sorting algorithms, with speedups of $O(n^{1-r}(\log n)^2)$, $O(n^{2-r}(\log n)^3)$, and $O(n^{1-r}(\log n)^4)$, respectively. In experiments, the new algorithm is 2.1-6.4 times faster than existing sorting algorithms.
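The SIMD constraint means every step must be expressed as the same data-independent operation applied across all slots. A plaintext caricature of this style is the vectorized rank sort below (not the paper's algorithm, only an illustration of branch-free, compare-everything-at-once sorting):

    import numpy as np

    def simd_rank_sort(a):
        a = np.asarray(a)
        less = a[:, None] < a[None, :]               # all comparisons at once
        ties = a[:, None] == a[None, :]
        earlier_ties = np.triu(ties, 1)              # break ties by position
        rank = less.sum(axis=0) + earlier_ties.sum(axis=0)
        out = np.zeros_like(a)
        out[rank] = a                                # scatter each item to its slot
        return out

    print(simd_rank_sort([5, 3, 8, 3, 1]))           # [1 3 3 5 8]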
Curve Forests: Transparent Zero-Knowledge Set Membership with Batching and Strong Security
Zero-knowledge for set membership is a building block at the core of several privacy-aware applications, such as anonymous payments, credentials and whitelists.
We propose a new efficient construction for the batching variant of the problem, where a user intends to show knowledge of several elements (a batch) in a set without any leakage on the elements. Our construction is transparent (it does not require a trusted setup) and based on Curve Trees by Campanelli, Hall-Andersen and Kamp (USENIX 2023). Our first technical contribution consists of techniques to amortize Curve Tree costs in the batching setting, for which we crucially exploit its algebraic properties. Even for small batches we obtain $\approx 2\times$ speedups for proving, $\approx3\times$ speedups for verification, and a $\approx 60\%$ reduction in proof size. Our second contribution is a modification of a key technical requirement in Curve Trees (related to so-called "permissible points") which arguably simplifies its design and achieves a stronger security property. In particular, our construction is secure even when the commitment to the set is provided by the adversary (in contrast to the honestly generated commitment required by the original Curve Trees).
Parallel Execution Fee Mechanisms
This paper investigates how pricing schemes can achieve efficient allocations in blockchain systems featuring multiple transaction queues under a global capacity constraint. I model a capacity-constrained blockchain where users submit transactions to different queues—each representing a submarket with unique demand characteristics—and decide to participate based on posted prices and expected delays. I find that revenue maximization tends to allocate capacity to the highest-paying queue, whereas welfare maximization generally serves all queues. Optimal relative pricing of different queues depends on factors such as market size, demand elasticity, and the balance between local and global congestion. My results have implications for the implementation of local congestion pricing for evolving blockchain architectures, including parallel transaction execution, directed acyclic graph (DAG)-based systems, and multiple concurrent proposers.
Fiat-Shamir Goes Rational
This paper investigates the open problem of how to construct non-interactive rational proofs. Rational proofs, introduced by Azar and Micali (STOC 2012), are a model of interactive proofs where a computationally powerful server can be rewarded by a weaker client for running an expensive computation $f(x)$. The honest strategy is enforced by design when the server is rational: any adversary claiming a false output $y \neq f(x)$ will lose money on expectation.
Rational proof constructions have appealing properties: they are simple, feature an extremely efficient verifier—reading only a sublinear number of bits of the input $x$—and do not require any collateral from the prover. Currently, all non-trivial constructions of rational proofs are interactive. Developing non-interactive rational protocols would be a game-changer, making them practical for use in smart contracts, one of their most natural applications.
Our investigation revolves around the Fiat-Shamir transform, a common approach to compiling interactive proofs into their non-interactive counterparts. We are the first to tackle the question: "Can Fiat-Shamir be successfully applied to rational protocols?"
We find negative evidence by showing that, after applying Fiat-Shamir in the random oracle model to two representative protocols from the literature (AM13 and CG15), these protocols lose their security guarantees. Our findings point to more general impossibility theorems, which we leave as future work.
To achieve our results we first need to address a fundamental technical challenge: the standard Fiat-Shamir transform does not apply to protocols where the verifier has only oracle access to its input $x$ (a core feature of the rational setting). We propose two versions of Fiat-Shamir for this setting, a "vanilla" variant and a "stronger" variant (where the verifier has access to an honestly computed digest of its input). We show that neither variant is sufficient to ensure that AM13 or CG15 are secure in the non-interactive setting.
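In skeleton form, the two challenge-derivation rules compared above are as follows (the helper names are ours; the actual protocols are interactive rational proofs whose messages we abstract as bytes):

    import hashlib

    # Vanilla variant: the verifier cannot hash x (it only has oracle access),
    # so the challenge is derived from the prover's message alone.
    def fs_vanilla(prover_msg: bytes) -> bytes:
        return hashlib.sha256(prover_msg).digest()

    # Stronger variant: the challenge additionally binds an honestly computed
    # digest of the input x.
    def fs_with_digest(x_digest: bytes, prover_msg: bytes) -> bytes:
        return hashlib.sha256(x_digest + prover_msg).digest()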
Finally, as an additional contribution, we provide a novel, and arguably simpler, definition for the soundness property of rational proofs (interactive or non-interactive) of independent interest.
A Tight Lower Bound on the TdScrypt Trapdoor Memory-Hard Function
A trapdoor Memory-Hard Function is a function that is memory-hard to evaluate for any party who does not have a trapdoor, but substantially less expensive to evaluate with the trapdoor. Biryukov and Perrin (ASIACRYPT 2017) introduced the first candidate trapdoor Memory-Hard Function, called Diodon, which modifies the Memory-Hard Function Scrypt by replacing a hash chain with repeated squaring modulo a composite number $N=pq$. The trapdoor, which consists of the prime factors $p$ and $q$, allows one to compute the function with significantly reduced cumulative memory cost (CMC) $O(n \log n \log^2 N)$, where $n$ denotes the running-time parameter, e.g., the length of the hash chain or repeated-squaring chain. By contrast, the best-known algorithm to compute Diodon without the trapdoor has CMC $O(n^2\log N)$. Auerbach et al. (EUROCRYPT 2024) provided the first provable lower bound on the CMC of TdScrypt, a specific instantiation of Diodon. In particular, in idealized models, they proved that the CMC of TdScrypt is $\Omega(\frac{n^2}{\log n}\log N)$, which almost matches the upper bound $O(n^2\log N)$ but is off by a multiplicative $\log n$ factor. In this work, we show how to tighten the analysis of Auerbach et al. (EUROCRYPT 2024) and eliminate the gap. In particular, our results imply that TdScrypt has CMC at least $\Omega(n^2\log N)$.
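The trapdoor asymmetry comes down to one line of number theory: without the factorization of $N$, the map $x \mapsto x^2 \bmod N$ must be iterated $n$ times sequentially, while with $\varphi(N)$ the whole chain collapses to a single exponentiation. A sketch with toy primes (Diodon/TdScrypt additionally interleave the chain with hashing and memory accesses):

    p, q = 1000003, 1000033            # toy primes; real instances are far larger
    N, phi = p * q, (p - 1) * (q - 1)
    x, n = 5, 10_000                   # x must be coprime to N

    y_slow = x
    for _ in range(n):                 # public evaluation: n sequential squarings
        y_slow = y_slow * y_slow % N

    y_fast = pow(x, pow(2, n, phi), N) # trapdoor evaluation: two modular pows
    assert y_slow == y_fast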
Optimizing Liveness for Blockchain-Based Sealed-Bid Auctions in Rational Settings
Blockchain-based auction markets offer stronger fairness and transparency compared to their centralized counterparts. Deposits and sealed-bid formats are usually applied to enhance security and privacy. However, to the best of our knowledge, a formal treatment of deposit-enabled sealed-bid auctions remains lacking in the cryptographic literature. To address this gap, we first propose a decentralized anonymous deposited-bidding (DADB) scheme, providing formal syntax and security definitions. Unlike existing approaches that rely on smart contracts, our construction utilizes a mainchain-sidechain structure that is also compatible with the extended UTXO model. This design further allows us to develop a consensus mechanism on the sidechain dedicated to securely recording bids for allocation. Specifically, we build atop an Algorand-style protocol and integrate a novel block qualification mechanism into the block selection. Consequently, we prove, from a game-theoretic perspective, that our design optimizes liveness latency for rational users who want to join the auction, even without explicit incentives (e.g., fees) for including bids. Finally, our implementation results demonstrate the potential performance degradation without the block qualification mechanism.
Fuzzy PSI via Oblivious Protocol Routing
In private set intersection (PSI), two parties who each hold sets of items can learn their intersection without revealing anything about their other items. Fuzzy PSI corresponds to a relaxed variant that reveals pairs of items which are ``close enough,'' with respect to some distance metric. In this paper we propose a new protocol framework for fuzzy PSI, compatible with arbitrary distance metrics. We then show how to efficiently instantiate our framework for $\ell_1$, $\ell_2$, and $\ell_\infty$ metrics, in a way that uses exclusively cheap symmetric-key operations. One notable feature of our protocol is that it has only logarithmic dependency on the distance threshold, whereas most other protocols have linear (or higher) dependency. For many reasonable combinations of parameters, our protocol has the lowest communication cost of existing fuzzy PSI protocols.
Simplification Issues of An Authentication and Key Agreement Scheme for Smart Grid
Key agreement and public-key encryption are two elementary cryptographic primitives, suitable for different scenarios, but their differences are still unfamiliar to some researchers. In this note, we show that the key agreement scheme of Safkhani et al. [Peer-to-Peer Netw. Appl. 15(3), 1595-1616, 2022] is a public-key encryption in disguise. We stress that the ultimate purpose of key agreement is to establish a shared key for some symmetric-key encryption. We also present a simplification of the scheme by removing some repetitive computations. To the best of our knowledge, this is the first work to clarify the fundamental differences between the two primitives. The techniques developed in this note will be helpful for future work on designing such schemes.
Maximizing the Utility of Cryptographic Setups: Secure PAKEs, with either functional RO or CRS
For Password-Based Authenticated Key Exchange (PAKE), an idealized setup such as a random oracle (RO) or a trusted setup such as a common reference string (CRS) is a must in the universal composability (UC) framework (Canetti, FOCS 2001). Given the potential failure of a CRS or RO setup, it is natural to consider distributing trust among the two setups, resulting in a CRS-or-RO setup (i.e., CoR-setup).
However, the infeasibility result highlighted by Katz et al. (PODC 2014) suggested that it is impossible to construct UC-secure PAKE protocols with a straightforward CoR-setup (i.e., either the CRS is functional but the RO is compromised, or the RO is functional but the CRS is compromised). To circumvent this impossibility result, we investigate how to design UC-secure PAKE protocols with a fine-grained CoR-setup, where either the CRS is functional but the RO is non-functional, or vice versa. Unlike the straightforward CoR-setup, a fine-grained non-functional setup is not necessarily completely compromised and fully controlled by the adversary; instead, we consider that this non-functional setup may still offer certain security properties. Certainly, the non-functional setup alone should be useless for achieving UC-security.
We present a UC-secure PAKE protocol under two conditions: either the CRS is functional while the RO is non-functional (falling back to a collision-resistant hash function), or the RO is functional while the CRS is non-functional (falling back to a global CRS). Before presenting our construction, we first prove that a global CRS setup alone is insufficient for achieving UC-secure PAKE. This impossibility result highlights the non-triviality of our approach.
To obtain our construction, we introduce several techniques as follows:
(1) We propose a new variant of Non-Interactive Key Exchange (NIKE), called homomorphic NIKE with associated functions, which captures key properties of existing RO-based PAKE protocols. This new primitive serves as an important component in our construction.
(2) We develop a ``Brute Force'' extraction strategy which allows us to provide security analysis for our UC-secure PAKE with a fine-grained CoR-setup for polynomial-sized password spaces.
(3) We introduce a novel password space extension technique that enables the expansion of PAKE protocols from polynomial-sized to arbitrary-sized password spaces.
(4) Finally, to ensure provable security for our password space extension in UC-secure PAKEs, we modify existing PAKE functionalities to prevent responses that reveal the correctness of password guesses. This is a reasonable adjustment, as our protocol provides only implicit authentication.
We further present a PAKE protocol in the BPR framework (Bellare, Pointcheval, Rogaway, EuroCrypt 2000), assuming either the CRS is functional while the RO falls back to a collision-resistant hash function, or the RO is functional but the CRS trapdoor is allowed to be learned by the adversary.
Efficient Quantum Pseudorandomness from Hamiltonian Phase States
Quantum pseudorandomness has found applications in many areas of quantum information, ranging from entanglement theory, to models of scrambling phenomena in chaotic quantum systems, to, more recently, the foundations of quantum cryptography. Kretschmer (TQC '21) showed that both pseudorandom states and pseudorandom unitaries exist even in a world without classical one-way functions. To this day, however, all known constructions require classical cryptographic building blocks which are themselves synonymous with the existence of one-way functions, and which are also challenging to realize on realistic quantum hardware.
In this work, we seek to make progress on both of these fronts simultaneously---by decoupling quantum pseudorandomness from classical cryptography altogether.
We introduce a quantum hardness assumption called the \emph{Hamiltonian Phase State} ($\mathsf{HPS}$) problem, which is the task of decoding output states of a random instantaneous quantum polynomial-time (IQP) circuit. Hamiltonian phase states can be generated very efficiently using only Hadamard gates, single-qubit $Z$ rotations and CNOT circuits. We show that the hardness of our problem reduces to a worst-case version of the problem, and we provide evidence that our assumption is plausibly fully quantum; meaning, it cannot be used to construct one-way functions.
We also show information-theoretic hardness when only few copies of $\mathsf{HPS}$ are available by proving an approximate $t$-design property of our ensemble.
Finally, we show that our $\mathsf{HPS}$ assumption and its variants allow us to efficiently construct many pseudorandom quantum primitives, ranging from pseudorandom states, to quantum pseudoentanglement, to pseudorandom unitaries, and even primitives such as public-key encryption with quantum keys. Along the way, we analyze a natural iterative construction of pseudorandom unitaries which resembles a candidate of Ji, Liu, and Song (CRYPTO'18).
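To make the gate-set claim concrete, here is a minimal state-vector toy of an IQP-style phase state, assuming the textbook form $H^{\otimes n} D H^{\otimes n}|0^n\rangle$ with $D$ built from parity-phase terms (each such term compiles to CNOTs plus one single-qubit $Z$ rotation); the sizes, angle, and term distribution below are illustrative choices, not the paper's parameters.

```python
# Toy state-vector simulation of an IQP-style phase state |psi> = H^n D H^n |0^n>.
# Each diagonal term applies phase exp(i*theta) on basis states x with odd
# parity <a_j, x>; such a term compiles to CNOTs plus one single-qubit Z
# rotation. Sizes, angle and term distribution are illustrative only.
import numpy as np

n, m = 10, 30                                  # qubits, phase terms (toy sizes)
rng = np.random.default_rng(0)

def hadamard_all(psi):
    """Apply H to every qubit via a fast Walsh-Hadamard transform."""
    for q in range(n):
        psi = psi.reshape(-1, 2, 2 ** q)
        a, b = psi[:, 0, :].copy(), psi[:, 1, :].copy()
        psi[:, 0, :], psi[:, 1, :] = a + b, a - b
        psi = psi.reshape(-1)
    return psi / np.sqrt(2) ** n

xs = np.arange(2 ** n)
masks = rng.integers(1, 2 ** n, size=m)        # random parity masks a_j
theta = np.pi / 4                              # one fixed Z-rotation angle (toy)
parity = np.array([[bin(x & a).count("1") & 1 for x in xs] for a in masks])
diag = np.exp(1j * theta * parity.sum(axis=0)) # D: product of all phase terms

psi = np.zeros(2 ** n, dtype=complex)
psi[0] = 1.0                                   # |0...0>
psi = hadamard_all(diag * hadamard_all(psi))   # H^n D H^n |0^n>
print("norm =", round(np.linalg.norm(psi), 6)) # 1.0: a valid pure state
```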
Modular Reduction in CKKS
The Cheon-Kim-Kim-Song (CKKS) scheme is renowned for its efficiency in encrypted computing over real numbers. However, it lacks an important functionality that most exact schemes have: an efficient modular reduction. This stems from a fundamental difference in encoding structure: CKKS encodes messages in the least significant bits, while the other schemes encode them in the most significant bits (or in an equivalent manner). As a result, CKKS enjoys efficient rescaling but loses the inherent ability to perform modular reduction.
Our key observation is that, at the very bottom modulus, plaintexts encoded in the least significant bits can still enjoy the inherent modular reduction of RLWE. We suggest incorporating modular reduction as a primary operation for CKKS and explore its impact on efficiency. We construct a novel homomorphic modular reduction algorithm using the discrete bootstrapping of Bae et al. [Asiacrypt'24] and a new discretization algorithm based on modulus switching. A key advantage of our modular reduction is that its computational complexity grows only sublinearly ($O(\log k)$) as we increase the input range $[0,k)$, which is asymptotically better than the state-of-the-art, which requires at least $O(k)$.
We validate our algorithms with concrete experiments. Notably, our modulo-1 function for the input range $[0, 2^{20})$ takes only 44.9 seconds with 13.3 bits of (mean) precision on a single CPU thread. Recall that modular reduction over such a large range was almost infeasible in previous works, as they need to evaluate a polynomial of degree $> 2^{20}$ (or equivalent). As an application of our method, we compare a bit decomposition based on our framework with the state-of-the-art method of Drucker et al. [J.Cryptol'24]. Our method is $7.1 \times$ faster while reducing the failure probability by more than two orders of magnitude.
Bootstrapping Small Integers With CKKS
The native plaintexts of the Cheon-Kim-Kim-Song (CKKS) fully homomorphic encryption scheme are vectors of approximations to complex numbers. Drucker et al. [J. Cryptol.'24] have shown how to use CKKS to efficiently perform computations on bits and small bit-length integers, by relying on their canonical embeddings into the complex plane. For small bit-length integers, Chung et al. [IACR eprint'24] recently suggested relying instead on an embedding into complex roots of unity, to gain numerical stability and efficiency. Both works use CKKS in a black-box manner.
Inspired by the design by Bae et al. [Eurocrypt'24] of a dedicated bootstrapping algorithm for ciphertexts encoding bits, we propose a CKKS bootstrapping algorithm, $\mathsf{SI\mbox{-}BTS}$ (small-integer bootstrapping), for ciphertexts encoding small bit-length integers.
For this purpose, we build upon the DM/CGGI-to-CKKS conversion algorithm from Boura et al. [J. Math. Cryptol.'20], to bootstrap canonically embedded integers to integers embedded as roots of unity. $\mathsf{SI\mbox{-}BTS}$ allows functional bootstrapping: it can evaluate an arbitrary function of its input while bootstrapping. It may also be used to batch-(functional-)bootstrap multiple DM/CGGI ciphertexts. For example, its amortized cost for evaluating an 8-bit look-up table on $2^{12}$ DM/CGGI ciphertexts is 3.75ms (single-thread CPU, 128-bit security).
We adapt $\mathsf{SI\mbox{-}BTS}$ to simultaneously bootstrap multiple CKKS ciphertexts for bits. The resulting $\mathsf{BB\mbox{-}BTS}$ algorithm (batch-bits bootstrapping) makes it possible to decrease the amortized cost of a binary gate evaluation. Compared to Bae et al., it gives a 2.4x speed-up.
Quantum State Group Actions
Cryptographic group actions are a leading contender for post-quantum cryptography, and have also been used in the development of quantum cryptographic protocols. In this work, we explore quantum group actions, which consist of a group acting on a set of quantum states. We show the following results:
1. In certain settings, statistical (even query bounded) security is impossible, analogously to post-quantum classical group actions.
2. We construct quantum state group actions and prove that many computational problems that have been proposed by cryptographers hold for them. Depending on the construction, our proofs are either unconditional, rely on LWE, or rely on the quantum random oracle model. While our analysis does not directly apply to classical group actions, we argue it gives at least a sanity check that there are no obvious flaws in the post-quantum assumptions made by cryptographers.
3. Our quantum state group action allows for unifying two existing quantum money schemes: those based on group actions, and those based on non-collapsing hashes. We also explain how they can unify classical and quantum key distribution.
RPO-M31 and XHash-M31: Efficient Hash Functions for Circle STARKs
We present two new arithmetization-oriented hash functions based on RPO [Ashur, Kindi, Meier, Szepieniec, Threadbare; ePrint 2022/1577] and XHash-12 [Ashur, Bhati, Kindi, Mahzoun, Perrin; ePrint 2023/1045], adapted for $p=2^{31}-1$ and ready to use in Circle STARKs [Habock, Levit, Papini; ePrint 2024/278].
On Constructing Pseudorandom Involutions: Feistel variants using a single round function
An involution is a permutation that is its own inverse. Involutions have attracted plenty of attention in the cryptographic community due to their advantage in hardware implementations. In this paper, we reconsider constructing {\it pseudorandom involutions}. We demonstrate two constructions.
First, the 4-round Feistel network {\it using the same random function (Feistel-SF) in every round} is a pseudorandom involution. This shows that the Feistel-SF construction still provides non-trivial cryptographic strength. To complement this, we also show the insecurity of the 3-round Feistel-SF by exhibiting an attack.
Second, a ``mirrored'' variant of the Naor-Reingold construction with component reuse yields a pseudorandom involution.
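As a quick sanity check on the first construction's involution structure: under the common convention that the final half-swap is omitted (an assumption here, as the paper's exact convention may differ), the 4-round single-function Feistel network is an involution for purely structural reasons, since the composition is a palindrome in two involutions, whatever the round function is. A minimal sketch, with a hash standing in for the random round function:

```python
# Sanity check: 4-round Feistel with a single round function F and the final
# swap omitted is an involution for ANY F: the composition tau.sigma.tau.sigma.
# tau.sigma.tau is a palindrome in the two involutions tau and sigma.
# F below is an arbitrary stand-in for the random round function.
import hashlib, os

W = 8  # half-block width in bytes (toy choice)

def F(x: bytes) -> bytes:
    return hashlib.sha256(b"round-function" + x).digest()[:W]

def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))

def feistel_sf(L, R):
    for rnd in range(4):
        L = xor(L, F(R))          # tau: (L, R) -> (L xor F(R), R)
        if rnd < 3:               # sigma: swap halves, omitted in the last round
            L, R = R, L
    return L, R

for _ in range(1000):
    L, R = os.urandom(W), os.urandom(W)
    assert feistel_sf(*feistel_sf(L, R)) == (L, R)   # P(P(x)) == x
print("4-round Feistel-SF (no final swap) is an involution on all samples")
```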
Efficient Boolean-to-Arithmetic Mask Conversion in Hardware
Masking schemes are key in thwarting side-channel attacks due to their robust theoretical foundation. Transitioning from Boolean to arithmetic (B2A) masking is a necessary step in various cryptographic schemes, including hash functions, ARX-based ciphers, and lattice-based cryptography. While there exists a significant body of research focusing on B2A software implementations, studies pertaining to hardware implementations are quite limited, with the majority dedicated solely to creating efficient Boolean masked adders. In this paper, we present first- and second-order secure hardware implementations to perform B2A mask conversion efficiently without using masked adder structures. We first introduce a first-order secure low-latency gadget that executes a $\mathsf{B2A}_{2^k}$ conversion in a single cycle. Furthermore, we propose a second-order secure $\mathsf{B2A}_{2^k}$ gadget that has a latency of only 4 clock cycles. Both gadgets are independent of the input word size $k$. We then show how these new primitives lead to improved $\mathsf{B2A}_q$ hardware implementations that perform a B2A mask conversion of integers modulo an arbitrary number. Our results show that our new gadgets outperform comparable solutions by more than an order of magnitude in terms of resource requirements and are at least 3 times faster in terms of latency and throughput. All gadgets have been formally verified and proven secure in the glitch-robust PINI security model. We additionally confirm the security of our gadgets on an FPGA platform using practical TVLA tests.
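The paper's gadgets are hardware-oriented and secret-shared throughout; for orientation only, here is the classic first-order software B2A conversion of Goubin (CHES 2001), which turns Boolean shares $x = x' \oplus r$ into an arithmetic share $A$ with $x = A + r \bmod 2^k$ in a constant number of operations:

```python
# First-order Boolean-to-arithmetic conversion (Goubin, CHES 2001).
# Input: Boolean shares (xp, r) with x = xp ^ r.
# Output: arithmetic share A with x = (A + r) mod 2^k.
# Correctness relies on Phi(a, b) = ((a ^ b) - b) mod 2^k being affine in b
# over GF(2). Classic software baseline, not the hardware gadgets of the paper.
import secrets

def b2a_goubin(xp: int, r: int, k: int) -> int:
    mask = (1 << k) - 1
    G = secrets.randbelow(1 << k)   # fresh randomness: decorrelates every
                                    # intermediate value from the secret x
    T = xp ^ G
    T = (T - G) & mask
    T ^= xp
    G ^= r
    A = xp ^ G
    A = (A - G) & mask
    return A ^ T

k = 32
for _ in range(10000):
    x = secrets.randbelow(1 << k)
    r = secrets.randbelow(1 << k)
    A = b2a_goubin(x ^ r, r, k)
    assert (A + r) & ((1 << k) - 1) == x
print("B2A ok")
```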
Fully Secure Searchable Encryption from PRFs, Pairings, and Lattices
Searchable encryption is a cryptographic primitive that allows us to perform searches on encrypted data. Searchable encryption schemes require that ciphertexts do not leak information about keywords. However, most of the existing schemes do not achieve the security notion that trapdoors do not leak information. Shen et al. (TCC 2009) proposed a security notion called full security, which includes both ciphertext privacy and trapdoor privacy, but there are few fully secure constructions. Full security is defined in the secret-key setting, since it is known that public-key schemes cannot achieve trapdoor privacy in principle.
In this paper, we construct a query-bounded fully secure scheme from pseudorandom functions. In addition, we propose three types of efficient (unbounded) fully secure schemes: one is based on bilinear groups, and the others are based on lattices. We then analyze the existing constructions. First, we simplify the Cheng et al. scheme (Information Sciences 2023) and prove its security; this scheme had not previously been proved secure. Second, we show that the Li-Boyen pairing-based scheme (IACR CiC 2024) does not achieve trapdoor privacy, contrary to what is claimed.
Sparrow: Space-Efficient zkSNARK for Data-Parallel Circuits and Applications to Zero-Knowledge Decision Trees
Space-efficient SNARKs aim to reduce the prover's space overhead, which is one of the main obstacles for deploying SNARKs in practice, as it can be prohibitively large (e.g., orders of magnitude larger than natively performing the computation). In this work, we propose Sparrow, a novel space-efficient zero-knowledge SNARK for data-parallel arithmetic circuits with two attractive features: (i) it is the first space-efficient scheme where, for a given field, the prover overhead increases with a multiplicative sublogarithmic factor as the circuit size increases, and (ii) compared to prior space-efficient SNARKs that work for arbitrary arithmetic circuits, it achieves prover space asymptotically smaller than the circuit size itself. Our key building block is a novel space-efficient sumcheck argument with improved prover time which may be of independent interest. Our experimental results for three use cases (arbitrary data parallel circuits, multiplication trees, batch SHA256 hashing) indicate Sparrow outperforms the prior state-of-the-art space-efficient SNARK for arithmetic circuits Gemini (Bootle et al., EUROCRYPT'22) by $3.2$-$28.7\times$ in total prover space and $3.1$-$11.3\times$ in prover time. We then use Sparrow to build zero-knowledge proofs of tree training and prediction, relying on its space efficiency to scale to large datasets and forests of multiple trees. Compared to a (non-space-efficient) optimal-time SNARK based on the GKR protocol, we observe prover space reduction of $16$-$240\times$ for tree training while maintaining essentially the same prover and verifier times and proof size. Even more interestingly, our prover requires comparable space to natively performing the underlying computation. E.g., for a $400$MB dataset, our prover only needs $1.4\times$ more space than the native computation.
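For background on the key building block: the standard (linear-space) sumcheck protocol for a multilinear polynomial, given by its table of evaluations over the Boolean hypercube, looks as follows. Sparrow's contribution, making the prover space-efficient, is not reflected in this sketch, and the field size is an illustrative choice.

```python
# Standard (non-space-efficient) sumcheck for a multilinear polynomial f,
# given as its 2^v evaluations on {0,1}^v. Background only: Sparrow's
# space-efficient prover is NOT shown here.
import random

P = 2**61 - 1   # illustrative prime field

def fold(table, r):
    """Fix the first variable to r: multilinear interpolation between halves."""
    h = len(table) // 2
    return [(table[i] + r * (table[h + i] - table[i])) % P for i in range(h)]

def sumcheck(table):
    claim = sum(table) % P                 # H = sum of f over the hypercube
    while len(table) > 1:
        h = len(table) // 2
        g0, g1 = sum(table[:h]) % P, sum(table[h:]) % P  # round poly g(0), g(1)
        assert (g0 + g1) % P == claim      # verifier's consistency check
        r = random.randrange(P)            # verifier's challenge
        claim = (g0 + r * (g1 - g0)) % P   # new claim: g(r), degree-1 in r
        table = fold(table, r)
    assert table[0] % P == claim           # final check: one oracle query to f
    return True

table = [random.randrange(P) for _ in range(1 << 12)]  # v = 12 variables
print("sumcheck verified:", sumcheck(table))
```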
Hybrid Password Authentication Key Exchange in the UC Framework
A hybrid cryptosystem combines two systems that fulfill the same cryptographic functionality, and it inherits the security of the stronger of the two. There are many proposals for hybrid public-key encryption (hybrid PKE), hybrid signatures (hybrid SIG) and hybrid authenticated key exchange (hybrid AKE). In this paper, we fill the gap of Hybrid Password Authenticated Key Exchange (hybrid PAKE).
For constructing hybrid PAKE, we first define an important class of PAKE -- full DH-type PAKE -- from which we abstract sufficient properties for achieving UC security. Our full DH-type PAKE framework unifies many PAKE schemes, such as SPAKE2, TBPEKE and (Crs)X-GA-PAKE, and summarizes their common features for UC security.
Building on full DH-type PAKE, we propose two generic approaches to hybrid PAKE: parallel composition and serial composition.
-- We propose a generic construction of hybrid PAKE via parallel composition and prove that the hybrid PAKE by composing DH-type PAKEs in parallel is a full DH-type PAKE and hence achieves UC security, as long as one underlying DH-type PAKE is a full DH-type.
-- We propose a generic construction of hybrid PAKE via serial composition, and prove that the hybrid PAKE by composing a DH-type PAKE and another PAKE in serial achieves UC security, if either the DH-type PAKE is a full DH-type or the other PAKE has UC security and the DH-type PAKE only has some statistical properties.
Our generic constructions of hybrid PAKE yield a variety of hybrid PAKE schemes enjoying different desirable features, such as round-optimality, high efficiency, or UC security in the quantum random oracle model (QROM).
Efficient Key-Switching for Word-Type FHE and GPU Acceleration
Speed efficiency, memory optimization, and quantum resistance are essential for safeguarding the performance and security of cloud computing environments. Fully Homomorphic Encryption (FHE) addresses this need by enabling computations on encrypted data without requiring decryption, thereby maintaining data privacy. Additionally, lattice-based FHE is quantum secure, providing defense against potential quantum computer attacks. However, the performance of current FHE schemes remains unsatisfactory, largely because of the length of the operands and the computational expense associated with several resource-intensive operations. Among these operations, key-switching is one of the most demanding processes because it involves complex arithmetic operations necessary to conduct computations in a larger cyclotomic ring.
In this research, we introduce a novel algorithm that achieves linear complexity in the Number Theoretic Transform (NTT) for key-switching. This algorithm offers efficiency comparable to the state-of-the-art while being significantly simpler and consuming less GPU memory. Notably, it reduces space consumption by up to 95\%, making it highly memory-friendly on GPUs. By optimizing GPU performance, our implementation achieves up to a 2.0$\times$ speedup compared to both the baseline approach and the current state-of-the-art methods. This algorithm effectively balances simplicity and performance, thereby enhancing cryptographic computations on modern hardware platforms and paving the way to more practical and efficient FHE implementations in cloud computing environments.
Glacius: Threshold Schnorr Signatures from DDH with Full Adaptive Security
Threshold signatures are one of the most important cryptographic primitives in distributed systems. The threshold Schnorr signature scheme, an efficient and pairing-free scheme, is a popular choice and is included in NIST's standards and recent call for threshold cryptography. Despite its importance, most threshold Schnorr signature schemes assume a static adversary in their security proof. A recent scheme proposed by Katsumata et al. (Crypto 2024) addresses this issue. However, it requires linear-sized signing keys and lacks the identifiable abort property, which makes it vulnerable to denial-of-service attacks. Other schemes with adaptive security either have reduced corruption thresholds or rely on non-standard assumptions such as the algebraic group model (AGM) or hardness of the algebraic one-more discrete logarithm (AOMDL) problem.
In this work, we present Glacius, the first threshold Schnorr signature scheme that overcomes all these issues. Glacius is adaptively secure based on the hardness of decisional Diffie-Hellman (DDH) in the random oracle model (ROM), and it supports a full corruption threshold $t<n$, where $n$ is the total number of signers and $t$ is the signing threshold. Additionally, Glacius provides constant-sized signing keys and identifiable abort, meaning signers can detect misbehavior. We also give a formal game-based definition of identifiable abort, addressing certain subtle issues present in existing definitions, which may be of independent interest.
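For reference, here is the single-signer scheme that a threshold Schnorr protocol such as Glacius lets $n$ parties compute jointly, so that no party ever holds the full signing key $x$; the group parameters below are tiny toy values chosen only so the sketch runs.

```python
# Plain (non-threshold) Schnorr over a toy Schnorr group. A threshold scheme
# distributes x among n signers and produces the same (R, z) jointly.
# Toy 12-bit parameters: NOT secure, chosen only for a runnable sketch.
import hashlib, secrets

p, q, g = 2879, 1439, 4          # p = 2q + 1; g = 2^2 generates the order-q subgroup

def H(*parts) -> int:
    h = hashlib.sha256(b"|".join(str(x).encode() for x in parts)).digest()
    return int.from_bytes(h, "big") % q

def keygen():
    x = secrets.randbelow(q - 1) + 1
    return x, pow(g, x, p)

def sign(x, msg):
    r = secrets.randbelow(q - 1) + 1     # per-signature nonce
    R = pow(g, r, p)
    c = H(R, pow(g, x, p), msg)          # Fiat-Shamir challenge
    return R, (r + c * x) % q            # g^z = R * X^c must hold

def verify(X, msg, sig):
    R, z = sig
    return pow(g, z, p) == (R * pow(X, H(R, X, msg), p)) % p

x, X = keygen()
print(verify(X, "hello", sign(x, "hello")))   # True
```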
Lollipops of pairing-friendly elliptic curves for composition of proof systems
We construct lollipops of pairing-friendly elliptic curves, which combine pairing-friendly chains with pairing-friendly cycles. The cycles inside these lollipops allow for unbounded levels of recursive pairing-based proof system composition, while the chains leading into these cycles alleviate a significant drawback of using cycles on their own: the only known cycles of pairing-friendly elliptic curves force the initial part of the circuit to be arithmetised on suboptimal (much larger) finite fields. Lollipops allow this arithmetisation to instead be performed over finite fields of an optimal size, while preserving the unbounded recursion afforded by the cycle.
The notion of pairing-friendly lollipops itself is not novel. In 2019, the Coda + Dekrypt ``SNARK challenge'' offered a 20,000 USD prize for the best lollipop construction, but to our knowledge no lollipops were submitted to the challenge or have since emerged in the literature. This paper therefore gives the first construction of such lollipops.
The main technical ingredient we use is a new way of instantiating pairing-friendly cycles over supersingular curves whose characteristics correspond to those in MNT cycles. The vast majority of MNT cycles that exist cannot be instantiated in practice, because the corresponding CM discriminant is too large to construct the MNT curves explicitly. Our method can be viewed as a workaround that allows cycles to be instantiated regardless of the CM discriminant of the MNT curves.
Faster Proofs and VRFs from Isogenies
We improve the recent generic proof systems for isogeny knowledge by Cong, Lai and Levin [27], which are based on circuit satisfiability, by using radical isogeny descriptions [20, 21] to prove a path in the underlying isogeny graph. We then present a new generic construction for a verifiable random function (VRF) based on a one-more type hardness assumption and zero-knowledge proofs. We argue that isogenies fit the constraints of our construction and instantiate the VRF with a CGL walk [23] and our new proofs. As a separate contribution, we also propose a new VRF in the effective group action description of isogenies from [1]. Our protocol takes a novel approach based on the polynomial-in-the-exponent technique first described in [36], but without the need for a trusted setup or heavy preprocessing. We compare our protocols to the current state-of-the-art isogeny VRFs by Leroux [56] and Lai [54], with a particular emphasis on computational efficiency.
On the Tight Security of the Double Ratchet
The Signal Protocol is a two-party secure messaging protocol deployed in applications such as Signal, WhatsApp, Google Messages and Facebook Messenger, and is used by billions daily. It consists of two core components, one of which is the Double Ratchet protocol, the subject of a line of work that aims to understand and formalise exactly what security it provides. Existing models capture strong guarantees, including resilience to state exposure in both forward security (protecting past secrets) and post-compromise security (restoring security), adaptive state corruptions, message injections and out-of-order message delivery. Due to this complexity, prior work has failed to provide security guarantees that do not degrade in the number of interactions, even in the single-session setting.
Given the ubiquity of the Double Ratchet in practice, we explore tight security bounds for the Double Ratchet in the multi-session setting. To this end, we revisit the modelling of Alwen, Coretti and Dodis (EUROCRYPT 2019) who decompose the protocol into modular, abstract components, notably continuous key agreement (CKA) and forward-secure AEAD (FS-AEAD). To enable a tight security proof, we propose a CKA security model that provides one-way security under key checking attacks. We show that multi-session security of the Double Ratchet can be tightly reduced to the multi-session security of CKA and FS-AEAD, capturing the same strong security guarantees as Alwen et al.
Our result improves upon the bounds of Alwen et al. in the random oracle model. Even so, we are unable to provide a completely tight proof for the Double Ratchet based on standard Diffie-Hellman assumptions, and we conjecture it is not possible. We thus go a step further and analyse CKA based on key encapsulation mechanisms (KEMs). In contrast to previous works, our new analysis allows for tight constructions based on the DDH and post-quantum assumptions.
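As background on the FS-AEAD component: its forward security typically comes from a one-way symmetric key chain along the following lines (a folklore sketch, not the exact construction analysed in the paper):

```python
# Folklore symmetric ratchet underlying FS-AEAD-style components: each step
# derives a message key and irreversibly advances the chain key, so exposing
# the current chain key reveals nothing about earlier message keys.
import hmac, hashlib

def step(chain_key: bytes):
    msg_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_ck = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_ck, msg_key

ck = b"\x00" * 32                 # initial chain key (from CKA in the full protocol)
for i in range(3):
    ck, mk = step(ck)
    print(f"message key {i}: {mk.hex()[:16]}...")
# Compromising `ck` now does not help recover mk_0..mk_2 (HMAC is one-way).
```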
Double-Matrix: Complete Diffusion in a Single Round with (small) MDS Matrices
This paper describes a simple idea to improve (text) diffusion in block ciphers that use MDS codes but that take more than a single round to achieve full (text) diffusion. The Rijndael cipher family is used as an example since it comprises ciphers with different state sizes.
A drawback of the new approach is the additional computational cost, but it is competitive compared to the large MDS matrices used in the Khazad and Kuznyechik ciphers.
General Functional Bootstrapping using CKKS
The Ducas-Micciancio (DM/FHEW) and Chillotti-Gama-Georgieva-Izabachène (CGGI/TFHE) cryptosystems provide a general privacy-preserving computation capability. These fully homomorphic encryption (FHE) cryptosystems can evaluate an arbitrary function expressed as a general look-up table (LUT) via the method of functional bootstrapping (also known as programmable bootstrapping). The main limitation of DM/CGGI functional bootstrapping is its efficiency because this procedure has to bootstrap every encrypted number separately. A different bootstrapping approach, based on the Cheon-Kim-Kim-Song (CKKS) FHE scheme, can achieve much smaller amortized time due to its ability to bootstrap many thousands of numbers at once. However, CKKS does not currently provide a functional bootstrapping capability that can evaluate a general LUT. An open research question is whether such capability can be efficiently constructed. We give a positive answer to this question by proposing and implementing a general functional bootstrapping method based on CKKS-style bootstrapping. We devise a theoretical toolkit for evaluating an arbitrary function using the theory of trigonometric Hermite interpolations, which provides control over noise reduction during functional bootstrapping. Our experimental results for 8-bit LUT evaluation show that the proposed method achieves the amortized time of 0.75 milliseconds, which is three orders of magnitude faster than the DM/CGGI approach and 6.6x faster than a (more restrictive) amortized functional bootstrapping method based on the Brakerski/Fan-Vercauteren (BFV) FHE scheme.
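To give a flavour of the interpolation machinery in the clear: a look-up table on the $t$-th roots of unity is evaluated exactly by a trigonometric polynomial whose coefficients come from a single inverse FFT. The sketch below does plain (non-Hermite) interpolation, with numpy standing in for encrypted arithmetic; note how a small phase perturbation of the inputs is amplified, which is precisely the noise behaviour the paper's Hermite variant is designed to control.

```python
# Toy: evaluate a look-up table f: Z_t -> R via plain trigonometric
# interpolation on the t-th roots of unity (coefficients from one inverse FFT).
# Plain interpolation amplifies small input perturbations; the Hermite
# refinement used in the paper for noise control is NOT reproduced here.
import numpy as np
from numpy.polynomial import Polynomial

t = 16
lut = np.random.default_rng(1).integers(0, t, size=t).astype(float)

c = np.fft.ifft(lut)          # lut[j] = sum_k c_k * w^(jk), with w = e^(2*pi*i/t)
poly = Polynomial(c)          # p(z) = sum_k c_k z^k

w = np.exp(2j * np.pi / t)
nodes = w ** np.arange(t)
print("interpolation error:", np.max(np.abs(poly(nodes) - lut)))      # ~1e-15

eps = 1e-4                    # small phase perturbation of the inputs
noisy = poly(nodes * np.exp(2j * np.pi * eps))
print("error under perturbed inputs:", np.max(np.abs(noisy - lut)))   # amplified
```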
A New Approach Towards Encrypted Data Sharing and Computation: Enhancing Efficiency Beyond MPC and Multi-Key FHE
In this paper, we introduce a novel approach to Multi-Key Fully Homomorphic Encryption (MK-FHE) that enhances both efficiency and security beyond the capabilities of traditional MK-FHE and Multi-Party Computation (MPC) systems. Our method generates a unified key structure, enabling constant ciphertext size and constant execution time for encrypted computations, regardless of the number of participants involved. This approach addresses critical limitations such as ciphertext size expansion, noise accumulation, and the complexity of relinearization, which typically hinder scalability in multi-user environments. We also propose a new decryption method that reduces decryption to a single information exchange, in contrast to traditional multi-key FHE systems that require information to be passed between all parties sequentially.
Additionally, our approach significantly enhances the scalability of MK-FHE systems, allowing seamless integration of additional participants without introducing performance overhead. Through theoretical analysis and practical implementation, we demonstrate the superiority of our approach in large-scale, collaborative encrypted computation scenarios, paving the way for more robust and efficient secure data processing frameworks. Furthermore, unlike threshold-based FHE schemes, the proposed system does not require a centralised trusted third party to split and distribute the individual secret keys; instead, each participant independently generates their own secret key, ensuring both security and decentralization.
PAKE Combiners and Efficient Post-Quantum Instantiations
Much work has been done recently on developing password-authenticated key exchange (PAKE) mechanisms with post-quantum security. However, modern guidance recommends the use of hybrid schemes, i.e., schemes which rely on the combined hardness of a post-quantum assumption, e.g., Learning with Errors (LWE), and a more traditional assumption, e.g., decisional Diffie-Hellman. To date, there is no known hybrid PAKE construction, let alone a general method for achieving one.
In this paper, we present two efficient PAKE combiners—algorithms that take two PAKEs satisfying mild assumptions, and output a third PAKE with combined security properties—and prove these combiners secure in the Universal Composability (UC) model. Our sequential combiner, instantiated with efficient existing PAKEs such as CPace (built on Diffie-Hellman-type assumptions) and CAKE (built on lattice assumptions), yields the first known hybrid PAKE.
Really Complex Codes with Application to STARKs
Reed-Solomon (RS) codes [RS60], representing evaluations of univariate polynomials over distinct domains, are foundational in error correction and cryptographic protocols. Traditional RS codes leverage the Fourier domain for efficient encoding and decoding via Fast Fourier Transforms (FFT). However, in fields such as the reals and some finite prime fields, limited root-of-unity orders restrict these methods. Recent research, particularly in the context of modern STARKs [BSBHR18b], has explored the complex Fourier domain for constructing real-valued RS codes, introducing novel transforms such as the Discrete Circle-Curve Transform (DCCT) for real-to-real transformations [HLP24]. Despite their efficiency, these transforms face limitations with high-radix techniques and other legacy optimizations. In this paper, we propose a construction of real-valued RS codes utilizing the Discrete Fourier Transform (DFT) over the complex domain. This approach leverages well-established algebraic and Fourier properties to achieve efficiency comparable to DCCT, while retaining compatibility with legacy techniques and optimizations.
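One folklore way to obtain real-valued codewords from a complex DFT, in the spirit of (though not necessarily identical to) the construction in the paper: place the real message in a Hermitian-symmetric coefficient vector, so that the evaluations at the complex $N$-th roots of unity are all real.

```python
# Toy real-valued RS-style code: real message -> Hermitian-symmetric coefficient
# vector -> evaluations at the complex N-th roots of unity (one FFT), which are
# real. Illustrative folklore encoding, not necessarily the paper's construction.
import numpy as np

N, k = 32, 9                    # code length, message length (odd k for simplicity)
rng = np.random.default_rng(2)
msg = rng.standard_normal(k)

c = np.zeros(N, dtype=complex)
c[0] = msg[0]
for j in range(1, (k + 1) // 2):
    c[j] = msg[2 * j - 1] + 1j * msg[2 * j]
    c[N - j] = np.conj(c[j])    # Hermitian symmetry forces real evaluations

codeword = np.fft.ifft(c) * N   # evaluations of sum_j c_j z^j at z = w^i
print("max |imag|:", np.max(np.abs(codeword.imag)))   # ~1e-14: real codeword

decoded = np.fft.fft(codeword) / N                    # invert: recover c, then msg
rec = [decoded[0].real] + [v for j in range(1, (k + 1) // 2)
                           for v in (decoded[j].real, decoded[j].imag)]
print("decode ok:", np.allclose(rec, msg))
```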
Structure-Preserving Compressing Primitives: Vector Commitments and Accumulators
Compressing primitives, such as accumulators and vector commitments, allow representing large data sets with some compact, ideally constant-sized value. Moreover, they support operations like proving membership or non-membership with minimal, ideally also constant-sized, storage and communication overhead. In recent years, these primitives have found numerous practical applications, with many constructions based on various hardness assumptions. So far, however, it has been elusive to construct these primitives in a strictly structure-preserving setting, i.e., in a bilinear group in a way that messages, commitments and openings are all elements of the two source groups. Interestingly, backed by existing impossibility results, not even conventional commitments with such constraints are known in this setting. However, in many practical applications it would be convenient or even required to be structure-preserving, e.g., to commit or accumulate group elements. In this paper we investigate whether strictly structure-preserving compressing primitives can be realized. We close this gap by presenting the first strictly structure-preserving commitment that is shrinking (and in particular constant-size). We circumvent existing impossibility results by employing a more structured message space, i.e., a variant of the Diffie-Hellman message space. Our main results are constructions of structure-preserving vector commitments as well as structure-preserving accumulators. We first discuss generic constructions and then present concrete constructions under the Diffie-Hellman Exponent assumption. To demonstrate the usefulness of our constructions, we discuss various applications.
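For contrast, the textbook compressing vector commitment is Pedersen-style: the commitment is constant-size, but the committed messages are exponents in $\mathbb{Z}_q$ rather than group elements, which is precisely what rules it out as structure-preserving. A toy sketch (parameters and generators are ad hoc and insecure):

```python
# Pedersen-style vector commitment: constant-size, but the messages m_i live
# in Z_q (exponents), NOT in the group -- the opposite of structure-preserving.
# Toy 12-bit group with ad-hoc generators; illustration of the gap only.
import secrets

p, q = 2879, 1439                                # toy safe prime p = 2q + 1
gens = [pow(h, 2, p) for h in range(2, 7)]       # 5 ad-hoc subgroup generators
h0 = pow(7, 2, p)                                # blinding generator

def commit(msgs, r):
    C = pow(h0, r, p)                            # C = h0^r * prod g_i^{m_i}
    for g, m in zip(gens, msgs):
        C = (C * pow(g, m, p)) % p
    return C

msgs = [secrets.randbelow(q) for _ in range(5)]
r = secrets.randbelow(q)
C = commit(msgs, r)
print("constant-size commitment:", C)
print("opening verifies:", C == commit(msgs, r))
```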
Shaking up authenticated encryption
Authenticated encryption (AE) is a cryptographic mechanism that allows communicating parties to protect the confidentiality and integrity of messages exchanged over a public channel, provided they share a secret key. In this work, we present new AE schemes leveraging the SHA-3 standard functions SHAKE128 and SHAKE256, offering 128 and 256 bits of security strength, respectively, and their ``Turbo'' counterparts. They support session-based communication, where a ciphertext authenticates the sequence of messages since the start of the session. The chaining in the session allows decryption in segments, avoiding the need to buffer the entire deciphered cryptogram between decryption and validation. And, thanks to the collision resistance of (Turbo)SHAKE, they provide so-called CMT-4 committing security, meaning that they provide strong guarantees that a ciphertext uniquely binds to the key, plaintext and associated data. The AE schemes we propose have the unique combination of advantages that 1) their security is based on the security claim of SHAKE, which has received a large amount of public scrutiny, 2) they make use of the standard KECCAK-p permutation, which not only receives more and more dedicated hardware support but also allows competitive software-only implementations thanks to the TurboSHAKE instances, and 3) they do not suffer from a 64-bit birthday bound like most AES-based schemes.
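To illustrate only the session idea: in the toy Encrypt-then-MAC sketch below (over Python's SHAKE128, not the schemes proposed in the paper), keystream and tag are both squeezed from the key and the running session transcript, so each tag authenticates all messages sent so far.

```python
# Toy session-based AE from SHAKE128: keystream and tag are derived from the
# key and the running session transcript, so every tag authenticates the whole
# session so far. Flavour only -- NOT the schemes proposed in the paper.
import hashlib

def shake(*parts, n):
    h = hashlib.shake_128()
    for s in parts:
        h.update(len(s).to_bytes(8, "big") + s)   # unambiguous encoding
    return h.digest(n)

class Session:
    def __init__(self, key: bytes):
        self.key, self.transcript = key, b""

    def seal(self, msg: bytes, ad: bytes = b""):
        ks = shake(b"enc", self.key, self.transcript, ad, n=len(msg))
        ct = bytes(m ^ k for m, k in zip(msg, ks))
        self.transcript += ad + ct                # chain: binds all prior messages
        return ct, shake(b"mac", self.key, self.transcript, n=16)

    def open(self, ct: bytes, tag: bytes, ad: bytes = b""):
        ks = shake(b"enc", self.key, self.transcript, ad, n=len(ct))
        self.transcript += ad + ct
        assert shake(b"mac", self.key, self.transcript, n=16) == tag
        return bytes(c ^ k for c, k in zip(ct, ks))

alice, bob = Session(b"k" * 16), Session(b"k" * 16)
ct, tag = alice.seal(b"attack at dawn")
print(bob.open(ct, tag))                          # b'attack at dawn'
```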
Algebraic Equipage for Learning with Errors in Cyclic Division Algebras
In ``Noncommutative Ring Learning With Errors From Cyclic Algebras'', a variant of Learning with Errors from cyclic division algebras, dubbed ``Cyclic LWE'', was developed, and security reductions similar to those known for the ring and module case were given, as well as a Regev-style encryption scheme. In this work, we make a number of improvements to that work: namely, we describe methods to increase the number of cryptographically useful division algebras, demonstrate the hardness of CLWE from ideal lattices obtained from non-maximal orders, and study Learning with Rounding in cyclic division algebras.
End-to-End Encrypted Cloud Storage in the Wild: A Broken Ecosystem
End-to-end encrypted cloud storage offers a way for individuals and organisations to delegate their storage needs to a third party, while keeping control of their data using cryptographic techniques.
We conduct a cryptographic analysis of various products in the ecosystem, showing that many providers fail to provide an adequate level of security. In particular, we provide an in-depth analysis of five end-to-end encrypted cloud storage systems, namely Sync, pCloud, Icedrive, Seafile, and Tresorit, in the setting of a malicious server. These companies cumulatively have over 22 million users and are major providers in the field. We unveil severe cryptographic vulnerabilities in four of them. Our attacks invalidate the marketing claims made by the providers of these systems, showing that a malicious server can, in some cases, inject files in the encrypted storage of users, tamper with file data, and even gain direct access to the content of the files. Many of our attacks affect multiple providers in the same way, revealing common failure patterns in independent cryptographic designs.
We conclude by discussing the significance of these patterns beyond the security of the specific providers.
LeOPaRd: Towards Practical Post-Quantum Oblivious PRFs via Interactive Lattice Problems
In this work, we introduce a more efficient post-quantum oblivious PRF (OPRF) design, called LeOPaRd. Our proposal is round-optimal and supports verifiability and partial obliviousness, all of which are important for practical applications. The main technical novelty of our work is a new method for computing samples of MLWE (module learning with errors) in a two-party setting. To do this, we introduce a new family of interactive lattice problems, called interactive MLWE and rounding with re-use (iMLWER-RU). We rigorously study the hardness of iMLWER-RU and reduce it (under some natural idealized setting) to a more standard MLWE-like problem where the adversary is additionally given access to a randomized MLWE PRF oracle. We believe iMLWER-RU can be of independent interest for other interactive protocols.
LeOPaRd exploits this new iMLWER-RU assumption to realize a lattice-based OPRF design without relying on heavy machinery such as noise flooding and fully homomorphic encryption used in earlier works. LeOPaRd can feature around 136 KB total communication, compared to 300+ KB in earlier works. We also identify gaps in some existing constructions and models, and propose appropriate fixes.
Related-Key Cryptanalysis of FUTURE
In Africacrypt 2022, Gupta et al. introduced FUTURE, a 64-bit lightweight MDS-matrix-based SPN-like block cipher designed to encrypt data in a single clock cycle with minimal implementation cost, particularly when unrolled. While various attack models were discussed, the security of the cipher in the related-key setting was not addressed. In this work, we bridge this gap by conducting a security analysis of the cipher under related-key attacks using MILP (Mixed Integer Linear Programming)-based techniques. Our model enables a related-key distinguishing attack on 8 rounds of FUTURE, requiring $2^{64}$ plaintexts, $2^{63}$ XOR operations, and negligible memory. Additionally, we present a 10-round boomerang distinguisher with a probability of $2^{-45}$, leading to a distinguishing attack with $2^{46}$ plaintexts, $2^{46}$ XOR operations, and negligible memory. This result demonstrates a full break of the cipher's 64-bit security in the related-key setting.
Efficient Maliciously Secure Oblivious Exponentiations
Oblivious Pseudorandom Functions (OPRFs) allow a client to evaluate a pseudorandom function (PRF) on her secret input based on a key that is held by a server. In the process, the client only learns the PRF output but not the key, while the server learns neither the input nor the output of the client. Arguably the most popular OPRF is due to Naor, Pinkas and Reingold (Eurocrypt 2009). It is based on an oblivious exponentiation by the server, with passive security under the Decisional Diffie-Hellman assumption. In this work, we strengthen the security guarantees of the NPR OPRF by protecting it against active attacks by the server. We have implemented our solution and report on its performance.
Our main result is a new batch OPRF protocol which is secure against maliciously corrupted servers, but is essentially as efficient as the semi-honest solution. More precisely, the computation (and communication) overhead is a multiplicative factor $o(1)$ as the batch size increases. The obvious solution using zero-knowledge proofs would have a constant factor overhead at best, which can be too expensive for certain deployments.
Our protocol relies on a novel version of the DDH problem, which we call the Oblivious Exponentiation Problem (OEP), and we give evidence for its hardness in the Generic Group model. We also present a variant of our maliciously secure protocol that does not rely on the OEP but nevertheless only has overhead $o(1)$ over the known semi-honest protocol. Moreover, we show that our techniques can also be used to efficiently protect threshold blind BLS signing and threshold ElGamal decryption against malicious attackers.
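The semi-honest blinded-exponentiation flow underlying such OPRFs is easy to state; the sketch below uses insecure 12-bit toy parameters, and the paper's actual contribution, protecting this flow against a malicious server at $o(1)$ overhead, is not captured here.

```python
# Blinded oblivious exponentiation (semi-honest flow): the client learns
# H(x)^k, the server learns nothing about x. Toy 12-bit group: NOT secure.
import hashlib, secrets

p, q = 2879, 1439                        # toy safe prime p = 2q + 1

def hash_to_group(x: bytes) -> int:
    u = int.from_bytes(hashlib.sha256(x).digest(), "big") % p
    return pow(u, 2, p)                  # square into the order-q subgroup

k = secrets.randbelow(q - 1) + 1         # server's PRF key

# client: blind the hashed input
x = b"client input"
r = secrets.randbelow(q - 1) + 1
B = pow(hash_to_group(x), r, p)          # H(x)^r hides x

# server: one oblivious exponentiation
C = pow(B, k, p)

# client: unblind (needs Python >= 3.8 for modular inverse via pow)
S = pow(C, pow(r, -1, q), p)             # (H(x)^{rk})^{1/r} = H(x)^k
assert S == pow(hash_to_group(x), k, p)
print("client PRF output:", hashlib.sha256(x + S.to_bytes(2, "big")).hexdigest()[:16])
```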
On Wagner's k-Tree Algorithm Over Integers
The $k$-Tree algorithm [Wagner 02] is a non-trivial algorithm for the average-case $k$-SUM problem that has found widespread use in cryptanalysis. Its input consists of $k$ lists, each containing $n$ integers from a range of size $m$. Wagner's original heuristic analysis suggested that this algorithm succeeds with constant probability if $n \approx m^{1/(\log{k}+1)}$, and that in this case it runs in time $O(kn)$. Subsequent rigorous analysis of the algorithm [Lyubashevsky 05, Shallue 08, Joux-Kippen-Loss 24] has shown that it succeeds with high probability if the input list sizes are significantly larger than this.
We present a broader rigorous analysis of the $k$-Tree algorithm, showing upper and lower bounds on its success probability and complexity for any size of the input lists. Our results confirm Wagner's heuristic conclusions, and also give meaningful bounds for a wide range of list sizes that are not covered by existing analyses. We present analytical bounds that are asymptotically tight, as well as an efficient algorithm that computes (provably correct) bounds for a wide range of concrete parameter settings. We also do the same for the $k$-Tree algorithm over $\mathbb{Z}_m$. Finally, we present an experimental evaluation of the tightness of our results.
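A compact illustration of the algorithm for $k = 4$ over $\mathbb{Z}_m$: with $m = b^3$ and lists of size $n = b = m^{1/(\log k + 1)}$, pairs of lists are merged while cancelling $\log_2 b$ low-order bits per level, so that about one full solution survives, matching Wagner's heuristic. The concrete parameters below are illustrative.

```python
# Wagner's k-Tree for k = 4 over Z_m: find x1+x2+x3+x4 = 0 (mod m), one x_i
# per list. With m = b^3 and n = b = m^(1/3), each level-1 join keeps ~n
# survivors and about one full solution survives: constant success
# probability per run, as in Wagner's heuristic analysis.
import random
from collections import defaultdict

b = 2 ** 8          # per-level filter modulus
m = b ** 3          # k = 4  ->  n ~ m^(1/(lg k + 1)) = m^(1/3) = b
n = b

def join(L1, L2, modulus):
    """Keep pairs whose sum is 0 mod `modulus`; carry sums mod m and traces."""
    idx = defaultdict(list)
    for v, t in L1:
        idx[v % modulus].append((v, t))
    return [((v1 + v2) % m, t1 + t2)
            for v2, t2 in L2
            for v1, t1 in idx.get((-v2) % modulus, [])]

random.seed(1)
lists = [[(x, (x,)) for x in (random.randrange(m) for _ in range(n))]
         for _ in range(4)]

A = join(lists[0], lists[1], b)        # level 1: cancel the low log2(b) bits
B = join(lists[2], lists[3], b)
sols = [t for v, t in join(A, B, m)]   # level 2: cancel the full sum mod m
print(f"|A|={len(A)}, |B|={len(B)}, solutions found: {len(sols)}")
for x1, x2, x3, x4 in sols[:3]:
    assert (x1 + x2 + x3 + x4) % m == 0
```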
Rhombus: Fast Homomorphic Matrix-Vector Multiplication for Secure Two-Party Inference
We present $\textit{Rhombus}$, a new secure matrix-vector multiplication (MVM) protocol in the semi-honest two-party setting, which can be seamlessly integrated into existing privacy-preserving machine learning (PPML) frameworks and serve as the basis of secure computation in linear layers.
$\textit{Rhombus}$ adopts RLWE-based homomorphic encryption (HE) with coefficient encoding, which allows messages to be chosen from not only a field $\mathbb{F}_p$ but also a ring $\mathbb{Z}_{2^\ell}$, where the latter supports faster computation in non-linear layers. To achieve better efficiency, we develop an input-output packing technique that reduces the communication cost incurred by HE with coefficient encoding by about $21\times$, and propose a split-point picking technique that reduces the number of rotations to one that is sublinear in the matrix dimension. Compared to the recent protocol $\textit{HELiKs}$ by Balla and Koushanfar (CCS'23), our implementation demonstrates that $\textit{Rhombus}$ improves the overall performance of an MVM protocol by a factor of $7.4\times \sim 8\times$, and improves the end-to-end performance of secure two-party inference of ResNet50 by a factor of $4.6\times \sim 18\times$.
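The rotation count that a split-point technique targets is the one incurred by the classic diagonal method for slot-encoded matrix-vector products. In the clear it looks as follows, with np.roll standing in for a (costly) homomorphic rotation; Rhombus's packing and split-point optimizations are not shown.

```python
# Baseline "diagonal method" for matrix-vector multiplication with SIMD slots:
# y = sum_i diag_i(M) * rot(v, i), costing one rotation per diagonal. np.roll
# stands in for the expensive homomorphic rotation; the packing / split-point
# optimizations that cut the rotation count are NOT shown.
import numpy as np

rng = np.random.default_rng(3)
d = 8
M = rng.integers(0, 10, size=(d, d))
v = rng.integers(0, 10, size=d)

y = np.zeros(d, dtype=M.dtype)
for i in range(d):
    diag_i = np.array([M[r, (r + i) % d] for r in range(d)])  # i-th generalized diagonal
    y += diag_i * np.roll(v, -i)                              # one rotation per diagonal
assert np.array_equal(y, M @ v)
print("diagonal-method MVM matches M @ v")
```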