Oryx: Private detection of cycles in federated graphs
This paper proposes Oryx, a system for efficiently detecting cycles in federated graphs where parts of the graph are held by different parties and are private. Cycle identification is an important building block in designing fraud detection algorithms that operate on confidential transaction data held by different financial institutions. Oryx allows detecting cycles of various lengths while keeping the topology of the graphs secret, and it does so efficiently. Oryx leverages the observation that financial graphs are very sparse, and uses this to achieve computational complexity that scales with the average degree of nodes in the graph rather than the maximum degree. Our implementation of Oryx running on a single 32-core AWS machine (for each party) can detect all cycles of up to length 6 in under 5 hours in a financial transaction graph that consists of tens of millions of nodes and edges. While the costs are high, Oryx’s protocol parallelizes well and can use additional hardware resources. Furthermore, Oryx is, to our knowledge, the first system that can handle this task for large graphs.
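To see why sparsity governs the cost, consider plaintext (non-private) cycle enumeration by bounded DFS, where the work per start node grows roughly with the average out-degree raised to the cycle length. The sketch below is a toy illustration of that baseline, not Oryx's private protocol.

```python
# Toy plaintext baseline: enumerate directed cycles up to a length bound.
from collections import defaultdict

def cycles_up_to(edges, max_len):
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    found = set()

    def dfs(start, node, path):
        if len(path) > max_len:
            return
        for nxt in adj[node]:
            if nxt == start and len(path) >= 2:
                # Canonicalize by rotating the smallest node to the front,
                # so the same cycle found from different starts deduplicates.
                i = path.index(min(path))
                found.add(tuple(path[i:] + path[:i]))
            elif nxt not in path:
                dfs(start, nxt, path + [nxt])

    for s in list(adj):
        dfs(s, s, [s])
    return found

# Toy transaction graph: one 3-cycle and one 4-cycle.
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 6), (6, 3)]
print(cycles_up_to(edges, 4))  # {(1, 2, 3), (3, 4, 5, 6)}
```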
A Simple Post-Quantum Oblivious Transfer Protocol from Mod-LWR
Oblivious transfer (OT) is a fundamental cryptographic protocol that plays a crucial role in secure multi-party computation (MPC). Most practical OT protocols by, e.g., Naor and Pinkas (SODA'01) or Chou and Orlandi (Latincrypt'15), are based on Diffie-Hellman (DH)-like assumptions and are not post-quantum secure. In contrast, many other components of MPC protocols, including garbled circuits and secret sharing, are post-quantum secure. The reliance on non-post-quantum OT protocols presents a significant security bottleneck with the advent of quantum computing.
In this paper, we address this issue by constructing a simple, efficient OT protocol based on Saber, a Mod-LWR-based key exchange protocol. We implemented our OT protocol and conducted experiments to evaluate its performance. Our results show that our OT protocol significantly outperforms the state-of-the-art Kyber-based post-quantum OT protocol by Masny and Rindal (CCS'19) in terms of both computation and communication costs. Furthermore, the computation speed of our OT protocol is faster than the best-known DH-based OT protocol by Chou and Orlandi (Latincrypt'15), making it competitive to replace DH-based OT in the high-bandwidth network setting.
Public vs Private Blockchains lineage storage
This paper reports experimental results on lineage event storage via smart contracts deployed on private and public blockchains. In our experiments we measure three metrics: the cost to deploy the storage smart contract on the blockchain (the initial expenditure, typically in gas units, required to deploy the smart contract that facilitates lineage event storage), and the time and gas costs needed to store a lineage event. We investigated both single- and multi-client scenarios. We considered the following public blockchains: Hedera, Fantom, Harmony Shard0, Polygon Amoy, Ethereum Sepolia, Optimism Sepolia, Klaytn Baobab and Arbitrum Sepolia. Furthermore, we investigate the performance of Hyperledger Besu with different consensus algorithms as a private blockchain.
Time-Memory Trade-off Algorithms for Homomorphically Evaluating Look-up Table in TFHE
We propose time-memory trade-off algorithms for evaluating look-up table (LUT) in both the leveled homomorphic encryption (LHE) and fully homomorphic encryption (FHE) modes in TFHE. For an arbitrary $n$-bit Boolean function, we reduce evaluation time by a factor of $O(n)$ at the expense of an additional memory of "only" $O(2^n)$ as a trade-off: The total asymptotic memory is also $O(2^n)$, which is the same as that of prior works. Our empirical results demonstrate that a $7.8 \times$ speedup in runtime is obtained with a $3.8 \times$ increase in memory usage for 16-bit Boolean functions in the LHE mode. Additionally, in the FHE mode, we achieve reductions in both runtime and memory usage by factors of $17.9 \times$ and $2.5 \times $, respectively, for 8-bit Boolean functions. The core idea is to decompose the function $f$ into sufficiently small subfunctions and leverage the precomputed results for these subfunctions, thereby achieving significant performance improvements at the cost of additional memory.
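As a plaintext illustration of the decompose-and-precompute idea (the TFHE evaluation machinery itself is not modeled here), the sketch below splits an n-bit input into high and low parts, precomputes one table per high part over the low bits, and answers queries by a single lookup into a small table; total memory stays O(2^n), matching the trade-off described above.

```python
# Plaintext sketch of decompose-and-precompute for a Boolean function.
def precompute_subtables(f, n, k):
    """Tables of the subfunctions f_hi(lo) = f(hi << k | lo)."""
    return [[f((hi << k) | lo) for lo in range(1 << k)]
            for hi in range(1 << (n - k))]

def evaluate(tables, x, k):
    # One lookup into a 2^k-entry table instead of one into a 2^n-entry one.
    return tables[x >> k][x & ((1 << k) - 1)]

n, k = 8, 4
f = lambda x: bin(x).count("1") & 1      # toy 8-bit Boolean function (parity)
tables = precompute_subtables(f, n, k)
assert all(evaluate(tables, x, k) == f(x) for x in range(1 << n))
```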
Ringtail: Practical Two-Round Threshold Signatures from Learning with Errors
A threshold signature scheme splits the signing key among $\ell$ parties, such that any $t$-subset of parties can jointly generate signatures on a given message. Designing concretely efficient post-quantum threshold signatures is a pressing question, as evidenced by NIST's recent call.
In this work, we propose, implement, and evaluate a lattice-based threshold signature scheme, Ringtail, which is the first to achieve a combination of desirable properties:
(i) The signing protocol consists of only two rounds, where the first round is message-independent and can thus be preprocessed offline.
(ii) The scheme is concretely efficient and scalable to $t \leq 1024$ parties. For $128$-bit security and $t = 1024$ parties, we achieve $13.4$ KB signature size and $10.5$ KB of online communication.
(iii) The security is based on the standard learning with errors (LWE) assumption in the random oracle model. This improves upon the state-of-the-art (with comparable efficiency) which either has a three-round signing protocol [Eurocrypt'24] or relies on a new non-standard assumption [Crypto'24].
To substantiate the practicality of our scheme, we conduct the first WAN experiment deploying a lattice-based threshold signature, across 8 countries in 5 continents. We observe that an overwhelming majority of the end-to-end latency is consumed by network latency, underscoring the need for round-optimized schemes.
HERatio: Homomorphic Encryption of Rationals using Laurent Polynomials
In this work we present $\mathsf{HERatio}$, a homomorphic encryption scheme that builds on the schemes of Brakerski and of Fan and Vercauteren. Our scheme naturally accepts Laurent polynomials as inputs, allowing it to work with rationals via their bounded base-$b$ expansions. This eliminates the need for a specialized encoder and streamlines encryption, while maintaining comparable efficiency to BFV. To achieve this, we introduce a new variant of the Polynomial Learning With Errors (PLWE) problem which employs Laurent polynomials instead of the usual ``classic'' polynomials, and provide a reduction to the PLWE problem.
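To illustrate the kind of encoding the abstract describes, here is a small plaintext sketch (our notation, not necessarily HERatio's exact encoder): a rational with a bounded base-$b$ expansion becomes a Laurent polynomial whose negative exponents carry the fractional digits, and decoding is just evaluation at $x = b$.

```python
# Rationals <-> Laurent polynomials via bounded base-b expansions.
from fractions import Fraction

def encode(x: Fraction, b: int, frac_digits: int):
    """Return {exponent: digit} with x == sum(d * b**e)."""
    scaled = int(x * b**frac_digits)   # exact when x has a finite expansion
    poly, e = {}, -frac_digits
    while scaled:
        scaled, d = divmod(scaled, b)
        if d:
            poly[e] = d
        e += 1
    return poly

def decode(poly, b):
    return sum(Fraction(d) * Fraction(b) ** e for e, d in poly.items())

x = Fraction(13, 4)                    # 3.25 = 2 + 1 + 1/4
poly = encode(x, b=2, frac_digits=2)
print(poly)                            # {-2: 1, 0: 1, 1: 1}
assert decode(poly, 2) == x
```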
Collision-Based Attacks on Block Cipher Modes - Exploiting Collisions and Their Absence
Advanced Encryption Standard in Galois/Counter Mode (AES-GCM) is the most widely used Authenticated Encryption with Associated Data (AEAD) algorithm in the world. In this paper, we analyze the use of GCM with all the Initialization Vector (IV) constructions and lengths approved by NIST SP 800-38D when encrypting multiple plaintexts with the same key. We derive attack complexities in both ciphertext-only and known-plaintext models, with or without nonce hiding, for collision attacks compromising integrity and confidentiality. To facilitate the analysis of GCM with random IVs, we derive a new, simplified equation for near birthday collisions. Our analysis shows that GCM with random IVs provides less than 128 bits of security. When 96-bit IVs are used, as recommended by NIST, the security drops to less than 97 bits. Therefore, we strongly recommend that NIST forbid the use of GCM with 96-bit random nonces.
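For intuition, the textbook birthday approximation (not the paper's refined near-collision equation) already shows why random 96-bit IVs cap security well below 128 bits:

```python
# Standard birthday approximation p ~ q*(q-1)/2^(v+1) for q random v-bit IVs.
import math

def collision_prob(q, v):
    return q * (q - 1) / 2 ** (v + 1)

# Around 2^48 encryptions under one key already give a ~50% chance of an
# IV collision, which compromises both integrity and confidentiality.
for logq in (32, 40, 48):
    p = collision_prob(2 ** logq, 96)
    print(f"q = 2^{logq}: p ~ 2^{math.log2(p):.1f}")
```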
Legacy Encryption Downgrade Attacks against LibrePGP and CMS
This work describes vulnerabilities in the specification of AEAD modes and Key Wrap in two cryptographic message formats. Firstly, this applies to AEAD packets as introduced in the novel LibrePGP specification that is implemented by the widely used GnuPG application. Secondly, we describe vulnerabilities in the AES-based AEAD schemes as well as the Key Wrap Algorithm specified in the Cryptographic Message Syntax (CMS). These new attacks exploit the possibility to downgrade AEAD or AES Key Wrap ciphertexts to valid legacy CFB- or CBC-encrypted related ciphertexts and require that the attacker learns the content of the legacy decryption result. This can happen in two principal ways: either due to the human recipient returning the decryption output to the attacker as a quote or due to a programmatic decryption oracle in the receiving system that reveals information about the plaintext. The attacks enable the decryption of low-entropy plaintext blocks in AEAD ciphertexts and, in the case of LibrePGP, also the manipulation of existing AEAD ciphertexts. For AES Key Wrap in CMS, full key decryption is possible. Some of the attacks require multiple successful oracle queries. The attacks thus demonstrate that CCA2 security is not achieved by the LibrePGP and CMS AEAD or Key Wrap encryption in the presence of a legacy cipher mode decryption oracle.
QuickPool: Privacy-Preserving Ride-Sharing Service
Online ride-sharing services (RSS) have become very popular owing to increased awareness of environmental concerns and as a response to increased traffic congestion. To request a ride, users submit their locations and route information for ride matching to a service provider (SP), leading to possible privacy concerns caused by leakage of users' location data. We propose QuickPool, an efficient SP-aided RSS solution that can obliviously match multiple riders and drivers simultaneously, without involving any other auxiliary server. End-users, namely, riders and drivers share their route information with SP as encryptions of the ordered set of points-of-interest (PoI) of their route from their start to end locations. SP performs a zone-based oblivious matching of drivers and riders, based on partial route overlap as well as proximity of start and end points. QuickPool operates in the semi-honest setting and makes use of secure multi-party computation. We provide a security proof of our protocol, perform extensive testing of our implementation and show that our protocol simultaneously matches multiple drivers and riders very efficiently. We compare the performance of QuickPool with state-of-the-art works and observe a run time improvement of 1.6 - 2$\times$, and communication improvement of at least 8$\times$.
Faster Asynchronous Blockchain Consensus and MVBA
Blockchain consensus protocols, a.k.a. BFT SMR, enable $n$ processes to decide on an ever-growing chain. The fastest known asynchronous one is called 2-chain VABA (PODC'21 and FC'22), and is used as the fallback chain in Abraxas* (CCS'23). It has a claimed $9.5\delta$ expected latency when used for a single-shot instance, a.k.a. an MVBA.
We exhibit attacks breaking it. Hence, the title of the fastest asynchronous MVBA with quadratic message complexity goes to sMVBA (CCS'22), with $10\delta$ expected latency.
Our positive contributions are two new and complementary designs.
$\bullet$ 2PAC (2-phase asynchronous consensus). It has a simpler and lighter chaining than in previous approaches. Instantiated with either quadratic or cubic phases of voting, it yields:
2PAC$^\text{lean}$: $+90\%$ throughput and $9.5\delta$ expected latency, with quadratic ($O(n^2)$) message complexity. In both 2-chain VABA and sMVBA (as if chained, with pipelining), the quorum-certified transactions produced in the worst-case 1/3 of views with a slow leader were dropped, so that work was lost. The simpler design of 2PAC inserts such blocks straight into the chain.
Thus, contrary to naive uncle-referencing, this comes with no computational overhead, yielding a net $+50\%$ throughput gain over chained sMVBA. Both the remaining throughput gain and the latency gain ($-0.5\delta$) come from the lighter interactive construction of proofs of consistency appended to proposed blocks, compared to sMVBA.
2PAC$^\text{BIG}$: the fastest asynchronous blockchain consensus with cubic ($O(n^3)$) message complexity. Fault-free single-shot MVBA runs decide in just $4\delta$, as long as no message is delivered more than twice as fast as others; GradedDAG (SRDS'23) furthermore required no message reordering.
$\bullet$ Super Fast Pipelined Blocks. This is an upgrade of previous approaches for pipelining: in 2-chain VABA, Cordial Miners (DISC'23) and GradedDAG, a block pipelined by a leader in the middle of the view had almost twice the latency of the non-pipelined block. Our design provides a fast path that decides the pipelined block with even smaller latency than the non-pipelined block. The fast delay is guaranteed in all executions with a fair scheduler and, remarkably, regardless of the behavior of faulty processes. Consistency is preserved by a lightweight mechanism of one threshold signature appended per proposal.
Instantiated with the previous protocols, it yields: s2PAC$^\text{lean}$, with fast decision of pipelined blocks in $4\delta$; s2PAC$^\text{BIG}$, in $3\delta$; and sGradedDAG, in $3\delta$.
Phase Modulation Side Channels: Jittery JTAG for On-Chip Voltage Measurements
Measuring the fluctuations of the clock phase of a target was identified as a leakage source in early electromagnetic side-channel investigations. Despite this, directly measuring the clock phase (or jitter) of digital signals from a target has only recently been connected to being a source of exploitable leakage. As the phase of a clock output will be related to signal propagation delay through the target, and this propagation delay is related to voltage, most digital devices perform an unintended phase modulation (PM) of their internal voltage onto clock output phases.
This paper first demonstrates an unprofiled CPA attack against a Cortex-M microcontroller using the phase of a clock output, observing the signal on both optically isolated and capacitively isolated paths. The unprofiled attack takes only 2-4x more traces than an attack using a classic shunt-resistor measurement.
It is then demonstrated how the JTAG bypass mode can be used to force a clock through a digital device. This forced clock signal can then be used as a highly effective oscilloscope that is located on the target device. As the attack does not require modifications to the device (such as capacitor removal or heat spreader removal) it is difficult to detect using existing countermeasures. The example attack over JTAG uses an unprofiled CPA attack, requiring only about 5x more traces than an ideal shunt-resistor based measurement. In addition, a version of this attack using a fault correlation analysis attack is also demonstrated.
Countermeasures are discussed, and a simple resampling countermeasure is tested. All tools both offensive and defensive presented in the paper have been released under open-source licenses.
Masked Vector Sampling for HQC
Anticipating the advent of large quantum computers, NIST started a worldwide competition in 2016 aiming to define the next cryptographic standards. HQC is one of these post-quantum schemes still in contention, with three others already standardized. In 2022, Guo et al. introduced a timing attack that exploited an inconsistency in HQC's rejection sampling function to recover its secret key in 866,000 calls to an oracle. The authors of HQC updated its specification by applying an algorithm to sample vectors in constant time. A masked implementation of this function was then proposed for BIKE, but it is not directly applicable to HQC. In this paper we propose a masked, specification-compliant version of HQC's vector sampling function which relies, to our knowledge, on the first masked implementation of the Barrett reduction.
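For reference, the sketch below shows the textbook, unmasked Barrett reduction; the paper's contribution is a masked variant, which this toy version does not attempt.

```python
# Textbook (unmasked) Barrett reduction: replace a runtime division by a
# precomputed reciprocal, a shift, and at most two corrective subtractions.
def barrett_setup(m, k):
    """Precompute mu = floor(2^(2k) / m) for a modulus m < 2^k."""
    return (1 << (2 * k)) // m

def barrett_reduce(x, m, k, mu):
    """Reduce 0 <= x < 2^(2k) modulo m without dividing by m at runtime."""
    q = (x * mu) >> (2 * k)    # underestimates x // m by at most 2
    r = x - q * m
    while r >= m:              # hence at most two corrective subtractions
        r -= m
    return r

m, k = 12289, 14               # a 14-bit modulus (e.g. as used in lattice schemes)
mu = barrett_setup(m, k)
for x in (0, 12288, 12289, (1 << 28) - 1):
    assert barrett_reduce(x, m, k, mu) == x % m
```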
A New CRT-based Fully Homomorphic Encryption
We propose a novel FHE scheme that uniquely encodes the plaintext with noise in a way that prevents the increasing noise from overflowing and corrupting the plaintext. This allows users to perform computations on encrypted data smoothly. The scheme is constructed using the Chinese Remainder Theorem (CRT), supporting a predefined number of modular operations on encrypted plaintext without the need for bootstrapping.
Although FHE became popular after Gentry's work, and various developments have occurred in the last decade, the idea of a fully homomorphic encryption (FHE) scheme was first introduced in the 1970s by Rivest et al. The Chinese Remainder Theorem is one of the most suitable tools for developing an FHE scheme because it forms a ring isomorphism \( Z_{p_1} \times Z_{p_2} \times \ldots \times Z_{p_k} \cong Z_{p_1 p_2 \ldots p_k} \).
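The following few lines sanity-check this CRT property in plain Python (ordinary modular arithmetic only; no encryption or noise handling is modeled):

```python
# Addition and multiplication mod N = p1*p2*...*pk agree componentwise
# with the same operations mod each prime factor.
from math import prod

primes = [5, 7, 11]
N = prod(primes)                      # 385

def to_crt(x):
    return [x % p for p in primes]

a, b = 123, 321
lhs = to_crt((a * b + a) % N)
rhs = [(ai * bi + ai) % p for ai, bi, p in zip(to_crt(a), to_crt(b), primes)]
assert lhs == rhs                     # + and * commute with the CRT map
```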
Various attempts have been made to develop an FHE scheme using CRT, but most were unsuccessful, mainly due to vulnerability to chosen-plaintext attacks (CPA).
The proposed scheme resists chosen-plaintext attacks. It also adds random errors to the message during encryption. However, these errors are added in such a way that, when homomorphic operations are performed over encrypted data, the growing errors never overwrite the message values, as can happen in LWE-based homomorphic schemes. Therefore, one can perform a predefined number of homomorphic operations (both addition and multiplication) without worrying about the growth of the errors.
Structural Lower Bounds on Black-Box Constructions of Pseudorandom Functions
We address the black-box complexity of constructing pseudorandom functions (PRF) from pseudorandom generators (PRG). The celebrated GGM construction of Goldreich, Goldwasser, and Micali (Crypto 1984) provides such a construction, which (even when combined with Levin's domain-extension trick) has super-logarithmic depth. Despite many years and much effort, this remains essentially the best construction we have to date. On the negative side, a first step was provided by the work of Miles and Viola (TCC 2011), which shows that a black-box construction that calls the PRG only once and outputs one of its output bits cannot be a PRF.
In this work, we make significant further progress: we rule out black-box constructions of PRF from PRG that follow certain structural constraints, but may call the PRG adaptively polynomially many times. In particular, we define ``tree constructions'' which generalize the GGM structure: they apply the PRG $G$ along a tree path, but allow for different choices of functions to compute the children of a node on the tree and to compute the next node on the computation path down the tree. We prove that a tree construction of logarithmic depth cannot be a PRF (while GGM is a tree construction of super-logarithmic depth). We also show several other results and discuss the special case of one-call constructions.
Our main results in fact rule out even weak PRF constructions with one output bit. We use the oracle separation methodology introduced by Gertner, Malkin, and Reingold (FOCS 2001), and show that for any candidate black-box construction $F^G$ from $G$, there exists an oracle relative to which $G$ is a PRG, but $F^G$ is not a PRF.
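For concreteness, here is a minimal sketch of the GGM tree construction the paper generalizes, with SHA-256 standing in for a length-doubling PRG (an illustrative assumption; GGM works with any PRG). Each input bit selects half of the PRG output, so the depth equals the input length, which is exactly the super-logarithmic regime the lower bound concerns.

```python
# Minimal GGM PRF sketch with a hash standing in for a length-doubling PRG.
import hashlib

def prg(seed: bytes) -> tuple:
    """'Length-doubling PRG': 32-byte seed -> two 32-byte outputs."""
    return (hashlib.sha256(seed + b"0").digest(),
            hashlib.sha256(seed + b"1").digest())

def ggm_prf(key: bytes, x: str) -> bytes:
    """F_key(x) for a bitstring x: walk the GGM tree from the root."""
    s = key
    for bit in x:
        s = prg(s)[int(bit)]   # bit chooses the left or right PRG half
    return s

key = bytes(32)
print(ggm_prf(key, "0110").hex()[:16])
```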
A Note on Efficient Computation of the Multilinear Extension
The multilinear extension of an $m$-variate function $f : \{0,1\}^m \to \mathbb{F}$, relative to a finite field $\mathbb{F}$, is the unique multilinear polynomial $\hat{f} : \mathbb{F}^m \to \mathbb{F}$ that agrees with $f$ on inputs in $\{0,1\}^m$.
In this note we show how, given oracle access to $f : \{0,1\}^m \to \mathbb{F}$ and a point $z \in \mathbb{F}^m$, to compute $\hat{f}(z)$ using exactly $2^{m+1}$ multiplications, $2^m$ additions and $O(m)$ additional operations. The amount of space used corresponds to $O(m)$ field elements.
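For comparison, the textbook folding evaluation below computes $\hat{f}(z)$ with $2^m - 1$ multiplications but stores all $2^m$ table values; the note's algorithm trades a different operation count for only $O(m)$ space.

```python
# Textbook folding evaluation of the multilinear extension at z:
# fold one variable at a time, g <- g[even] + z_i * (g[odd] - g[even]).
def mle_eval(f_table, z, field_mod):
    g = list(f_table)          # f on {0,1}^m (length 2^m); z = (z_1,...,z_m)
    for zi in z:
        g = [(g[2 * j] + zi * (g[2 * j + 1] - g[2 * j])) % field_mod
             for j in range(len(g) // 2)]
    return g[0]

# f(x1, x2) with x1 as the low-order bit: [f(0,0), f(1,0), f(0,1), f(1,1)].
p = 97
f = [3, 1, 4, 1]
assert mle_eval(f, [0, 1], p) == 4   # agrees with f on Boolean inputs
print(mle_eval(f, [10, 20], p))      # an evaluation outside {0,1}^2
```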
A Note on ``Privacy Preserving n-Party Scalar Product Protocol''
We show that, contrary to what is claimed, the scalar product protocol [IEEE Trans. Parallel Distrib. Syst. 2023, 1060-1066] is insecure against a semi-honest server. Moreover, its complexity increases exponentially with the number of parties $n$, so it cannot be put into practice.
Stickel’s Protocol using Tropical Increasing Matrices
In this paper we introduce the new concept of tropical increasing matrices and prove that any two tropical increasing matrices commute. Using this property, we modify Stickel’s protocol. This idea is similar to [5], which modified Stickel’s protocol using commuting matrices (Linde-De La Puente matrices).
Unforgeability of Blind Schnorr in the Limited Concurrency Setting
Blind signature schemes enable a user to obtain a digital signature on a message from a signer without revealing the message itself. Among the most fundamental examples of such a scheme is blind Schnorr, but recent results show that it does not satisfy the standard notion of security against malicious users, One-More Unforgeability (OMUF), as it is vulnerable to the ROS attack. However, blind Schnorr does satisfy the weaker notion of sequential OMUF, in which only one signing session is open at a time, in the Algebraic Group Model (AGM) + Random Oracle Model (ROM), assuming the hardness of the Discrete Logarithm (DL) problem.
This paper serves as a first step towards characterizing the security of blind Schnorr in the limited concurrency setting. Specifically, we show that blind Schnorr satisfies OMUF when at most two signing sessions can be concurrently open (in the AGM+ROM, assuming DL). Our argument suggests that it is plausible that blind Schnorr satisfies OMUF for up to polylogarithmically many concurrent signing sessions. Our security proof involves interesting techniques from linear algebra and combinatorics.
FHE-MENNs: Opportunities and Pitfalls for Accelerating Fully Homomorphic Private Inference with Multi-Exit Neural Networks
With concerns about data privacy growing in a connected world, cryptography researchers have focused on fully homomorphic encryption (FHE) as a promising basis for machine-learning-as-a-service solutions. Recent advancements have lowered the computational cost by several orders of magnitude, but the latency of fully homomorphic neural networks remains a barrier to adoption. This work proposes using multi-exit neural networks (MENNs) to accelerate FHE inference. MENNs are network architectures that provide several exit points along the depth of the network. This approach allows users to employ results from any exit and terminate the computation early, saving both time and power. First, this work weighs the latency, communication, accuracy, and computational resource benefits of running FHE-based MENN inference. Then, we present the TorMENNt attack that can exploit the user's early termination decision to launch a concrete side-channel attack on MENNs. We demonstrate that the TorMENNt attack can predict the private image classification output of an image set for both FHE and plaintext threat models. We discuss possible countermeasures to mitigate the attack and examine their effectiveness. Finally, we tie the privacy risks with a cost-benefit analysis to obtain a practical roadmap for FHE-based MENN adoption.
Limits of Black-Box Anamorphic Encryption
(Receiver) Anamorphic encryption, introduced by Persiano \textit{et al.} at Eurocrypt 2022, considers the question of achieving private communication in a world where secret decryption keys are under the control of a dictator. The challenge here is to be able to establish a secret communication channel to exchange covert (i.e. anamorphic) messages on top of some already deployed public key encryption scheme.
Over the last few years several works addressed this challenge by showing new constructions, refined notions and extensions.
Most of these constructions, however, are either ad hoc, in the sense that they build upon specific properties of the underlying PKE, or impose severe restrictions on the size of the underlying anamorphic message space.
In this paper we consider the question of whether it is possible to have realizations of the primitive that are both generic and allow for large anamorphic message spaces. We give strong indications that, unfortunately, this is not the case.
Our first result shows that \textit{any black-box realization} of the primitive, i.e. any realization that accesses the underlying PKE only via oracle calls, \textit{must} have an anamorphic message space of size at most $poly(\lambda)$ ($\lambda$ security parameter).
Even worse, if one aims at stronger variants of the primitive (and, specifically, the notion of asymmetric anamorphic encryption, recently proposed by Catalano \textit{et al.}), we show that such black-box realizations are plainly impossible, i.e. no matter how small the anamorphic message space is.
Finally, we show that our impossibility results are rather tight: indeed, by making more specific assumptions on the underlying PKE, it becomes possible to build generic AE where the anamorphic message space is of size $\Omega(2^\lambda)$.
The Cost of Maintaining Keys in Dynamic Groups with Applications to Multicast Encryption and Group Messaging
In this work we prove lower bounds on the (communication) cost of maintaining a shared key among a dynamic group of users.
Being "dynamic'' means one can add and remove users from the group.
This captures important protocols like multicast encryption (ME) and continuous group-key agreement (CGKA), which is the primitive underlying many group messaging applications.
We prove our bounds in a combinatorial setting where the state of the protocol progresses in rounds.
The state of the protocol in each round is captured by a set system, with each of its elements specifying a set of users who share a secret key.
We show this combinatorial model implies bounds in symbolic models for ME and CGKA that capture, as building blocks, PRGs, PRFs, dual PRFs, secret sharing, and symmetric encryption in the setting of ME, and PRGs, PRFs, dual PRFs, secret sharing, public-key encryption, and key-updatable public-key encryption in the setting of CGKA.
The models are related to the ones used by Micciancio and Panjwani (Eurocrypt'04) and Bienstock et al. (TCC'20) to analyze ME and CGKA, respectively.
We prove, using the Bollobás set pairs inequality, that the cost (number of uploaded ciphertexts) for replacing a set of $d$ users in a group of size $n$ is $\Omega(d\ln(n/d))$.
Our lower bound is asymptotically tight and both improves on a bound of $\Omega(d)$ by Bienstock et al. (TCC'20), and generalizes a result by Micciancio and Panjwani (Eurocrypt'04), who proved a lower bound of $\Omega(\log(n))$ for $d=1$.
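Plugging concrete numbers into the bound (constants omitted, as in any $\Omega$ statement, so these are indicative magnitudes only):

```python
# Evaluating the Omega(d * ln(n/d)) replacement cost for n = 2^20 users;
# d = 1 recovers the Omega(log n) bound of Micciancio-Panjwani.
import math

def replacement_lower_bound(n, d):
    return d * math.log(n / d)

n = 1 << 20
for d in (1, 32, 1024):
    print(f"d = {d:4d}: ~{replacement_lower_bound(n, d):8.1f} ciphertexts")
```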
Post-Quantum Ready Key Agreement for Aviation
Transitioning from classically to quantum secure key agreement protocols may require exchanging fundamental components, for example, replacing a Diffie-Hellman-like key exchange with a key encapsulation mechanism (KEM). Accordingly, the corresponding security proof can no longer rely on the Diffie-Hellman assumption, thus invalidating the security guarantees. As a consequence, the security properties have to be re-proven under a KEM-based security notion.
We initiate the study of the LDACS key agreement protocol (Edition 01.01.00 from 25.04.2023), which is soon-to-be-standardized by the International Civil Aviation Organization. The protocol's cipher suite features Diffie-Hellman as well as a KEM-based key agreement protocol to provide post-quantum security. While the former results in an instantiation of an ISO key agreement inheriting all security properties, the security achieved by the latter is ambiguous.
We formalize the computational security using the systematic notions of de Saint Guilhem, Fischlin and Warinschi (CSF '20), and prove the exact security that the KEM-based variant achieves in this model; primarily entity authentication, key secrecy and key authentication. To further strengthen our ``pen-and-paper'' findings, we model the protocol and its security guarantees using Tamarin, providing an automated proof of the security against a Dolev-Yao attacker.
Lower Bound on Number of Compression Calls of a Collision-Resistance Preserving Hash
The collision-resistant hash function is an early cryptographic primitive that finds extensive use in various applications. Remarkably, the Merkle-Damgård and Merkle tree hash structures possess the collision-resistance preserving property, meaning the hash function remains collision-resistant when the underlying compression function is collision-resistant. This raises the intriguing question of whether reducing the number of underlying compression function calls while keeping the collision-resistance preserving property is possible. In pursuit of addressing this question, we prove that an $\ell n$-to-$sn$-bit collision-resistance preserving hash function designed using $r$ calls to a $tn$-to-$n$-bit compression function must have $r \geq \lceil (\ell-s)/(t-1) \rceil$. Throughout the paper, all operations other than the compression function are assumed to be linear (which we call the linear hash mode).
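The bound is easy to evaluate; for instance, plain Merkle-Damgård ($t=2$, $s=1$) meets it with equality:

```python
# The bound r >= ceil((ell - s) / (t - 1)) in code: hashing ell blocks down
# to one (s = 1) with a 2n-to-n compression function (t = 2) needs at least
# ell - 1 calls, exactly what plain Merkle-Damgard uses.
from math import ceil

def min_calls(ell, s, t):
    return ceil((ell - s) / (t - 1))

assert min_calls(ell=8, s=1, t=2) == 7   # Merkle-Damgard on 8 blocks
assert min_calls(ell=8, s=1, t=3) == 4   # a wider compression function helps
```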
Notes on Multiplying Cyclotomic Polynomials on a GPU
Lattice cryptography has many exciting applications, from homomorphic encryption to zero-knowledge proofs. We explore the algebra of cyclotomic polynomials underlying many practical lattice cryptography constructions, and algorithms for multiplying cyclotomic polynomials on a GPU.
Faster Lookup Table Evaluation with Application to Secure LLM Inference
As large language models (LLMs) continue to gain popularity, concerns about user privacy are amplified, given that the data submitted by users for inference may contain sensitive information. Therefore, running LLMs through secure two-party computation (a.k.a. secure LLM inference) has emerged as a prominent topic. However, many operations in LLMs, such as Softmax and GELU, cannot be computed using conventional gates in secure computation; instead, lookup tables (LUTs) have to be utilized, which makes LUTs an essential primitive in secure LLM inference.
In this paper, we propose $\mathsf{ROTL}$, a secure two-party protocol for LUT evaluations. Compared with FLUTE (the state-of-the-art LUT protocol presented at Oakland '23), it achieves up to 11.6$\times$ speedup in terms of overall performance and 155$\times$ speedup in terms of online performance. Furthermore, $\mathsf{ROTL}$ can support arithmetic shares (which are required by secure LLM inference), whereas FLUTE can only support boolean shares. At the heart of $\mathsf{ROTL}$ is a novel protocol for secret-shared rotation, which allows two parties to generate additive shares of the rotated table without revealing the rotation offset. We believe this protocol is of independent interest. Based on $\mathsf{ROTL}$, we design a novel secure comparison protocol; compared with the state-of-the-art, it achieves a 2.4$\times$ bandwidth reduction in terms of online performance.
To support boolean shares, we further provide an optimization for FLUTE, reducing its computational complexity from $O(l\cdot n^2)$ to $O(n\log n+l\cdot n)$ and shifting the $O(n\log n)$ computation to the preprocessing phase. As a result, compared with FLUTE, it achieves up to 10.8$\times$ speedup in terms of overall performance and 962$\times$ speedup in terms of online performance.
Fusion Channel Attack with POI Learning Encoder
Side-channel attacks (SCAs) challenge the security of cryptographic systems by exploiting physical leakage such as power consumption and electromagnetic emissions. Classic SCAs, which mainly focus on mono-channel data, fail to utilize the joint information of multi-channel data. Moreover, previous studies of multi-channel attacks have often been limited in how they process and adapt to dynamic data, and the different data types from various channels make it difficult to use them effectively. This study introduces the Fusion Channel Attack with POI Learning Encoder (FCA), which employs a set of POI learning encoders that learn the inverse base transformation function family and project the data of each channel into a unified fusion latent space. Furthermore, our method introduces an optimal-transport-based metric for evaluating feature space fusion, which is used to assess the differences in feature spaces between channels. This model not only enhances the ability to process and interpret multi-source data, but also significantly improves the accuracy and applicability of SCAs in different environments.
MatcHEd: Privacy-Preserving Set Similarity based on MinHash
Fully homomorphic encryption (FHE) enables arbitrary computation on encrypted data, but certain applications remain prohibitively expensive in the encrypted domain. As a case in point, comparing two encrypted sets of data is extremely computationally expensive due to the large number of comparison operators required. In this work, we propose a novel methodology for encrypted set similarity inspired by the MinHash algorithm and the CGGI FHE scheme. Doing comparisons in FHE requires comparators and multiplexers or an expensive approximation, which further increases the latency, especially when the goal is to compare two sets of data. The MinHash algorithm can significantly reduce the number of comparisons required by employing a special Carter-Wegman (CW) hash function as a key building block. However, the modulus operation in the CW hash becomes another key bottleneck because the encrypted sub-circuits required to perform the modular reduction are very large and inefficient in an FHE setting. Towards that end, we introduce an efficient bitwise FHE-friendly digest function (FFD) to employ as the cornerstone of our proposed encrypted set-similarity. In a Boolean FHE scheme like CGGI, the bitwise operations can be implemented efficiently with Boolean gates, which allows for faster evaluation times relative to standard Carter-Wegman constructions. Overall, our approach drastically reduces the number of comparisons required relative to the baseline approach of directly computing the Jaccard similarity coefficients, and is inherently parallelizable, allowing for efficient encrypted computation on multi-CPU and GPU-based cloud servers. We validate our approach by performing a privacy-preserving plagiarism detection across encrypted documents.
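As a plaintext refresher (the paper's point is doing this under FHE, with the FFD replacing the Carter-Wegman hash), MinHash estimates the Jaccard coefficient from the fraction of agreeing signature coordinates:

```python
# Plaintext MinHash with Carter-Wegman hashes h(x) = (a*x + b) mod p.
import random

def minhash_signature(s, hashes):
    return [min(h(x) for x in s) for h in hashes]

def estimate_jaccard(sig_a, sig_b):
    # Pr[signatures agree at a coordinate] equals the Jaccard coefficient.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

random.seed(0)
p = (1 << 61) - 1                                  # a Mersenne prime
hashes = [(lambda a, b: (lambda x: (a * x + b) % p))(
          random.randrange(1, p), random.randrange(p)) for _ in range(200)]

A, B = set(range(0, 80)), set(range(40, 120))      # true Jaccard = 40/120 = 1/3
sa, sb = minhash_signature(A, hashes), minhash_signature(B, hashes)
print(estimate_jaccard(sa, sb))                    # ~0.33
```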
PolyFHEmus: Rethinking Multiplication in Fully Homomorphic Encryption
Homomorphic encryption is a powerful technology that solves key privacy concerns in cloud computing by enabling computation on encrypted data. However, it has not seen widespread adoption due to prohibitively high latencies. In this article, we identify polynomial multiplication as a bottleneck and investigate alternative algorithms to accelerate encrypted computing.
Juliet: A Configurable Processor for Computing on Encrypted Data
Fully homomorphic encryption (FHE) has become progressively more viable in the years since its inception in 2009. At the same time, leveraging state-of-the-art schemes efficiently for general computation remains prohibitively difficult for the average programmer. In this work, we introduce a new design for a fully homomorphic processor, dubbed Juliet, to enable faster operations on encrypted data using the state-of-the-art TFHE and cuFHE libraries for both CPU and GPU evaluation. To improve usability, we define an expressive assembly language and instruction set architecture (ISA) judiciously designed for end-to-end encrypted computation. We demonstrate Juliet's capabilities with a broad range of realistic benchmarks including cryptographic algorithms, such as the lightweight ciphers Simon and Speck, as well as logistic regression (LR) inference and matrix multiplication.
HElix: Genome Similarity Detection in the Encrypted Domain
As the field of genomics continues to expand and more sequencing data is gathered, genome analysis becomes increasingly relevant for many users. For example, a common scenario entails users trying to determine if their DNA samples are similar to DNA sequences hosted in a larger remote repository. Nevertheless, end users may be reluctant to upload their DNA sequences, while the owners of remote genomics repositories are unwilling to openly share their database. To address this challenge, we propose two distinct approaches based on fully homomorphic encryption to preserve the privacy of the genomic data and enable queries directly on ciphertexts. The first is based on the ubiquitous MinHash algorithm and can determine if similar matches exist in the database, while the second involves a bespoke bloom filter construction for determining exact matches. We validate both approaches across various database sizes using both GPU and CPU-based cloud servers.
Tyche: Probabilistic Selection over Encrypted Data for Generative Language Models
Generative AI, a significant technological disruptor in recent years, has impacted domains like augmented reality, coding assistance, and text generation. However, use of these models requires users to trust the model owners with their sensitive data given as input to the model. Fully Homomorphic Encryption (FHE) offers a promising solution, and many earlier works have investigated the use of this technology for machine learning as a service (MLaaS) applications. Still, these efforts do not cater to generative models that operate probabilistically, allowing for diverse and creative outputs. In this work, we introduce three novel probabilistic selection algorithms for autoregressive generative AI: multiplication-scaled cumulative sum, heuristic cumulative sum, and the random-multiplication argmax. Each of these approaches presents distinctive challenges in optimizing the trade-off between precision and timing performance, a balance intricately tied to the specific characteristics of the data under consideration. Our results show that the random-multiplication argmax-based method is more scalable than the cumulative sum methods and can accurately mimic the plaintext selection curve.
Obfuscated Key Exchange
Censorship circumvention tools enable clients to access endpoints in a network despite the presence of a censor. Censors use a variety of techniques to identify content they wish to block, including filtering traffic patterns that are characteristic of proxy or circumvention protocols and actively probing potential proxy servers. Circumvention practitioners have developed fully encrypted protocols (FEPs), intended to have traffic that appears indistinguishable from random. A FEP is typically composed of a key exchange protocol to establish shared secret keys, and then a secure channel protocol to encrypt application data; both must avoid revealing to observers that an obfuscated protocol is in use.
We formalize the notion of obfuscated key exchange, capturing the requirement that a key exchange protocol's traffic "looks random" and that it resists active probing attacks, in addition to ensuring secure session keys and authentication. We show that the Tor network's obfs4 protocol satisfies this definition. We then show how to extend the obfs4 design to defend against stronger censorship attacks and present a quantum-safe obfuscated key exchange protocol. To instantiate our quantum-safe protocol using the ML-KEM (Kyber) standard, we present Kemeleon, a new mapping between ML-KEM public keys/ciphertexts and uniform byte strings.
Randomized Distributed Function Computation with Semantic Communications: Applications to Privacy
Randomized distributed function computation refers to remote function computation where transmitters send data to receivers which compute function outputs that are randomized functions of the inputs. We study the applications of semantic communications in randomized distributed function computation to illustrate significant reductions in the communication load, with a particular focus on privacy. The semantic communication framework leverages generalized remote source coding methods, where the remote source is a randomized version of the observed data. Since satisfying security and privacy constraints generally requires a randomization step, semantic communication methods can be applied to such function computation problems, where the goal is to remotely simulate a sequence at the receiver such that the transmitter and receiver sequences follow a target probability distribution. Our performance metrics guarantee (local differential) privacy for each input sequence; we apply them to two different distributed function computation problems, which is made possible by using strong coordination methods.
This work provides lower bounds on Wyner's common information (WCI), which is one of the two corner points of the coordination-randomness rate region characterizing the ultimate limits of randomized distributed function computation. The WCI corresponds to the case when there is no common randomness shared by the transmitter and receiver. Moreover, numerical methods are proposed to compute the other corner point for continuous-valued random variables, for which an unlimited amount of common randomness is available. Results for two problems of practical interest illustrate that leveraging common randomness can decrease the communication load as compared to the WCI corner point significantly. We also illustrate that semantic communication gains over lossless compression methods are achieved also without common randomness, motivating further research on limited common randomness scenarios.
Enabling Complete Atomicity for Cross-chain Applications Through Layered State Commitments
Cross-chain Decentralized Applications (dApps) are increasingly popular for their ability to handle complex tasks across various blockchains, extending beyond simple asset transfers or swaps. However, ensuring all dependent transactions execute correctly together, known as complete atomicity, remains a challenge. Existing works provide financial atomicity, protecting against monetary loss, but lack the ability to ensure correctness for complex tasks. In this paper, we introduce Avalon, a transaction execution framework for cross-chain dApps that guarantees complete atomicity for the first time. Avalon achieves this by introducing multiple state layers above the native one to cache state transitions, allowing for efficient management of these state transitions. Most notably, for concurrent cross-chain transactions, Avalon resolves not only intra-chain conflicts but also addresses potential inconsistencies between blockchains via a novel state synchronization protocol, enabling serializable cross-chain execution. We implement Avalon using smart contracts in Cosmos ecosystem and evaluate its commitment performance, demonstrating acceptable latency and gas consumption even under conflict cases.
LEA Block Cipher in Rust Language: Trade-off between Memory Safety and Performance
Block cipher implementations have traditionally been written in the C language due to its system-friendly features. However, C is prone to memory safety issues, such as buffer overflows and memory leaks. On the other hand, Rust, a novel systems programming language, provides strict compile-time memory safety guarantees through its ownership model. This paper presents an implementation of the LEA block cipher in Rust, demonstrating features that prevent common memory vulnerabilities while maintaining performance. We compare the Rust implementation with the traditional C version, showing that while Rust incurs a reasonable memory overhead, it achieves comparable encryption and decryption execution times. Our results highlight Rust’s suitability for secure cryptographic applications, striking a balance between memory safety and execution efficiency.
Quantum Implementation of LSH
As quantum computing progresses, assessing the resilience of cryptographic algorithms against quantum attacks has gained significant interest in the field of cryptanalysis. Consequently, this paper implements a depth-optimized quantum circuit for the Korean hash function LSH and estimates its quantum attack cost. By utilizing an optimized quantum adder and employing parallelization techniques, the proposed quantum circuit achieves a 78.8\% improvement in full depth and a 79.1\% improvement in Toffoli depth compared to previous state-of-the-art works.
In conclusion, based on the implemented quantum circuit, we estimate the resources required for a Grover collision attack and evaluate the post-quantum security of the LSH algorithm.
Practical Non-interactive Multi-signatures, and a Multi-to-Aggregate Signatures Compiler
In a fully non-interactive multi-signature, resp. aggregate-signature scheme (fNIM, resp. fNIA), signatures issued by many signers on the same message, resp. on different messages, can be succinctly ``combined'', resp. ``aggregated''.
fNIMs are used in the Ethereum consensus protocol, to produce the certificates of validity of blocks which are to be verified by billions of clients. fNIAs are used in some PBFT-like consensus protocols, such as the production version of Diem by Aptos, to replace the forwarding of many signatures by a new leader. In this work we address three complexity bottlenecks.
(i) fNIAs are costlier than fNIMs; e.g., we observe that verifying a 3000-wise aggregate signature of BGLS (Eurocrypt'03) takes 300x longer than verifying a 3000-wise pairing-based multi-signature.
(ii) fNIMs impose that each verifier processes the setup published by the group of potential signers. This processing consists either in verifying proofs of possession (PoPs), such as in Pixel (Usenix'20) and in the IETF'22 draft inherited from Ristenpart-Yilek (Eurocrypt'07), which costs a product of pairings over all published keys, or in re-randomizing the keys, such as in SMSKR (FC'24).
(iii) Existing proven security bounds on efficient fNIMs do not give any guarantee in practical curves with 256-bit groups, such as BLS12-381 (used in Ethereum) or BLS12-377 (used in Zexe). Thus, computing in much larger curves is required to have provable guarantees.
Our first contribution is a new fNIM called $\mathsf{dms}$, which addresses both (ii) and (iii).
It is as simple as adding Schnorr PoPs to the schoolbook pairing-based fNIM of Boldyreva (PKC'03).
(ii) For a group of 1000 signers, verification of these PoPs is: $5+$ times faster than for the previous pairing-based PoPs; and $3+$ times faster than the verifier's processing of the setup in SMSKR (and, contrary to the latter, need not be re-run when a new member joins the group).
(iii) We prove a tight reduction to the discrete logarithm (DL), in the algebraic group model (AGM). Given the current estimation of roughly 128 bits of security for the DL in both the curves BLS12-381 and BLS12-377, we deduce a probability of forgery of $\mathsf{dms}$ no higher than about $2^{-93}$ for a time $2^{80}$ adversary.
This reduction is our main technical contribution. The only related proof before was for an interactive Schnorr-based multi-signature scheme, using Schnorr PoPs. Our approach easily fills a gap in this proof, since we take into account that the adversary has access to a signing oracle even before publishing its PoPs. But in our context of pairing-based multi-signatures, extraction of the keys of the adversary is significantly more complicated, since the signing oracle produces a correlated random string.
We finally provide another application of $\mathsf{dms}$, which is that it can be plugged in recent threshold signatures without setup (presented by Das et al at CCS'23, and Garg et al at SP'24), since these schemes implicitly build on any arbitrary BLS-based fNIM.
Our second contribution addresses (i); it is a very simple compiler: $\mathcal{M}to\mathcal{A}$ (multi-to-aggregate). It turns any fNIM into an fNIA suitable for aggregation of signatures on messages with a common prefix, with the restriction that a signer must not sign twice using the same prefix. The resulting fNIA is post-quantum secure as soon as the fNIM is, such as Chipmunk (CCS'23). We demonstrate the relevance for Diem by applying $\mathcal{M}to\mathcal{A}$ to $\mathsf{dms}$: the resulting fNIA verifies an aggregate of 129 signatures, over messages with $7$-bit-long variable parts, 39x faster than BGLS.
Separating Selective Opening Security From Standard Security, Assuming IO
Assuming the hardness of LWE and the existence of IO, we construct a public-key encryption scheme that is IND-CCA secure but fails to satisfy even a weak notion of indistinguishability security with respect to selective opening attacks. Prior to our work, such a separation was known only from stronger assumptions such as differing inputs obfuscation (Hofheinz, Rao, and Wichs, PKC 2016).
Central to our separation is a new hash family, which may be of independent interest. Specifically, for any $S(k) = k^{O(1)}$, any $n(k) = k^{O(1)}$, and any $m(k) = k^{\Theta(1)}$, we construct a hash family mapping $n(k)$ bits to $m(k)$ bits that is somewhere statistically correlation intractable (SS-CI) for all relations $R_k \subseteq \{0,1\}^{n(k)} \times \{0,1\}^{m(k)}$ that are enumerable by circuits of size $S(k)$.
We give two constructions of such a hash family. Our first construction uses IO, and generically ``boosts'' any hash family that is SS-CI for the smaller class of functions that are computable by circuits of size $S(k)$. This weaker hash variant can be constructed based solely on LWE (Peikert and Shiehian, CRYPTO 2019). Our second construction is based on the existence of a circular secure FHE scheme, and follows the construction of Canetti et al. (STOC 2019).
QuietOT: Lightweight Oblivious Transfer with a Public-Key Setup
Oblivious Transfer (OT) is at the heart of secure computation and is a foundation for many applications in cryptography. Over two decades of work have led to extremely efficient protocols for evaluating OT instances in the preprocessing model, through a paradigm called OT extension.
A few OT instances generated in an offline phase can be used to perform many OTs in an online phase efficiently, i.e., with very low communication and computational overheads.
Specifically, traditional OT extension protocols use a small number of “base” OTs, generated using any black-box OT protocol, and convert them into many OT instances using only lightweight symmetric-key primitives.
Recently, a new paradigm of OT with a *public-key setup* has emerged, which replaces the base OTs with a non-interactive setup: Using only the public key of the other party, two parties can efficiently compute a virtually unbounded number of OT instances on-the-fly.
In this paper, we put forth a novel framework for OT extension with a public-key setup and concretely efficient instantiations. An implementation of our framework is 30-100 times faster when compared to the previous state-of-the-art public-key OT protocols, and remains competitive even when compared to OT protocols that *do not* offer a public-key setup. Additionally, our instantiations result in the first public-key schemes with plausible post-quantum security.
In summary, this paper contributes:
- QuietOT: A framework for OT extension with a public-key setup that uses fast, symmetric-key primitives to generate OT instances following a one-time public-key setup, and offering additional features such as precomputability.
- A public-key setup for QuietOT from the RingLWE assumption, resulting in the first post-quantum construction of OT extension with a public-key setup.
- An optimized, open-source implementation of our construction that can generate up to 1M OT extensions per second on commodity hardware. In contrast, the state-of-the-art public-key OT protocol is limited to approximately 20K OTs per second.
- The first formal treatment of the security of OT with a public-key setup in a multi-party setting, which addresses several subtleties that were overlooked in prior work.
GAuV: A Graph-Based Automated Verification Framework for Perfect Semi-Honest Security of Multiparty Computation Protocols
Proving the security of a Multiparty Computation (MPC) protocol is a difficult task. Under the current simulation-based definition of MPC, a security proof consists of a simulator, which is usually specific to the concrete protocol and requires to be manually constructed, together with a theoretical analysis of the output distribution of the simulator and corrupted parties' views in the real world. This presents an obstacle in verifying the security of a given MPC protocol. Moreover, an instance of a secure MPC protocol can easily lose its security guarantee due to careless implementation, and such a security issue is hard to detect in practice.
In this work, we propose a general automated framework to verify the perfect security of instances of MPC protocols against the semi-honest adversary. Our framework has perfect soundness: any protocol that is proven secure under our framework is also secure under the simulation-based definition of MPC. We demonstrate the completeness of our framework by showing that for any instance of the well-known BGW protocol, our framework can prove its security for every corrupted party set in polynomial time. Unlike prior work that only focuses on black-box privacy, which requires the outputs of corrupted parties to contain no information about the inputs of the honest parties, our framework may potentially be used to prove the security of arbitrary MPC protocols.
We implement our framework as a prototype. The evaluation shows that our prototype automatically proves the perfect semi-honest security of BGW protocols and B2A (binary to arithmetic) conversion protocols in reasonable durations.
Securely Training Decision Trees Efficiently
Decision trees are an important class of supervised learning algorithms. When multiple entities contribute data to train a decision tree (e.g. for fraud detection in the financial sector), data privacy concerns necessitate the use of a privacy-enhancing technology such as secure multi-party computation (MPC) in order to secure the underlying training data. The prior state-of-the-art (Hamada et al.) constructs an MPC protocol for decision tree training with communication of $\mathcal{O}(hmN\log N)$ when building a decision tree of height $h$ for a training dataset of $N$ samples, each having $m$ attributes.
In this work, we significantly reduce the communication complexity of secure decision tree training.
We construct a protocol with communication complexity $\mathcal{O}(mN\log N + hmN + hN\log N)$, thereby achieving an improvement of $\approx \mathsf{min}(h, m, \log N)$ over Hamada et al.
At the core of our technique is an improved protocol to regroup sorted private elements further into additional groups (according to a flag vector) while maintaining their relative ordering. We implement our protocol in the MP-SPDZ framework and show that it requires $10\times$ less communication and is $9\times$ faster than the state-of-the-art.
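For a feel of the asymptotic gain, one can compare the two bounds numerically (constants dropped, so only the ratio is meaningful; the parameter values below are hypothetical):

```python
# Hamada et al.'s O(h*m*N*log N) vs. this work's
# O(m*N*log N + h*m*N + h*N*log N), up to constants.
import math

def prior(h, m, N):
    return h * m * N * math.log2(N)

def this_work(h, m, N):
    return m * N * math.log2(N) + h * m * N + h * N * math.log2(N)

h, m, N = 16, 32, 1 << 20   # hypothetical tree height, attributes, samples
print(f"improvement ratio ~{prior(h, m, N) / this_work(h, m, N):.1f}"
      f" (min(h, m, log N) = {min(h, m, int(math.log2(N)))})")
```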
A More Compact AES, and More
We reduce the number of bit operations required to implement AES to a new minimum, and also compute improvements to elements of some other ciphers. Exploring the algebra of AES allows choices of basis and streamlining of the nonlinear parts. We also compute a more efficient implementation of the linear part of each round. Similar computational optimizations apply to other cryptographic matrices and S-boxes. This work may be incorporated into a hardware AES implementation using minimal resources, or potentially in a bit-sliced software implementation to increase speed.
TaSSLE: Lasso for the commitment-phobic
We present TaSSLE, a new lookup argument for decomposable tables with minimal commitment costs. The construction generalizes techniques introduced in Lasso (Eurocrypt '24) which take advantage of the internal structure present in such tables to avoid the need for any party to commit to, or even construct, the entire table. This allows the use of lookups against very large tables, with applications including new design strategies for "zero-knowledge virtual machines". We show that these techniques may be combined in a generic way with any existing lookup argument to achieve similar results. We then give a construction of TaSSLE by applying this observation to a recent lookup argument, introduced in [Papini-Haböck '23], which combines logarithmic derivatives with the GKR protocol to achieve a lookup argument with minimal commitment costs.
Trust Nobody: Privacy-Preserving Proofs for Edited Photos with Your Laptop
The Internet has plenty of images that are transformations (e.g., resize, blur) of confidential original images. Several scenarios (e.g., selling images over the Internet, fighting disinformation, detecting deep fakes) would highly benefit from systems allowing one to verify that an image is the result of a transformation applied to a confidential authentic image. In this paper, we focus on systems for proving and verifying the correctness of transformations of authentic images guaranteeing: 1) confidentiality (i.e., the original image remains private), 2) efficient proof generation (i.e., the proof certifying the correctness of the transformation can be computed with a common laptop) even for high-resolution images, 3) authenticity (i.e., only the advertised transformations have been applied) and 4) fast detection of fraud proofs. Our contribution consists of the following results:
- We present new definitions following in part the ones proposed by Naveh and Tromer [IEEE S&P 2016] and strengthening them to face more realistic adversaries.
- We propose techniques that leverage the way typical transformations work to efficiently instantiate zk-SNARKs, circumventing the major bottlenecks due to claims about large pre-images of cryptographic hashes.
- We present a 1st construction based on an ad-hoc signature scheme and an ad-hoc cryptographic hash function, obtaining for the first time all 4 of the above properties.
- We present a 2nd construction that, unlike previous results, works with the signature scheme and cryptographic hash function included in the C2PA specifications.
Experimental results confirm the viability of our approach: in our 1st construction, an authentic transformation (e.g., a resize or a crop) of a high-resolution image of 30 MP can be generated on a common 8-core PC in about 41 minutes, employing less than 4 GB of RAM. Our 2nd construction is roughly one order of magnitude slower than our 1st construction. Prior results instead either require expensive computing resources or provide unsatisfying confidentiality.
Message Latency in Waku Relay with Rate Limiting Nullifiers
Waku is a privacy-preserving, generalized, and decentralized messaging protocol suite. Waku uses GossipSub for message routing and Rate Limiting Nullifiers (RLN) for spam protection. GossipSub ensures fast and reliable peer-to-peer message delivery in a permissionless environment, while RLN enforces a common publishing rate limit using zero-knowledge proofs.
This paper presents a practical evaluation of message propagation latency in Waku. First, we estimate latencies analytically, building a simple mathematical model for latency under varying conditions. Second, we run a large-scale single-host simulation with 1000 nodes. Third, we set up a multi-host Waku deployment using five nodes in different locations across the world. Finally, we compare our analytical estimations to the results of the simulation and the real-world measurement.
The experimental results are in line with our theoretical model. Under realistic assumptions, medium-sized messages (25 KB) are delivered within 1 second. We conclude that Waku can achieve satisfactory latency for typical use cases, such as decentralized messengers, while providing scalability and anonymity.
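For intuition, a back-of-the-envelope model of the kind the paper builds might estimate latency as gossip depth times per-hop cost; every number below is an illustrative assumption, not a measured Waku parameter.

    import math

    # Toy gossip-latency model: hops ~ log_fanout(N); each hop pays a
    # propagation delay plus transmission time for the message body.
    def est_latency(msg_bytes, n_nodes, fanout=6, hop_delay=0.05, bandwidth=1e6):
        hops = math.ceil(math.log(n_nodes, fanout))
        per_hop = hop_delay + msg_bytes / bandwidth
        return hops * per_hop

    print(est_latency(25_000, 1000))  # ~0.3 s for a 25 KB message, 1000 nodes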
A Study of Partial Non-Linear Layers with DEFAULT and BAKSHEESH
In this work, we take a look at two recently proposed block ciphers, DEFAULT and BAKSHEESH, both of which are descendants of another block cipher named GIFT. We show that both ciphers can be interpreted within the partial non-linear layer category, thanks to their S-boxes having at least one non-trivial linear structure. We also reevaluate the security claim of DEFAULT.
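A linear structure of an S-box S is a pair (a, c) with S(x ^ a) ^ S(x) = c for every x. The detector below is generic, while the 4-bit S-box shown is a toy constructed to contain one such structure (it is not the actual DEFAULT or BAKSHEESH S-box).

    # Find all linear structures (a, c) of an S-box: directions a whose output
    # difference is the same constant c for every input x.
    def linear_structures(S):
        found = []
        for a in range(1, len(S)):
            diffs = {S[x ^ a] ^ S[x] for x in range(len(S))}
            if len(diffs) == 1:
                found.append((a, diffs.pop()))
        return found

    P = [6, 5, 0, 3, 1, 4, 2, 7]   # 3-bit permutation with no linear structure
    S = P + [p ^ 8 for p in P]     # toy 4-bit S-box built to have exactly one
    print(linear_structures(S))    # [(8, 8)]: flipping input bit 3 always flips output bit 3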
On the efficient representation of isogenies (a survey)
We survey different (efficient or not) representations of isogenies, with a particular focus on the recent "higher dimensional" isogeny representation, and algorithms to manipulate them.
Protecting Cryptographic Code Against Spectre-RSB
It is fundamental that executing cryptographic software must not leak secrets through side-channels. For software-visible side-channels, it was long believed that "constant-time" programming would be sufficient as a systematic countermeasure. However, this belief was shattered in 2018 by attacks exploiting speculative execution—so-called Spectre attacks.
Recent work shows that language support suffices to protect cryptographic code with minimal overhead against one class of such attacks, Spectre v1, but leaves an open question of whether this result can be extended to also cover other classes of Spectre attacks.
In this paper, we answer this question in the affirmative: We design, validate, implement, and verify an approach to protect cryptographic implementations against all known classes of Spectre attacks—the main challenge in this endeavor is attacks exploiting the return stack buffer, which are known as Spectre-RSB. Our approach combines a new value-dependent information-flow type system that enforces speculative constant-time in an idealized model of transient execution and a compiler transformation that realizes this idealized model on the generated low-level code. Using the Coq proof assistant, we prove that the type system is sound with respect to the idealized semantics and that the compiler transformation preserves speculative constant-time.
We implement our approach in the Jasmin framework for high-assurance cryptography and demonstrate that the overhead incurred by full Spectre protections is below 2% for most cryptographic primitives and reaches only about 5–7% for the more complex post-quantum key-encapsulation mechanism Kyber.
Strong Existential Unforgeability and BUFF Securities of MPC-in-the-Head Signatures
NIST started the standardization of additional post-quantum signatures in 2022. Among the 40 candidates, only a few have been shown to satisfy security notions stronger than existential unforgeability, namely strong existential unforgeability and the BUFF (beyond unforgeability features) securities. Recently, Aulbach, Düzlü, Meyer, Struck, and Weishäupl (PQCrypto 2024) examined the BUFF securities of 17 out of the 40 candidates. Unfortunately, for the so-called MPC-in-the-Head (MPCitH) signature schemes, nothing is known about their strong existential unforgeability or BUFF securities.
This paper studies the strong securities of all nine MPCitH signature candidates: AIMer, Biscuit, FAEST, MIRA, MiRitH, MQOM, PERK, RYDE, and SDitH.
We show that the MPCitH signature schemes are strongly existentially unforgeable under chosen message attacks in the (quantum) random oracle model. To do so, we introduce a new property of the underlying multi-pass identification, which we call _non-divergency_. This property can be considered as a weakened version of the computational unique response for three-pass identification defined by Kiltz, Lyubashevsky, and Schaffner (EUROCRYPT 2018) and its extension to multi-pass identification defined by Don, Fehr, and Majenz (CRYPTO 2020). In addition, we show that the SSH11 protocol proposed by Sakumoto, Shirai, and Hiwatari (CRYPTO 2011) is _not_ computational unique response, contrary to the claim of Don et al. (CRYPTO 2020).
We also survey BUFF securities of the nine MPCitH candidates in the quantum random oracle model. In particular, we show that Biscuit and MiRitH do not have some of the BUFF securities.
From Interaction to Independence: zkSNARKs for Transparent and Non-Interactive Remote Attestation
Remote attestation (RA) protocols have been widely used to evaluate the integrity of software on remote devices. Currently, the state-of-the-art RA protocols lack a crucial feature: transparency. This means that the details of the final attestation verification are not openly accessible or verifiable by the public. Furthermore, the interactivity of these protocols often limits attestation to trusted parties who possess privileged access to confidential device data, such as pre-shared keys and initial measurements. These constraints impede the widespread adoption of these protocols in various applications.
In this paper, we introduce zRA, a non-interactive, transparent, and publicly provable RA protocol based on zkSNARKs. zRA enables verification of device attestations without the need for pre-shared keys or access to confidential data, ensuring a trustless and open attestation process. This eliminates the reliance on online services or secure storage on the verifier side. Moreover, zRA does not impose any additional security assumptions beyond the fundamental cryptographic schemes and the essential trust anchor components on the prover side (i.e., ROM and MPU). The zero-knowledge attestation proofs generated by devices have constant size regardless of the network complexity and number of attestations. Moreover, these proofs do not reveal sensitive information regarding internal states of the device, allowing verification by anyone in a public and auditable manner. We conduct an extensive security analysis and demonstrate scalability of zRA compared to prior work. Our analysis suggests that zRA excels especially in peer-to-peer and Pub/Sub network structures. To validate the practicality, we implement an open-source prototype of zRA using the Circom language. We show that zRA can be securely deployed on public permissionless blockchains, serving as an archival platform for attestation data to achieve resilience against DoS attacks.
Efficient Lattice-Based Threshold Signatures with Functional Interchangeability
A threshold signature scheme distributes the ability to generate signatures through distributed key generation and signing protocols. A threshold signature scheme should be functionally interchangeable, meaning that a signature produced by a threshold scheme should be verifiable by the same algorithm used for non-threshold signatures. To resist future attacks from quantum adversaries, lattice-based threshold signatures are desirable. However, the performance of existing lattice-based threshold signing protocols is still far from practical.
This paper presents the first lattice-based $t$-out-of-$n$ threshold signature scheme with functional interchangeability that has been implemented. To build a $t$-out-of-$n$ access structure for arbitrary $t \leq n$, we first present a novel $t$-out-of-$n$ version of the SPDZ MPC protocol. We avoid using the MPC protocol to evaluate hash operations for high concrete efficiency. Moreover, we design an efficient distributed rejection sampling protocol. Consequently, the online phase of our distributed signing protocol takes only 0.5 seconds in the two-party setting and 7.3 seconds in the 12-party setting according to our implementation. As a byproduct, our scheme also presents a periodic key refreshment mechanism and offers proactive security.
VerITAS: Verifying Image Transformations at Scale
Verifying image provenance has become an important topic, especially in the realm of news media. To address this issue, the Coalition for Content Provenance and Authenticity (C2PA) developed a standard to verify image provenance that relies on digital signatures produced by cameras. However, photos are usually edited before being published, and a signature on an original photo cannot be verified given only the published edited image. In this work, we describe VerITAS, a system that uses zero-knowledge proofs (zk-SNARKs) to prove that only certain edits have been applied to a signed photo. While past work has created image editing proofs for photos, VerITAS is the first to do so for realistically large images (30 megapixels). Our key innovation enabling this leap is the design of a new proof system that enables proving knowledge of a valid signature on a large amount of witness data. We run experiments on realistically large images that are more than an order of magnitude larger than those tested in prior work. In the case of a computationally weak signer, such as a camera, we are able to generate a proof of valid edits for a 90 MB image in just over thirteen minutes, costing about \$0.54 on AWS per image. In the case of a more powerful signer, we are able to generate a proof of valid edits for a 90 MB image in just over three minutes, costing only \$0.13 on AWS per image. Either way, proof verification time is less than a second. Our techniques apply broadly whenever there is a need to prove that an efficient transformation was applied correctly to a large amount of signed private data.
AITIA: Efficient Secure Computation of Bivariate Causal Discovery
Researchers across various fields seek to understand causal relationships but often find controlled experiments impractical. To address this, statistical tools for causal discovery from naturally observed data have become crucial. Non-linear regression models, such as Gaussian process regression, are commonly used in causal inference but have limitations due to high costs when adapted for secure computation. Support vector regression (SVR) offers an alternative but remains costly in a multi-party computation (MPC) context due to conditional branches and support vector updates.
In this paper, we propose Aitia, the first two-party secure computation protocol for bivariate causal discovery. The protocol is based on optimized multi-party computation design choices and is secure in the semi-honest setting. At the core of our approach is BSGD-SVR, a new non-linear regression algorithm designed for MPC applications, achieving both high accuracy and low computation and communication costs. Specifically, we reduce the training complexity of the non-linear regression model from approximately $\mathcal{O}(N^3)$ to $\mathcal{O}(N^2)$, where $N$ is the number of training samples.
We implement Aitia using CrypTen and assess its performance across various datasets. Empirical evaluations show a significant speedup of $3.6\times$ to $340\times$ compared to the baseline approach.
ArcEDB: An Arbitrary-Precision Encrypted Database via (Amortized) Modular Homomorphic Encryption
Fully homomorphic encryption (FHE) based database outsourcing is drawing growing research interest. In its current state, there exist two primary obstacles against FHE-based encrypted databases (EDBs): i) low data precision, and ii) high computational latency. To tackle the precision-performance dilemma, we introduce ArcEDB, a novel FHE-based SQL evaluation infrastructure that simultaneously achieves high data precision and fast query evaluation. Based on a set of new plaintext encoding schemes, we are able to execute arbitrary-precision ciphertext-to-ciphertext homomorphic comparison orders of magnitude faster than existing methods. Meanwhile, we propose efficient conversion algorithms between the encoding schemes to support highly composite SQL statements, including advanced filter-aggregation and multi-column synchronized sorting. We perform comprehensive experiments to study the performance characteristics of ArcEDB. In particular, we show that ArcEDB can be up to $57\times$ faster in homomorphic filtering and up to $20\times$ faster over end-to-end SQL queries when compared to the state-of-the-art FHE-based EDB solutions. Using ArcEDB, a SQL query over a 10K-row time-series EDB with 64-bit timestamps runs in under one minute.
VIMz: Private Proofs of Image Manipulation using Folding-based zkSNARKs
Ensuring the authenticity and credibility of daily media on the internet is an ongoing problem. Meanwhile, genuinely captured images often require refinements before publication. Zero-knowledge proofs (ZKPs) offer a solution by verifying an edited image without disclosing the original source. However, ZKPs typically come with high costs, particularly in terms of prover complexity and proof size. This paper presents VIMz, a framework for efficiently proving the authenticity of high-resolution images using folding-based zkSNARKs, a type of proving system that minimizes computational overhead by recursively folding multiple evaluations of the same constraints into a compact proof. As a complete proof system, VIMz proves the integrity of both the original and edited images, as well as the correctness of the transformation, without revealing intermediate images within a chain of edits--only the final result is disclosed. Moreover, VIMz maintains the anonymity of the original signer and all subsequent editors while proving the authenticity of the final image. We also compare VIMz with the system model of the Coalition for Content Provenance and Authenticity (C2PA) from different perspectives and show that VIMz offers a higher level of security by eliminating the need to trust the editing environment. Experimental results show that VIMz performs efficiently on both the prover and verifier sides. It can prove transformations on 8K (33 MP, i.e., 100 MB) images 13%-25% faster than the competition, while reaching a peak memory of only 10 GB. Moreover, VIMz has a verification time of under 1 second and achieves succinct proofs of less than 11 KB for all resolutions, which is more than a 90% improvement over the competition. VIMz's low memory complexity allows for proving multiple transformations in parallel to achieve a 3.5x additional speedup on average.
Compact Key Function Secret Sharing with Non-linear Decoder
We present a variant of Function Secret Sharing (FSS) schemes tailored for point, comparison, and interval functions, featuring compact key sizes at the expense of an additional comparison. While existing FSS constructions are primarily geared towards $2$-party scenarios, exceptions such as the work by Boyle et al. (Eurocrypt 2015) and Riposte (S&P 2015) have introduced FSS schemes for $p$-party scenarios ($p \geq 3$). This paper aims to achieve the most compact $p$-party FSS key size to date. We achieve a noteworthy reduction in key size, a $2^p$-factor decrease compared to state-of-the-art FSS constructions (including computationally efficient constructions using symmetric-key primitives) of distributed point function (DPF). Compared to the previous public-key-based FSS design for DPF, we also get a key size reduction equal to a $2^{n/2}$-sized row vector, where $2^n$ is the domain size of the point function. This reduction in key size comes at the cost of a required comparison operation by the decoder (hence called a non-linear decoder), a departure from prior schemes. In $p$-party scenarios, our construction outperforms existing FSS constructions in key size, remaining on par with Riposte in evaluation time and showing significant improvement over Boyle et al.
In addition to constructing FSS for distributed point functions (DPF), we extend our approach to distributed comparison and interval functions, achieving the most efficient key size to date. Our distributed comparison function exhibits a key-size reduction by a factor of $q^{p-1}$, where $q$ denotes the size of the algebraic group used in the scheme's construction.
The reduced key size of the comparison function has practical implications, particularly in applications like privacy-preserving machine learning (PPML), where thousands of comparison functions are employed in each neural network layer.
To demonstrate the effectiveness of our improvements, we design and prototype-implement a scalable privacy-preserving framework for neural networks over distributed models. Specifically, we implement a distributed rectified linear unit (ReLU) activation function using our distributed comparison function, showcasing the efficacy of our proposed scheme.
Insta-Pok3r: Real-time Poker on Blockchain
We develop a distributed service for generating correlated randomness (e.g. permutations) for multiple parties, where each party’s output is private but publicly verifiable. This service provides users with a low-cost way to play online poker in real-time, without a trusted party.
Our service is backed by a committee of compute providers, who run a multi-party computation (MPC) protocol to produce an (identity-based) encrypted permutation of a deck of cards, in an offline phase well ahead of when the players’ identities are known. When the players join, what we call the online phase, they decrypt their designated cards immediately after deriving the identity-based decryption keys, a much simpler computation. In addition, the MPC protocol also generates a publicly-verifiable proof that the output is a permutation.
In our construction, we introduce a new notion of succinctly verifiable multi-identity based encryption (SVME), which extends the existing notion of verifiable encryption to a multi-identity-based setting, but with a constant sized proof – this may be of independent interest. We instantiate this for a permutation relation (defined over a small set) along with identity-based encryption, polynomial commitments and succinct proofs – our choices are made to enable a distributed computation when the card deck is always secret shared. Moreover, we design a new protocol to efficiently generate a secret-sharing of random permutation of a small set, which is run prior to distributed SVME.
Running these protocols offline simplifies the online phase substantially, as parties only derive their identity-specific keys privately via secure channels with the MPC committee, and then decrypt locally to obtain their decks. We provide a rigorous UC-based formalization in a highly modularized fashion.
Finally, we demonstrate practicality with an implementation that shows that for 8 MPC parties, generating a secret publicly-verifiable permutation of 64 cards takes under 3 seconds, while accessing cards for a player takes under 0.3 seconds.
Quirky Interactive Reductions of Knowledge
Interactive proofs and arguments of knowledge can be generalized to the concept of interactive reductions of knowledge, where proving knowledge of a witness for one NP language is reduced to proving knowledge of a witness for another NP language. We take this generalization and specialize it to a class of reductions we refer to as `quirky interactive reductions of knowledge' (or QUIRKs). This name reflects our particular design choices within the broad and diverse world of interactive reduction methods. A central design choice is allowing the prover to rewind or regress to any previous reduction and repeat it as many times as desired. We prove completeness and extractability properties for QUIRKs. We also offer tools for constructing extraction algorithms along with several simple examples of usage.
HEProfiler: An In-Depth Profiler of Approximate Homomorphic Encryption Libraries
Fully Homomorphic Encryption (FHE) allows computation on encrypted data. Various software libraries have implemented the approximate-arithmetic FHE scheme CKKS, which is highly useful for applications in machine learning and data analytics; each of these libraries has differing performance and features. It is useful for developers and researchers to learn details about these libraries’ performance and their differences. Some previous work has profiled FHE and CKKS implementations for this purpose, but these comparisons are limited in their fairness and completeness.
In this article, we compare four major libraries supporting the CKKS scheme. Working with the maintainers of each of the PALISADE, Microsoft SEAL, HElib, and HEAAN libraries, we devise methods for fair comparisons of these libraries, even with their widely varied development strategies and library architectures. To show the practical performance of these libraries, we present HEProfiler, a simple and extensible framework for profiling C++ FHE libraries. Our experimental evaluation is complete in both the scope of tasks tested and metrics evaluated, allowing us to draw conclusions about the behaviors of different libraries under a wide range of real-world workloads. This is the first work giving experimental comparisons of different bootstrapping-capable CKKS libraries.
Natively Compatible Super-Efficient Lookup Arguments and How to Apply Them
Lookup arguments allow an untrusted prover to commit to a vector $\vec f \in \mathbb{F}^n$ and show that its entries reside in a predetermined table $\vec t \in \mathbb{F}^N$. One of their key applications is to augment general-purpose SNARKs making them more efficient on subcomputations that are hard to arithmetize. In order for this "augmentation" to work out, a SNARK and a lookup argument should have some basic level of compatibility with respect to the commitment on $\vec f$. However, not all existing efficient lookup arguments are fully compatible with other efficient general-purpose SNARKs. This incompatibility can for example occur whenever SNARKs use multilinear extensions under the hood (e.g. Spartan) but the lookup argument is univariate in flavor (e.g. Caulk or $\mathsf{cq}$).
In this paper we discuss how to widen the spectrum of "super-efficient" lookup arguments (where the proving time is independent of the size of the lookup table): we present a new construction inspired by $\mathsf{cq}$ and based on multilinear polynomial encodings (MLE). Our construction is the first lookup argument for any table that is also natively compatible with MLE-based SNARKs at comparable costs with other state-of-the-art lookup arguments, particularly when the large table is unstructured. This case arises in various applications, such as using lookups to prove that the program in a virtual machine is fetching the right instruction and when proving the correct computation of floating point arithmetic (e.g., in verifiable machine learning).
We also introduce a second more general construction: a compiler that, given any super-efficient lookup argument compatible with univariate SNARKs, converts it into a lookup argument compatible with MLE-based SNARKs with a very small overhead.
Finally, we discuss SNARKs that we can compose with our constructions as well as approaches for this composition to work effectively.
Password-authenticated Key Exchange and Applications
We analyse two password-authenticated key exchange protocols: a variant of CPace, and a protocol related to the well-known SRP protocol. Our security results are tight. The first result gives us some information about trade-offs for design choices in CPace. The second result provides information about the security of SRP.
Our analysis is done in a new game-based security definition for password-authenticated key exchange. Our definition accommodates arbitrary password sampling methodologies. Our definition also supports modular security analysis, which we illustrate by giving two example applications of password-authenticated key exchange: password-authenticated secure channels and password-authenticated device authorisation, capturing popular applications of passwords.
Shuffle Arguments Based on Subset-Checking
Zero-knowledge shuffle arguments are a useful tool for constructing mix-nets which enable anonymous communication. We propose a new shuffle argument using a novel technique that probabilistically checks that each weighted set of input elements corresponds to some weighted set of output elements, with weights from the same set as the input element weights. We achieve this using standard discrete log assumptions and the shortest integer solution (SIS) assumption. Our shuffle argument has prover and verifier complexity linear in the size of the shuffled set, and communication complexity logarithmic both in the shuffled set size and security parameter.
Enhancing Local Verification: Aggregate and Multi-Signature Schemes
An aggregate signature scheme is a digital signature protocol that enables the aggregation of multiple signatures. Given n signatures on n distinct messages from n different users, it is possible to combine all these signatures into a single, concise signature. This single signature, along with the n original messages, convinces the verifier that the n users indeed signed their respective n original messages. However, the verifier must have access to all the original messages to perform the verification, highlighting a potential limitation in terms of accessibility and efficiency. Goyal and Vaikuntanathan introduced the concept of local verification, which allows the verifier to determine if a specific message m is part of the aggregated signature by only accessing the message m. In this paper, we extend the single-signer locally verifiable aggregate signature scheme initially proposed by Goyal and Vaikuntanathan, adapting it to a multi-signer context. Our generalization allows the verifier to validate multiple signatures simultaneously using an auxiliary value generated by the LocalOpen algorithm, thereby enhancing verification efficiency. Furthermore, we integrate this approach into the multi-signature scheme proposed by Boneh, Drijvers, and Neven, demonstrating its broader applicability and potential benefits in complex cryptographic systems.
Optimized Computation of the Jacobi Symbol
The Jacobi Symbol is an essential primitive in cryptographic applications such as primality testing, integer factorization, and various encryption schemes. By exploring the interdependencies among modular reductions within the algorithmic loop, we have developed a refined method that significantly enhances computational efficiency. Our optimized algorithm, implemented in the Rust language, achieves a performance increase of 72% over conventional textbook methods and is twice as fast as the previously fastest known Rust implementation.
This work not only provides a detailed analysis of the optimizations but also includes comprehensive benchmark comparisons to illustrate the practical advantages of our methods. Our algorithm is publicly available under an open-source license, promoting further research on foundational cryptographic optimizations.
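For reference, the classic binary algorithm that such work starts from (this is only the textbook baseline, not the paper's optimized variant):

    # Textbook Jacobi symbol via the law of quadratic reciprocity and the
    # supplementary rule for 2; n must be an odd positive integer.
    def jacobi(a, n):
        assert n > 0 and n % 2 == 1
        a %= n
        result = 1
        while a != 0:
            while a % 2 == 0:           # strip factors of 2
                a //= 2
                if n % 8 in (3, 5):     # (2/n) = -1 iff n = 3 or 5 (mod 8)
                    result = -result
            a, n = n, a                 # reciprocity: flip sign iff both = 3 (mod 4)
            if a % 4 == 3 and n % 4 == 3:
                result = -result
            a %= n
        return result if n == 1 else 0  # 0 when gcd(a, n) > 1

    assert jacobi(1001, 9907) == -1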
Stochastic Secret Sharing with $1$-Bit Shares and Applications to MPC
The problem of minimizing the share size of threshold secret-sharing schemes is a basic research question that has been extensively studied. Ideally, one strives for schemes in which the share size equals the secret size. While this is achievable for large secrets (Shamir, CACM '79), no similar solutions are known for the case of binary, single-bit secrets. Current approaches often rely on so-called ramp secret sharing that achieves a constant share size at the expense of a slight gap between the privacy and the correctness thresholds. In the case of single-bit shares, this leads to a large gap which is typically unacceptable. The possibility of a meaningful notion of secret sharing scheme with 1-bit shares and almost optimal threshold has been left wide open. Of special interest is the case of threshold 0.5, which is motivated by information-theoretic honest-majority secure multiparty computation (MPC).
In this work, we present a new stochastic model for secret-sharing where each party is corrupted by the adversary with probability $p$, independently of the other parties, and correctness and privacy are required to hold with high probability over the choice of the corrupt parties. We present new secret sharing schemes with single-bit shares that tolerate any constant corruption probability $p<0.5$. Our construction is based on a novel connection between such stochastic secret-sharing schemes and error-correcting codes that achieve capacity over the binary erasure channel.
Our schemes are linear and multiplicative. We demonstrate the usefulness of the model by using our new schemes to construct MPC protocols with security against an adversary that passively corrupts an arbitrary subset of $0.499n$ of the parties, where the online communication per party consists of a single bit per AND gate and zero communication per XOR gate. Unlike competing approaches for communication-efficient MPC, our solution is applicable even in a real-time model in which the parties should compute a Boolean circuit whose gates arrive in real-time, one at a time, and are not known in advance.
A New Fine Tuning Method for FHEW/TFHE Bootstrapping with IND-CPAD Security
Fully homomorphic encryption (FHE) schemes enable computations on encrypted data, making them a crucial component of privacy-enhancing technologies. Ducas and Micciancio introduced the FHEW scheme (Eurocrypt '15), which was further enhanced by Chillotti et al. with TFHE (Asiacrypt '17). These schemes support low-latency homomorphic evaluations of binary (or larger) gates due to their small parameter size. However, the evaluation failure probability in these schemes is highly sensitive to the choice of parameters, resulting in a limited range of viable parameters and a trade-off between failure probability and runtime.
Recently, Cheon et al. proposed a key recovery attack on the FHEW/TFHE schemes based on a novel security model for FHE, known as IND-CPA$^\text{D}$ security (CCS '24). Mitigating this attack requires achieving a negligible failure probability (e.g., $2^{-64}$). However, the limited range of parameter options in FHEW/TFHE necessitates the adoption of parameter sets with unnecessarily low failure probabilities, leading to inefficient runtime.
We propose a new bootstrapping method for the FHEW/TFHE schemes that optimizes the trade-off between runtime and failure probability while maintaining ease of implementation. The proposed method allows selecting parameter sets that achieve the desired failure probabilities at various security levels, thereby maximizing runtime efficiency.
Adaptor Signatures: New Security Definition and A Generic Construction for NP Relations
An adaptor signature (AS) scheme is an extension of digital signatures that allows the signer to generate a pre-signature for an instance of a hard relation. This pre-signature can later be adapted to a full signature with a corresponding witness. Meanwhile, the signer can extract a witness from both the pre-signature and the signature. AS have recently garnered more attention due to their scalability and interoperability. Dai et al. [INDOCRYPT 2022] proved that AS can be constructed for any NP relation using a generic construction. However, their construction has a shortcoming: the associated witness is exposed by the adapted signature. This flaw limits the applications of AS, even in its motivating setting, i.e., blockchain, where the adapted signature is typically uploaded to the blockchain and is public to everyone.
To address this issue, in this work we augment the security definition of AS by a natural property which we call witness hiding. We then prove the existence of AS for any NP relation, assuming the existence of one-way functions. Concretely, we propose a generic construction of witness-hiding AS from signatures and a weak variant of trapdoor commitments, which we term trapdoor commitments with a specific adaptable message. We instantiate the latter based on the Hamiltonian cycle problem. Since the Hamiltonian cycle problem is NP-complete, we can obtain witness hiding adaptor signatures for any NP relation.
On Sequential Functions and Fine-Grained Cryptography
A sequential function is, informally speaking, a function $f$ for which a massively parallel adversary cannot compute "substantially" faster than an honest user with limited parallel computation power. Sequential functions form the backbone of many primitives that are extensively used in blockchains such as verifiable delay functions (VDFs) and time-lock puzzles. Despite this widespread practical use, there has been little work studying the complexity or theory of sequential functions.
Our main result is a black-box oracle separation between sequential functions and one-way functions: in particular, we show the existence of an oracle $\mathcal{O}$ that implies a sequential function but not a one-way function. This seems surprising since sequential functions are typically constructed from very strong assumptions that imply one-way functions and also since time-lock puzzles are known to imply one-way functions (Bitansky et al., ITCS '16).
We continue our exploration of the theory of sequential functions. We show that, informally speaking, the decisional, worst-case variant of a certain class of sequential function called a continuous iterative sequential function (CISF) is PSPACE-complete. A CISF is, in a nutshell, a sequential function $f$ that can be written in the form $f \left(k, x \right) = g^{k} \left(x \right)$ for some function $g$ where $k$ is an input determining the number of "rounds" the function is evaluated. We then show that more general forms of sequential functions are not contained in PSPACE relative to a random oracle.
Given these results, we then ask if it is possible to build any interesting cryptographic primitives from sequential functions that are not one-way. It turns out that even if we assume just the existence of a CISF that is not one-way, we can build certain "fine-grained" cryptographic primitives where security is defined similarly to traditional primitives with the exception that it is only guaranteed for some (generally polynomial) amount of time. In particular, we show how to build "fine-grained" symmetric key encryption and "fine-grained" MACs from a CISF. We also show how to build fine-grained public-key encryption from a VDF with a few extra natural properties and indistinguishability obfuscation (iO) for null circuits. We do not assume one-way functions. Finally, we define a primitive that we call a commutative sequential function - essentially a sequential function that can be computed in sequence to get the same output in two different ways - and show that it implies fine-grained key exchange.
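The CISF shape $f(k, x) = g^{k}(x)$ is easy to make concrete; the canonical candidate, familiar from time-lock puzzles, is iterated squaring modulo an RSA modulus. That this $g$ is actually sequential is a conjecture, not a theorem, and the tiny modulus below is purely illustrative.

    # f(k, x) = g^k(x) with g(x) = x^2 mod N: k inherently sequential steps,
    # assuming iterated squaring admits no parallel shortcut.
    def f(k, x, N):
        for _ in range(k):
            x = pow(x, 2, N)
        return x

    N = 101 * 103   # toy composite; real instances use a ~2048-bit RSA modulus
    print(f(10, 5, N))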
KyberSlash: Exploiting secret-dependent division timings in Kyber implementations
This paper presents KyberSlash1 and KyberSlash2 - two timing vulnerabilities in several implementations (including the official reference code) of the Kyber Post-Quantum Key Encapsulation Mechanism, recently standardized as ML-KEM. We demonstrate the exploitability of both KyberSlash1 and KyberSlash2 on two popular platforms: the Raspberry Pi 2 (Arm Cortex-A7) and the Arm Cortex-M4 microprocessor. Kyber secret keys are reliably recovered within minutes for KyberSlash2 and a few hours for KyberSlash1. We responsibly disclosed these vulnerabilities to maintainers of various libraries and they have swiftly been patched. We present two approaches for detecting and avoiding similar vulnerabilities. First, we patch the dynamic analysis tool Valgrind to allow detection of variable-time instructions operating on secret data, and apply it to more than 1000 implementations of cryptographic primitives in SUPERCOP. We report multiple findings. Second, we propose a more rigid approach to guarantee the absence of variable-time instructions in cryptographic software using formal methods.
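The root cause was an expression of the shape (((t << 1) + q/2) / q) & 1 during decryption, where the division operand depends on secret data and compiles to variable-time division instructions on many CPUs. The published fix replaces the division by a multiply-and-shift; the check below (constants taken from that public patch for q = 3329) simply confirms the two expressions agree on every coefficient value.

    q = 3329
    for t in range(q):
        leaky   = (((t << 1) + q // 2) // q) & 1            # variable-time division
        patched = ((((t << 1) + 1665) * 80635) >> 28) & 1   # constant-time multiply-shift
        assert leaky == patched
    print("multiply-and-shift matches division on all", q, "inputs")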
Distributional Secure Merge
Secure merge refers to the problem of merging two sorted lists. The problem appears in different settings where each list is held by one of two parties, or the lists are themselves shared among two or more parties. The output of a secure merge protocol is secret shared. Each variant of the problem offers many useful applications.
The difficulty in designing secure merge protocols vis-a-vis insecure merge protocols (which work in linear time with a single pass over the lists) has to do with operations having to be oblivious or data-independent. In particular, the protocol cannot leak the positions of items of each list in the final merged list. On account of this, sorting-based secure merge protocols have been a common solution to the problem. However, as they introduce (poly)logarithmic overheads, there has been active investigation into the task of building (near) linear time secure merge protocols. Most recently, Hemenway et al. put forth a protocol for secure merge that does achieve linear communication and computation and a round complexity of $O({\log\log n})$, where $n$ is the length of the lists being merged. While this shows the feasibility of a linear time secure merge, it still leaves room for the design of a concretely efficient linear time secure merge.
In this work, we consider a relaxation of the problem where the lists are uniformly random. We show a secure merge protocol for uniformly random lists that achieves $O({n\log\log n})$, i.e., near linear communication and computation and a round complexity of $O({\log\log n})$, where $n$ is the length of the lists being merged. Our protocol design is general and can be instantiated in a variety of settings so long as the building blocks (basic ones such as comparisons and shuffles) can be realized in said settings. Although we do not achieve the same asymptotic guarantees as Hemenway et al., our work is concretely efficient. We implement our protocol and compare it to state-of-the-art sorting protocols and demonstrate an order of magnitude improvement in running times and communication for lists of size $2^{20}$.
We also extend our protocol to work for lists sampled from arbitrary distributions. In particular, when the lists are (close to) identically distributed, we achieve the same efficiency as for uniform lists. This immediately improves the performance of many crucial applications, including PSI and secure join, thus illustrating the significance and applicability of our protocol in practice.
Improved Multi-Party Fixed-Point Multiplication
Machine learning is widely used for a range of applications and is increasingly offered as a service by major technology companies. However, the required massive data collection raises privacy concerns during both training and inference. Privacy-preserving machine learning aims to solve this problem. In this setting, a collection of servers secret share their data and use secure multi-party computation to train and evaluate models on the joint data. All prior work focused on the scenario where the number of servers is two or three. In this work, we study the problem where there are $N \geq 3$ servers amongst whom the data is secret shared.
A key component of machine learning algorithms is to perform fixed-point multiplication with truncation of secret shared decimal values. In this work, we design new protocols for multi-party secure fixed-point multiplication where each of the $N$ parties holds one share of each of the two values to be multiplied and receives one share of the product at the end of the protocol. We consider three forms of secret sharing - replicated, Shamir, and additive - and design an efficient protocol secure in the presence of a semi-honest adversary for each of the forms. Our protocols are more communication efficient than all prior work on performing multi-party fixed-point multiplication. Additionally, for replicated secret sharing, we design another efficient protocol that is secure in the presence of a malicious adversary. Finally, we leverage our fixed-point multiplication protocols to design secure multi-party computation (MPC) protocols for arbitrary arithmetic circuits that have addition and fixed-point multiplication with truncation gates. All our protocols are proven secure using a standard simulation-based security definition. Our protocols for replicated and Shamir sharing work in the presence of an honest majority of parties while the one for additive sharing can tolerate a dishonest majority as well.
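For context, the two-party baseline such protocols improve on looks as follows: values are encoded as round(v * 2^F), a product then carries 2F fractional bits, and each party truncates its additive share locally (the classic SecureML-style trick, correct up to a +/-1 error with high probability). This sketch is not the paper's N-party protocol.

    import random

    K, F = 64, 16
    MOD = 1 << K

    def share(x):                     # additive shares of x mod 2^K
        s0 = random.randrange(MOD)
        return s0, (x - s0) % MOD

    def reveal(s0, s1):               # reconstruct and lift to a signed value
        v = (s0 + s1) % MOD
        return v - MOD if v >= MOD // 2 else v

    def trunc_shares(s0, s1):         # local truncation by F bits, error <= 1 w.h.p.
        return (s0 >> F) % MOD, (MOD - ((MOD - s1) >> F)) % MOD

    x, y = 3.25, -1.5
    prod = round(x * (1 << F)) * round(y * (1 << F))   # carries 2F fractional bits
    s0, s1 = share(prod)
    t0, t1 = trunc_shares(s0, s1)
    print(reveal(t0, t1) / (1 << F))  # ~ -4.875, up to ~2^-F truncation error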
The Sum-Check Protocol over Fields of Small Characteristic
The sum-check protocol of Lund, Fortnow, Karloff, and Nisan underlies SNARKs with the fastest known prover. In many of its applications, the prover can be implemented with a number of field operations that is linear in the number, $n$, of terms being summed.
We describe an optimized prover implementation when the protocol is applied over an extension field of a much smaller base field. The rough idea is to keep most of the prover's multiplications over the base field, at the cost of performing more $\textit{total}$ field multiplications. When the sum-check protocol is applied to a product of polynomials that all output values in the base field, our algorithm reduces the number of extension field operations by multiple orders of magnitude. In other settings, our improvements are more modest but nonetheless meaningful.
In SNARK design, the sum-check protocol is often combined with a polynomial commitment scheme, and such schemes are becoming faster, especially when the values being committed are small. These improved commitment schemes are likely to render the sum-check prover the overall bottleneck, which our results help to mitigate.
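For readers new to the protocol, below is the textbook linear-time prover for $\sum_{x \in \{0,1\}^n} A(x) \cdot B(x)$ over a prime field, with the verifier's checks inlined. It contains none of the paper's base-field optimizations and only fixes notation for them.

    import random

    P = 2**61 - 1   # prime field modulus

    def round_poly(A, B):
        # evaluations s(0), s(1), s(2) of this round's degree-2 polynomial
        h = len(A) // 2
        s0 = sum(A[i] * B[i] for i in range(h)) % P
        s1 = sum(A[h + i] * B[h + i] for i in range(h)) % P
        s2 = sum((2 * A[h + i] - A[i]) * (2 * B[h + i] - B[i]) for i in range(h)) % P
        return s0, s1, s2

    def fold(T, r):
        # fix the current variable of the multilinear table T to r
        h = len(T) // 2
        return [(T[i] + r * (T[h + i] - T[i])) % P for i in range(h)]

    def eval_at(s0, s1, s2, r):
        # evaluate the degree-2 round polynomial at r (Lagrange through 0, 1, 2)
        inv2 = pow(2, P - 2, P)
        return (s0 * (r - 1) * (r - 2) * inv2 - s1 * r * (r - 2) + s2 * r * (r - 1) * inv2) % P

    n = 4
    A = [random.randrange(P) for _ in range(1 << n)]
    B = [random.randrange(P) for _ in range(1 << n)]
    claim = sum(a * b for a, b in zip(A, B)) % P

    for _ in range(n):
        s0, s1, s2 = round_poly(A, B)
        assert (s0 + s1) % P == claim        # verifier's consistency check
        r = random.randrange(P)              # verifier's challenge
        claim = eval_at(s0, s1, s2, r)
        A, B = fold(A, r), fold(B, r)

    assert claim == A[0] * B[0] % P          # final oracle check
    print("sum-check verified")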
Efficient Secret Sharing for Large-Scale Applications
Threshold secret sharing enables distributing a message to $n$ parties such that no subset of fewer than $t$ parties can learn the message, whereas any subset of at least $t$ parties can recover the message. Despite being a fundamental primitive, secret sharing still suffers from one significant drawback: its message reconstruction algorithm is computationally expensive for large privacy thresholds $t$. In this paper, we aim to address this significant drawback.
We study general $(t,c)$-ramp secret sharing schemes where the number of parties $c$ needed to reconstruct the secret may be larger than $t$. We present a ramp secret sharing scheme whose reconstruction time is 2-7.8x faster than prior constructions secure against adversaries that adaptively corrupt parties. For $t = 2^{20}$, our new protocol has a reconstruction time of 5 seconds, whereas prior work requires nearly half a minute. We see improvements starting from as small as $t = 256$. Furthermore, we obtain a correctness threshold as small as $c \ge 1.05t$. To obtain our construction, we first improve the secret sharing frameworks by Cramer et al. (EUROCRYPT'15) and Applebaum et al. (CRYPTO'23) from erasure codes. Our new framework obtains secret sharing schemes that may be used against adversaries with adaptive corruptions while requiring only weaker correctness guarantees from the underlying erasure code with a distributed generation property. Furthermore, our new framework also maintains the linear homomorphism of the prior works. Afterwards, we present a concretely efficient erasure code from random band matrices that satisfies the distributed generation property.
We show that our secret sharing scheme can improve many real-world applications. In secure aggregation protocols for federated learning, we obtain up to 22% reductions in computational cost by replacing Shamir's scheme with our construction. We extend our protocol to obtain a verifiable ramp secret sharing scheme where each party can verify the consistency of the shares. Our new verifiable ramp secret sharing has 8.2-25.2x faster sharing and 2.7-23.2x faster reconstruction time compared to prior works. Finally, we present an improved distributed verifiable random function that may be used for decentralized randomness beacons.
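For contrast, the Shamir baseline being sped up is sketched below: sharing is polynomial evaluation, and reconstruction is Lagrange interpolation, whose cost (quadratic as written, and still superlinear with FFT tricks) is what makes very large $t$ expensive.

    import random

    P = 2**127 - 1   # Mersenne prime field

    def share(secret, t, n):
        # random degree-(t-1) polynomial with f(0) = secret; shares are (i, f(i))
        coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
        return [(i, sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P)
                for i in range(1, n + 1)]

    def reconstruct(shares):
        # Lagrange interpolation at 0 from any t shares
        secret = 0
        for a, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for b, (xj, _) in enumerate(shares):
                if a != b:
                    num = num * (-xj) % P
                    den = den * (xi - xj) % P
            secret = (secret + yi * num * pow(den, P - 2, P)) % P
        return secret

    shares = share(123456789, t=3, n=5)
    assert reconstruct(shares[:3]) == 123456789
    assert reconstruct(shares[2:]) == 123456789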
Searching for differential addition chains
The literature sometimes uses slow algorithms to find minimum-length continued-fraction differential addition chains to speed up subsequent computations of multiples of points on elliptic curves. This paper introduces two faster algorithms to find these chains. The first algorithm prunes more effectively than previous algorithms. The second algorithm uses a meet-in-the-middle approach and appears to have a limiting cost exponent below 1.
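The simplest differential addition chain is the binary (Montgomery) ladder: every addition combines two chain elements whose difference is already in the chain (here always 1), which is what x-coordinate-only elliptic-curve formulas require. The paper searches for continued-fraction chains shorter than this baseline.

    # Binary-ladder differential addition chain for n: maintain (a, b) with
    # b - a == 1, so each sum a + b has its difference already in the chain.
    def ladder_chain(n):
        a, b = 1, 2
        chain = [1, 2]
        for bit in bin(n)[3:]:          # bits of n after the leading 1
            if bit == '1':
                a, b = a + b, 2 * b
            else:
                a, b = 2 * a, a + b
            chain += [a, b]
        assert a == n
        return chain

    print(ladder_chain(25))  # [1, 2, 3, 4, 6, 7, 12, 13, 25, 26]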
Cryptography in the Common Haar State Model: Feasibility Results and Separations
The common random string model is a popular model in classical cryptography. We study a quantum analogue of this model called the common Haar state (CHS) model. In this model, every party participating in the cryptographic system receives many copies of one or more i.i.d. Haar random states.
We study feasibility and limitations of cryptographic primitives in this model and its variants:
- We present a construction of pseudorandom function-like states with security against computationally unbounded adversaries, as long as the adversaries only receive an (a priori) bounded number of copies. By suitably instantiating the CHS model, we obtain a new approach to construct pseudorandom function-like states in the plain model.
- We present separations between pseudorandom function-like states (with super-logarithmic length) and quantum cryptographic primitives, such as interactive key agreement and bit commitment, with classical communication. To show these separations, we prove new results on the indistinguishability of identical versus independent Haar states against LOCC (local operations, classical communication) adversaries.
Efficient Verifiable Differential Privacy with Input Authenticity in the Local and Shuffle Model
Local differential privacy (LDP) enables the efficient release of aggregate statistics without having to trust the central server (aggregator), as in the central model of differential privacy, and simultaneously protects a client's sensitive data. The shuffle model with LDP provides an additional layer of privacy, by disconnecting the link between clients and the aggregator. However, LDP has been shown to be vulnerable to malicious clients who can perform both input and output manipulation attacks, i.e., before and after applying the LDP mechanism, to skew the aggregator's results. In this work, we show how to prevent malicious clients from compromising LDP schemes. Our only realistic assumption is that the initial raw input is authenticated; the rest of the processing pipeline, e.g., formatting the input and applying the LDP mechanism, may be under adversarial control. We give several real-world examples where this assumption is justified. Our proposed schemes for verifiable LDP (VLDP) prevent both input and output manipulation attacks against generic LDP mechanisms, requiring only one-time interaction between client and server, unlike existing alternatives [37, 43]. Most importantly, we are the first to provide an efficient scheme for VLDP in the shuffle model. We describe, and prove security of, two schemes for VLDP in the local model, and one in the shuffle model. We show that all schemes are highly practical, with client run times of less than 2 seconds, and server run times of 5-7 milliseconds per client.
Embedding Integer Lattices as Ideals into Polynomial Rings
Many lattice-based cryptosystems employ ideal lattices for high efficiency. However, the additional algebraic structure of ideal lattices is a persistent source of security concerns, and it is widely believed that this structure can be exploited to solve the hard problems in ideal lattices more efficiently. In this paper, we study the additional algebraic structure of ideal lattices further and find that a given ideal lattice in a polynomial ring can be embedded as an ideal into infinitely many different polynomial rings by the coefficient embedding. We design an algorithm to verify whether a given full-rank lattice in $\mathbb{Z}^n$ is an ideal lattice and output all the polynomial rings that the given lattice can be embedded into as an ideal, using $\mathcal{O}(n^3(\log n + B)^2(\log n)^2)$ bit operations, where $n$ is the dimension of the lattice and $B$ is the upper bound of the bit length of the entries of the input lattice basis. We would like to point out that in 2007 Ding and Lindner proposed an algorithm for identifying ideal lattices and outputting a single polynomial ring of which the input lattice can be regarded as an ideal, using $\mathcal{O}(n^5B^2)$ bit operations. However, we find a flaw in Ding and Lindner's algorithm that causes some ideal lattices not to be identified by their algorithm.
PeaceFounder: centralised E2E verifiable evoting via pseudonym braiding and history trees
PeaceFounder is a centralised E2E verifiable e-voting system that leverages pseudonym braiding and history trees. The immutability of the bulletin board is maintained replication-free by voters' client devices with locally stored consistency-proof chains. Meanwhile, pseudonym braiding done via an exponentiation mix before the vote allows anonymisation to be transactional with a single braider at a time. In contrast to existing E2E verifiable e-voting systems, it is much easier to deploy as the system is fully centralised, free from threshold decryption ceremonies, trusted setup phases and bulletin board replication. Furthermore, the body of a vote is signed with a braided pseudonym, enabling unlimited ballot types.
Reduction from Average-Case M-ISIS to Worst-Case CVP Over Perfect Lattices
This paper presents a novel reduction from the average-case hardness of the Module Inhomogeneous Short Integer Solution (M-ISIS) problem to the worst-case hardness of the Closest Vector Problem (CVP) by defining and leveraging “perfect” lattices for cryptographic purposes.
Perfect lattices, previously only theoretical constructs, are characterized by their highly regular structure, optimal density, and a central void, which we term the “Origin Cell.” The simplest Origin Cell is a hypercube with edge length 1 centered at the origin, guaranteed to be devoid of any valid lattice points.
By exploiting the unique properties of the Origin Cell, we recalibrate the parameters of the M-ISIS and CVP problems. Our results demonstrate that solving M-ISIS on average over perfect lattices is at least as hard as solving CVP in the worst case, thereby providing a robust hardness guarantee for M-ISIS. Additionally, perfect lattices facilitate exceptionally compact cryptographic variables, enhancing the efficiency of cryptographic schemes.
This significant finding enhances the theoretical foundation of lattice-based cryptographic problems and confirms the potential of perfect lattices in ensuring strong cryptographic security. The Appendix includes SageMath code to demonstrate the reproducibility of the reduction process from M-ISIS to CVP.
Constraint-Packing and the Sum-Check Protocol over Binary Tower Fields
SNARKs based on the sum-check protocol often invoke the ``zero-check PIOP''. This reduces the vanishing of many constraints to a single sum-check instance applied to an $n$-variate polynomial of the form $g(x) = \text{eq}(r,x) \cdot p(x)$, where $p$ is a product of multilinear polynomials, $r$ is a random vector, and $\text{eq}$ is the multilinear extension of the equality function. In recent SNARK designs, $p(x)$ is defined over a ``small'' base field, while $r$ is drawn from a large extension field $\mathbb{F}$ for security.
Recent papers (Bagad, Domb, and Thaler 2024; Gruen 2024) have optimized the sum-check protocol prover for this setting. However, these works still require the prover to ``pre-compute'' all evaluations of $\text{eq}(r, x)$ as $x$ ranges over $\{0, 1\}^{n}$, and this computation involves about $2^n$ multiplications over the extension field $\mathbb{F}$.
In this note, we describe a modification to the zero-check PIOP in the case of binary tower fields that reduces this pre-computation cost by a factor of close to $\log |\mathbb{F}|$, which is $128$ in important applications. We show that our modification is sound, and that it strictly generalizes a (possibly folklore) technique of constraint-packing over field extensions.
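The pre-computation in question builds the full table of $\text{eq}(r, x)$ one variable at a time, roughly doubling the table per variable at a cost of about $2^n$ field multiplications overall. The sketch below uses a prime field for brevity, whereas the note's setting is binary tower fields.

    P = 2**61 - 1

    def eq_table(r):
        # after processing variable i, T holds eq over the first i variables;
        # the first processed variable is the least-significant index bit
        T = [1]
        for ri in r:
            T = [t * (1 - ri) % P for t in T] + [t * ri % P for t in T]
        return T

    r = [3, 1415, 9265]              # toy challenge vector, n = 3
    T = eq_table(r)
    # spot-check against eq(r, x) = prod_i (r_i x_i + (1 - r_i)(1 - x_i))
    x = (1, 0, 1)
    idx = x[0] + 2 * x[1] + 4 * x[2]
    expected = 1
    for ri, xi in zip(r, x):
        expected = expected * ((ri * xi + (1 - ri) * (1 - xi)) % P) % P
    assert T[idx] == expected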
A note on adding zero-knowledge to STARKs
We discuss zero-knowledge in the context of univariate argument systems which use the FRI proximity test for Reed-Solomon codes as polynomial commitment scheme.
We confine ourselves to small-field STARKs, i.e., arguments with an arithmetization over a small finite field (the base field), and we dwell on two techniques widely used in practice: randomization by polynomials over the base field, and decomposing the overall quotient into polynomials of smaller degree. In particular, the latter is a source of mistakes, both in the literature as well as in software implementations.
The current, updated version further includes a separate discussion on perfect zero-knowledge in permutation arguments.
A note on the G-FFT
For primes $p$ with $p+1$ being smooth, the G-FFT from Li and Xing [LX23] is an algebraic FFT, which at first glance seems equivalent to the circle FFT from [IACR eprint 2024/278]: It also uses the circle curve over $\mathbb F_p$ (in other words the projective line) as underlying domain, and interpolates by low-degree functions with poles over the same set of points. However, their approach to control the degree of the FFT basis is fundamentally different.
The G-FFT makes use of punctured Riemann-Roch spaces, and the construction works with the group doubling map only, no projection onto the $x$-axis involved.
In this note we give an elementary description of the G-FFT without using abstract algebra. We describe a variant which uses a simpler, and in our opinion more natural, function space, and which treats the exceptional point of the domain (the group identity) differently. In comparison to the circle FFT, the G-FFT (both the original as well as our variant) has the following downsides: interpolation and domain evaluation cost twice the number of multiplications (the twiddle is not an ``odd'' function), and the function space is not invariant under the group action, causing additional overhead when applied in STARKs.
Reading It like an Open Book: Single-trace Blind Side-channel Attacks on Garbled Circuit Frameworks
Garbled circuits (GC) are a secure multiparty computation protocol that enables two parties to jointly compute a function using their private data without revealing it to each other. While garbled circuits are proven secure at the protocol level, implementations can still be vulnerable to side-channel attacks. Recently, side-channel analysis of GC implementations has garnered significant interest from researchers.
We investigate popular open-source GC frameworks and discover that the AES encryption used in the garbling process follows a secret-dependent sequence. This vulnerability allows private inputs to be exposed through side-channel analysis. Based on this finding, we propose a side-channel attack on garbled circuits to recover the private inputs of both parties. Our attack does not require access to any plaintexts or ciphertexts in the protocol and is single-trace, adhering to the constraint that a garbled circuit can be executed only once. Furthermore, unlike existing attacks that can only target input non-XOR gates, our method applies to both input and internal non-XOR gates. Consequently, the secrets associated with every non-XOR gate are fully exposed as in an open book.
We comprehensively evaluate our attack in various scenarios. First, we perform the attack on single-platform software implementations of standard AES and interleaved AES on a 32-bit ARM processor, achieving a $100\%$ success rate in both cases. Next, we target a hardware implementation on a Xilinx Artix-7 FPGA, where the resolution of power consumption measurements and the number of samples are significantly limited. In this scenario, our attack achieves a success rate of $79.58\%$. Finally, we perform a cross-platform attack on two processors with different microarchitectures representing the two parties. The differing execution cycles and power sensors across the platforms increase the difficulty of side-channel analysis. Despite these challenges, our point-of-interest (POI) selection method allows our attack to achieve a $100\%$ success rate in this scenario as well. We also discuss effective countermeasures that can be readily applied to GC frameworks to mitigate this vulnerability.
A Practical Protocol for Quantum Oblivious Transfer from One-Way Functions
We present a new simulation-secure quantum oblivious transfer (QOT) protocol based on one-way functions in the plain model. With a focus on practical implementation, our protocol surpasses prior works in efficiency, promising feasible experimental realization. We address potential experimental errors and their correction, offering analytical expressions to facilitate the analysis of the required quantum resources. Technically, we achieve simulation security for QOT through an equivocal and relaxed-extractable quantum bit commitment.
Adaptively Secure 5 Round Threshold Signatures from MLWE/MSIS and DL with Rewinding
T-out-of-N threshold signatures have recently seen renewed interest, with various types now available, each offering different tradeoffs.
However, one property that has remained elusive is adaptive security. When we target thresholdizing existing efficient signature schemes based on the Fiat-Shamir paradigm, such as Schnorr, the elusive nature becomes clear. This class of signature schemes typically relies on the forking lemma to prove unforgeability: an adversary is rewound and run twice within the security game. Such a proof is at odds with adaptive security, as the reduction must be ready to reveal 2(T-1) secret key shares in total, implying that it can reconstruct the full secret key. Indeed, prior works either assumed strong idealized models such as the algebraic group model (AGM) or modified the underlying signature scheme so as not to rely on rewinding-based proofs.
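To spell out the arithmetic behind this obstacle (our elaboration): the forking reduction runs the adversary twice, and in each run an adaptive adversary may corrupt up to T-1 parties of its choice, with the two corruption sets possibly disjoint. The reduction must therefore be prepared to reveal up to 2(T-1) consistent secret key shares; since 2(T-1) >= T whenever T >= 2, and any T shares of a T-out-of-N sharing determine the secret (e.g., $sk = \sum_{i \in S} \lambda_i^S \cdot sk_i$ for any set $S$ of size $T$ by Lagrange interpolation), such a reduction could already reconstruct the full key on its own.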
In this work, we propose a new proof technique to construct adaptively secure threshold signatures for existing rewinding-based Fiat-Shamir signatures. As a result, we obtain the following:
1. The first adaptively secure 5 round lattice-based threshold signature under the MLWE and MSIS assumptions in the ROM. The resulting signature is a standard signature of Raccoon, a lattice-based signature scheme by del Pino et al., submitted to the additional NIST call for proposals.
2. The first adaptively secure 5 round threshold signature under the DL assumption in the ROM. The resulting signature is a standard Schnorr signature. To the best of our knowledge, this is the first adaptively secure threshold signature based on DL even assuming stronger models like AGM.
Our work is inspired by the recent statically secure lattice-based 3 round threshold signature by del Pino et al. (Eurocrypt 2024) based on Raccoon. While they relied on so-called one-time additive masks to solve lattice-specific issues, we notice that these masks can also be a useful tool to achieve adaptive security. At a very high level, we use these masks throughout the signing protocol to carefully control the information the adversary can learn from the signing transcripts. Intuitively, this allows the reduction to return a total of 2(T-1) randomly sampled secret key shares to the adversary consistently and without being detected, resolving the above paradoxical situation. Lastly, by allowing the parties to maintain a simple state, we can compress our 5 round schemes into 4 rounds.
Threshold OPRF from Threshold Additive HE
An oblivious pseudorandom function (OPRF) is a two-party protocol in which one party holds an input and the other party holds the PRF key, such that the input holder learns only the PRF output and the key holder learns nothing about the input. In a threshold oblivious pseudorandom function (TOPRF) protocol, a PRF key K is initially shared among T servers. A client can obtain a PRF value by interacting with $t$ $(\leq T)$ servers but is unable to compute the same with up to $(t-1)$ servers. In this paper, we present a practically efficient homomorphic encryption (HE)-based post-quantum secure TOPRF protocol. Our proposed approach, which is based on a novel use of threshold HE, is agnostic of the underlying PRF and outperforms existing fully homomorphic encryption (FHE)-based approaches for TOPRF computation by several orders of magnitude in terms of running time. The FHE-based approaches require bootstrapping, a computationally expensive operation and the primary bottleneck for evaluating large-depth circuits. Our approach instead relies on a multi-party computation (MPC) protocol that uses a threshold additive HE scheme based on Regev's cryptosystem (J. ACM 2009) as an alternative to FHE. Concretely, we show a novel replacement of the bootstrapping required in traditional FHE schemes by a threshold additive HE-based interactive protocol that performs masked decryption followed by table look-ups, jointly performed by a group of servers holding secret shares of the HE decryption key. Finally, we present a practical validation of our approach by realizing an AES-based TOPRF with an evaluation time of less than 1 second on consumer-grade server(s).
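To give a feel for the threshold decryption underlying such a protocol, here is a toy, insecure Python sketch of additively shared decryption for a Regev-style ciphertext (parameters, names, and the server split are ours for illustration; the paper's masking and table look-up layers are omitted):

    import random

    q, n = 2**16, 8            # toy parameters, far too small to be secure
    half = q // 2

    def keygen_shares(servers):
        """Sample a Regev secret key s and split it additively mod q."""
        s = [random.randrange(q) for _ in range(n)]
        shares = [[random.randrange(q) for _ in range(n)] for _ in range(servers - 1)]
        shares.append([(s[j] - sum(sh[j] for sh in shares)) % q for j in range(n)])
        return s, shares

    def encrypt(s, m):
        """Encrypt a bit m as (a, b) with b = <a,s> + e + m*floor(q/2)."""
        a = [random.randrange(q) for _ in range(n)]
        e = random.randrange(-4, 5)                     # small noise
        b = (sum(x * y for x, y in zip(a, s)) + e + m * half) % q
        return a, b

    def partial_decrypt(share, a):
        """A server reveals only <a, s_i>, never its key share s_i."""
        return sum(x * y for x, y in zip(a, share)) % q

    s, shares = keygen_shares(3)
    a, b = encrypt(s, 1)
    d = sum(partial_decrypt(sh, a) for sh in shares) % q
    noisy = (b - d) % q                                 # = e + m*floor(q/2) mod q
    m = 1 if half // 2 <= noisy < 3 * half // 2 else 0
    assert m == 1

The paper's protocol additionally masks this decryption and feeds the result into table look-ups; the sketch above shows only the threshold-decryption skeleton.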
SACfe: Secure Access Control in Functional Encryption with Unbounded Data
Privacy is a major concern in large-scale digital applications such as cloud computing, machine learning services, and access control. Users want to protect not only their plain data but also their associated attributes (e.g., age, location, etc.). Functional encryption (FE) is a cryptographic tool that allows fine-grained access control over encrypted data. However, existing FE schemes fall short: they are either inefficient and far from practical, or they leak sensitive user-specific information.
We propose SACfe, a novel attribute-based FE scheme that provides secure, fine-grained access control and hides both the user's attributes and the function applied to the data, while preserving the data's confidentiality. Moreover, it enables users to encrypt unbounded-length messages along with an arbitrary number of hidden attributes into ciphertexts. Concretely, SACfe is a protocol for performing linear computations on encrypted data while enforcing access control based on inner-product predicates. We show how SACfe can be used for online biometric authentication for privacy-preserving access control. As an additional contribution, we introduce an attribute-based linear FE for unbounded lengths of messages and functions where access control is realized by monotone span programs. We implement our protocols using the CiFEr cryptographic library and demonstrate their efficiency for practical settings.
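The access-control predicate at the heart of this design is simple to state on its own; a minimal Python sketch (our toy code, with all cryptographic layers stripped away):

    p = 2**31 - 1   # toy prime modulus

    def ip_predicate(attrs, policy):
        """Inner-product predicate: access iff <attrs, policy> = 0 mod p.
        In an inner-product FE scheme this test happens inside decryption,
        so only the predicate's outcome leaks, not the attribute vector."""
        return sum(a * y for a, y in zip(attrs, policy)) % p == 0

    # Policy "attrs must satisfy attrs[0] == attrs[1]", encoded as the
    # vector (1, -1, 0): access iff attrs is orthogonal to it mod p.
    assert ip_predicate([5, 5, 9], [1, p - 1, 0])
    assert not ip_predicate([5, 6, 9], [1, p - 1, 0])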
GRASP: Accelerating Hash-based PQC Performance on GPU Parallel Architecture
$SPHINCS^+$, one of the Post-Quantum Cryptography Digital Signature Algorithms (PQC-DSA) selected by NIST in the third round, features very short public and private key lengths but faces significant performance challenges compared to other post-quantum cryptographic schemes, limiting its suitability for real-world applications. To address these challenges, we propose the GPU-based paRallel Accelerated $SPHINCS^+$ (GRASP), which leverages GPU technology to enhance the efficiency of $SPHINCS^+$ signing and verification processes. We propose an adaptable parallelization strategy for $SPHINCS^+$, analyzing its signing and verification processes to identify critical sections for efficient parallel execution. Utilizing CUDA, we perform bottom-up optimizations, focusing on memory access patterns and hypertree computation, to enhance GPU resource utilization. These efforts, combined with kernel fusion technology, result in significant improvements in throughput and overall performance. Extensive experimentation demonstrates that our optimized CUDA implementation of $SPHINCS^+$ achieves superior performance. Specifically, our GRASP scheme delivers throughput improvements ranging from 1.37× to 3.45× compared to state-of-the-art GPU-based solutions and surpasses the NIST reference implementation by over three orders of magnitude, highlighting a significant performance advantage.
Oblivious Single Access Machines: A New Model for Oblivious Computation
Oblivious RAM (ORAM) allows a client to securely outsource memory storage to an untrusted server. It has been shown that no ORAM can simultaneously achieve small bandwidth blow-up, small client storage, and a single roundtrip of latency.
We consider a weakening of the RAM model, which we call the Single Access Machine (SAM) model. In the SAM model, each memory slot can be written to at most once and read from at most once. We adapt existing tree-based ORAM to obtain an oblivious SAM (OSAM) that has $O(\log n)$ bandwidth blow-up (which we show is optimal), small client storage, and a single roundtrip.
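A minimal sketch of the SAM restriction itself (illustrative Python of our own; making such a machine oblivious is the paper's contribution):

    class SAM:
        """Single Access Machine: each memory slot may be written at most
        once and read at most once; a read consumes the slot."""
        def __init__(self, n):
            self.cells = [None] * n
            self.written = [False] * n
            self.consumed = [False] * n

        def write(self, i, v):
            assert not self.written[i], "slot already written"
            self.written[i] = True
            self.cells[i] = v

        def read(self, i):
            assert self.written[i] and not self.consumed[i], "slot unavailable"
            self.consumed[i] = True
            return self.cells[i]

    # Under SAM semantics an 'update' reads a value from one slot and writes
    # the new version into a fresh slot, which is the access pattern that
    # tree-based ORAMs already produce; hence they adapt naturally to OSAM.
    mem = SAM(4)
    mem.write(0, "v1")
    mem.write(1, mem.read(0) + "-updated")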
OSAM unlocks improvements to oblivious data structures/algorithms. For instance, we achieve oblivious unbalanced binary trees (e.g. tries, splay trees). By leveraging splay trees, we obtain a notion of caching ORAM, where an access in the worst case incurs amortized $O(\log^2 n)$ bandwidth blow-up and $O(\log n)$ roundtrips, but in many common cases (e.g. sequential scans) incurs only amortized $O(\log n)$ bandwidth blow-up and $O(1)$ roundtrips. We also give new oblivious graph algorithms, including computing minimum spanning trees and single source shortest paths, in which the OSAM client reads/writes $O(|E| \cdot \log |E|)$ words using $O(|E|)$ roundtrips, where $|E|$ is the number of edges. This improves over prior custom solutions by a log factor.
At a higher level, OSAM provides a general model for oblivious computation. We construct a programming interface around OSAM that supports arbitrary pointer-manipulating programs such that dereferencing a pointer to an object incurs $O(\log d \log n)$ bandwidth blow-up and $O(\log d)$ roundtrips, where $d$ is the number of pointers to that object. This new interface captures a wide variety of data structures and algorithms (e.g., trees, tries, doubly-linked lists) while matching or exceeding prior best asymptotic results. It both unifies much of our understanding of oblivious computation and allows the programmer to write oblivious algorithms combining various common data structures and algorithms, and more.
FASIL: A challenge-based framework for secure and privacy-preserving federated learning
Enhancing privacy in federated learning (FL) without considering robustness can create an open door for attacks such as poisoning attacks on the FL process. Thus, addressing both the privacy and security aspects simultaneously becomes vital. Although a few solutions addressing both privacy and security have appeared in the literature in recent years, they have drawbacks such as requiring two non-colluding servers, heavy cryptographic operations, or a peer-to-peer communication topology. In this paper, we introduce a novel framework that allows the server to run analyses for the detection and mitigation of attacks on the FL process, while satisfying the confidentiality requirements for the training data against the server. We evaluate the effectiveness of the framework in terms of security and privacy by performing experiments on concrete examples. We also provide two instantiations of the framework with two different secure aggregation protocols to give a more concrete view of how the framework works, and we analyze its computation and communication overhead.
Structured-Seed Local Pseudorandom Generators and their Applications
In this note, we introduce structured-seed local pseudorandom generators, a relaxation of local pseudorandom generators. We provide constructions of this primitive under the sparse-LPN assumption, and explore its implications.
MaSTer: Maliciously Secure Truncation for Replicated Secret Sharing without Pre-Processing
Secure multi-party computation (MPC) in a three-party, honest majority scenario is currently the state-of-the-art for running machine learning algorithms in a privacy-preserving manner. For efficiency reasons, fixed-point arithmetic is widely used to approximate computation over decimal numbers. After a multiplication in fixed-point arithmetic, truncation is required to keep the result's precision. In this paper, we present an efficient three-party truncation protocol secure in the presence of an active adversary without pre-processing, and improve on the current state-of-the-art in MPC over rings using replicated secret sharing (RSS). By adding an efficient consistency check, we lift the efficient but only passively secure three-party truncation protocol from the ABY3 framework by Mohassel and Rindal into the malicious setting without pre-processed data. Our benchmark indicates performance improvements of an order of magnitude in the offline phase for single-batch training. Finally, we apply our protocol to a real-world application for diagnostic prediction based on publicly available ECG heartbeat data. We achieve an improvement by a factor of two in the total throughput for both LAN and WAN settings.
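For background on why truncation is needed at all, here is the plain (non-MPC) fixed-point arithmetic in Python; the paper's contribution is performing the final shift on replicated shares with active security:

    F = 16                       # fractional bits
    SCALE = 1 << F

    def encode(x):               # real number -> fixed-point integer
        return round(x * SCALE)

    def decode(v):               # fixed-point integer -> real number
        return v / SCALE

    a, b = encode(3.25), encode(-1.5)
    prod = a * b                 # the product now carries 2F fractional bits
    trunc = prod >> F            # truncate back to F fractional bits
    assert abs(decode(trunc) - 3.25 * (-1.5)) < 2 ** -F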
Polynomial sharings on two secrets: Buy one, get one free
While passive side-channel attacks and active fault attacks have been studied intensively in the last few decades, strong attackers combining these attacks have only been studied relatively recently. Due to its simplicity, most countermeasures against passive attacks are based on additive sharing. Unfortunately, extending these countermeasures to also cover faults often leads to quite a significant performance penalty, either due to the use of expensive cryptographic operations or a large number of shares due to massive duplication. Berndt, Eisenbarth, Gourjon, Faust, Orlt, and Seker therefore recently proposed to use polynomial sharing against combined attackers (CRYPTO 2023). While they construct gadgets secure against combined attackers using only a linear number of shares, the overhead introduced might still be too large for practical scenarios.
In this work, we show how the overhead of nearly all known constructions using polynomial sharing can be reduced by nearly half by embedding two secrets in the coefficients of one polynomial at the expense of increasing the degree of the polynomial by one. We present a very general framework that allows adapting these constructions to this new sharing scheme and prove the security of this approach against purely passive side-channel attacks, purely active fault attacks, and combined attacks. Furthermore, we present new gadgets allowing us to operate upon the different secrets in a number of useful ways.
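A toy version of the packing idea over a small prime field (our sketch; the paper does this on shared values with security against combined attackers): both secrets sit in the two lowest coefficients, and the degree grows by exactly one.

    import random

    P = 2**31 - 1                # toy prime field

    def share_two(s1, s2, deg, n):
        """p(x) = s1 + s2*x + r_2*x^2 + ...: a degree-(deg+1) polynomial
        carrying two secrets where a degree-deg one carries a single secret."""
        coeffs = [s1, s2] + [random.randrange(P) for _ in range(deg)]
        return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
                for x in range(1, n + 1)]

    def reconstruct_two(shares):
        """Read off s1 = p(0) and s2 = p'(0) via Lagrange interpolation."""
        s1 = s2 = 0
        for i, (xi, yi) in enumerate(shares):
            li, dli = 1, 0       # l_i(0) and l_i'(0), built factor by factor
            for j, (xj, _) in enumerate(shares):
                if i == j:
                    continue
                inv = pow((xi - xj) % P, P - 2, P)
                dli = (dli * (-xj) * inv + li * inv) % P   # product rule
                li = li * (-xj) * inv % P
            s1 = (s1 + yi * li) % P
            s2 = (s2 + yi * dli) % P
        return s1, s2

    shares = share_two(123456, 654321, deg=2, n=5)   # degree-3 polynomial
    assert reconstruct_two(shares[:4]) == (123456, 654321)

The same n shares thus carry two secrets at the cost of one extra point needed for reconstruction.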
Attribute-Based Threshold Issuance Anonymous Counting Tokens and Its Application to Sybil-Resistant Self-Sovereign Identity
Self-sovereign identity (SSI) systems empower users to (anonymously) establish and verify their identity when accessing both digital and real-world resources, emerging as a promising privacy-preserving solution for user-centric identity management. Recent work by Maram et al. proposes the privacy-preserving Sybil-resistant decentralized SSI system CanDID (IEEE S&P 2021). While this is an important step, notable shortcomings undermine its efficacy, the two most significant being the following: First, unlinkability breaks in the presence of a single malicious issuer. Second, it introduces interactiveness, as users are required to communicate with issuers each time they collect credentials intended for use in interactions with applications. This contradicts the goal of SSI, which is to give users full control over their identities. This paper first introduces the concept of publicly verifiable attribute-based threshold anonymous counting tokens (tACT). Unlike recent approaches confined to centralized settings (Benhamouda et al., ASIACRYPT 2023), tACT operates in a distributed-trust environment. Accompanied by a formal security model and a provably secure instantiation, tACT introduces a novel dimension to token issuance, which, we believe, holds independent interest. Next, the paper leverages the proposed tACT scheme to construct an efficient Sybil-resistant SSI system. This system supports various functionalities, including threshold issuance, unlinkable multi-show selective disclosure, and non-interactive, non-transferable, constant-size credentials. Finally, our benchmark results show an efficiency improvement over CanDID, all while accommodating a greater number of issuers and reducing the protocol to a single round that can be run in parallel with all issuers.
Constant-Size Unbounded Multi-Hop Fully Homomorphic Proxy Re-Encryption from Lattices
Proxy re-encryption is a cryptosystem that achieves efficient encrypted data sharing by allowing a proxy to transform a ciphertext encrypted under one key into another ciphertext under a different key. Homomorphic proxy re-encryption (HPRE) extends this concept by integrating homomorphic encryption, allowing not only the sharing of encrypted data but also homomorphic computation on such data. Existing HPRE schemes, however, are limited to a single or a bounded number of re-encryption hops. To address this limitation, this paper introduces a novel lattice-based, unbounded multi-hop fully homomorphic proxy re-encryption (FHPRE) scheme with constant-size ciphertexts. Our FHPRE scheme supports an unbounded number of re-encryption operations and enables arbitrary homomorphic computations over original, re-encrypted, and evaluated ciphertexts. Additionally, we propose a potential application of our FHPRE scheme in the form of a non-interactive, constant-size multi-user computation system for cloud computing environments.
Competitive Policies for Online Collateral Maintenance
Layer-two blockchain protocols emerged to address scalability issues related to fees, storage cost, and confirmation delay of on-chain transactions. They aggregate off-chain transactions into fewer on-chain ones, thus offering immediate settlement and reduced transaction fees. To preserve the security of the underlying ledger, layer-two protocols often work in a collateralized model; resources are committed on-chain to back up off-chain activities. A fundamental challenge that arises in this setup is determining a policy for establishing, committing, and replenishing the collateral in a way that maximizes the value of settled transactions.
In this paper, we study this problem under two settings that model collateralized layer-two protocols. The first is a general model in which a party has an on-chain collateral $C$ with a policy to decide whether to settle or discard each incoming transaction. The policy also specifies when to replenish $C$ based on the remaining collateral value. The second model considers a discrete setup in which $C$ is divided among $k$ wallets, each of size $C/k$, such that when a wallet is full, and thus cannot settle any incoming transactions, it is replenished. We devise several online policies for these models, and show how competitive they are compared to optimal (offline) policies that have full knowledge of the incoming transaction stream. To the best of our knowledge, we are the first to study and formulate online competitive policies for collateral and wallet management in the blockchain setting.
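As a concrete toy for the discrete model (our own illustrative policy in Python, not one of the paper's): route each transaction to the least-loaded of the k wallets and replenish a wallet only when even it cannot fit the transaction.

    def settle_stream(txs, C, k):
        """Online settlement with k wallets of capacity C/k. A wallet that
        cannot fit the incoming transaction is replenished (reset to empty)
        before settling, modelling the replenish-when-full rule."""
        cap = C / k
        used = [0.0] * k
        value, replenishments = 0.0, 0
        for v in txs:
            if v > cap:                   # can never fit in a single wallet
                continue
            i = min(range(k), key=lambda j: used[j])
            if used[i] + v > cap:         # even the emptiest wallet is full
                used[i] = 0.0
                replenishments += 1
            used[i] += v
            value += v
        return value, replenishments

    print(settle_stream([0.3, 0.9, 0.5, 0.8], C=2.0, k=2))   # (2.5, 1)

Competitive analysis then compares the value such an online policy settles against an offline optimum that sees the whole stream in advance.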
ammBoost: State Growth Control for AMMs
Automated market makers (AMMs) are a prime example of Web 3.0 applications. Their popularity and high trading activity have led to serious scalability issues in terms of throughput and state size. In this paper, we address these challenges by utilizing a new sidechain architecture, building a system called ammBoost. ammBoost reduces the amount of on-chain transactions, boosts throughput, and supports blockchain pruning. We devise several techniques to enable layer 2 processing for AMMs, including a functionality-split and layer 2 traffic summarization paradigm, an epoch-based deposit mechanism, and pool snapshot-based and delayed token-payout trading. We also build a proof-of-concept for a Uniswap-inspired use case to empirically evaluate performance. Our experiments show that ammBoost decreases the gas cost by 96.05% and the chain growth by at least 93.42%, and that it can support up to 500x the daily traffic volume of Uniswap. We also compare ammBoost to an Optimism-inspired solution, showing a 99.94% reduction in transaction finality.
chainBoost: A Secure Performance Booster for Blockchain-based Resource Markets
Cryptocurrencies and blockchain technology provide an innovative model for reshaping digital services. Driven by the movement toward Web 3.0, recent systems have started to provide distributed services, such as computation outsourcing or file storage, on top of the currency exchange medium. By allowing anyone to join and collect cryptocurrency payments for serving others, these systems create decentralized markets for trading digital resources. Yet, there is still a big gap between the promise of these markets and their practical viability. Existing initiatives are still early-stage and have already encountered security and efficiency obstacles. At the same time, existing work around promising ideas, specifically sidechains, falls short of exploiting their full potential in addressing these problems.
To bridge this gap, we propose chainBoost, a secure performance booster for decentralized resource markets. It expedites service-related operations, reduces the blockchain size, and supports flexible service-payment exchange modalities at low overhead. At its core, chainBoost employs a sidechain that has a (security and semantic) mutual dependence with the mainchain, to which the system offloads heavy/frequent operations. To enable it, we develop a novel sidechain architecture composed of temporary and permanent blocks, a block suppression mechanism to prune the sidechain, a syncing protocol to permit arbitrary data exchange between the two chains, and an auto-recovery protocol to support robustness and resilience. We analyze the security of chainBoost, and implement a proof-of-concept prototype for a distributed file storage market as a use case. For a market handling around 2000 transactions per round, our experiments show up to 11x improvement in throughput and a 94% reduction in confirmation time. They also show that chainBoost can reduce the main blockchain size by about 90%, and that it outperforms comparable optimistic rollup solutions by reducing transaction finality by 99.7%.
Exploiting Clock-Slew Dependent Variability in CMOS Digital Circuits Towards Power and EM SCA Resilience
Mathematically secure cryptographic implementations leak critical information through power, EM emanations, etc. Several circuit-level countermeasures have been proposed to hinder side-channel leakage at the source. Circuit-level countermeasures (e.g., IVR, STELLAR, WDDL) are often preferred as they are generic and have low overhead. They either dither the voltage randomly or attenuate the meaningful signature at the $V_{DD}$ port. Although any digital implementation has two generic ports, namely clock and $V_{DD}$, circuit-level countermeasures primarily focus on the $V_{DD}$ port, and countermeasures using the clock remain largely unexplored. System-level clock randomization is ineffective due to post-processing techniques. This work, for the first time, presents a clock-based countermeasure: a controlled clock slew that exploits the inherent variability of digital circuits in terms of power consumption and transforms power/EM emanation into a complex function of data and slew. As a result, the minimum traces-to-disclosure (MTD) improves by 100$\times$ with respect to an unprotected implementation.
Moreover, the slewed clock reduces the leaky frequency, making the clock randomization countermeasure more effective as post-processing in the frequency domain becomes more difficult. Clock slew and randomization together have a cumulative effect (1800x) exceeding the product of the individual techniques (100x and 5x, respectively). In brief, this paper presents a generic, synthesizable clock-level countermeasure technique that improves the MTD by 1800$\times$ and incurs only 11% area overhead, $<3\%$ power overhead (measured), and $<6\%$ performance overhead (measured). Moreover, it can easily be combined with other power-port-based mitigation techniques for enhanced security.
Sparsity-Aware Protocol for ZK-friendly ML Models: Shedding Lights on Practical ZKML
As deep learning is widely adopted across various domains, ensuring the integrity of models has become increasingly crucial. Despite recent advances in Zero-Knowledge Machine Learning (ZKML) techniques, proving inference over large ML models remains prohibitive. To enable practical ZKML, model simplification techniques like pruning and quantization should be applied without hesitation. Contrary to conventional belief, recent developments in the ML space have demonstrated that these simplification techniques not only condense complex models into forms with sparse, low-bit weight matrices, but also maintain exceptionally high model accuracy that matches their unsimplified counterparts.
While such transformed models seem inherently ZK-friendly, directly applying existing ZK proof frameworks still leads to suboptimal inference-proving performance. To make ZKML truly practical, a quantization-and-pruning-aware ZKML framework is needed. In this paper, we propose SpaGKR, a novel sparsity-aware ZKML framework that is proven to surpass the capabilities of existing ZKML methods. SpaGKR is a general framework that is widely applicable to any computation structure where sparsity arises. It is designed to be modular: all existing GKR-based ZKML frameworks can be seamlessly integrated with it to obtain remarkable compounding performance enhancements. We tailor SpaGKR to the most commonly used neural network structure, the linear layer, and propose the SpaGKR-LS protocol, which achieves asymptotically optimal prover time. Notably, when applying SpaGKR-LS to a special class of simplified models, ternary networks, it achieves further efficiency gains by additionally leveraging the low-bit nature of model parameters.
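The source of the savings in the linear layer is easy to see outside any proof system (a Python sketch of ours, not the SpaGKR-LS protocol): the work for $y = Wx$ can be made proportional to the number of nonzero weights rather than to the dense dimensions, and a sparsity-aware prover gets to sum over exactly these terms.

    def sparse_matvec(W_entries, x):
        """y = W x touching only the nonzero entries of W. W_entries is a
        list of (row, col, value) triples, so the work is O(nnz(W)): the
        number of terms a sparsity-aware prover sums over, instead of the
        O(rows * cols) terms of a dense sum-check."""
        y = {}
        for r, c, v in W_entries:
            y[r] = y.get(r, 0) + v * x[c]
        return y

    # A pruned/ternary layer: weights in {-1, 0, +1}, mostly zero.
    W = [(0, 1, 1), (0, 3, -1), (2, 0, 1)]
    print(sparse_matvec(W, [4, 7, 1, 2]))   # {0: 5, 2: 4}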