Papers updated in last 365 days (Page 25 of 3019 results)
Counting Unpredictable Bits: A Simple PRG from One-way Functions
A central result in the theory of Cryptography, by Hastad, Imagliazzo, Luby and Levin [SICOMP’99], demonstrates that the existence one-way functions (OWF) implies the existence of pseudo-random generators (PRGs). Despite the fundamental importance of this result, and several elegant improvements/simplifications, analyses of constructions of PRGs from OWFs remain complex (both conceptually and technically).
Our goal is to provide a construction of a PRG from OWFs with a simple proof of security; we thus focus on the setting of non-uniform security (i.e., we start off with a OWF secure against non-uniform PPT, and we aim to get a PRG secure against non-uniform PPT).
Our main result is a construction of a PRG from OWFs with a self-contained, simple, proof of security, relying only on the Goldreich-Levin Theorem (and the Chernoff bound). Although our main goal is simplicity, the construction, and a variant there-of, also improves the efficiency—in terms of invocations and seed lengths—of the state-of-the-art constructions due to [Haitner-Reingold-Vadhan, STOC’10] and [Vadhan-Zheng, STOC’12], by a factor $O(\log^2 n)$.
The key novelty in our analysis is a generalization of the Blum-Micali [FOCS’82] notion of unpredictabilty—rather than requiring that every bit in the output of a function is unpredictable, we count how many unpredictable bits a function has, and we show that any OWF on $n$ input bits (after hashing the input and the output) has $n + O(\log n)$ unpredictable output bits. Such unpredictable bits can next be “extracted” into a pseudorandom string using standard techniques.
Scalable and Lightweight State-Channel Audits
Payment channels are one of the most prominent off-chain scaling solutions for blockchain systems. However, regulatory institutions have difficulty embracing them, as the channels lack insights needed for Anti-Money Laundering (AML) auditing purposes. Our work tackles the problem of a formal reliable and controllable inspection of off-ledger payment channels, by offering a novel approach for maintaining and reliably auditing statistics of payment channels. We extend a typical trustless Layer 2 protocol and provide a lightweight and scalable protocol such that:
- every state channel is provably auditable w.r.t. a configurable set of policy queries, such that a regulator can retrieve reliable insights about the channel;
- no information beyond the answers to auditing queries is leaked;
- the cryptographic operations are inexpensive, the setup is simple, and storage complexity is independent of the transaction graph's size.
We present a concrete protocol, based on Hydra Isomorphic State Channels (FC'21), and tie the creation of a state channel to real-world identifiers, both in a plain and privacy-preserving manner. For this, we employ verifiable credentials for decentralized identifiers, specifically verifiable Legal Entity Identifiers (vLEI) that increasingly gain traction for financial service providers and regulated institutions.
Exploiting signature leakages: breaking Enhanced pqsigRM
Enhanced pqsigRM is a code-based hash-and-sign scheme proposed to the second National Institute of Standards and Technology call for post-quantum signatures. The scheme is based on the $(U,U+V)$-construction and it enjoys remarkably small signature lengths, about $1$KBytes for a security level of $128$ bits. Unfortunately we show that signatures leak information about the underlying $(U,U+V)$-structure. It allows to retrieve the private-key with~$100, 000$ signatures.
Parameters of Algebraic Representation vs. Efficiency of Algebraic Cryptanalysis
The aim of an algebraic attack is to find the secret key by solving
a collection of relations that describe the internal structure of a cipher
for observations of plaintext/cipher-text pairs.
Although algebraic attacks are addressed for cryptanalysis of block and
stream ciphers, there is a limited understanding of the impact of algebraic
representation of the cipher on the efficiency of solving the resulting collection of equations.
In this paper, we investigate on how different S-box representations affect
the complexity of algebraic attacks, in an empirical manner.
In the literature some algebraic properties are intuitively proposed to evaluate optimality of an algebraic description of S-boxes for algebraic cryptanalysis.
In this paper, we compare different S-box representation for algebraic
cryptanalysis with doing experiments with SR family of block ciphers.
We also show that the so-called \textit{Forward-Backward} representation which is in contrast with all mentioned criteria for optimal representations criteria, practically gives better results than the compliant representations.
We also compare the representations for both $GF(2)$ and $GF(2^n)$ fields.
KAIME : Central Bank Digital Currency with Realistic and Modular Privacy
Recently, with the increasing interest in Central Bank Digital Currency (CBDC), many countries have been working on researching and developing digital currency. The most important reasons for this interest are that CBDC eliminates the disadvantages of traditional currencies and provides a safer, faster, and more efficient payment system. These benefits also come with challenges, such as safeguarding individuals’ privacy and ensuring regulatory mechanisms. While most researches address the privacy conflict between users and regulatory agencies, they miss an important detail. Important parts of a financial system are banks and financial institutions. Some studies ignore the need for privacy and include these institutions in the CBDC system, no system currently offers a solution to the privacy conflict between banks, financial institutions, and users. In this study, while we offer a solution to the privacy conflict between the user and the regulatory agencies, we also provide a solution to the privacy conflict between the user and the banks. Our solution, KAIME has also a modular structure. The privacy of the sender and receiver can be hidden if desired. Compared to previous related research, security analysis and implementation of KAIME is substantially simpler because simple and well-known cryptographic methods are used.
BG: A Modular Treatment of BFT Consensus
Uncategorized
Uncategorized
We provide an expressive framework that allows analyzing and generating provably secure, state-of-the-art Byzantine fault-tolerant (BFT) protocols. Our framework is hierarchical, including three layers. The top layer is used to model the message pattern and abstract key functions on which BFT algorithms can be built. The intermediate layer provides the core functions with high-level properties sufficient to prove the security of the top-layer algorithms. The bottom layer carefully defines predicates according to which we offer operational realizations for the core functions. All three layers in our framework are extensible and enable innovation. One may modify or extend any layer to theoretically cover all BFT protocols, known and unknown. Indeed, unlike prior BFT frameworks, our framework can analyze and recast BFT protocols in an exceedingly fine-grained manner. More importantly, our framework can readily generate new BFT protocols by simply enumerating the parameters in the framework. In this paper, we show that the framework allows us to fully specify and formally prove the security for 23 BFT protocols, including protocols matching HotStuff, Fast-HotStuff, Jolteon, and Marlin, and among these protocols, seven new protocols outperforming existing ones or achieving meaningful trade-offs among various performance metrics.
Constraint-Packing and the Sum-Check Protocol over Binary Tower Fields
SNARKs based on the sum-check protocol often invoke the ``zero-check PIOP''. This reduces the vanishing of many constraints to a single sum-check instance applied to an $n$-variate polynomial of the form $g(x) = \text{eq}(r,x) \cdot p(x)$, where $p$ is a product of multilinear polynomials, $r$ is a random vector, and $\text{eq}$ is the multilinear extension of the equality function. In recent SNARK designs, $p(x)$ is defined over a ``small'' base field, while $r$ is drawn from a large extension field $\mathbb{F}$ for security.
Recent papers (Bagad, Domb, and Thaler 2024; Gruen 2024) have optimized the sum-check protocol prover for this setting. However, these works still require the prover to ``pre-compute'' all evaluations of $\text{eq}(r, x)$ as $x$ ranges over $\{0, 1\}^{n}$,
and this computation involves about $n$ multiplications over the extension field $\mathbb{F}$.
In this note, we describe a modification to the zero-check PIOP in the case of binary tower fields that reduces this pre-computation cost by a factor of close to $\log |\mathbb{F}|$, which is $128$ in important applications. We show that our modification is sound, and that it strictly generalizes a (possibly folklore) technique of constraint-packing over field extensions.
Jolt-b: recursion friendly Jolt with basefold commitment
The authors of Jolt [AST24] pioneered a unique method for creating zero-knowledge virtual machines, known as the lookup singularity. This technique extensively uses lookup tables to create virtual machine circuits. Despite Jolt’s performance being twice as efficient as the previous state-of-the-art1 , there is potential for further enhancement.
The initial release of Jolt uses Spartan [Set20] and Hyrax [WTs+ 18] as their backend, leading to two constraints. First, Hyrax employs Pedersen commitment to build inner product arguments, which requires elliptic curve operations. Second, the verification of a Hyrax commitment takes square root time $O(\sqrt{N})$ relative to the circuit size $N$ . This makes the recursive verification of a Jolt proof impractical, as the verification circuit would need to execute all the Hyrax verification logic in-circuit. A later version of Jolt includes Zeromorph [KT23] and HyperKZG as their commitment backend, making the system recursion-friendly, as now the recursive verifier only needs to perform $O(\log N)$ operations, but at the
expense of a need for a trusted setup.
Our scheme, Jolt-b, addresses these issues by transitioning to the extension field of the Goldilocks and using the Basefold commitment scheme [ZCF23], which has an $O(\log^2 N)$ verifier time. This scheme mirrors the modifications of Plonky2 over the original Plonk [GWC19]: it transitions from EC fields to the Goldilocks field; it replaces the EC-based commitment scheme with an encoding-based commitment scheme.
We implemented Jolt-b, along with an optimized version of the Basefold scheme. Our benchmarks show that at a cost of 2.47x slowdown for the prover, we achieve recursion friendliness for the original Jolt. In comparison with other recursion-friendly Jolt variants, our scheme is 1.24x and 1.52x faster in prover time than the Zeromorph and HyperKZG variants of Jolt, respectively.
Distributed Verifiable Random Function With Compact Proof
Verifiable Random Functions (VRFs) are cryptographic primitives that generate unpredictable randomness along with proofs that are verifiable, a critical requirement for blockchain applications in decentralized finance, online gaming, and more. Existing VRF constructions often rely on centralized entities, creating security vulnerabilities. Distributed VRFs (DVRFs) offer a decentralized alternative but face challenges like large proof sizes or dependence on computationally expensive bilinear pairings.
In this research, a unique distributed VRF (DVRF) system called DVRFwCP with considerable improvements is proposed. DVRFwCP has constant-size proofs, which means that the size of the proof does not change based on the number of participants. This overcomes a significant drawback of earlier DVRF systems, which saw proof size increase with participant count. Furthermore, DVRFwCP produces more efficient verification than previous systems by eliminating the requirement for bilinear pairings throughout the verification process. These innovations contribute to a more secure and scalable solution for generating verifiable randomness in decentralized environments.
We compare our construction to well-established DVRF instantiations such as DDH-DVRF and GLOW-DVRF while also pointing out the major improvement in the estimated gas cost of these algorithms.
Proof-Carrying Data from Multi-folding Schemes
Proof-carrying data (PCD) is a cryptographic primitive enabling mutually distrustful parties to perform distributed computations on directed acyclic graphs with efficient and incremental verification. Key performance metrics include the prover cost at each step and the recursion overhead, which measures the additional cost beyond proving the original computation. Despite substantial advancements in constructing efficient PCD schemes, these metrics continue to be bottlenecks hindering their widespread application.
In this paper, we advance the research by constructing a new PCD scheme based on a new generalized construction of multi-folding schemes. Compared with the state-of-the-art PCD scheme by Bünz et al. (CRYPTO'21), our scheme reduces the prover cost at each step from $4r+6$ multi-scalar multiplications (MSMs) of size $O(|C|)$ to $1$ MSM of the same size, and the recursion overhead from $6$ MSMs of size $2r-1$, $1$ MSM of size $6r-3$ to $1$ MSM of size $2r-1$, where $r$ is the number of incoming edges at certain step and $|C|$ is the proving computation size. Additionally, our PCD scheme supports a more expressive constraint system for encoding computations, namely the customizable constraint system, which supports high-degree constraints, in contrast to the rank-1 constraint system adopted by existing PCD schemes that only supports quadratic constraints.
We implement our PCD scheme and report the concrete recursion overhead and practical efficiency for different values of $r$ and $|C|$. Compared with Bünz et al. (CRYPTO'21), our PCD scheme achieves a $2.5$ times lower recursion overhead when $r=2$ and $|C|=2^{20}$. Additionally, when $r=2$ and a proving computation size (excluding recursion overhead) of $2^{24}$, it takes $49$ seconds to generate a PCD proof at each step. Using a SNARK to compress the proof reduces the proof size from $1031$ MB to $13$ KB, with a tradeoff in the verifier time, which increases from $11$ seconds to $22$ seconds.
Generalized Indifferentiable Sponge and its Application to Polygon Miden VM
Cryptographic hash functions are said to be the work-horses of modern cryptography. One of the strongest approaches to assess a cryptographic hash function's security is indifferentiability. Informally, indifferentiability measures to what degree the function resembles a random oracle when instantiated with an ideal underlying primitive. However, proving the indifferentiability security of hash functions has been challenging due to complex simulator designs and proof arguments. The Sponge construction is one of the prevalent hashing method used in various systems. The Sponge has been shown to be indifferentiable from a random oracle when initialized with a random permutation.
In this work, we first introduce $\mathsf{GSponge}$, a generalized form of the Sponge construction offering enhanced flexibility in input chaining, field sizes, and padding types. $\mathsf{GSponge}$ not only captures all existing sponge variants but also unveils new, efficient ones. The generic structure of $\mathsf{GSponge}$ facilitates the discovery of two micro-optimizations for already deployed sponges. Firstly, it allows a new padding rule based on zero-padding and domain-separated inputs, saving one full permutation call in certain cases without increasing the generation time of zero-knowledge proofs. Secondly, it allows to absorb up to $\mathsf{c}/2$ more elements (that can save another permutation call for certain message lengths) without compromising the indifferentiability security. These optimizations enhance hashing time for practical use cases such as Merkle-tree hashing and short message processing.
We then propose a new efficient instantiation of $\mathsf{GSponge}$ called $\mathsf{Sponge2}$ capturing these micro-optimizations and provide a formal indifferentiability proof to establish both $\mathsf{Sponge2}$ and $\mathsf{GSponge}$'s security. This proof, simpler than the original for Sponges, offers clarity and ease of understanding for real-world practitioners. Additionally, it is demonstrated that $\mathsf{GSponge}$ can be safely instantiated with permutations defined over large prime fields, a result not previously formally proven.
FIN: Practical Signature-Free Asynchronous Common Subset in Constant Time
Asynchronous common subset (ACS) is a powerful paradigm enabling applications such as Byzantine fault-tolerance (BFT) and multi-party computation (MPC). The most efficient ACS framework in the information-theoretic setting is due to Ben-Or, Kelmer, and Rabin (BKR, 1994). The BKR ACS protocol has been both theoretically and practically impactful. However, the BKR protocol has an $O(\log n)$ running time (where $n$ is the number of replicas) due to the usage of $n$ parallel asynchronous binary agreement (ABA) instances, impacting both performance and scalability. Indeed, for a network of 16$\sim$64 replicas, the parallel ABA phase occupies about 95%$\sim$97% of the total runtime in BKR. A long-standing open problem is whether we can build an ACS framework with $O(1)$ time while not increasing the message or communication complexity of the BKR protocol.
In this paper, we resolve the open problem, presenting the first constant-time ACS protocol with $O(n^3)$ messages in the information-theoretic and signature-free settings. Moreover, as a key ingredient of our new ACS framework and an interesting primitive in its own right, we provide the first information-theoretic multivalued validated Byzantine agreement (MVBA) protocol with $O(1)$ time and $O(n^3)$ messages. Both results can improve---asymptotically and concretely---various applications using ACS and MVBA in the information-theoretic, quantum-safe, or signature-free settings. As an example, we implement FIN, a BFT protocol instantiated using our framework. Via a 121-server deployment on Amazon EC2, we show FIN is significantly more efficient than PACE (CCS 2022), the state-of-the-art asynchronous BFT protocol of the same type. In particular, FIN reduces the overhead of the ABA phase to as low as 1.23% of the total runtime, and FIN achieves up to 3.41x the throughput of PACE. We also show that FIN outperforms other BFT protocols with the standard liveness property such as Dumbo and Speeding Dumbo.
A Note on Efficient Computation of the Multilinear Extension
The multilinear extension of an $m$-variate function $f : \{0,1\}^m \to \mathbb{F}$, relative to a finite field $\mathbb{F}$, is the unique multilinear polynomial $\hat{f} : \mathbb{F}^m \to \mathbb{F}$ that agrees with $f$ on inputs in $\{0,1\}^m$.
In this note we show how, given oracle access to $f : \{0,1\}^m \to \mathbb{F}$ and a point $z \in \mathbb{F}^m$, to compute $\hat{f}(z)$ using exactly $2^{m+1}$ multiplications, $2^m$ additions and $O(m)$ additional operations. The amount of space used corresponds to $O(m)$ field elements.
DiLizium 2.0: Revisiting Two-Party Crystals-Dilithium
In previous years there has been an increased interest in designing threshold signature schemes. Most of the recent works focus on constructing threshold versions of ECDSA or Schnorr signature schemes due to their appealing usage in blockchain technologies. Additionally, a lot of research is being done on cryptographic schemes that are resistant to quantum computer attacks.
In this work, we propose a new version of the two-party Dilithium signature scheme. The security of our scheme is based on the hardness of Module-LWE and Module-SIS problems. In our construction, we follow a similar logic as Damgård et al. (PKC 2021) and use an additively homomorphic commitment scheme. However, compared to them, our protocol uses signature compression techniques from the original Dilithium signature scheme which makes it closer to the version submitted to the NIST PQC competition. We focus on two-party signature schemes in the context of user authentication.
Revisiting PACD-based Attacks on RSA-CRT
In this work, we use some recent developments in lattice-based cryptanalytic tools to revisit a fault attack on RSA-CRT signatures based on the Partial Approximate Common Divisor (PACD) problem. By reducing the PACD to a Hidden Number Problem (HNP) instance, we decrease the number of required faulted bits from 32 to 7 in the case of a 1024-bit RSA. We successfully apply the attack to RSA instances up to 8192-bit and present an enhanced analysis of the error-tolerance in the Bounded Distance Decoding (BDD) with predicate approach. Finally, evaluating the impact of standard side-channel and fault countermeasures, we show that merely verifying the signature before output is not an adequate protection against this attack. The reduction from PACD to HNP might be of independent interest.
DeCAF: Decentralizable Continuous Group Key Agreement with Fast Healing
Continuous group key agreement (CGKA) allows a group of users to maintain a continuously updated shared key in an asynchronous setting where parties only come online sporadically and their messages are relayed by an untrusted server. CGKA captures the basic primitive underlying group messaging schemes.
Current solutions including TreeKEM ("Messaging Layer Security'' (MLS) IETF RFC 9420) cannot handle concurrent requests while retaining low communication complexity. The exception being CoCoA, which is concurrent while having extremely low communication complexity (in groups of size $n$ and for $m$ concurrent updates the communication per user is $\log(n)$, i.e., independent of $m$). The main downside of CoCoA is that in groups of size $n$, users might have to do up to $\log(n)$ update requests to the server to ensure their (potentially corrupted) key material has been refreshed.
In this work we present a "fast healing'' concurrent CGKA protocol, named DeCAF, where users will heal after at most $\log(t)$ requests, with $t$ being the number of corrupted users. While also suitable for the standard central-server setting, our protocol is particularly interesting for realizing decentralized group messaging, where protocol messages (add, remove, update) are being posted on some append-only data structure rather than sent to a server. In this setting, concurrency is crucial once the rate of requests exceeds, say, the rate at which new blocks are added to a blockchain.
In the central-server setting, CoCoA (the only alternative with concurrency, sub-linear communication and basic post-compromise security) enjoys much lower download communication. However, in the decentralized setting - where there is no server which can craft specific messages for different users to reduce their download communication - our protocol significantly outperforms CoCoA. DeCAF heals in fewer rounds ($\log(t)$ vs. $\log(n)$) while incurring a similar per round per user communication cost.
Small Private Key Attack Against a Family of RSA-like Cryptosystems
Let $N=pq$ be the product of two balanced prime numbers $p$ and $q$. Elkamchouchi, Elshenawy and Shaban presented in 2002 an interesting RSA-like cryptosystem that uses the key equation $ed - k (p^2-1)(q^2-1) = 1$, instead of the classical RSA key equation $ed - k (p-1)(q-1) = 1$. The authors claimed that their scheme is more secure than RSA. Unfortunately, the common attacks developed against RSA can be adapted for Elkamchouchi \emph{et al.}'s scheme. In this paper, we introduce a family of RSA-like encryption schemes that uses the key equation $ed - k (p^n-1)(q^n-1) = 1$, where $n>0$ is an integer. Then, we show that regardless of the choice of $n$, there exists an attack based on continued fractions that recovers the secret exponent.
Switching Off your Device Does Not Protect Against Fault Attacks
Physical attacks, and among them fault injection attacks, are a significant threat to the security of embedded systems. Among the means of fault injection, laser has the significant advantage of being extremely spatially accurate. Numerous state-of-the-art studies have investigated the use of lasers to inject faults into a target at run-time. However, the high precision of laser fault injection comes with requirements on the knowledge of the implementation and exact execution time of the victim code. The main contribution of this work is the demonstration on experimental basis that it is also possible to perform laser fault injection on an unpowered device. Specifically, we targeted the Flash non-volatile memory of a 32-bit microcontroller. The advantage of this new attack path is that it does not require any synchronisation between the victim and the attacker. We provide an experimental characterization of this phenomenon with a description of the fault model from the physical level up to the software level. Finally, we applied these results to carry out a persistent fault analysis on a 128-bit AES with a particularly realistic attacker model which reinforces the interest of the PFA.
Structural Lower Bounds on Black-Box Constructions of Pseudorandom Functions
We address the black-box complexity of constructing pseudorandom functions (PRF) from pseudorandom generators (PRG). The celebrated GGM construction of Goldreich, Goldwasser, and Micali (Crypto 1984) provides such a construction, which (even when combined with Levin's domain-extension trick) has super-logarithmic depth. Despite many years and much effort, this remains essentially the best construction we have to date. On the negative side, one step is provided by the work of Miles and Viola (TCC 2011), which shows that a black-box construction which just calls the PRG once and outputs one of its output bits, cannot be a PRF.
In this work, we make significant further progress: we rule out black-box constructions of PRF from PRG that follow certain structural constraints, but may call the PRG adaptively polynomially many times. In particular, we define ``tree constructions" which generalize the GGM structure: they apply the PRG $G$ along a tree path, but allow for different choices of functions to compute the children of a node on the tree and to compute the next node on the computation path down the tree. We prove that a tree construction of logarithmic depth cannot be a PRF (while GGM is a tree construction of super-logarithmic depth). We also show several other results and discuss the special case of one-call constructions.
Our main results in fact rule out even weak PRF constructions with one output bit. We use the oracle separation methodology introduced by Gertner, Malkin, and Reingold (FOCS 2001), and show that for any candidate black-box construction $F^G$ from $G$, there exists an oracle relative to which $G$ is a PRG, but $F^G$ is not a PRF.
Automated Creation of Source Code Variants of a Cryptographic Hash Function Implementation Using Generative Pre-Trained Transformer Models
Generative pre-trained transformers (GPT's) are a type of large language machine learning model that are unusually adept at producing novel, and coherent, natural language. Notably, these technologies have also been extended to computer programming languages with great success. However, GPT model outputs in general are stochastic and not always correct. For programming languages, the exact specification of the computer code, syntactically and algorithmically, is strictly required in order to ensure the security of computing systems and applications. Therefore, using GPT models to generate computer code poses an important security risk -- while at the same time allowing for potential innovation in how computer code is generated. In this study the ability of GPT models to generate novel and correct versions, and notably very insecure versions, of implementations of the cryptographic hash function SHA-1 is examined. The GPT models Llama-2-70b-chat-hf, Mistral-7B-Instruct-v0.1, and zephyr-7b-alpha are used. The GPT models are prompted to re-write each function using a modified version of the localGPT framework and langchain to provide word embedding context of the full source code and header files to the model, resulting in over $130,000$ function re-write GPT output text blocks (that are potentially correct source code), approximately $40,000$ of which were able to be parsed as C code and subsequently compiled. The generated code is analyzed for being compilable, correctness of the algorithm, memory leaks, compiler optimization stability, and character distance to the reference implementation. Remarkably, several generated function variants have a high implementation security risk of being correct for some test vectors, but incorrect for other test vectors. Additionally, many function implementations were not correct to the reference algorithm of SHA-1, but produced hashes that have some of the basic characteristics of hash functions. Many of the function re-writes contained serious flaws such as memory leaks, integer overflows, out of bounds accesses, use of uninitialised values, and compiler optimization instability. Compiler optimization settings and SHA-256 hash checksums of the compiled binaries are used to cluster implementations that are equivalent but may not have identical syntax - using this clustering over $100,000$ distinct, novel, and correct versions of the SHA-1 codebase were generated where each component C function of the reference implementation is different from the original code.
Towards Achieving Asynchronous MPC with Linear Communication and Optimal Resilience
Secure multi-party computation (MPC) allows a set of $n$ parties to jointly compute a function over their private inputs.
The seminal works of Ben-Or, Canetti and Goldreich [STOC '93] and Ben-Or, Kelmer and Rabin [PODC '94] settled the feasibility of MPC over asynchronous networks. Despite the significant line of work devoted to improving the communication complexity, current protocols with information-theoretic security and optimal resilience $t<n/3$ communicate $\Omega(n^4C)$ field elements for a circuit with $C$ multiplication gates. In contrast, synchronous MPC protocols with $O(nC)$ communication have long been known.
In this work we make progress towards closing this gap. We provide a novel MPC protocol in the asynchronous setting with statistical security that makes black-box use of an asynchronous complete secret-sharing (ACSS) protocol. The cost per multiplication reduces to the cost of distributing a constant number of sharings via ACSS, improving a linear factor over the state of the art by Choudhury and Patra [IEEE Trans. Inf. Theory '17].
With a recent concurrent work achieving ACSS with linear cost per sharing, we achieve an MPC with ${O}(nC)$ communication and optimal resilience.
Finding Bugs and Features Using Cryptographically-Informed Functional Testing
In 2018, Mouha et al. (IEEE Trans. Reliability, 2018) performed a post-mortem investigation of the correctness of reference implementations submitted to the SHA3 competition run by NIST, finding previously unidentified bugs in a significant portion of them, including two of the five finalists. Their innovative approach allowed them to identify the presence of such bugs in a black-box manner, by searching for counterexamples to expected cryptographic properties of the implementations under test. In this work, we extend their approach to key encapsulation mechanisms (KEMs) and digital signature schemes (DSSs). We perform our tests on multiple versions of the LibOQS collection of post-quantum schemes, to capture implementations at different points of the recent Post-Quantum Cryptography Standardization Process run by NIST. We identify multiple bugs, ranging from software bugs (segmentation faults, memory overflows) to cryptographic bugs, such as ciphertext malleability in KEMs claiming IND-CCA security. We also observe various features of KEMs and DSS that do not contradict any security guarantees, but could appear counter-intuitive.
Implementation and Performance Evaluation of Elliptic Curve Cryptography over SECP256R1 on STM32 Microprocessor
The use of Internet of Things (IoT) devices in embedded systems has become increasingly popular with advancing technologies. These devices become vulnerable to cyber attacks as they gain popularity. The cryptographic operations performed for the purpose of protection against cyber attacks are crucial to yield fast results in open networks and not slow down network traffic. Therefore, to enhance communication security, studies have been conducted in the literature on using asymmetric encryption and symmetric encryption together in IoT devices for activities such as key sharing, encryption, decryption, data signing, and verifying signed data. In this study, we first propose a cryptographic system engaging of IoT devices operated from a server. Then we do performance analysis of our proposal. In particular, we evaluate the elliptic curve Diffie-Hellman key exchange and elliptic curve digital signature algorithms on the Secp256r1 elliptic curve and AES symmetric encryption via the Micro uECC library conducted with the 32-bit STM32F410RB Nucleo development board microprocessor running at 48 MHz.
A Fast and Efficient SIKE Co-Design: Coarse-Grained Reconfigurable Accelerators with Custom RISC-V Microcontroller on FPGA
This paper proposes a fast and efficient FPGA-based hardware-software co-design for the supersingular isogeny key encapsulation (SIKE) protocol controlled by a custom RISC-V processor. Firstly, we highly optimize the core unit, the polynomial-based field arithmetic logic unit (FALU), with the proposed fast convolution-like multiplier (FCM) to significantly reduce the resource consumption while still maintaining low latency and constant time for all the four SIKE parameters. Secondly, we pack the small isogeny and point operations in hardware, devise a coarse-grained reconfigurable hardware architecture (CGRHA) based on FALU as the co-processor, and apply it to the RISC-V core with customized instructions, effectively avoiding extra time consumption for the data exchange with the software side and meanwhile increasing flexibility. Finally, we code the hardware in SystemVerilog language and the software in C language and run experiments on FPGAs. In the co-processor implementation, the experiment results show that our design for the four SIKE parameters achieves 2.6-4.4x speedup and obtains comparable or better area-time product to or than the state-of-the-art. In the hardware-software co-design experiments, we still have the superiority in speed and only <10\% of extra time is introduced by mutual communication.
A Long Tweak Goes a Long Way: High Multi-user Security Authenticated Encryption from Tweakable Block Ciphers
We analyze the multi-user (mu) security of a family of nonce-based authentication encryption (nAE) schemes based on a tweakable block cipher (TBC). The starting point of our work is an analysis of the mu security of the SCT-2 mode which underlies the nAE scheme Deoxys-II, winner of the CAESAR competition for the defense-in-depth category. We extend this analysis in two directions, as we detail now.
First, we investigate the mu security of several TBC-based variants of the counter encryption mode (including CTRT, the encryption mode used within SCT-2) that differ by the way a nonce, a random value, and a counter are combined as tweak and plaintext inputs to the TBC to produce the keystream blocks that will mask the plaintext blocks. Then, we consider the authentication part of SCT-2 and study the mu security of the nonce-based MAC Nonce-as-Tweak (NaT) built from a TBC and an almost universal (AU) hash function. We also observe that the standard construction of an AU hash function from a (T)BC can be proven secure under the assumption that the underlying TBC is unpredictable rather than pseudorandom, allowing much better conjectures on the concrete AU advantage. This allows us to derive the mu security of the family of nAE modes obtained by combining these encryption/MAC building blocks through the NSIV composition method.
Some of these modes require an underlying TBC with a larger tweak length than what is usually available for existing ones. We then show the practicality of our modes by instantiating them with two new TBC constructions, Deoxys-TBC-512 and Deoxys-TBC-640, which can be seen as natural extensions of the Deoxys-TBC family to larger tweak input sizes. Designing such TBCs with unusually large tweaks is prone to pitfalls: Indeed, we show that a large-tweak proposal for SKINNY published at EUROCRYPT 2020 presents an inherent construction flaw. We therefore provide a sound design strategy to construct large-tweak TBCs within the Superposition Tweakey (STK) framework, leading to new Deoxys-TBC and SKINNY variants. We provide software benchmarks indicating that while ensuring a very high security level, the performances of our proposals remain very competitive.
Securely Training Decision Trees Efficiently
Decision trees are an important class of supervised learning algorithms. When multiple entities contribute data to train a decision tree (e.g. for fraud detection in the financial sector), data privacy concerns necessitate the use of a privacy-enhancing technology such as secure multi-party computation (MPC) in order to secure the underlying training data. Prior state-of-the-art (Hamada et al.) construct an MPC protocol for decision tree training with a communication of $\mathcal{O}(hmN\log N)$, when building a decision tree of height $h$ for a training dataset of $N$ samples, each having $m$ attributes.
In this work, we significantly reduce the communication complexity of secure decision tree training.
We construct a protocol with communication complexity $\mathcal{O}(mN\log N + hmN + hN\log N)$, thereby achieving an improvement of $\approx \mathsf{min}(h, m, \log N)$ over Hamada et al.
At the core of our technique is an improved protocol to regroup sorted private elements further into additional groups (according to a flag vector) while maintaining their relative ordering. We implement our protocol in the MP-SPDZ framework and show that it requires $10\times$ lesser communication and is $9\times$ faster than the state-of-the-art.
Linear-Communication Asynchronous Complete Secret Sharing with Optimal Resilience
Secure multiparty computation (MPC) allows a set of $n$ parties to jointly compute a function on their private inputs. In this work, we focus on the information-theoretic MPC in the \emph{asynchronous network} setting with optimal resilience ($t<n/3$). The best-known result in this setting is achieved by Choudhury and Patra [J. Cryptol '23], which requires $O(n^4\kappa)$ bits per multiplication gate, where $\kappa$ is the size of a field element.
An asynchronous complete secret sharing (ACSS) protocol allows a dealer to share a batch of Shamir sharings such that all parties eventually receive their shares. ACSS is an important building block in AMPC. The best-known result of ACSS is due to Choudhury and Patra [J. Cryptol '23], which requires $O(n^3\kappa)$ bits per sharing. On the other hand, in the synchronous setting, it is known that distributing Shamir sharings can be achieved with $O(n\kappa)$ bits per sharing. There is a gap of $n^2$ in the communication between the synchronous setting and the asynchronous setting.
Our work closes this gap by presenting the first ACSS protocol that achieves $O(n\kappa)$ bits per sharing. When combined with the compiler from ACSS to AMPC by Choudhury and Patra [IEEE Trans. Inf. Theory '17], we obtain an AMPC with $O(n^2\kappa)$ bits per multiplication gate, improving the previously best-known result by a factor of $n^2$. Moreover, with a concurrent work that improves the compiler by Choudhury and Patra by a factor of $n$, we obtain the first AMPC with $O(n\kappa)$ bits per multiplication gate.
A Simple Post-Quantum Oblivious Transfer Protocol from Mod-LWR
Oblivious transfer (OT) is a fundamental cryptographic protocol that plays a crucial role in secure multi-party computation (MPC). Most practical OT protocols by, e.g., Naor and Pinkas (SODA'01) or Chou and Orlandi (Latincrypt'15), are based on Diffie-Hellman (DH)-like assumptions and not post-quantum secure. In contrast, many other components of MPC protocols, including garbled circuits and secret sharings, are post-quantum secure. The reliance on non-post-quantum OT protocols presents a significant security bottleneck with the advent of quantum computing.
In this paper, we address this issue by constructing a simple, efficient OT protocol based on Saber, a Mod-LWR-based key exchange protocol. We implemented our OT protocol and conducted experiments to evaluate its performance. Our results show that our OT protocol significantly outperforms the state-of-the-art Kyber-based post-quantum OT protocol by Masny and Rindal (CCS'19) in terms of both computation and communication costs. Furthermore, the computation speed of our OT protocol is faster than the best-known DH-based OT protocol by Chou and Orlandi (Latincrypt'15), making it competitive to replace DH-based OT in the high-bandwidth network setting.
Public vs Private Blockchains lineage storage
This paper reports the experimental results related to lineage event storage via smart contracts deployed on private and public blockchain. In our experiments we measure the following three metrics: the cost to deploy the storage smart contract on the blockchain, which measures the initial expenditure, typically in gas units, required to deploy the smart contract that facilitates lineage event storage, then the time and gas costs needed to store a lineage event. We investigated both single and multi-clients scenarios. We considered the following public blockchains: Hedera, Fantom, Harmony Shard0, Polygon Amoy, Ethereum Sepolia, Optimism Sepolia, Klaytn Baobab and Arbitrum Sepolia. Furthermore, we investigate the performances of Hyperledger Besu with different consensus algorithms as private blockchains.
Fast, Large Scale Dimensionality Reduction Schemes Based on CKKS
The proliferation of artificial intelligence and big data has resulted in a surge in data demand and increased data dimensionality. This escalation has consequently heightened the costs associated with storage and processing. Concurrently, the confidential nature of data collected by various institutions, which cannot be disclosed due to personal privacy concerns, has exacerbated the challenges associated with data analysis and machine learning model training. Therefore, designing a secure and efficient high-dimensional data reduction method that supports multi-party joint participation becomes critical to solving these problems.
This paper proposes a novel homomorphic encryption dimensionality reduction scheme (HE-DR) based on CKKS, which modifies the Rank-Revealing (RR) method to make it more applicable to fully homomorphic encryption, thereby achieving fast and secure dimension reduction for high-dimensional data. Compared to traditional homomorphic encryption dimensionality reduction schemes, our approach does not transmit the user’s original data to other participants in any format (Ciphertext or Plaintext). Moreover, our method's computational efficiency is nearly $60-200$ times faster than similar algorithms, and the communication overhead is only $1/3$ of theirs. Finally, we have shown that our proposed scheme can preserve its computational efficiency and accuracy even when dealing with high-dimensional data. As dimensionality escalates, the ratio of ciphertext to plaintext computational efficiency plateaus at approximately 5 times, while the computational error (distance between subspaces) remains around $1e^{-11}$
Simple Logarithmic-size LSAG signature
A number of existing cryptosystems use the well-known linear-size LSAG signature concept, extending it in many ways. This article presents a simple logarithmic-size signature LS-LSAG which, despite a radical reduction in size, retains the basic code block of LSAG. Therefore, substituting LS-LSAG for LSAG requires minimal changes to almost any existing LSAG/CLSAG-based solution, making it logarithmic instead of linear.
HERatio: Homomorphic Encryption of Rationals using Laurent Polynomials
In this work we present $\mathsf{HERatio}$, a homomorphic encryption scheme that builds on the scheme of Brakerski, and Fan and Vercauteren. Our scheme naturally accepts Laurent polynomials as inputs, allowing it to work with rationals via their bounded base-$b$ expansions. This eliminates the need for a specialized encoder and streamlines encryption, while maintaining comparable efficiency to BFV. To achieve this, we introduce a new variant of the Polynomial Learning With Errors (PLWE) problem which employs Laurent polynomials instead of the usual ``classic'' polynomials, and provide a reduction to the PLWE problem.
Polytopes in the Fiat-Shamir with Aborts Paradigm
The Fiat-Shamir with Aborts paradigm (FSwA) uses rejection sampling to remove a secret’s dependency on a given source distribution. Recent results revealed that unlike the uniform distribution in the hypercube, both the continuous Gaussian and the uniform distribution within the hypersphere minimise the rejection rate and the size of the proof of knowledge. However, in practice both these distributions suffer from the complexity of their sampler. So far, those three distributions are the only available alternatives, but none of them offer the best of all worlds: competitive proof of knowledge size and rejection rate with a simple sampler.
We introduce a new generic framework for FSwA using polytope based rejection sampling to enable a wider variety of constructions. As a matter of fact, this framework is the first to generalise these results to integral distributions. To complement the lack of alternatives, we also propose a new polytope construction, whose uniform sampler approaches in simplicity that of the hypercube. At the same time, it provides competitive proof of knowledge size compared to that obtained from the Gaussian distribution. Concurrently, we share some experimental improvements of our construction to further reduce the proof size. Finally, we propose a signature based on the FSwA paradigm using both our framework and construction. We prove it to be competitive with Haetae in signature size and with Dilithium on sampler simplicity.
Ad Hoc Broadcast, Trace, and Revoke --- Plus Time-Space Trade-Offs for Attribute-Based Encryption
Traitor tracing schemes [Chor–Fiat–Naor, Crypto ’94] help content distributors fight against piracy and are defined with the content distributor as a trusted authority having access to the secret keys of all users. While the traditional model caters well to its original motivation, its centralized nature makes it unsuitable for many scenarios. For usage among mutually untrusted parties, a notion of *ad hoc* traitor tracing (naturally with the capability of broadcast and revocation) is proposed and studied in this work. Such a scheme allows users in the system to generate their own public/secret key pairs, without trusting any other entity. To encrypt, a list of public keys is used to identify the set of recipients, and decryption is possible with a secret key for any of the public keys in the list. In addition, there is a tracing algorithm that given a list of recipients’ public keys and a pirate decoder capable of decrypting ciphertexts encrypted to them, identifies at least one recipient whose secret key must have been used to construct the said decoder.
Two constructions are presented. The first is based on functional encryption for circuits (conceptually, obfuscation) and has constant-size ciphertext, yet its decryption time is linear in the number of recipients. The second is a generic transformation that reduces decryption time at the cost of increased ciphertext size. A matching lower bound on the trade-off between ciphertext size and decryption time is shown, indicating that the two constructions achieve all possible optimal trade-offs, i.e., they fully demonstrate the Pareto front of efficiency. The lower bound also applies to broadcast encryption (hence all mildly expressive attribute-based encryption schemes) and is of independent interest.
On Efficient and Secure Compression Modes for Arithmetization-Oriented Hashing
ZK-SNARKs, a fundamental component of privacy-oriented payment systems, identity protocols, or anonymous voting systems, are advanced cryptographic protocols for verifiable computation: modern SNARKs allow to encode the invariants of a program, expressed as an arithmetic circuit, in an appropriate constraint language from which short, zero-knowledge proofs for correct computations can be constructed.
One of the most important computations that is run through SNARK systems is the verification of Merkle tree (MT) opening proofs, which relies on the evaluation of a fixed-input-length (FIL) cryptographic compression function over binary MTs.
As classical, bit-oriented hash functions like SHA-2 are not compactly representable in SNARK frameworks, Arithmetization-Oriented (AO) cryptographic designs have emerged as an alternative, efficient solution.
Today, the majority of AO compression functions are built from the Sponge permutation-based hashing mode.
While this approach allows cost savings, compared to blockcipher-based modes, as it does not require key-scheduling, AO blockcipher schedulers are often cheap to compute.
Furthermore, classical bit-oriented cryptography has long studied how to construct provably secure compression functions from blockciphers, following the Preneel-Govaerts-Vandewalle (PGV) framework.
The potential efficiency gains together with the strong provable security foundations in the classic setting, motivate the study of AO blockcipher-based compression functions.
In this work, we propose PGV-LC and PGV-ELC, two AO blockcipher-based FIL compression modes inspired by and extending the classical PGV approach, offering flexible input and output sizes and coming with provable security guarantees in the AO setting.
We prove the collision and preimage resistance in the ideal cipher model, and give bounds for collision and opening resistance over MTs of arbitrary arity.
We compare experimentally the AO PGV-ELC mode over the Hades blockcipher with its popular and widely adopted Sponge instantiation, Poseidon, and its improved variant Poseidon2.
Our resulting constructions are up to \(3\times \) faster than Poseidon and \(2\times \) faster than Poseidon2 in native x86 execution, and up to \(50\% \) faster in the Groth16 SNARK framework.
Finally, we study the benefits of using MTs of arity wider than two, proposing a new strategy to obtain a compact R1CS constraint system in such case.
In fact, by combining an efficient parametrization of the Hades blockcipher over the PGV-ELC mode, together with an optimal choice of the MT arity, we measured an improvement of up to \(9\times \) in native MT construction time, and up to \(2.5\times \) in proof generation time, compared to Poseidon over binary MTs.
Faster Asynchronous Blockchain Consensus and MVBA
Blockchain consensus, a.k.a. BFT SMR, are protocols enabling $n$ processes to decide on an ever-growing chain. The fastest known asynchronous one is called 2-chain VABA (PODC'21 and FC'22), and is used as fallback chain in Abraxas* (CCS'23). It has a claimed $9.5\delta$ expected latency when used for a single shot instance, a.k.a. an MVBA.
We exhibit attacks breaking it. Hence, the title of the fastest asynchronous MVBA with quadratic messages complexity goes to sMVBA (CCS'22), with $10\delta$ expected latency.
Our positive contributions are two new and complementary designs.
$\bullet$ 2PAC (2-phase asynchronous consensus). It has a simpler and lighter chaining than in previous approaches. Instantiated with either quadratic or cubic phases of voting, it yields:
2PAC$^\text{lean}$: $+90\%$ throughput and $9.5\delta$ expected latency, with quadratic ($O(n^2)$) messages complexity. In both 2-chain VABA and sMVBA (as if chained, with pipelining), the quorum-certified transactions which were produced in the worst-case 1/3 of views with a slow leader were dumped, so the work was lost. The simpler design of 2PAC inserts such blocks in straight-line in the chain.
Thus, contrary to naive uncle-referencing, this comes with no computational overhead, yielding a net $+50\%$ throughput gain over chained sMVBA. Both the remaining throughput and latency ($-0.5\delta$) gains, come from the lighter interactive construction of proofs of consistency appended to proposed blocks, compared to sMVBA.
2PAC$^\text{BIG}$: the fastest asynchronous blockchain consensus with cubic ($O(n^3)$) messages complexity. Fault-free single shot MVBA runs decide in just $4\delta$, as soon as no message is delivered more than twice faster than others: GradedDAG (SRDS'23) required furthermore no messages reordering.
$\bullet$ Super Fast Pipelined Blocks. This is an upgrade of previous approaches for pipelining: in 2-chain VABA, Cordial Miners (DISC'23) and GradedDAG, a block pipelined by a leader in the middle of the view had almost twice larger latency than the non-pipelined block. Our design provides a fast path deciding the pipelined block with even smaller latency than the non-pipelined block. The fast delay is guaranteed in all executions with a fair scheduler, but remarkably, whatever the behaviors of faulty processes. Consistency is preserved by a lightweight mechanism, of one threshold signature appended per proposal.
Instantiated with the previous protocols, it yields: s2PAC$^\text{lean}$, with fast decision of pipelined blocks in $4\delta$; s2PAC$^\text{BIG}$, in $3\delta$; and sGradedDAG, in $3\delta$.
MUSEN: Aggregatable Key-Evolving Verifiable Random Functions and Applications
A Verifiable Random Function (VRF) can be evaluated on an input by a prover who holds a secret key, generating a pseudorandom output and a proof of output validity that can be verified using the corresponding public key. VRFs are a central building block of committee election mechanisms that sample parties to execute tasks in cryptographic protocols, e.g. generating blocks in a Proof-of-Stake (PoS) blockchain or executing a round of MPC protocols. We propose the notion, and a matching construction, of an Aggregatable Key-Evolving VRF (A-KE-VRF) with the following extra properties: 1. Aggregation: combining proofs for several VRF evaluations of different inputs under different secret keys into a single constant size proof; 2. Key-Evolving: preventing adversaries who corrupt a party (learning their secret key) from ``forging'' proofs of past VRF evaluations. As an immediate application, we improve on the block size of PoS blockchains and on the efficiency of Proofs of Proof-of-Stake (PoPoS). Furthermore, the A-KE-VRF notion allows us to construct Encryption to the Future (EtF) and Authentication from the Past (AfP) schemes with a Key-Evolving property, which provides forward security. An EtF scheme allows for sending a message to a party who is randomly selected to execute a role in the future, while an AfP scheme allows for this party to authenticate their messages as coming from a past execution of this role. These primitives are essential for realizing the YOSO MPC Framework (CRYPTO'21).
Squirrel: A Scalable Secure Two-Party Computation Framework for Training Gradient Boosting Decision Tree
Gradient Boosting Decision Tree (GBDT) and its variants are widely used in industry, due to their strong interpretability. Secure multi-party computation allows multiple data owners to compute a function jointly while keeping their input private. In this work, we present Squirrel, a two-party GBDT training framework on a vertically split dataset, where two data owners each hold different features of the same data samples. Squirrel is private against semi-honest adversaries, and no sensitive intermediate information is revealed during the training process. Squirrel is also scalable to datasets with millions of samples even under a Wide Area Network (WAN).
Squirrel achieves its high performance via several novel co-designs of the GBDT algorithms and advanced cryptography. Especially, 1) we propose a new and efficient mechanism to hide the sample distribution on each node using oblivious transfer. 2) We propose a highly optimized method for gradient aggregation using lattice-based homomorphic encryption (HE). Our empirical results show that our method can be three orders of magnitude faster than the existing HE approaches. 3) We propose a novel protocol to evaluate the sigmoid func- tion on secretly shared values, showing 19×-200×-fold im- provements over two existing methods. Combining all these improvements, Squirrel costs less than 6 seconds per tree on a dataset with 50 thousands samples which outperforms Pivot (VLDB 2020) by more than 28×. We also show that Squirrel can scale up to datasets with more than one million samples, e.g., about 170 seconds per tree over a WAN.
Bringing Order to Chaos: The Case of Collision-Resistant Chameleon-Hashes
Chameleon-hash functions, introduced by Krawczyk and Rabin at NDSS 2000, are trapdoor collision-resistant hash-functions parametrized by a public key. If the corresponding secret key is known, arbitrary collisions for the hash function can be efficiently found. Chameleon-hash functions have prominent applications in the design of cryptographic primitives, such as lifting non-adaptively secure signatures to adaptively secure ones. Recently, this primitive also received a lot of attention as a building block in more complex cryptographic applications ranging from editable blockchains to advanced signature and encryption schemes.
We observe that in latter applications various different notions of collision-resistance are used, and it is not always clear if the respective notion does really cover what seems intuitively required by the application. Therefore, we revisit existing collision-resistance notions in the literature, study their relations, and - using the example of the recent redactable blockchain proposals - discuss which practical impact different notions of collision-resistance might have. Moreover, we provide a stronger, and arguably more desirable, notion of collision-resistance than what is known from the literature. Finally, we present a surprisingly simple and efficient black-box construction of chameleon-hash functions achieving this strong notion.
BumbleBee: Secure Two-party Inference Framework for Large Transformers
Abstract—Large transformer-based models have realized state- of-the-art performance on lots of real-world tasks such as natural language processing and computer vision. However, with the increasing sensitivity of the data and tasks they handle, privacy has become a major concern during model deployment. In this work, we focus on private inference in two-party settings, where one party holds private inputs and the other holds the model. We introduce BumbleBee, a fast and communication-friendly two- party private transformer inference system. Our contributions are three-fold: First, we propose optimized protocols for matrix multiplication, which significantly reduce communication costs by 80% – 90% compared to previous techniques. Secondly, we develop a methodology for constructing efficient protocols tailored to the non-linear activation functions employed in transformer models. The proposed activation protocols have realized a significant enhancement in processing speed, alongside a remarkable reduction in communication costs by 80% – 95% compared with two prior methods. Lastly, we have performed extensive benchmarks on five transformer models. BumbleBee demonstrates its capability by evaluating the LLaMA-7B model, generating one token in approximately 14 minutes using CPUs. Our results further reveal that BumbleBee outperforms Iron (NeurIPS22) by over an order of magnitude and is three times faster than BOLT (Oakland24) with one-tenth communication.
Consolidated Linear Masking (CLM): Generalized Randomized Isomorphic Representations, Powerful Degrees of Freedom and Low(er)-cost
Masking is a widely adopted countermeasure against side-channel analysis (SCA) that protects cryptographic implementations from information leakage. However, current masking schemes often incur significant overhead in terms of electronic cost. RAMBAM, a recently proposed masking technique that fits elegantly with the AES algorithm, offers ultra-low latency/area by utilizing redundant representations of finite field elements. This paper presents a comprehensive generalization of RAMBAM and various other masking schemes within a unified framework and a mathematical representation known as Consolidated Linear Masking (CLM), where masking schemes are formalized by their encoding. We establish a theoretical foundation for CLM linking randomized isomorphic (code) representations and the entropy provided by the redundancy to a revised notion of masking order. Our analysis reveals that RAMBAM is a specific instance of CLM as well as other masking constructions, thus paving the way for significant enhancements. For example, a $1^{st}$-order secure design can be achieved almost without increasing the size of the representation of the variables. This property scales up to any order and is versatile. We demonstrate how CLM enables: (1) randomized selection of the isomorphic field for improved security; (2) flexible choice of the randomization polynomial; (3) embedded mask-refreshing via the randomized isomorphic representation that reduces randomness requirements significantly as well as improves performance; (4) a wider range of isomorphic randomized mappings that significantly increases the available randomization space compared to RAMBAM; (5) considerable improvement in securing fault-injection attacks and inherent security against probing adversaries, i.e., more required probes. In addition, our framework addresses ways to improve the brute-force parameter choices in the original RAMBAM. By offering a unifying theoretical perspective for masking and practical enhancements, this work advances the design of efficient and secure masking countermeasures against SCA threats.
Practical Non-interactive Multi-signatures, and a Multi-to-Aggregate Signatures Compiler
In a fully non-interactive multi-signature, resp. aggregate-signature scheme (fNIM, resp. fNIA), signatures issued by many signers on the same message, resp. on different messages, can be succinctly ``combined'', resp. ``aggregated''.
fNIMs are used in the Ethereum consensus protocol, to produce the certificates of validity of blocks which are to be verified by billions of clients. fNIAs are used in some PBFT-like consensus protocols, such as the production version of Diem by Aptos, to replace the forwarding of many signatures by a new leader. In this work we address three complexity bottlenecks.
(i) fNIAs are costlier than fNIMs, e.g., we observe that verification time of a 3000-wise aggregate signature of BGLS (Eurocrypt'03), takes 300x longer verification time than verification of a 3000-wise pairing-based multisignature.
(ii) fNIMs impose that each verifier processes the setup published by the group of potential signers. This processing consists either in verifying proofs of possession (PoPs), such as in Pixel (Usenix'20) and in the IETF'22 draft inherited from Ristenpart-Yilek (Eurocrypt'07), which costs a product of pairings over all published keys. Or, it consists in re-randomizing the keys, such as in SMSKR (FC'24).
(iii) Existing proven security bounds on efficient fNIMs do not give any guarantee in practical curves with 256bits-large groups, such as BLS12-381 (used in Ethereum) or BLS12-377 (used in Zexe). Thus, computing in much larger curves is required to have provable guarantees.
Our first contribution is a new fNIM called $\mathsf{dms}$, it addresses both (ii) and (iii).
It is as simple as adding Schnorr PoPs to the schoolbook pairing-based fNIM of Boldyreva (PKC'03).
(ii) For a group of 1000 signers, verification of these PoPs is: $5+$ times faster than for the previous pairing-based PoPs; and $3+$ times faster than the Verifier's processing of the setup in SMSKR (and contrary to the latter, needs not be re-started when a new member joins the group).
(iii) We prove a tight reduction to the discrete logarithm (DL), in the algebraic group model (AGM). Given the current estimation of roughly 128 bits of security for the DL in both the curves BLS12-381 and BLS12-377, we deduce a probability of forgery of $\mathsf{dms}$ no higher than about $2^{-93}$ for a time $2^{80}$ adversary.
This reduction is our main technical contribution. The only related proof before was for an interactive Schnorr-based multi-signature scheme, using Schnorr PoPs. Our approach easily fills a gap in this proof, since we take into account that the adversary has access to a signing oracle even before publishing its PoPs. But in our context of pairing-based multi-signatures, extraction of the keys of the adversary is significantly more complicated, since the signing oracle produces a correlated random string.
We finally provide another application of $\mathsf{dms}$, which is that it can be plugged in recent threshold signatures without setup (presented by Das et al at CCS'23, and Garg et al at SP'24), since these schemes implicitly build on any arbitrary BLS-based fNIM.
Our second contribution addresses (i), it is a very simple compiler: $\mathcal{M}to\mathcal{A}$ (multi-to-aggregate). It turns any fNIM into an fNIA, suitable for aggregation of signatures on messages with a prefix in common, with the restriction that a signer must not sign twice using the same prefix. The resulting fNIA is post-quantum secure as soon as the fNIM is, such as Chipmunk (CCS'23). We demonstrate the relevance for Diem by applying $\mathcal{M}to\mathcal{A}$ to $\mathsf{dms}$: the resulting fNIA enables to verify 39x faster an aggregate of 129 signatures, over messages with $7$ bits-long variable parts, than BGLS.
BOLT: Privacy-Preserving, Accurate and Efficient Inference for Transformers
The advent of transformers has brought about significant advancements in traditional machine learning tasks. However, their pervasive deployment has raised concerns about the potential leakage of sensitive information during inference. Existing approaches using secure multiparty computation (MPC) face limitations when applied to transformers due to the extensive model size and resource-intensive matrix-matrix multiplications. In this paper, we present BOLT, a privacy-preserving inference framework for transformer models that supports efficient matrix multiplications and nonlinear computations. Combined with our novel machine learning optimizations, BOLT reduces the communication cost by 10.91x. Our evaluation on diverse datasets demonstrates that BOLT maintains comparable accuracy to floating-point models and achieves 4.8-9.5x faster inference across various network settings compared to the state-of-the-art system.
LATKE: A Framework for Constructing Identity-Binding PAKEs
Motivated by applications to the internet of things (IoT), Cremers, Naor, Paz, and Ronen (CRYPTO '22) recently considered a setting in which multiple parties share a common password and want to be able to pairwise authenticate. They observed that using standard password-authenticated key exchange (PAKE) protocols in this setting allows for catastrophic impersonation attacks whereby compromise of a single party allows an attacker to impersonate any party to any other. To address this, they proposed the notion of identity-binding PAKE (iPAKE) and showed constructions of iPAKE protocol CHIP.
We present LATKE, a framework for iPAKE that allows us to construct protocols with features beyond what CHIP achieves. In particular, we can instantiate the components of our framework to yield an iPAKE protocol with post-quantum security and identity concealment, where one party hides its identity until it has authenticated the other. This is the first iPAKE protocol with either property.
To demonstrate the concrete efficiency of our framework, we implement various instantiations and compare the resulting protocols to CHIP when run on commodity hardware. The performance of our schemes is very close to that of CHIP, while offering stronger security properties.
Efficient Universally-Verifiable Electronic Voting with Everlasting Privacy
Universal verifiability is a must-to-have for electronic voting schemes. It is essential to ensure honest behavior of all the players during the whole process, together with the eligibility. However, it should not endanger the privacy of the individual votes, which is another major requirement.
Whereas the first property prevents attacks during the voting process, privacy of the votes should hold forever, which has been called everlasting privacy.
A classical approach for universal verifiability is to add some proofs together with the encrypted votes, which requires publication of the latter, while eligibility needs a link between the votes and the voters: it definitely excludes long-term privacy. An alternative is the use of perfectly-hiding commitments, on which proofs are published, while ciphertexts are kept private for computing the tally.
In this paper, we show how recent linearly-homomorphic signatures can be exploited for all the proofs, leading to very efficient procedures towards universal verifiability with both strong receipt-freeness and everlasting privacy.
Privacy will indeed be unconditional, after the publication of the results and the proofs, whereas the soundness of the proofs holds in the algebraic group model and the random oracle model.
A Note on ``Privacy Preserving n-Party Scalar Product Protocol''
We show that the scalar product protocol [IEEE Trans. Parallel Distrib. Syst. 2023, 1060-1066] is insecure against semi-honest server attack, not as claimed. Besides, its complexity increases exponentially with the number $n$, which cannot be put into practice.
Stickel’s Protocol using Tropical Increasing Matrices
In this paper we introduce new concept of tropical increasing matrices and then prove that two tropical increasing matrices are commute. Using this property, we modified Stickel’s protocol. This idea similar to [5] where modified Stickel’s protocol using commuting matrices (Linde De La Puente Matrices).
CaSCaDE: (Time-Based) Cryptography from Space Communications DElay
Uncategorized
Uncategorized
Time-based cryptographic primitives such as Time-Lock Puzzles (TLPs) and Verifiable Delay Functions (VDFs) have proven to be pivotal in several areas of cryptography. All existing candidate constructions, however, guarantee time-delays based on the average hardness of sequential computational problems. This means that any algorithmic or hardware improvement affects parameter choices and may turn deployed systems insecure.
To address this issue, we investigate how to build time-based cryptographic primitives where delays depend on sources other than sequential computations: namely, transmission delays caused by sequential communication. We explore sequential communication delays that arise when sending a message through a constellation of satellites in Space. This setting has the advantage that distances between protocol participants are guaranteed as positions of satellites are observable from Earth, moreover delay lower bounds are unconditional and can be easily computed using the laws of Physics (no transmission travels faster than the speed of Light).
We introduce proofs of sequential communication delay (SCD) in the Universal Composability framework, that can be used to convince a verifier that a message has accrued delay by traversing a path among a set of scattered satellites. With our SCD proofs we realize the first proposals of Publicly Verifiable TLPs and VDFs whose delay guarantees are rooted on physical limits, rather than ever-decreasing computational hardness. Finally, our notion of SCD paves the way to the first Delay Encryption construction not based on supersingular isogenies.
Trust Nobody: Privacy-Preserving Proofs for Edited Photos with Your Laptop
The Internet has plenty of images that are transformations (e.g., resize, blur) of confidential original images. Several scenarios (e.g., selling images over the Internet, fighting disinformation, detecting deep fakes) would highly benefit from systems allowing to verify that an image is the result of a transformation applied to a confidential authentic image. In this paper, we focus on systems for proving and verifying the correctness of transformations of authentic images guaranteeing: 1) confidentiality (i.e., the original image remains private), 2) efficient proof generation (i.e., the proof certifying the correctness of the transformation can be computed with a common laptop) even for high-resolution images, 3) authenticity (i.e., only the advertised transformations have been applied) and 4) fast detection of fraud proofs. Our contribution consists of the following results:
- We present new definitions following in part the ones proposed by Naveh and Tromer [IEEE S&P 2016] and strengthening them to face more realistic adversaries.
- We propose techniques leveraging the way typical transformations work to then efficiently instantiate ZK-snarks circumventing the major bottlenecks due to claims about large pre-images of cryptographic hashes.
- We present a 1st construction based on an ad-hoc signature scheme and an and-hoc cryptographic hash function, obtaining for the first time all the above 4 properties.
- We present a 2nd construction that, unlike in previous results, works with the signature scheme and cryptographic hash function included in the C2PA specifications.
Experimental results confirm the viability of our approach: in our 1st construction, an authentic transformation (e.g., a resize or a crop) of a high-resolution image of 30 MP can be generated on a common 8 cores PC in about 41 minutes employing less than 4 GB of RAM. Our 2nd construction is roughly one order of magnitude slower than our 1st construction. Prior results instead either require expensive computing resources or provide unsatisfying confidentiality.
cq: Cached quotients for fast lookups
We present a protocol called $\mathsf{cq}$ for checking the values of a committed polynomial $f(X)\in \mathbb{F}_{<n}(X)$ over a multiplicative subgroup $H\subset \mathbb{F}$ of size $n$ are contained in a table $t\in \mathbb{F}^N$. After an $O(N \log N)$ preprocessing step, the prover algorithm runs in time $O(n\log n)$.
Thus, we continue to improve upon the recent breakthrough sequence of results [ZBKMNS,PK,GK,ZGKMR] starting from $\mathsf{Caulk}$ [ZBKMNS], which achieve sublinear complexity in the table size $N$. The two most recent works in this sequence $\mathsf{Ba}\mathit{loo}$ [ZGKMR] and $\mathsf{flookup}$ [GK] achieved prover complexity $O(n\log^2 n)$.
Moreover, $\mathsf{cq}$ has the following attractive features.
1. As in [ZBKMNS,PK,ZGKMR] our construction relies on homomorphic table commitments, which makes them amenable to vector lookups.
2. As opposed to the previous four works, our verifier doesn't involve pairings with prover defined $\mathbb{G}_2$ points, which makes recursive aggregation of proofs more convenient.
FHE-MENNs: Opportunities and Pitfalls for Accelerating Fully Homomorphic Private Inference with Multi-Exit Neural Networks
With concerns about data privacy growing in a connected world, cryptography researchers have focused on fully homomorphic encryption (FHE) for promising machine learning as a service solutions. Recent advancements have lowered the computational cost by several orders of magnitude, but the latency of fully homomorphic neural networks remains a barrier to adoption. This work proposes using multi-exit neural networks (MENNs) to accelerate the FHE inference. MENNs are network architectures that provide several exit points along the depth of the network. This approach allows users to employ results from any exit and terminate the computation early, saving both time and power. First, this work weighs the latency, communication, accuracy, and computational resource benefits of running FHE-based MENN inference. Then, we present the TorMENNt attack that can exploit the user's early termination decision to launch a concrete side-channel on MENNs. We demonstrate that the TorMENNt attack can predict the private image classification output of an image set for both FHE and plaintext threat models. We discuss possible countermeasures to mitigate the attack and examine their effectiveness. Finally, we tie the privacy risks with a cost-benefit analysis to obtain a practical roadmap for FHE-based MENN adoption.
Random Beacons in Monte Carlo: Efficient Asynchronous Random Beacon without Threshold Cryptography
Regular access to unpredictable and bias-resistant randomness is important for applications such as blockchains, voting, and secure distributed computing. Distributed random beacon protocols address this need by distributing trust across multiple nodes, with the majority of them assumed to be honest. Numerous applications across the blockchain space have led to the proposal of several distributed random beacon protocols, with some already implemented. However, many current random beacon systems rely on threshold cryptographic setups or exhibit high computational costs, while others expect the network to be partial or bounded synchronous. To overcome these limitations, we propose HashRand, a computation and communication-efficient asynchronous random beacon protocol that only demands secure hash and pairwise secure channels to generate beacons. HashRand has a per-node amortized communication complexity of $\mathcal{O}(\lambda n \log(n))$ bits per beacon. The computational efficiency of HashRand is attributed to the two orders of magnitude lower time of a one-way Hash computation compared to discrete log exponentiation. Interestingly, besides reduced overhead, HashRand achieves Post-Quantum security by leveraging the secure Hash function against quantum adversaries, setting it apart from other random beacon protocols that use discrete log cryptography. In a geo-distributed testbed of $n=136$ nodes, HashRand produces 78 beacons per minute, which is at least 5x higher than Spurt [IEEE S\&P'22]. We also demonstrate the practical utility of HashRand by implementing a Post-Quantum secure Asynchronous SMR protocol, which has a response rate of over 135k transactions per second at a latency of $2.3$ seconds over a WAN for $n=16$ nodes.
Limits of Black-Box Anamorphic Encryption
(Receiver) Anamorphic encryption, introduced by Persiano $ \textit{et al.}$ at Eurocrypt 2022, considers the question of achieving private communication in a world where secret decryption keys are under the control of a dictator. The challenge here is to be able to establish a secret communication channel to exchange covert (i.e. anamorphic) messages on top of some already deployed public key encryption scheme.
Over the last few years several works addressed this challenge by showing new constructions, refined notions and extensions.
Most of these constructions, however, are either ad hoc, in the sense that they build upon specific properties of the underlying PKE, or impose severe restrictions on the size of the underlying anamorphic message space.
In this paper we consider the question of whether it is possible to have realizations of the primitive that are both generic and allow for large anamorphic message spaces. We give strong indications that, unfortunately, this is not the case.
Our first result shows that $ \textit{any black-box realization} $ of the primitive, i.e. any realization that accesses the underlying PKE only via oracle calls, $ \textit{must} $ have an anamorphic message space of size at most $poly(\lambda)$ ($\lambda$ security parameter).
Even worse, if one aims at stronger variants of the primitive (and, specifically, the notion of asymmetric anamorphic encryption, recently proposed by Catalano $ \textit{et al.} $) we show that such black-box realizations are plainly impossible, i.e. no matter how small the anamorphic message space is.
Finally, we show that our impossibility results are rather tight: indeed, by making more specific assumptions on the underlying PKE, it becomes possible to build generic AE where the anamorphic message space is of size $\Omega(2^\lambda)$.
A Scalable Coercion-resistant Voting Scheme for Blockchain Decision-making
Typically, a decentralized collaborative blockchain decision-making mechanism is realized by remote voting. To date, a number of blockchain voting schemes have been proposed; however, to the best of our knowledge, none of these schemes achieve coercion-resistance. In particular, for most blockchain voting schemes, the randomness used by the voting client can be viewed as a witness/proof of the actual vote, which enables improper behaviors such as coercion and vote-buying. Unfortunately, the existing coercion-resistant voting schemes cannot be directly adopted in the blockchain context. In this work, we design the first scalable coercion-resistant blockchain voting scheme that supports private differential voting power and 1-layer liquid democracy as introduced by Zhang et al. (NDSS '19). Its overall complexity is $O(n)$, where $n$ is the number of voters. Moreover, the ballot size is reduced from Zhang et al.'s $\Theta(m)$ to $\Theta(1)$, where $m$ is the number of experts and/or candidates. We formally prove that our scheme has ballot privacy, verifiability, and coercion-resistance. We implement a prototype of the scheme and the evaluation result shows that our scheme's tally procedure is more than 6x faster than VoteAgain (USENIX '20) in an election with over 10,000 voters and over 50\% extra ballot rate.
Note: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
Post-Quantum Ready Key Agreement for Aviation
Transitioning from classically to quantum secure key agreement protocols may require to exchange fundamental components, for example, exchanging Diffie-Hellman-like key exchange with a key encapsulation mechanism (KEM). Accordingly, the corresponding security proof can no longer rely on the Diffie-Hellman assumption, thus invalidating the security guarantees. As a consequence, the security properties have to be re-proven under a KEM-based security notion.
We initiate the study of the LDACS key agreement protocol (Edition 01.01.00 from 25.04.2023), which is soon-to-be-standardized by the International Civil Aviation Organization. The protocol's cipher suite features Diffie-Hellman as well as a KEM-based key agreement protocol to provide post-quantum security. While the former results in an instantiation of an ISO key agreement inheriting all security properties, the security achieved by the latter is ambiguous.
We formalize the computational security using the systematic notions of de Saint Guilhem, Fischlin and Warinshi (CSF '20), and prove the exact security that the KEM-based variant achieves in this model; primarily entity authentication, key secrecy and key authentication. To further strengthen our ``pen-and-paper'' findings, we model the protocol and its security guarantees using Tamarin, providing an automated proof of the security against a Dolev-Yao attacker.
DLFA: Deep Learning based Fault Analysis against Block Ciphers
Previous studies on fault analysis have demonstrated promising potential in compromising cryptographic security. However, these fault analysis methods are limited in practical impact due to methodological constraints and the substantial requirement of faulty information such as correct and faulty ciphertexts. Additionally, while deep learning techniques have been widely applied to side-channel analysis (SCA) in recent years and have shown superior performance compared with traditional methods, there has been minimal research focusing on the application of deep learning techniques in fault analysis to date. This paper proposes an innovative approach named deep learning-based fault analysis (DLFA) by incorporating deep learning techniques into fault analysis, which enhances the efficiency and versatility of the analysis. DLFA is equipped with a novel feature selection method to extract valuable information for neural networks from faulty ciphertexts. Besides, optimized hyper-parameters for MLP and CNN are presented as the benchmarks. Experimental results on advanced encryption standard (AES) reveal that DLFA achieves outstanding performance. Notably, DLFA requires only 683 minimum and 1488 average ciphertexts with an average analysis time of 0.12s, surpassing previous works in terms of efficiency.
Quantum Complexity for Discrete Logarithms and Related Problems
This paper studies the quantum computational complexity of the discrete logarithm (DL) and related group-theoretic problems in the context of ``generic algorithms''---that is, algorithms that do not exploit any properties of the group encoding.
We establish the quantum generic group model and hybrid classical-quantum generic group model as quantum and hybrid analogs of their classical counterpart. This model counts the number of group operations of the underlying cyclic group $G$ as a complexity measure. Shor's algorithm for the discrete logarithm problem and related algorithms can be described in this model and make $O(\log |G|)$ group operations in their basic form. We show the quantum complexity lower bounds and (almost) matching algorithms of the discrete logarithm and related problems in these models.
* We prove that any quantum DL algorithm in the quantum generic group model must make $\Omega(\log |G|)$ depth of group operation queries. This shows that Shor's algorithm that makes $O(\log |G|)$ group operations is asymptotically optimal among the generic quantum algorithms, even considering parallel algorithms.
* We observe that some (known) variations of Shor's algorithm can take advantage of classical computations to reduce the number and depth of quantum group operations. We show that these variants are optimal among generic hybrid algorithms up to constant multiplicative factors: Any generic hybrid quantum-classical DL algorithm with a total number of (classical or quantum) group operations $Q$ must make $\Omega(\log |G|/\log Q)$ quantum group operations of depth $\Omega(\log\log |G| - \log\log Q)$.
* When the quantum memory can only store $t$ group elements and use quantum random access classical memory (QRACM) of $r$ group elements, any generic hybrid quantum-classical algorithm must make either $\Omega(\sqrt{|G|})$ group operation queries in total or $\Omega(\log |G|/\log (tr))$ quantum group operation queries. In particular, classical queries cannot reduce the number of quantum queries beyond $\Omega(\log |G|/\log (tr))$.
As a side contribution, we show a multiple discrete logarithm problem admits a better algorithm than solving each instance one by one, refuting a strong form of the quantum annoying property suggested in the context of password-authenticated key exchange protocol.
Insecurity of MuSig and Bellare-Neven Multi-Signatures with Delayed Message Selection
Multi-signature schemes in pairing-free settings require multiple communication rounds, prompting efforts to reduce the number of signing rounds that need to be executed after the signers receive the message to sign. In MuSig and Bellare-Neven multi-signatures, the signing protocol does not use the message until the third (and final) signing round. This structure seemingly allows pre-processing of the first two signing rounds before the signers receive the message. However, we demonstrate that this approach compromises security and enables a polynomial time attack, which uses the algorithm of Benhamouda et al. to solve the ROS problem.
Notes on Multiplying Cyclotomic Polynomials on a GPU
Uncategorized
Uncategorized
Lattice cryptography has many exciting applications, from homomorphic encryption to zero knowledge proofs. We explore the algebra of cyclotomic polynomials underlying many practical lattice cryptography constructions, and we explore algorithms for multiplying cyclotomic polynomials on a GPU.
Faster Lookup Table Evaluation with Application to Secure LLM Inference
As large language models (LLMs) continue to gain popularity, concerns about user privacy are amplified, given that the data submitted by users for inference may contain sensitive information. Therefore, running LLMs through secure two-party computation (a.k.a. secure LLM inference) has emerged as a prominent topic. However, many operations in LLMs, such as Softmax and GELU, cannot be computed using conventional gates in secure computation; instead, lookup tables (LUTs) have to be utilized, which makes LUT to be an essential primitive in secure LLM inference.
In this paper, we propose $\mathsf{ROTL}$, a secure two-party protocol for LUT evaluations. Compared with FLUTE (the state-of-the-art LUT presented at Oakland '23), it achieves upto 11.6$\times$ speedup in terms of overall performance and 155$\times$ speedup in terms of online performance. Furthermore, $\mathsf{ROTL}$ can support arithmetic shares (which is required by secure LLM inference), whereas FLUTE can only support boolean shares. At the heart of $\mathsf{ROTL}$ is a novel protocol for secret-shared rotation, which allows two parties to generate additive shares of the rotated table without revealing the rotation offset. We believe this protocol is of independent interest. Based on $\mathsf{ROTL}$, we design a novel secure comparison protocol; compared with the state-of-the-art, it achieves a 2.4$\times$ bandwidth reduction in terms of online performance.
To support boolean shares, we further provide an optimization for FLUTE, by reducing its computational complexity from $O(l\cdot n^2)$ to $O(n\log n+l\cdot n)$ and shifting $O(n\log n)$ computation to the preprocessing phase. As a result, compared with FLUTE, it achieves upto 10.8$\times$ speedup in terms of overall performance and 962$\times$ speedup in terms of online performance.
Fusion Channel Attack with POI Learning Encoder
In order to challenge the security of cryptographic systems, Side-Channel Attacks exploit data leaks such as power consumption and electromagnetic emissions. Classic Side-Channel Attacks, which mainly focus on mono-channel data, fail to utilize the joint information of multi-channel data. However, previous studies of multi-channel attacks have often been limited in how they process and adapt to dynamic data. Furthermore, the different data types from various channels make it difficult to use them effectively. This study introduces the Fusion Channel Attack with POI Learning Encoder (FCA), which employs a set of POI Learning encoders that learn the inverse base transformation function family and project the data of each channel into a unified fusion latent space. Furthermore, our method introduces an optimal transport theory based metric for evaluating feature space fusion, which is used to assess the differences in feature spaces between channels. This model not only enhances the ability to process and interpret multi-source data, but also significantly improves the accuracy and applicability of SCAs in different environments.
HAETAE: Shorter Lattice-Based Fiat-Shamir Signatures
We present HAETAE (Hyperball bimodAl modulE rejecTion signAture schemE), a new lattice-based signature scheme. Like the NIST-selected Dilithium signature scheme, HAETAE is based on the Fiat-Shamir with Aborts paradigm, but our design choices target an improved complexity/compactness compromise that is highly relevant for many space-limited application scenarios. We primarily focus on reducing signature and verification key sizes so that signatures fit into one TCP or UDP datagram while preserving a high level of security against a variety of attacks. As a result, our scheme has signature and verification key sizes up to 39% and 25% smaller, respectively, compared than Dilithium. We provide a portable, constant-time reference implementation together with an optimized implementation using AVX2 instructions and an implementation with reduced stack size for the Cortex-M4. Moreover, we describe how to efficiently protect HAETAE against implementation attacks such as side-channel analysis, making it an attractive candidate for use in
IoT and other embedded systems.
Enabling PERK and other MPC-in-the-Head Signatures on Resource-Constrained Devices
One category of the digital signatures submitted to the NIST Post-Quantum Cryptography Standardization Process for Additional Digital Signature Schemes comprises proposals constructed leveraging the MPC-in-the-Head (MPCitH) paradigm. Typically, this framework is characterized by the computation and storage in sequence of large data structures both in signing and verification algorithms, resulting in heavy memory consumption. While some research on the efficiency of these schemes on high-performance machines has been done, studying their performance and optimization on resource-constrained ones still needs to be explored.
In this work, we aim to address this gap by (1) introducing a general method to reduce the memory footprint of MPCitH schemes and analyzing its application to several MPCitH proposed schemes in the NIST Standardization Process. Additionally, (2) we conduct a detailed examination of potential memory optimizations in PERK, resulting in a streamlined version of the signing and verification algorithms with a reduced memory footprint ranging from 22 to 85 KB, down from the original 0.3 to 6 MB. Finally, (3) we introduce the first implementation of PERK tailored for Arm Cortex M4 alongside extensive experiments and comparisons against reference implementations.
Alternative Key Schedules for the AES
The AES block cipher is today the most important and analyzed symmetric algorithm. While all versions of the AES are known to be secure in the single-key setting, this is not the case in the related-key scenario. In this article we try to answer the question whether the AES would resist better differential-like related-key attacks if the key schedule was different. For this, we search for alternative permutation-based key schedules by extending the work of Khoo et al. at ToSC 2017 and Derbez et al. at SAC 2018. We first show that the model of Derbez et al. was flawed. Then, we develop different approaches together with MILP-based tools to find good permutations that could be used as the key schedule for AES-128, AES-192 and AES-256. Our methods permitted to find permutations that outperform the permutation exhibited by Khoo et al. for AES-128. Moreover, our new approach based on two MILP models that call one another allowed us to handle a larger search space and thus to search for alternative key schedules for the two bigger versions of AES. This method permitted us to find permutations for AES-192 and AES-256 that provide better resistance to related-key differential attacks. Most importantly, we showed that these variants can resist full-round boomerang attacks.
Exploiting Decryption Failures in Mersenne Number Cryptosystems
Mersenne number schemes are a new strain of potentially quantum-safe cryptosystems that use sparse integer arithmetic modulo a Mersenne prime to encrypt messages. Two Mersenne number based schemes were submitted to the NIST post-quantum standardization process: Ramstake and Mersenne-756839. Typically, these schemes admit a low but non-zero probability that ciphertexts fail to decrypt correctly.
In this work we show that the information leaked from failing ciphertexts can be used to gain information about the secret key. We present an attack exploiting this information to break the IND-CCA security of Ramstake. First, we introduce an estimator for the bits of the secret key using decryption failures. Then, our estimates can be used to apply the Slice-and-Dice attack due to Beunardeau et al. at significantly reduced complexity to recover the full secret.
We implemented our attack on a simplified version of the code submitted to the NIST competition. Our attack is able to extract a good estimate of the secrets using $2^{12}$ decryption failures, corresponding to $2^{74}$~failing ciphertexts in the original scheme. Subsequently the exact secrets can be extracted in about $2^{46}$ quantum computational steps.
MatcHEd: Privacy-Preserving Set Similarity based on MinHash
Fully homomorphic encryption (FHE) enables arbitrary computation on encrypted data, but certain applications remain prohibitively expensive in the encrypted domain. As a case in point, comparing two encrypted sets of data is extremely computationally expensive due to the large number of comparison operators required. In this work, we propose a novel methodology for encrypted set similarity inspired by the MinHash algorithm and the CGGI FHE scheme. Doing comparisons in FHE requires comparators and multiplexers or an expensive approximation, which further increases the latency, especially when the goal is to compare two sets of data. The MinHash algorithm can significantly reduce the number of comparisons required by employing a special Carter-Wegman (CW) hash function as a key building block. However, the modulus operation in the CW hash becomes another key bottleneck because the encrypted sub-circuits required to perform the modular reduction are very large and inefficient in an FHE setting. Towards that end, we introduce an efficient bitwise FHE-friendly digest function (FFD) to employ as the cornerstone of our proposed encrypted set-similarity. In a Boolean FHE scheme like CGGI, the bitwise operations can be implemented efficiently with Boolean gates, which allows for faster evaluation times relative to standard Carter-Wegman constructions. Overall, our approach drastically reduces the number of comparisons required relative to the baseline approach of directly computing the Jaccard similarity coefficients, and is inherently parallelizable, allowing for efficient encrypted computation on multi-CPU and GPU-based cloud servers. We validate our approach by performing a privacy-preserving plagiarism detection across encrypted documents.
PolyFHEmus: Rethinking Multiplication in Fully Homomorphic Encryption
Homomorphic encryption is a powerful technology that solves key privacy concerns in cloud computing by enabling computation on encrypted data. However, it has not seen widespread adoption due to prohibitively high latencies. In this article, we identify polynomial multiplication as a bottleneck and investigate alternative algorithms to accelerate encrypted computing.
Juliet: A Configurable Processor for Computing on Encrypted Data
Fully homomorphic encryption (FHE) has become progressively more viable in the years since its original inception in 2009. At the same time, leveraging state-of-the-art schemes in an efficient way for general computation remains prohibitively difficult for the average programmer. In this work, we introduce a new design for a fully homomorphic processor, dubbed Juliet, to enable faster operations on encrypted data using the state-of-the-art TFHE and cuFHE libraries for both CPU and GPU evaluation. To improve usability, we define an expressive assembly language and instruction set architecture (ISA) judiciously designed for end-to-end encrypted computation. We demonstrate Juliet's capabilities with a broad range of realistic benchmarks including cryptographic algorithms, such as the lightweight ciphers Simon and Speck, as well as logistic regression (LR) inference and matrix multiplication.
HElix: Genome Similarity Detection in the Encrypted Domain
As the field of genomics continues to expand and more sequencing data is gathered, genome analysis becomes increasingly relevant for many users. For example, a common scenario entails users trying to determine if their DNA samples are similar to DNA sequences hosted in a larger remote repository. Nevertheless, end users may be reluctant to upload their DNA sequences, while the owners of remote genomics repositories are unwilling to openly share their database. To address this challenge, we propose two distinct approaches based on fully homomorphic encryption to preserve the privacy of the genomic data and enable queries directly on ciphertexts. The first is based on the ubiquitous MinHash algorithm and can determine if similar matches exist in the database, while the second involves a bespoke bloom filter construction for determining exact matches. We validate both approaches across various database sizes using both GPU and CPU-based cloud servers.
Tyche: Probabilistic Selection over Encrypted Data for Generative Language Models
Generative AI, a significant technological disruptor in recent years, has impacted domains like augmented reality, coding assistance, and text generation. However, use of these models requires users to trust the model owners with their sensitive data given as input to the model. Fully Homomorphic Encryption (FHE) offers a promising solution, and many earlier works have investigated the use this technology for machine learning as a service (MLaaS) applications. Still, these efforts do not cater to generative models that operate probabilistically, allowing for diverse and creative outputs. In this work, we introduce three novel probabilistic selection algorithms for autoregressive generative AI: multiplication-scaled cumulative sum, heuristic cumulative sum, and the random-multiplication argmax. Each of these approaches presents distinctive challenges in optimizing the trade-off between precision and timing performance, a balance intricately tied to the specific characteristics of the data under consideration. Our results show that the random multiplication argmax-based method is more scalable than the cumulative sum methods and can accurately mimic the plaintext selection curve.
Fast pairings via biextensions and cubical arithmetic
Biextensions associated to line bundles on abelian varieties allows to reinterpret the usual Weil, Tate, Ate, optimal Ate, \ldots, pairings as monodromy pairings. We introduce a cubical arithmetic, derived from the canonical cubical torsor structure of these line bundles, to obtain an efficient arithmetic of these biextensions.
This unifies and extends Miller's standard algorithm to compute pairings along with other algorithms like elliptic nets and theta functions, and allows to adapt these algorithms to pairings on any model of abelian varieties with a polarisation $\Phi_D$, as long as we have an explicit theorem of the square for $D$.
In particular, we give explicit formulas for the arithmetic of the biextension (and cubical torsor structure) associated to the divisor $D=2(0_E)$ on an elliptic curve. We derive very efficient pairing formulas on elliptic curves and Kummer lines. Notably for generic pairings on Montgomery curves, our cubical biextension ladder algorithm to compute pairings costs only $15M$ by bits, which as far as I know is faster than any pairing doubling formula in the literature.
Enabling Complete Atomicity for Cross-chain Applications Through Layered State Commitments
Cross-chain Decentralized Applications (dApps) are increasingly popular for their ability to handle complex tasks across various blockchains, extending beyond simple asset transfers or swaps. However, ensuring all dependent transactions execute correctly together, known as complete atomicity, remains a challenge. Existing works provide financial atomicity, protecting against monetary loss, but lack the ability to ensure correctness for complex tasks. In this paper, we introduce Avalon, a transaction execution framework for cross-chain dApps that guarantees complete atomicity for the first time. Avalon achieves this by introducing multiple state layers above the native one to cache state transitions, allowing for efficient management of these state transitions. Most notably, for concurrent cross-chain transactions, Avalon resolves not only intra-chain conflicts but also addresses potential inconsistencies between blockchains via a novel state synchronization protocol, enabling serializable cross-chain execution. We implement Avalon using smart contracts in Cosmos ecosystem and evaluate its commitment performance, demonstrating acceptable latency and gas consumption even under conflict cases.
LEA Block Cipher in Rust Language: Trade-off between Memory Safety and Performance
Cryptography implementations of block cipher have been written in C language due to its strong features on system-friendly features. However, the C language is prone to memory safety issues, such as buffer overflows and memory leaks. On the other hand, Rust, novel system programming language, provides strict compile-time memory safety guarantees through its ownership model. This paper presents the implementation of LEA block cipher in Rust language, demonstrating features to prevent common memory vulnerabilities while maintaining performance. We compare the Rust implementation with the traditional C language version, showing that while Rust incurs a reasonable memory overhead, it achieves comparable the execution timing of encryption and decryption. Our results highlight Rust’s suitability for secure cryptographic applications, striking the balance between memory safety and execution efficiency.
Quantum Implementation of LSH
As quantum computing progresses, the assessment of cryptographic algorithm resilience against quantum attack gains significance interests in the field of cryptanalysis. Consequently, this paper implements the depth-optimized quantum circuit of Korean hash function (i.e., LSH) and estimates its quantum attack cost in quantum circuits. By utilizing an optimized quantum adder and employing parallelization techniques, the proposed quantum circuit achieves a 78.8\% improvement in full depth and a 79.1\% improvement in Toffoli depth compared to previous the-state-of art works.
In conclusion, based on the implemented quantum circuit, we estimate the resources required for a Grover collision attack and evaluate the post-quantum security of LSH algorithms.
Reducing Signature Size of Matrix-code-based Signature Schemes
This paper shows novel techniques to reduce the signature size of the code-based signature schemes MEDS and ALTEQ, by a large factor. For both schemes, the signature size is dominated by the responses for rounds with nonzero challenges, and we reduce the signature size by reducing the size of these responses. For MEDS, each of the responses consists of $m^2 + n^2$ field elements,while in our new protocol each response consists of only $2k$ ($k$ is usually chosen to be close to $m$ and $n$) field elements. For ALTEQ, each of the responses consists of $n^2$ field elements, while in our new protocol each response consists of about $\sqrt{2} n^{3/2}$ field elements. In both underlying $\Sigma$-protocols of the schemes, the prover generates a random isometry and sends the corresponding isometry to the verifier as the response. Instead of doing this, in our new protocols, the prover derives an isometry from some random code words and their presumed (full or partial) images. The prover sends the corresponding code words and images to the verifier as the response, so that the verifier can derive an isometry in the same way. Interestingly, it turns out that each response takes much fewer field elements to represent in this way.
Simpler and Faster BFV Bootstrapping for Arbitrary Plaintext Modulus from CKKS
Bootstrapping is currently the only known method for constructing fully homomorphic encryptions.
In the BFV scheme specifically, bootstrapping aims to reduce the error of a ciphertext while preserving the encrypted plaintext.
The existing BFV bootstrapping methods follow the same pipeline, relying on the evaluation of a digit extraction polynomial to annihilate the error located in the least significant digits.
However, due to its strong dependence on performance, bootstrapping could only utilize a limited form of plaintext modulus, such as a power of a small prime number.
In this paper, we present a novel approach to instantiate BFV bootstrapping, distinct from the previous digit extraction-based method.
The core idea of our bootstrapping is to utilize CKKS bootstrapping as a subroutine, so the performance of our method mainly depends on the underlying CKKS bootstrapping rather than the plaintext modulus.
We implement our method at a proof-of-concept level to provide concrete benchmark results.
When performing the bootstrapping operation for a 51-bits plaintext modulus, our method improves the previous digit extraction-based method by a factor of 37.9 in latency and 29.4 in throughput.
Additionally, we achieve viable bootstrapping performance for large plaintext moduli, such as 144-bits and 234-bits, which has never been measured before.
A Pure Indistinguishability Obfuscation Approach to Adaptively-Sound SNARGs for NP
We construct an adaptively-sound succinct non-interactive argument (SNARG) for NP in the CRS model from sub-exponentially-secure indistinguishability obfuscation ($i\mathcal{O}$) and sub-exponentially-secure one-way functions. Previously, Waters and Wu (STOC 2024), and subsequently, Waters and Zhandry (CRYPTO 2024) showed how to construct adaptively-sound SNARGs for NP by relying on sub-exponentially-secure indistinguishability obfuscation, one-way functions, and an additional algebraic assumption (i.e., discrete log, factoring, or learning with errors). In this work, we show that no additional algebraic assumption is needed and vanilla (sub-exponentially-secure) one-way functions already suffice in combination with $i\mathcal{O}$.
Separating Selective Opening Security From Standard Security, Assuming IO
Assuming the hardness of LWE and the existence of IO, we construct a public-key encryption scheme that is IND-CCA secure but fails to satisfy even a weak notion of indistinguishability security with respect to selective opening attacks. Prior to our work, such a separation was known only from stronger assumptions such as differing inputs obfuscation (Hofheinz, Rao, and Wichs, PKC 2016).
Central to our separation is a new hash family, which may be of independent interest. Specifically, for any $S(k) = k^{O(1)}$, any $n(k) = k^{O(1)}$, and any $m(k) = k^{\Theta(1)}$, we construct a hash family mapping $n(k)$ bits to $m(k)$ bits that is somewhere statistically correlation intractable (SS-CI) for all relations $R_k \subseteq \{0,1\}^{n(k)} \times \{0,1\}^{m(k)}$ that are enumerable by circuits of size $S(k)$.
We give two constructions of such a hash family. Our first construction uses IO, and generically ``boosts'' any hash family that is SS-CI for the smaller class of functions that are computable by circuits of size $S(k)$. This weaker hash variant can be constructed based solely on LWE (Peikert and Shiehian, CRYPTO 2019). Our second construction is based on the existence of a circular secure FHE scheme, and follows the construction of Canetti et al. (STOC 2019).
Stateless and Verifiable Execution Layer for Meta-Protocols on Bitcoin
The Bitcoin ecosystem has continued to evolve beyond its initial promises of decentralization, transparency, and security. Recent advancements have notably been made with the integration of Layer-2 solutions, which address scalability issues by offloading transactions from the main blockchain. This facilitates faster and more cost-effective transactions while maintaining integrity. The advent of inscriptions and ordinal protocols has further broadened the spectrum of capabilities, enabling the creation of unique, indivisible assets on the blockchain. Despite these technological strides, the inherent limitations of Bitcoin's script being Turing-incomplete restrict complex executions directly on the blockchain, necessitating the use of Bitcoin indexers. These indexers act as off-chain execution layers, allowing for the incorporation of Turing-complete programming languages to manage and update state transitions based on blockchain data. However, this off-chain solution introduces challenges to data integrity and availability, compounded by the decentralized nature of blockchain which complicates data maintenance and accuracy.
To address these challenges, we propose a new modular indexer architecture that enables a fully decentralized and user-verified network, mitigating the risks associated with traditional decentralized indexer networks susceptible to Sybil attacks. Our solution, INDECURE, leverages polynomial commitments as checkpoints to streamline the verification process, significantly reducing the overhead associated with integrity checks of state transitions. By implementing a robust data attestation procedure, INDECURE ensures the reliability of state information against malicious alterations, facilitating trustless verifications by users. Our preliminary evaluations of INDECURE across various indexer protocols—BRC20, Bitmap, and satsnames—demonstrate its superiority in reducing computation time and data block size while maintaining high integrity in state transitions. This modular approach not only enhances the security and efficiency of Bitcoin's off-chain executions but also sets a foundational layer for scalable, secure blockchain applications.
GAuV: A Graph-Based Automated Verification Framework for Perfect Semi-Honest Security of Multiparty Computation Protocols
Proving the security of a Multiparty Computation (MPC) protocol is a difficult task. Under the current simulation-based definition of MPC, a security proof consists of a simulator, which is usually specific to the concrete protocol and requires to be manually constructed, together with a theoretical analysis of the output distribution of the simulator and corrupted parties' views in the real world. This presents an obstacle in verifying the security of a given MPC protocol. Moreover, an instance of a secure MPC protocol can easily lose its security guarantee due to careless implementation, and such a security issue is hard to detect in practice.
In this work, we propose a general automated framework to verify the perfect security of instances of MPC protocols against the semi-honest adversary. Our framework has perfect soundness: any protocol that is proven secure under our framework is also secure under the simulation-based definition of MPC. We demonstrate the completeness of our framework by showing that for any instance of the well-known BGW protocol, our framework can prove its security for every corrupted party set with polynomial time. Unlike prior work that only focuses on black-box privacy which requires the outputs of corrupted parties to contain no information about the inputs of the honest parties, our framework may potentially be used to prove the security of arbitrary MPC protocols.
We implement our framework as a prototype. The evaluation shows that our prototype automatically proves the perfect semi-honest security of BGW protocols and B2A (binary to arithmetic) conversion protocols in reasonable durations.
"Act natural!": Having a Private Chat on a Public Blockchain
Messengers have become an essential means of interpersonal interaction. Yet untraceable private communication remains an elusive goal, as most messengers hide content, but not communication patterns. The knowledge of communication patterns can by itself reveal too much, as happened, e.g., in the context of the Arab Spring. Subliminal channels in cryptographic systems enable untraceable private communication in plain sight. In this context, bulletin boards in the form of blockchains are a natural object for subliminal communication: accessing them is innocuous, as they rely on distributed access for verification and extension. At the same time, blockchain users generate hundreds of thousands of transactions per day that are individually signed and placed on the blockchain. Thus, blockchains may serve as innocuous repository for publicly accessible cryptographic transactions where subliminal channels can be placed. This significantly increases the availability of publicly accessible cryptographic transactions where subliminal channels can be placed.
In this paper, we propose a public-key subliminal channel using secret-recoverable splittable signature schemes on blockchains and prove that our construction is undetectable in the random oracle model under common cryptographic assumptions. Our approach is applicable to any secret-recoverable splittable signature scheme and introduces a constant overhead of a single signature per message. Such schemes are used by 98 of the top 100 cryptocurrencies. We also analyze the applicability of our approach to the Bitcoin, Monero, and RippleNet networks and present proof of concept implementations for Bitcoin and RippleNet.
Understanding binary-Goppa decoding
This paper reviews, from bottom to top, a polynomial-time algorithm to correct t errors in classical binary Goppa codes defined by squarefree degree-t polynomials. The proof is factored through a proof of a simple Reed--Solomon decoder, and the algorithm is simpler than Patterson's algorithm. All algorithm layers are expressed as Sage scripts backed by test scripts. All theorems are formally verified. The paper also covers the use of decoding inside the Classic McEliece cryptosystem, including reliable recognition of valid inputs.
A More Compact AES, and More
We reduce the number of bit operations required to implement AES to a new minimum, and also compute improvements to elements of some other ciphers. Exploring the algebra of AES allows choices of basis and streamlining of the nonlinear parts. We also compute a more efficient implementation of the linear part of each round. Similar computational optimizations apply to other cryptographic matrices and S-boxes. This work may be incorporated into a hardware AES implementation using minimal resources, or potentially in a bit-sliced software implementation to increase speed.
Two-Round Threshold Signature from Algebraic One-More Learning with Errors
Threshold signatures have recently seen a renewed interest due to applications in cryptocurrency while NIST has released a call for multi-party threshold schemes, with a deadline for submission expected for the first half of 2025. So far, all lattice-based threshold signatures requiring less than two-rounds are based on heavy tools such as (fully) homomorphic encryption (FHE) and homomorphic trapdoor commitments (HTDC). This is not unexpected considering that most efficient two-round signatures from classical assumptions either rely on idealized model such as algebraic group models or on one-more type assumptions, none of which we have a nice analogue in the lattice world.
In this work, we construct the first efficient two-round lattice-based threshold signature without relying on FHE or HTDC. It has an offline-online feature where the first round can be preprocessed without knowing message or the signer sets, effectively making the signing phase non-interactive. The signature size is small and shows great scalability. For example, even for a threshold as large as 1024 signers, we achieve a signature size roughly 11 KB. At the heart of our construction is a new lattice-based assumption called the algebraic one-more learning with errors (AOMMLWE) assumption. We believe this to be a strong inclusion to our lattice toolkits with an independent interest. We establish the selective security of AOMMLWE based on the standard MLWE and MSIS assumptions, and provide an in depth analysis of its adaptive security, which our threshold signature is based on.
Climbing and descending tall volcanos
We revisit the question of relating the elliptic curve discrete logarithm problem (ECDLP) between ordinary elliptic curves over finite fields with the same number of points. This problem was considered in 1999 by Galbraith and in 2005 by Jao, Miller, and Venkatesan. We apply recent results from isogeny cryptography and cryptanalysis, especially the Kani construction, to this problem. We improve the worst case bound in Galbraith's 1999 paper from $\tilde{O}( q^{1.5} )$ to (heuristically) $\tilde{O}( q^{0.4} )$ operations.
The two cases of main interest for discrete logarithm cryptography are random curves (flat volcanoes) and pairing-based crypto (tall volcanoes with crater of constant or polynomial size). In both cases we show a rigorous $\tilde{O}( q^{1/4})$ algorithm to compute an isogeny between any two curves in the isogeny class. We stress that this paper is motivated by pre-quantum elliptic curve cryptography using ordinary elliptic curves, which is not yet obsolete.
Improved Multi-Party Fixed-Point Multiplication
Machine learning is widely used for a range of applications and is increasingly offered as a service by major technology companies. However, the required massive data collection raises privacy concerns during both training and inference. Privacy-preserving machine learning aims to solve this problem. In this setting, a collection of servers secret share their data and use secure multi-party computation to train and evaluate models on the joint data. All prior work focused on the scenario where the number of servers is two or three. In this work, we study the problem where there are $N \geq 3$ servers amongst whom the data is secret shared.
A key component of machine learning algorithms is to perform fixed-point multiplication with truncation of secret shared decimal values. In this work, we design new protocols for multi-party secure fixed-point multiplication where each of the $N$ parties have one share each of the two values to be multiplied and receive one share of the product at the end of the protocol. We consider three forms of secret sharing - replicated, Shamir, and additive, and design an efficient protocol secure in the presence of a semi-honest adversary for each of the forms. Our protocols are more communication efficient than all prior work on performing multi-party fixed-point multiplication. Additionally, for replicated secret sharing, we design another efficient protocol that is secure in the presence of a malicious adversary. Finally, we leverage our fixed-point multiplication protocols to design secure multi-party computation (MPC) protocols for arbitrary arithmetic circuits that have addition and fixed-point multiplication with truncation gates. All our protocols are proven secure using a standard simulation based security definition. Our protocols for replicated and Shamir sharing work in the presence of an honest majority of parties while the one for additive sharing can tolerate a dishonest majority as well.
Distributional Secure Merge
Secure merge refers to the problem of merging two sorted lists. The problem appears in different settings where each list is held by one of two parties, or the lists are themselves shared among two or more parties. The output of a secure merge protocol is secret shared. Each variant of the problem offers many useful applications.
The difficulty in designing secure merge protocols vis-a-vis insecure merge protocols (which work in linear time with a single pass over the lists) has to do with operations having to be oblivious or data-independent. In particular, the protocol cannot leak the positions of items of each list in the final merged list. On account of this, sorting-based secure merge protocols have been a common solution to the problem. However, as they introduce (poly)logarithmic overheads, there has been active investigation into the task of building (near) linear time secure merge protocols. Most recently, Hemenway et al. put forth a protocol for secure merge that does achieve linear communication and computation and a round complexity of $O({\log\log n})$, where $n$ is the length of the lists being merged. While this shows the feasibility of a linear time secure merge, it still leaves room for the design of a concretely efficient linear time secure merge.
In this work, we consider a relaxation of the problem where the lists are uniformly random. We show a secure merge protocol for uniformly random lists that achieves $O({n\log\log n})$, i.e., near linear communication and computation and a round complexity of $O({\log\log n})$, where $n$ is the length of the lists being merged. Our protocol design is general and can be instantiated in a variety of settings so long as the building blocks (basic ones such as comparisons and shuffles) can be realized in said settings. Although we do not achieve the same asymptotic guarantees as Hemenway et al., our work is concretely efficient. We implement our protocol and compare it to the state of the art sorting protocols and demonstrate an order of magnitude improvement in running times and communication for lists of size of $2^{20}$.
We also extend our protocol to work for lists sampled from arbitrary distributions. In particular, when the lists are (close to) identically distributed, we achieve the same efficiency as uniform lists. This immediately improve the performance of many crucial applications including PSI & Secure Join, thus illustrating the significance and applicability of our protocol in practice.
Message Latency in Waku Relay with Rate Limiting Nullifiers
Waku is a privacy-preserving, generalized, and decentralized messaging protocol suite. Waku uses GossipSub for message routing and Rate Limiting Nullifiers (RLN) for spam protection. GossipSub ensures fast and reliable peer-to-peer message delivery in a permissionless environment, while RLN enforces a common publishing rate limit using zero-knowledge proofs.
This paper presents a practical evaluation of message propagation latency in Waku. First, we estimate latencies analytically, building a simple mathematical model for latency under varying conditions. Second, we run a large-scale single-host simulation with 1000 nodes. Third, we set up a multi-host Waku deployment using five nodes in different locations across the world. Finally, we compare our analytical estimations to the results of the simulation and the real-world measurement.
The experimental results are in line with our theoretical model. Under realistic assumptions, medium-sized messages (25 KB) are delivered within 1 second. We conclude that Waku can achieve satisfactory latency for typical use cases, such as decentralized messengers, while providing scalability and anonymity.
Exploiting Internal Randomness for Privacy in Vertical Federated Learning
Vertical Federated Learning (VFL) is becoming a standard collaborative learning paradigm with various practical applications. Randomness is essential to enhancing privacy in VFL, but introducing too much external randomness often leads to an intolerable performance loss. Instead, as it was demonstrated for other federated learning settings, leveraging internal randomness —as provided by variational autoencoders (VAEs) —can be beneficial. However, the resulting privacy has never been quantified so far, nor has the approach been investigated for VFL.
We therefore propose a novel differential privacy (DP) estimate, denoted as distance-based empirical local differential privacy (dELDP). It allows us to empirically bound DP parameters of models or model components, quantifying the internal randomness with appropriate distance and sensitivity metrics. We apply dELDP to investigate the DP of VAEs and observe values up to ε ≈ 6.4 and δ = 2−32. Based on this, to link the dELDP parameters to the privacy of VAE-including VFL systems in practice, we conduct comprehensive experiments on the robustness against state-of-the-art privacy attacks. The results illustrate that the VAE system is robust against feature reconstruction attacks and outperforms other privacy-enhancing methods for VFL, especially when the adversary holds 75% of the features during label inference attacks.
A Study of Partial Non-Linear Layers with DEFAULT and BAKSHEESH
In this work, we take a look at the two recently proposed block ciphers, DEFAULT and BAKSHEESH, both of which are descendent of another block cipher named GIFT. We show that both ciphers can be interpreted within the partial non-linear layer category, thanks to the SBoxes having at least one non-trivial linear structure. We also reevaluate the security claim of DEFAULT.
From Interaction to Independence: zkSNARKs for Transparent and Non-Interactive Remote Attestation
Remote attestation (RA) protocols have been widely
used to evaluate the integrity of software on remote devices.
Currently, the state-of-the-art RA protocols lack a crucial feature: transparency. This means that the details of the final
attestation verification are not openly accessible or verifiable by
the public. Furthermore, the interactivity of these protocols often
limits attestation to trusted parties who possess privileged access
to confidential device data, such as pre-shared keys and initial
measurements. These constraints impede the widespread adoption
of these protocols in various applications.
In this paper, we introduce zRA, a non-interactive, transparent, and publicly provable RA protocol based on zkSNARKs.
zRA enables verification of device attestations without the need
for pre-shared keys or access to confidential data, ensuring a
trustless and open attestation process. This eliminates the reliance
on online services or secure storage on the verifier side. Moreover,
zRA does not impose any additional security assumptions beyond
the fundamental cryptographic schemes and the essential trust
anchor components on the prover side (i.e., ROM and MPU).
The zero-knowledge attestation proofs generated by devices have
constant size regardless of the network complexity and number
of attestations. Moreover, these proofs do not reveal sensitive
information regarding internal states of the device, allowing verification by anyone in a public and auditable manner. We conduct
an extensive security analysis and demonstrate scalability of zRA
compared to prior work. Our analysis suggests that zRA excels
especially in peer-to-peer and Pub/Sub network structures. To
validate the practicality, we implement an open-source prototype
of zRA using the Circom language. We show that zRA can be
securely deployed on public permissionless blockchains, serving
as an archival platform for attestation data to achieve resilience
against DoS attacks.
Compact Issuer-Hiding Authentication, Application to Anonymous Credential
Anonymous credentials are cryptographic mechanisms enabling users to authenticate themselves with a fine-grained control on the information they leak in the process. They have been the topic of countless papers which have improved the performance of such mechanisms or proposed new schemes able to prove ever-more complex statements about the attributes certified by those credentials. However, whereas these papers have studied in depth the problem of the information leaked by the credential and/or the attributes, almost all of them have surprisingly overlooked the information one may infer from the knowledge of the credential issuer.
In this paper we address this problem by showing how one can efficiently hide the actual issuer of a credential within a set of potential issuers. The novelty of our work is that we do not resort to zero-knowledge proofs but instead we show how one can tweak Pointcheval-Sanders signatures to achieve this issuer-hiding property at a very low cost. This results in an efficient anonymous credential system that indeed provide a complete control of the information leaked in the authentication process. Our construction is moreover modular and can then fit a wide spectrum of applications, notably for Self-Sovereign Identity (SSI) systems.
Decentralized Multi-Client Functional Encryption with Strong Security
Decentralized Multi-Client Functional Encryption (DMCFE) extends the basic functional encryption to multiple clients that do not trust each other. They can independently encrypt the multiple plaintext-inputs to be given for evaluation to the function embedded in the functional decryption key, defined by multiple parameter-inputs. And they keep control on these functions as they all have to contribute to the generation of the functional decryption keys. Tags can be used in the ciphertexts and the keys to specify which inputs can be combined together. As any encryption scheme, DMCFE provides privacy of the plaintexts. But the functions associated to the functional decryption keys might be sensitive too (e.g. a model in machine learning). The function-hiding property has thus been introduced to additionally protect the function evaluated during the decryption process.
In this paper, we provide new proof techniques to analyze a new concrete construction of function-hiding DMCFE for inner products, with strong security guarantees in the random oracle model: the adversary can adaptively query multiple challenge ciphertexts and multiple challenge keys, with unbounded repetitions of the same message tags in the ciphertext-queries and a fixed polynomially-large number of repetitions of the same key tags in the key-queries, allowing static corruption of the secret encryption keys. Previous constructions were proven secure in the selective setting only.
Side-Channel and Fault Resistant ASCON Implementation: A Detailed Hardware Evaluation (Extended Version)
In this work, we present various hardware implementations for the lightweight cipher ASCON, which was recently selected as the winner of the NIST organized Lightweight Cryptography (LWC) competition. We cover encryption + tag generation and decryption + tag verification for the ASCON AEAD and also the ASCON hash function. On top of the usual (unprotected) implementation, we present side-channel protection (threshold countermeasure) and triplication/majority-based fault protection. To the best of our knowledge, this is the first protected hardware implementation of ASCON with respect to side-channel and fault inject protection. The side-channel and fault protections work orthogonal to each other (i.e., either one can be turned on/off without affecting the other). We present ASIC and FPGA benchmarks for all our implementations (hash and AEAD) with/without countermeasures for varying input sizes.
Efficient Lattice-Based Threshold Signatures with Functional Interchangeability
A threshold signature scheme distributes the ability to generate signatures through distributed key generation and signing protocols. A threshold signature scheme should be functionally interchangeable, meaning that a signature produced by a threshold scheme should be verifiable by the same algorithm used for non-threshold signatures. To resist future attacks from quantum adversaries, lattice-based threshold signatures are desirable. However, the performance of existing lattice-based threshold signing protocols is still far from practical.
This paper presents the first lattice-based $t$-out-of-$n$ threshold signature scheme with functional interchangeability that has been implemented. To build an $t$-out-of-$n$ access structure for arbitrary $t \leq n$, we first present a novel $t$-out-of-$n$ version of the SPDZ MPC protocol. We avoid using the MPC protocol to evaluate hash operations for high concrete efficiency. Moreover, we design an efficient distributed rejection sampling protocol. Consequently, the online phase of our distributed signing protocol takes only 0.5 seconds in the two-party setting and 7.3 seconds in the 12-party setting according to our implementation. As a byproduct, our scheme also presents a periodic key refreshment mechanism and offers proactive security.
Beware Your Standard Cells! On Their Role in Static Power Side-Channel Attacks
Static or leakage power, which is especially prominent in advanced technology nodes, enables so-called static power side-channel attacks (S-PSCA). While countermeasures exist, they often incur considerable overheads. Besides, hardware Trojans represent another threat. Although the interplay between static power, down-scaling of technology nodes, and the vulnerability to S-PSCA is already established, an important detail was not covered yet: the role of the components at the heart of this sensitive interplay, the standard cells.
Here, we study this intricate relationship for two commercial 28nm and 65nm technologies, using a commercial-grade IC design setup, and under realistic PPA objectives. Specifically, we study how threshold-voltage (VT) tuning of standard cells impacts the resilience of representative AES and PRESENT cipher hardware, including versions with established countermeasures. Our proposed CAD framework enables a security-vs-PPA-aware design-space exploration. Contrary to the belief that high-performance designs are generally more vulnerable to S-PSCA, we find that timing constraints and the distribution of different VT cells are more pivotal factors. Furthermore, we discover that attackers can deploy highly effective and stealthy S-PSCA-based Trojans, all without any gate overheads or any timing violations.
SECDSA: Mobile signing and authentication under classical ``sole control''
The 2014 European eIDAS regulation regulates strong electronic authentication and legally binding electronic signatures. Both require user "sole control". Historically smartcards are used based on direct interaction between user and relying party. Here sole control is provided by giving users both physical possession and control of the cryptographic key used for signing/authentication through a PIN.
Such **classical** sole control is required in the 1999 electronic signature directive by some interpretations.
The eIDAS regulation repeals the directive and explicitly relaxes its sole control requirements in a trade-off between security and usability.
This allows user interaction to be outsourced to intermediary parties (authentication providers, signing services). This also allows mobile applications as user friendly alternatives for smartcards. However, current mobile platforms are only equipped with limited cryptographic hardware not supporting secure knowledge factors (PINs) controlling keys. The eIDAS relaxation raises concerns on sole control; intermediary parties should not be able to act as man-in-the-middle and impersonate users. In this paper we present a simple cryptographic design for signing and authentication on standard mobile platforms providing classical sole control. We argue that our design can meet the highest eIDAS requirements, effectively introducing a new signature category in a 2016 decision of the European Commission.
We also sketch a SECDSA based implementation of the European Digital Identity Wallet recently proposed by the European Commission as part of the eIDAS regulation update.
$\textsf{Asterisk}$: Super-fast MPC with a Friend
Secure multiparty computation$~$(MPC) enables privacy-preserving collaborative computation over sensitive data held by multiple mutually distrusting parties. Unfortunately, in the most natural setting where a majority of the parties are maliciously corrupt$~$(also called the $\textit{dishonest majority}$ setting), traditional MPC protocols incur high overheads and offer weaker security guarantees than are desirable for practical applications. In this paper, we explore the possibility of circumventing these drawbacks and achieving practically efficient dishonest majority MPC protocols with strong security guarantees by assuming an additional semi-honest, non-colluding helper party $\mathrm{HP}$. We believe that this is a more realistic alternative to assuming an honest majority, since many real-world applications of MPC involving potentially large numbers of parties$~$(such as dark pools) are typically enabled by a central governing entity that can be modeled as the $\mathrm{HP}$.
In the above model, we are the first to design, implement and benchmark a practically-efficient and general multi-party framework, $\textsf{Asterisk}$. Our framework requires invoking $\mathrm{HP}$ only a constant number of times, achieves the strong security guarantee of $\textit{fairness}$ (either all parties learn the output or none do), scales to hundreds of parties, outperforms all existing dishonest majority MPC protocols, and is, in fact, competitive with state-of-the-art honest majority MPC protocols. Our experiments show that $\textsf{Asterisk}$ achieves $228-288\times$ speedup in preprocessing as compared to the best dishonest majority MPC protocol. With respect to online time, $\textsf{Asterisk}$ supports $100$-party evaluation of a circuit with $10^6$ multiplication gates in approximately $20$ seconds. We also implement and benchmark practically efficient and highly scalable dark pool instances using $\textsf{Asterisk}$. The corresponding run times showcase the effectiveness of $\textsf{Asterisk}$ in enabling efficient realizations of real-world privacy-preserving applications with strong security guarantees.
When NTT Meets SIS: Efficient Side-channel Attacks on Dilithium and Kyber
In 2022, NIST selected Kyber and Dilithium as post-quantum cryptographic standard algorithms. The Number Theoretic Transformation (NTT) algorithm, which facilitates polynomial multiplication, has become a primary target for side-channel attacks. In this work, we embed the NTT transformation matrix in Dilithium and Kyber into the SIS search problem, and further, we propose a divide and conquer strategy for dimensionality reduction of the SIS problem by utilizing the properties of NTT, and discuss the effectiveness of the BKZ algorithm for solving the problem by using the LLL and with different blocksize, respectively. When using BKZ-60, the time required to recover private keys $\mathbf{s}_1$ for Dilithium2 after using the dimensionality reduction strategy is reduced from 82 hours to 1 minute, which is a 4,900$\times$ improvement, and the minimum number of coefficients required is reduced from 65 to 32, which is close to the theoretical lower limit value of 28. Furthermore, we propose a parameter-adjustable CPA scheme to expedite the recovery of a single coefficient in NTT domain. Combining this CPA scheme with the SIS-assisted approach, we executed practical attacks on both unprotected and masked implementations of Dilithium and Kyber on an ARM Cortex-M4. The results demonstrate that, using 5,000 power traces, we can recover complete $\mathbf{s}_1$ of Dilithium2 in 2.4 minutes, which achieve a 400$\times$ speedup compared to the best-known attacks. And Kyber512 takes only 0.5 minutes, a 7.5$\times$ improvement over what's already working. Moreover, we successfully break the first-order masked implementations and explore the potential applicable to higher-order implementations.
Regev Factoring Beyond Fibonacci: Optimizing Prefactors
In this note, we improve the space-efficient variant of Regev's quantum factoring algorithm [Reg23] proposed by Ragavan and Vaikuntanathan [RV24] by constant factors in space and/or size. This allows us to bridge the significant gaps in concrete efficiency between the circuits by [Reg23] and [RV24]; [Reg23] uses far fewer gates, while [RV24] uses far fewer qubits.
The main observation is that the space-efficient quantum modular exponentiation technique by [RV24] can be modified to work with more general sequences of integers than the Fibonacci numbers. We parametrize this in terms of a linear recurrence relation, and through this formulation construct three different circuits for quantum factoring:
- A circuit that uses $\approx 10.4n$ qubits and $\approx 45.7n^{1/2}$ multiplications of $n$-bit integers.
- A circuit that uses $(9+\epsilon)n$ qubits and $O_\epsilon(n^{1/2})$ multiplications of $n$-bit integers, for any $\epsilon > 0$.
- A circuit that uses $(24+\epsilon)n^{1/2}$ multiplications of $n$-bit integers, and $O_\epsilon(n)$ qubits, for any $\epsilon > 0$.
In comparison, the original circuit by [Reg23] uses at least $\approx 3n^{3/2}$ qubits and $\approx 6n^{1/2}$ multiplications of $n$-bit integers, while the space-efficient variant by [RV24] uses $\approx 10.32n$ qubits and $\approx 129.6n^{1/2}$ multiplications of $n$-bit integers (although a very simple modification of their Fibonacci-based circuit uses $\approx 11.32n$ qubits and only $\approx 86.4n^{1/2}$ multiplications of $n$-bit integers). The improvements proposed in this note take effect for sufficiently large values of $n$; it remains to be seen whether they can also provide benefits for practical problem sizes.