Preprocessing¶
Many protocols in MP-SPDZ use preprocessing, that is, producing secret shares that are independent of the actual data but help with the computation. Due to the independence, this can be done in batches to save communication rounds and even communication when using homomorphic encryption that works with large vectors such as LWE-based encryption.
Generally, preprocessing is done on demand and per computation
threads. On demand means that batches of preprocessing data are
computed whenever there is none in storage, and a computation thread
is a thread created by control flow instructions such as
for_range_multithread()
.
The exceptions to the general rule are edaBit generation with malicious security and AND triples with malicious security and honest majority, both when using bucket size three. Bucket size three implies batches of over a million to achieve 40-bit statistical security, and in honest-majority binary computation the item size is 64, which makes the actual batch size 64 million triples. In multithreaded programs, the preprocessing is run centrally using the threads as helpers.
The batching means that the cost in terms of time and communication
jump whenever another batch is generated. Note that, while some
protocols are flexible with the batch size and can thus be controlled
using -b
, others mandate a batch size, which can be as large as a
million.
Separate preprocessing¶
It is possible to separate out the preprocessing from the
input-dependent (“online”) phase. This is done by either option -F
or -f
on the virtual machines. In both cases, the preprocessing
data is read from files, either all data per type from a single file
(-F
) or one file per thread (-f
). The latter allows to use
named pipes.
The file name depends on the protocol and the computation domain. It
is generally <prefix>/<number of players>-<protocol
shorthand>-<domain length>/<preprocessing type>-<protocol
shorthand>-P<player number>[-T<thread number>]
. For example, the
triples for party 1 in SPDZ modulo a 128-bit prime can be found in
Player-Data/2-p-128/Triples-p-P1
. The protocol shorthand can be
found by calling <share type>::type_short()
. See
Share Types for a description of the share types.
Preprocessing files start with a header describing the protocol and computation domain to avoid errors due to mismatches. The header is as follows:
Length to follow (little-endian 8-byte number)
Protocol descriptor
Domain descriptor
MAC if applicable
The protocol descriptor is defined by <share
type>::type_string()
. For SPDZ modulo a prime it is SPDZ gfp
.
The domain descriptor depends on the kind of domain:
- Modulo a prime
Serialization of the prime
Sign bit (0 as 1 byte)
Length to follow (little-endian 4-byte number)
Prime (big-endian)
- Modulo a power of two:
Exponent (little-endian 4-byte number)
- \(GF(2^n)\)
Storage size in bytes (little-endian 8-byte number). Default is 16.
\(n\) (little-endian 4-byte number)
As an example, the following output of hexdump -C
describes SPDZ
modulo the default 128-bit prime
(170141183460469231731687303715885907969):
00000000 2d 00 00 00 00 00 00 00 53 50 44 5a 20 67 66 70 |-.......SPDZ gfp|
00000010 00 10 00 00 00 80 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 00 1b 80 01 3a ed c2 28 c0 3d 5e 24 8f 2c a5 |.....:..(.=^$.,.|
00000030 9b d6 2d 83 12
The last 128 bits denote the MAC and will differ from instance to instance. The MAC is stored to avoid errors that are hard to track otherwise.
The actual data is stored is by simple concatenation. For example,
triples are stored as repetitions of a, b, ab
, and daBits are
stored as repetitions of a, b
where a
is the arithmetic
share and b
is the binary share.
For protocols with MAC, the value share is stored before the MAC share.
Values are generally stored in little-endian order. Note the following domain specifics:
- Modulo a prime
Values are stored in Montgomery representation with \(R\) being the smallest power of \(2^{64}\) larger than the prime. For example, \(R = 2^{128}\) for a 128-bit prime. Furthermore, the values are stored in the smallest number of 8-byte blocks necessary, all in little-endian order. As an example, consider the default 128-bit prime \(p = 170141183460469231731687303715885907969\). The Montgomery representation of \(x\) is \(xR \bmod p\). For \(x = 1\), this is 170141183460469231731687303715882303487 or 0x7fffffffffffffffffffffffffe47fff in hexadecimal. Using to little-endian,
hexdump -C
would output the following:ff 7f e4 ff ff ff ff ff ff ff ff ff ff ff ff 7f
- Modulo a power of two:
Values are stored in the smallest number of 8-byte blocks necessary, all in little-endian order, so 1 with a modulus of \(2^{64}\) would result in the following
hexdump -C
output:01 00 00 00 00 00 00 00
- \(GF(2^n)\)
Values are stored in blocks according to the storage size above, all in little-endian order.
For further details, have a look at Utils/Fake-Offline.cpp
, which
contains code that generates preprocessing data insecurely for a range
of protocols (underlying the binary Fake-Offline.x
).
{mascot,cowgear,mal-shamir}-offline.x
generate
sufficient preprocessing data for a specific high-level program with
MASCOT, CowGear, and malicious Shamir secret sharing, respectively.