Adding an Instruction¶
If you want to add functionality that isn’t captured by the current virtual machine design, you might need to add further instructions. This section explains how the virtual machine is built on both the frontend (Python) and the backend (C++).
Frontend¶
The instructions are defined as classes in
../Compiler/instructions.py
and
../Compiler/GC/instructions.py
. Every class requires the
attributes opcode
and arg_format
to be
set. Consider the example of prefixsums
:
@base.vectorize
class prefixsums(base.Instruction):
""" Prefix sum.
:param: result (sint)
:param: input (sint)
"""
__slots__ = []
code = base.opcodes['PREFIXSUMS']
arg_format = ['sw','s']
opcode
is set from opcodes
in
../Compiler/instructions_base.py
. This is simply for
convenience as it allow copying from the C++ code (see below). The
only requirement for opcodes is that they are unique 10-bit integers.
arg_format
has to be iterable over strings indicating the
nature of arguments. In the example, sw
indicates that a secret
integer register is written to, and s
indicates that a secret
integer register is read from. ArgFormats
defines all
possible argument types:
ArgFormats = {
'c': ClearModpAF,
's': SecretModpAF,
'cw': ClearModpAF,
'sw': SecretModpAF,
'cg': ClearGF2NAF,
'sg': SecretGF2NAF,
'cgw': ClearGF2NAF,
'sgw': SecretGF2NAF,
'ci': ClearIntAF,
'ciw': ClearIntAF,
'*': AnyRegAF,
'*w': AnyRegAF,
'i': ImmediateModpAF,
'ig': ImmediateGF2NAF,
'int': IntArgFormat,
'long': LongArgFormat,
'p': PlayerNoAF,
'str': String,
'varstr': VarString,
}
The values of this dictionary are classes defined in
../Compiler/instructions_base.py
which can encode them
into bytes to written to the bytecode files. Most types are encoded as
four-byte values except long
, which uses eight bytes, and
varstr
, which has a variable length. The type classes also define
functionality to check arguments for correctness such as Python type.
By default, register arguments are understood as single registers. The
vectorize
decorator is an easy way to allow vector arguments
if all register arguments have the same length. The vector size is
stored independently of the arguments. The decorator creates two
instructions, a base version for single registers and a vectorized
version, which is called as follows in the example for length
n
:
vprefixsums(n, result, operand)
At the higher level, the vector length is usually derived from the
input using the vectorize
decorator as in
prefix_sum()
:
@vectorize
def prefix_sum(self):
""" Prefix sum. """
res = sint()
prefixsums(res, self)
return res
All instruction classes should inherit from Instruction
in
../Compiler/instructions_base.py
.
Backend¶
The backend functionality has three aspects:
Parsing the bytecode and creating an internal representation
Figuring out the resource requirements of the instruction (registers and memory)
Execution
Parsing¶
The internal representation is done via the Instruction
class defined in ../Processor/Instruction.h
. The arguments
are parsed in parse_operands()
defined in
../Processor/Instruction.hpp
. It contains a large switch
statement covering all opcodes. Sticking to the example of
prefixsums
, the relevant code there
is as follows:
case PREFIXSUMS:
...
get_ints(r, s, 2);
break;
This puts the two integer values corresponding to the two arguments
into r[0]
and r[1]
within the Instruction
object. r
is an array of four 32-bit integers, which is
enough for many simple instructions. More complex instruction use
start
, which is a variable-length C++ vector of 32-bit
integers.
Resourcing¶
Because the number of registers depends on the programs, the virtual
machine has to find out the requirements for every single
instruction. The main function for this is get_max_reg()
in
../Processor/Instruction.hpp
, which returns the maximum
register that is written to for a particular register type. It
contains two switch statements. The first one contains special
treatment for instructions that write to more than one register type
such as dabit
. However, for most
instruction including prefixsums
,
it checks the type currently queried against the type defined by
get_reg_type()
and returns 0 if there is a
mismatch. get_reg_type()
makes use of the fact that the
opcodes are grouped. For prefixsums
, it returns SINT
,
which is the default.
The second switch statement then treats further special cases
where start
is used or r contains registers of
different types. None of this applies for prefixsums
, so
the return value is simply the maximum over r
and
start
plus the vector size:
unsigned res = 0;
for (auto x : start)
res = max(res, (unsigned)x);
for (auto x : r)
res = max(res, (unsigned)x);
return res + size;
Execution¶
Execution is again defined by several switch statements over the
opcode, the outermost of which is in Program::execute()
defined in ../Processor/Instruction.hpp
. It uses the X
macro pattern for a compact
representation. prefixsums
is
implemented in ../Processor/instructions.h
as follows:
X(PREFIXSUMS, auto dest = &Procp.get_S()[r[0]]; auto op1 = &Procp.get_S()[r[1]]; \
sint s, \
s += *op1++; *dest++ = s) \
This macro has three arguments: the opcode, the setup step, and vector
loop step. The setup step involves getting pointers according to the
register addresses r[0]
and r[1]
as well as initializing a
running variable. The loop step then adds the next input element to
the running variable and stores in the destination.
Another important switch statement is in
Instruction::execute
. See Execution for further
examples.