In Part 1, I presented some benchmarks (see here) for my brainfuck implementation against some of the best. I'll now lay out the reference interpreter, the optimized interpreter, and source code.
(I'm slower posting all this than I meant to be: apologies --- dayjob and all...)
To start, the reference interpreter looks like this:
[updated: fixed missing header]
Executables for Windows and Linux.
Its pretty basic --- pretty much add/subtract, move left/right, get/put, and then loop begin/end. It basically is as basic as you can get, though it include three things that are, only in the loosest sense, "optimizations":
- Although the interpreter will ignore non-valid characters at runtime, they are actually stripped at parse time.
- I accumulate operations into a counter at parse time (e.g. ++++ becomes +4)
- I parse the jump locations for loops at parse time.
These are so simple to do inline that it feels like they hardly qualify, and you'll see all 3 in pretty much every implementation (I'd guess).
And presented below is the optimized implementation that I presented in Part 1:
Note that it is NOT materially different.
The main differences are:
- The collapse of the command and "helper" arrays
- Double-instructioning -- whereby it is assumed that to do anything interesting in brainfuck, you always have to move the pointer, so every instruction has a move-the-pointer offset (saves on instruction decodes)
- The inclusion of the "super instructions" that perform the following common tasks:
- Pointer shifting in a loop
- Multiplication of a cell by a constant
- Setting a cell to a value
- Multiplication of a cell by a constant, then zeroing
- Multiplying two cells
The first 2 are pretty straighforward -- most of the work is in the third item: collapsing the existing instructions into the optimized "super" instructions without breaking any contracts. The primary optimizations are around loop invariance.
There are a LOT more optimizations I didn't yet finish tackling -- dead code elimination and constant propagation, in particular -- that should yield materially better performance.
I'll post the details and source to the optimizing implementation in a future post.
(Oh, btw, forget to mention -- if you use the "-c" option with the optimized implementation from part 1, it will convert your brainfuck program to somewhat performant C code.)