Well, I guess I settled on that last addressing mode. It's two more
modes, which pretty much just copy the existing
immediate-memory-location and something-from-the-stack addressing
Again, my terminology might be way off, but I'm going to call it
Syntax in the assembler for it is **address, or *$stack_offset.
Here's what it's doing in code for the "fetch" part. (Excuse the
crappy temporary error handling for now.)
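The gist of a fetch step with those two extra modes can be sketched roughly like this. It's a minimal illustration, not LilyVM's actual source: the names MiniVM, AddrMode, load, and fetch are all made up, I'm assuming **address means "read the Word that the Word at address points to" and *$stack_offset means the same thing starting from a stack slot, and I'm assuming stack offsets are added to the stack pointer. Exceptions stand in for error handling just to keep it short.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

using Word = std::uint32_t;

// Hypothetical mode names; the real enum may look nothing like this.
enum class AddrMode { Immediate, Memory, Stack, MemoryIndirect, StackIndirect };

struct MiniVM {
    std::vector<Word> memory;
    Word stackPointer = 0;

    // Bounds-checked read of one Word.
    Word load(Word address) const {
        if (address >= memory.size()) {
            throw std::out_of_range("address outside VM memory");
        }
        return memory[address];
    }

    // Resolve an instruction parameter to a value according to its mode.
    Word fetch(AddrMode mode, Word param) const {
        switch (mode) {
        case AddrMode::Immediate:      return param;                           // literal value
        case AddrMode::Memory:         return load(param);                     // *address
        case AddrMode::Stack:          return load(stackPointer + param);      // $stack_offset
        case AddrMode::MemoryIndirect: return load(load(param));               // **address
        case AddrMode::StackIndirect:  return load(load(stackPointer + param)); // *$stack_offset
        }
        throw std::logic_error("unknown addressing mode");
    }
};
```

The two new modes are just the two old ones with one more load wrapped around them, which is why they cost so little to add.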
And here's a little example snippet of assembly demonstrating it.
The ability to add in-line, immediate data was also added in the
last couple of revisions. There are assembly directives to have the
next few Words be some explicit values, or to fill the next 'n'
Words with some specific value.
I think the incomplete VM code might almost be not-embarrassing
enough to toss up on GitHub soon.
Next up: Profiling and comparisons against Lua and native code. Then
taking out some of the modulo crap and trying it again.
To test the existing functionality of LilyVM and its assembler, I
decided to implement a simple, well-known cryptographic algorithm:
RC4. Nobody should use it today, because there are a lot of known
effective attacks against it, but it's an easy real-world example of
a simple algorithm to play with.
I also like to use it as a random number generator.
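For reference, here's a compact C++ sketch of plain RC4 (key scheduling plus the keystream generator). This is not the LilyVM assembly version; the class name Rc4 and its methods are just names picked for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class Rc4 {
public:
    // Key-scheduling algorithm: shuffle the 256-byte state using the key.
    explicit Rc4(const std::string& key) {
        if (key.empty()) {
            throw std::invalid_argument("RC4 key must not be empty");
        }
        std::iota(state_.begin(), state_.end(), 0);  // state = 0, 1, ..., 255
        std::uint8_t j = 0;
        for (std::size_t i = 0; i < 256; ++i) {
            const auto keyByte = static_cast<std::uint8_t>(key[i % key.size()]);
            j = static_cast<std::uint8_t>(j + state_[i] + keyByte);
            std::swap(state_[i], state_[j]);
        }
    }

    // Pseudo-random generation: produce the next keystream byte.
    std::uint8_t nextByte() {
        i_ = static_cast<std::uint8_t>(i_ + 1);
        j_ = static_cast<std::uint8_t>(j_ + state_[i_]);
        std::swap(state_[i_], state_[j_]);
        return state_[static_cast<std::uint8_t>(state_[i_] + state_[j_])];
    }

    // XOR the input with the keystream. Encrypting and decrypting are the
    // same operation, as long as each starts from a freshly keyed Rc4.
    std::vector<std::uint8_t> apply(const std::string& data) {
        std::vector<std::uint8_t> out;
        out.reserve(data.size());
        for (char c : data) {
            out.push_back(static_cast<std::uint8_t>(static_cast<std::uint8_t>(c) ^ nextByte()));
        }
        return out;
    }

private:
    std::array<std::uint8_t, 256> state_{};
    std::uint8_t i_ = 0;
    std::uint8_t j_ = 0;
};
```

With the key "Key" and the input "Plaintext", apply() should give the bytes BB F3 16 E8 D9 40 AF 0A D3, which is the commonly quoted RC4 test vector. Calling nextByte() on its own is the "random number generator" use.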
Lessons learned tonight:
- Some more addressing modes to handle things like dereferencing
  pointers without separate load/store instructions would make
  things MUCH more concise. I think I can add more addressing modes
  without any overhead.
- I thought about adding a "swap" instruction because it's done
  twice in this example. Still not sure about this one.
- I need better debugging tools! (You'd think inspecting the state
  of your own VM would be easy.)
- The code is getting a bit messy and needs a cleanup pass.
- I really really really need a way to define arbitrary data. In
  this example I used instructions in place of data just so I had
  data to work with on the algorithm. Something to just occupy a
  block of some number of Words, literal numbers, and possibly
  literal strings would go a long way here.
- Emacs's asm-mode is barely suitable for this. Maybe I just need to
  get used to it.
- I ended up using the stack as though it was just a big series of
  registers. I guess it's how local variables really would be used
  normally, so maybe it's not so bad.
- Being able to create labels that map to arbitrary values instead
  of just memory locations could have replaced a lot of the
  arbitrarily-numbered stack positions.
- I lack sufficient error handling. I want to avoid excessive error
  checking, and I want to run in places where exceptions are
  disabled. (I'm looking at you, Unreal 4.) I might have to resort
  to the evil black magic that is setjmp()/longjmp().
- I need function calls. I have no kind of calling convention
  or anything. There was only a little bit of duplicated code here,
the RC4 implementation I made a long time ago that I tested it
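On the error-handling point above, this is roughly what the setjmp()/longjmp() route could look like. It's only a sketch under assumptions: GuardedVM, VmError, and sumWords are invented names, and the "program" is just a list of addresses to read. The idea is that the hot path does one bounds check and bails out with longjmp on failure, so nothing in between has to check and propagate return codes.

```cpp
#include <csetjmp>
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::uint32_t;

enum class VmError { None, BadAddress };

struct GuardedVM {
    std::vector<Word> memory;
    std::jmp_buf errorJump;              // where a fault lands
    VmError lastError = VmError::None;   // set just before jumping

    // Read one Word; on a bad address, record the error and jump out.
    Word load(Word address) {
        if (address >= memory.size()) {
            lastError = VmError::BadAddress;
            std::longjmp(errorJump, 1);
        }
        return memory[address];
    }
};

// Sums the Words at the given addresses. `result` is only written on success.
VmError sumWords(GuardedVM& vm, const std::vector<Word>& addresses, Word& result) {
    // setjmp returns 0 when called directly and non-zero when longjmp lands here.
    if (setjmp(vm.errorJump) != 0) {
        return vm.lastError;
    }
    // Caution: longjmp skips destructors, so no locals with non-trivial
    // destructors may live between this point and the load() calls.
    Word total = 0;
    for (std::size_t i = 0; i < addresses.size(); ++i) {
        total += vm.load(addresses[i]);
    }
    result = total;
    return VmError::None;
}
```

The catch is in that comment: in C++, jumping over objects that need destruction is undefined behavior, so everything between the setjmp and the longjmp has to stay plain-old-data. That's a big part of why this counts as black magic.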
The actual code follows below. Excuse the lack of syntax
highlighting. I guess I need to make the syntax a little more like
some common assembler syntax to use an off-the-shelf syntax
highlighter.
Disclaimer: I am not experienced at this, so I might use the wrong
terminology, or have some totally stupid ideas. The point is to
learn by doing, and sometimes that involves failing and looking
Now having said that...
Last night I started working on a project that I've been toying
with for a while.
I want to make a virtual machine that meets all the following
requirements:
- Portable, with no weird dependencies. Should just compile with my
  utilities library and standard C++. Should run on every platform
  supported by C++.
- Sandboxable. Should be able to run untrusted code. (Lua is not
  built for this.)
- Really fast. As fast as possible without just going and writing a
  JIT. (Don't want to deal with individual platform weirdness just
- Pause/resume capable. (Something DerpScript is not.)
- Can save and restore VM state. (Something Lua cannot do with any
  level of sanity.)
- A new LLVM backend targeted to it, or an assembler that can read
  LCC's intermediate assembly representation in a similar manner to
And for these goals:
- Scripting for games, allowing fast iteration time and dynamic
  reloading of game logic. The execution time must be fast to be
  suitable for this.
- User-programmable game scripts that can safely be run on a
  multiplayer server, even though they're written by potentially
  hostile players. We need sandboxing, suspend/resume support, and
  the ability to take and restore state snapshots for this to work
- General purpose scripting. Maybe get it integrated with that
  texture generator tool I was working on. For this, it needs a sane
  API and the ability to hold references to data outside of the VM.
  I don't know how I'm going to handle the latter part, and the way
  I did it in DerpScript isn't going to cut it.
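To make the pause/resume and snapshot requirements above a bit more concrete, here's a toy sketch of the shape I have in mind. None of this is the real VM: TinyVM, its three opcodes, and the run() budget are invented for the example, and wrapping addresses with a modulo is just one crude, assumed way to keep a script inside its own memory. The point is that if all VM state is plain data, pausing is "return from run() early" and a snapshot is "copy the struct".

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::uint32_t;

// Made-up two-Word instructions: opcode followed by one argument.
enum Op : Word { Halt = 0, Increment = 1, Jump = 2 };

struct TinyVM {
    std::vector<Word> memory;
    Word programCounter = 0;
    bool halted = false;

    // Wrap every address into VM memory so a script cannot reach outside it.
    Word& at(Word address) { return memory[address % memory.size()]; }

    // Execute at most `budget` instructions, then return so the host can
    // do other work. Calling run() again resumes where it left off.
    std::size_t run(std::size_t budget) {
        if (memory.empty()) {
            halted = true;
            return 0;
        }
        std::size_t executed = 0;
        while (executed < budget && !halted) {
            const Word op = at(programCounter);
            const Word arg = at(programCounter + 1);
            switch (op) {
            case Increment: at(arg) += 1; programCounter += 2; break;
            case Jump:      programCounter = arg;              break;
            default:        halted = true;                     break;  // Halt or unknown
            }
            ++executed;
        }
        return executed;
    }
};
```

Taking a snapshot is then just `TinyVM saved = vm;`, and restoring it is `vm = saved;`. The hard part this sketch dodges is references to data outside the VM, which can't be copied this way.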
So after one night of work I have the start of the VM, a mostly
complete assembler (not using LCC's assembly), and a disassembler.
At the moment it's possible to write some very simple assembly
programs and do basic flow control and memory access. Only the "add"
instruction has been implemented on the arithmetic side.
The specs of the VM are (right now):
- No addressable general purpose registers. Everything is just
  direct memory access. There's a program counter and stack pointer,
  and that's it at the moment. So far there is no way to modify them
  directly. Push/pop instructions exist as a way to modify the stack
  pointer, with no direct access (yet). Jump and branch instructions
  modify the program counter, but there's no way to read it (yet).
- Three addressing modes: Immediate, direct memory, and indirect
  with an immediate value as an offset from the stack pointer. There
  are additional instructions for dereferencing pointers in memory.
  (This could all change, because I'd rather have the
  dereference-pointer ops be replaced with addressing modes built
  into the opcodes.)
- Memory is divided into 32-bit Words. Each instruction with encoded
  addressing modes takes up exactly one Word, plus one Word per
  parameter. This can change with a modification to a typedef to
  change the meaning of Word, but there's a minimum usable size.
  8-bit Words won't work just because the encoded instructions won't
  fit in them.
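As an illustration of the "one Word per instruction, addressing modes included" idea, here's one way the packing could work. The bit layout is purely an assumption for the example (16 bits of opcode, then 4 bits of addressing mode per parameter, two parameters max), and encode, opcodeOf, and modeOf are hypothetical helpers, not the VM's real ones.

```cpp
#include <cstdint>
#include <limits>

using Word = std::uint32_t;  // change this typedef to change the Word size

enum class Mode : Word { Immediate = 0, Memory = 1, Stack = 2 };

constexpr unsigned kOpcodeBits = 16;  // assumed width of the opcode field
constexpr unsigned kModeBits = 4;     // assumed width of each mode field
constexpr unsigned kMaxParams = 2;

constexpr Word kOpcodeMask = (Word{1} << kOpcodeBits) - 1;
constexpr Word kModeMask = (Word{1} << kModeBits) - 1;

// This is the "minimum usable size": the fields have to fit in one Word,
// so an 8-bit Word would fail to compile here.
static_assert(std::numeric_limits<Word>::digits >= kOpcodeBits + kMaxParams * kModeBits,
              "Word is too small to hold an encoded instruction");

// Pack an opcode and the addressing modes of its two parameters into one Word.
constexpr Word encode(Word opcode, Mode first, Mode second) {
    return (opcode & kOpcodeMask)
         | (static_cast<Word>(first) << kOpcodeBits)
         | (static_cast<Word>(second) << (kOpcodeBits + kModeBits));
}

// Extract the opcode field.
constexpr Word opcodeOf(Word instruction) {
    return instruction & kOpcodeMask;
}

// Extract the addressing mode of parameter `index` (0 or 1).
constexpr Mode modeOf(Word instruction, unsigned index) {
    return static_cast<Mode>((instruction >> (kOpcodeBits + index * kModeBits)) & kModeMask);
}
```

With this layout, an instruction with two parameters occupies three Words in memory: the encoded Word from encode(), then one Word for each parameter value.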
That's all I have right now. I'll add more information as I piece it
together. I haven't done a good job of making the code presentable,
so no source code or example code yet. But maybe soon.