Working on a Custom Backend for Odin

Ginger Bill — March 9, 2017

For the past few months, Odin has been using LLVM as its backend (with Microsoft's Linker) to compile to machine code (and before that I compiled to C). It has done its job whilst the language has grown to what it is now.

As a continuing project for Odin, we are going to create a new backend. The backend will use a form of Static Single Assignment (SSA) to do the majority of its optimizations, and the SSA form can then be lowered to a generic byte code. From this generic byte code, it can be further specialized to the needed machine architecture (e.g. amd64, mips64, x86, etc.) or even executed by an interpreter.
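As a rough illustration of the "SSA-like byte code plus interpreter" idea, here is a minimal sketch in C. It is not Odin's actual design: the opcodes, the `Instr` layout, and the `run` and `example` functions are all hypothetical names made up for this example, and it only handles straight-line code with no branches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration; not Odin's instruction set. */
typedef enum { OP_CONST, OP_ADD, OP_MUL, OP_RET } Opcode;

/* In SSA form every instruction defines exactly one new value, so the
   index of an instruction doubles as the name of the value it produces. */
typedef struct {
    Opcode  op;
    int64_t imm;  /* literal, used by OP_CONST only */
    size_t  a, b; /* indices of earlier instructions used as operands */
} Instr;

enum { MAX_VALUES = 64 };

/* Interpret a straight-line program and return the value named by OP_RET.
   Returns 0 if the program contains no OP_RET. */
int64_t run(const Instr *code, size_t count) {
    int64_t values[MAX_VALUES];
    assert(count <= MAX_VALUES);
    for (size_t i = 0; i < count; i++) {
        const Instr *in = &code[i];
        switch (in->op) {
        case OP_CONST:
            values[i] = in->imm;
            break;
        case OP_ADD:
            assert(in->a < i && in->b < i); /* operands must be defined earlier */
            values[i] = values[in->a] + values[in->b];
            break;
        case OP_MUL:
            assert(in->a < i && in->b < i);
            values[i] = values[in->a] * values[in->b];
            break;
        case OP_RET:
            assert(in->a < i);
            return values[in->a];
        }
    }
    return 0;
}

/* Example program computing (2 + 3) * 4. */
int64_t example(void) {
    const Instr prog[] = {
        { .op = OP_CONST, .imm = 2 },      /* v0 = 2       */
        { .op = OP_CONST, .imm = 3 },      /* v1 = 3       */
        { .op = OP_ADD, .a = 0, .b = 1 },  /* v2 = v0 + v1 */
        { .op = OP_CONST, .imm = 4 },      /* v3 = 4       */
        { .op = OP_MUL, .a = 2, .b = 3 },  /* v4 = v2 * v3 */
        { .op = OP_RET, .a = 4 },          /* return v4    */
    };
    return run(prog, sizeof prog / sizeof prog[0]);
}
```

The same instruction array could, in principle, be handed either to an interpreter like `run` (for example for compile-time execution) or to a per-architecture code generator, which is the sharing the post is hinting at.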

You might be thinking we are crazy to try and replace LLVM (you might be right :D) but LLVM has been a huge problem for this compiler and language. The main problems are:

LLVM is slow - It takes up 85%+ of the total compilation time, even for non-optimized code. Why not have a different backend which is fast to compile but may not produce very fast code?
LLVM's design is restrictive - It was designed with C-like languages in mind, so if your language deviates from this, it is annoying to handle.
LLVM is buggy - Numerous bugs have been encountered with LLVM which I have had to work around for the meantime; many are due to design flaws which can never be fixed, and some have existed for years.
LLVM is a very big dependency.
A compile time execution stage needs a byte code to execute. Why have two things that do a very similar thing?

These are, in brief, the reasons why I am doing this. LLVM is causing me more problems than it is worth and I need a different solution.

It should be noted that this is not a replacement for the LLVM backend but an alternative which the user will be free to choose. (The compiler may eventually have numerous backends to choose from, e.g. C, C++, LLVM, or another language.)

We will try and keep all of you informed about the progress of this new backend.

If you want to help with the development of this language and/or compiler, you are always welcome to by asking on the forums, on the git repository, or by email: odin (at) gingerbill (dot) org.

- Bill

Comments

Jeremiah Goerdt

March 9, 2017

Ooooh, exciting news. That's going to be a great project for everyone to follow.

What language will you start with?

Ginger Bill

March 9, 2017

amd64 architecture

ratchetfreak

March 9, 2017

FYI it looks like the people on the llvm mailing list just now encountered the _m64 parameter passing bug

https://groups.google.com/forum/#!topic/llvm-dev/uP2lEsQAgpI

Mārtiņš Možeiko

March 9, 2017 Edited by Mārtiņš Možeiko on March 9, 2017, 6:40pm

Why do you use and care about _m64? That's MMX, which is pretty much obsolete unless you are targeting 32-bit code for 15-year-old systems.

ratchetfreak

March 9, 2017

Looking at it properly, it's more of a sign that they didn't exercise the call lowering enough in relation to the various conventions.

Ginger Bill

March 10, 2017

This isn't restricted to 32-bit systems; it affects 64-bit ones too. LLVM passes structs/arrays/vectors through the stack even if they can fit into a register. See this bug report from 2013: http://lists.llvm.org/pipermail/llvm-dev/2013-January/058147.html

They argue that this is a problem for cfe-dev and not llvm-dev, but I think that cfe-dev are working around a fundamental problem that llvm-dev have not addressed.

The example I had was passing a {f32, f32} as an argument to a foreign function. With C's ABI on Windows, that should be passed as a u64; however, LLVM was passing this variable through the stack, which was causing the issue. I would have expected LLVM, being quite high-level, to do this automatically for me.

This is just one of many problems I've had with calling conventions in LLVM.
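To make the {f32, f32} case concrete, here is a small sketch in C of the coercion a frontend ends up doing by hand when a two-float struct has to travel as a single 64-bit integer. The helper names `vec2_as_u64` and `u64_as_vec2` are made up for this illustration, and it assumes `float` is 4 bytes with no padding in the struct.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct Vec2 {
    float x, y;
};

/* Reinterpret the 8 bytes of a Vec2 as one 64-bit integer, which is the
   shape the value takes when it is passed "as a u64". memcpy is used
   because it is the well-defined way to copy object bytes in C. */
uint64_t vec2_as_u64(struct Vec2 v) {
    uint64_t bits;
    assert(sizeof v == sizeof bits); /* assumption: 2 x 4-byte floats, no padding */
    memcpy(&bits, &v, sizeof bits);
    return bits;
}

/* The inverse: what the callee conceptually does to get its struct back. */
struct Vec2 u64_as_vec2(uint64_t bits) {
    struct Vec2 v;
    assert(sizeof v == sizeof bits);
    memcpy(&v, &bits, sizeof v);
    return v;
}
```

A round trip through both helpers gives back the original `x` and `y`, so no information is lost by treating the struct as an integer of the same size.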

Mārtiņš Možeiko

March 10, 2017 Edited by Mārtiņš Možeiko on March 10, 2017, 5:45pm

I may be misunderstanding something, but you cannot choose arbitrary ABI on your own when you call foreign functions.

The ABI is strictly defined by the platform/language. Sure, maybe you want it to work differently (pack two floats in a u64 register?), but how will the receiving function know that? It will expect arguments to be passed according to the agreed ABI, which I guess in this case says how a struct passed by value should be handled.

LLVM simply generates code with ABI that platform expects. That's exactly how it should be.

Ginger Bill

March 10, 2017

Sorry to confuse you. I understand about the ABI; it's just that LLVM doesn't seem to generate code that follows the correct ABI. I will write out an example later to show you what is going wrong, but this is pretty much the problem:

declare void @foo1({float, float})
declare void @foo2(i64)

The foreign functions are defined like this in C:

#include <stdint.h>

struct Vec2 {
    float x, y;
};
void foo1(struct Vec2 v);
void foo2(int64_t x); /* int64_t is the C counterpart of LLVM's i64 */

Passing it to clang, it generates this:

declare void @foo1(i64)
declare void @foo2(i64)

So why is the first, hand-written example wrong? Why is it not correctly passing the struct as an i64?

Mārtiņš Možeiko

March 10, 2017

Hmm, it doesn't for me. For me clang compiles it to declare void @foo1(<2 x float>). See here: https://godbolt.org/g/HvbGUd
And if I compile it to x86_64 asm, it passes both floats in one xmm0 register: https://godbolt.org/g/LFv9Zx

Ginger Bill

March 10, 2017

That's weird. Are you compiling on *nix? Either way, it's not the same data structure. Clang converts it to a different type which means LLVM isn't handling it properly for the specific architecture.

Mārtiņš Možeiko

March 10, 2017

Yes, on Linux.
What do you mean by "different type"? clang is lowering types to whatever the target supports. LLVM is not an architecture-independent bytecode. It depends on the platform ABI. What does "isn't handling it properly for the specific architecture" mean? LLVM handles it in whatever way the platform ABI requires.
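As an illustration of the kind of target-specific decision being discussed, here is a simplified sketch in C of how a frontend might classify a struct passed by value under the Microsoft x64 calling convention, where structs of exactly 1, 2, 4, or 8 bytes are passed as if they were integers of the same size and all other sizes are passed by pointer to caller-allocated memory. The enum and function names are hypothetical, and the sketch deliberately ignores other targets: on x86-64 Linux (System V), for example, the rules are different and a two-float struct travels in an SSE register instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef enum {
    PASS_AS_INTEGER,   /* coerce to an integer of the same size */
    PASS_BY_REFERENCE  /* caller makes a copy and passes its address */
} Win64Passing;

/* Simplified classification of a by-value struct or union argument for
   the Microsoft x64 convention. Only the size matters here; the field
   types do not. Vector types such as __m128 are outside this sketch. */
Win64Passing win64_classify_aggregate(size_t size_in_bytes) {
    bool integer_sized = size_in_bytes == 1 || size_in_bytes == 2 ||
                         size_in_bytes == 4 || size_in_bytes == 8;
    return integer_sized ? PASS_AS_INTEGER : PASS_BY_REFERENCE;
}
```

Under this rule an 8-byte struct of two floats is classified as `PASS_AS_INTEGER`, which matches the i64 signature quoted earlier in the thread, while a 12-byte struct of three floats would be passed by reference.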

anaël seghezzi

March 20, 2017

There is a new bytecode for the Haxe language; maybe it can be a source of inspiration:
http://hashlink.haxe.org/