The simpler alternative to GCC-RS

Sergey "Shnatsel" Davidoff
May 30, 2021


The GCC-RS project, which can be summed up as “Rewrite the Rust compiler in C++”, got a bit of media attention lately. In this post I’ll try to convince you that all of its stated benefits can be achieved without a rewrite, by leveraging rustc_codegen_gcc instead.

All of the opinions expressed in this article are my own. They do not represent the opinions of any organizations I may be part of.

I am not directly affiliated with any of the projects discussed here.

Background: LLVM vs GCC

The official Rust compiler currently uses LLVM for code generation. LLVM is an open-source compiler and code generation library competing with GCC, the other major open-source compiler stack.

As a code generator, GCC has several advantages over LLVM:

  1. GCC can produce code that runs 10% or so faster on some x86 hardware (but not all x86 hardware), at least when compiling C and C++.
  2. GCC supports more CPU architectures. LLVM already supports all desktop or server-grade CPUs manufactured in the last 15 years, but GCC also supports some hobbyist retrocomputing architectures, such as HP PA.

So it would make sense to allow using GCC as the code generator when compiling Rust programs.

Why GCC-RS?

GCC-RS intends not only to use GCC for code generation, but also reimplement the entire rest of the Rust compiler from scratch, in C++.

For reference, the official Rust compiler is written in Rust. Rewriting Rust code in C++ seems a bit backwards. So why are they doing this?

The FAQ for the project lists the following benefits of GCC-RS:

Support for more CPU architectures

That’s true! But the official Rust compiler is not nailed down to LLVM. In fact, it supports pluggable code generation backends. (If you’re not sure what code generation is or what other parts the Rust compiler consists of, read this.)

And rustc_codegen_gcc, you guessed it, simply plugs GCC into the existing Rust compiler as a code generation backend. This allows compiling code for all the architectures supported by GCC, without rewriting the entire rest of the compiler from scratch.
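To make the idea of a pluggable backend concrete, here is a minimal sketch of the architecture in Rust. The `Backend` trait, the toy `Instr` IR and both backend structs are hypothetical illustrations, not rustc’s real interface, which is far larger and works on MIR. The point is that everything before the backend call is shared, so swapping LLVM for GCC only means providing another implementation of the trait.

```rust
/// A toy stand-in for the compiler's mid-level IR (hypothetical; real MIR is far richer).
enum Instr {
    Const(i64),
    Add,
    Ret,
}

/// Hypothetical backend interface: the shared frontend hands over IR,
/// and the backend turns it into its own lower-level representation.
trait Backend {
    fn name(&self) -> &'static str;
    fn codegen(&self, body: &[Instr]) -> String;
}

/// Shared helper that renders one instruction as text.
fn mnemonic(instr: &Instr) -> String {
    match instr {
        Instr::Const(n) => format!("const {}", n),
        Instr::Add => "add".to_string(),
        Instr::Ret => "ret".to_string(),
    }
}

/// Renders every instruction on its own line, tagged with the backend's IR name.
fn lower(tag: &str, body: &[Instr]) -> String {
    body.iter()
        .map(|instr| format!("{}: {}", tag, mnemonic(instr)))
        .collect::<Vec<_>>()
        .join("\n")
}

struct LlvmLikeBackend;
struct GccLikeBackend;

impl Backend for LlvmLikeBackend {
    fn name(&self) -> &'static str {
        "llvm-like"
    }
    fn codegen(&self, body: &[Instr]) -> String {
        lower("llvm-ir", body)
    }
}

impl Backend for GccLikeBackend {
    fn name(&self) -> &'static str {
        "gcc-like"
    }
    fn codegen(&self, body: &[Instr]) -> String {
        lower("gimple", body)
    }
}

/// Parsing, type checking and borrow checking would all happen before this call
/// and are identical no matter which backend is plugged in.
fn compile(body: &[Instr], backend: &dyn Backend) -> String {
    backend.codegen(body)
}

fn main() {
    let body = [Instr::Const(2), Instr::Const(3), Instr::Add, Instr::Ret];
    let llvm = LlvmLikeBackend;
    let gcc = GccLikeBackend;
    let backends: [&dyn Backend; 2] = [&llvm, &gcc];
    for backend in &backends {
        println!("== {} ==\n{}", backend.name(), compile(&body, *backend));
    }
}
```

Adding a third backend means writing one more `impl Backend`; nothing in the shared frontend changes, which is the same property that lets rustc_codegen_gcc plug in without touching the rest of the compiler.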

Cross-language LTO

In order to use link-time optimization (LTO) across C and Rust, you need to use the same code generation stack in both C and Rust. Aside from producing smaller binaries and slightly faster code, LTO is also a prerequisite for control-flow integrity (CFI), a new exploit mitigation technique.

However, this would also work perfectly fine with rustc_codegen_gcc.

And besides, cross-language LTO is already possible with the LLVM backend, provided you’re using the LLVM-based Clang compiler for C code. Firefox now uses it in production on all platforms.

The GCC-RS FAQ lists Linux as the motivating example. Ironically, Linux supports LTO with LLVM but not GCC!

GCC Plugins

The FAQ says: “Existing GCC plugins such as those in the Linux Kernel project should be reusable since they target GIMPLE in the middle-end.”

Even ignoring how weird and niche this use case is, rustc_codegen_gcc also emits GIMPLE and would work just as well.

Bootstrapping

The Rust compiler is written in Rust. That presents a problem for CPU architectures that don’t have a Rust compiler built for them yet. It’s a chicken-and-egg problem, and resolving it is called “bootstrapping”.

To bootstrap a C compiler, typically you’d write a super simple C compiler in assembly, which you use to compile a bit more advanced C compiler written in C, which you then use to compile an early version of GCC, use that to compile a slightly newer GCC, and so on until you catch up to the latest version.

If you need C++ (and the latest GCC is written in C++, so you do need it), you do the same trick and use a simple C++ compiler written in C to get the chain going. Same for any other language, really.

The Rust bootstrap chain is quite long because you need to get from C to OCaml and then compile pre-release Rust to compile Rust 1.0 to compile Rust 1.1 to compile Rust 1.2 and so on until you catch up to 1.53 (or whatever the latest version is when you’re reading this). So if you can have a Rust compiler written in C++ that compiles 1.53 directly, you can save yourself some time.
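A rough way to see why the chain is long: assume, as a simplification, that each stable release can only be built by the release immediately before it, and ignore the pre-1.0 OCaml and snapshot stages entirely. The hypothetical `bootstrap_chain` function below lists every build step from one 1.x release to another, identifying releases by minor version.

```rust
/// A Rust 1.x release, identified by its minor version (53 means Rust 1.53).
type Minor = u32;

/// Lists the (builder, built) steps needed to get from `start` to `target`,
/// assuming each release is built only by the release immediately before it.
fn bootstrap_chain(start: Minor, target: Minor) -> Vec<(Minor, Minor)> {
    (start..target).map(|v| (v, v + 1)).collect()
}

fn main() {
    let chain = bootstrap_chain(0, 53);
    for (builder, built) in &chain {
        println!("rustc 1.{} builds rustc 1.{}", builder, built);
    }
    // One build per release step, on top of everything needed to reach 1.0.
    println!("{} builds from 1.0 to 1.53", chain.len());
}
```

A compiler that could build 1.53 directly would collapse all of these steps into one, which is the time saving in question.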

So GCC-RS could help with this, right?

Not really. In reality you only need to walk the entire chain on one architecture. Then you can use your fully-functional Rust compiler on e.g. x86 to build a compiler for ARM, HP PA or whatever else you might need. This is called cross-compilation, and is fully supported by Rust.

And shortening the chain on one architecture is a solved problem.

You see, you don’t need the full-blown compiler with all the validation and error messages and whatnot, you just need it to compile things correctly. That’s what mrustc is: a minimal Rust compiler written in C++ designed for bootstrapping and nothing else. It lets you bootstrap from C++ on x86 and cross-compile to any architecture from there.
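The arithmetic behind cross-compilation and mrustc can be sketched with the same simplified one-build-per-release model, counting compiler builds only. The number of architectures and the release mrustc starts from are illustrative parameters, not real figures (the mrustc README lists the rustc versions it actually supports), and building mrustc itself with a C++ compiler is not counted.

```rust
/// Compiler builds in a chain from Rust 1.`start` to 1.`target`,
/// assuming one build per release step.
fn chain_builds(start: u32, target: u32) -> u32 {
    target.saturating_sub(start)
}

/// Walk the whole chain natively on every architecture.
fn native_everywhere(start: u32, target: u32, arches: u32) -> u32 {
    chain_builds(start, target) * arches
}

/// Walk the chain once on a host, then have the final compiler
/// cross-compile one rustc for each remaining architecture.
fn walk_once_then_cross(start: u32, target: u32, arches: u32) -> u32 {
    chain_builds(start, target) + arches.saturating_sub(1)
}

/// Same as `walk_once_then_cross`, but the chain begins at a release that
/// mrustc builds directly. `mrustc_entry` is a hypothetical parameter.
fn with_mrustc(mrustc_entry: u32, target: u32, arches: u32) -> u32 {
    // +1 for mrustc itself building the entry release.
    1 + walk_once_then_cross(mrustc_entry, target, arches)
}

fn main() {
    let (target, arches) = (53, 5);
    println!("native chain on every arch: {}", native_everywhere(0, target, arches));
    println!("one chain + cross-compile:  {}", walk_once_then_cross(0, target, arches));
    println!("mrustc at 1.29 + cross:     {}", with_mrustc(29, target, arches));
}
```

Under these assumptions the chain is paid for once rather than once per architecture, which is why shortening it on a single host is enough.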

Other considerations

As you can see, every single benefit that GCC-RS lists can be provided by rustc_codegen_gcc and mrustc, without rewriting the compiler from scratch and at dramatically lower development and maintenance costs.

But what if they forgot to include some crucial benefit in the FAQ? Here are some other points people bring up in relation to GCC-RS:

Isn’t having multiple implementations a good thing?

Well, maybe? It didn’t work out for C/C++, but perhaps we can learn from that and do better. Still, the benefits of this are rather nebulous and I’m not convinced that they justify the costs.

Wouldn’t having an alternative implementation help specify the language?

Yes, that’s what miri is for. You feed it some Rust code and it tells you if it’s valid and whether your unsafe code triggers any undefined behavior or not.
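As an illustration of the kind of check Miri performs, here is a small piece of unsafe Rust. As written it is sound; the comment marks a one-character change that would read past the end of the slice. A normal build might still print a plausible-looking number in that case, whereas running the program under Miri (for example with `cargo +nightly miri run`) reports the out-of-bounds read as undefined behavior. The function name `sum_raw` is just for illustration.

```rust
/// Sums a slice by reading through a raw pointer instead of using safe indexing.
fn sum_raw(values: &[u32]) -> u32 {
    let ptr = values.as_ptr();
    let mut total = 0u32;
    // Changing `0..values.len()` to `0..=values.len()` would read one element past
    // the end: undefined behavior that Miri reports, even if a normal build appears to work.
    for i in 0..values.len() {
        // SAFETY: `i < values.len()`, so `ptr.add(i)` points to an initialized element.
        total += unsafe { *ptr.add(i) };
    }
    total
}

fn main() {
    println!("{}", sum_raw(&[1, 2, 3, 4])); // prints 10
}
```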

Isn’t Rust vulnerable to the Ken Thompson hack?

No, it is not. The “trusting trust” problem is already solved by mrustc.
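The reason an independent compiler helps is a technique called diverse double-compiling (DDC): rebuild the compiler’s source with a trusted, independent compiler, rebuild the same source again with the result, and compare that final binary with the one you were given. The sketch below is a toy model under heavily simplified assumptions: a “binary” is just a struct recording which source it behaves as, which compiler emitted it, and whether a self-propagating trojan is present, and compilation is assumed to be perfectly deterministic. `Binary`, `compile` and `ddc_check` are hypothetical names; real DDC compares actual executables bit for bit and requires reproducible builds.

```rust
/// Toy model of a compiler executable (hypothetical; real DDC compares actual bytes).
#[derive(Clone, Debug, PartialEq)]
struct Binary {
    /// The source this binary was built from, which determines how it compiles code.
    behaves_as: String,
    /// The source of the compiler that emitted this binary's machine code.
    emitted_by: String,
    /// Whether a "trusting trust" trojan is present.
    trojan: bool,
}

/// Deterministic toy compilation: the output behaves as `source`, its machine code
/// is shaped by the compiling compiler, and a trojan copies itself into the output.
fn compile(compiler: &Binary, source: &str) -> Binary {
    Binary {
        behaves_as: source.to_string(),
        emitted_by: compiler.behaves_as.clone(),
        trojan: compiler.trojan,
    }
}

/// Diverse double-compiling: build `source` with the trusted compiler, rebuild it
/// with the result, and check that the outcome is identical to `under_test`.
fn ddc_check(source: &str, under_test: &Binary, trusted: &Binary) -> bool {
    let stage1 = compile(trusted, source); // behaves like `source`, emitted by the trusted compiler
    let stage2 = compile(&stage1, source); // behaves like `source`, emitted by `source` itself
    stage2 == *under_test
}

fn main() {
    let trusted = Binary { behaves_as: "mrustc".into(), emitted_by: "gcc".into(), trojan: false };
    let honest = Binary { behaves_as: "rustc".into(), emitted_by: "rustc".into(), trojan: false };
    let backdoored = Binary { trojan: true, ..honest.clone() };
    println!("honest rustc passes: {}", ddc_check("rustc", &honest, &trusted));
    println!("backdoored rustc passes: {}", ddc_check("rustc", &backdoored, &trusted));
}
```

The check only means something if the trusted compiler shares no ancestry with the one under test, which is the role an independent implementation like mrustc can play.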

libgccjit.so is annoying to package for Linux!

Speaking as a former Debian maintainer — yes, it is mildly annoying, but it has to be done anyway, so GCC-RS doesn’t help here.

rustc_codegen_gcc relies on MIR which is unstable!

Keeping up with the changes to MIR is much easier than keeping up with the changes to the entire language. And that’s ignoring the enormous up-front investment that a full compiler rewrite would entail.

Supporting multiple GCC versions in rustc_codegen_gcc will be difficult!

Yeah, so just don’t do it! Every release of GCC-RS targets a single specific GCC version to avoid this issue. rustc_codegen_gcc could trivially do the same.

Doesn’t GCC-RS reuse the borrow checker written in Rust?

Not the production-ready borrow checker, but the experimental one.

But yes, it does! They’ve reused 5,000 lines of Rust. Only 465,000 lines to go!

I hear GCC-RS has funding!

Yes, one full-time developer and a part-time project manager for one year. For rewriting the entire Rust compiler from scratch, that’s underwhelming.

The company providing the funding mentioned that they’ve failed to get anyone else interested in funding GCC-RS. Coincidence? I think not!

Conclusion

I believe the rewrite of the Rust compiler in C++ that the GCC-RS project is attempting is completely unjustified. The gargantuan effort required to make it a reality would be better spent elsewhere.

These projects will provide all the listed benefits at a dramatically lower cost: rustc_codegen_gcc, mrustc and miri.

Ultimately, GCC-RS might provide some value that I’m not seeing. But if you care about portability to obscure platforms, language specification or bootstrapping time, I encourage you to support one of these projects rather than GCC-RS. They should provide a far greater return on investment.

I’m putting my money where my mouth is and will be supporting rustc_codegen_gcc on GitHub Sponsors starting in June.
