Exceptions
For the past couple of months, I’ve been trying to get exceptions to work on AVR as a hobby project. After a lot of work, I’ve done it: exceptions work, and you can just include <vector> and it works. I can’t say it’s really worth it, and in retrospect there are already limited STL libraries that I should have looked into as prior art, but those don’t use exceptions anyway.
How to use it
Currently I publish prebuilt toolchains on GitHub. The toolchain works via a wrapper script called avr-g++.sh, which runs the necessary unwind data generation and linking steps for you. My changes are currently a bunch of patches to gcc 13.2.0, binutils, and a few others.
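As a rough idea of the kind of code this is meant to enable, here is a minimal sketch of ordinary exception-based error handling with <vector>. The helper name element_or is hypothetical and only for illustration, and nothing here is specific to the toolchain itself.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns values[index], or `fallback` when the
// index is out of range.
int element_or(const std::vector<int>& values, std::size_t index, int fallback) {
    try {
        // at() performs a bounds check and throws std::out_of_range on failure.
        return values.at(index);
    } catch (const std::out_of_range&) {
        // The unwinder brings control here from inside at().
        return fallback;
    }
}
```

Calling element_or with a valid index returns the stored element, while an index past the end takes the catch path and returns the fallback.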
What I actually did
In a nutshell, I implemented a replacement for libunwind on AVR. For those who don’t know, libunwind is a library that allows you to unwind the stack and create stack traces. It’s also built into libgcc, and is the principal mechanism behind exceptions in C++.
Why libunwind doesn’t work
At first, I tried actually porting libunwind to AVR. The problem is that AVR is a very limited platform, and a straight port of libunwind results in a 12k binary, on a microcontroller that might have only 32k of flash. Using over a third of flash for an error-handling mechanism that the user may not have even asked for seems kind of bad, so naturally I went ahead and reimplemented the whole thing in assembly.
Why is libunwind big
I suspect libunwind is so large for two major reasons. First, the ABI is defined in terms of 32-bit words, and AVR is an 8-bit architecture. This means that even basic arithmetic on a 32-bit int takes 4 loads, 4 registers, and 4 adds on AVR. This is kind of degenerate, but it is needed all the time because libunwind assumes a 32-bit architecture.
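To make the cost concrete, here is a small sketch of what a single 32-bit addition looks like when only 8-bit operations are available. The type and function names are made up for illustration. On AVR itself, the compiler would emit roughly one add followed by three add-with-carry instructions, plus whatever loads are needed to get the eight input bytes into registers.

```cpp
#include <cstdint>

// A 32-bit value held as four separate bytes, least significant first,
// the way it would occupy four 8-bit registers.
struct U32Bytes { std::uint8_t b[4]; };

U32Bytes to_bytes(std::uint32_t v) {
    U32Bytes r{};
    for (int i = 0; i < 4; ++i) r.b[i] = static_cast<std::uint8_t>(v >> (8 * i));
    return r;
}

std::uint32_t from_bytes(const U32Bytes& x) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(x.b[i]) << (8 * i);
    return v;
}

// Adds two 32-bit values one byte at a time, propagating the carry.
U32Bytes add32(const U32Bytes& x, const U32Bytes& y) {
    U32Bytes r{};
    unsigned carry = 0;
    for (int i = 0; i < 4; ++i) {
        unsigned sum = unsigned(x.b[i]) + unsigned(y.b[i]) + carry;
        r.b[i] = static_cast<std::uint8_t>(sum & 0xFF);  // low 8 bits of this step
        carry = sum >> 8;                                // 0 or 1, fed to the next byte
    }
    return r;  // a carry out of the top byte is discarded, as with uint32_t
}
```

Every field in a 32-bit unwinding struct pays this four-step cost on each update, which is part of why the code adds up so quickly.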
Secondly, libunwind only lazily unwinds the stack. It actually parses the stack, updating a struct that holds the unwound state. When time comes to unwind, special architecture-specific functions are called to actually load the state into registers. This is convenient to avoid writing everything in C, but also means that libunwind must manipulate various structs (of which there are many) during unwinding. On normal processors this is fine, but on AVR this is actually somewhat costly due to the aforementioned issues with 32-bit integers.
Why is my solution better
My solution parses a bytecode that specifies how to unwind a frame, then parses a compiler generated LSDA section to see if handlers need to be called. The bytecode is 8-bit sized, which avoids the issues with 32-bit integers. It is also intentionally incredibly simplistic, in comparison to libunwind, which piggy-backs off of DWARF debug info. The issue with DWARF is that it is a general purpose debug information format, meaning the parser needs to be far more complex than really necessary. Even assuming you use a subset of DWARF, DWARF is also 32-bit aligned, which causes issues with AVR.
My solution ends up being about 2k in size, with 0.8k for the unwinding and 1.2k for parsing and LSDA handling. I haven’t run benchmarks, since I never got libunwind to work, but I expect my implementation would have been faster too.
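To illustrate the general idea of a byte-sized unwind bytecode, here is a toy interpreter. It is not the actual format or implementation: the opcodes, their encoding, and the Machine struct are all hypothetical, and the real thing is written in AVR assembly and operates on the hardware stack rather than a vector.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical one-byte instructions: the top 2 bits select the operation,
// the low 6 bits carry the operand.
enum : std::uint8_t {
    OP_SKIP    = 0x00,  // discard `operand` bytes of locals
    OP_POP_REG = 0x40,  // restore saved register number `operand`
    OP_FINISH  = 0x80   // pop the 2-byte return address and stop
};

// Simplified model of the CPU state that unwinding has to restore.
struct Machine {
    std::uint8_t regs[32] = {};
    std::vector<std::uint8_t> stack;  // back() is the top of the stack
    std::uint16_t return_addr = 0;
};

static std::uint8_t pop(Machine& m) {
    std::uint8_t v = m.stack.back();
    m.stack.pop_back();
    return v;
}

// Runs the unwind program for one frame. Returns true once OP_FINISH has
// restored the caller's return address, false if the program is malformed.
bool unwind_frame(Machine& m, const std::vector<std::uint8_t>& code) {
    for (std::uint8_t insn : code) {
        std::uint8_t op = insn & 0xC0;
        std::uint8_t arg = insn & 0x3F;
        switch (op) {
        case OP_SKIP:
            if (m.stack.size() < arg) return false;
            m.stack.resize(m.stack.size() - arg);
            break;
        case OP_POP_REG:
            if (arg >= 32 || m.stack.empty()) return false;
            m.regs[arg] = pop(m);
            break;
        case OP_FINISH: {
            if (m.stack.size() < 2) return false;
            std::uint8_t hi = pop(m);  // byte order here is an assumption of this model
            std::uint8_t lo = pop(m);
            m.return_addr = static_cast<std::uint16_t>((hi << 8) | lo);
            return true;
        }
        default:
            return false;  // unknown opcode
        }
    }
    return false;  // ran off the end without OP_FINISH
}
```

In this model, a frame with two bytes of locals and one saved register (r28) is described by just three bytes: 0x02, 0x5C, 0x80. Everything the interpreter touches is a single byte wide, so no 32-bit arithmetic is involved.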
Implications for other platforms
My implementation is really only better because of the limitations of AVR, and many of its advantages are lessened on more capable platforms. That said, ARM actually uses a similar framework in the ARM EH ABI, which uses a similar bytecode.
Future Work
The bytecode is currently generated by a separate program, but obviously if I wanted to upstream it I’d have to integrate it with GCC. I also currently keep all unwinding information, but this can be improved by pruning functions that never throw. The unwinding table can also probably be optimized with address-based lookups or something like that, but that would require writing a linker plugin and probably extending the current linker plugin interface.
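One possible shape for an address-based lookup is a table sorted by function start address and searched with a binary search. This is only a sketch of the idea, not something the current toolchain does. The TableEntry layout and the find_entry name are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical table entry: one per function, sorted by start address.
struct TableEntry {
    std::uint16_t start;       // first code address of the function
    std::uint16_t unwind_off;  // offset of that function's unwind bytecode
};

// Returns the entry whose function contains `pc`, i.e. the last entry with
// start <= pc, or nullptr if `pc` lies before the first function.
// Assumes the table is sorted by `start` and functions are contiguous.
const TableEntry* find_entry(const std::vector<TableEntry>& table, std::uint16_t pc) {
    // upper_bound finds the first entry whose start is strictly greater than pc.
    auto it = std::upper_bound(
        table.begin(), table.end(), pc,
        [](std::uint16_t addr, const TableEntry& e) { return addr < e.start; });
    if (it == table.begin()) return nullptr;
    return &*(it - 1);
}
```

The search takes a logarithmic number of comparisons instead of a linear scan, but it depends on the table being sorted by final addresses, which are only known at link time.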
Using this on Arduino
Unfortunately, using this with Arduino on Windows has proven difficult, since the way the IDE launches programs does not allow scripts to be used. I’ve tried getting around this via ps2exe, but the way it parses arguments means that doesn’t work. I suspect that it might work for Arduino on Linux, but I don’t really have the means to test this myself.