
Modifying GCC

Quite a while ago, I integrated my exceptions work into a fork of GCC. At that point, I was already modifying GCC to suppress the libunwind symbols, so putting all of my code into GCC was a no-brainer. Moreover, it isn't actually that hard to build GCC.

Editing GCC

GCC is built with the autoconf build system. Autoconf is a collection of scripts somewhat like CMake, but written entirely in shell script and intended to be portable to any system that runs shell scripts and has a C compiler. Autoconf is used to adapt programs to all Unix variants, and as such assumes very little about the host platform. In any case, I originally based my build script off of ZakKemble's build script, although eventually I would deviate quite significantly from it.

Learning how to compile GCC took a few days, and at first I built GCC in a Docker container every single time. Eventually, though, I figured out it was better to just compile it locally and rerun the makefile to pick up changes. Now, every commit gets tested in a Docker build to make sure my changes are portable. Autoconf, by virtue of being a shell script, is horrifically slow, but there's not much I can do about that.

Moreover, GCC is very large. I typically use VS Code for editing, but the sheer number of files doesn't fit in VS Code's sidebar. Because of that, I've grown accustomed to using a bare Sublime Text editor by itself. Programming with no IDE or autocomplete is surprisingly not that bad, but it means I've had to dive into GCC's internal documentation often to understand the code. Notably, I find GCC's internal documentation to be surprisingly complete, far more complete than Clang's.

libgcc

As you may already know, I've implemented the unwind functions in AVR assembly. In order to move my work to GCC, the first task was to move my unwind routines into GCC. Oddly enough, the libunwind functions are actually placed inside libgcc, the basic runtime library most programs are linked with, instead of libsupc++, the basic support library for C++. This is because the unwind mechanism is (meant to be) language- and compiler-agnostic: theoretically, an exception thrown from one language can be caught in another. There is code in GCC for catching Java and Ada exceptions, but I suspect nobody actually uses it. In any case, moving my code to libgcc is as simple as dropping it into the AVR-specific directory and listing the file in the target makefile fragment so it gets compiled.
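For reference, libgcc picks up extra exception-handling sources through its per-target makefile fragments. A minimal sketch of the change, assuming a source file named avr_unwind.S (the file name is a placeholder, not the actual one):

```make
# libgcc/config/avr/t-avr (sketch; avr_unwind.S is a placeholder name)
# LIB2ADDEH holds the extra sources compiled into libgcc's EH support.
LIB2ADDEH += $(srcdir)/config/avr/avr_unwind.S
```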

Frame data emission

I was investigating how GCC emits exception frame data when I found out that GCC does not actually emit the data itself. Instead, oddly enough, it is done by the assembler: GCC emits .cfi_* directives, which the assembler then turns into .eh_frame sections. This meant I had to fork not only GCC but also binutils in order to replace the exception information emitter.
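As a schematic example of that division of labor on a typical desktop target (x86-64 here, not AVR): the compiler prints the annotations, and GAS lowers them into .eh_frame records.

```asm
# Schematic compiler output for a trivial function on x86-64 (not AVR):
foo:
	.cfi_startproc            # open a frame description entry (FDE)
	pushq   %rbp
	.cfi_def_cfa_offset 16    # the CFA is now 16 bytes above %rsp
	.cfi_offset %rbp, -16     # %rbp was saved 16 bytes below the CFA
	movq    %rsp, %rbp
	.cfi_def_cfa_register %rbp
	popq    %rbp
	.cfi_def_cfa %rsp, 8      # frame torn down; CFA is %rsp + 8 again
	ret
	.cfi_endproc              # GAS lowers the directives into .eh_frame here
```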

Fortunately, it seems ARM had already beaten me to the punch. ARM, as it turns out, converged on a similar design for their EHABI, which also uses unwind instructions. In any case, I found the code they used to emit ARM unwinding instructions and replaced it with my own directives. Similarly to ARM, my code emits a function start and end directive, then directives for popping registers and allocating stack space. That said, there was a hiccup when interpreting stack adjustments. Because AVR registers are 8-bit, there normally wouldn't be enough bits to represent meaningful amounts of stack or code addresses. For that reason, the stack pointer and program counter are 16-bit, with the stack pointer accessed as two I/O registers rather than a normal register. This causes the RTL emitted for stack adjustments to take the form of a read-modify-write sequence rather than a normal pointer adjustment.
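For comparison, ARM's EHABI annotations (which served as the model) look like this; my AVR directives follow the same pattern, though the exact names and encodings differ.

```asm
@ ARM EHABI unwind annotations (GAS syntax)
func:
	.fnstart            @ begin unwind info for this function
	push    {r4, lr}
	.save   {r4, lr}    @ record which registers the prologue pushed
	sub     sp, sp, #16
	.pad    #16         @ record 16 bytes of stack allocation
	add     sp, sp, #16
	pop     {r4, pc}
	.fnend              @ assembler emits the index/table entry here
```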

After modifying both GCC and binutils, my code just worked: I was able to unwind the stack just as before. Moreover, because I was already forking binutils, I was able to modify the default linker script for AVR, meaning there was no longer any need for a special one. Because of that, my GCC fork works with exceptions out of the box.

Crosstool-NG

Several months later, I was looking into how to build GCC when I rediscovered crosstool-NG. I hadn't tried it earlier due to a lack of experience, but I figured crosstool-NG would be better than my ad-hoc build script. As it turns out, crosstool-NG is really easy to use if you've already compiled GCC manually before. You specify compile flags and the options you want, and it just works (on Linux, at least; I have no experience with anything else). It takes care of downloading all of the dependencies, and you can point it at your own git repo or have it apply patches if you're modifying GCC. I did have to modify the script lightly, but since it's a shell script that wasn't too bad.
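Roughly, the workflow looks like the following; the sample name is illustrative, and the custom repo and patches are set through the menuconfig options.

```sh
ct-ng list-samples     # list the bundled example configurations
ct-ng avr              # start from an AVR sample (name illustrative)
ct-ng menuconfig       # point it at a custom GCC repo, add patches, set flags
ct-ng build            # downloads, patches, and builds the whole toolchain
```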

What now?

As it turns out, modifying compilers isn't that hard. I'll probably work on porting the build to Windows and macOS so I can release this for Arduino users, but other than that I'll probably be done with AVR. After this, I'll probably move on to exception optimization research. I'm not sure I've mentioned this before, but I'm friends with Khalil Estell, who has been independently researching exception handling optimization on ARM since before I got to know him. I'll probably try to do some exception optimization of my own on ARM, forking either GCC again or Clang. In particular, I want to do link-time exception optimizations.

Exceptions

For the past couple of months, I've been trying to get exceptions to work on AVR as a hobby project. After several months of work, I've done it. Exceptions work: you can just include <vector> and it works. I can't say it's really worth it, and in retrospect there are already limited STL libraries that I should have looked into as prior art, but those don't use exceptions anyway.

How to use it

Currently, I put prebuilt toolchains on GitHub. It works via a wrapper script called avr-g++.sh, which runs the necessary unwind data generation and linking for you. My changes are currently a bunch of patches to GCC 13.2.0, binutils, and a few other components.
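Invocation looks the same as calling avr-g++ directly; a minimal example, with the MCU and file names as placeholders:

```sh
# Build with exceptions enabled via the wrapper (MCU and file names are examples).
./avr-g++.sh -mmcu=atmega2560 -Os -fexceptions main.cpp -o main.elf
```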

What I actually did

In a nutshell, I implemented a replacement for libunwind on AVR. For those who don't know, libunwind is a library that lets you unwind the stack and create stack traces. It's also built into libgcc, and it is the principal mechanism behind exceptions in C++.

Why libunwind doesn’t work

At first, I tried actually porting libunwind to AVR. The problem is that AVR is a very limited platform, and a straight port of libunwind results in a 12 KB binary, on a microcontroller that might have 32 KB of flash. Using a third of flash for an error-handling mechanism the user may not have even asked for seems kind of bad, so naturally I went ahead and reimplemented the whole thing in assembly.

Why is libunwind big

I suspect libunwind is so large due to two major factors. First, the ABI is defined in terms of 32-bit words, while AVR is an 8-bit architecture. This means 4 loads, 4 registers, and 4 adds are required just to do basic arithmetic on a 32-bit int. This is kind of degenerate, but it happens constantly because libunwind assumes a 32-bit architecture.
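For a concrete sense of the overhead, a single 32-bit add on AVR spreads each operand across four 8-bit registers and chains the carry through all of them:

```asm
; 32-bit add on AVR: operands in r22..r25 and r18..r21 (example register choice)
add  r22, r18    ; lowest byte
adc  r23, r19    ; propagate the carry upward
adc  r24, r20
adc  r25, r21    ; highest byte
```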

Secondly, libunwind unwinds the stack lazily. It walks the stack, updating a struct that holds the unwound state; when the time comes to actually unwind, architecture-specific functions load that state into the real registers. This is convenient because it avoids having to write everything in assembly, but it also means libunwind must manipulate various structs (of which there are many) during unwinding. On normal processors this is fine, but on AVR it is actually somewhat costly due to the aforementioned issues with 32-bit integers.

Why is my solution better

My solution parses a bytecode that specifies how to unwind a frame, then parses a compiler-generated LSDA section to see if any handlers need to be called. The bytecode uses 8-bit units, which avoids the issues with 32-bit integers. It is also intentionally very simple compared to libunwind, which piggybacks on DWARF debug info. The issue with DWARF is that it is a general-purpose debug information format, so the parser needs to be far more complex than really necessary. And even if you use only a subset of DWARF, DWARF is 32-bit aligned, which causes issues on AVR.
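To make the idea concrete, here is a hypothetical sketch in C of what interpreting such an 8-bit unwind bytecode could look like. The opcode values, operand packing, and frame layout are invented for illustration; this is not the actual encoding my unwinder uses (which is written in assembly).

```c
#include <stdint.h>

/* Hypothetical 8-bit unwind opcodes -- invented for illustration only. */
enum { OP_FINISH = 0x00, OP_POP_REG = 0x40, OP_ADJ_SP = 0x80 };

struct frame_state {
    const uint8_t *sp;     /* current (virtual) stack pointer */
    uint8_t regs[32];      /* reconstructed register file */
};

/* Walk one frame's unwind program. Every opcode is a single byte with the
   operand packed into the low bits, so no 32-bit decoding is ever needed. */
static void unwind_one_frame(const uint8_t *program, struct frame_state *fs)
{
    for (;;) {
        uint8_t op = *program++;
        if (op == OP_FINISH) {
            return;                              /* frame fully restored */
        } else if ((op & 0xC0) == OP_POP_REG) {
            uint8_t reg = op & 0x1F;             /* which register was saved */
            fs->regs[reg] = *fs->sp++;           /* reload it from the stack */
        } else if ((op & 0xC0) == OP_ADJ_SP) {
            fs->sp += (op & 0x3F) + 1;           /* pop 1..64 bytes of locals */
        } else {
            return;                              /* unknown opcode: stop (sketch only) */
        }
    }
}
```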

My solution ends up being about 2 KB in size, with 0.8 KB for the unwinder and 1.2 KB for parsing and LSDA handling. I haven't done benchmarks, given that I never got libunwind to work, but it's safe to say my implementation probably would have been faster too.

Implications for other platforms

My implementation is only really better because of AVR's limitations, and many of the advantages are lessened on more capable platforms. That said, ARM actually uses a similar scheme in its exception-handling ABI (EHABI), which is likewise based on an unwind bytecode.

Future Work

The bytecode is currently generated by a separate program, but obviously if I want to upstream this I'll have to integrate it into GCC. I also currently keep all unwinding information, which could be improved by pruning functions that never throw. The unwinding table could probably also be optimized with address-based lookups or something similar, but that would require writing a linker plugin and probably extending the current linker plugin interface.

Using this on Arduino

Unfortunately, using this with Arduino on Windows has proven difficult, since the way the IDE launches toolchain programs does not allow scripts to be used. I've tried getting around this via ps2exe, but the way it parses arguments means that doesn't work. I suspect it might work for Arduino on Linux, but I don't really have the means to test this myself.

Blog Infrastructure

As is tradition, I've hosted the blog manually using AWS. Last time, I configured an Apache server by hand. However, as I grew more knowledgeable about web development, I realized that this was a woefully outdated solution. So here's my retelling of how I created this blog.

Hugo

I began by using Hugo as my static website generator. I've had enough experience with it that using it was relatively painless, though there were a few stumbling points along the way. The theme documentation referred to a config.yaml while Hugo preferred hugo.yaml; as it turns out, config.yaml has been transitioned away from, and only one of the two should exist. Also, the theme module needs to be downloaded, otherwise the website silently breaks. This can be done with hugo mod get. The module must also be referred to by its module path, not its theme name, in hugo.yaml.
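For reference, the relevant bits of the configuration end up looking something like this; the theme's module path below is a placeholder.

```yaml
# hugo.yaml (sketch; the theme's module path is a placeholder)
baseURL: https://blog.adl-developments.com/
module:
  imports:
    - path: github.com/example/hugo-theme   # module path, not the theme's display name
# fetched beforehand with: hugo mod get github.com/example/hugo-theme
```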

Hosting

Previously, I had hosted a blog using an Apache server. Fundamentally there's no issue with that, but renting a server 24/7 is very expensive. There are more specialized solutions whose cost scales with usage instead of hogging a server instance around the clock.

Storage

I've already had experience setting up cloud storage with Amazon, so this is more of a light skim. Amazon's storage service is Simple Storage Service, or S3. Objects in S3 are put into buckets, which are basically filesystems. Inside buckets are objects and directories full of objects, which can be downloaded and uploaded using the AWS CLI or external tools. I use rclone for S3 uploads.
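The upload itself is a one-liner; the remote and bucket names below are placeholders configured in rclone.

```sh
# Sync Hugo's output directory to the bucket ("s3remote" and "my-blog-bucket"
# are placeholder names from rclone's configuration).
rclone sync public/ s3remote:my-blog-bucket
```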

S3 Static Hosting

At first, I found documentation on hosting a website from S3 directly. It requires setting bucket permissions to public, which allows anybody with the URL to GetObject the contents of the bucket. This works, but it lacks CDN and HTTPS support. For this reason, I found the S3 direct hosting method to be unsuitable.
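For reference, the public-read setup boils down to a bucket policy along these lines (the bucket name is a placeholder), which is also why I moved away from it.

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-blog-bucket/*"
  }]
}
```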

CloudFront

This leads into CloudFront, AWS's content delivery network (CDN) service. For reference, a CDN is a set of "edge" servers that hold a cache of your website. When someone accesses your website, content comes from a server near the client rather than all the way from the origin server. This has the additional benefit of letting you hide your origin, protecting it from DoS attacks.

CloudFront has "distributions," each of which caches content from one origin you configure. I created a distribution that points at the S3 bucket containing the built website. It also had to be configured with the alternate domain name blog.adl-developments.com, and a CloudFront function is used to redirect people from adl-developments.com to blog.adl-developments.com.
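CloudFront Functions are written in a restricted JavaScript runtime; a redirect along these lines does the job (a sketch, not the exact function running on this site).

```js
// CloudFront Function (viewer request): redirect the apex domain to the blog
// subdomain. A sketch, not the exact function used on this site.
function handler(event) {
    var request = event.request;
    var host = request.headers.host && request.headers.host.value;
    if (host === 'adl-developments.com') {
        return {
            statusCode: 301,
            statusDescription: 'Moved Permanently',
            headers: {
                location: { value: 'https://blog.adl-developments.com' + request.uri }
            }
        };
    }
    return request;
}
```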

Domain Registration

AWS does everything, including domain name registration. Because walled gardens are convenient, AWS makes it very easy to point a registered domain towards CloudFront. AWS calls its DNS service Route 53, and domains are referred to as "hosted zones." The rules for how traffic is directed are called "records." I used A alias records to direct adl-developments.com and its subdomains to the CloudFront distribution, which can redirect and serve content as necessary.

SSL Certificate

Of course, in order to use HTTPS we need a signed SSL certificate to verify who we are. Amazon once again provides this through AWS Certificate Manager, which issues certificates signed by Amazon that certify who we are. ACM is free of charge, although the certificates only work with other AWS services, and they won't show any information other than "signed by Amazon."

Continuous Integration

Previously, I simply had a script build the website and upload it to S3. That said, I wanted to try out Continuous Integration, where code uploaded to a repository is automatically built, tested, and deployed.

Github Actions

GitHub provides a CI service called GitHub Actions. Actions run, depending on configuration, after a push or pull request. Here, I use some third-party actions to build and deploy the website automatically.

The workflow needs access to AWS and Hugo, so I use aws-actions/configure-aws-credentials and peaceiris/actions-hugo to set up access. Initially, I used an AWS user with an access key, which I replaced later. Then I simply run the shell commands to build the website and copy the public directory to S3, and finally tell CloudFront to invalidate and update its cache.
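Put together, the workflow looks roughly like the sketch below. This is the earlier access-key version (replaced later, as described in the next section); the bucket name, distribution ID, region, and secret names are all placeholders.

```yaml
# .github/workflows/deploy.yml (sketch; names and IDs are placeholders)
name: Deploy blog
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: peaceiris/actions-hugo@v2
        with:
          hugo-version: 'latest'
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - run: hugo --minify
      - run: aws s3 sync public/ s3://my-blog-bucket --delete
      - run: aws cloudfront create-invalidation --distribution-id EXAMPLE123ABC --paths "/*"
```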

OpenID

This solution, however, requires storing a long-term key in GitHub's secrets. While not necessarily bad, one can do better. Fortunately, GitHub is an OpenID Connect provider and can vouch for your CI pipeline to AWS. To use this, you tell AWS to accept GitHub as an OpenID Connect identity provider, then create a role with permissions to upload to S3 and create invalidations on CloudFront. You also need to grant the CI pipeline access to its own identity token.
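The workflow changes are small: grant the job permission to request an ID token and swap the access-key inputs for a role to assume (a sketch; the role ARN is a placeholder).

```yaml
# Workflow changes for OIDC (sketch; the role ARN is a placeholder)
permissions:
  id-token: write   # lets the job request its own OIDC token
  contents: read
# ... the credentials step then assumes a role instead of using secrets:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/blog-deploy
          aws-region: us-east-1
```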

Conclusion

All that said, I can safely say this is the most modern blog I've ever developed. Unfortunately, it's mostly constrained to AWS's walled garden, but I imagine it could be ported to Cloudflare or some other service with relative ease. It also took a very long time, but like most infrastructure it should require relatively little maintenance.

TLDR

I put the website on S3, distribute it with CloudFront, and deploy it using GitHub Actions.

First

At this point, this is my third time writing a blog, and the first since I became an adult. Since I now have a semi-stable source of income, I can actually register a domain name and rent server resources with confidence that I can pay for them in the future.

In any case, I'll probably post about personal projects I've done. First, though, I should put up some of my more recent projects.