Exceptions Explained Part 3: libunwind

Last time I mentioned libunwind, but now I’m going to go over the implementation of libunwind in GCC. As mentioned before, libunwind used to be a library developed by HP, but it got vendored into GCC as a part of libgcc. libgcc itself is a small library containing, besides libunwind, various intrinsics for floating point and integer math.

libunwind

One might reasonably ask the question of how one can unwind a stack while using functions that modify the stack? Moreover, how can libunwind be cross-platform when it clearly needs to reference architecture-specific registers? The answer is that register state is stored in a struct entirely separately from the real registers. “Unwinding” in libunwind is simply updating the register state in that struct. This struct is known as the unwind context, and is the central data structure behind libunwind. When unwinding is finished, the context is “installed” into the cpu via architecture-specific instructions that read the struct into the registers.

API

As mentioned before, libunwind exposes:

_Unwind_RaiseException
_Unwind_Resume
_Unwind_DeleteException

However, it also exposes the functions:

_Unwind_GetGR
_Unwind_SetGR
_Unwind_GetIP
_Unwind_SetIP
_Unwind_GetRegionStart
_Unwind_GetLanguageSpecificData
_Unwind_ForcedUnwind

The GR functions are for referencing general registers, while the IP functions reference the instruction pointer. Both are used when jumping to the launch pad, as well as general unwinding. GetRegionStart is used to find the beginning of the function, which is necessary when parsing the LSDA.

Control Flow

When _Unwind_RaiseException is called, it first creates the context and starts “Phase 1” as described in the Itanium ABI. It iterates through the call stacks using uw_frame_state_for to load the function frame until a function with a personality routine is found. At that point, “Phase 2” is executed. Phase 2 checks the stop handler, defined if _Unwind_ForcedUnwind is called, then checks if there is a personality. If the personality exists and wishes to stop, then the current context is installed. Otherwise, unwinding resumes.

_Unwind_Resume is similar to _Unwind_RaiseException, but doesn’t create a new exception object.

uw_frame_state_for is responsible for finding the CIE and FDE debug information mentioned last time and actually updating the information in the context. Like a responsible employee, it naturally delegates finding the FDE to _Unwind_Find_FDE, then uses execute_cfa_program to actually run the DWARF instructions.

_Unwind_Find_FDE

Naively, you might think that there’s no better way to find the FDE than a plain linear search. After all, the descriptors are variable length and not even sorted. In memory-constrained environments, this is the case. However, GCC was pleasant enough to find a way to construct a btree out of the FDE’s to make lookup logarithmic. Thanks to the contributions of some database architects and a GCC contributor, this btree is also under a rw lock, meaning it scales across threads. The reason why this is a data structure constructed at program start¹ is because of the presence of shared libraries. Exception information for shared libraries must be added and removed from the btree when they are loaded and unloaded², meaning that a rw-lock is necessary.

execute_cfa_program

execute_cfa_program is a very long program dedicated to parsing DWARF instructions. Fortunately there is only a small subset of instructions used in FDEs, so it’s not that bad. DWARF instructions used in unwinding are prefixed with DW_CFA for DWARF Canonical Frame Address instructions. The instructions are as such³:

DW_CFA_advance_loc: increments the instruction pointer
DW_CFA_advance_loc1: similar to above, but uses a 1 byte argument
DW_CFA_advance_loc2: same as above, but 2 bytes
DW_CFA_advance_loc4: 4 bytes
DW_CFA_offset: sets a register to an offset of the stack (represents pushing a register)
DW_CFA_offset_extended: similar to above, takes uLEB values instead
DW_CFA_restore: restore a register
DW_CFA_restore_extended: similar to above, takes uLEB values
DW_CFA_set_loc: set the instruction pointer to a value
DW_CFA_undefined: the register value is not defined by the caller
DW_CFA_same_value: the register value is the same as before
DW_CFA_register: the register is moved to another register
DW_CFA_remember_state / DW_CFA_restore_state: push and pop the register state off a stack. This is apparently used when parts of the epilogue is in the body of a function.
DW_CFA_def_cfa: The frame size is an offset of a register and a constant
DW_CFA_def_cfa_register: This changes the frame size register
DW_CFA_def_cfa_offset: This changes the offset of the frame size
DW_CFA_nop: does nothing

These instructions are executed until the current instruction pointer is reached. After that, they are used to restore the registers as described by the instructions.

Criticism

My major criticisms are with the usage of DWARF and the two-phase setup. DWARF is non-trivial to decode, and definitely does not belong in any performance-critical code. As a reminder, this has to happen every single function called until a catch block is found. This is worsened by the fact that libunwind utilizes a two-phase system for no discernable reason. Libunwind unwinds the stack once, calls the personality routine to ask if they want to stop, then unwinds again, then asks the personality routine again if they want to do anything.

There is no reason why the personality routine couldn’t be called just once. In fact, that is literally what I did. In AVR exceptions, I have replaced the GCC personality routine and libunwind with my own implementation that does not call itself twice. The difference in code size is literally an order of magnitude.

See __do_global_ctors1 in crtstuff.c, ↩︎
See unwind-dw2-fde-dip.c ↩︎
See https://dwarfstd.org/doc/dwarf-2.0.0.pdf ↩︎

libunwind#

API#

Control Flow#

_Unwind_Find_FDE#

execute_cfa_program#

Criticism#

libunwind

API

Control Flow

_Unwind_Find_FDE

execute_cfa_program

Criticism