Last time I mentioned libunwind, but now I’m going to go over the implementation of libunwind in GCC. As mentioned before, libunwind used to be a library developed by HP, but it got vendored into GCC as a part of libgcc. libgcc itself is a small library containing, besides libunwind, various intrinsics for floating point and integer math.

libunwind

One might reasonably ask the question of how one can unwind a stack while using functions that modify the stack? Moreover, how can libunwind be cross-platform when it clearly needs to reference architecture-specific registers? The answer is that register state is stored in a struct entirely separately from the real registers. “Unwinding” in libunwind is simply updating the register state in that struct. This struct is known as the unwind context, and is the central data structure behind libunwind. When unwinding is finished, the context is “installed” into the cpu via architecture-specific instructions that read the struct into the registers.

API

As mentioned before, libunwind exposes:

  • _Unwind_RaiseException
  • _Unwind_Resume
  • _Unwind_DeleteException

However, it also exposes the functions:

  • _Unwind_GetGR
  • _Unwind_SetGR
  • _Unwind_GetIP
  • _Unwind_SetIP
  • _Unwind_GetRegionStart
  • _Unwind_GetLanguageSpecificData
  • _Unwind_ForcedUnwind

The GR functions are for referencing general registers, while the IP functions reference the instruction pointer. Both are used when jumping to the launch pad, as well as general unwinding. GetRegionStart is used to find the beginning of the function, which is necessary when parsing the LSDA.

Control Flow

When _Unwind_RaiseException is called, it first creates the context and starts “Phase 1” as described in the Itanium ABI. It iterates through the call stacks using uw_frame_state_for to load the function frame until a function with a personality routine is found. At that point, “Phase 2” is executed. Phase 2 checks the stop handler, defined if _Unwind_ForcedUnwind is called, then checks if there is a personality. If the personality exists and wishes to stop, then the current context is installed. Otherwise, unwinding resumes.

_Unwind_Resume is similar to _Unwind_RaiseException, but doesn’t create a new exception object.

uw_frame_state_for is responsible for finding the CIE and FDE debug information mentioned last time and actually updating the information in the context. Like a responsible employee, it naturally delegates finding the FDE to _Unwind_Find_FDE, then uses execute_cfa_program to actually run the DWARF instructions.

_Unwind_Find_FDE

Naively, you might think that there’s no better way to find the FDE than a plain linear search. After all, the descriptors are variable length and not even sorted. In memory-constrained environments, this is the case. However, GCC was pleasant enough to find a way to construct a btree out of the FDE’s to make lookup logarithmic. Thanks to the contributions of some database architects and a GCC contributor, this btree is also under a rw lock, meaning it scales across threads. The reason why this is a data structure constructed at program start1 is because of the presence of shared libraries. Exception information for shared libraries must be added and removed from the btree when they are loaded and unloaded2, meaning that a rw-lock is necessary.

execute_cfa_program

execute_cfa_program is a very long program dedicated to parsing DWARF instructions. Fortunately there is only a small subset of instructions used in FDEs, so it’s not that bad. DWARF instructions used in unwinding are prefixed with DW_CFA for DWARF Canonical Frame Address instructions. The instructions are as such3:

  • DW_CFA_advance_loc: increments the instruction pointer
  • DW_CFA_advance_loc1: similar to above, but uses a 1 byte argument
  • DW_CFA_advance_loc2: same as above, but 2 bytes
  • DW_CFA_advance_loc4: 4 bytes
  • DW_CFA_offset: sets a register to an offset of the stack (represents pushing a register)
  • DW_CFA_offset_extended: similar to above, takes uLEB values instead
  • DW_CFA_restore: restore a register
  • DW_CFA_restore_extended: similar to above, takes uLEB values
  • DW_CFA_set_loc: set the instruction pointer to a value
  • DW_CFA_undefined: the register value is not defined by the caller
  • DW_CFA_same_value: the register value is the same as before
  • DW_CFA_register: the register is moved to another register
  • DW_CFA_remember_state / DW_CFA_restore_state: push and pop the register state off a stack. This is apparently used when parts of the epilogue is in the body of a function.
  • DW_CFA_def_cfa: The frame size is an offset of a register and a constant
  • DW_CFA_def_cfa_register: This changes the frame size register
  • DW_CFA_def_cfa_offset: This changes the offset of the frame size
  • DW_CFA_nop: does nothing

These instructions are executed until the current instruction pointer is reached. After that, they are used to restore the registers as described by the instructions.

Criticism

My major criticisms are with the usage of DWARF and the two-phase setup. DWARF is non-trivial to decode, and definitely does not belong in any performance-critical code. As a reminder, this has to happen every single function called until a catch block is found. This is worsened by the fact that libunwind utilizes a two-phase system for no discernable reason. Libunwind unwinds the stack once, calls the personality routine to ask if they want to stop, then unwinds again, then asks the personality routine again if they want to do anything.

There is no reason why the personality routine couldn’t be called just once. In fact, that is literally what I did. In AVR exceptions, I have replaced the GCC personality routine and libunwind with my own implementation that does not call itself twice. The difference in code size is literally an order of magnitude.