Last time I mentioned libunwind, but now I’m going to go over the implementation
of libunwind in GCC. As mentioned before, libunwind used to be a library
developed by HP, but it got vendored into GCC as a part of libgcc
. libgcc
itself is a small library containing, besides libunwind, various intrinsics for
floating point and integer math.
libunwind
One might reasonably ask the question of how one can unwind a stack while using functions that modify the stack? Moreover, how can libunwind be cross-platform when it clearly needs to reference architecture-specific registers? The answer is that register state is stored in a struct entirely separately from the real registers. “Unwinding” in libunwind is simply updating the register state in that struct. This struct is known as the unwind context, and is the central data structure behind libunwind. When unwinding is finished, the context is “installed” into the cpu via architecture-specific instructions that read the struct into the registers.
API
As mentioned before, libunwind exposes:
_Unwind_RaiseException
_Unwind_Resume
_Unwind_DeleteException
However, it also exposes the functions:
_Unwind_GetGR
_Unwind_SetGR
_Unwind_GetIP
_Unwind_SetIP
_Unwind_GetRegionStart
_Unwind_GetLanguageSpecificData
_Unwind_ForcedUnwind
The GR functions are for referencing general registers, while the IP functions
reference the instruction pointer. Both are used when jumping to the launch pad,
as well as general unwinding. GetRegionStart
is used to find the beginning of
the function, which is necessary when parsing the LSDA.
Control Flow
When _Unwind_RaiseException
is called, it first creates the context and starts
“Phase 1” as described in the Itanium ABI. It iterates through the call stacks
using uw_frame_state_for
to load the function frame until a function with a
personality routine is found. At that point, “Phase 2” is executed. Phase 2
checks the stop handler, defined if _Unwind_ForcedUnwind
is called, then
checks if there is a personality. If the personality exists and wishes to stop,
then the current context is installed. Otherwise, unwinding resumes.
_Unwind_Resume
is similar to _Unwind_RaiseException
, but doesn’t create a
new exception object.
uw_frame_state_for
is responsible for finding the CIE and FDE debug
information mentioned last time and actually updating the information in the
context. Like a responsible employee, it naturally delegates finding the FDE to
_Unwind_Find_FDE
, then uses execute_cfa_program
to actually run the DWARF
instructions.
_Unwind_Find_FDE
Naively, you might think that there’s no better way to find the FDE than a plain linear search. After all, the descriptors are variable length and not even sorted. In memory-constrained environments, this is the case. However, GCC was pleasant enough to find a way to construct a btree out of the FDE’s to make lookup logarithmic. Thanks to the contributions of some database architects and a GCC contributor, this btree is also under a rw lock, meaning it scales across threads. The reason why this is a data structure constructed at program start1 is because of the presence of shared libraries. Exception information for shared libraries must be added and removed from the btree when they are loaded and unloaded2, meaning that a rw-lock is necessary.
execute_cfa_program
execute_cfa_program is a very long program dedicated to parsing DWARF instructions. Fortunately there is only a small subset of instructions used in FDEs, so it’s not that bad. DWARF instructions used in unwinding are prefixed with DW_CFA for DWARF Canonical Frame Address instructions. The instructions are as such3:
- DW_CFA_advance_loc: increments the instruction pointer
- DW_CFA_advance_loc1: similar to above, but uses a 1 byte argument
- DW_CFA_advance_loc2: same as above, but 2 bytes
- DW_CFA_advance_loc4: 4 bytes
- DW_CFA_offset: sets a register to an offset of the stack (represents pushing a register)
- DW_CFA_offset_extended: similar to above, takes uLEB values instead
- DW_CFA_restore: restore a register
- DW_CFA_restore_extended: similar to above, takes uLEB values
- DW_CFA_set_loc: set the instruction pointer to a value
- DW_CFA_undefined: the register value is not defined by the caller
- DW_CFA_same_value: the register value is the same as before
- DW_CFA_register: the register is moved to another register
- DW_CFA_remember_state / DW_CFA_restore_state: push and pop the register state off a stack. This is apparently used when parts of the epilogue is in the body of a function.
- DW_CFA_def_cfa: The frame size is an offset of a register and a constant
- DW_CFA_def_cfa_register: This changes the frame size register
- DW_CFA_def_cfa_offset: This changes the offset of the frame size
- DW_CFA_nop: does nothing
These instructions are executed until the current instruction pointer is reached. After that, they are used to restore the registers as described by the instructions.
Criticism
My major criticisms are with the usage of DWARF and the two-phase setup. DWARF is non-trivial to decode, and definitely does not belong in any performance-critical code. As a reminder, this has to happen every single function called until a catch block is found. This is worsened by the fact that libunwind utilizes a two-phase system for no discernable reason. Libunwind unwinds the stack once, calls the personality routine to ask if they want to stop, then unwinds again, then asks the personality routine again if they want to do anything.
There is no reason why the personality routine couldn’t be called just once. In fact, that is literally what I did. In AVR exceptions, I have replaced the GCC personality routine and libunwind with my own implementation that does not call itself twice. The difference in code size is literally an order of magnitude.
See __do_global_ctors1 in crtstuff.c, ↩︎