Module: firmware/stm32/Core/Src/fdir.c
Header: firmware/stm32/Core/Inc/fdir.h
Tests: firmware/tests/test_fdir.c — 9/9 green.
Phase: TRL-5 hardening, Phase 3.
FDIR is the satellite’s autonomous supervision layer. It sits above the hardware IWDG and the subsystem-level error handlers, and answers the only question operations actually cares about:
when something goes wrong, what does the satellite do next?
The module is deliberately advisory: it tracks faults, classifies
severity, and recommends a recovery action. The actual mode change
(bus reset, subsystem disable, safe-mode entry, reboot) is enacted
by the caller — the subsystem supervisor task, the safe-mode
manager, or error_handler.c. Keeping FDIR advisory means:

    LOG_ONLY
     └─► RETRY                         (transient: retry the operation)
          └─► RESET_BUS                (peripheral / bus reset sequence)
               └─► DISABLE_SUBSYS      (isolate the failing subsystem;
                                        mission continues in degraded mode)
                    └─► SAFE_MODE      (beacon-only, min power, stable
                                        attitude, wait for ground)
                         └─► REBOOT    (NVIC_SystemReset after persisting
                                        the reason in non-volatile fault log)

Each fault has a primary action and an escalation action;
escalation kicks in once the fault has fired threshold times
(the per-fault value in the threshold column of the policy table)
inside FDIR_RECENT_WINDOW_MS (default 60 s).
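
The escalation rule can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the code in fdir.c: the `sketch_` names are hypothetical, the enum members `RECOVERY_LOG_ONLY`, `RECOVERY_RETRY` and `RECOVERY_REBOOT` are extrapolated from the naming pattern of the members that do appear in this document, and the window is modelled as a fixed window that opens at the first report (the real module may use a true sliding window).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RECENT_WINDOW_MS 60000u  /* mirrors the documented 60 s default */

/* Assumed to be ordered by severity, mildest first. */
typedef enum {
    RECOVERY_LOG_ONLY = 0,
    RECOVERY_RETRY,
    RECOVERY_RESET_BUS,
    RECOVERY_DISABLE_SUBSYS,
    RECOVERY_SAFE_MODE,
    RECOVERY_REBOOT
} sketch_action_t;

/* Hypothetical per-fault record: policy plus window bookkeeping. */
typedef struct {
    sketch_action_t primary;
    sketch_action_t escalation;
    uint32_t threshold;
    uint32_t recent_count;  /* reports inside the current window */
    uint32_t window_start;  /* tick (ms) of the first report in the window */
} sketch_fault_t;

/* Record one occurrence at now_ms and return the recommended action. */
static sketch_action_t sketch_report(sketch_fault_t *f, uint32_t now_ms)
{
    assert(f != NULL);
    /* Open a new window on the first report or once the old one has expired.
     * Unsigned subtraction stays correct across tick wraparound. */
    if (f->recent_count == 0u ||
        (uint32_t)(now_ms - f->window_start) > SKETCH_RECENT_WINDOW_MS) {
        f->window_start = now_ms;
        f->recent_count = 0u;
    }
    f->recent_count++;
    return (f->recent_count >= f->threshold) ? f->escalation : f->primary;
}
```

With the i2c_bus_stuck policy (RESET_BUS, DISABLE_SUBSYS, threshold 5), the first four reports inside one window return the primary action and the fifth returns the escalation. A report that arrives after the window has expired starts counting from one again.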
Derived from the static g_table[] in fdir.c. Changing an entry
here requires changing the code — there is no runtime config file.
| id | name | primary | escalation | threshold |
|---|---|---|---|---|
| 0 | watchdog_task_miss | RESET_BUS | REBOOT | 3 |
| 1 | i2c_bus_stuck | RESET_BUS | DISABLE_SUBSYS | 5 |
| 2 | spi_timeout | RETRY | DISABLE_SUBSYS | 5 |
| 3 | sensor_out_of_range | LOG_ONLY | DISABLE_SUBSYS | 10 |
| 4 | battery_undervolt | SAFE_MODE | REBOOT | 2 |
| 5 | over_temperature | DISABLE_SUBSYS | SAFE_MODE | 3 |
| 6 | under_temperature | LOG_ONLY | SAFE_MODE | 3 |
| 7 | stack_overflow | REBOOT | REBOOT | 1 |
| 8 | heap_exhaust | SAFE_MODE | REBOOT | 2 |
| 9 | pll_unlock | REBOOT | REBOOT | 1 |
| 10 | comm_loss | SAFE_MODE | REBOOT | 2 |
| 11 | keystore_empty | LOG_ONLY | SAFE_MODE | 1 |
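
For readers who want to see how such a policy table might look as a static array, here is a hedged sketch. The row type, the array name `sketch_table` and the helper functions are hypothetical; the actual layout of g_table[] in fdir.c is not shown in this document. The values are copied from the table above, and the enum ordering (mildest first) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed severity ordering; three member names are extrapolated. */
typedef enum {
    RECOVERY_LOG_ONLY = 0,
    RECOVERY_RETRY,
    RECOVERY_RESET_BUS,
    RECOVERY_DISABLE_SUBSYS,
    RECOVERY_SAFE_MODE,
    RECOVERY_REBOOT
} sketch_action_t;

/* Hypothetical row layout: one entry per fault id. */
typedef struct {
    const char *name;
    sketch_action_t primary;
    sketch_action_t escalation;
    uint8_t threshold;
} sketch_row_t;

static const sketch_row_t sketch_table[] = {
    /*  0 */ { "watchdog_task_miss",  RECOVERY_RESET_BUS,      RECOVERY_REBOOT,          3 },
    /*  1 */ { "i2c_bus_stuck",       RECOVERY_RESET_BUS,      RECOVERY_DISABLE_SUBSYS,  5 },
    /*  2 */ { "spi_timeout",         RECOVERY_RETRY,          RECOVERY_DISABLE_SUBSYS,  5 },
    /*  3 */ { "sensor_out_of_range", RECOVERY_LOG_ONLY,       RECOVERY_DISABLE_SUBSYS, 10 },
    /*  4 */ { "battery_undervolt",   RECOVERY_SAFE_MODE,      RECOVERY_REBOOT,          2 },
    /*  5 */ { "over_temperature",    RECOVERY_DISABLE_SUBSYS, RECOVERY_SAFE_MODE,       3 },
    /*  6 */ { "under_temperature",   RECOVERY_LOG_ONLY,       RECOVERY_SAFE_MODE,       3 },
    /*  7 */ { "stack_overflow",      RECOVERY_REBOOT,         RECOVERY_REBOOT,          1 },
    /*  8 */ { "heap_exhaust",        RECOVERY_SAFE_MODE,      RECOVERY_REBOOT,          2 },
    /*  9 */ { "pll_unlock",          RECOVERY_REBOOT,         RECOVERY_REBOOT,          1 },
    /* 10 */ { "comm_loss",           RECOVERY_SAFE_MODE,      RECOVERY_REBOOT,          2 },
    /* 11 */ { "keystore_empty",      RECOVERY_LOG_ONLY,       RECOVERY_SAFE_MODE,       1 },
};

#define SKETCH_FAULT_COUNT (sizeof sketch_table / sizeof sketch_table[0])

/* Bounds-checked access to one policy row. */
static const sketch_row_t *sketch_lookup(size_t id)
{
    assert(id < SKETCH_FAULT_COUNT);
    return &sketch_table[id];
}

/* Count rows whose escalation is milder than the primary action. */
static size_t sketch_count_inverted_rows(void)
{
    size_t bad = 0;
    for (size_t i = 0; i < SKETCH_FAULT_COUNT; i++) {
        if (sketch_table[i].escalation < sketch_table[i].primary) {
            bad++;
        }
    }
    return bad;
}
```

A consistency check such as `sketch_count_inverted_rows()` is one way to catch an edit that makes a fault de-escalate by accident. It relies on the enum being ordered by severity, which is an assumption here.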
Reporting a fault and acting on the recommendation:

    if (HAL_I2C_Mem_Read(...) != HAL_OK) {
        FDIR_Report(FAULT_I2C_BUS_STUCK);
        switch (FDIR_GetRecommendedAction(FAULT_I2C_BUS_STUCK)) {
        case RECOVERY_RESET_BUS:
            i2c_bus_reset();            /* 9 clocks + STOP */
            break;
        case RECOVERY_DISABLE_SUBSYS:
            sensors_disable_group(SENSOR_GROUP_I2C);
            break;
        default:
            break;
        }
        return -1;
    }
    /* On success, tell FDIR the fault is behind us so a single later
     * glitch doesn't tip the window into escalation. */
    FDIR_ClearRecent(FAULT_I2C_BUS_STUCK);

Gating safe-mode entry on the recommendation:

    if (FDIR_GetRecommendedAction(FAULT_BATTERY_UNDERVOLT)
            >= RECOVERY_SAFE_MODE) {
        mode_manager_enter_safe(SAFE_MODE_REASON_POWER);
    }

Packing the aggregate stats into the beacon:

    FDIR_Stats_t st = FDIR_GetStats();
    beacon_pack_u32(buf, 40, st.total_faults);
    beacon_pack_u32(buf, 44, st.escalations);
    beacon_pack_u32(buf, 48, st.safe_mode_entries);
    beacon_pack_u32(buf, 52, st.reboots_scheduled);

firmware/tests/test_fdir.c covers:

- ClearRecent drops recent_count to 0 but preserves total_count.
- ResetAll zeroes both per-fault state and aggregate stats.
- Stats counters (safe_mode_entries, reboots_scheduled) track eagerly so downlink telemetry reflects current policy.

Run locally:

    make build && ctest --test-dir firmware/build -R fdir --output-on-failure

Expected output:

    1/1 Test #18: fdir ............................ Passed 0.01 sec
    100% tests passed, 0 tests failed out of 1

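
The ClearRecent and ResetAll semantics listed in the test coverage can be illustrated with a small stand-in model. This is a sketch, not the module: every `sketch_` name is hypothetical, and only the field names (recent_count, total_count, total_faults, escalations, safe_mode_entries, reboots_scheduled) are taken from this document.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_FAULT_COUNT 12u

/* Per-fault counters: one windowed, one lifetime. */
typedef struct {
    uint32_t recent_count;
    uint32_t total_count;
} sketch_state_t;

/* Aggregate counters, named after the fields packed into the beacon. */
typedef struct {
    uint32_t total_faults;
    uint32_t escalations;
    uint32_t safe_mode_entries;
    uint32_t reboots_scheduled;
} sketch_stats_t;

static sketch_state_t sketch_state[SKETCH_FAULT_COUNT];
static sketch_stats_t sketch_stats;

/* Record one occurrence of fault id. */
static void sketch_report(uint32_t id)
{
    assert(id < SKETCH_FAULT_COUNT);
    sketch_state[id].recent_count++;
    sketch_state[id].total_count++;
    sketch_stats.total_faults++;
}

/* Forget the windowed count only; lifetime history is kept. */
static void sketch_clear_recent(uint32_t id)
{
    assert(id < SKETCH_FAULT_COUNT);
    sketch_state[id].recent_count = 0u;
}

/* Zero both the per-fault state and the aggregate stats. */
static void sketch_reset_all(void)
{
    memset(sketch_state, 0, sizeof sketch_state);
    memset(&sketch_stats, 0, sizeof sketch_stats);
}
```

Keeping total_count across a clear means the lifetime history of a fault stays available for telemetry even after the caller has signalled recovery.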
- FDIR state lives in .bss and is reset to zero on warm boot.
  A follow-up (Phase 4 non-volatile fault log) will add a
  .noinit shadow so a REBOOT recommendation survives with the
  triggering reason attached.
- The tick source is replaceable (__fdir_hal_tick) so host tests can inject a
  deterministic clock and the firmware links on a bare-metal
  smoke build with no HAL dependency.
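
This document does not say how __fdir_hal_tick is made replaceable. One common technique on GCC and Clang toolchains is a weak default definition that a strong definition in another translation unit overrides at link time. The sketch below assumes that approach; `sketch_fdir_tick` and `sketch_window_expired` are hypothetical names, and the constant default is only there so the sketch links without any HAL.

```c
#include <assert.h>
#include <stdint.h>

/* Weak default tick source. A firmware build could provide a strong
 * definition that forwards to the HAL tick, and a host test could provide
 * one that returns a scripted value. With neither present, this default
 * keeps the link self-sufficient. Assumes GCC/Clang attribute syntax. */
__attribute__((weak)) uint32_t sketch_fdir_tick(void)
{
    return 0u;
}

/* True once more than window_ms has elapsed since start_ms.
 * Unsigned subtraction keeps the result correct across tick wraparound. */
static int sketch_window_expired(uint32_t start_ms, uint32_t window_ms)
{
    assert(window_ms != 0u);  /* a zero-length window is treated as a bug */
    return (uint32_t)(sketch_fdir_tick() - start_ms) > window_ms;
}
```

A host test would place its own non-weak `sketch_fdir_tick` in the test file. The linker then prefers it over the weak default, which gives the test full control over elapsed time without touching the module source.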