Ensuring Safe OTA Updates with Best Practices


Ensuring Data Integrity and Handling Failures in OTA Firmware Updates

In previous episodes, we explored OTA architecture, partitioning, and security. But even with perfect encryption and boot validation, there’s another crucial layer: ensuring data integrity and safely handling failures — especially in constrained BLE environments.

If a firmware update is interrupted or corrupted, and your system doesn’t catch it in time, you risk bricking the device.

This episode will break down:

  • How to detect data corruption
  • How to fail gracefully and recover
  • How real-world devices like Fitbit and STM32WB do it
  • Techniques like CRC, SHA, status flags, watchdog resets, and rollback

What Can Go Wrong in OTA?

Risk                       | Example Case
Packet loss or corruption  | BLE dropout mid-transfer
Power loss during write/swap | Battery dies mid-flash
Partial update             | Transfer aborted but metadata was updated
Flash write errors         | Misaligned writes or ECC failure
Wrong or tampered firmware | Firmware modified post-download

Verifying Data Integrity

Data integrity checks ensure what was received is what was expected — before flashing or booting.

1. CRC (Cyclic Redundancy Check)

  • Lightweight and fast
  • Usually CRC16 or CRC32
  • Verified during:
    • End of OTA transfer
    • Bootloader check
bool crc_check_passed = (calculate_crc32(image, image_len) == expected_crc32);

Example:

  • Fitbit and Nordic DFU protocols attach CRC32 to each chunk and to the entire image.
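To make the CRC check concrete, here is a minimal sketch of a bitwise CRC-32 using the common reflected polynomial 0xEDB88320 (the same variant zlib and Ethernet use). The names `crc32_update` and `image_crc_ok` are illustrative assumptions, not part of any vendor SDK. A production build would more likely use a table-driven routine or a hardware CRC unit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320).
 * Pass crc = 0 for the first chunk, then feed the previous return value
 * back in for each later chunk to get a running CRC over the whole image. */
uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and XOR in the polynomial. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Hypothetical helper: compare the computed CRC with the expected value. */
bool image_crc_ok(const uint8_t *image, size_t len, uint32_t expected_crc32)
{
    return crc32_update(0, image, len) == expected_crc32;
}
```

Because the function accepts a running value, the same routine can cover both cases mentioned above: a CRC per chunk, and one CRC accumulated over the entire image as chunks arrive.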

2. SHA-256 Hash Check

  • Stronger, cryptographic hash
  • Slower than a CRC, but it also resists deliberate tampering, whereas a CRC only catches accidental corruption
  • Used for:
    • Final image validation before reboot
    • Signature verification
sha256(image, length, hash_out);
if (memcmp(hash_out, expected_hash, 32) != 0) {
    return IMAGE_CORRUPTED;
}

Example:

  • STM32WB + SBSFU, ESP32 Secure Boot, and MCUBoot all use SHA-256 for validating update images.

3. Signature Check = Integrity + Authenticity

If your update is signed (ECC/RSA), the signature validates both the integrity and authenticity of the image.

  • The signature covers the image hash (typically SHA-256), so if the hash of the received image does not match, the signature check fails.
  • If someone tampers with the firmware after signing, the bootloader blocks execution.

Handling OTA Failures

OTA should never leave your device in a broken state. That’s where rollback and retry mechanisms come in.

1. Watchdog Timers for First Boot

  • Start a watchdog timer after OTA reboot
  • If the app crashes or hangs before it confirms a healthy boot, the watchdog resets the device and the bootloader flags the update as failed
start_watchdog(5000);  // timeout in milliseconds
app_code();            // if this hangs, the watchdog resets the device

Example:

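  • MCUBoot marks a new image as “pending” and boots it on a trial basis. The image becomes “permanent” only after the application confirms it following a successful first boot; otherwise MCUBoot reverts to the previous image on the next reset.
  • Fitbit uses a boot signal sent from the app to the bootloader via a shared flash flag.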
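One way to connect the watchdog to the bootloader is a boot-attempt counter: the bootloader counts every reset that happens while an update is still unconfirmed, and gives up after a few tries. The sketch below simulates that logic in plain C. The `boot_record_t` layout, the function names, and the limit of three attempts are all assumptions for illustration; on a real device the record would live in a reserved flash page and the resets would come from the hardware watchdog.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_BOOT_ATTEMPTS 3u   /* assumed limit; tune per product */

/* Hypothetical record kept in persistent storage. */
typedef struct {
    bool    update_pending;  /* new image installed but not yet confirmed */
    uint8_t boot_attempts;   /* resets seen while the update was pending */
    bool    update_failed;   /* set when the bootloader gives up */
} boot_record_t;

typedef enum { BOOT_CONFIRMED_IMAGE, BOOT_TRIAL_IMAGE } boot_choice_t;

/* Called by the bootloader on every reset, including watchdog resets. */
boot_choice_t bootloader_on_reset(boot_record_t *rec)
{
    if (!rec->update_pending) {
        return BOOT_CONFIRMED_IMAGE;      /* nothing on trial */
    }
    if (rec->boot_attempts >= MAX_BOOT_ATTEMPTS) {
        rec->update_pending = false;      /* trial image never confirmed */
        rec->update_failed  = true;
        return BOOT_CONFIRMED_IMAGE;      /* fall back to the old image */
    }
    rec->boot_attempts++;                 /* count the attempt before jumping */
    return BOOT_TRIAL_IMAGE;
}

/* Called by the new application once it considers itself healthy.
 * A real bootloader would also promote the trial image to "confirmed". */
void app_confirm_boot(boot_record_t *rec)
{
    rec->update_pending = false;
    rec->boot_attempts  = 0;
}
```

If the new firmware hangs, the watchdog resets the device, `bootloader_on_reset` runs again with a higher count, and after the limit is reached the old image is booted. If the firmware calls `app_confirm_boot` first, the counter never reaches the limit.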

2. Boot Status Flags in Flash

  • Use a reserved flash page or option byte to store OTA status:
    • OTA_PENDING
    • OTA_SUCCESS
    • OTA_FAILED

Bootloader Logic:

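if (ota_status == OTA_PENDING) {
    if (firmware_valid()) {
        set_ota_status(OTA_SUCCESS);   // new image passed validation
    } else {
        set_ota_status(OTA_FAILED);    // record the failure so it is not retried
        rollback_to_previous_image();
    }
}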


3. Rollback to Previous Image

Devices with dual-slot partitioning (Episode 3) can revert if the new image fails validation or boot.

Steps:

  1. Keep last known good firmware in App Slot A
  2. OTA installs new image to Slot B
  3. If Slot B fails, bootloader rolls back to Slot A

Example:

  • Fitbit, Oura Ring, and Amazon Echo Buds all support rollback using similar logic.
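The three steps above boil down to a small decision in the bootloader: try the preferred slot, fall back to the other one, and stay in the bootloader if neither is usable. Here is a minimal sketch of that decision. The `slot_valid` array is a stand-in for real hash or signature results, and all names are hypothetical.

```c
#include <stdbool.h>

#define SLOT_A 0
#define SLOT_B 1
#define NO_BOOTABLE_SLOT (-1)

/* Stand-in for real validation results (hash/signature check per slot). */
bool slot_valid[2] = { true, true };

bool slot_is_valid(int slot)
{
    return slot_valid[slot];
}

/* Returns the slot to boot, preferring `preferred` and falling back to
 * the other slot. Returns NO_BOOTABLE_SLOT if both fail validation, in
 * which case a bootloader would typically wait in update mode. */
int pick_boot_slot(int preferred)
{
    int fallback = (preferred == SLOT_A) ? SLOT_B : SLOT_A;

    if (slot_is_valid(preferred)) {
        return preferred;
    }
    if (slot_is_valid(fallback)) {
        return fallback;          /* rollback to the last known good image */
    }
    return NO_BOOTABLE_SLOT;
}
```

After an update lands in Slot B, the bootloader would call `pick_boot_slot(SLOT_B)`. If Slot B fails validation, the function returns Slot A, which is why the old image must not be erased until the new one has proven itself.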

4. Resume Interrupted Updates

BLE transfers are fragile. Your OTA logic should:

  • Allow resume from last good chunk
  • Validate each chunk’s CRC/hash
  • Avoid rewriting already validated blocks

Nordic DFU Example:

// The link drops at 62% of the transfer.
// On reconnect, the device reports the last offset it received,
// and the app continues sending from that offset.
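The device side of a resumable transfer can be sketched as a receiver that tracks the next expected offset, checks each chunk's CRC, and skips chunks it has already stored. This is a generic illustration, not the Nordic DFU wire protocol. The `ota_rx_t` structure, the result codes, and the 4 KB RAM buffer standing in for the flash download slot are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_MAX 4096u  /* assumed staging size for this sketch */

typedef struct {
    uint8_t  staging[IMAGE_MAX];  /* stands in for the download slot in flash */
    uint32_t next_offset;         /* first byte not yet received and verified */
} ota_rx_t;

typedef enum {
    CHUNK_OK, CHUNK_DUPLICATE, CHUNK_BAD_CRC, CHUNK_OUT_OF_ORDER, CHUNK_TOO_LARGE
} chunk_result_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320) over one chunk. */
uint32_t chunk_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

chunk_result_t ota_on_chunk(ota_rx_t *rx, uint32_t offset,
                            const uint8_t *data, size_t len, uint32_t crc)
{
    if (chunk_crc32(data, len) != crc) {
        return CHUNK_BAD_CRC;             /* sender should retransmit */
    }
    if ((uint64_t)offset + len <= rx->next_offset) {
        return CHUNK_DUPLICATE;           /* already stored; do not rewrite */
    }
    if (offset != rx->next_offset) {
        return CHUNK_OUT_OF_ORDER;        /* gap or overlap; ask to resync */
    }
    if (len > IMAGE_MAX - rx->next_offset) {
        return CHUNK_TOO_LARGE;
    }
    memcpy(&rx->staging[offset], data, len);
    rx->next_offset += (uint32_t)len;
    return CHUNK_OK;
}

/* Reported to the phone app after a reconnect so it knows where to resume. */
uint32_t ota_resume_offset(const ota_rx_t *rx)
{
    return rx->next_offset;
}
```

For resume to survive a power cycle as well as a BLE dropout, `next_offset` would have to be persisted or recomputed from flash contents at startup; the sketch keeps it in RAM only.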


Practical Design Tips

Tip                                        | Why It Helps
Always verify hash or CRC before boot      | Prevents corrupted firmware from being executed
Use boot status flags                      | Tracks update status across reboots
Don’t erase old firmware until success     | Enables rollback
Use watchdog timer after OTA               | Catches faulty first boots
Use power-fail-safe flash writing logic    | Avoids half-burned sectors
Keep metadata separate from OTA partitions | Prevents accidental overwrite
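As one possible reading of "power-fail-safe flash writing logic", the sketch below keeps two copies of the OTA metadata and always writes to the copy that is not currently in use. Each copy carries a sequence number and a CRC, with the CRC written last, so a write that is cut short leaves the older copy as the newest valid one. The record layout and function names are assumptions, plain structs stand in for two flash pages, and sequence-number wraparound is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t sequence;    /* increases with every committed write */
    uint32_t ota_status;  /* payload, e.g. pending / success / failed */
    uint32_t crc;         /* CRC-32 over the two fields above */
} meta_copy_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t meta_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

uint32_t meta_crc_of(const meta_copy_t *c)
{
    return meta_crc32((const uint8_t *)c, offsetof(meta_copy_t, crc));
}

bool meta_copy_valid(const meta_copy_t *c)
{
    return c->crc == meta_crc_of(c);
}

/* Index of the newest valid copy, or -1 if neither copy is valid. */
int meta_newest(const meta_copy_t copies[2])
{
    bool v0 = meta_copy_valid(&copies[0]);
    bool v1 = meta_copy_valid(&copies[1]);
    if (v0 && v1) {
        return (copies[1].sequence > copies[0].sequence) ? 1 : 0;
    }
    if (v0) return 0;
    if (v1) return 1;
    return -1;
}

/* Write the new status into the copy that is NOT the current one. */
void meta_commit(meta_copy_t copies[2], uint32_t ota_status)
{
    int cur = meta_newest(copies);
    int dst = (cur == 0) ? 1 : 0;
    copies[dst].sequence   = (cur < 0) ? 1u : copies[cur].sequence + 1u;
    copies[dst].ota_status = ota_status;
    copies[dst].crc        = meta_crc_of(&copies[dst]);  /* written last */
}
```

On boot, the bootloader would read `copies[meta_newest(copies)]` and treat a result of -1 as "no metadata yet".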

Real-World Snapshot: Fitbit OTA

  • OTA update via BLE
  • Each packet validated with CRC
  • Final image validated with SHA-256
  • Bootloader only swaps if update status is OK
  • Rollback on boot error or hash mismatch
  • Watchdog used for first-boot crash detection

Failure Handling in STM32WB (with SBSFU)

  • SBSFU marks OTA status in a reserved flash area
  • Verifies firmware header (hash + signature)
  • Boots new image only if verified
  • Resets to old image if first boot fails or watchdog triggers
  • Optional: tamper-detection logic or hardware reset control

Conclusion

A secure OTA update is not just about encryption — it’s about ensuring the firmware was delivered, validated, and installed safely. If something fails, your device must recover gracefully, not silently fail or brick.

Handling failures is a mark of a mature, production-grade OTA system.