Add ChaChaPoly AEAD-4 encryption with nonce persistence #1677
weebl2000 wants to merge 13 commits into meshcore-dev:dev from
06320d0 to 7f3da6a
Add ChaCha20-Poly1305 AEAD decryption with 4-byte auth tag for peer messages and group channels, falling back to ECB for backward compatibility. Sending remains ECB-only in this phase.

- Per-message key derivation: HMAC-SHA256(secret, nonce||dest||src)
- Direction-dependent keys prevent bidirectional keystream reuse
- 12-byte IV from nonce + dest_hash + src_hash
- Advertise AEAD capability via feat1 bit 0 in adverts
- Track peer AEAD support in ContactInfo.flags
- Seed aead_nonce from HW RNG on contact creation and load
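A minimal sketch of the capability handling listed above. The bit positions come from the PR description (feat1 bit 0, ContactInfo.flags bit 1); treating feat1 as a 16-bit field and clearing the flag when an advert lacks the bit are assumptions, and buildAdvertFeat1 / applyAdvertCapabilities are hypothetical names, not functions from the codebase.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions as stated in the PR description.
constexpr uint16_t FEAT1_AEAD_SUPPORT = 1u << 0;  // feat1 bit 0 in adverts
constexpr uint8_t CONTACT_FLAG_AEAD = 1u << 1;    // ContactInfo.flags bit 1

// Hypothetical: feat1 value this node advertises (assumes a 16-bit field).
uint16_t buildAdvertFeat1(bool aead_supported) {
  return aead_supported ? FEAT1_AEAD_SUPPORT : 0;
}

// Hypothetical: update a contact's flags from a received advert's feat1.
// Only CONTACT_FLAG_AEAD changes; every other flag bit is preserved.
// Assumption: an advert without the bit clears the flag (e.g. after a downgrade).
uint8_t applyAdvertCapabilities(uint8_t contact_flags, uint16_t feat1) {
  if (feat1 & FEAT1_AEAD_SUPPORT) {
    return static_cast<uint8_t>(contact_flags | CONTACT_FLAG_AEAD);
  }
  return static_cast<uint8_t>(contact_flags & ~CONTACT_FLAG_AEAD);
}
```

Clearing the flag on a legacy advert is one way to cover a peer that flashes older firmware, the case discussed in the review threads below.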
7f3da6a to 26bdb41
Send ChaChaPoly-encrypted messages to peers with CONTACT_FLAG_AEAD set, and try AEAD decode first for those peers (avoiding 1/65536 ECB false-positive). Legacy peers continue to use ECB in both directions.

- Add aead_nonce parameter to createDatagram/createPathReturn (default 0 = ECB)
- Add getPeerFlags/getPeerNextAeadNonce virtual methods for decode-order selection
- Add ContactInfo::nextAeadNonce() helper (returns nonce++ if AEAD, 0 otherwise)
- Update all BaseChatMesh send paths to pass nonce for AEAD-capable peers
- Adaptive decode order: AEAD-first for known AEAD peers, ECB-first for others
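The adaptive decode order can be sketched as a function that returns the sequence of decode attempts for one peer. This is an illustration rather than the PR's code: Scheme and decodeOrder are hypothetical names, and the session-key branch follows the order given later in the PR description (session key → previous session key → static ECDH → ECB).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint8_t CONTACT_FLAG_AEAD = 1u << 1;  // bit 1, per the PR description

// Hypothetical names for each decode attempt.
enum class Scheme { SessionKey, PrevSessionKey, StaticAead, Ecb };

// Order in which decode attempts are tried for one peer.
std::vector<Scheme> decodeOrder(uint8_t peer_flags, bool has_session, bool has_prev_session) {
  std::vector<Scheme> order;
  if (has_session) {
    order.push_back(Scheme::SessionKey);
    if (has_prev_session) order.push_back(Scheme::PrevSessionKey);  // dual-decode window
    order.push_back(Scheme::StaticAead);
    order.push_back(Scheme::Ecb);
  } else if (peer_flags & CONTACT_FLAG_AEAD) {
    // Known AEAD peer: AEAD first avoids the 1/65536 ECB false positive.
    order.push_back(Scheme::StaticAead);
    order.push_back(Scheme::Ecb);
  } else {
    // Unknown or legacy peer: ECB first, since that is the likely format.
    order.push_back(Scheme::Ecb);
    order.push_back(Scheme::StaticAead);
  }
  return order;
}
```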
eee6fd5 to 6526793
The header's route type bits (PH_ROUTE_MASK) are zero when createDatagram/createPathReturn encrypt with AEAD, but get changed to ROUTE_TYPE_FLOOD (1) or ROUTE_TYPE_DIRECT (2) by sendFlood/sendDirect afterwards. The receiver builds assoc from the received header (with route bits set), so the tag check always fails and every AEAD packet is silently dropped. Mask out route type bits in assoc data on all 5 encrypt/decrypt sites. Also track AEAD decode success to enable peer capability auto-detection.
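A short sketch of why masking fixes the tag check: the associated data is built from the header with the route bits cleared, so it is the same whether the header is read before or after sendFlood/sendDirect set the route type. It assumes the route type occupies the two low header bits (PH_ROUTE_MASK = 0x03); the FLOOD (1) and DIRECT (2) values come from the commit message, and buildAssoc is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: route type lives in the two low bits of the header byte.
constexpr uint8_t PH_ROUTE_MASK = 0x03;
constexpr uint8_t ROUTE_TYPE_FLOOD = 0x01;
constexpr uint8_t ROUTE_TYPE_DIRECT = 0x02;

// Associated data for a peer packet: header || dest_hash || src_hash,
// with the route bits cleared so the tag does not depend on routing.
std::vector<uint8_t> buildAssoc(uint8_t header, uint8_t dest_hash, uint8_t src_hash) {
  return {static_cast<uint8_t>(header & ~PH_ROUTE_MASK), dest_hash, src_hash};
}
```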
881d18d to 7637e64
I do not understand how this prevents nonce re-use. After 65k messages from A->B the nonce looks like it will be reused. I do not understand why concatenation with src/dst would change this. The concatenation means you are partitioning the nonce value per (uni-directional) flow, in effect running different counters for A->B, B->A and C->A. Right?
What happens for devices without access to a good early boot entropy source? What if two different reboots generate the same nonce? What happens for A->B if:
What does this method improve over a plain incremental counter? Why not persist the nonce once every 100 messages, and on reboot increment by 200 (rounded down to nearest 100)? When the nonce wraps, regenerate the key.

Yeah, it doesn't stop nonce re-use. I think in the end we might need more bytes for nonces.

You do not, you can also change the key. Just negotiate a dedicated key for this. It is a lot easier to understand and make safe. It would require a round trip but then only need to be done every 65k messages; you could then also share that key for both directions (ie. A->B and B->A). Then when

Might be a good option. But the protocol will become a bit more complex and brittle. Then again, we can always fallback to ECB if nothing was negotiated.
jcjones left a comment
Not a casual review, but I like the design, and the directionality of the KDF. Good doc comments, too.
Thanks for all the comments so far. I will look into them. Just tested this branch with a Heltec v4 repeater and a Heltec v4 companion client, and I can confirm that communication between them works using AEAD-4. The test is a status request to the repeater, and the repeater's response is understood correctly by the client.

AEAD-4 Packet Decode Verification

Wire Format

Sent Packet: REQ (23 bytes)

Raw:
Format confirmed AEAD-4: 17 bytes after hashes is not a multiple of 16, ruling out legacy ECB.

Received Packet: RESPONSE (70 bytes)

Raw:
Note: legacy ECB is structurally possible here (64 bytes is a multiple of 16), but context confirms AEAD-4.

Associated Data

Per the route-mask fix, assoc data masks out route type bits:
Observations
- Fix potential unsigned overflow in createDatagram size check by subtracting constants from MAX_PACKET_PAYLOAD instead of adding to data_len
- Add upper-bound validation on src_len and assoc_len in aeadEncrypt and aeadDecrypt
- Log peer name on AEAD nonce wraparound for debug builds
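The unsigned-overflow fix can be illustrated with two versions of the same bounds check. MAX_PACKET_PAYLOAD, AEAD_OVERHEAD and HASH_BYTES below are hypothetical values chosen for the example, and fitsUnsafe / fitsSafe are not functions from the codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical values for illustration only.
constexpr size_t MAX_PACKET_PAYLOAD = 184;
constexpr size_t AEAD_OVERHEAD = 2 /* nonce */ + 4 /* tag */;
constexpr size_t HASH_BYTES = 2;  // dest_hash + src_hash

// Unsafe: adding constants to data_len can wrap around when data_len is
// near SIZE_MAX, so a huge length looks small enough to fit.
bool fitsUnsafe(size_t data_len) {
  return data_len + HASH_BYTES + AEAD_OVERHEAD <= MAX_PACKET_PAYLOAD;
}

// Safe: subtract the constants from MAX_PACKET_PAYLOAD instead; this cannot
// underflow because the static_assert guarantees the overhead fits.
bool fitsSafe(size_t data_len) {
  static_assert(MAX_PACKET_PAYLOAD >= HASH_BYTES + AEAD_OVERHEAD, "overhead exceeds payload");
  return data_len <= MAX_PACKET_PAYLOAD - HASH_BYTES - AEAD_OVERHEAD;
}
```

Both versions agree for realistic lengths; they only differ when data_len is large enough for the addition to wrap.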
Prevent nonce reuse after reboots by persisting per-peer nonce counters to a dedicated /nonces (companion) or /s_nonces (server) file. On dirty reset (power-on, watchdog, brownout), nonces are bumped by NONCE_BOOT_BUMP (100) to cover any unpersisted messages. Clean wakes (deep sleep, software restart) load nonces as-is.

- Add nonce persistence to BaseChatMesh (companion) and ClientACL (server)
- Add wasDirtyReset() helper to ArduinoHelpers.h for platform-specific reset reason detection (ESP32/NRF52)
- Add onBeforeReboot() callback to CommonCLI for pre-reboot nonce flush
- Wire nonce persistence into all firmware variants: companion radio, repeater, room server, and sensor
- Only clear dirty flag on successful file write
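A minimal sketch of these persistence rules, using the NONCE_PERSIST_INTERVAL (50) and NONCE_BOOT_BUMP (100) values from the PR description. PeerNonce and nonceAfterBoot are hypothetical; the saturation to 65535 on overflow is described in the PR only for session-key nonces and is applied to every nonce here as an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Constants from the PR description.
constexpr uint16_t NONCE_PERSIST_INTERVAL = 50;
constexpr uint16_t NONCE_BOOT_BUMP = 100;

// Hypothetical per-peer tracker.
struct PeerNonce {
  uint16_t nonce = 0;       // next nonce to use
  uint16_t since_save = 0;  // messages sent since the last successful save
  bool dirty = false;       // advanced since the last successful save

  // Returns the nonce for this message; sets *should_persist when a save is due.
  uint16_t next(bool* should_persist) {
    uint16_t n = nonce++;
    dirty = true;
    *should_persist = (++since_save >= NONCE_PERSIST_INTERVAL);
    return n;
  }

  // Call only after the flash write succeeded ("only clear dirty flag on success").
  void markSaved() {
    dirty = false;
    since_save = 0;
  }
};

// Value to resume from after boot. Clean wakes load the saved value as-is;
// dirty resets skip ahead by 2x the persist interval. Assumption: overflow
// saturates at 0xFFFF, which would force a rekey.
uint16_t nonceAfterBoot(uint16_t persisted, bool dirty_reset) {
  if (!dirty_reset) return persisted;
  uint32_t bumped = uint32_t(persisted) + NONCE_BOOT_BUMP;
  return bumped > 0xFFFF ? static_cast<uint16_t>(0xFFFF) : static_cast<uint16_t>(bumped);
}
```

In the worst case a dirty reset lands just before the 50th message's save completes, leaving 50 nonces unpersisted; the 100-step bump still resumes past all of them.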
@weebl2000 thank you for your contributions. Have you considered jumping straight to something proven, like the Double Ratchet used in Signal, instead?

I don't think the Double Ratchet is practical; we would need to send a new key with every message and rely on strict message ordering. With LoRa packet limits and out-of-order delivery, it would be a disaster. I'm working on session key negotiation, though. That will fix the nonce problem, but it requires an exchange first.
3dpgg left a comment
Thanks for putting together a PR to address AES-ECB! I had a cursory look, so not all files yet.
```cpp
// No session key — standard AEAD-first decode for AEAD-capable peers
len = Utils::aeadDecrypt(secret, data, macAndData, macAndDataLen, assoc, 3, dest_hash, src_hash);
if (len > 0) decoded_aead = true;
else len = Utils::MACThenDecrypt(secret, data, macAndData, macAndDataLen);
```
If a peer indicates support for AEAD, and we fail to decrypt using AEAD, do we really need to fall back to trying ECB? I'm struggling to imagine why such a peer would support AEAD but intentionally want to use ECB, given its problems. In such a case, surely the peer would instead decline to set the AEAD flag in the first place.
I put it in for cases where peers flash old firmware and no longer support newer encryption, but haven't advertised yet. In future this fallback might be removed.
```cpp
if (len > 0) {
  decoded_aead = true;
} else {
  len = Utils::MACThenDecrypt(secret, data, macAndData, macAndDataLen);
```
In this scenario, we have a session key but it failed to decrypt, and the previous session key also failed to decode. And then we also failed to decode using AEAD on the long-term secret. And now here, we fall back to ECB.
But if we had a session key, presumably that means we were already talking to a peer that supported AEAD. Why would such an AEAD-capable peer be falling back to ECB? If we don't fall-back to ECB, do we lose anything?
I ask this from two perspectives: removing unnecessary computation for corrupted packets, and tightening up what inputs are permitted from AEAD_capable peers.
Same as above, mainly useful during the transition phase where clients flash older firmware.
src/Mesh.cpp
Can we remove the comment about "4 matches"? I think the max is actually 8, and it confused me when trying to understand the context.
It was already there in dev, but I can change it.
```cpp
  if (len > 0) decoded_aead = true;
  else len = Utils::MACThenDecrypt(secret, data, macAndData, macAndDataLen);
} else {
  // Legacy ECB-first decode
```
It would be ideal if there was a flag or overrideable method that controls whether this peer will ever use the legacy ECB-first decoding method. That way, peers can decide to refuse ECB outright.
I think it'll be safer to add this later. Definitely want to have this tested in the field widely before allowing it to be disabled.
```cpp
int len = Utils::MACThenDecrypt(secret, data, macAndData, pkt->payload_len - i);
int macAndDataLen = pkt->payload_len - i;

// Try ECB first (Phase 1), then AEAD-4 fallback.
```
For this phase 1, could it have been AEAD first and then ECB fallback? If there's a reason for ECB first, I don't immediately understand it yet. May be worth filling in a comment that the order does/doesn't matter.
I was assuming that, in the beginning, most clients won't support AEAD yet, so we try the old encryption first. The order can probably be reversed once most clients support it.
Testing
Summary
Adds ChaCha20-Poly1305 (AEAD-4) encryption alongside the existing AES-128-ECB + HMAC-2 scheme, plus session key negotiation for Perfect Forward Secrecy. Updated nodes send AEAD-4 to peers that advertise support and fall back to ECB for legacy peers. All nodes can decode both formats. Old nodes continue to work unchanged.
Nonces are persisted to flash so they survive reboots without risk of reuse. Session keys are negotiated via ephemeral X25519 Diffie-Hellman and persisted immediately on establishment.
Relates to #259.
What This Means in Practical Terms
The current encryption has a few weaknesses that this PR addresses:
Message tampering is too easy to attempt. The existing 2-byte authentication code means an attacker only needs about 65,000 (2^16) guesses to forge a valid-looking message. At LoRa speeds that's roughly 9 hours of continuous attempts. The new 4-byte tag raises this to over 4 billion (2^32) guesses; at the same rate, that would take 65,536 times longer, roughly 67 years.
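Taking the 9-hour figure for a 2-byte tag as the baseline, the implied attempt rate and the corresponding time for a 4-byte tag work out as follows (only the ratio 2^32 / 2^16 = 65,536 matters):

```latex
r \approx \frac{2^{16}}{9\,\mathrm{h}} = \frac{65\,536}{32\,400\,\mathrm{s}} \approx 2.02\ \text{guesses/s},
\qquad
t_{32} = \frac{2^{32}}{r} = 2^{16} \times 9\,\mathrm{h} = 589\,824\,\mathrm{h} \approx 67\ \text{years}
```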
Identical messages look identical on the air. The current block cipher (ECB mode) produces the same ciphertext for the same plaintext, which can reveal patterns — for example, you could tell when someone sends the same message twice. The new scheme produces completely different ciphertext every time, even for identical messages.
Addressing fields are now protected. Currently, only the message body is authenticated. With AEAD, the payload type and addressing hashes (which identify sender and recipient) are included in the authentication check, so an attacker cannot swap or modify them without detection. Outer routing fields like TTL and hop path are intentionally left unauthenticated so repeaters can still forward packets through the mesh.
Messages get slightly smaller. ECB pads every message up to a 16-byte boundary, wasting airtime. The new scheme has no padding, so most messages shrink by a few bytes on the wire.
Compromise of a node's long-term key doesn't reveal past session traffic. Session key negotiation establishes fresh shared secrets via ephemeral key exchange. Even if a node's long-term private key is later compromised, previously recorded traffic that was encrypted with a negotiated session key cannot be decrypted (Perfect Forward Secrecy).
Nothing breaks. Updated nodes send AEAD-4 to peers that advertise support, and fall back to ECB for legacy peers. Old nodes are completely unaffected — they never receive AEAD-4 messages because the sender checks their capability first.
Nodes advertise their capabilities. Updated nodes include a flag in their advertisements saying "I understand the new encryption." When two updated nodes discover each other, they automatically start using AEAD-4 for their communication.
Nonces survive reboots. Per-peer nonce counters are saved to flash periodically and before clean reboots. After a dirty reset (power loss, watchdog, brownout), nonces are bumped forward by a safety margin to guarantee no reuse.
Wire Format
Current ECB:
New AEAD-4 (same position in payload):
Average overhead: ~6 bytes (AEAD) vs ~9.5 bytes (ECB). Most messages get smaller.
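The averages can be reproduced under assumed layouts: a 2-byte truncated HMAC plus zero padding to a 16-byte multiple for ECB, and a 2-byte nonce plus a 4-byte tag with no padding for AEAD-4. Field placement and the function names are assumptions for illustration; only the byte counts matter here.

```cpp
#include <cassert>
#include <cstddef>

// Assumed ECB layout: 2-byte truncated HMAC + ciphertext zero-padded to a
// multiple of 16 (no extra block when the length is already a multiple).
size_t ecbWireLen(size_t plaintext_len) {
  size_t padded = (plaintext_len + 15) / 16 * 16;
  return 2 + padded;
}

// Assumed AEAD-4 layout: 2-byte nonce + unpadded ciphertext + 4-byte tag.
size_t aeadWireLen(size_t plaintext_len) {
  return 2 + plaintext_len + 4;
}

// Average overhead (wire bytes minus plaintext bytes) over lengths 1..max_len.
double averageOverhead(size_t (*wire_len)(size_t), size_t max_len) {
  size_t total = 0;
  for (size_t n = 1; n <= max_len; ++n) total += wire_len(n) - n;
  return double(total) / double(max_len);
}
```

With lengths spread evenly modulo 16, ECB padding averages 7.5 bytes, giving 9.5 bytes of overhead against a flat 6 for AEAD-4; ECB only comes out smaller when its padding is 0 to 3 bytes.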
Cryptographic Design
Per-message key derivation (separates keystreams per direction; it does not by itself prevent same-direction nonce reuse):
The shared_secret is either the static ECDH secret or a session key (see Session Key Negotiation below).

Including dest_hash || src_hash makes keys direction-dependent: Alice→Bob and Bob→Alice derive different keys even with the same nonce value (for 255/256 peer pairs; the 1/256 where dest_hash == src_hash is a residual limitation of 1-byte hashes).

IV construction (12 bytes, from on-wire fields):
Associated data (authenticated but not encrypted):

- header || dest_hash || src_hash
- header || dest_hash
- header || channel_hash

Route type bits are masked out of the header in associated data (header & ~PH_ROUTE_MASK), since the route type is set by sendFlood/sendDirect after encryption and the routing mode can change as repeaters forward packets.

Nonce management: 16-bit counter per peer, persisted to flash. See "Nonce Persistence" section below.
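To make the direction dependence concrete, here is a sketch of how the KDF input and the 12-byte IV could be assembled from nonce, dest_hash and src_hash. The little-endian nonce and the zero padding of the IV are assumptions, since the PR only lists which fields are used; kdfInput and buildIv are hypothetical, and the HMAC-SHA256 step itself is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// KDF input: nonce || dest_hash || src_hash, which would be fed to
// HMAC-SHA256(shared_secret, .). Assumption: little-endian nonce.
std::array<uint8_t, 4> kdfInput(uint16_t nonce, uint8_t dest_hash, uint8_t src_hash) {
  return {uint8_t(nonce & 0xFF), uint8_t(nonce >> 8), dest_hash, src_hash};
}

// 12-byte ChaCha20-Poly1305 IV from the same on-wire fields.
// Assumption: the four field bytes come first and the other eight are zero.
std::array<uint8_t, 12> buildIv(uint16_t nonce, uint8_t dest_hash, uint8_t src_hash) {
  std::array<uint8_t, 12> iv{};  // zero-initialised
  auto in = kdfInput(nonce, dest_hash, src_hash);
  for (size_t i = 0; i < in.size(); ++i) iv[i] = in[i];
  return iv;
}
```

Swapping dest_hash and src_hash changes the input and therefore the derived key, but repeating a nonce in the same direction reproduces both key and IV, which is the concern raised in the review thread above.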
Session Key Negotiation (Perfect Forward Secrecy)
Session keys provide Perfect Forward Secrecy by establishing fresh shared secrets via ephemeral X25519 Diffie-Hellman. Compromise of either node's long-term private key cannot recover traffic encrypted with a session key.
Protocol (2 messages + implicit confirmation)
The INIT is encrypted with AEAD-4 (static ECDH or existing session key). The ACCEPT is always encrypted with the static ECDH secret, because the initiator hasn't derived the session key yet.
Key Derivation
Uses the existing ed25519_key_exchange() (X25519 Montgomery ladder) from lib/ed25519. No new dependencies.

Who Initiates
Repeaters, room servers, and sensors only implement the responder role — they never initiate session key negotiation.
Automatic Triggers
Session key negotiation is triggered automatically based on message count. The trigger check runs inside getEncryptionNonceFor(), the single funnel all encrypted sends pass through, so no send path can silently skip it. Negotiation is deferred to the next loop() tick to avoid re-entrancy.

3 INIT attempts per negotiation (3-minute timeout each).
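The funnel-plus-deferral pattern can be sketched as follows. kRekeyThreshold is a hypothetical value (the PR defines NONCE_REKEY_THRESHOLD, but this section does not state it), and ChatNode, Peer and inits_sent are stand-ins; only the idea of recording a pending index in getEncryptionNonceFor() and acting on it in loop() comes from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical threshold standing in for NONCE_REKEY_THRESHOLD.
constexpr uint16_t kRekeyThreshold = 60000;
constexpr int kNoPendingRekey = -1;

struct Peer {
  uint16_t nonce = 0;
  bool negotiating = false;
};

struct ChatNode {
  Peer peers[4];
  int pending_rekey_idx = kNoPendingRekey;  // like _pending_rekey_idx in the PR
  int inits_sent = 0;                        // stands in for sending an INIT packet

  // Single funnel every encrypted send goes through. It only records that a
  // rekey is due and never sends from here, so no send path re-enters itself.
  uint16_t getEncryptionNonceFor(int idx) {
    Peer& p = peers[idx];
    if (p.nonce >= kRekeyThreshold && !p.negotiating && pending_rekey_idx == kNoPendingRekey) {
      pending_rekey_idx = idx;
    }
    return p.nonce++;
  }

  // Called from the main loop: starts the deferred negotiation outside any send path.
  void loop() {
    if (pending_rekey_idx == kNoPendingRekey) return;
    peers[pending_rekey_idx].negotiating = true;
    ++inits_sent;  // a real node would build and send the session key INIT here
    pending_rekey_idx = kNoPendingRekey;
  }
};
```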
Nonce Lifecycle
Encryption Key Selection
All node types use paired getEncryptionKey()/getEncryptionNonce() functions that return the correct key and nonce based on current session state:

Decode Order
Dual-Decode Window
When the responder accepts a session key INIT, it enters DUAL_DECODE state: the new session key is active for sending, but both old and new keys are accepted for decoding. Once the initiator sends a message encrypted with the new session key (message 3), the responder confirms the transition and drops the old key.
This makes ACCEPT packet loss safe — the responder stays in dual-decode, the initiator times out and retries, and no messages are lost.
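A simplified model of the responder side of the dual-decode window described above. Keys are plain ints standing in for 32-byte secrets, and the static ECDH and ECB fallbacks, INIT retries and timeouts are omitted; ResponderSession and its states are hypothetical names.

```cpp
#include <cassert>
#include <optional>

// Hypothetical responder-side model.
enum class SessState { None, DualDecode, Active };

struct ResponderSession {
  SessState state = SessState::None;
  std::optional<int> key;       // current session key (used for sending)
  std::optional<int> prev_key;  // still accepted for decoding during dual-decode

  // INIT accepted: the new key is used for sending immediately,
  // the old key is kept for decoding only.
  void onInitAccepted(int new_key) {
    prev_key = key;
    key = new_key;
    state = SessState::DualDecode;
  }

  // Returns true if a packet encrypted with used_key is accepted.
  bool onPacket(int used_key) {
    if (key && used_key == *key) {
      if (state == SessState::DualDecode) {  // message 3: initiator has the new key
        prev_key.reset();
        state = SessState::Active;
      }
      return true;
    }
    return prev_key && used_key == *prev_key;  // old key valid until confirmation
  }
};
```

If the ACCEPT is lost, the initiator keeps using the old key and the responder keeps accepting it, which is the packet-loss property the paragraph above relies on.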
Stale Session Detection
If a node sends 50 consecutive messages without receiving any session-key-encrypted reply, it falls back to static ECDH for sending (the peer may have lost the session key). At 100 unanswered sends, falls back to ECB. At 255, clears the AEAD capability flag and removes the session key entirely. The counter resets to 0 on any successful session-key-encrypted message from the peer.
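The fallback ladder can be expressed as a small state holder. The 50 / 100 / 255 thresholds are taken from the paragraph above; the type names, and sending with static ECDH when no session key exists, are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

enum class SendMode { SessionKey, StaticAead, Ecb };

// Hypothetical per-peer state for stale-session detection.
struct PeerSendState {
  uint8_t unanswered = 0;  // sends since the last session-key-encrypted reply
  bool aead_capable = true;
  bool has_session_key = true;

  SendMode modeForNextSend() const {
    if (!aead_capable || unanswered >= 100) return SendMode::Ecb;
    if (unanswered >= 50 || !has_session_key) return SendMode::StaticAead;
    return SendMode::SessionKey;
  }

  void onSend() {
    if (unanswered < 255) ++unanswered;
    if (unanswered == 255) {  // give up: forget AEAD capability and the session key
      aead_capable = false;
      has_session_key = false;
    }
  }

  // Any successfully decoded session-key message from the peer resets the count.
  void onSessionKeyReply() { unanswered = 0; }
};
```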
Session Key Persistence
Session keys use a two-tier storage model: a small RAM pool for active sessions and a larger flash-backed store for less recently used entries.
RAM pool: 8 slots (MAX_SESSION_KEYS_RAM), managed as an LRU cache. Each access touches a counter so the least-recently-used entry can be evicted when the pool is full. Entries in INIT_SENT state (ephemeral keys only) are never evicted; they must complete or time out.

Flash store: Up to 48 entries (MAX_SESSION_KEYS_FLASH), persisted to /sess_keys (companion) or /s_sess_keys (server firmware).

Variable-length records: Entries without a previous session key (no dual-decode) use 39 bytes (SESSION_KEY_RECORD_MIN_SIZE); entries with a previous key use 71 bytes (SESSION_KEY_RECORD_SIZE). The SESSION_FLAG_PREV_VALID flag bit distinguishes the two.

On-demand flash lookup: When findSessionKey() misses the RAM pool, it reads the flash file to look for a matching entry. If found, the entry is loaded into RAM (evicting LRU if needed) and returned.

Merge-save strategy: When persisting, the code reads existing flash entries, filters out any that are already in the RAM pool or have been explicitly removed, then writes the merged result (RAM entries + surviving flash-only entries). This prevents flash from resurrecting deleted entries while preserving entries that were evicted from RAM.
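The RAM pool's eviction rule (least recently used first, never an entry in INIT_SENT) might look like this. Entry, Pool and the integer peer_id are simplified, hypothetical stand-ins for SessionKeyEntry and SessionKeyPool; persisting the evicted entry to flash is only noted in a comment.

```cpp
#include <cassert>
#include <cstdint>

constexpr int MAX_SESSION_KEYS_RAM = 8;  // from the PR description

enum class EntryState : uint8_t { Empty, InitSent, Active };

// Simplified stand-in for SessionKeyEntry.
struct Entry {
  EntryState state = EntryState::Empty;
  uint32_t peer_id = 0;    // stands in for the public-key prefix
  uint32_t last_used = 0;  // LRU stamp
};

struct Pool {
  Entry slots[MAX_SESSION_KEYS_RAM];
  uint32_t clock = 0;

  // Returns a slot for peer_id: an empty slot if one exists, otherwise the
  // least-recently-used entry that is not InitSent. nullptr if all are InitSent.
  Entry* allocate(uint32_t peer_id) {
    Entry* victim = nullptr;
    for (Entry& e : slots) {
      if (e.state == EntryState::Empty) { victim = &e; break; }
      if (e.state == EntryState::InitSent) continue;  // never evicted mid-handshake
      if (!victim || e.last_used < victim->last_used) victim = &e;
    }
    if (!victim) return nullptr;
    // A real implementation would save an evicted entry to flash first.
    *victim = Entry{EntryState::Active, peer_id, ++clock};
    return victim;
  }

  void touch(Entry* e) { e->last_used = ++clock; }
};
```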
Removed-entry tracking: When a session key is explicitly removed (e.g., invalidation after static ECDH fallback), its prefix is recorded in a small tracking array. The merge-save step skips these prefixes so the deleted entry doesn't reappear from stale flash data. The tracking array is cleared after each successful save.
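The merge-save step, including the removed-entry check, reduces to a filter over the flash records. Record, inRam and mergeForSave are hypothetical, prefixes are shortened to uint32_t, and the caller is assumed to clear the removed list after a successful write.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified record: prefix stands in for the 6-byte public-key prefix.
struct Record {
  uint32_t prefix;
  uint32_t key;  // stands in for the session key material
};

static bool inRam(const std::vector<Record>& ram, uint32_t prefix) {
  for (const Record& r : ram)
    if (r.prefix == prefix) return true;
  return false;
}

// Everything in RAM, plus flash entries that are neither in RAM (the RAM copy
// is newer) nor explicitly removed since the last save.
std::vector<Record> mergeForSave(const std::vector<Record>& ram,
                                 const std::vector<Record>& flash,
                                 const std::vector<uint32_t>& removed,
                                 size_t max_entries) {
  std::vector<Record> out(ram.begin(), ram.end());
  for (const Record& f : flash) {
    if (out.size() >= max_entries) break;   // flash store capacity
    if (inRam(ram, f.prefix)) continue;     // RAM copy is authoritative
    if (std::find(removed.begin(), removed.end(), f.prefix) != removed.end()) continue;
    out.push_back(f);                       // evicted from RAM, still valid
  }
  return out;
}
```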
Nonce Persistence
Nonces are persisted to a dedicated file on flash (/nonces for companion radios, /s_nonces for server firmware).

Periodic saves: After every NONCE_PERSIST_INTERVAL (50) messages to a given peer, the nonce file is written. A dirty flag tracks whether any nonce has advanced since the last save.

Clean reboot: Software restarts and deep sleep wakes load the persisted nonces as-is. An onBeforeReboot() callback in CommonCLI flushes any dirty nonces before the restart.

Dirty reboot: Power-on, watchdog, and brownout resets are detected via wasDirtyReset() (platform-specific: esp_reset_reason() on ESP32, RESETREAS register on NRF52). After a dirty reset, all loaded nonces are bumped forward by NONCE_BOOT_BUMP (100), which is at least 2× the persist interval, guaranteeing that even the worst-case unpersisted nonce is safely skipped. Session key nonces also receive the boot bump; if the bump causes a wrap, the nonce is forced to 65535 to trigger renegotiation.

Format: Simple array of {pub_key_prefix[6], nonce[2]} entries, matched to in-memory contacts/clients on load.

Security Comparison
- memcmp (timing side-channel) → secure_compare (constant-time)

Scope
All node types (companion radio, repeater, room server, sensor) support AEAD-4 decode, AEAD-4 send, and session key negotiation (companion initiates or responds; server firmware responds only).
Group Message Considerations
Group channels share a single key among all members. With a 2-byte nonce and multiple senders, cross-sender nonce collisions follow the birthday bound (~300 messages for 50% probability on an active channel). A collision leaks P1 ⊕ P2 for that specific message pair, which can then be attacked via crib-dragging, but:

This is mainly beneficial for public/hashtag channels where the PSK is already widely known and the ECB pattern leakage and weak MAC are a greater concern than the bounded nonce collision risk.
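The ~300-message figure can be checked numerically. This sketch treats each message's nonce as an independent uniform draw from 65,536 values, the same simplification behind the birthday estimate (closed form n ≈ sqrt(2 · 65536 · ln 2) ≈ 301); real senders use counters, so it is an approximation, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Probability that at least two of n messages share a nonce, assuming each
// nonce is an independent uniform draw from `space` values.
double collisionProbability(uint32_t n, double space = 65536.0) {
  double p_unique = 1.0;
  for (uint32_t i = 1; i < n; ++i) p_unique *= 1.0 - double(i) / space;
  return 1.0 - p_unique;
}

// Smallest number of messages with collision probability >= 0.5.
uint32_t messagesFor50Percent(double space = 65536.0) {
  uint32_t n = 1;
  while (collisionProbability(n, space) < 0.5) ++n;
  return n;
}
```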
Potential future mitigations explored and deferred:
HMAC(channel_secret, sender_pub_key)): eliminates cross-sender collisions but requires receivers to know all senders' public keys, changing the group security model from "know the PSK = full access" to "know the PSK + sender discovery = access." Ruled out as a usability regression.

Decode Order
Adaptive per-peer: for peers with CONTACT_FLAG_AEAD set, try AEAD-4 first, then ECB as a fallback. For unknown/legacy peers, try ECB first, then AEAD-4 as a fallback. When a session key exists, decode order is: session key → prev session key (dual-decode window) → static ECDH → ECB. This avoids the 1/65536 ECB false-positive rate on AEAD packets (nonce bytes matching truncated HMAC) for known AEAD peers, while minimizing wasted CPU for legacy peers.

Capability Advertisement
- feat1 bit 0 (FEAT1_AEAD_SUPPORT) is set in adverts for all node types (chat, repeater, room, sensor)
- ContactInfo.flags bit 1 (CONTACT_FLAG_AEAD)
- feat1 but ignore the value (forward-compatible via existing AdvertDataParser)

Files Changed
Core Library
- src/MeshCore.h: AEAD constants, session key constants (SESSION_KEY_SIZE, REQ_TYPE_SESSION_KEY_INIT, RESP_TYPE_SESSION_KEY_ACCEPT, NONCE_REKEY_THRESHOLD, SESSION_KEY_* thresholds and limits), two-tier pool sizing (MAX_SESSION_KEYS_RAM=8, MAX_SESSION_KEYS_FLASH=48), variable-length record sizes (SESSION_KEY_RECORD_SIZE, SESSION_KEY_RECORD_MIN_SIZE), SESSION_FLAG_PREV_VALID
- src/Utils.h / src/Utils.cpp: aeadEncrypt() and aeadDecrypt() using ChaChaPoly
- src/Mesh.h: getPeerFlags(), getPeerNextAeadNonce(), getPeerSessionKey(), getPeerPrevSessionKey(), onSessionKeyDecryptSuccess(), getPeerEncryptionKey(), getPeerEncryptionNonce() virtuals; aead_nonce param on createDatagram/createPathReturn
- src/Mesh.cpp: AEAD send path in createDatagram/createPathReturn; session key → prev session key → static ECDH → ECB adaptive decode order
- src/helpers/ContactInfo.h: uint16_t aead_nonce field, nextAeadNonce() helper
- src/helpers/SessionKeyPool.h: SessionKeyEntry struct and SessionKeyPool class (LRU-managed RAM pool with last_used tracking, eviction that skips INIT_SENT entries, removed-entry tracking for merge-save safety)

Companion Radio (BaseChatMesh)
- src/helpers/BaseChatMesh.h / BaseChatMesh.cpp: Advertise AEAD, track peer capability, AEAD send for all peer message types, nonce persistence, session key negotiation (both initiator and responder roles), encryption key/nonce funnel (getEncryptionKeyFor/getEncryptionNonceFor), deferred rekey trigger via _pending_rekey_idx

Server-Side (ClientACL + examples)
- src/helpers/ClientACL.h / ClientACL.cpp: Server-side AEAD nonce tracking and persistence, session key responder (handleSessionKeyInit), paired encryption key/nonce selection (getEncryptionKey/getEncryptionNonce), flash-backed session key wrappers with merge-save, peer-index forwarding helpers
- src/helpers/CommonCLI.h / CommonCLI.cpp: Advertise AEAD for repeaters/rooms/sensors; onBeforeReboot() callback for nonce/session key flush
- examples/simple_repeater/MyMesh.h / MyMesh.cpp: AEAD + session key support, nonce persistence, session key INIT handling in onPeerDataRecv
- examples/simple_room_server/MyMesh.h / MyMesh.cpp: Same
- examples/simple_sensor/SensorMesh.h / SensorMesh.cpp: Same

Platform Support
- src/helpers/ArduinoHelpers.h: wasDirtyReset() helper (ESP32/NRF52 reset reason detection)
- examples/companion_radio/DataStore.h / DataStore.cpp: Nonce and session key file I/O, variable-length session key records, merge-save with flash-backed lookup (loadSessionKeyByPrefix)
- examples/companion_radio/MyMesh.h / MyMesh.cpp: Wire up nonce/session key persistence and reboot callback, flash-backed session key overrides (loadSessionKeyRecordFromFlash, mergeAndSaveSessionKeys)

Build Verification
- Heltec_v3_companion_radio_ble): builds successfully
- Heltec_v3_repeater): builds successfully
- Heltec_v3_room_server): builds successfully
- Xiao_nrf52_companion_radio_ble): builds successfully

Future Work

- rekey <peer> CLI command for manual session key renegotiation