Skip to content

usb/cdc: fix RP2 USB CDC TX race with cores scheduler#5391

Open
rdon-key wants to merge 3 commits into
tinygo-org:devfrom
rdon-key:fix-rp2-usb-cdc-tx-race
Open

usb/cdc: fix RP2 USB CDC TX race with cores scheduler#5391
rdon-key wants to merge 3 commits into
tinygo-org:devfrom
rdon-key:fix-rp2-usb-cdc-tx-race

Conversation

@rdon-key
Copy link
Copy Markdown
Contributor

Fixes #5377.

This PR fixes a USB CDC TX race on RP2 targets when using -scheduler=cores.

The issue was reproduced on Raspberry Pi Pico / RP2040 with TinyGo 0.41.1.
RP2350 may be affected as well because it shares the RP2 USB backend and the
same USB CDC TX state machine.

Cause

USB CDC TX is driven from two places:

  • Write() via kickTx()
  • the USB TX completion handler via txhandler()

The previous code used inflight both as:

  • the number of bytes currently submitted to the USB IN endpoint
  • an implicit guard for whether the TX pump is active

That is not sufficient on multicore targets.

With -scheduler=cores, Write() may run on one core while the USB TX
completion handler runs on another core. In that case, kickTx() can observe the
TX path as idle while txhandler() is still processing a completed packet and
chaining the next one.

That can allow concurrent or re-entrant calls into sendFromRing() /
machine.SendUSBInPacket() for the same USB IN endpoint, which can corrupt USB
CDC output.

Fix

This PR adds a separate atomic txActive flag.

txActive represents ownership of the USB CDC TX pump. inflight remains only
the number of bytes currently submitted to the endpoint.

kickTx() now starts the TX pump only if it can acquire txActive with CAS.
The TX completion handler keeps TX pump ownership while chaining packets, and
ownership is released only when the TX ring is empty.

A final ring re-check is used when releasing ownership to avoid a missed wakeup
if Write() appends data while txActive is still set.

Manual test

Test program:

package main

import "time"

func main() {
	time.Sleep(2 * time.Second)

	println("start usb cdc tx stress")

	for i := 0; i < 1000; i++ {
		println("usb cdc tx test:", i, "abcdefghijklmnopqrstuvwxyz0123456789")
		time.Sleep(1 * time.Millisecond)
	}

	println("test finished")
}

Commands:

~/work/tinygo/build/tinygo flash -target=pico -scheduler=cores -monitor usbcdc_tx_stress.go
~/work/tinygo/build/tinygo flash -target=pico -scheduler=tasks -monitor usbcdc_tx_stress.go

Results:

Before this fix:
- TinyGo 0.41.1 + -scheduler=cores: USB CDC output was corrupted
- TinyGo 0.41.1 + -scheduler=tasks: OK

After this fix:
- local dev build + -scheduler=cores: OK
- local dev build + -scheduler=tasks: OK

The -scheduler=cores test was repeated multiple times without observing
dropped or corrupted USB CDC output.

@rdon-key rdon-key force-pushed the fix-rp2-usb-cdc-tx-race branch 2 times, most recently from ce5c04a to c059707 Compare May 11, 2026 14:29
Comment thread src/machine/usb/cdc/usbcdc.go Outdated
@deadprogram
Copy link
Copy Markdown
Member

@rdon-key thank you for looking into this. Your solution makes sense to me, I just suggested a small correction in form.

@rdon-key
Copy link
Copy Markdown
Contributor Author

@deadprogram
Thank you for the review! I applied the suggested change.

@deadprogram
Copy link
Copy Markdown
Member

Anyone else have any feedback or comments before merge?

@deadprogram
Copy link
Copy Markdown
Member

@soypat ?

@soypat soypat added the rp2 RP2350/RP2040 label May 28, 2026
Comment thread src/machine/usb/cdc/usbcdc.go
@rdon-key rdon-key force-pushed the fix-rp2-usb-cdc-tx-race branch from fb52e10 to c690c2c Compare May 29, 2026 04:34
}
usbcdc.tx.Discard(inflight)
usbcdc.inflight.Store(0)
usbcdc.sendFromRing()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is odd to me this is not guarded with a txActive compare-swap. I think my issue with this PR as it stands is it is not clear to me what txActive is guarding. From the looks of it sendFromRing can be called concurrently from kickTx and txhandler and we've added a bunch of logic in sendFromRing to deal with this. I feel like we can maybe keep sendFromRing as a simple function that does what it says and guard it from executing concurrently as I suspect that concurrent execution is not necessary for this to work correctly.

maybe we can even delete kickTx and simply add the guard at the start of sendFromRing and do a simple Store(0) immediately when sendFromRing ends it's process?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds very reasonable, and less complex. What do you think @rdon-key?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

txActive is not only guarding the lexical execution of sendFromRing. It represents ownership of the asynchronous TX pump.

This is the usual queue + active-flag missed-wakeup pattern: while the active flag is set, producers may enqueue more work but do not start another worker. Therefore, when the current owner is about to go idle, it must clear the active flag and then re-check the queue before returning.

In this case, sendFromRing submits one USB IN packet and then returns, but the TX pump must still be considered active while that packet is in flight. The ownership is continued by txhandler when the TX completion arrives.

That is why txhandler does not acquire txActive with CAS before calling sendFromRing: it is continuing the ownership that was acquired by kickTx.

If we put the CAS at the start of sendFromRing, the completion path would normally see txActive already set and fail to continue the pump. If we clear txActive immediately when sendFromRing returns after submitting a packet, Write could acquire it and call SendUSBInPacket again while the previous packet is still in flight, which is the race this PR is trying to avoid.

The intended ownership model:

  • kickTx acquires txActive when the TX pump is idle.
  • sendFromRing sends one packet while ownership is held.
  • txhandler continues the same ownership across TX completions.
  • ownership is released only when the ring appears empty, with a final re-check to avoid a missed wakeup.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a really hard time working through the logic. I've asked google AI to create an abstraction taking into account your approach. I do like the idea of defining a type that encapsulates the task pump functionality. Here's what the AI came up with. Let me know if any of these abstractions work. Note there are two abstractions provided, one uses a dynamic call, the other embeds the task pump logic more cleanly (second one). Let me know if this sounds like a good path forward.

https://share.google/aimode/CPm4rcMLZXj16sTsA

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @soypat — I did look at both sketches.

The main issue is that they don't capture the TX pump's IRQ boundary. The TX pump isn't a run-to-completion loop: it sends one packet, then has to suspend while keeping ownership, and resume from the TX-completion IRQ (txhandler) without re-acquiring.

As written, both sketches seem to collapse that boundary: they put Kick()/CAS in txhandler, so the pump can stall after the first packet while txActive is still held. To make the abstraction actually span the IRQ boundary, it would need a separate continuation entry that skips the CAS, an explicit yield / in-flight state, and the release-with-recheck path.

At that point, I don't think the abstraction is simpler than what's here now — it just spreads the same state machine across a generic type and the driver.

So for this PR I'd like to keep the scope minimal. It's a regression fix for #5377, and I'd rather land the smallest change that closes the race. If we want to revisit a TaskPump-style abstraction, I think that should be a separate follow-up issue with its own review and on-hardware testing.

Copy link
Copy Markdown
Contributor

@soypat soypat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was already a bit unhappy we can't do something more readable and now I found some confusing comments

Comment on lines +136 to 138
// The caller must own txActive. Ownership starts in kickTx and is kept across
// TX completion handling until the TX ring is empty.
func (usbcdc *USBCDC) sendFromRing() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems false. txhandler is calling sendFromRing and is not owning txactive.

Comment on lines +35 to +37
// txActive serializes the USB CDC TX pump between Write and the TX
// completion handler. This matters on multicore targets where they can run
// concurrently.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 25 words here and it can more or less be reduced to "txActive serializes kickTx/txhandler write pump". All good on purpose but lacking on what the data structure is and how it is used. there is a loose comment below saying this lock is held by caller but I've pointed out that is not true. If we are not going to use a type to represent this data structure, for whatever reason, you need to document the minutiae of how the data structure is used. It is far from evident from reading the code.

Comment on lines +139 to +140
// This is the main TX pump loop. While txActive is held, each iteration
// peeks the TX ring and sends one packet if data is available.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also misleading. "While txActive is held xxx". There is no branching on a txActive state before sending data- not to mention it need not be held when calling from txhandler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rp2 RP2350/RP2040

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RP2040: USB CDC output drops characters with -scheduler=cores on TinyGo 0.41.1

3 participants