Skip to content

Conversation

@purdeaandrei
Copy link
Contributor

When processing OUT control transfers, the EP0 buffer is armed when wriging EP0BCL, while SDPAUTO is high. At this stage it's not necessary to set HSNAK, because that bit has an effect on the status stage, not on the data stage.

Beyond it being not necessary to write HSNAK early, it introduces a race condition, because the host sees the status stage complete early, before we have fully processed the content of EP0BUF, and as such it thinks it can send more control transfers. If the host actually sends more control transfers, then EP0BUF might be overwritten before we have completed processing EP0BUF, and this can cause data corruption.

The clean way to handle this, is to force the host to wait in the status stage until we have fully processed the EP0BUF, by NAK-ink the status stage. This does not mean that the packets in the Data stage will be NAK'd because, that is controlled by whether the buffer is armed or not.

Firmware-wise the new recommended way to the OUT control transfers is:

for (each_expected_packet) {
    SETUP_EP0_OUT_BUF();
    handle_packet(EP0BUF, EP0BCL);
      // EP0BCL will have the size of the received packet
}
ACK_EP0();

That is: only call ACK_EP0(), which clears HSNAK by writing 1 to it, after we know it's safe to overwrite EP0BUF.

Note: a simple way to reproduce the issue when not following this procedure, is to run:

while true; do lsusb -v -d $(DEVICE_VID):$(DEVICE_PID) > /dev/null; done

In a terminal. (replacing DEVICE_VID/DEVICE_PID with correct values) While this loop is running, we expected most applications doing control out transfers with the old method to become unusable.

Technically speaking a single run of lsusb could even cause trouble if it happes at the wrong time, and I think this could be triggered in other situations as well, this is just the easiest way to reproduce the problem.

@purdeaandrei
Copy link
Contributor Author

MORE PRs to follow to fix this for glasgow, and some examples in libfx2

@whitequark
Copy link
Owner

Thanks so much for handling this, I'm really happy that you are looking into it! Right now I'm feeling too sick to review the code but I'll do it as soon as I can. I've known about what I think is this exact issue for a while but never found the time to handle it.

purdeaandrei added a commit to purdeaandrei/glasgow that referenced this pull request Sep 26, 2024
Before this change, the EP0BUF vuffer used by control out transfers
could be overwritten by new control transfers before the firmware is
finished processing the previous control transfer.

The easiest way to illustrate this problem would be to run in a
different terminal the following:

```bash
while true; do lsusb -v -d 20b7:9db1 > /dev/null; done
```

While this is running, glasglow is completely unusable.
Presumably even a single lsusb run could cause corruption, if it
happens to be issued at the wrong time.

With this change glasgow is now usable, even if the above loop is
running.

Please see whitequark/libfx2#18 for more
details.
purdeaandrei added a commit to purdeaandrei/glasgow that referenced this pull request Sep 26, 2024
Before this change, the EP0BUF buffer used by control out transfers
could be overwritten by new control transfers before the firmware is
finished processing the previous control transfer.

The easiest way to illustrate this problem would be to run in a
different terminal the following:

```bash
while true; do lsusb -v -d 20b7:9db1 > /dev/null; done
```

While this is running, glasglow is completely unusable.
Presumably even a single lsusb run could cause corruption, if it
happens to be issued at the wrong time.

With this change glasgow is now usable, even if the above loop is
running.

Please see whitequark/libfx2#18 for more
details.
@purdeaandrei
Copy link
Contributor Author

Thanks so much for handling this, I'm really happy that you are looking into it! Right now I'm feeling too sick to review the code but I'll do it as soon as I can. I've known about what I think is this exact issue for a while but never found the time to handle it.

I hope you will be better soon!

@whitequark
Copy link
Owner

I'm disabled; it does not really get better. But I may be able to work out some time for my OSS projects, since I want to work on them still.

purdeaandrei added a commit to purdeaandrei/glasgow that referenced this pull request Sep 27, 2024
Before this change, the EP0BUF buffer used by control out transfers
could be overwritten by new control transfers before the firmware is
finished processing the previous control transfer.

The easiest way to illustrate this problem would be to run in a
different terminal the following:

```bash
while true; do lsusb -v -d 20b7:9db1 > /dev/null; done
```

While this is running, glasglow is completely unusable.
Presumably even a single lsusb run could cause corruption, if it
happens to be issued at the wrong time.

With this change glasgow is now usable, even if the above loop is
running.

Please see whitequark/libfx2#18 for more
details.
@whitequark
Copy link
Owner

Thank you, the analysis and the fix look all correct to me. Could you please add a pair of macros SETUP_EP0_IN_BUF and SETUP_EP0_OUT_BUF while deprecating SETUP_EP0_BUF please?

For deprecation, I propose making it rely on a data symbol called something like SETUP_EP0_BUF_is_deprecated_use_SETUP_EP0_IN_BUF_or_SETUP_EP0_OUT_BUF_instead, provided SDCC doesn't choke on a symbol this long. If you want to keep using old code, you can just define the symbol to resolve to something that doesn't matter and doesn't bloat codegen, like a single byte in pdata, or a bit (you usually have some of those spare).

@whitequark
Copy link
Owner

Also, I would strongly prefer to upgrade all of the examples as well, since in practice half of the people who use the library at all will just copy&paste example code.

@purdeaandrei purdeaandrei force-pushed the f_add_SETUP_EP0_OUT_BUF branch from 696e8c4 to 7b0ed57 Compare September 15, 2025 15:20
@purdeaandrei
Copy link
Contributor Author

Thank you, the analysis and the fix look all correct to me. Could you please add a pair of macros SETUP_EP0_IN_BUF and SETUP_EP0_OUT_BUF while deprecating SETUP_EP0_BUF please?

DONE

For deprecation, I propose making it rely on a data symbol called something like SETUP_EP0_BUF_is_deprecated_use_SETUP_EP0_IN_BUF_or_SETUP_EP0_OUT_BUF_instead, provided SDCC doesn't choke on a symbol this long. If you want to keep using old code, you can just define the symbol to resolve to something that doesn't matter and doesn't bloat codegen, like a single byte in pdata, or a bit (you usually have some of those spare).

Could you clarify why we'd need an actual symbol here, instead of just a macro?
If I were to do something like:

#define SETUP_EP0_BUF(length) \
  do { \
    if (SETUP_EP0_BUF_is_deprecated_use_SETUP_EP0_IN_BUF_or_SETUP_EP0_OUT_BUF_instead) { \
      SUDPTRCTL = _SDPAUTO; \
      EP0BCH = 0; \
      EP0BCL = length; \
      EP0CS = _HSNAK; \
    }\
  } while(0)

Then the error message would be main.c:202: error 20: Undefined identifier 'SETUP_EP0_BUF_is_deprecated_use_SETUP_EP0_IN_BUF_or_SETUP_EP0_OUT_BUF_instead'
And you could make it go away with CFLAGS=-DSETUP_EP0_BUF_is_deprecated_use_SETUP_EP0_IN_BUF_or_SETUP_EP0_OUT_BUF_instead=1

And code size would remain unchanged.

However if I were to do something like:

extern uint8_t SETUP_EP0_BUF_is_deprecated_use_SETUP_EP0_IN_BUF_or_SETUP_EP0_OUT_BUF_instead;

#define SETUP_EP0_BUF(length) \
  do { \
      SUDPTRCTL = _SDPAUTO; \
      EP0BCH = 0; \
      EP0BCL = length + SETUP_EP0_BUF_is_deprecated_use_SETUP_EP0_IN_BUF_or_SETUP_EP0_OUT_BUF_instead; \
      EP0CS = _HSNAK; \
  } while(0)

Then the error message would be:
?ASlink-Warning-Undefined Global '_SETUP_EP0_BUF_is_deprecated_use_SETUP_EP0_IN_BUF_or_SETUP_EP0_OUT_BUF_instead' referenced by module 'main'

Then you could make it go away by adding something like this to one of your .c files:

uint8_t SETUP_EP0_BUF_is_deprecated_use_SETUP_EP0_IN_BUF_or_SETUP_EP0_OUT_BUF_instead;

But this would result in added code size, not only to store the symbol, but to also access it every time SETUP_EP0_BUF is called

Wouldn't it be better to just use a missing identifier?

Also, I would strongly prefer to upgrade all of the examples as well, since in practice half of the people who use the library at all will just copy&paste example code.

Of coure. I presume then, that if libfx2 will error out by default with the deprecated function, you'd prefer the fixes to the examples as part of the same PR, right?

When processing OUT control transfers, the EP0 buffer is armed when
wriging EP0BCL, while SDPAUTO is high. At this stage it's not
necessary to set HSNAK, because that bit has an effect on the status
stage, not on the data stage.

Beyond it being not necessary to write HSNAK early, it introduces a
race condition, because the host sees the status stage complete early,
before we have fully processed the content of EP0BUF, and as such it
thinks it can send more control transfers. If the host actually sends
more control transfers, then EP0BUF might be overwritten before we have
completed processing EP0BUF, and this can cause data corruption.

The clean way to handle this, is to force the host to wait in the status
stage until we have fully processed the EP0BUF, by NAK-ink the status
stage. This does not mean that the packets in the Data stage will be
NAK'd because, that is controlled by whether the buffer is armed or not.

This commit deprecates `SETUP_EP0_BUF(length)`, and introduces:
- `SETUP_EP0_IN_BUF(length)` as a direct replacement for IN transfers
- `SETUP_EP0_OUT_BUF()` as a new macro to perform OUT transfers

The new recommended way to do OUT control transfers is:

```c
for (each_expected_packet) {
    SETUP_EP0_OUT_BUF();
    handle_packet(EP0BUF, EP0BCL);
      // EP0BCL will have the size of the received packet
}
ACK_EP0();
```

That is: only call `ACK_EP0()`, which clears HSNAK by writing 1 to it,
after we know it's safe to overwrite EP0BUF.

Note: a simple way to reproduce the issue when not following this
procedure, is to run:

```bash
while true; do lsusb -v -d $(DEVICE_VID):$(DEVICE_PID) > /dev/null; done
```

In a terminal. (replacing DEVICE_VID/DEVICE_PID with correct values)
While this loop is running, we expected most applications doing control
out transfers with the old method to become unusable.

Technically speaking a single run of `lsusb` could even cause trouble if
it happes at the wrong time, and I think this could be triggered in
other situations as well, this is just the easiest way to reproduce the
problem.
This is now used instead of SETUP_EP0_BUF whenever
setting up an IN transfer.

This commit doesn't introduce any kind of functional change.
This commit doesn't introduce any kind of functional change.

The cdc-acm example wasn't vulnerable to the race condition
caused by using `SETUP_EP0_BUF(0)`. I have tested that both
the original and new versions of cdc-acm work correctly.
I have tested the following boot-cypress writing eeprom,
while executing the following in the background:
`while true; do lsusb -v -d 04b4:8613 > /dev/null; done`

With the old version of boot-cypress this results in
corrupted data.

With the new version of boot-cypress this works correctly.
I have tested boot-dfu downloading firmware
with dfu-util, while executing the following
in the background, as root:
`while true; do lsusb -v -d 04b4:8613 > /dev/null; done`

With old version of boot-dfu this results in corrupted
data.

With the new version of boot-dfu this works correctly.

I had to allocate an additonal scratch2 buffer.
We cannot use the normal scratch buffer, because it's
used when getting descriptors, and the OS may send
GET_DESCRIPTOR requests at any time.

I have also tested the new version of boot-dfu-spiflash,
but only flashing the eeprom.

I have also tested the new version of boot-uf2-dfu
in dfu-mode only.
@purdeaandrei purdeaandrei force-pushed the f_add_SETUP_EP0_OUT_BUF branch from 7b0ed57 to 7a68b79 Compare October 6, 2025 02:18
@purdeaandrei
Copy link
Contributor Author

purdeaandrei commented Oct 6, 2025

Hello,

  • I have implemented fixes for all the examples.
    • For the DFU examples I had the allocate an extra scratch buffer. This is because getting descriptors also uses the scratch buffer. And the OS can send get descriptor requests while we have not-yet written download data in the scratch buffer.
      • If this is not desirable, then alternatively we could get rid of this buffer, by immediately writing the chunk, as soon as it arrives with the DFU_DNLOAD request, protect the endpoint buffer, by not issuing an ACK until the full amount has been written, and pretend that we're only writing it later during the dfuDNBUSY.
  • I have enforced deprecation of SETUP_EP0_BUF, using an identifier, please see my previous comment, and let me know what you think.
  • I have successfully tested all the modifications I made

This PR is ready for review

Copy link
Owner

@whitequark whitequark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, excellent work!

@whitequark
Copy link
Owner

For the DFU examples I had the allocate an extra scratch buffer. This is because getting descriptors also uses the scratch buffer. And the OS can send get descriptor requests while we have not-yet written download data in the scratch buffer.

It's not a big deal IMO; I value correctness much more than space saving. If this becomes an issue in the future we can always find a solution later; it's perfectly fine for the examples.

Could you clarify why we'd need an actual symbol here, instead of just a macro?

I think your proposal makes more sense than mine; let's do that!

purdeaandrei added a commit to purdeaandrei/glasgow that referenced this pull request Oct 6, 2025
… TODO MEASURE AGAIN).

Before this change, the EP0BUF buffer used by control out transfers
could be overwritten by new control transfers before the firmware is
finished processing the previous control transfer.

The easiest way to illustrate this problem would be to run in a
different terminal the following:

```bash
while true; do lsusb -v -d 20b7:9db1 > /dev/null; done
```

While this is running, glasglow is completely unusable.
Presumably even a single lsusb run could cause corruption, if it
happens to be issued at the wrong time.

With this change glasgow is now usable, even if the above loop is
running.

Please see whitequark/libfx2#18 for more
details.
@whitequark
Copy link
Owner

(I haven't merged the PR until now because GitHub doesn't automatically send an e-mail if you update a past commit via force-push.)

@whitequark whitequark merged commit 00f7d82 into whitequark:main Oct 7, 2025
1 check passed
purdeaandrei added a commit to purdeaandrei/glasgow that referenced this pull request Oct 8, 2025
Before this change, the EP0BUF buffer used by control out transfers
could be overwritten by new control transfers before the firmware is
finished processing the previous control transfer.

The easiest way to illustrate this problem would be to run in a
different terminal the following:

```bash
while true; do lsusb -v -d 20b7:9db1 > /dev/null; done
```

While this is running, glasglow is completely unusable.
Presumably even a single lsusb run could cause corruption, if it
happens to be issued at the wrong time.

With this change glasgow is now usable, even if the above loop is
running.

Please see whitequark/libfx2#18 for more
details.
purdeaandrei added a commit to purdeaandrei/libfx2 that referenced this pull request Oct 8, 2025
Especially to include fixes from whitequark#18 and whitequark#25
This time boot-cypress is built with 4.5.0 #15242.
For future builds please run from the root of the repository:
`./software/deploy-bootloader.sh`

The deploy script was adapted from the one used in glasgow firmware.
github-merge-queue bot pushed a commit to GlasgowEmbedded/glasgow that referenced this pull request Oct 8, 2025
Before this change, the EP0BUF buffer used by control out transfers
could be overwritten by new control transfers before the firmware is
finished processing the previous control transfer.

The easiest way to illustrate this problem would be to run in a
different terminal the following:

```bash
while true; do lsusb -v -d 20b7:9db1 > /dev/null; done
```

While this is running, glasglow is completely unusable.
Presumably even a single lsusb run could cause corruption, if it
happens to be issued at the wrong time.

With this change glasgow is now usable, even if the above loop is
running.

Please see whitequark/libfx2#18 for more
details.
whitequark pushed a commit that referenced this pull request Oct 8, 2025
Especially to include fixes from #18 and #25
This time boot-cypress is built with 4.5.0 #15242.
For future builds please run from the root of the repository:
`./software/deploy-bootloader.sh`

The deploy script was adapted from the one used in glasgow firmware.
purdeaandrei added a commit to purdeaandrei/glasgow that referenced this pull request Oct 8, 2025
Before this change, the EP0BUF buffer used by control out transfers
could be overwritten by new control transfers before the firmware is
finished processing the previous control transfer.

The easiest way to illustrate this problem would be to run in a
different terminal the following:

```bash
while true; do lsusb -v -d 20b7:9db1 > /dev/null; done
```

While this is running, glasglow is completely unusable.
Presumably even a single lsusb run could cause corruption, if it
happens to be issued at the wrong time.

With this change glasgow is now usable, even if the above loop is
running.

Please see whitequark/libfx2#18 for more
details.
github-merge-queue bot pushed a commit to GlasgowEmbedded/glasgow that referenced this pull request Oct 8, 2025
Before this change, the EP0BUF buffer used by control out transfers
could be overwritten by new control transfers before the firmware is
finished processing the previous control transfer.

The easiest way to illustrate this problem would be to run in a
different terminal the following:

```bash
while true; do lsusb -v -d 20b7:9db1 > /dev/null; done
```

While this is running, glasglow is completely unusable.
Presumably even a single lsusb run could cause corruption, if it
happens to be issued at the wrong time.

With this change glasgow is now usable, even if the above loop is
running.

Please see whitequark/libfx2#18 for more
details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants