Discussion:
[Libusbx-devel] [libusbx] break from never ending loop. fix #76. (#167)
miguelfreitas
2014-01-09 20:29:07 UTC
Exit from sync_transfer_wait_for_completion when libusb_handle_events_completed fails
and libusb_cancel_transfer returns LIBUSB_ERROR_NOT_FOUND. This error is returned when
the URB is not found in the kernel (it was already dequeued from the async_completed list),
so there is nothing to cancel. If we don't break here, libusbx will get stuck in
this loop forever. Fixes #76.
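
For readers without the diff at hand, the control flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the patch itself: the `SKETCH_*` constants, the callback types, and the stub functions are hypothetical stand-ins for the libusb error codes, libusb_handle_events_completed and libusb_cancel_transfer, so that the loop can be exercised without a device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libusb error codes named in the thread. */
enum {
    SKETCH_SUCCESS = 0,
    SKETCH_ERROR_IO = -1,
    SKETCH_ERROR_NOT_FOUND = -5,
    SKETCH_ERROR_INTERRUPTED = -10
};

/* Hypothetical callbacks standing in for libusb_handle_events_completed()
 * and libusb_cancel_transfer(); real code would call libusb directly. */
typedef int (*handle_events_fn)(int *completed);
typedef int (*cancel_fn)(void);

/* Waits until *completed becomes non-zero. Returns SKETCH_SUCCESS on
 * completion, or SKETCH_ERROR_NOT_FOUND if event handling failed and the
 * cancel call reports that there is no transfer left to cancel. */
static int wait_for_completion_sketch(int *completed,
                                      handle_events_fn handle_events,
                                      cancel_fn cancel)
{
    assert(completed != NULL);
    while (!*completed) {
        int r = handle_events(completed);
        if (r < 0) {
            if (r == SKETCH_ERROR_INTERRUPTED)
                continue; /* interrupted: simply retry */
            /* Event handling failed: try to cancel, then keep waiting for
             * the cancellation to complete, unless nothing was found to
             * cancel, in which case no completion will ever arrive. */
            if (cancel() == SKETCH_ERROR_NOT_FOUND)
                return SKETCH_ERROR_NOT_FOUND;
        }
    }
    return SKETCH_SUCCESS;
}

/* Stub: event handling always fails with an I/O error. */
static int always_fail_events(int *completed)
{
    (void)completed;
    return SKETCH_ERROR_IO;
}

/* Stub: event handling succeeds and marks the transfer completed. */
static int complete_on_first_call(int *completed)
{
    *completed = 1;
    return SKETCH_SUCCESS;
}

/* Stub: the transfer is already gone, so there is nothing to cancel. */
static int cancel_not_found(void)
{
    return SKETCH_ERROR_NOT_FOUND;
}
```

Without the early return, the first stub combination would spin forever, because `*completed` is never set and every retry fails the same way.
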
You can merge this Pull Request by running:

git pull https://github.com/miguelfreitas/libusbx issue_76

Or you can view, comment on it, or merge it online at:

https://github.com/libusbx/libusbx/pull/167

-- Commit Summary --

* exit from sync_transfer_wait_for_completion when libusb_handle_events_completed failed

-- File Changes --

M libusb/sync.c (8)

-- Patch Links --

https://github.com/libusbx/libusbx/pull/167.patch
https://github.com/libusbx/libusbx/pull/167.diff

---
Reply to this email directly or view it on GitHub:
https://github.com/libusbx/libusbx/pull/167
Pete Batard
2014-01-09 21:57:29 UTC
Thanks for the patch. We'll see what we can do to integrate it, but can you tell us a bit more about the circumstances where this behaviour occurs, and how easy or difficult it is to produce (i.e. do you see the infinite loop consistently happening in an app, or is it a bit more random)?

/Pete

---
Reply to this email directly or view it on GitHub:
https://github.com/libusbx/libusbx/pull/167#issuecomment-31981250
miguelfreitas
2014-01-09 22:41:17 UTC
Hi Pete, for me it is fairly easy to reproduce with ultrasound equipment we developed at PUC-Rio university using Opalkelly FPGA cards (which include USB ports). I can run our app + libusbx receiving tons of data for days flawlessly if the high-voltage ultrasound power supply is disabled.

However, if I turn the HV power on, very short +100/-100 V pulses are produced thousands of times per second. This creates a lot of RF noise. With HV power on I consistently trigger this infinite loop in a matter of minutes, along with other transmission errors which don't cause the app to freeze.

The software can't tell whether the HV power is on or off. I have already tried reading from and writing to the buffers passed to libusbx to prove they are not invalid pointers.

If you want to perform any additional tests or try other theories, please let me know. But I think the failure is well understood: the kernel can't perform a copy_to_user because the size of the transfer somehow got corrupted, either in the outgoing request (most likely) or in the incoming transfer, so it won't fit the provided buffer, which expected a different size. Still (as clearly shown in the kernel code), even if the copy fails, the urb is freed there, which explains why cancel also fails with this error. Then, by staying in this loop, we wait forever for an event that will never occur.
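
The sequence described in this message can be modelled in a few lines of plain C. This is only a toy model of the hypothesis as stated here, not kernel code and not a claim about what the kernel actually does: every name (`model_urb`, `model_reap`, `model_discard`, `MODEL_NOT_FOUND`) is hypothetical, and the rule that an oversized transfer makes the copy fail with EFAULT is the assumption under discussion in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical return value for "nothing to cancel". */
#define MODEL_NOT_FOUND (-1)

/* Hypothetical model of one submitted URB as tracked on the kernel side. */
struct model_urb {
    bool pending;              /* still known to the modelled kernel */
    size_t actual_length;      /* number of bytes reported for the transfer */
    const unsigned char *data; /* payload to hand back to user space */
};

/* Copy step. ASSUMPTION from the message above: if the reported length does
 * not fit the user buffer, the copy fails with -EFAULT. */
static int model_copy_to_user(unsigned char *user_buf, size_t user_len,
                              const struct model_urb *urb)
{
    if (urb->actual_length > user_len)
        return -EFAULT;
    memcpy(user_buf, urb->data, urb->actual_length);
    return 0;
}

/* Reap step: the URB is released whether or not the copy succeeded. */
static int model_reap(struct model_urb *urb, unsigned char *user_buf,
                      size_t user_len)
{
    int r = model_copy_to_user(user_buf, user_len, urb);
    urb->pending = false;
    return r;
}

/* Cancel step: a URB that was already released cannot be found any more. */
static int model_discard(struct model_urb *urb)
{
    if (!urb->pending)
        return MODEL_NOT_FOUND;
    urb->pending = false;
    return 0;
}
```

In this model, a failed reap followed by a cancel yields "not found", which is the combination the pull request reacts to. Whether the real EFAULT has this cause is exactly what the rest of the thread disputes.
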


---
Reply to this email directly or view it on GitHub:
https://github.com/libusbx/libusbx/pull/167#issuecomment-31984952
Hans de Goede
2014-01-10 10:39:30 UTC
<Sigh> As discussed to death in issue #76, the problem you're seeing and trying to fix is caused by a bad pointer *somewhere*, as the EFAULT clearly indicates. Strong NACK from me. For the last time: stop trying to propagate the EFAULT, do your homework, debug the cause of the EFAULT and fix that. Thank you.

---
Reply to this email directly or view it on GitHub:
https://github.com/libusbx/libusbx/pull/167#issuecomment-32017465
Hans de Goede
2014-01-10 10:39:30 UTC
Closed #167.

---
Reply to this email directly or view it on GitHub:
https://github.com/libusbx/libusbx/pull/167
miguelfreitas
2014-01-10 11:08:31 UTC
I already explained what causes the EFAULT, and it is not a bad pointer, but you don't want to hear it. So sorry.

---
Reply to this email directly or view it on GitHub:
https://github.com/libusbx/libusbx/pull/167#issuecomment-32019265
Kustaa Nyholm
2014-01-10 11:24:14 UTC
As a complete outsider to this thread, observing from afar:

I don't understand Hans's approach here.

It looks to me that here we have Miguel with a reproducible case where
an infinite loop inside libusb can be triggered by disturbing the USB
communication externally without touching the code in any shape or form.

Miguel has demonstrated, at least to his own satisfaction, that the buffers
involved are ok by using a memmove on them.

Further, he has looked at the kernel code and found out that a buffer
overrun can also cause EFAULT and, if I understood correctly, that
a wrong data length could be a possible cause. And if I understood
correctly this wrong length cannot come from his code but comes
from kernel or libusb.

Thus it looks to me that in this kind of scenario it is definitely wrong
if a libusb call hangs because of a transfer error.

So why not either accept that there is a logic problem in libusb
in that it can hang (regardless of whether it actually is EFAULT or not)
or that Miguel has some problem in which case try help him to resolve
that?

In short, I'm getting the impression that Miguel is right here...

cheers Kusti


daves0
2014-12-14 21:26:19 UTC
Here we are almost a year later, and the "not-found" error return from libusb_cancel_transfer is not being handled. It appears efforts here were stopped because one person can cause this problem in a manner that also involves EFAULTs. In the end, however, there's still a looping bug in libusb, is there not?

The loop can only be exited when a specific condition occurs / state is reached. There is a function call within the loop that can return a "not-found" value that indicates that the desired completion state will never be achieved. This return value is ignored. Sounds like a bug to me, no?

Can someone please correctly handle the failure (not-found) return from libusb_cancel_transfer that's inside of sync_transfer_wait_for_completion in a manner such that the endless loop doesn't occur any more?

Thanks!

Dave


---
Reply to this email directly or view it on GitHub:
https://github.com/libusbx/libusbx/pull/167#issuecomment-66930480