WaitableEvent Posix impl


#1

Hi Jules,

According to the docs and the Windows impl, WaitableEvent should wake all the waiting threads.
Problem is, the pthread version uses pthread_cond_signal
instead of pthread_cond_broadcast.

Is this intended?
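To make the difference concrete, here is a minimal sketch of an event that stays signalled and wakes every waiter. It is only an illustration under my own assumptions: ManualResetEvent, WaiterContext, waiterMain and countWokenThreads are hypothetical names, and this is not the actual WaitableEvent code.

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical manual-reset event; an illustration, not JUCE's actual code.
class ManualResetEvent
{
public:
    ManualResetEvent()
    {
        pthread_mutex_init (&mutex, nullptr);
        pthread_cond_init (&condition, nullptr);
    }

    ~ManualResetEvent()
    {
        pthread_cond_destroy (&condition);
        pthread_mutex_destroy (&mutex);
    }

    ManualResetEvent (const ManualResetEvent&) = delete;
    ManualResetEvent& operator= (const ManualResetEvent&) = delete;

    // Blocks until signal() has been called; returns at once if it already was.
    void wait()
    {
        pthread_mutex_lock (&mutex);

        while (! triggered)   // the flag also guards against spurious wake-ups
            pthread_cond_wait (&condition, &mutex);

        pthread_mutex_unlock (&mutex);
    }

    // Sets the flag and wakes every waiter. pthread_cond_signal is only
    // guaranteed to wake one of the blocked threads, so the others could
    // stay asleep even though the flag is set.
    void signal()
    {
        pthread_mutex_lock (&mutex);
        triggered = true;
        pthread_cond_broadcast (&condition);
        pthread_mutex_unlock (&mutex);
    }

    void reset()
    {
        pthread_mutex_lock (&mutex);
        triggered = false;
        pthread_mutex_unlock (&mutex);
    }

    bool isSignalled()
    {
        pthread_mutex_lock (&mutex);
        const bool result = triggered;
        pthread_mutex_unlock (&mutex);
        return result;
    }

private:
    pthread_mutex_t mutex;
    pthread_cond_t condition;
    bool triggered = false;
};

struct WaiterContext { ManualResetEvent* event; std::atomic<int>* woken; };

// Thread entry point: wait for the event, then record that we woke up.
static void* waiterMain (void* arg)
{
    auto* context = static_cast<WaiterContext*> (arg);
    context->event->wait();
    context->woken->fetch_add (1);
    return nullptr;
}

// Demo helper: starts numThreads waiters, signals once, and returns how many
// of them were released.
int countWokenThreads (int numThreads)
{
    assert (numThreads > 0);

    ManualResetEvent event;
    std::atomic<int> woken { 0 };
    WaiterContext context { &event, &woken };
    std::vector<pthread_t> threads (static_cast<std::size_t> (numThreads));

    for (auto& t : threads)
    {
        const int rc = pthread_create (&t, nullptr, waiterMain, &context);
        assert (rc == 0);
        (void) rc;
    }

    event.signal();

    for (auto& t : threads)
        pthread_join (t, nullptr);

    return woken.load();
}
```

Because the flag stays set until reset() is called, a thread that reaches wait() after signal() still gets through, so the demo helper returns regardless of how the threads are scheduled.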

What about adding a signalAndReset function too (equivalent to PulseEvent on Win32 and default pthread_cond_broadcast impl)?

Thanks,


#2

ah - you’re right, it should be using pthread_cond_broadcast… Doh! Thanks for letting me know!

And I’ll make a note to do a signalAndReset too…


#3

Take care: PulseEvent under Windows is flawed.
Unlike pthread, if a thread has not yet reached the waiting point when another thread pulses the event, it’ll never see that the event was pulsed, and will deadlock.

See why PulseEvent is fundamentally flawed

pthread_cond_signal is like PulseEvent in a way, and should be avoided too.


#4

That is quite nasty. What’s the rationale for needing PulseEvent anyway? I don’t think I’ve ever written anything where it would have been useful.


#5

Just because you need to handle it with care doesn’t mean it has no use.

PulseEvent is not flawed, it’s just the way it works. You have the same issue
with pthread_cond_signal and pthread_cond_broadcast
if you don’t have a flag.

If it’s not the behavior you want, then don’t use it.

If it were not useful, then I wonder why Windows and POSIX have implemented it :slight_smile:

You may want to use PulseEvent if you already handle the on/off flag on your own.
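As a rough sketch of what "handling the flag on your own" could look like with a condition variable: the version below keeps a generation counter next to the condition variable, so a pulse releases every thread that is already waiting while later arrivals keep waiting for the next one. PulseGate, waitForPulse, pulse, numWaiting and the demo helper are hypothetical names I made up for the example, and this is one possible approach rather than how any particular library does it.

```cpp
#include <pthread.h>
#include <sched.h>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pulse-style gate built on a condition variable.
class PulseGate
{
public:
    PulseGate()
    {
        pthread_mutex_init (&mutex, nullptr);
        pthread_cond_init (&condition, nullptr);
    }

    ~PulseGate()
    {
        pthread_cond_destroy (&condition);
        pthread_mutex_destroy (&mutex);
    }

    PulseGate (const PulseGate&) = delete;
    PulseGate& operator= (const PulseGate&) = delete;

    // Blocks until the next pulse() issued after this call took the mutex.
    void waitForPulse()
    {
        pthread_mutex_lock (&mutex);
        const unsigned long seen = generation;   // remember the current pulse count
        ++waiters;

        while (generation == seen)               // ignores spurious wake-ups
            pthread_cond_wait (&condition, &mutex);

        --waiters;
        pthread_mutex_unlock (&mutex);
    }

    // Releases every thread currently registered in waitForPulse() and
    // returns how many there were. Threads arriving later are not released.
    int pulse()
    {
        pthread_mutex_lock (&mutex);
        const int released = waiters;
        ++generation;
        pthread_cond_broadcast (&condition);
        pthread_mutex_unlock (&mutex);
        return released;
    }

    int numWaiting()
    {
        pthread_mutex_lock (&mutex);
        const int result = waiters;
        pthread_mutex_unlock (&mutex);
        return result;
    }

private:
    pthread_mutex_t mutex;
    pthread_cond_t condition;
    unsigned long generation = 0;
    int waiters = 0;
};

struct PulseWaiterContext { PulseGate* gate; std::atomic<int>* woken; };

// Thread entry point: wait for one pulse, then record that we woke up.
static void* pulseWaiterMain (void* arg)
{
    auto* context = static_cast<PulseWaiterContext*> (arg);
    context->gate->waitForPulse();
    context->woken->fetch_add (1);
    return nullptr;
}

// Demo helper: waits until numThreads threads are registered as waiting,
// pulses once, and returns how many of them woke up.
int countReleasedByPulse (int numThreads)
{
    assert (numThreads > 0);

    PulseGate gate;
    std::atomic<int> woken { 0 };
    PulseWaiterContext context { &gate, &woken };
    std::vector<pthread_t> threads (static_cast<std::size_t> (numThreads));

    for (auto& t : threads)
    {
        const int rc = pthread_create (&t, nullptr, pulseWaiterMain, &context);
        assert (rc == 0);
        (void) rc;
    }

    while (gate.numWaiting() < numThreads)   // let every waiter register first
        sched_yield();

    gate.pulse();

    for (auto& t : threads)
        pthread_join (t, nullptr);

    return woken.load();
}
```

The waiter records the counter and registers itself while holding the mutex, and pulse() changes the counter under the same mutex, so a registered waiter cannot miss the pulse. A thread that calls waitForPulse() after the pulse simply waits for the next one, which is the pulse-style behaviour being discussed.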


#6

The issue with Windows’s PulseEvent is that, even if a thread is stopped and waiting for the event, the kernel can borrow it to perform some APC work while still allowing the event to be pulsed. When this happens, your thread doesn’t see the event and your application deadlocks (your main thread thinks it has unblocked the other thread, while the other thread hasn’t finished waiting).

This means that your software will “work” as expected 99.99999% of the time, but it’s going to break, of course at one of your major customers, and you’ll be unlikely to suspect the PulseEvent code.

Sadly, we need the software to perform deterministically every time.

That’s not the case with POSIX synchronization primitives, though.


#7

oki doki.

My point was more about adding signalAndReset than the way it would be implemented :slight_smile:

In that case you can still implement it using SetEvent and ResetEvent on Windows.

Thanks,


#8

[quote=“otristan”]oki doki.

In that case you can still implement it using SetEvent and ResetEvent on windows.

[/quote]

SetEvent and ResetEvent won’t help! :slight_smile:

Read this: http://support.microsoft.com/?id=173260

(honestly, I don’t think this shit only happens in debug mode)


#9

The only solution that’ll work every time is:
[list]
[*] The main thread calls SetEvent on a manual-reset event.[/*]
[*] The other thread simply calls WaitForSingleObject on the event.[/*]
[*] The other thread calls ResetEvent once it has seen the signaled state (regardless of kernel APCs: since it’s manual-reset, the event stays signaled until a thread resets it).[/*]
[/list]

However, this means that you must have one event per thread, and possibly a CriticalSection-protected or atomic counter (that’s the way it’s done in the .NET Framework with their PulseAll method).
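The three steps, with one event per thread, can be sketched roughly like this. Note that this is a portable C++ stand-in and not Win32 code: PortableManualEvent and wakeEachWorkerOnce are hypothetical names, with set(), wait() and reset() playing the roles of SetEvent, WaitForSingleObject and ResetEvent on a manual-reset event. The counter mentioned above is not modelled here.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Portable stand-in for a manual-reset event (hypothetical, for illustration).
class PortableManualEvent
{
public:
    // Plays the role of SetEvent: the event stays signalled until reset().
    void set()
    {
        {
            std::lock_guard<std::mutex> lock (mutex);
            signalled = true;
        }
        condition.notify_all();
    }

    // Plays the role of ResetEvent.
    void reset()
    {
        std::lock_guard<std::mutex> lock (mutex);
        signalled = false;
    }

    // Plays the role of WaitForSingleObject with an infinite timeout.
    void wait()
    {
        std::unique_lock<std::mutex> lock (mutex);
        condition.wait (lock, [this] { return signalled; });
    }

private:
    std::mutex mutex;
    std::condition_variable condition;
    bool signalled = false;
};

// Demo helper: gives each worker its own event, signals them all from the
// calling thread, and returns how many workers saw their signal.
int wakeEachWorkerOnce (int numWorkers)
{
    assert (numWorkers > 0);

    const auto count = static_cast<std::size_t> (numWorkers);
    std::vector<PortableManualEvent> events (count);   // one event per thread
    std::atomic<int> handled { 0 };
    std::vector<std::thread> workers;

    for (std::size_t i = 0; i < count; ++i)
    {
        workers.emplace_back ([&events, &handled, i]
        {
            events[i].wait();    // step 2: the worker waits on its own event
            events[i].reset();   // step 3: the worker resets it after seeing it
            handled.fetch_add (1);
        });
    }

    for (auto& e : events)
        e.set();                 // step 1: the main thread signals each event

    for (auto& w : workers)
        w.join();

    return handled.load();
}
```

Since only the owning worker ever resets its event, a signal set before the worker reaches wait() is still there when it arrives, which is the property the steps above rely on.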

You can have a look at the openpthread solution for their condition variable, which is immune to this.


#10

Well, if it happened without the debugger too, I don’t think you’d be able
to write reliable threaded apps on Windows.
And I don’t think that’s the case.
(No troll please)

So the issue with PulseEvent is only in the debugger. Not a big deal then.
If you rely on the debugger to debug thread logic, you’re already screwed.

Thanks for the information.


#11

Ptdomaine is quite right here.
This happens when the kernel plays APC with your threads.
This means every time another process calls ReadProcessMemory (like any AV scan does), or SuspendThread / ResumeThread / CreateRemoteThread, or when a “hardware” interrupt happens (like unplugging a USB device you might be using), or the first time you call TlsAlloc, and so on.

Sure, it doesn’t show up 100% of the time, probably not even 0.1% of the time, but it CAN happen, so it’s a road I wouldn’t go down anyway.

BTW, if I know this, it’s because it happened in our main software (which runs 24/7), and the crash dump they sent was unexplainable unless such behaviour had occurred (a thread was still waiting for a signaled event while the main thread continued, leaving the main thread waiting for the other thread’s work and a “jobDone” event which never happened => deadlock).

In any case, you can do whatever you want…


#12

I really like it when you can’t trust the OS anymore :slight_smile:

Thanks for the information.


#13