Problem getting the real output latency


For my Piano Roll MIDI editor I want the playhead on screen to visualize the play position in the list of MIDI events.
My first attempt was to negatively delay the playhead on screen by the buffer size in samples. So if my card's buffer size was 6912 @ fs=44.1kHz, I'd negatively delay the playhead locator on screen by 6912/44100 = ~0.157 seconds. That didn't work: the playhead was too far to the right (it would cross a note visually before you'd hear it). By trial and error, I found that the perfect value is 4x longer, i.e. a negative delay of about 0.6 seconds, but I have no clue why.
I just discovered the function AudioIODevice::getOutputLatencyInSamples(). It reports 10368 (vs. my block size of 6912), but that's not 4*6912 either. I'm pretty sure all my math is correct and that the reported latency of 10368 is simply wrong. As a further test, I started a stopwatch while pressing a key on the MIDI keyboard and stopped it when the sound came out: it measured 0.6 seconds, matching my 4x block-size value.
Is this a driver problem I'm facing? It's just the normal DirectSound driver of my laptop, nothing fancy.


DirectSound is completely useless when it comes to reporting latency. If you use WASAPI or ASIO (or anything else, really), the latency figure will be accurate, but with DirectSound, no chance!


Could you please explain to me why the latency in your code is reported as bufferSize * 1.5? Looking at your code you seem to use a triple-buffering scheme, is that correct? Wouldn't it be more accurate to report bufferSize * 2.0 in that case?
PS: FLStudio synchronises audio and video 100% correctly using the same DirectSound audio device at an even higher buffer size, which proves that it is indeed possible to report the latency quite accurately with DirectSound too, at least accurately enough for the eye not to see the error.


The values I’ve used are based on measuring the round-trip latency on various machines and taking a sensible guess at what the average machine’s latency seems to be. I don’t think it’s possible to do a better job on DSound, at least not on all machines. There’s no point spending a long time tweaking it on your own machine, because different soundcards vary a lot. My advice is to forget about DSound and concentrate on using WASAPI, which is probably becoming the most common platform now.


A good thing to add to JUCE would be a function that reads the playback time from DS, ASIO, CoreAudio, etc. I think most of them provide such functions, and most sequencers rely on them for visualizing the playback position or syncing audio and video.