Captured audio becomes garbled after several hours

Wow…good job reproducing this! I wonder if it happens in a virtual machine (I’m thinking an Oracle VirtualBox instance with Windows XP installed).

[quote=“TheVinn”]
Wow…good job reproducing this! I wonder if it happens in a virtual machine (I’m thinking an Oracle VirtualBox instance with Windows XP installed).[/quote]

Thank you, and yes, I can reproduce it on a virtual machine (not on Oracle VirtualBox though; I had trouble configuring the sound on it to work with the feedback trick I use, but I did not try very hard).

Following my latest batch of overnight tests, I am also starting to think it is a function of the buffer size: the larger the buffers, the less often it happens and the faster it recovers. I am still testing this…

Did you try my suggestion about using ONLY the input channels? Because DSound input and output devices run separately, it’s almost inevitable that over such a long period of time they’ll drift apart and it’ll start to struggle. But if you use only input channels, you won’t hit that problem. If you need output as well, I’d suggest creating two completely separate audio device instances, one for input and one for output. As long as you don’t have a single device trying to do both, they should run independently without sync problems.

Yes I did and it did happen, but I will try it again to make it happen a few times with the latest 2.0 Demo to be 100% sure.

Just a little update with the results of my latest tests and debugging, just in case this might tell you more than me or remind you of something…

  1. The problem happens even if I disable output channels in the Juce Demo. (In previous tests I also set the number of outChannels to 0 in AudioDeviceManager::initialise.)
  2. The most useful clue is that the problem is cyclical; i.e., with 20.3 ms buffers it will be good for about 7 hours, then bad for 7 hours, then good for 7 hours, etc. This might indicate a counter that goes negative and then positive again, or something like that.
  3. The cycles are longer if I set a larger buffer size, and the problem is less noticeable: the audio quality of the bad files is less bad. I can’t establish an exact proportion, but anything below 58 ms is quite bad, while anything above is much better.

I also wrote a Windows recording application that does not use Juce, in which I record the audio using MMSystem (an easier API, so it took me less time to write) with waveInOpen / waveInStart, as I was wondering whether the problem is deep within the audio driver. The problem does not happen there, but I realize it’s far from the way Juce records with DirectSound. I am about to dive a bit more into DirectSound’s API…

Nasty. You have my sympathy in trying to debug this!

But based on what you’re saying, it really doesn’t sound like there’s anything in juce that could be causing this. If you’ve disabled the output device, then the code that just keeps reading blocks from the input device is pretty minimal, and doesn’t depend on any variables that could wrap or behave in a cyclic manner like that… So my gut feeling is that it’s the driver that’s going wrong, especially since you say the times depend on the buffer size - there’s just nothing in my code where that would make a difference.

And yes, the MMsystem API is very different from Dsound, so isn’t really a valid comparison. If you were going to write a test app, you’d probably want to copy the minimal possible subset of the juce DSound implementation and try to get it running.

Have you tried putting extra logging in the DSound code? E.g. to print the buffer offsets that each block is being read from? There might be some kind of pattern that’ll give you a clue.

Thank you :slight_smile:

That is my next step, although I will probably first see if I can find some demo code on the net, so that if there is a bug (even if the chances are small) I minimize the chances of reproducing it.

I am doing that right now.

I have continued my background tests, all while enabling logs and adding more instrumentation.

  1. I have learned that even with 0 output channels, resync will still be called.
  2. It happens when it takes longer than 3 times the buffer size (latency) for the thread to read the input channels and copy them – but I still don’t know why…
  3. It therefore makes sense that with larger buffer sizes this happens less, and that the smaller the buffers are, the more it happens.
  4. I am still not sure if this is the right path or merely coincidence, but I will spend some time looking into this.
    • I first thought that the time would be spent either on the Lock of the buffer (perhaps due to an internal mutex) or on the copy, but it does not seem to be so. The thread might simply be context switched…

I had two tests running last night, one with 58ms buffers and one with 44.6ms. They ran for about 17 hours.

The 58ms had no bad cycles, and 0 resyncs, while the 44.6ms demo did the 7 hours good / bad / good cycle as predicted by previous tests. It also did about 25 resyncs at:

27 Aug 2012 6:14:54pm - GetCurrentPosition 7056
27 Aug 2012 6:55:03pm - GetCurrentPosition 12348
27 Aug 2012 7:55:48pm - GetCurrentPosition 17640
27 Aug 2012 8:05:50pm - GetCurrentPosition 1764
27 Aug 2012 8:20:57pm - GetCurrentPosition 8820
27 Aug 2012 9:06:19pm - GetCurrentPosition 15876
27 Aug 2012 9:10:54pm - GetCurrentPosition 22932
27 Aug 2012 9:59:23pm - GetCurrentPosition 12348
27 Aug 2012 10:24:46pm - GetCurrentPosition 15876
PROBLEM HAPPENED
28 Aug 2012 12:05:25am - GetCurrentPosition 24068
28 Aug 2012 1:13:14am - GetCurrentPosition 8192
28 Aug 2012 1:31:08am - GetCurrentPosition 8192
28 Aug 2012 1:37:46am - GetCurrentPosition 22304
28 Aug 2012 2:18:01am - GetCurrentPosition 1256
28 Aug 2012 2:44:45am - GetCurrentPosition 15248
28 Aug 2012 3:04:51am - GetCurrentPosition 4784
28 Aug 2012 3:24:31am - GetCurrentPosition 11720
28 Aug 2012 5:51:45am - GetCurrentPosition 18776
28 Aug 2012 6:22:00am - GetCurrentPosition 628
PROBLEM GONE
28 Aug 2012 6:42:05am - GetCurrentPosition 4156
28 Aug 2012 7:13:12am - GetCurrentPosition 9448
28 Aug 2012 8:23:31am - GetCurrentPosition 23440

If it’s doing a resync, then something in your input callback must be blocking… If you search the DSound code for “resync” you’ll see the logic that handles it - it basically says that if it has to wait more than 3x the block time for the blocks to become ready, then it assumes that something’s gone wrong, and resyncs. You could actually try increasing (or even removing) that timeout, just to see what happens… It’s also worth looking carefully at your callback code, in case there are some locks or operations in there that could occasionally cause a delay.
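For anyone following along, the timeout rule described above can be sketched roughly like this. It is only an illustration of the "waited more than 3x the block time" idea, not the actual JUCE source: `shouldResync` and `blockDurationFor` are hypothetical names, and the 2048-sample block at 44.1 kHz mentioned below is an assumed example, not a figure from this thread.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not JUCE code): duration of one block of
// `samplesPerBlock` frames at `sampleRate` Hz, truncated to microseconds.
inline std::chrono::microseconds blockDurationFor (int samplesPerBlock, double sampleRate)
{
    assert (samplesPerBlock > 0 && sampleRate > 0.0);
    return std::chrono::microseconds (
        static_cast<long long> (samplesPerBlock * 1.0e6 / sampleRate));
}

// Hypothetical helper (not JUCE code): returns true when the capture loop has
// been waiting longer than `timeoutMultiplier` block durations, which is the
// point at which the thread above says a resync is triggered.
inline bool shouldResync (std::chrono::microseconds waited,
                          std::chrono::microseconds blockDuration,
                          int timeoutMultiplier = 3)
{
    assert (timeoutMultiplier > 0);
    return waited > blockDuration * timeoutMultiplier;
}
```

With an assumed 2048-sample block at 44.1 kHz (about 46.4 ms), a 100 ms wait would not trigger a resync, while a 140 ms wait would. Raising `timeoutMultiplier` corresponds to the "try increasing that timeout" experiment suggested above.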

Thanks, Jules. Unfortunately, recent tests disprove the theory I had yesterday that this had anything to do with resync, so I will stop looking in that direction.

Yeah, I saw this; see point 2 above.

I was under the impression the timeout had to do with

totalBytesPerBuffer = (3 * bytesPerBuffer) & ~15;
captureDesc.dwBufferBytes = (DWORD) totalBytesPerBuffer;

Just for the fun of it, and in case I try it later in desperate times: if I increase the timeout, shouldn’t I also increase the size of the buffer (the internal DirectX buffer above), or are they unrelated? Otherwise, what is the reason for the timeout?

Since I started to get into this more actively, I stopped using our app and use only the latest version of the Juce Demo 2.0. I could probably disable the live display to save the repaint, and use libsndfile directly, but I had already reproduced the problem before. I was also under the impression that the callback code time would not impact the timeout we are talking about, as the callback is called outside of the for (;;) loop in run() and so is not counted, but I might be wrong. Anyway, like I said, I am going to explore avenues other than the resync.

No… I chose the timeout on a pretty arbitrary basis - if it’s been waiting for longer than 3x the maximum buffer time, then assume that something has gone wrong. But the 3x is just a number I picked because it seemed reasonable, not for any particular reason.

Just a data point here, but my app also becomes destabilized if you run it “overnight” - the audio just ceases to play.

I never tracked this down beyond that, but it never seems to happen if you don’t leave the program running for a long time.

At the time, I estimated that this was occurring on or near 2**32 samples, and assumed it was a 32/64-bit issue in my code or Juce’s. I did go through and make sure that I had all 32/64 warnings dealt with, but I didn’t really work on it hard as it’s low priority for me.
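As a quick sanity check on that 2**32 estimate, here is a small sketch of the arithmetic. `hoursUntilCounterWraps` is a made-up helper, and the sample rates are assumptions, since the post does not say which rate was in use.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: how many hours of audio it takes for a sample counter
// with `bits` usable bits to wrap around at the given sample rate.
inline double hoursUntilCounterWraps (double sampleRate, int bits = 32)
{
    assert (sampleRate > 0.0 && bits > 0 && bits < 64);
    const double samples = std::ldexp (1.0, bits); // 2^bits, computed exactly
    return samples / sampleRate / 3600.0;
}
```

At an assumed 44.1 kHz this comes to roughly 27 hours, and at 48 kHz to a little under 25 hours, which is in the range of an "overnight" run.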

Fairly soon I will need very long recording times for a new application, and then I’ll care a lot more.

I’ll be back at my main workstation soon and will take a look at this again at some point.

We might have seen something wrong, although I am still not 100% sure.

Look inside service() where the polling with GetCurrentPosition (&capturePos, &readPos); is done.

Sometimes the capturePos is between the readOffset and the readPos, i.e. capturePos went around the circular buffer and past readOffset, so what we are going to read from it might have been overwritten. In theory this should have been caught by the Lock, but it does not seem to be. We do not think this is OK.
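The condition described above can be written as a wrap-aware range check. This is only a sketch under my own assumptions: the function names are hypothetical, all positions are byte offsets into a circular buffer of `size` bytes, and "between" is taken to mean strictly after readOffset and before readPos when walking forward around the ring.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true if `pos` lies strictly after `start` and before
// `end` when walking forward (with wrap-around) through a ring of `size` bytes.
inline bool liesStrictlyBetween (std::size_t pos, std::size_t start,
                                 std::size_t end, std::size_t size)
{
    assert (size > 0 && pos < size && start < size && end < size);
    const std::size_t spanLength = (end + size - start) % size; // start -> end
    const std::size_t distance   = (pos + size - start) % size; // start -> pos
    return distance > 0 && distance < spanLength;
}

// Hypothetical helper: the situation reported above, where the capture cursor
// has wrapped round and now sits inside the region we are about to read.
inline bool captureCursorOverranReader (std::size_t readOffset, std::size_t readPos,
                                        std::size_t capturePos, std::size_t totalBytes)
{
    return liesStrictlyBetween (capturePos, readOffset, readPos, totalBytes);
}
```

Logging whenever `captureCursorOverranReader` returns true is one way to count how often this happens during good and bad periods.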

I have done two tests to prove this.

  1. I increased the total buffer size to 16 times instead of 3 times. I chose 16 as this is what the DirectSound demo did… The problem did not happen anymore (during one night).
  2. I added logs when this happens. I can confirm this happens more during the bad periods and less during good ones, but it’s not as black and white as I thought :frowning:

I am working on a “fix” / “test” in which (with a 3 times buffer size) I will catch this condition and deliver the first bytesPerBuffer preceding the readPos, which are surely not corrupted, instead of the ones following readOffset, which might be. Even if it recovers, I will still suggest having a capture buffer larger than 3 times, so that this happens much less often or almost never.

Interesting… I look forward to hearing what you find!

Hello Jules,

This will (hopefully) be my final update on this thread. Thanks for all your time and advice.

Unfortunately, the solution I proposed above did not work, although I still do not understand 100% why. I still think it makes more sense to re-synch when we have an indication that the audio might be corrupted than on a timeout strategy, but my fix failed to prevent the bad cycles, showing that re-synching neither causes nor fixes the problem when it does happen. I still think that the problem is caused by the capture/write position of the internal buffers having caught up with what we read from them, but re-synching does not seem to fix it, and I don’t have the time to dig more into this…

So I went with a workaround, which is to increase the number of buffers (latency) in the circular buffer. You originally had 3; I changed it to 16. This seems to work for at least 12 hours in a row, even for smaller buffers like the 46.4 ms that we use in our app. (16 was chosen because that is what the DirectSound demo code I looked at was using; this code was provided by Microsoft with an older version of the DirectSound SDK.)
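To make the size trade-off concrete, here is a rough sketch of what changing the multiplier does to the capture buffer. The masking mirrors the `(3 * bytesPerBuffer) & ~15` line quoted earlier in the thread; the helper names and the example format (2048-frame blocks, 16-bit stereo, 44.1 kHz) are assumptions for illustration only.

```cpp
#include <cassert>

// Hypothetical helper: total capture buffer size for `numBlocks` blocks,
// rounded down to a multiple of 16 bytes as in the snippet quoted earlier.
inline int totalCaptureBytes (int bytesPerBlock, int numBlocks)
{
    assert (bytesPerBlock > 0 && numBlocks > 0);
    return (numBlocks * bytesPerBlock) & ~15;
}

// Hypothetical helper: how many milliseconds of audio fit in the buffer,
// i.e. how far the reader can fall behind before data is overwritten.
inline double bufferLengthMs (int totalBytes, int bytesPerFrame, double sampleRate)
{
    assert (bytesPerFrame > 0 && sampleRate > 0.0);
    return 1000.0 * (totalBytes / bytesPerFrame) / sampleRate;
}
```

For the assumed format (8192 bytes per block, 4 bytes per frame), 3 blocks give 24576 bytes, about 139 ms of audio, while 16 blocks give 131072 bytes, about 743 ms. The extra cost is around 100 KB of memory in exchange for much more slack before the capture cursor can lap the reader.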

I leave it up to you whether or not to increase this in Juce. In fact, if you can think of a negative side effect, apart from the additional size, please let me know.
Thanks again,

Thanks, very interesting. I can’t see any reason why that number shouldn’t increase, and can’t think of any reason it’d break anything else, so based on your empirical evidence I will increase it. Many thanks for sharing your experimental results!

You’re very welcome and thank you for all your advice too. I’ve now been running it for more than two days with no problems.