Streaming audio inputs over network problem

Hello :slight_smile: ,

I am new to JUCE, and I’m trying to develop a program where I can stream the inputs from my audio interface to another audio interface over my local network.

Concretely, I have Computer A with audio interface A and Computer B with audio interface B, and I want whatever audio is captured on input channel 1 of audio interface A to be sent to output channel 1 of audio interface B (and vice versa).

I decided to use JUCE for this, as it seemed very practical for controlling audio interfaces. So I created a standalone Audio Application with the Projucer and followed the OSCReceiver / OSCSender tutorial to create my network communication and send my audio buffers.

So my implementation is as follows:

  1. In getNextAudioBlock() of my MainComponent, I get the input buffer (I have only one channel active) and pass the raw input buffer to my NetworkManager (which sends it), like this:

//================================= LOCAL INPUTS =================================
int i = 0; // Used for sending to the corresponding freq diagram component
for (auto inputChannel = 0; inputChannel < maxInputChannels; ++inputChannel)
{
    if (! activeInputChannels[inputChannel]) // Individual input channels may be inactive, so we don't want to store the data
    {
        bufferToFill.buffer->clear (inputChannel, bufferToFill.startSample, bufferToFill.numSamples);
    }
    else
    {
        // Get a read pointer to the input so we can send it over the network
        auto* inBuffer = bufferToFill.buffer->getReadPointer (inputChannel, bufferToFill.startSample);
        // If we are connected, send the data to the distant instance
        networkManager.sendAudioBuffer (i, inBuffer, bufferToFill.numSamples);
        ++i;
    }
}
  2. In the sendAudioBuffer() function of my NetworkManager, I create an OSC message object where I add the channel (ch0) as an int argument and then add all the samples of the buffer as float arguments (e.g. with a buffer size of 512 the message has a length of 513):

juce::OSCMessage msg (juce::OSCAddressPattern ("/creepy-table")); // Declare and initialize the message
msg.addArgument (juce::OSCArgument (inputID));                    // Add the channel id
for (int i = 0; i < bufferSize; ++i)
    msg.addArgument (juce::OSCArgument (buffer[i]));              // Add every sample
  3. My NetworkManager implements OSCReceiver, and in the oscMessageReceived() method (on the other machine) I take the message, extract the channel and the samples, and put them in a juce::AudioBuffer:
void NetworkManager::processOSCMessage (juce::OSCMessage message)
{
    if (message.size() > 2)
    {
        // If it starts with an int, it's an audio buffer whose first argument is the channel
        if (message[0].isInt32())
        {
            int channel = message[0].getInt32(); // Channel to use
            // Channel has to be positive and the message length has to be less than or equal to the buffer size
            if (channel >= 0 && message.size() - 1 <= distantAudioBuffer->getNumSamples())
            {
                juce::Array<float> buffer;                   // Temporary buffer for practicality
                for (int i = 1; i < message.size(); ++i)     // Starts at 1 because the float arguments come after the channel id at index 0
                    if (message[i].isFloat32())
                        buffer.add (message[i].getFloat32());

                // Put the samples in distantAudioBuffer, which is then read by the distantAudioSource object in MainComponent
                distantAudioBuffer->copyFrom (channel, 0, buffer.getRawDataPointer(), buffer.size());
            }
        }
    }
}
  4. The same AudioBuffer is then read in getNextAudioBlock(), and the content of my received distant buffer is copied into the outBuffer to be played:
auto* outBuffer = bufferToFill.buffer->getWritePointer (outputChannel, bufferToFill.startSample); // Buffer to output
auto* distantBuffer = distantAudioBuffer.getReadPointer (outputChannel);                          // Distant input buffer

for (auto sample = 0; sample < bufferToFill.numSamples; ++sample)
    outBuffer[sample] = distantBuffer[sample];

So it kinda works: I manage to send and receive the messages. However, my problem is that the received signal is …weird. I tried sending the signal from an electric guitar connected to audio interface A, and basically it sounds horrible. Notes are still somehow recognisable, but there is a huge buzzing sound on top whenever there is sound to be played.
(If you want to listen to it you can check it here, but it’s horrible)

I display (like one of the JUCE tutorials does) the signal I output to the speakers. It looks similar to the signal originally sent, but the received signal looks imprecise. I don’t know if my weird sound comes from this.

I am a bit stuck, and I’m not sure how to fix my issue as I don’t have much experience in audio processing and Juce. :confused:

  1. Does the use of OSCMessages make sense for sending audio buffers over the network? Does it even make sense to send audio buffers?
  2. Do you have an idea why the received signal looks less accurate and sharp than what is originally sent? Could it come from the way I read my buffer?

As additional info: I use a sample rate of 44100 Hz and a buffer size of 512, with a bit depth of 24.

If someone could help me figure out ways to continue, or give hints / opinions on the methods I use / my implementation, it would be greatly appreciated :slight_smile:



Well, using OSC for sending realtime data like that is a rather bad idea… I’m telling you this because I tried something similar a while ago when I started programming :wink: And there are even more, OSC-unrelated, things to consider:

  1. Synchronization. You are used to the sample rate, but did you ever wonder where that super-exact clock gets generated? It’s a hardware unit inside your audio interface. In a professional studio environment you make sure that, when connecting digital audio devices, the clocks are perfectly synchronised. This can be achieved with explicit word-clock cables, by deriving the clock from digital audio formats such as ADAT, S/PDIF etc., or by using highly sophisticated audio networking solutions like Dante that embed a high-precision clock signal in the network packets. With your approach, however, you connect two free-running devices at both ends. Even if you were okay with adding some latency for intermediate buffering, the two sample rates will sooner or later drift slightly apart over time, causing dropped samples.
  2. OSC as you use it runs over a UDP connection. UDP, however, makes no guarantee that network packets sent in a specific order will be received in the same order. So it’s entirely possible that you receive your audio blocks in a different order than they were sent. Creating a reliable mechanism to avoid that is somewhat difficult.
  3. Realtime safety. Your getNextAudioBlock callback is triggered from a high-priority operating-system thread which, through some low-level plumbing, is closely tied to your audio interface’s clock. This means your audio interface will keep running and won’t wait for you if you take too much time in the getNextAudioBlock callback. If you are not quick enough here, you will simply lose samples, which generates crackles and unpleasant noise. To me, this is one of the most exciting parts of writing audio code, as you really need a clear idea of how to write fast code. A few examples of no-gos in this context: performing external I/O (network :flushed:), as this might block code execution for a non-deterministic time, depending on a lot of factors; performing heap memory allocation (e.g. using Strings, HeapBlocks etc. – you’ll find plenty of them if you look inside the OSC classes), as these memory resources are managed by the operating system, which might be busy and not respond to your call immediately, or might decide to swap RAM out to your hard drive at some point; and waiting for locks held by other threads, since you never know when the other thread will finish its work, and if its priority is lower than your super-high-priority audio thread, this causes so-called priority inversion.

Now that being said, I hope I didn’t demotivate you. While it’s nearly impossible to solve point 1 in the sense of perfect synchronisation, there are workarounds, and clock drift is a minor issue compared to the other points. But I want to point out that there is no technically 100% perfect solution to this, at least if you don’t build your own hardware. The other two points can be solved by choosing an appropriate network protocol, offloading the streaming work to a separate thread, using lock-free queues for inter-thread communication, and adding a proper amount of buffering. However, be prepared to experience major latency here. Implementing low-latency streaming this way won’t really be possible without digging deep into a lot of topics.

But what is the overall use case? Maybe there is a better approach to what you originally want to achieve?

Thank you very much for this extensive answer!

But what is the overall use case? Maybe there is a better approach to what you originally want to achieve?

You are right maybe I should start with this :sweat_smile:
In my case, it’s maybe a bit weird, but I want to create an experiment in my workplace where 2 tables are connected. Where one table plays what happens on the other table. For example if one person puts a cup on the table, you would hear on the other table the cup being put. Or for example if someone taps the finger on the table you hear it on the other. :smiley:
To do this I have 4 contact mics on each corner of one table and 4 transducers on each corner of the other table. So I have a basic spatialisation, and the table vibrates a bit.

So basically I made a first version where the tables were close to each other, everything was plugged into one interface and controlled via Ableton. It worked well, but it’s not possible to keep it like this because of the cables and the distance the tables should have in the building.

That’s why I’m trying to do it over (the local) network this time.
So in terms of audio quality requirements, it doesn’t need to be very high:

  • Latency: there can be some latency, as long as it’s not like 1 min or so
  • Sound quality: the sound replication on the second table doesn’t have to be perfect at all, but it should be good enough to recognise what’s happening.

I guess what I imagined was something similar to software like Skype, Discord etc., where there is some latency but it’s not a problem, and the sound is just fine (also I guess that latency and stutters shouldn’t be too high, since it’s on a local network) :slight_smile:

So concerning your points (thanks a lot for the explanations!)

  1. Good point, I have always heard about clocks in audio interfaces but never really dug into it. So in my case, as I don’t want, let’s say, studio accuracy in terms of latency and quality, I don’t know how much problems this sample rate drift would cause?
  2. So my idea is to stream in real time, and that’s why I thought using UDP was actually better, because (if I’m not mistaken) that’s what is usually used for streaming video and audio, right? Or how is it usually done in VoIP software like Mumble? Do they also ensure the order of the audio blocks?
  3. It’s very interesting. So if I understand correctly, I would need to run a thread for the networking part and put my received audio blocks into a thread-safe queue, which could then be used for playback by the getNextAudioBlock thread?

Again thanks a lot it is really helpful, even though I feel a bit overwhelmed now :sweat_smile:

So my idea is to stream in real-time, that’s why I thought using UDP was actually better because (if I’m not mistaken) that’s what is used for streaming video and audio usually right

They use RTP on top of UDP. There are some libraries out there, e.g. this one from Cisco:

which looks legit but a bit heavy. This library has been archived but looks sufficient for your use case:

And just a side note: if you didn’t want to write your own software for your experiment, you could use a Dante/Ravenna/AES67-enabled interface. FWIW, AES67 is the (recent) industry standard for AoIP, and it’s built on top of RTP.


Thanks for the info, I will check it out :slight_smile: