This is not intended for realtime audio processing, but it can be useful for computations in the editor or for quickly precomputing large amounts of data, offloading that work from the CPU.
Compute shaders are not very different from normal shaders. They let you work with buffers, either to read input data or to write results. For example, to compute the DFT:
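```cpp
const char computeShader[] = R"COMPUTE_SHADER(
#version 430 core
// 256 invocations per work group; each invocation computes one DFT bin
layout (local_size_x = 256) in;

// GLSL does not allow struct definitions nested inside a block, so declare it here
struct Complex {
    float real, imag;
};

layout (std430, binding = 0) readonly buffer IN {
    float samples[];
};
layout (std430, binding = 1) writeonly buffer OUT {
    Complex dft[];
};

uniform int size;   // number of samples (and of output bins)

void main() {
    uint gID = gl_GlobalInvocationID.x;
    uint N = uint(size);
    if (gID >= N)   // extra invocations in the last group do nothing
        return;

    float sumReal = 0.0, sumImag = 0.0;
    for (uint n = 0u; n < N; ++n) {
        // X[k] = sum x[n] * (cos(2*pi*k*n/N) - i*sin(2*pi*k*n/N));
        // reducing k*n modulo N keeps the angle in [0, 2*pi) and preserves float precision
        float angle = 6.28318530718 * float((gID * n) % N) / float(N);
        float s = samples[n];
        sumReal += s * cos(angle);
        sumImag -= s * sin(angle);
    }
    dft[gID].real = sumReal;
    dft[gID].imag = sumImag;
}
)COMPUTE_SHADER";
```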
First, the shader states that it requires OpenGL 4.3 (GLSL 430). Then it sets the work-group size, in this case 256 invocations per group, declares the input and output buffers, and receives the number of samples in a uniform. Each invocation computes one DFT bin and stores it in the output buffer at the index given by its global invocation ID.
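To check the GPU output, it can help to compare it against a plain CPU version of the same computation. The following is a minimal sketch (the function name `referenceDFT` is hypothetical, not part of any library) using the same O(N²) DFT and the standard sign convention, X[k] = Σ x[n]·(cos(2πkn/N) − i·sin(2πkn/N)), with k·n reduced modulo N before computing the angle to limit rounding error:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Same layout as the shader's output element: two tightly packed floats.
struct DFTResult {
    float real, imag;
};

// Naive O(N^2) DFT on the CPU, intended only as a reference for the GPU result.
std::vector<DFTResult> referenceDFT(const std::vector<float>& samples)
{
    const std::size_t size = samples.size();
    std::vector<DFTResult> result(size);
    const double twoPi = 6.283185307179586;

    for (std::size_t k = 0; k < size; ++k) {
        double sumReal = 0.0, sumImag = 0.0;
        for (std::size_t n = 0; n < size; ++n) {
            // Reduce k*n modulo N so the angle stays in [0, 2*pi)
            const double angle = twoPi * double((k * n) % size) / double(size);
            sumReal += samples[n] * std::cos(angle);
            sumImag -= samples[n] * std::sin(angle);
        }
        result[k] = { float(sumReal), float(sumImag) };
    }
    return result;
}
```

When comparing against the buffer read back from the GPU, use a small tolerance per bin: single-precision sums over thousands of samples will not match the double-precision reference bit for bit, but a layout or sign mistake shows up immediately.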
The shader program is created as usual, but specifying that this is a compute shader:
```cpp
computeProgram.addShader(computeShader, GL_COMPUTE_SHADER);
computeProgram.link();
```
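Buffers are created and used in the usual way, but bound to the GL_SHADER_STORAGE_BUFFER target. The only requirement is that the C++ data layout matches the one declared in the shader:
```cpp
struct DFTResult {
    float real, imag;   // same layout as the shader's std430 output element
};

GLuint outSSBO = 0;
glGenBuffers(1, &outSSBO);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, outSSBO);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(DFTResult) * dftSize, nullptr, GL_DYNAMIC_COPY);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, outSSBO);   // binding = 1 in the shader
```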
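Under std430 rules a struct of two floats has 4-byte alignment (std430, unlike std140, does not round struct alignment up to a vec4), so array elements are packed 8 bytes apart, exactly like the plain C++ struct. A few compile-time checks can catch an accidental mismatch. This is a sketch; `outputBufferBytes` is a hypothetical helper for the size passed to glBufferData:

```cpp
#include <cstddef>

struct DFTResult {
    float real, imag;
};

// std430: a struct of two floats has 4-byte alignment, so consecutive
// array elements are 8 bytes apart with no padding.
static_assert(sizeof(float) == 4, "GLSL float is 32 bits");
static_assert(sizeof(DFTResult) == 2 * sizeof(float), "unexpected padding");
static_assert(offsetof(DFTResult, real) == 0, "real must come first");
static_assert(offsetof(DFTResult, imag) == sizeof(float), "imag must follow real");

// Hypothetical helper: bytes to allocate for dftSize output bins.
constexpr std::size_t outputBufferBytes(std::size_t dftSize)
{
    return sizeof(DFTResult) * dftSize;
}
```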
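After uploading the input samples, activating the shader, binding the buffers, and setting the uniforms, you can dispatch the computation in as many work groups as necessary. Each group runs local_size_x invocations, so the larger the work group, the fewer groups are needed; which size is actually fastest depends on the GPU.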
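```cpp
// Round up so every bin is covered (the shader must then ignore IDs >= size)
glDispatchCompute((dftSize + 255) / 256, 1, 1);
// Make the shader's SSBO writes visible to buffer reads such as glGetBufferSubData
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
```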
At this point the CPU has only queued the dispatch command, and the program keeps running normally. When the results are read back, for example with glGetBufferSubData, the call waits for the computation to finish if it has not done so yet, and glMemoryBarrier ensures that the read sees the data written by the shader.
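In a one-dimensional dispatch, each invocation's gl_GlobalInvocationID.x equals gl_WorkGroupID.x * local_size_x + gl_LocalInvocationID.x. The following CPU emulation (the helpers `groupCount` and `emulateDispatch` are hypothetical, for illustration only) shows why the group count is rounded up and why the shader needs a bounds check when the sample count is not a multiple of 256:

```cpp
#include <cstdint>
#include <vector>

// Work groups needed to cover `count` items with `localSize` invocations
// per group, rounded up as in glDispatchCompute((n + 255) / 256, 1, 1).
constexpr std::uint32_t groupCount(std::uint32_t count, std::uint32_t localSize)
{
    return (count + localSize - 1) / localSize;
}

// Emulates a 1D dispatch and counts how many invocations write each bin.
std::vector<int> emulateDispatch(std::uint32_t size, std::uint32_t localSize)
{
    std::vector<int> writes(size, 0);
    const std::uint32_t groups = groupCount(size, localSize);
    for (std::uint32_t group = 0; group < groups; ++group) {
        for (std::uint32_t local = 0; local < localSize; ++local) {
            // gl_GlobalInvocationID.x = gl_WorkGroupID.x * local_size_x + gl_LocalInvocationID.x
            const std::uint32_t gID = group * localSize + local;
            if (gID >= size)   // same guard the shader needs
                continue;
            ++writes[gID];
        }
    }
    return writes;
}
```

For 1000 samples, 4 groups launch 1024 invocations; the last 24 fall outside the buffer and must do nothing, while every valid bin is written exactly once.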
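The maximum work-group size supported by the GPU can be queried with glGetIntegerv(GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS, ...) and, per dimension, with glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_SIZE, 0, ...). In this case, with 256 invocations per group and a buffer of 8192 samples, 67,108,864 loop iterations involving sin and cos are computed in a few milliseconds.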
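As a rough count of the work behind those figures, assuming one invocation per output bin and one loop iteration per input sample:

$$
\text{groups} = \frac{8192}{256} = 32, \qquad \text{iterations} = \underbrace{8192}_{\text{bins}} \times \underbrace{8192}_{\text{samples per bin}} = 8192^2 = 67{,}108{,}864
$$

Each iteration evaluates one sin and one cos, so about 134 million trigonometric evaluations in total. This only counts work; it does not predict timing on a particular GPU.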
More precise information on compute shaders