Search the web for partitioned convolution. The easiest to implement form is the uniformly partitioned convolution, where you split Input Signal and IR in equally sized fragments and convolve them. The audio signal already comes in blocks. However, in case the number of samples provided is smaller than blocksize, things get messy. JUCE did a very good job on their convolution which is uniformly partitioned. With the uniformly the highest FFT order is the one you need for twice your blocksize
For longer IRs, non-uniformly partitioned convolution is the most efficient thing to use. It splits the IR in blocks of different sizes. Mostly used is the Gardener scheme: N, 2N, 2N, 4N, 4N, and so on. The idea behind this is, hat you have an additional block of time available to calculate the second block of your IR, and you can even use threads for it. However the scheduling part is the hardest thing to implement.
