Philosophy of Voices, Sounds, Clicks and Layers

Should a voice be nothing more than an instance of a sound?

Or should multiple sounds live their lives within the same voice?

When layering multiple sounds on one note, should each sound get its own voice, or should they co-exist and be mixed inside the same voice?

To avoid clicks during voice stealing, should the crossfade happen between different voices or inside the same voice?

What is your take on this? What’s your philosophy? What do you think is the framework’s philosophy?

i’d see a voice as a stream of sound that can be used to play samples being triggered by midi notes or something. so yeah, when the same note value overlaps that’s 2 different voices. there are different philosophies about voice steal rules. some synths let you choose between “steal lowest”, “steal highest”, “steal oldest”, “steal newest” and stuff like that. i’m personally a fan of oldest, because those are typically the notes that have already decayed the most. but if you have a very sustained sound you might want to go with steal lowest as lower notes are more masked by the other notes than higher ones perceptually.
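
fwiw a steal-oldest pick really just means “find the active voice with the earliest note-on”. a minimal sketch, with the Voice struct and findVoiceToUse made up purely for illustration (not from any particular framework):

```cpp
#include <array>
#include <cstdint>

struct Voice
{
    bool     active     = false;
    uint64_t startTime  = 0;    // counter / sample position at note-on
    int      noteNumber = -1;
};

// returns a free voice if there is one, otherwise the oldest active voice to steal
int findVoiceToUse (std::array<Voice, 16>& voices)
{
    int oldest = 0;

    for (int i = 0; i < (int) voices.size(); ++i)
    {
        if (! voices[i].active)
            return i;                                    // free voice, nothing to steal

        if (voices[i].startTime < voices[oldest].startTime)
            oldest = i;                                  // remember the earliest note-on
    }

    return oldest;                                       // all voices busy: steal the oldest
}
```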

btw if you wanna see a good example of a “steal oldest”-only synth, check out synth1 by ichiro toda. if you don’t know it yet, it’s an iconic vsti synth loosely emulating a nord lead. in the bottom-right of the window it has a visualizer that constantly shows which voices are active at any point in time, and from it you can also see that synth1’s voice system is basically a ring buffer.


In a sense, yes, a voice is an instance of a sound. But you don’t create a voice when it’s played. More like, it gets assigned when a sound needs to be played.

Don’t create the “same” sound per voice, that doesn’t do anything good for your memory usage. You could use shared_ptrs to manage a sound’s lifetime if you want an easy way of managing that and swapping things out from other threads. But keep in mind: shared_ptr itself is not thread safe, and you will run into issues in your “main” sound storage without further synchronization. And second: when a voice is the last one holding the sound, and it stops and resets the pointer, suddenly the audio thread is the one calling free on the memory. That might ruin your day, especially when dealing with large (>10 s) samples.
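
To make that last pitfall concrete, here is a minimal sketch of one way around it, with all names (Sound, Voice, the graveyard list) made up for illustration; on a real audio thread you’d want a lock-free FIFO rather than a mutex:

```cpp
#include <memory>
#include <mutex>
#include <vector>

struct Sound
{
    std::vector<float> samples;        // possibly many seconds of audio
};

struct Voice
{
    std::shared_ptr<Sound> sound;      // shared with the main sound storage, not copied per voice

    // Called when the voice finishes. Instead of `sound.reset()`, which would free the
    // buffer on the audio thread if this happened to be the last reference, hand the
    // reference over to a "graveyard" that a background thread empties later.
    void stopNote (std::vector<std::shared_ptr<Sound>>& graveyard, std::mutex& graveyardLock)
    {
        std::lock_guard<std::mutex> lock (graveyardLock);   // a lock-free FIFO would be better here
        graveyard.push_back (std::move (sound));
    }
};
```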

I think the layering part is really implementation-dependent. It’s more useful to think about which data structure gets you to your goal best. When the different sounds/samples need to interact with each other, use the same voice. Otherwise I’d use one voice per actual sound/sample. That makes reusing your software easier.
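
As a rough sketch of the “layers inside one voice” option (all names hypothetical, not from any framework):

```cpp
#include <cstddef>
#include <vector>

struct Layer
{
    std::vector<float> samples;
    float gain = 1.0f;
};

// One voice that mixes several sample layers itself, so they share one position and
// one steal decision. The alternative is one voice per layer, where the synth does
// the mixing and each layer can be stolen or released independently.
struct LayeredVoice
{
    std::vector<Layer> layers;
    std::size_t position = 0;

    void renderNextBlock (float* out, int numSamples)
    {
        for (int i = 0; i < numSamples; ++i, ++position)
        {
            float mix = 0.0f;

            for (auto& layer : layers)
                if (position < layer.samples.size())
                    mix += layer.gain * layer.samples[position];

            out[i] += mix;
        }
    }
};
```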

Never really thought about clicks when voice stealing. The idea behind voice stealing is to cut off a voice that is most likely not being heard anymore. Think of a piano: five seconds into an unsustained note, there won’t be much left of it. If you really run into clicks, you should increase the number of voices before doing some fancy crossfading between voices. I don’t think that’s worth the trouble.
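
That said, if a click ever does bother you, the lightweight alternative to a real crossfade is just ramping the stolen voice down over one block before reusing it. A rough sketch (the function and its setup are made up for illustration):

```cpp
// Ramp the last rendered block of a stolen voice down to silence before the
// voice is reused for the new note. A few milliseconds is usually enough.
void applyStealFade (float* voiceBlock, int numSamples)
{
    for (int i = 0; i < numSamples; ++i)
    {
        float gain = 1.0f - static_cast<float> (i) / static_cast<float> (numSamples);
        voiceBlock[i] *= gain;
    }
}
```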


How about voice stealing when the same note is played twice? Possibly with a long release time?

that is an option, sure. it just depends on what kind of style you go for with your synth. synth1 for example lets you play and overlap the same note up to its full polyphony, and that feels super nice when you have a lot of release and the arp turned on, because then all these sounds stack up into something really dense, especially when you dial in a bit of detune as well (synth1’s equivalent to unison). a very rich texture.
but if you let every note only be playable by 1 voice, you get a straighter sound where every note is always clearly recognizable as the same thing over and over, a harder sound that works well for harder styles of music where consistency is more valuable, especially if you combine it with fixed-phase oscillators.
