DSP steps

If you had a single stereo A.I. track and wanted it to sound less harsh and more finished, without stems or re-synthesis, what DSP steps would you try first? Thanks in advance for any input.

The first DSP step is arguably the most difficult one, but I’d put my money on the 4th one, especially when it comes to all things harshness and finishes.

I’m sorry to say I think this is the wrong forum for such a question.
I would suggest you learn the basics of EQ (equalization and filtering) to tame harshness. You do not need to program anything custom for this; there are plenty of free tools for processing a stereo audio file.
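That said, if you do eventually want to experiment programmatically, here is a minimal sketch of the kind of thing a de-harshing EQ does, assuming Python with numpy, scipy, and soundfile installed; the 3.5 kHz centre, the -4 dB cut, and the file names are illustrative placeholders, not a recipe:

```python
# One gentle peaking-EQ cut in the "presence" region, where harshness
# usually lives. Coefficients follow the RBJ Audio EQ Cookbook.
import numpy as np
import soundfile as sf
from scipy.signal import lfilter

def peaking_eq(x, fs, f0=3500.0, gain_db=-4.0, q=1.0):
    """Peaking EQ band (a cut when gain_db < 0), applied along axis 0."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A])
    a = np.array([1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A])
    return lfilter(b / a[0], a / a[0], x, axis=0)

audio, fs = sf.read("track.wav")      # stereo file: shape (frames, 2)
softened = peaking_eq(audio, fs)      # broad -4 dB dip around 3.5 kHz
sf.write("track_softened.wav", softened, fs)
```

Sweep the centre frequency somewhere between roughly 2 and 6 kHz by ear; a broad, shallow cut usually works better than a deep, narrow one for this.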
To make it sound more “finished”… Well, that is a wholly subjective term in itself. But if you don’t want to learn anything about mastering, you can use something like Landr or iZotope’s Ozone. Or you can hire a human engineer, who will produce much better results more reliably (citation: https://www.youtube.com/watch?v=wZRV2H4PK0Q).
If you want to learn these skills, it’s a long road, but starting off with some mixing and mastering tutorials will get you there.
I don’t think you need/want to learn C++ and the JUCE framework, unless you want to build a plugin or app for this, but I would advise against that unless you already have the tools and expertise necessary.

Thanks for taking the time to reply. I should clarify: I’m not looking to learn mastering or build a JUCE plug-in. I’m exploring whether there is room for a lightweight, post-generation process aimed at casual creators who like their track but find it fatiguing across common listening platforms.

So, just a quick follow-up on my previous post. I’ve now passed identical lyric sets through multiple Suno generations, using a couple of different style targets, tempo changes, etc. What I’m finding consistent isn’t the music but the failure mode: fast delivery and dense consonant clusters seem to collapse into the same kind of smearing and slurring, regardless of style and speed. It feels less like bad vocals and more like the system hitting a resolution ceiling when it’s put under pressure.

I’d be interested to hear if anybody has seen this kind of behaviour at the waveform/phonetic level. Many thanks in advance.
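For concreteness, one crude way to put a number on that smearing at the signal level would be something like the sketch below, assuming Python with numpy, scipy, and soundfile; spectral flux over the sibilant band is only a rough proxy for consonant crispness, and “vocal_take.wav” is a placeholder path:

```python
# Crude "consonant crispness" probe: positive spectral flux in the
# 2-8 kHz band, where sibilants and stop consonants carry their energy.
# Crisp consonants show tall, narrow flux peaks; smeared delivery shows
# lower, broader ones. A proxy measurement, not a phonetic analysis.
import numpy as np
import soundfile as sf
from scipy.signal import stft

audio, fs = sf.read("vocal_take.wav")
mono = audio.mean(axis=1) if audio.ndim > 1 else audio

f, t, Z = stft(mono, fs=fs, nperseg=1024, noverlap=768)
band = (f >= 2000) & (f <= 8000)
mag = np.abs(Z[band])
flux = np.maximum(np.diff(mag, axis=1), 0.0).sum(axis=0)

# Comparing the same lyric line across generations: a smeared take
# should show a visibly flatter flux trace around consonant clusters.
print("mean flux:", flux.mean(), "peak/mean ratio:", flux.max() / flux.mean())
```

Running this over the same line rendered at different tempos would show whether the flux peaks flatten out as the delivery speeds up.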

Quick update on my previous posts.

After much listening, one thing has emerged: is it possible the real issue sits more in how the vocals are being forced to relate to a highly regular grid, rather than in the generator itself? Again, many thanks in advance.
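To make the grid idea testable rather than just a hunch, something like the sketch below could check how tightly detected onsets snap to a fixed subdivision grid. It assumes Python with librosa; “mix.wav”, the 16th-note subdivision, and anchoring the grid on the first detected beat are all assumptions:

```python
# How tightly do detected onsets snap to a rigid 16th-note grid?
# Deviations pinned near zero everywhere would be consistent with the
# "vocals forced onto a highly regular grid" suspicion.
import numpy as np
import librosa

y, sr = librosa.load("mix.wav", sr=None, mono=True)
tempo, beats = librosa.beat.beat_track(y=y, sr=sr, units="time")
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")

step = 60.0 / float(tempo) / 4.0   # 16th-note spacing in seconds
t0 = beats[0]                      # anchor the grid on the first beat
dev = np.abs((onsets - t0 + step / 2) % step - step / 2)

print(f"tempo: {float(tempo):.1f} BPM")
print(f"median deviation from grid: {np.median(dev) * 1e3:.1f} ms")
print(f"90th percentile: {np.percentile(dev, 90) * 1e3:.1f} ms")
```

Human performances typically drift from the grid by tens of milliseconds; values pinned near zero would support the rigidity hypothesis.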

Just an update.

I’m not approaching this as any kind of expert in DSP or audio systems. I have a strong conviction, built through hours of repeated listening, testing, and failure, that there’s perhaps something here worth closer examination. I’m interested to understand how people with the relevant experience would frame, test, or even dismiss the idea.

Does it sit within the realms of practical feasibility? I’m not here to try to define possible solutions, but to create conditions where expertise could engage with the question. Thanks for the read and any future input. Kind regards.

Since you are trying to fix something generated by AI to begin with, maybe just throw more AI into it to fix the results of the other AI? :man_shrugging:

There is no general, easy, automatic way to make things sound “better”, “less fatiguing”, etc. Those are such subjective, vague, and broad things. Or if someone has figured out a good way, they’d of course keep it a secret, sell it as a product, and become rich. :slight_smile:

Hey, thanks for the reply. I’m not actually looking for a general or automatic “make it sound better” solution. I’m looking to answer a narrower question: whether constrained micro-timing irregularities within the transitions can measurably reduce the perception of rigidity and fatigue in A.I.-generated audio. If the answer is “no” or “already well understood”, that’s a perfectly valid outcome. Many thanks for your response. Appreciated.
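To be concrete about “constrained micro-timing irregularities”, the crude experiment I have in mind looks roughly like the sketch below; the file path, the fixed 120 BPM, the 16th-note slice size, and the ±4 ms bound are all assumptions, and a serious version would move only vocal onsets rather than whole slices:

```python
# Slice the track on a 16th-note grid and give each slice a small,
# bounded timing offset (+/- 4 ms here), crossfading at the seams.
# Enough to render a "humanized" variant for blind A/B listening.
import numpy as np
import soundfile as sf

audio, fs = sf.read("track.wav", always_2d=True)
bpm = 120.0                              # assumed known tempo
step = int(fs * 60.0 / bpm / 4.0)        # 16th-note slice length
jit = int(0.004 * fs)                    # +/- 4 ms timing bound
fade = int(0.002 * fs)                   # 2 ms crossfade at the seams

rng = np.random.default_rng(0)
out = np.zeros_like(audio)
ramp = np.linspace(0.0, 1.0, fade)[:, None]

start = 0
for start in range(0, len(audio) - step - fade - jit, step):
    offset = int(rng.integers(-jit, jit + 1))
    src = max(0, start + offset)             # jittered read position
    chunk = audio[src:src + step + fade].copy()
    chunk[:fade] *= ramp                     # crossfade in
    chunk[-fade:] *= ramp[::-1]              # crossfade out
    out[start:start + step + fade] += chunk  # place on the original grid

out[start + step + fade:] = audio[start + step + fade:]  # keep the tail
sf.write("track_jittered.wav", out, fs)
```

Render both versions and run a blind A/B: if listeners can’t reliably tell them apart, or prefer the rigid original, that answers the narrow question with a “no”.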

If you are looking for more controls over the vocal, perhaps Dreamtonics Synthesizer V or ACE Studio would be a better choice. They give you much more control over the vocals so that you can fine-tune each word :slight_smile:

Disclaimer: I am not affiliated and I haven’t used them for about two years.

Hey, thanks. I’m aware of tools that allow deep per-word control. I’m exploring whether some of that micro-intervention can be reduced upstream, so the first pass could perhaps feel less rigid before manual refinement. Thanks for your response :+1: