I’m sorry for the lack of information, I totally forgot to mention it.
It’s a simple application that helps people speak better: a scoring game built around speech (something like SingStar, but for speech), so the dB(A) value is needed as a metric for the voice, but I don’t think high accuracy is required. Does that help?
Ok, so the best thing I could do would be to manually calibrate every device by hand, since every device has its own behavior, and regarding the sensitivity of the mic it would be in the 80–90 range. It won’t be possible to do that right now because of the number of hours needed (and the difficulty of getting hold of all the devices).
Could a calibration database exist somewhere? I haven’t found one, but maybe I didn’t use the right search terms.
I rather meant information about the algorithm itself (how things are calculated). You said earlier that only the variation of the dB(A) value matters for the outcome. That would still mean that you don’t need a proper absolute calibration because it would cancel out anyway. But that’s just speculation without detailed info.
I have a feeling you have no idea what accuracy is needed and why (no offense intended, it’s nothing to be ashamed of). If you can share the algorithm that you plan to use here, it’d be much easier to help. If you can’t share it publicly, I’d recommend hiring an expert under NDA.
Don’t worry @hugoderwolf, you’re totally right, I have no idea of the accuracy needed. Also, thank you for your patience and your help.
Here is a more detailed answer.
Here are the steps to get the dB(A):
Record live speech from the mobile mic
Do an FFT and get the dB level for each 1/3 octave band
Apply the A-weighting to each of these bands.
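For reference, the three steps above could look roughly like this in Python. This is only a sketch: the function names, the band range, and the use of dB re full scale (not calibrated dB SPL) are my own assumptions; the A-weighting curve is the standard IEC 61672 formula.

```python
import numpy as np

def a_weighting_db(f):
    """A-weighting gain in dB at frequency f (Hz), IEC 61672 analog formula."""
    f2 = np.asarray(f, dtype=float) ** 2
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * np.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20.0 * np.log10(ra) + 2.0  # +2.0 dB so the gain is ~0 dB at 1 kHz

def third_octave_band_levels(signal, fs):
    """Unweighted 1/3-octave band levels (dB re full scale) via one FFT."""
    spectrum = np.fft.rfft(signal * np.hanning(len(signal)))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    power = np.abs(spectrum) ** 2
    # Nominal 1/3-octave centre frequencies, ~40 Hz to ~20 kHz (assumed range)
    centres = 1000.0 * 2.0 ** (np.arange(-14, 14) / 3.0)
    levels = []
    for fc in centres:
        lo, hi = fc * 2.0 ** (-1.0 / 6.0), fc * 2.0 ** (1.0 / 6.0)
        band_power = power[(freqs >= lo) & (freqs < hi)].sum()
        levels.append(10.0 * np.log10(band_power + 1e-12))  # epsilon avoids log(0)
    return centres, np.array(levels)

def a_weighted_band_levels(signal, fs):
    """Steps 2 and 3 combined: band levels plus the A-weighting per band."""
    centres, levels = third_octave_band_levels(signal, fs)
    return centres, levels + a_weighting_db(centres)
```

Without a per-device calibration these numbers are relative to digital full scale, not true dB(A) SPL, which is exactly why the absolute-level discussion below matters.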
But the thing is that the next algorithm (the one that will determine the accuracy of the user’s speech) is currently being built by a professional speaker.
He will record a speech as a “reference speech”. I will apply the previous algorithm to compute the dB(A) values every x ms of the recording and mark this speech with the highest score, let’s say 100/100.
Then, when a user plays, the 1/3 octave band dB(A) values of his speech will be computed every x ms. If they differ from the reference speech, the score will be lowered.
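To make that concrete, here is a minimal sketch of the per-frame comparison. Everything here is invented for illustration (the tolerance, the linear penalty, and all names); it only shows the shape of "compare each frame to the reference and lower the score":

```python
def frame_score(ref_dba, user_dba, tolerance_db=5.0):
    """Score one frame: full marks within a tolerance of the reference,
    then a hypothetical linear penalty per dB of excess deviation."""
    deviation = abs(user_dba - ref_dba)
    excess = max(0.0, deviation - tolerance_db)
    return max(0.0, 100.0 - 10.0 * excess)

def game_score(ref_frames, user_frames):
    """Average the per-frame scores over the whole speech."""
    n = min(len(ref_frames), len(user_frames))
    scores = [frame_score(r, u) for r, u in zip(ref_frames[:n], user_frames[:n])]
    return sum(scores) / n
```

Note that this naive version compares absolute dB(A) values directly, which is exactly the weakness discussed in the following replies.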
Imagine the speaker plays the game and perfectly repeats the reference, with only one tiny difference: he or she unintentionally doubled the distance to the microphone. That would mean a level drop of, let’s say, 6 dB*. According to your game rules the score wouldn’t be 100/100, right?
Unless you say you don’t care about the absolute level (I bet even the speaker wouldn’t get 100/100 points). Then I have good news for you: you can simply calculate the difference between the unweighted 3rd-octave reference levels and the unweighted 3rd-octave recorded levels, A-weight the difference, remove the mean, and apply a distance measure (like sum(abs(differences))). With this distance you can calculate the game score.
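In code, that recipe could be sketched like this. One caveat: "A-weight the difference" is ambiguous, and in this sketch I read the A-curve as a per-band importance weight applied after removing the mean (so a pure level offset contributes zero distance); array shapes and names are assumptions:

```python
import numpy as np

def a_weighted_distance(ref_db, rec_db, band_freqs):
    """Distance between two unweighted 1/3-octave level tracks.

    ref_db, rec_db : arrays of shape (frames, bands), unweighted levels in dB
    band_freqs     : 1/3-octave band centre frequencies in Hz
    """
    diff = np.asarray(rec_db, dtype=float) - np.asarray(ref_db, dtype=float)
    diff -= diff.mean()  # remove the mean: a constant level offset cancels out
    # A-weighting magnitude (IEC 61672) used as a linear per-band weight
    f2 = np.asarray(band_freqs, dtype=float) ** 2
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * np.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    w = ra / ra.max()  # ~1 in the 1-3 kHz region, small at the extremes
    return float((np.abs(diff) * w).sum())
```

With this measure, the "doubled mic distance" case above (every band 6 dB lower) yields a distance of zero, so the score stays at 100/100.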
Still not 100% sure about the goals of that game, but please keep in mind: if the player has a different fundamental frequency due to age, gender, body size, health issues, or anything else, there will always be spectral differences.
*It might be lower due to room reverberation and the speaker’s directivity pattern. Oh wait, it could even be higher: proximity effect / near-field circumstances.
Yep, I second what Daniel says. It’ll be hard to get this to output any meaningful scores even if you somehow calibrate the mic sensitivity; too many additional factors come into play, even if you keep the acoustics constant and just look at multiple persons (and even if they get the speech dynamics and rhythm right, which I think is the goal of this game).
Note that if you do it by removing the mean difference, you need to record the whole thing and then process it completely to get the score. If you need some kind of real-time scoring (SingStar style), you can instead run a longer moving average (a few seconds) alongside the fast level tracking and use the difference for scoring. But I think this speak-along way of doing it wouldn’t be a good experience anyway, so recording fully and then computing the score should be just fine.
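The moving-average variant could be sketched like this (class and parameter names are made up, and the window length is just a guess; it only illustrates "fast level minus its own slow average", which makes the absolute level drop out frame by frame):

```python
from collections import deque

class RealtimeLevelScorer:
    """Tracks a fast per-frame level against its own slow moving average,
    so only the level *variation* (the speech dynamics) remains."""

    def __init__(self, window_frames=100):  # e.g. ~3 s of 30 ms frames (assumed)
        self.history = deque(maxlen=window_frames)

    def dynamics(self, level_db):
        """Return the fast level minus the slow moving average, in dB."""
        self.history.append(level_db)
        return level_db - sum(self.history) / len(self.history)

def frame_penalty(ref_dynamics, user_dynamics):
    """Hypothetical per-frame penalty from the dynamics difference."""
    return abs(user_dynamics - ref_dynamics)
```

A steady level yields a dynamics value of 0 dB, while sudden louder or quieter passages show up as positive or negative excursions that can be compared to the reference track in real time.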
After a talk with my client, it’s okay not to have something really accurate. So yes, @danielrudrich, the absolute level is not necessary.
I’m thinking about applying an error margin; something around ±5 dB would still count as valid, I guess.
The point of this measure is to capture a “feeling”: a monotone speech has a nearly constant dB(A), whereas a dynamic speech shows great variation in dB(A).
(@hugoderwolf yes, it’s a feature based on speech dynamics and rhythm).
Thanks for the algorithm! But I will already have the A-weighted 3rd-octave reference levels (the values used as the reference for the score), so I guess I could work directly with the A-weighted 3rd-octave recorded levels. Or maybe I haven’t fully understood it.