Quick answer (not much time):

When doing short-time Fourier transforms (which is what is typically done for spectrograms), you calculate an FFT over sliding overlapping windows of samples.

You typically have 2 parameters: a window size and a step size (AKA hop size).

In principle, you could calculate a new FFT for every single sample, but that would be very slow… So you typically compute one only every 128 or 256 samples, say (that is your step/hop size).
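
To make the window/hop idea concrete, here is a minimal pure-Python sketch of the sliding-window loop. The names `fft` and `stft_magnitudes` are made up for illustration, and the Hann window is my own assumption (it is a common choice, but not the only one). In real code you would use an FFT library rather than this tiny recursive one, which only handles power-of-two sizes.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def stft_magnitudes(samples, window_size, hop_size):
    """Return one magnitude spectrum per window position.

    Windows start every `hop_size` samples and are `window_size` long,
    so consecutive windows overlap whenever hop_size < window_size.
    """
    # Hann window (assumed here) to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / window_size)
              for i in range(window_size)]
    frames = []
    for start in range(0, len(samples) - window_size + 1, hop_size):
        chunk = [samples[start + i] * window[i] for i in range(window_size)]
        spectrum = fft(chunk)
        # Keep only bins 0..N/2; the rest mirror them for real input.
        frames.append([abs(c) for c in spectrum[: window_size // 2 + 1]])
    return frames
```

Each entry of the returned list is one column of the spectrogram. A smaller hop size gives you more columns for the same audio, and costs more FFTs.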

Then for the window size: if you select a large one (say 16384), the frequency bins get narrower, which gives you a lot more detail, especially for the lower frequencies. But the results will not be nicely “localized” in time, because each FFT covers a span of 16384 samples.
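
The trade-off is easy to put numbers on: bin spacing is sample rate divided by window size, and the time span of one window is window size divided by sample rate. A quick sketch, where `stft_resolution` is a made-up helper and the 44100 Hz sample rate is only an assumption (plug in your own):

```python
def stft_resolution(window_size, sample_rate):
    """Return (bin spacing in Hz, window duration in seconds)."""
    bin_spacing_hz = sample_rate / window_size
    window_duration_s = window_size / sample_rate
    return bin_spacing_hz, window_duration_s

SAMPLE_RATE = 44100  # assumed; use the rate of your own audio

for n in (1024, 4096, 16384):
    df, dt = stft_resolution(n, SAMPLE_RATE)
    print(f"window={n:5d}  bin spacing={df:6.2f} Hz  "
          f"window duration={dt * 1000:6.1f} ms")
```

The two numbers always multiply to 1, so any gain in frequency detail is paid for with the same factor of time smearing.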

Try with window size 4096 and step/hop size 256 or 512 and see if that’s more like what you expect.