There are a ton of ways this could be done. What I’ve done in the past is use an extended MixerAudioSource in which each source has a start position, so you can effectively place the sources in time. I’m not sure this is right for your needs though. How accurate do you need the 500ms to be?
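To make the start-position idea concrete, here is a minimal sketch in plain C++ with no JUCE types. `TimedSource`, `TimelineMixer` and `msToSamples` are hypothetical names made up for illustration, and the sources are just mono sample buffers. A real version would wrap actual audio sources, but the overlap arithmetic would be roughly the same.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an audio source placed on a timeline.
struct TimedSource
{
    long long startSample;        // where the source begins on the mixer timeline
    std::vector<float> samples;   // mono audio data, for illustration only
};

// Hypothetical mixer: sums every source that overlaps the requested block.
struct TimelineMixer
{
    std::vector<TimedSource> sources;

    // Renders numSamples samples, starting at timeline position blockStart, into out.
    void render (long long blockStart, std::vector<float>& out, std::size_t numSamples) const
    {
        out.assign (numSamples, 0.0f);
        const long long blockEnd = blockStart + (long long) numSamples;

        for (const auto& s : sources)
        {
            const long long sourceEnd = s.startSample + (long long) s.samples.size();
            const long long from = std::max (blockStart, s.startSample);
            const long long to   = std::min (blockEnd, sourceEnd);

            // Only the overlapping region (if any) contributes to this block.
            for (long long t = from; t < to; ++t)
                out[(std::size_t) (t - blockStart)] += s.samples[(std::size_t) (t - s.startSample)];
        }
    }
};

// Converts a gap in milliseconds to a whole number of samples (rounded to nearest).
inline long long msToSamples (double ms, double sampleRate)
{
    return (long long) (ms * sampleRate / 1000.0 + 0.5);
}
```

With this approach the gap is accurate to the sample: the second source's `startSample` is simply the end of the first plus `msToSamples (500.0, sampleRate)`. The trade-off is that you need to know each source's length up front.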
For this I would probably do as Vinn says: keep an OwnedArray of the AudioTransportSources you want to play and a pointer to the currently playing one. You should also inherit from ChangeListener and Timer. When you add a new AudioTransportSource, register yourself as a listener to it. Once the source has finished, your changeListenerCallback() will be called and you can check whether the source has stopped using AudioTransportSource::isPlaying(). If this returns false, find the index of the one that just finished using OwnedArray::indexOf() and set the currently playing pointer to the next element. Then call startTimer (500) and, in your timerCallback() method, stop the timer (a Timer keeps firing until stopTimer() is called) and start the transport source.
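The control flow could be sketched roughly like this. It is plain C++ rather than real JUCE code so that it stands alone: `FakeSource` and `SequencePlayer` are hypothetical names, `sourceChanged()` stands in for what you would do inside changeListenerCallback(), `timerFired()` stands in for timerCallback(), and `pendingTimerMs` just records the startTimer() request. It also tracks the current source by index rather than by pointer, which is an assumption on my part.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a transport source; it only models playing or stopped.
struct FakeSource
{
    bool playing = false;
    void start()            { playing = true; }
    bool isPlaying() const  { return playing; }
};

class SequencePlayer
{
public:
    explicit SequencePlayer (int gapInMs) : gapMs (gapInMs) {}

    // Owns the source, as an OwnedArray would. In JUCE you would also
    // register as a change listener on the new source here.
    FakeSource* add()
    {
        sources.emplace_back (new FakeSource());
        return sources.back().get();
    }

    void startFirst()
    {
        if (! sources.empty())
        {
            current = 0;
            sources[0]->start();
        }
    }

    // Stand-in for the body of changeListenerCallback().
    void sourceChanged (const FakeSource* s)
    {
        if (s->isPlaying())
            return;                          // still playing, nothing to do

        const int finished = indexOf (s);
        if (finished < 0 || finished != current)
            return;                          // not the source we were waiting on

        current = finished + 1;              // move on to the next element
        if (current < (int) sources.size())
            pendingTimerMs = gapMs;          // stands in for startTimer (gapMs)
    }

    // Stand-in for the body of timerCallback().
    void timerFired()
    {
        pendingTimerMs = 0;                  // stands in for stopTimer()
        if (current >= 0 && current < (int) sources.size())
            sources[(std::size_t) current]->start();
    }

    int pendingTimerMs = 0;                  // 0 means no timer is running

private:
    int indexOf (const FakeSource* s) const
    {
        for (std::size_t i = 0; i < sources.size(); ++i)
            if (sources[i].get() == s)
                return (int) i;
        return -1;
    }

    std::vector<std::unique_ptr<FakeSource>> sources;
    int current = -1;
    int gapMs;
};
```

Bear in mind the gap you get this way depends on when the callbacks are actually delivered, so it will be approximately 500ms rather than sample-accurate.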
Obviously you’ll need to pay some attention to thread safety with the pointers etc., or you could use an index variable as the ‘currently playing’ marker and look up the correct source with OwnedArray::operator[].
You could build on this and have another array with your wait times, using the array as a stack so that once an item has finished it is removed. Then you could do as you originally intended and have an interface similar to Player::add (File newFile); Player::wait (500); Player::add (File newFile2); etc.
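Something along these lines, perhaps. This is only a sketch of the bookkeeping, not of the playback itself, and it makes a couple of assumptions: files are represented by plain path strings instead of File objects, and the files and wait times are merged into one first-in, first-out queue of steps rather than kept in two separate arrays. `Step`, `hasNext()` and `next()` are hypothetical names.

```cpp
#include <cstddef>
#include <deque>
#include <string>

class Player
{
public:
    // One entry in the sequence: either a file to play or a pause.
    struct Step
    {
        enum class Kind { playFile, pause };
        Kind kind;
        std::string filePath;   // empty for a pause
        int waitMs;             // 0 for a file
    };

    void add (const std::string& filePath)
    {
        steps.push_back ({ Step::Kind::playFile, filePath, 0 });
    }

    void wait (int milliseconds)
    {
        steps.push_back ({ Step::Kind::pause, std::string(), milliseconds });
    }

    bool hasNext() const        { return ! steps.empty(); }
    std::size_t size() const    { return steps.size(); }

    // Removes and returns the oldest step. Call only when hasNext() is true.
    Step next()
    {
        Step s = steps.front();
        steps.pop_front();
        return s;
    }

private:
    std::deque<Step> steps;
};
```

The idea would be that whenever a source finishes or a timer fires, you pull the next step: a `playFile` step starts the next source, and a `pause` step starts a timer for `waitMs`.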
Just my 2 cents.