OK, here's a good idea I had that will make my (and hopefully your) life easier:
Let's assume your visuals are ahead of the music (in terms of production time). How do you make sure they fit perfectly? I'm not talking about the easy case of "a beat every other second with a screen flash", but about longer segments that need to line up and are off by a couple of seconds or more.
Here's what we do. Let's say the music is slower than the visuals: you introduce a small "delay" factor that is a function of time, so:
DeltaT *= 1.0 - 0.1 * sin(3.14159265 * max(0.0, min(1.0, (time - 20.0) * 0.1))); // up to 10% slower between t=20 s and t=30 s, back to 1.0 afterwards
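This slows the visuals' clock down gradually over the 10 seconds from t = 20 s to t = 30 s; the total lag it builds up is the integral of the slowdown term over that window. In practice the change in speed is not noticeable. (Yeah, I know, it isn't really frame-rate independent: DeltaT is scaled by a value sampled once per frame, so the accumulated lag depends slightly on the frame timing. Maybe you have a better way of doing it?)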
Once you've digested this, I'll continue with the rest.