The thing which is so cool about twin pass band tuning is the time domain response is set by the width of the filters, not the passband of the cascade.
I calculated/simulated this for two cascaded 10th order Butterworth bandpass filters, fc=10kHz, bw=500Hz (just an arbitrary example).
Take a look at the plots. The envelope's step response definitely does depend on the passband of the cascade.
If the two filters are tuned to the same center frequency, the -3dB cascade bandwidth is ~456Hz and the envelope rise time is approx. 2.3ms, but (as expected) there is some overshoot.
The narrowest -3dB cascade bandwidth I could achieve was approx. 100Hz, when the two filters are de-tuned by approx. +-250Hz (one filter centered at fc-250Hz, the other at fc+250Hz).
[ If I de-tune more than +-250Hz, then the overall -3dB bandwidth of the cascade becomes wider again, with a dip in the center. Even though it is a 10th order bandpass, it still has a limited roll-off in the stopband, which eventually determines the narrowest achievable bandwidth for the de-tuned cascade. With 10th order Butterworth bandpass filters as used in this example, a cascade passband narrower than approx. bw/5 is obviously not possible. ]
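The bandwidth figures above can be checked with a small sweep. This is only a sketch under my own assumptions: the 1Hz frequency grid, the list of detune values and the variable names (detunes, span, inband) are illustrative, and "bandwidth" here means the distance between the outermost points that lie within 3dB of the peak of the cascade response.

```octave
pkg load signal

fc = 10000;  bw = 500;  fs = 100000;     % values from the example above
f = (fc-1500):(fc+1500);                 % 1 Hz grid around fc (assumed wide enough)
detunes = [0 100 200 250 300];           % each filter is shifted by -/+ detune Hz
span = zeros(size(detunes));

for k = 1:numel(detunes)
  f1 = fc - detunes(k);
  f2 = fc + detunes(k);
  [b1,a1] = butter(5, [f1-bw/2, f1+bw/2]/(fs/2));
  [b2,a2] = butter(5, [f2-bw/2, f2+bw/2]/(fs/2));
  % Cascade response = product of the two responses, evaluated at f (in Hz)
  mag = abs(freqz(b1,a1,f,fs) .* freqz(b2,a2,f,fs));
  inband = f(mag >= max(mag)/sqrt(2));   % grid points within 3 dB of the peak
  span(k) = max(inband) - min(inband);   % distance between outermost -3 dB points
  printf("detune +-%3d Hz: -3 dB span approx. %4d Hz\n", detunes(k), span(k));
end
```

Once the dip in the center gets deeper than 3dB, this simple measure still reports the outer edges, so for large detune values it is better to look at the plot as well.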
Now the important point: the envelope rise time of the de-tuned cascade is no longer 2.3ms; it is now approx. 6.9ms. It also does not ring any more, because the frequency response of the cascade rolls off softly near the center (not a flat top anymore -- see plot). If the time domain response of the cascade were determined only by the time domain response of the wide-band filters (as you claim), we would not see a different time domain response now. But we do see a much slower response with the 100Hz cascade bandwidth than with 100% overlap.
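One possible way to put numbers on the envelope rise time is sketched here. The assumptions are mine: the envelope is taken as the magnitude of the cascade's response to a complex carrier exp(j*2*pi*fc*t) switched on at t = 0, the rise time is the 10% to 90% time relative to the settled level, and 60ms of signal is assumed to be long enough for settling. Other definitions will give somewhat different values.

```octave
pkg load signal

fc = 10000;  bw = 500;  fs = 100000;     % values from the example above
t = (0:5999)/fs;                         % 60 ms of signal
x = exp(1j*2*pi*fc*t);                   % complex carrier switched on at t = 0
detunes = [0 250];                       % full overlap vs. de-tuned by +-250 Hz
tr = zeros(size(detunes));

for k = 1:numel(detunes)
  f1 = fc - detunes(k);
  f2 = fc + detunes(k);
  [b1,a1] = butter(5, [f1-bw/2, f1+bw/2]/(fs/2));
  [b2,a2] = butter(5, [f2-bw/2, f2+bw/2]/(fs/2));
  env = abs(filter(b1,a1, filter(b2,a2, x)));   % envelope of the cascade output
  final = mean(env(end-999:end));               % settled level (last 10 ms)
  i10 = find(env >= 0.1*final, 1);              % first sample at or above 10%
  i90 = find(env >= 0.9*final, 1);              % first sample at or above 90%
  tr(k) = (i90 - i10)/fs;
  printf("detune +-%3d Hz: 10-90%% envelope rise time approx. %.2f ms\n", ...
         detunes(k), 1000*tr(k));
end
```

Using a complex carrier avoids the carrier ripple that a rectifier-type envelope detector would leave on the curve, so no extra smoothing filter is needed.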
For comparison, an ideal Gaussian bandpass with 100Hz BW would have ~6.65ms envelope rise time. So the resulting rise time of ~6.9ms does indeed come close to Gaussian, but it still does not beat a Gaussian. As you said, there is no free lunch. Regardless of how you do it ("twin" or otherwise), you cannot outwit the trade-off between rise time (pulse width), overshoot and bandwidth. The frequency domain transfer function of the cascade is still the product of the transfer functions of the two filters, and the impulse response of the cascade is still the inverse Fourier transform of the cascade's transfer function. Even a filter realized as a combination of two de-tuned partially overlapping wide-band filters cannot outwit that.
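The last point can be checked numerically. The sketch below compares the impulse response obtained by actually running an impulse through both filters with the inverse FFT of the product of the two frequency responses. The length N and the names (h_time, h_freq, rel_err) are my own choices, and N is assumed to be long enough that the impulse response has decayed, so that the wrap-around of the inverse FFT does not matter.

```octave
pkg load signal

fc = 10000;  bw = 500;  fs = 100000;  detune = 250;
N = 16384;                               % assumed long enough for h to decay
[b1,a1] = butter(5, [fc-detune-bw/2, fc-detune+bw/2]/(fs/2));
[b2,a2] = butter(5, [fc+detune-bw/2, fc+detune+bw/2]/(fs/2));

% Time domain: impulse through filter 2, then through filter 1
impulse = [1, zeros(1, N-1)];
h_time = filter(b1,a1, filter(b2,a2, impulse));

% Frequency domain: product of both responses on the N FFT bins, then inverse FFT
f = (0:N-1)*fs/N;                        % bin frequencies in Hz, full circle
H = freqz(b1,a1,f,fs) .* freqz(b2,a2,f,fs);
h_freq = real(ifft(H));

rel_err = max(abs(h_time(:) - h_freq(:))) / max(abs(h_time));
printf("max. difference relative to peak of h: %g\n", rel_err);
```

The two results should differ only by rounding effects. Such narrow filters in b/a coefficient form are numerically somewhat delicate, so do not expect agreement down to full double precision.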
EDIT: Attached Octave script
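pkg load signal

% Parameters (arbitrary example)
fc = 10000;      % nominal center frequency in Hz
bw = 500;        % -3 dB bandwidth of each individual filter in Hz
detune = 0;      % each filter is shifted by -/+ detune Hz (try 250)
fs = 100000;     % sample rate in Hz
points = 2000;   % number of samples (20 ms at fs)

f1 = fc - detune;    % center frequency of filter 1
f2 = fc + detune;    % center frequency of filter 2

% Test signal: sine at fc, switched on at t = 0
t = (0:points-1)/fs;
signal = sin(2*pi*fc*t);

% butter(5, [lo hi]) designs a bandpass of order 2*5 = 10
[b1,a1] = butter(5, [f1-bw/2, f1+bw/2]/(fs/2));
[b2,a2] = butter(5, [f2-bw/2, f2+bw/2]/(fs/2));

% Frequency responses on a common grid (fs points from 0 to fs/2, in Hz).
% The grid goes into f so that f1 and f2 are not overwritten.
[H1,f] = freqz(b1,a1,fs,fs);
H2 = freqz(b2,a2,fs,fs);

figure(1)
plot(f, 20*log10(abs(H1.*H2)))
xlabel("Hz")
ylabel("dB")
grid on

figure(2)
plot(1000*t, filter(b1,a1,filter(b2,a2,signal)))
xlabel("ms")
grid on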