The low-pass filter will reduce mains hum, but it is still only a 1st-order low pass and thus gives an attenuation factor of only some 7 or 14 for 50 / 100 Hz.
A similar, additional filter action can be obtained from a larger capacitor (e.g. 1 nF) at the TIA, in parallel with the 20 MΩ resistor.
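As a rough check of the numbers above, here is a minimal Python sketch of the attenuation of an ideal 1st-order RC low pass. It assumes that "20 M" means a 20 MΩ resistor and that the 1 nF sits directly in parallel with it; the function and constant names are only for illustration.

```python
import math

def first_order_attenuation(f_hz, r_ohm, c_farad):
    """Return (corner frequency in Hz, attenuation factor at f_hz)
    for an ideal 1st-order RC low pass."""
    f_c = 1.0 / (2.0 * math.pi * r_ohm * c_farad)
    attenuation = math.sqrt(1.0 + (f_hz / f_c) ** 2)
    return f_c, attenuation

R_TIA = 20e6   # assumption: "20 M" read as 20 MOhm
C_TIA = 1e-9   # the 1 nF example value from the post

for f in (50.0, 100.0):
    f_c, att = first_order_attenuation(f, R_TIA, C_TIA)
    print(f"{f:.0f} Hz: corner {f_c:.2f} Hz, attenuation factor {att:.1f}")
```

With these values the corner is near 8 Hz and the attenuation comes out at roughly 6 and 13 for 50 and 100 Hz, so in the same range as the existing low pass.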
The more effective way to suppress hum is to average over a number of ADC readings that covers a whole multiple of 20 ms (one 50 Hz mains period): at a 3300 SPS reading rate that is 66 readings or a multiple of this.
Depending on how well the frequencies match, the suppression may well reach a factor of 100 to 1000 (roughly 1000 for a 0.1% error in the frequency).
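To put numbers on the averaging, the following Python sketch evaluates the amplitude response of a plain block average of N equally spaced readings at a hum frequency that is slightly off nominal. It is an idealized model (instantaneous samples, exact sample rate, worst-case hum phase), and the function name is made up for the example.

```python
import math

def averaging_gain(f_hz, n_readings, sample_rate):
    """Amplitude gain of an N-point block average for a sine at f_hz."""
    x = math.pi * f_hz / sample_rate
    den = n_readings * math.sin(x)
    if den == 0.0:
        return 1.0  # DC or a multiple of the sample rate passes unchanged
    return abs(math.sin(n_readings * x) / den)

SAMPLE_RATE = 3300.0   # SPS, from the post
MAINS_HZ = 50.0        # nominal mains frequency (20 ms period)
N = round(SAMPLE_RATE / MAINS_HZ)  # 66 readings per mains period

for mismatch in (0.001, 0.01):  # 0.1 % and 1 % frequency error
    gain = averaging_gain(MAINS_HZ * (1.0 + mismatch), N, SAMPLE_RATE)
    print(f"mismatch {mismatch:.1%}: suppression about {1.0 / gain:.0f}")
```

In this model a 0.1% frequency error gives a suppression of about 1000 and a 1% error about 100, which matches the range quoted above.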
It still makes sense to have both the analog low-pass filter and the suppression by integrating over a multiple of mains periods, as the hum has to be small enough that neither the single readings nor the signal in between saturate. The analog low pass also helps with some other noise.
The PGA in the ADS1015 and many other SD ADCs is not a real amplifier, but is realized by more frequent sampling of the input.
That's not what the datasheet says - it has a differential PGA before the sigma-delta ADC.
The datasheet shows the PGA in a functional block diagram, so there is a PGA function, but this says nothing about how that function is realized. On a quick look I have not found an explicit mention of how this is done, but there are telltale signs that it uses more frequent sampling: the input impedance goes down with more gain, and the accuracy of the gain is rather good, with simple powers of 2 for the gain. This version of PGA is common. E.g. the ADS1213 datasheet explicitly explains using more frequent sampling to get similar gain steps.