Something like this: before digitizing, use an analog filter with a corner frequency of 300 Hz. Sample at 5 ksps. Take ~128 samples of the input (roughly one cycle at 40 Hz), zero pad them to 1024, and compute the FFT. The zero padding will make it easier to interpolate. Find the peak frequency bin, then interpolate around it to find the exact location of the peak.
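
For reference, the quoted procedure could be sketched roughly as below. This is only an illustration in plain Python: the function names are made up, the parabolic fit is just one common way to "interpolate around the peak" (the quote doesn't say which method is meant), and a direct DFT sum stands in for a real FFT routine.

```python
import cmath
import math

FS = 5000.0      # sample rate from the quoted proposal, in Hz
N_SAMPLES = 128  # number of samples captured
N_FFT = 1024     # transform length after zero padding


def zero_padded_spectrum(samples, n_fft=N_FFT):
    """Magnitudes of bins 0..n_fft/2 of the zero-padded DFT.

    The padded zeros add nothing to the sums, so only the real samples
    are visited. A practical implementation would call an FFT instead.
    """
    mags = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def estimate_peak_hz(samples, fs=FS, n_fft=N_FFT):
    """Find the largest bin, then refine it with a parabolic fit."""
    mags = zero_padded_spectrum(samples, n_fft)
    # Skip the first and last bin so both neighbours exist.
    k = max(range(1, len(mags) - 1), key=lambda i: mags[i])
    a, b, c = mags[k - 1], mags[k], mags[k + 1]
    denom = a - 2.0 * b + c
    offset = 0.0 if denom == 0 else 0.5 * (a - c) / denom
    offset = max(-0.5, min(0.5, offset))  # keep the vertex near bin k
    return (k + offset) * fs / n_fft


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 50.0 * n / FS) for n in range(N_SAMPLES)]
    print(estimate_peak_hz(tone))
```

The bin spacing here is 5000/1024, about 4.9 Hz, which is why the interpolation step is needed at all.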

Sorry, but sampling a signal whose highest frequency of interest is 70 Hz at 5 ksps is insane. A sampling rate of 500 Hz leaves more than enough headroom for an analog filter. With a more cleverly chosen sample rate you can go much lower; just make sure the aliasing artefacts land outside the frequency band of interest. And zero padding to get a decent FFT length just adds to the computation required. As others have pointed out, an FFT is not the right choice for this application. You need a more optimised algorithm which (unlike a plain FFT) doesn't compute results you don't need. Sure, these algorithms are built on the same basis as the FFT, but that doesn't mean they serve no purpose.
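
To make the "don't compute results you don't need" point concrete, here is a minimal sketch using the Goertzel algorithm. Picking Goertzel is my assumption of what fits here; no specific algorithm is named above. The 250-sample block length, the 2 Hz candidate grid and the function names are likewise just illustrative choices.

```python
import math


def goertzel_power(samples, fs, freq):
    """Squared magnitude of the spectrum of `samples` at a single frequency."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2  # second-order resonator update
        s2 = s1
        s1 = s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def strongest_frequency(samples, fs, candidates):
    """Return the candidate frequency with the highest Goertzel power."""
    return max(candidates, key=lambda f: goertzel_power(samples, fs, f))


if __name__ == "__main__":
    fs = 500.0                                      # sample rate suggested above
    n = 250                                         # 0.5 s of data, 2 Hz resolution
    candidates = [40.0 + 2.0 * i for i in range(16)]  # 40..70 Hz only
    tone = [math.sin(2 * math.pi * 50.0 * k / fs) for k in range(n)]
    print(strongest_frequency(tone, fs, candidates))
```

Each candidate costs one multiply and two adds per sample, and only the 16 frequencies inside the band are ever evaluated.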

Edit: I'm also not convinced that multiplying the input signal by a rectangular window will actually work. Whether you stitch multiple chunks together or pad with zeroes, there is a chance you introduce frequencies which aren't in the signal but do fall inside the frequency band of interest. Think about partial periods: the sampling won't be aligned with the input waveform.
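
A quick way to see the partial-period problem is to compare a capture holding a whole number of cycles with one that ends mid-cycle. The sketch below is illustrative only: the 50 Hz tone, the 60 Hz probe frequency, the two capture lengths and the function names are all assumptions picked for the demonstration.

```python
import cmath
import math


def dft_magnitude(samples, fs, freq):
    """Magnitude of the rectangular-windowed spectrum at one frequency."""
    w = 2.0 * math.pi * freq / fs
    return abs(sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(samples)))


def leakage_ratio(n_samples, fs=500.0, tone=50.0, probe=60.0):
    """Energy seen at `probe` relative to the tone itself.

    The input contains only `tone`, so anything found at `probe`
    comes from cutting the capture off with a rectangle.
    """
    samples = [math.sin(2.0 * math.pi * tone * n / fs)
               for n in range(n_samples)]
    return dft_magnitude(samples, fs, probe) / dft_magnitude(samples, fs, tone)


if __name__ == "__main__":
    print("whole cycles  :", leakage_ratio(100))  # 10 full periods
    print("partial cycle :", leakage_ratio(105))  # 10.5 periods
```

With whole cycles the 60 Hz reading is numerically zero; with the extra half cycle a few percent of the tone's amplitude shows up there even though no 60 Hz component exists in the input.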