If you want 40 A discharge current, then a fully charged lithium cell at >4.0 V means the load has to dissipate >160 W. This is a lot of power. You mentioned a TO-220 MOSFET. Most TO-220 devices can handle a few tens of watts in practice, almost certainly not 160 W. If you try, it will die. I'd recommend going to a much larger package such as TO-247 or similar. These can handle much more power. You probably want to use more than one, though there are plenty of such devices which can handle 160 W if kept cool enough.
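As a rough sketch of that arithmetic, the snippet below computes the load power and how many devices would be needed to share it. The 4.2 V full-charge voltage and the 50 W per-device budget are assumptions chosen for illustration, not datasheet values; use the derated figure for your actual MOSFET and cooling.

```python
import math


def load_power_w(cell_voltage_v, current_a):
    """Power the load must dissipate: P = V * I."""
    return cell_voltage_v * current_a


def devices_needed(total_power_w, per_device_budget_w):
    """Smallest number of devices so that each stays within its budget."""
    if per_device_budget_w <= 0:
        raise ValueError("per-device budget must be positive")
    return math.ceil(total_power_w / per_device_budget_w)


if __name__ == "__main__":
    # Assumed values: 4.2 V fully charged cell, 40 A discharge current.
    power = load_power_w(4.2, 40.0)
    # Assumed budget: 50 W per device (hypothetical, not from a datasheet).
    count = devices_needed(power, 50.0)
    print(f"Load power: {power:.0f} W, devices needed: {count}")
```

With these assumed numbers the load is about 168 W, shared across four devices. Equal sharing is itself an assumption; in a real linear load each MOSFET usually needs its own source resistor or control loop to balance the current.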
Furthermore, the heatsink for the MOSFET absolutely must be attached to the metal tab side, not the plastic side. The tab is there specifically for heatsinking, not (primarily) for use as an electrical connection; that's what the pins are for. Most often some kind of electrically isolating mounting is used, such as a mica washer or silicone pad, in addition to an insulating washer (if necessary) on the screw. This conducts heat but not electricity. It does add some thermal resistance, though, so devices will run hotter. If you can isolate the heatsink itself electrically, then it may be possible to mount the device directly to the heatsink for better thermal conduction.
As you know, for most MOSFETs the tab is connected electrically to the drain. In a load like this, the drain (and therefore a directly mounted heatsink) would be at the battery + potential. Luckily, you are using low voltages, so there is no shock hazard; you would just need to make sure to avoid any short circuit. This also implies that if multiple MOSFETs are mounted directly to the same heatsink, their drains are implicitly connected together through it, so consider that in the circuit design (probably not an issue here). Whether using electrical isolation or not, you need a thermal compound (heatsink grease etc.) between all pairs of solid surfaces.
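To get a feel for how much the isolating pad costs you, here is a minimal sketch of the usual series thermal-resistance estimate for several devices on one shared heatsink. Every number in it (junction-to-case, case-to-sink, and sink-to-ambient resistances, ambient temperature, equal power sharing) is an assumed placeholder; take real values from the MOSFET, pad, and heatsink datasheets.

```python
def junction_temp_c(ambient_c, total_power_w, n_devices, r_jc, r_cs, r_sa):
    """Estimate junction temperature for identical devices on one heatsink.

    r_jc: junction-to-case thermal resistance per device, K/W
    r_cs: case-to-sink thermal resistance per device (pad or grease), K/W
    r_sa: sink-to-ambient thermal resistance of the shared heatsink, K/W
    """
    if n_devices < 1:
        raise ValueError("need at least one device")
    per_device_w = total_power_w / n_devices  # assumes equal power sharing
    # The heatsink carries the total power of all devices.
    heatsink_c = ambient_c + total_power_w * r_sa
    # Each device adds its own rise across its case and mounting interface.
    return heatsink_c + per_device_w * (r_jc + r_cs)


if __name__ == "__main__":
    # All values below are illustrative assumptions.
    common = dict(ambient_c=25.0, total_power_w=168.0, n_devices=4,
                  r_jc=0.5, r_sa=0.2)
    with_pad = junction_temp_c(r_cs=0.5, **common)  # isolating pad
    direct = junction_temp_c(r_cs=0.2, **common)    # grease only, no pad
    print(f"With isolating pad: {with_pad:.1f} C")
    print(f"Direct mount:       {direct:.1f} C")
```

With these assumed values the pad costs roughly 13 °C of junction temperature (about 101 °C versus 88 °C), which is the trade-off to weigh against having a live heatsink.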
When others said don't use a CPU heatsink, what was meant is that the heatsink itself is not an electrical component which can dissipate electrical energy in some controlled way. It is a mechanical component which dissipates heat. You need active devices like bipolar transistors or MOSFETs, or passive devices like resistors, to convert the electrical energy into thermal energy (heat). These devices can and must then be thermally connected to a heatsink so that all that heat can be carried away efficiently, keeping the device cool enough.
Actually, I'd very strongly recommend using a good CPU heatsink with plenty of heat pipes, and preferably a sizable copper heat spreader at the base (the part normally in contact with the CPU) rather than direct heat pipe contact. Done right, such a heatsink can easily dump over 200 W (probably a lot more) while maintaining a reasonable case temperature for the device. These heat pipe heatsinks are much more effective than an old-school chunk of finned aluminum. Plus they are small, light, and cheap. The main downside is that they absolutely must have fan-forced air movement; natural convection alone is not enough. I doubt you are looking for passive cooling anyway.
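When shopping for a heatsink, it can help to turn the power figure into a target sink-to-ambient thermal resistance. The sketch below does that; the 60 °C heatsink base limit and 25 °C ambient are assumptions picked for illustration, and the function name is hypothetical.

```python
def max_sink_to_ambient_k_per_w(max_heatsink_c, ambient_c, total_power_w):
    """Largest sink-to-ambient thermal resistance (K/W) that keeps the
    heatsink base at or below max_heatsink_c while dumping total_power_w."""
    if total_power_w <= 0:
        raise ValueError("power must be positive")
    if max_heatsink_c <= ambient_c:
        raise ValueError("heatsink limit must be above ambient")
    return (max_heatsink_c - ambient_c) / total_power_w


if __name__ == "__main__":
    # Assumed targets: 60 C heatsink base, 25 C ambient, 200 W dissipated.
    r_sa = max_sink_to_ambient_k_per_w(60.0, 25.0, 200.0)
    print(f"Heatsink plus fan must achieve <= {r_sa:.3f} K/W")
```

Under those assumptions the heatsink and fan together need to reach about 0.175 K/W, which is a figure you can compare against a cooler's published thermal resistance at a given fan speed, where the manufacturer provides one.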
If you can't do all the thermal engineering required, then the best advice is to use more and bigger devices (MOSFETs), a bigger heatsink, higher air flow, etc. You can always scale something back later if it all stays cool and reliable. Starting small and minimal will likely result in smoke.