IIRC, TP1.0 needed about 32K of RAM to run on CP/M. I don't remember the exact size of the full "distribution", but I think it was under 200K, all files included.
As to bloat and wasted time, that is true to a large extent. But IMO the merit of tailoring your own toolchain goes beyond saving a few tens of MB. That's why I said that, for someone who wants more control, configuring and building it yourself is far more beneficial than trying to strip down an existing one until it stops working. Still, it's all a matter of perspective and needs. I don't do that for ARM toolchains, but I do maintain my own RISC-V toolchain.