Modern CPUs already have 'huge ALUs' like the ones you're describing.
Some recent x86 desktop CPUs support AVX-512 instructions, which operate on 512 bits worth of data in a single instruction. You could use that to do 16 multiplies on 32-bit floats in roughly the same time as a single float multiply, or to perform 64 separate 8-bit additions in one instruction. Not every CPU has AVX-512, but the older SSE (128-bit) and AVX/AVX2 (256-bit) instruction sets are on pretty much every modern x86 CPU.

This whole thing mostly started with Intel's MMX instruction set, which began pushing SIMD math support into x86 chips. The goal was mainly to give them much better DSP capability, which pays off when churning through large amounts of video and audio data (hence the name MMX, commonly read as MultiMedia eXtensions).
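To make the 16-floats-at-once idea concrete, here's a minimal C sketch using the AVX-512 intrinsics from immintrin.h. It's just an illustrative example, and it assumes a CPU with AVX-512F support plus a compiler flag like -mavx512f (gcc/clang):

    #include <immintrin.h>
    #include <stdio.h>

    int main(void) {
        float a[16], b[16], out[16];
        for (int i = 0; i < 16; i++) { a[i] = (float)i; b[i] = 2.0f; }

        // Load 512 bits (16 floats) from each array, multiply all 16 lanes
        // in parallel, and store the 16 results back - each step is a
        // single SIMD instruction.
        __m512 va = _mm512_loadu_ps(a);
        __m512 vb = _mm512_loadu_ps(b);
        __m512 vr = _mm512_mul_ps(va, vb);
        _mm512_storeu_ps(out, vr);

        // The scalar equivalent would be 16 separate multiplies:
        // for (int i = 0; i < 16; i++) out[i] = a[i] * b[i];

        for (int i = 0; i < 16; i++) printf("%g ", out[i]);
        printf("\n");
        return 0;
    }

With SSE you'd do the same thing with __m128 and _mm_mul_ps, just 4 floats at a time instead of 16.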
The ever-popular ARM chips have similar SIMD instructions too, from NEON on the bigger cores down to the DSP instructions on little Cortex-M4 microcontrollers, but they don't work on as many bits at once, since those chips don't have the transistor budget to spend on it.
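For comparison, here's the same kind of multiply using ARM's 128-bit NEON intrinsics from arm_neon.h, which handle 4 floats per instruction instead of 16. This sketch assumes an AArch64 target (where NEON is always present); the Cortex-M4's SIMD is narrower still, packing 8- and 16-bit operations into ordinary 32-bit registers:

    #include <arm_neon.h>
    #include <stdio.h>

    int main(void) {
        float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        float b[4] = {10.0f, 10.0f, 10.0f, 10.0f};
        float out[4];

        // Load 128 bits (4 floats) from each array, multiply all 4 lanes
        // in parallel, then store the 4 results back.
        float32x4_t va = vld1q_f32(a);
        float32x4_t vb = vld1q_f32(b);
        float32x4_t vr = vmulq_f32(va, vb);
        vst1q_f32(out, vr);

        for (int i = 0; i < 4; i++) printf("%g ", out[i]);
        printf("\n");
        return 0;
    }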
PCs did have a dedicated Physics Processing Unit (PPU) for a brief period. The company Ageia built hardware acceleration for its physics engine (originally NovodeX, later renamed PhysX), and it did speed up physics calculations considerably. This only lasted a few years before Nvidia bought the whole company and ported the PhysX drivers to run on their GPUs. Graphics cards already had programmable pipelines and plenty of 3D matrix-math muscle at that point, so it was mostly a matter of porting the physics math over to something the GPU could execute. The PhysX engine can fall back to CPU mode when no suitable acceleration hardware is available (so games using it still run, just slower), and it has since been ported to a lot of game consoles, including ones without Nvidia graphics hardware or x86 CPUs.
CPUs have gotten so fast that ordinary game physics is actually easy for them, especially with the modern SIMD math instructions, so a lot of games don't use any hardware acceleration for it at all. GPU-accelerated physics matters more when there is a huge amount of work to do, such as simulating enormous numbers of objects, or the kinds of effects built from lots of tiny sub-objects (fabric, fluids, smoke, particles and so on).
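As a rough illustration of why that kind of workload suits SIMD and GPUs so well, here's a toy particle update in C. Nothing here is from any particular engine; it's just a sketch showing that every particle is independent, so the loop can be vectorized by the compiler or moved wholesale onto a GPU:

    #include <stdio.h>
    #include <stdlib.h>

    // Toy particle state stored as separate arrays (structure-of-arrays),
    // which is the layout SIMD units and GPUs like best.
    typedef struct {
        float *x, *y;    // positions
        float *vx, *vy;  // velocities
        size_t count;
    } Particles;

    // One integration step: every particle is updated independently,
    // so the loop is trivially data-parallel.
    void step(Particles *p, float dt, float gravity) {
        for (size_t i = 0; i < p->count; i++) {
            p->vy[i] += gravity * dt;
            p->x[i]  += p->vx[i] * dt;
            p->y[i]  += p->vy[i] * dt;
        }
    }

    int main(void) {
        size_t n = 4;
        Particles p = { malloc(n * sizeof(float)), malloc(n * sizeof(float)),
                        malloc(n * sizeof(float)), malloc(n * sizeof(float)), n };
        for (size_t i = 0; i < n; i++) {
            p.x[i] = p.y[i] = 0.0f;
            p.vx[i] = 1.0f;
            p.vy[i] = 0.0f;
        }
        step(&p, 0.016f, -9.81f);
        for (size_t i = 0; i < n; i++) printf("(%g, %g) ", p.x[i], p.y[i]);
        printf("\n");
        free(p.x); free(p.y); free(p.vx); free(p.vy);
        return 0;
    }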
When it comes to building DIY CPUs it doesn't really make sense to chase wide, highly parallel math capability. Whatever you build will still be laughably slow compared to the CPU in the cheapest laptop on the market. Sticking to a modest bit width is more than good enough and saves a lot of complexity and wiring. You also have to write your own software to exploit the extra math horsepower, which adds even more work before anything complex can actually put it to use. DIY CPUs are mostly about keeping things simple enough to understand, debug and build, and with modern components they run pretty fast anyway compared to the real CPUs of their era. A better place to spend the effort might be graphics hardware for your DIY CPU, since you can make it do some impressive stuff with not that much logic, just by moving lots of pixels around quickly.