I've been messing around with NVIDIA CUDA cores. If rendering is the issue, CUDA is the solution. It is also the current solution for large matrix problems, and the larger the matrices, the bigger the payoff. The speedup can be enormous (like 20,000X) depending on the problem. Any time you can apply several thousand arithmetic units to a problem, things are bound to go fast. The newest graphics cards can do around 14 teraflops. The CDC 6400 that got us to the Moon was good for about 2 megaflops.
As to compilation (assuming gcc on Linux), you can increase the heap size to around 2GB and then pass 'make' the -j4 flag to run separate compilations in parallel. The number is how many jobs 'make' will dispatch at once; one per hardware thread is about right. Use 'top' to see how many instances of cc1 are running.
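To see what -j actually does, here is a minimal sketch you can run anywhere. It uses sleep/touch stand-ins for real compile jobs (the /tmp/jdemo path and the .stamp target names are made up for illustration); with a real project, each job would be a cc1 instance you could watch in 'top'.

```shell
# Throwaway project with two independent targets.
mkdir -p /tmp/jdemo && cd /tmp/jdemo

# Each target sleeps 1s then touches its output file,
# standing in for a real compile step.
printf 'all: a.stamp b.stamp\n%%.stamp:\n\tsleep 1 && touch $@\n' > Makefile

# -j2 lets make run both jobs at once: ~1s wall clock
# instead of ~2s serially. Use -j"$(nproc)" on a real build.
time make -j2
ls a.stamp b.stamp
```

Swap the sleep/touch recipe for 'cc -c $< -o $@' and you have the real thing.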
Then there is concurrent Fortran, if you are working on a new project. It has been in the written standard since Fortran 2008; search for DO CONCURRENT. NVIDIA has a compiler (nvfortran) that can map such loops onto the CUDA cores.
The heap-and-'make' trick is pretty easy to do. Concurrent (parallel) programming is a good deal more difficult.
My only experience with a Dell PowerEdge server in a quiet office is that it was very noisy. But that was a long time ago. Maybe things have improved. Or not...
Here's something fun: MATLAB has a Parallel Computing Toolbox. Here is the Mandelbrot Set, where the GPU finally gets to a 700+X speedup:
https://www.mathworks.com/help/parallel-computing/illustrating-three-approaches-to-gpu-computing-the-mandelbrot-set.html
I'm not sure how many cores they are using. I just looked at the pretty pictures.
Xilinx Vivado also allows for parallel builds. I think the free WebPACK version is limited to 8 threads.