I solved the problem, but the results were contrary to my expectations.
Short version: OrCAD PSpice versions 16.6 and 17.2 are very poorly optimized for multi-CPU/multi-thread PCs and run fastest on a single thread. Enabling more than 1 thread slows down transient simulations significantly.
Long version: I discussed it with a colleague from a different department who has the newer PSpice 17.2. In that version, the number of threads can be changed from the GUI; the setting is hidden under Edit Simulation Profile -> Options -> Number of Threads. It's set to 0 by default, which means PSpice uses half the threads available on the PC. In my case that was 12, because I currently work on an AMD Threadripper 2920X (12C/24T). But when I limited it to 1 thread, transient simulations suddenly ran more than twice as fast!! Setting it to 2 or 8 threads doesn't make much difference; it's almost as slow as the default 12. It has no effect on AC simulations, which always take the same time. According to the colleague, the HSpice simulator and Intel CPUs also suffer from this problem. Apparently, Cadence haven't even touched the Spice core in the last 15 years or so; even the free LTspice is light-years ahead when it comes to multi-threading.
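To make the thread arithmetic above concrete, here is a small sketch of how the "Number of Threads" value maps to the thread count, going only by the behaviour described in this post (0 = half the logical threads). Rounding down and the minimum of 1 are my own assumptions, not documented PSpice behaviour, and `effective_threads`/`speedup` are hypothetical helper names.

```python
import os


def effective_threads(setting: int, logical_cpus: int) -> int:
    """Threads used for a given 'Number of Threads' value.

    Assumption (from observation, not Cadence docs): 0 means half the
    logical threads; any positive value is used as entered.
    """
    if setting < 0:
        raise ValueError("thread setting must be >= 0")
    if setting == 0:
        # Default: half the logical threads, rounded down (assumed), at least 1.
        return max(1, logical_cpus // 2)
    return setting


def speedup(t_multi: float, t_single: float) -> float:
    """How many times faster the single-thread run was."""
    return t_multi / t_single


if __name__ == "__main__":
    # os.cpu_count() may return None, so fall back to 1.
    logical = os.cpu_count() or 1
    print("default setting on this PC ->", effective_threads(0, logical), "threads")
```

On a 2920X (24 logical threads), `effective_threads(0, 24)` gives 12, matching what I saw. "More than twice as fast" simply means `speedup(t_default, t_single) > 2`.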
Unfortunately, we found no way to set the number of threads globally in PSpice, so you have to change it manually for every single new project. But the colleague created the library CUSTOM.OLB, which contains a special part OPT_CMD. You can place it in a Capture schematic and it passes .OPTIONS to the simulator. I'm attaching it below; it should work with all .OPTIONS commands PSpice recognizes. Enter only one command per part, but in theory you can place as many OPT_CMDs as you want. For example, the ACCT command appends CPU, memory, runtime and other statistics to the resulting .OUT file. That's how I measured the runtime.
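If you want to compare many runs (e.g. 1 vs. 12 threads) without opening each .OUT file by hand, a small script can pull the runtime out. This is only a sketch: it assumes the statistics contain a line of the form `Total job time (using Solver 1) = .05`, so check that against your own .OUT file and adjust the pattern if your version prints it differently. `total_job_time` and `job_time_from_file` are hypothetical names.

```python
import re
from typing import Optional

# Assumed line format in the PSpice .OUT statistics, e.g.
#   "Total job time (using Solver 1)   =         .05"
# PSpice may omit the leading zero, so ".05" must be accepted.
_JOB_TIME = re.compile(
    r"Total job time[^=\n]*=\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)"
)


def total_job_time(out_text: str) -> Optional[float]:
    """Return the total job time in seconds, or None if the line is missing."""
    match = _JOB_TIME.search(out_text)
    return float(match.group(1)) if match else None


def job_time_from_file(path: str) -> Optional[float]:
    """Read a .OUT file and extract its total job time."""
    # errors="replace" so odd characters in the listing don't abort parsing.
    with open(path, encoding="latin-1", errors="replace") as f:
        return total_job_time(f.read())
```

Run the same project twice (thread setting 0 and 1, with ACCT entered in an OPT_CMD part), then divide the two times to get the speedup.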
Which brings me to another question: does anybody have the latest PSpice 22? And if so, did they improve its multithreading?