Simply starting from the LT3748's test fixture and lowering the coupling coefficient from 1 to 0.98 is enough to make the simulation much slower than the original test fixture, which takes only a few seconds. This is probably because the solver has to take many more (and smaller) time steps to converge in this case.
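One plausible mechanism (my assumption, not something I verified in the fixture): with k < 1 the transformer gains a leakage inductance of roughly L_p(1 - k²) referred to the primary. That inductance rings with the capacitance at the switch node, and the solver has to take small time steps to resolve the ringing. Here is a back-of-the-envelope sketch in Python. The values of L_p and C are made up for illustration and are not taken from the LT3748 fixture.

```python
import math


def leakage_inductance(l_primary: float, k: float) -> float:
    """Primary-referred leakage inductance of two coupled inductors: L_p * (1 - k^2)."""
    return l_primary * (1.0 - k * k)


def ring_frequency(l_leak: float, c_node: float) -> float:
    """Resonant frequency of the leakage inductance with the switch-node capacitance."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_leak * c_node))


# Hypothetical values, NOT taken from the LT3748 test fixture.
L_PRIMARY = 40e-6   # 40 uH primary inductance
C_NODE = 100e-12    # 100 pF total capacitance at the switch node

for k in (1.0, 0.98):
    l_lk = leakage_inductance(L_PRIMARY, k)
    if l_lk == 0.0:
        # Ideal coupling: no leakage inductance, so no leakage ringing to resolve.
        print(f"k={k}: no leakage inductance, no leakage ringing")
    else:
        f = ring_frequency(l_lk, C_NODE)
        print(f"k={k}: L_leak = {l_lk * 1e6:.3f} uH, ringing at {f / 1e6:.1f} MHz "
              f"(period {1e9 / f:.0f} ns)")
```

With k = 1 the ideal model has no leakage inductance at all, so this ringing simply doesn't exist. That would be consistent with the original fixture running fast.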
On my Core i7-5930K (6 cores/12 threads), LTSpice uses about 16% CPU on average during this simulation in both cases. That is roughly twice what a single fully utilized thread would show (one thread out of 12 is about 8.3%), so it definitely IS multithreaded, but it doesn't make particularly efficient use of the extra threads. Efficiently multithreading a Spice-based solver is tough: there are necessarily many dependencies between the blocks that could run in parallel, so each thread is likely to spend a significant portion of its time waiting for results from the other threads.
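To put numbers on the CPU figures, here is a small Python sketch. The first part just converts the reported total CPU usage into an equivalent number of busy threads. The second part uses Amdahl's law (my choice of model, not something LTSpice documents) with hypothetical parallel fractions to show how serial dependencies cap the speedup regardless of the thread count.

```python
def busy_threads(cpu_percent: float, logical_threads: int) -> float:
    """Convert total CPU usage (% of all logical threads) into an equivalent number of fully busy threads."""
    return cpu_percent / 100.0 * logical_threads


def amdahl_speedup(parallel_fraction: float, n_threads: int) -> float:
    """Amdahl's law: best-case speedup when only `parallel_fraction` of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


LOGICAL_THREADS = 12   # Core i7-5930K: 6 cores / 12 threads
MEASURED_CPU = 16.0    # average total CPU usage from the post, in %

one_thread_share = 100.0 / LOGICAL_THREADS
print(f"one fully busy thread = {one_thread_share:.1f}% total CPU")
print(f"{MEASURED_CPU:.0f}% total CPU = {busy_threads(MEASURED_CPU, LOGICAL_THREADS):.2f} busy threads")

# Hypothetical parallel fractions, only to show how serial work caps the speedup.
for p in (0.5, 0.75, 0.9):
    print(f"parallel fraction {p}: at most {amdahl_speedup(p, LOGICAL_THREADS):.2f}x "
          f"on {LOGICAL_THREADS} threads")
```

This is only approximate: hyperthreads are not full cores, and a real solver's parallel fraction varies over the run. Still, it shows why about 1.9 busy threads out of 12 is not surprising when a large part of the work has to run serially.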
You can try the "alternate" solver in the options, or a different integration method. You can also play with the tolerance parameters (loosen them, i.e. set them a little higher), which will shorten the simulation time at the expense of accuracy. But there is definitely nothing inherently wrong with your setup that would explain this; it's just the way LTSpice works. While it does perform better than a single-threaded solver would, it's not the best out there in terms of performance, but it's free. Cadence Spectre, for instance, is way faster, but it's very expensive. If anyone knows of a free, or even not too expensive, Spice-based simulator that is faster than LTSpice, I'm interested. (ngspice is definitely not faster in typical use.)
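To illustrate the tolerance/speed tradeoff, here is a toy adaptive-step integrator in Python running on a lightly damped oscillator. It is only loosely analogous to what a SPICE engine does: LTSpice uses implicit methods such as trapezoidal or Gear, not this explicit Heun/Euler pair. The oscillator, the `rtol`/`atol` parameters and their values are all hypothetical. The principle is the same, though. Each step size is chosen so that the estimated local error stays within tolerance, so loosening the tolerance lets the solver take bigger, fewer steps at the cost of accuracy.

```python
import math

OMEGA = 2 * math.pi  # hypothetical 1 Hz ringing
ZETA = 0.05          # hypothetical light damping


def deriv(y):
    """x'' + 2*ZETA*OMEGA*x' + OMEGA^2*x = 0, written as a first-order system (x, v)."""
    x, v = y
    return (v, -2 * ZETA * OMEGA * v - OMEGA * OMEGA * x)


def exact_x(t):
    """Analytic solution for x(0) = 1, x'(0) = 0, used to measure the error."""
    wd = OMEGA * math.sqrt(1 - ZETA * ZETA)
    a = ZETA * OMEGA
    return math.exp(-a * t) * (math.cos(wd * t) + a / wd * math.sin(wd * t))


def simulate(rtol, atol=1e-9, t_end=5.0):
    """Toy adaptive Heun/Euler integrator; returns (accepted steps, |error in x| at t_end)."""
    t, y, h, steps = 0.0, (1.0, 0.0), 1e-3, 0
    while t < t_end:
        h = min(h, t_end - t)
        k1 = deriv(y)
        y_lo = tuple(a + h * b for a, b in zip(y, k1))                      # 1st order (Euler)
        k2 = deriv(y_lo)
        y_hi = tuple(a + 0.5 * h * (b + c) for a, b, c in zip(y, k1, k2))  # 2nd order (Heun)
        # Local error estimate, normalized so that err <= 1 means "within tolerance".
        scale = atol + rtol * max(abs(a) for a in y)
        err = max(abs(a - b) for a, b in zip(y_hi, y_lo)) / scale
        if err <= 1.0:
            t, y, steps = t + h, y_hi, steps + 1
        # The estimate scales like h^2, hence the square root; clamp growth/shrink per step.
        h *= min(2.0, max(0.2, 0.9 / math.sqrt(err))) if err > 0 else 2.0
    return steps, abs(y[0] - exact_x(t_end))


for rtol in (1e-2, 1e-3, 1e-4):
    steps, error = simulate(rtol)
    print(f"rtol={rtol:g}: {steps} accepted steps, error at t_end = {error:.1e}")
```

Running it shows the step count dropping and the error growing as `rtol` is loosened. Relaxing reltol/abstol/vntol in LTSpice should follow the same tradeoff in principle, though the actual gain there depends entirely on the circuit.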