I have not gone through the articles in detail, but I think this latch insertion is needed only for scan operation (e.g. while ATPG vectors are scanned in and out). In scan mode, the flip-flops are daisy-chained into one large shift-register-like structure (referred to as a scan chain).
Each scan FF has an input mux that selects either normal data (coming from the combinational logic) or, in scan mode, the output of the previous scan FF.
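To make the mux-plus-daisy-chain idea concrete, here is a minimal behavioural sketch in Python. It is only an illustration under idealized assumptions (a single common clock, zero delays); the names `ScanFF`, `clock_edge` and `scan_enable` are made up for this example and do not come from the articles or from any tool.

```python
from dataclasses import dataclass


@dataclass
class ScanFF:
    """Hypothetical model of one scan flip-flop: a D flip-flop plus input mux."""
    q: int = 0

    def next_value(self, d: int, scan_in: int, scan_enable: int) -> int:
        # The mux: scan data in scan mode, functional data otherwise.
        return scan_in if scan_enable else d


def clock_edge(chain, functional_d, scan_in, scan_enable):
    """Apply one common clock edge; return the bit that leaves the chain."""
    scan_out = chain[-1].q
    # Every flip-flop samples its input before any output changes.
    scan_inputs = [scan_in] + [ff.q for ff in chain[:-1]]
    new_values = [
        ff.next_value(d, si, scan_enable)
        for ff, d, si in zip(chain, functional_d, scan_inputs)
    ]
    for ff, value in zip(chain, new_values):
        ff.q = value
    return scan_out


chain = [ScanFF() for _ in range(3)]
unused = [0, 0, 0]

# Scan in a pattern, one bit per clock.
for bit in [1, 1, 0]:
    clock_edge(chain, unused, bit, scan_enable=1)
loaded = [ff.q for ff in chain]

# One functional clock captures the combinational-logic outputs (made up here).
clock_edge(chain, [1, 0, 0], 0, scan_enable=0)

# Scan the captured response back out.
response = [clock_edge(chain, unused, 0, scan_enable=1) for _ in range(3)]
print(loaded, response)
```

The first bit shifted in ends up in the last flip-flop of the chain, and the captured values come out last-flip-flop-first, exactly as in an ordinary shift register.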
The problem is that during scan mode the test clock (which clocks the scan chain) is common to all flip-flops, so timing paths now exist between clock domains that are probably irrelevant in normal (non-scan) operation of the logic. These paths are likely to create setup/hold violations, and the tools will try to fix them, for instance by adding balancing clock buffers.
From what I understand, the article suggests inserting a latch instead, so that the problem is avoided architecturally rather than through timing-path optimization.
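As a rough illustration of why a latch can help, here is a toy Python model. It assumes the latch in question is the usual lockup latch: a latch that is transparent while the launching clock is low and holds its value while that clock is high. Delays are idealized to zero, and `captured_by_b`, `skew` and `period` are hypothetical names for this sketch, not anything taken from the article.

```python
def captured_by_b(a_old, a_new, skew, period=10.0, lockup_latch=False):
    """Value that capture flip-flop B shifts in on one scan clock edge.

    Launch flip-flop A changes from a_old to a_new at t = 0. B's clock edge
    arrives at t = skew. A correct shift requires B to capture a_old.
    """
    if skew <= 0:
        # B samples at or before A's edge, so it still sees the old value.
        return a_old
    if not lockup_latch:
        # B's clock is late: A has already changed (hold violation).
        return a_new
    # The lockup latch closed at t = 0 holding a_old and only reopens
    # when A's clock falls again at t = period / 2.
    latch_still_closed = skew < period / 2
    return a_old if latch_still_closed else a_new


no_latch = captured_by_b(a_old=0, a_new=1, skew=1.0)
with_latch = captured_by_b(a_old=0, a_new=1, skew=1.0, lockup_latch=True)
print(no_latch, with_latch)
```

In this simplified model the latch tolerates any skew up to half a clock period, which is why the path between the two domains no longer needs clock balancing to be hold-safe.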