Definitely use the vendor macro if one is available. It will provide the most efficient implementation for the specific platform.
I'm not sure exactly how other tools work, but in Quartus, for instance, synchronizer chains for CDC between unrelated clock domains are automatically detected and protected from optimization, and the tool automatically creates a false path constraint for timing analysis. However, if the launch and latch clocks are the same or have a fixed phase relationship, it will just treat the chain as standard logic. The tool has no way to recognize this as a correct async FIFO implementation that never reads and writes the same location at the same time, so it has to check timing as if you could. The solution is either to add the proper timing directives or to use the platform macros, and it is definitely better to use the macros when possible.
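If you do go the hand-written route, the "proper timing directives" would normally live in the project's SDC file. Here is a rough sketch of what that might look like, assuming a FIFO instance called `my_fifo` with pointer registers `wr_ptr` and `rd_ptr` and first-stage synchronizer registers `rd_sync_stage0` and `wr_sync_stage0`. All of those names are made up for illustration, so substitute the hierarchy from your own design.

```tcl
# Sketch only: instance and register names below are hypothetical.
# Exempt just the pointer-crossing paths from timing analysis,
# rather than cutting every path between the two clocks.

# Write pointer crossing into the read side's first synchronizer stage
set_false_path -from [get_registers {*my_fifo|wr_ptr[*]}] \
               -to   [get_registers {*my_fifo|rd_sync_stage0[*]}]

# Read pointer crossing into the write side's first synchronizer stage
set_false_path -from [get_registers {*my_fifo|rd_ptr[*]}] \
               -to   [get_registers {*my_fifo|wr_sync_stage0[*]}]
```

Scoping the exception to specific registers keeps the rest of the logic on those clocks fully timed. It is also exactly the sort of bookkeeping the vendor macro is supposed to handle for you, which is one more reason to prefer the macro.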