It is hard, and probably imprudent, to extrapolate the 2x figure to all ROM functions. Without extensive testing, there is no way to say definitively.
I think westfw hit the nail on the head. There's a big lookup table that maps those ROM calls to actual code, and when the routine you're calling is very quick (e.g., setting a GPIO pin), the lookup makes up a much larger fraction of the total call time. If you're calling a ROM entry point with lots of code behind it, the cost of the lookup will disappear into the noise.
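To make the overhead argument concrete, here is a rough host-side sketch in C, not the real ROM. It assumes the dispatch is a two-level table of function pointers (a top-level table pointing at per-peripheral tables), which is how I understand the ROM_ macros in driverlib's rom.h to work. Every name below (sim_gpio_data, sim_api_table, table_pin_write, and so on) is made up for illustration, and the "register" is just a variable.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a GPIO data register. On real hardware this would be a
 * memory-mapped address; here it is a plain variable (assumption). */
static volatile uint32_t sim_gpio_data;

uint32_t sim_gpio_read(void) { return sim_gpio_data; }

/* Direct version: the whole job is one read-modify-write. */
void direct_pin_write(uint8_t pins, uint8_t val)
{
    sim_gpio_data = (sim_gpio_data & ~(uint32_t)pins) | (uint32_t)(val & pins);
}

/* Hypothetical "ROM" routine: same work as the direct version. */
void sim_rom_pin_write(uint8_t pins, uint8_t val)
{
    sim_gpio_data = (sim_gpio_data & ~(uint32_t)pins) | (uint32_t)(val & pins);
}

typedef void (*pin_write_fn)(uint8_t, uint8_t);

/* Hypothetical two-level table: the top-level table holds pointers to
 * per-peripheral tables, which in turn hold the function pointers. */
static const pin_write_fn sim_gpio_table[] = { sim_rom_pin_write };
static const pin_write_fn *const sim_api_table[] = { sim_gpio_table };

enum { SIM_GPIO_TABLE_INDEX = 0, SIM_PIN_WRITE_INDEX = 0 };

/* Table-dispatched version: two dependent loads and an indirect call
 * happen before the one read-modify-write that does the real work. */
void table_pin_write(uint8_t pins, uint8_t val)
{
    const pin_write_fn *periph = sim_api_table[SIM_GPIO_TABLE_INDEX]; /* load 1 */
    pin_write_fn fn = periph[SIM_PIN_WRITE_INDEX];                    /* load 2 */
    assert(fn != 0);
    fn(pins, val);                                                    /* indirect call */
}
```

For a routine this small, the two loads and the indirect call are comparable in cost to the work itself, which is consistent with a figure like 2x. Put a few hundred cycles of clock setup behind the same table and the lookup is a rounding error. Actual numbers will depend on the compiler, optimization level, and flash wait states, so measure on your own part.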
Should you use those ROM calls? Well, the code is already there in the device, so it doesn't cost you any flash. I used ROM calls for things like clock configuration, which I only called once and where I didn't want to figure out all the magic necessary. For other things, like GPIO pins, I wrote my own drivers that went directly to the registers. The TM4C port API has some interesting tricks, and those aren't available in the ROM.
Sometimes the ROM code is buggy, though. You can see this where TI has removed the MAP_ equivalents of some calls from newer versions of driverlib.