"presumably designed for memory efficiency and simplicity above anything else, including performance."
That's exactly what it is. Consider the kind of tasks Arduino is targeted at: even a slow µC is usually plenty fast enough, the Ethernet library already burns a chunk of memory, and a single Ethernet packet can be quite a bit larger than the total RAM available on many of the µCs being used. So what size buffer do you expect them to allocate for you, so that you can plod along your merry way without having to think about what you're doing? As it is, one of my projects has to use an older version of the standard library because the hundred or so extra bytes used by the newer ones are too much.
And further, considering an Arduino generally isn't sending over SSL, you probably shouldn't be using it anywhere except a LAN anyhow, and those Ethernet chips are likely slow enough that you don't have a hope of saturating the link either. So who cares if it sends every character as a separate packet, if that saves burning yet more of your super valuable RAM on a buffer?
Basically, it assumes you either know what you're doing, or aren't using it in an environment where sending a single character as a packet is going to be a problem. Doing it this way leaves the choice up to you:
- Minimise memory by sending each individual piece of data as a separate packet (no buffering needed).
- Minimise bandwidth by buffering the data yourself, and sending the whole chunk as a single packet.
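To make the trade-off in the list above concrete, here is a small off-target C++ model. `PacketCountingClient` is a hypothetical stand-in that treats every `write()` call as one packet (mirroring the one-packet-per-write behaviour described here, not the real EthernetClient code), and the request text is made up purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for an unbuffered network client: every call to
// write() becomes its own "packet". A model of the behaviour, not EthernetClient.
struct PacketCountingClient {
    std::vector<std::string> packets;

    size_t write(const char* data, size_t len) {
        packets.emplace_back(data, len);  // one call, one packet
        return len;
    }
    size_t print(const char* s) { return write(s, std::strlen(s)); }
};

// Option 1: minimise memory. No buffer; each print() goes straight out.
size_t sendUnbuffered(PacketCountingClient& client) {
    client.print("GET /status HTTP/1.1");
    client.print("\r\n");
    client.print("Host: example.local");
    client.print("\r\n\r\n");
    return client.packets.size();
}

// Option 2: minimise bandwidth. Spend some RAM on a local buffer,
// assemble the message there, then hand it over in a single write().
size_t sendBuffered(PacketCountingClient& client) {
    char buf[64];  // sized by hand for this message; adjust to your RAM budget
    size_t used = 0;
    const char* parts[] = {"GET /status HTTP/1.1", "\r\n",
                           "Host: example.local", "\r\n\r\n"};
    for (const char* p : parts) {
        size_t n = std::strlen(p);
        assert(used + n <= sizeof buf);  // sketch only: real code should handle overflow
        std::memcpy(buf + used, p, n);
        used += n;
    }
    client.write(buf, used);
    return client.packets.size();
}
```

Same bytes on the wire either way; the only difference is four packets versus one, paid for with 64 bytes of stack.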
The thing with F() strings is… annoying. But it makes sense: with the AVR core, print() reads an F() string out of flash one byte at a time and writes each byte individually, and the stream print command doesn't buffer; it just takes the data it's given and sends it as a packet. And again, to avoid buffering, println() is basically just two print()s internally: one for the data, and one for the line ending ("\r\n"). You simply need to know these things and code accordingly.
There are also buffer writers readily available that will collect your prints into a buffer you provide (or you can use String, which does it on the heap). This stuff all exists already. It all makes perfectly good sense when you stop and think it through (not that that's ever stopped anything from going horribly wrong). Point is, criticising it before you understand it…
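A minimal sketch of such a buffer writer, assuming the usual Arduino pattern of deriving from `Print` and overriding `write(uint8_t)`. So that it compiles without the Arduino core, `MiniPrint` is a hypothetical cut-down stand-in for `Print`, and `BufferWriter` is an illustrative name rather than a specific library class. `println()` is modelled as the data followed by a separate `"\r\n"` write, which is exactly why an unbuffered client would emit two packets for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical cut-down stand-in for Arduino's Print (only what this sketch needs).
// On a real board you would derive from Print itself.
class MiniPrint {
public:
    virtual ~MiniPrint() = default;
    virtual size_t write(uint8_t c) = 0;  // the one method a Print subclass must provide

    size_t write(const char* s) {
        size_t n = 0;
        while (*s) n += write(static_cast<uint8_t>(*s++));
        return n;
    }
    size_t print(const char* s) { return write(s); }
    // Data first, then a separate write of the line ending.
    size_t println(const char* s) {
        size_t n = print(s);
        n += write("\r\n");
        return n;
    }
};

// Hypothetical buffer writer: collects everything printed into a
// caller-supplied array instead of sending it anywhere.
class BufferWriter : public MiniPrint {
public:
    BufferWriter(char* buf, size_t cap) : buf_(buf), cap_(cap), len_(0) {
        assert(buf != nullptr && cap > 0);
        buf_[0] = '\0';
    }
    size_t write(uint8_t c) override {
        if (len_ + 1 >= cap_) return 0;  // full: drop the byte, keep room for '\0'
        buf_[len_++] = static_cast<char>(c);
        buf_[len_] = '\0';
        return 1;
    }
    using MiniPrint::write;  // keep the const char* overload visible
    const char* data() const { return buf_; }
    size_t length() const { return len_; }

private:
    char* buf_;
    size_t cap_;
    size_t len_;
};
```

Once the prints are done, `data()` and `length()` can be handed to the network client in one write, so the whole line goes out as a single packet.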

Though I'd have thought the networking chip has its own outbound buffer… but it probably also has some tight limits that would be very hard to explain to novice users of the library, so they simply opted for one packet per write and let you buffer it yourself if needed.
Personally, I hadn't come across that issue because I almost always prefer the C++ << streaming style into locally allocated buffers; it just feels more natural to me, and it compiles down to the same or better code anyhow. So I allocate a local buffer, hand it to an instance of a buffer writer, and when it's all done, send the completed buffer to wherever it needs to go. That avoids the whole issue.
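For the `<<` style, a rough sketch of that pattern. The post doesn't show the actual helper, so `BufWriter`, `sendPacket()` and `reportReading()` are hypothetical names; `sendPacket()` just records what it's given, where on a board it would be a single write of the whole buffer to the client. The point is that the message is assembled in a stack buffer and handed off in one call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical fixed-capacity writer over a caller-owned buffer, with
// iostream-style << chaining. Output past the end is silently truncated.
class BufWriter {
public:
    BufWriter(char* buf, size_t cap) : buf_(buf), cap_(cap), len_(0) {
        assert(buf != nullptr && cap > 0);
        buf_[0] = '\0';
    }
    BufWriter& operator<<(const char* s) {
        while (*s && len_ + 1 < cap_) buf_[len_++] = *s++;
        buf_[len_] = '\0';
        return *this;
    }
    BufWriter& operator<<(long v) {
        char tmp[24];
        std::snprintf(tmp, sizeof tmp, "%ld", v);
        return *this << tmp;
    }
    BufWriter& operator<<(int v) { return *this << static_cast<long>(v); }
    const char* data() const { return buf_; }
    size_t length() const { return len_; }

private:
    char* buf_;
    size_t cap_;
    size_t len_;
};

// Stand-in for "send the completed buffer wherever it needs to go".
static int packetsSent = 0;
static std::string lastPacket;
void sendPacket(const char* data, size_t len) {
    lastPacket.assign(data, len);
    ++packetsSent;
}

void reportReading(int sensorId, long value) {
    char buf[48];  // local buffer, released on return
    BufWriter out(buf, sizeof buf);
    out << "sensor " << sensorId << ": " << value << "\r\n";
    sendPacket(out.data(), out.length());  // one send for the whole line
}
```

Because the buffer lives on the stack, its RAM is only tied up while `reportReading()` runs, which is the kind of trade-off that matters on parts with only a couple of KB of SRAM.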