I think you are underestimating the amount of buffering that happens, or can happen.
TCP is a bytestream-oriented protocol.
Typically, you would call tcp_write() with 80 bytes of data every millisecond. Most of the time, the 80 bytes would just be added to a TCP packet buffer, and only when the packet is actually full would it be sent on the network. With 80 bytes/ms, it takes about 18 writes to fill a 1500 byte packet (a little less payload actually fits once you subtract the IP and TCP headers.) I don't know for sure that this is how lwip works, though...
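To put rough numbers on that, here is a quick back-of-the-envelope sketch. It assumes plain 20-byte IPv4 and TCP headers with no options and a 1500 byte MTU; the function name is just for illustration, and your lwip build may well be configured with a smaller segment size, so treat the result as an estimate rather than what your stack does.

```python
MTU = 1500         # Ethernet payload size in bytes (assumed)
IP_HEADER = 20     # IPv4 header, no options (assumed)
TCP_HEADER = 20    # TCP header, no options (assumed)
WRITE_SIZE = 80    # bytes handed to the stack per write, from the example above


def writes_per_segment(mtu=MTU, write_size=WRITE_SIZE):
    """Return (whole writes that fit in one segment, segment payload size)."""
    payload = mtu - IP_HEADER - TCP_HEADER
    return payload // write_size, payload


if __name__ == "__main__":
    writes, payload = writes_per_segment()
    # At one write per millisecond, this is also roughly how many
    # milliseconds of data sit in the buffer before a full segment goes out.
    print(f"{writes} writes of {WRITE_SIZE} bytes fit in a {payload} byte payload")
```

So under those assumptions, a full-sized segment would hold something like 18 ms worth of your data before it leaves on its own.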
The tcp_output() function says "send whatever is buffered NOW, rather than waiting for a full packet."
The actual "send packet" SHOULD be quick, depending on your actual hardware: give the buffer to the EMAC and let it fly. This might not be true with something like a Wiznet or ESP8266, where transferring the data from microcontroller memory to the network controller's memory can be slow (and can block.) (You haven't said what kind of hardware you're using...) On the other hand, that sort of device might keep the buffer in the controller's memory, so the data is getting copied out gradually (on every tcp_write()) anyway.
Q4: What is the round trip time for 1 packet transmission with payload size of 1500 bytes?
Indeterminate, because it depends on the ACKing algorithms at the destination, how many routers are between the endpoints, the bandwidth and delay of each link between the endpoints, and a bunch of other stuff. (Networking is complicated!)
With a 10Mbps ethernet, the actual transmission time for a 1500 byte packet is less than 2ms (1500*8/10e6 seconds, so about 1.2ms.) That is only the time to put the bits on the wire, not the round trip.
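If you want to play with that arithmetic for other link speeds, a tiny sketch like this does it. The function name is made up for illustration, and it only counts the bytes you give it; preamble, Ethernet header, FCS and inter-frame gap are ignored unless you add them to the byte count yourself.

```python
def wire_time_ms(packet_bytes, link_bps):
    """Time in milliseconds to clock packet_bytes onto a link of link_bps."""
    return packet_bytes * 8 / link_bps * 1000.0


if __name__ == "__main__":
    for name, bps in (("10Mbps", 10e6), ("100Mbps", 100e6)):
        print(f"1500 bytes on {name}: {wire_time_ms(1500, bps):.3f} ms")
```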
Note that most TCP implementations don't require a round-trip to finish before they can send the next packet; you can theoretically transmit a full window size worth of data (trivially up to 64kbytes, without window scaling) each round trip delay...
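That "one window per round trip" limit is easy to turn into a rough throughput ceiling. This is only a sketch: the 50ms round trip is a made-up example value, the function name is hypothetical, and real connections are also limited by congestion control, the link speed, and how much buffer your stack actually has.

```python
MAX_WINDOW_BYTES = 65535  # largest window a 16-bit field can advertise


def window_limited_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput when one window is sent per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    rtt = 0.050  # hypothetical 50 ms round trip
    mbps = window_limited_bps(MAX_WINDOW_BYTES, rtt) / 1e6
    print(f"ceiling at {rtt * 1000:.0f} ms RTT: {mbps:.2f} Mbps")
```

Your 80 bytes/ms is 640kbps, so with numbers anywhere near these the window is unlikely to be what holds you back.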