Just to reiterate:
- All streaming in a consumer playback environment is via TCP/IP.
- The IP traffic is, today, exclusively TCP, more specifically HTTP, because nothing else will function given the proliferation of middleboxes, NAT gateways, and other packet-destroying paraphernalia that make up the end-user "internet" today.
- Sometimes it might be SMB. Still TCP.
- In both cases, it is essentially a checksummed (at multiple layers) file transfer. It is not a bitstream as would be found on AES3 or S/PDIF. (Those both actually transfer samples, in AES3 as 32-bit subframes that alternate between left and right, where 8 of the 32 bits carry extra data such as the preamble and status bits, leaving room for a 24-bit audio level value.)
- It being a file, it is always transferred with liberal buffering, implying that there is ample time for getting the samples out of the file and into the playback application.
- (How can we be certain there's buffering? Because doing it without buffers is impossible, and small buffers are much harder. I know, because in my application you can't buffer much at all when it's really live, and people who made the signal will hear it (stage monitors) after it's been shipped over your network! We even have to use RTP over UDP, and we do a bunch of magic with clocking things, using PTP et al.)
- In the case of ID3v1 tags, they are at the end of the file, which means that if your player can show the title it has the entire file or can do random reads.
- So, the player, before playing, gets most of the file, in chunks (HLS) or disk blocks (SMB) or as a more traditional FTP-style transfer.
- The file has a header, in which there's a sample rate indicated.
- The local oscillator governing the D/A converter is responsible for putting samples out at the right rate, as read from the header. Not the network, not the NAS, not the web server handing the HLS stream out. They're just shipping bits, with buffering.
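To make the header point concrete, here is a minimal sketch, assuming a plain WAV file and Python's standard `wave` module (a real player would handle FLAC, MP3, and so on in a similar way). The helper names `make_wav`, `describe`, and `buffered_seconds` are made up for illustration. It builds a tiny file in memory, reads the sample rate back from the header the way a player would, and works out how many seconds of audio a given buffer holds.

```python
import io
import wave


def make_wav(sample_rate: int, n_frames: int) -> bytes:
    """Build a tiny silent 16-bit stereo WAV file in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)           # stereo
        w.setsampwidth(2)           # 2 bytes = 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(b"\x00" * (n_frames * 2 * 2))
    return buf.getvalue()


def describe(wav_bytes: bytes) -> dict:
    """Read the playback parameters from the file header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        return {
            "sample_rate": r.getframerate(),
            "channels": r.getnchannels(),
            "bytes_per_sample": r.getsampwidth(),
            "frames": r.getnframes(),
        }


def buffered_seconds(buffer_bytes: int, info: dict) -> float:
    """How much playback time a buffer of PCM data represents."""
    bytes_per_second = (
        info["sample_rate"] * info["channels"] * info["bytes_per_sample"]
    )
    return buffer_bytes / bytes_per_second


info = describe(make_wav(44100, 4410))
print(info["sample_rate"])                 # rate the DAC clock must run at
print(buffered_seconds(1_000_000, info))   # seconds of audio in a 1 MB buffer
```

Nothing in there depends on how or when the bytes arrived. The rate comes out of the header, and the buffer size only decides how long the player can keep going if the network stalls.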
UDP and TCP are the common transport protocols (when going by OSI)
TCP is used where time is kind of irrelevant but integrity is important (File transfers, etc.)
UDP is used when time is more important than integrity (Streaming, online games, etc.)
HTTP (and SMB or FTP, or any protocol aware of the concept of "file") is Application layer. Way different beast and has nothing to do with shuffling 1's and 0's about.
All the compensation for jumbled packets happens in software on the receiving end, NOT in the boxes moving the packets along! Else you would have to update your router (not modem, not switch) for every new file format.
S/PDIF (no matter if home or pro format) is VERY time sensitive. As such, there is minimal (if any) buffering. A "broken" packet or a flipped bit does not matter much (even if every 10th packet goes missing completely): after the DAC and its output filter are done with the bitstream, it will be difficult to measure and impossible to hear.
Additionally, even if the receiver had time to detect an error, there is no way to tell the sender to re-transmit.
There is also never a full file transfer (there is not even the concept of a file). Just some info, left/right select and then the actual audio payload. Else you would have to wait a considerable time each time you change the song (say an 8 MB MP3 at 15 Mbit/s, roughly 4 seconds).
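For anyone who wants to check that figure, a quick sketch of the arithmetic, assuming decimal megabytes and no protocol overhead (the function name `transfer_seconds` is just for illustration):

```python
def transfer_seconds(size_megabytes: float, link_mbit_per_s: float) -> float:
    """Time to move a whole file over a link, ignoring protocol overhead."""
    bits = size_megabytes * 8_000_000          # 1 MB taken as 10^6 bytes
    return bits / (link_mbit_per_s * 1_000_000)


print(round(transfer_seconds(8, 15), 2))  # 4.27
```

So a little over 4 seconds before the first sample could play, if the whole file had to arrive first.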