(Just a little note about the iCE40UP: the 5K version has 5280 LUTs, 120 Kbits of EBR and 1024 Kbits of single-port RAM, with IOs working up to 250 MHz. No need to get obsessed with this example of a small FPGA, though. It was just one possible choice, not the only one considered.)
People interested or curious about multi-channel audio can have a look at current standards, such as AES10. But AES10 could still be extended.
As a quick example, say you have a stream of 32 channels of 32-bit, 192 kHz samples: that's about 25 MBytes/s. With 64 channels, you're close to the maximum bandwidth. Add some additional data within the stream, and it goes up further, and so on.
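For anyone who wants to check the audio numbers, here is a quick Python sketch of the arithmetic. The function name is just something I made up for illustration, and it only counts raw sample payload, with no framing or protocol overhead.

```python
def payload_rate_bytes_per_s(channels: int, bits_per_sample: int, sample_rate_hz: int) -> float:
    """Raw payload rate in bytes/s (hypothetical helper, no framing overhead)."""
    return channels * bits_per_sample * sample_rate_hz / 8


# The two audio cases mentioned above: 32 and 64 channels of 32-bit, 192 kHz samples.
for channels in (32, 64):
    rate = payload_rate_bytes_per_s(channels, 32, 192_000)
    print(f"{channels} channels: {rate / 1e6:.1f} MBytes/s")
# prints:
# 32 channels: 24.6 MBytes/s
# 64 channels: 49.2 MBytes/s
```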
Another quick example, since I talked about data acquisition: say you have 8-bit samples at a 50 MHz sample rate to transmit over a single pair. That's 50 MBytes/s of raw data; with some overhead and maybe a few additional packets, you reach the maximum bandwidth. Can you do it with an iCE40UP? Absolutely. It would take something like 10% of the total LUTs at most, actually. So there is a lot of room to spare for additional features.
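Same kind of back-of-the-envelope check for the acquisition case, as a rough Python sketch. The 5% overhead figure is purely an assumption for illustration (I haven't given an actual overhead number), and the function name is hypothetical.

```python
def link_rate_bytes_per_s(sample_bits: int, sample_rate_hz: int, overhead_fraction: float) -> float:
    """Payload rate in bytes/s, inflated by an assumed fractional overhead."""
    payload = sample_bits * sample_rate_hz / 8
    return payload * (1 + overhead_fraction)


# 8-bit samples at 50 MHz: raw payload, then with an ASSUMED 5% overhead.
raw = link_rate_bytes_per_s(8, 50_000_000, 0.0)
with_overhead = link_rate_bytes_per_s(8, 50_000_000, 0.05)
print(f"raw: {raw / 1e6:.1f} MBytes/s")                      # raw: 50.0 MBytes/s
print(f"with 5% overhead: {with_overhead / 1e6:.1f} MBytes/s")  # with 5% overhead: 52.5 MBytes/s
```

Plug in whatever overhead your framing actually adds; the point is only that the raw samples alone already account for most of the link.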
For audio applications, acquiring as many samples as per the first paragraph directly from ADCs with an iCE40UP would be challenging or impossible, but it could aggregate lower-bandwidth input streams and transmit them all on a single pair. Is that doable with an iCE40UP? Absolutely.
Lastly, nobody said you had to use the full bandwidth of the link at all times. The whole idea of portability is that you could reuse the same bus for various applications and various "nodes". A given node could be using a small FPGA and transmitting only a few MBytes of data per second, while other nodes could be aggregating data and transmitting it using more bandwidth, with beefier FPGAs if required, especially if there is additional processing to be done.