I have embedded devices which use Wiznet chips for the TCP/UDP/IP stack. I use the following simple device-to-device communication scheme over UDP:
A leader periodically sends leader messages to the broadcast address, typically 192.168.1.255, on (let's say) port 7777.
The follower(s) see the message, and store the leader's IP address (in this case, let's say 192.168.1.123).
Once they know the leader's IP address, followers report back whenever they see fit by sending normal unicast UDP packets (e.g., to 192.168.1.123, port 7777 again).
The actual behavior on hardware (Wiznet) is simple, understandable, and works. Everybody listens on port 7777, the Wiznet seems to automagically ignore its "own" packets, followers get the broadcast packets, and the leader gets the reply packets sent to its address.
But then we compile the firmware for simulation and testing on a PC. It's a pretty low-effort thing with a minimal amount of PC-specific wrapper code; it's not a shipped product per se, it just needs to work. Quite obviously I don't want to use a rack of computers with an actual LAN between them, but simply run multiple copies of the "pcfw", each as its own OS process, and this "simplification" is where it all falls apart.
The pcfw has one notable difference: it just uses 127.0.0.1 instead of broadcasting:
leader: send UDP packets to 127.0.0.1:7777
follower: listen to UDP port 7777, store the sender's (leader's) IP address (always works out as 127.0.0.1), send own stuff to 127.0.0.1:7777
leader: listen to UDP port 7777 to see followers' responses.
UDP payload itself contains unique "device" ids so that seeing one's own packets is no problem, they are trivial to ignore.
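To illustrate what "trivial to ignore" means, here is a minimal sketch. The payload layout (a 4-byte big-endian device id at the start of the packet) and the name is_own_packet are assumptions made up for illustration, not the real message format:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the first 4 payload bytes hold the sender's
   device id in big-endian order. Returns 1 if the packet carries
   my_id, 0 otherwise. */
int is_own_packet(const uint8_t *payload, size_t len, uint32_t my_id)
{
    if (len < 4)
        return 0; /* too short to carry an id, so not treated as ours */

    uint32_t sender = ((uint32_t)payload[0] << 24) |
                      ((uint32_t)payload[1] << 16) |
                      ((uint32_t)payload[2] << 8)  |
                       (uint32_t)payload[3];
    return sender == my_id;
}
```

The receive loop would simply drop any packet for which this returns 1.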
The program simply:
1) calls socket(AF_INET, SOCK_DGRAM, 0)
2) calls setsockopt setting SO_REUSEADDR
3) optionally, calls setsockopt setting SO_REUSEPORT in an attempt to make this work. No difference whatsoever.
4) calls bind() with sin_port = htons(7777) and sin_addr.s_addr = htonl(INADDR_ANY)
5) periodically calls sendto(), sending to 127.0.0.1
6) periodically calls recvfrom() to see if incoming packets are in the OS buffer.
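A minimal sketch of the six steps above, not the actual pcfw code. The function names are hypothetical, the port is a parameter only so the sketch is easy to try on a different port, and using MSG_DONTWAIT for the periodic check is an assumption (the real code may poll differently):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Steps 1-4: create a UDP socket, set the reuse options, and bind to
   INADDR_ANY:port. Returns the descriptor, or -1 on error. */
int open_shared_udp_socket(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0) {
        close(fd);
        return -1;
    }
#ifdef SO_REUSEPORT
    /* Optional step 3; result deliberately ignored in this sketch. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof on);
#endif

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Step 5: send one datagram to 127.0.0.1:port. */
ssize_t send_to_localhost(int fd, unsigned short port,
                          const void *data, size_t len)
{
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return sendto(fd, data, len, 0, (struct sockaddr *)&dst, sizeof dst);
}

/* Step 6: non-blocking check for a pending datagram. Returns the
   datagram length, or -1 if nothing is queued (or on error). */
ssize_t poll_receive(int fd, void *buf, size_t cap)
{
    return recvfrom(fd, buf, cap, MSG_DONTWAIT, NULL, NULL);
}
```

Each pcfw process would call open_shared_udp_socket(7777) once at startup, then call send_to_localhost() and poll_receive() from its main loop.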
Now start first copy of the program, and it will receive its own messages (what it sends). Fair enough, this is expected behavior, we are sending to localhost after all.
Then start the second copy of the program. It starts receiving the packets from the first copy, great. But at the same time, the first copy stops receiving anything, including its own messages. If I then stop and restart the first copy, it starts receiving messages from the second copy (and itself), but the second copy stops receiving anything. In other words: every process can send, but only the one which was started last can receive at all.
So it seems SO_REUSEADDR and SO_REUSEPORT just prevent bind() from erroring out, but the OS delivers the packets to only one process at a time. The internet tells me that SO_REUSEPORT on Linux is by design meant for load balancing, i.e. dividing the messages between the processes. This is no good for this use case.
And I'm reluctant to change the design itself (e.g. to use two different port numbers) because it is simple and Just Works on real hardware. You can of course convince me otherwise...
I'm sure there must be a - hopefully simple! - way for multiple processes to receive all of the messages because obviously I can run:
sudo tcpdump -n udp port 7777 -i any
and receive the messages while my own software still keeps receiving them. I want similar functionality, just monitor the UDP packets without destroying them; i.e., own copy for every socket.
I usually don't like asking questions, but I have been googling this for more than 10 hours in total now and gone down quite a few rabbit holes, so what the heck.