RS485. You can get transceivers that do 50 Mbps and can handle 200 nodes per bus, and considering that most M0 cores max out at ~50 MHz, that's about as fast as you will go.
Actually, I intend to do something quite non-standard and run multiple independent UART/RS485 buses simultaneously with a single UART module, using a multiplexer to switch between them so I can receive and transmit on each channel. I will probably run a separate line alongside each interconnect that is set high when there is traffic waiting. The MCU will detect the low-to-high transition and run an interrupt that switches the multiplexer to connect its UART to the appropriate line. A return line will then be set high to indicate that the MCU is ready to receive the pending transmission, and the sending MCU will transmit. In this scheme every UART bus between modules has only two nodes on it and can run at full speed without affecting non-local nodes (beyond the added servicing latency). I will just be introducing the ability to put a connection in 'pause' mode while another line is serviced. I'm not entirely sure this will work out, but I'm looking into it. It will certainly add latency when forwarding traffic across multiple modules, but from my initial calculations this shouldn't be particularly bad in the Mbps range. There are obviously lots of potential issues with this approach, but at this stage it seems to be a good tradeoff: it allows high throughput across the whole network regardless of how big it gets, while keeping power and latency reasonable.
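To make the request/ready handshake described above concrete, here is a rough behavioural model of one node, offered as a sketch rather than as firmware. The class and method names are invented for illustration. It also assumes that the sender signals the end of a transfer by dropping its request line again, which the description above does not specify.

```python
from collections import deque


class MuxedUartNode:
    """Toy model of one node sharing a single UART across several links."""

    def __init__(self, num_links):
        self.active = None                 # link the mux currently selects, or None
        self.waiting = deque()             # links requesting service but still paused
        self.ready = [False] * num_links   # state of each return ("ready") line
        self.received = {i: [] for i in range(num_links)}

    def _select(self, link):
        self.active = link                 # switch the multiplexer to this link
        self.ready[link] = True            # raise the return line: sender may transmit

    def on_request_rising(self, link):
        """Interrupt handler for a low-to-high transition on a request line."""
        if self.active is None:
            self._select(link)
        elif link != self.active and link not in self.waiting:
            self.waiting.append(link)      # stays in 'pause' mode until serviced

    def on_uart_byte(self, byte):
        """UART receive handler: the byte belongs to whichever link is selected."""
        self.received[self.active].append(byte)

    def on_request_falling(self, link):
        """Assumed end-of-transfer signal: the sender drops its request line."""
        if link == self.active:
            self.ready[link] = False
            self.active = None
            if self.waiting:
                self._select(self.waiting.popleft())
        elif link in self.waiting:
            self.waiting.remove(link)
```

With two requests arriving back to back, the second link simply waits with its ready line low until the first transfer ends. That wait is the servicing latency mentioned above, and it grows with the length of whatever transfer is already in progress.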
If you are going to go to the trouble of adding an external IC (the multiplexer), then I'd suggest adding a cheap Lattice FPGA or CPLD instead; these are available for a few dollars. They have enough I/Os that you can get your 6 serial channels directly from the hardware, and the device can make routing decisions in hardware for packets that don't address the local node. If you don't know FPGAs and HDL, then at least add a second microcontroller (like the CAN controller Richard described). You really don't want to be using the main processor for muxing data and routing packets among 6 neighbors. Pulling every byte into the main CPU only to send it out again could consume a major portion of your processing time and make the node useless, unless you reduce the communication rate, in which case the network becomes useless. Better to offload that work, either to a separate MCU with 2 or 3 UARTs plus a mux (2 extra chips) or to a CPLD/FPGA with 6 serial ports in hardware and some routing logic.
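A rough way to put numbers on that CPU load, as a back-of-envelope sketch: take the ~50 MHz M0 clock and the 1 Mbit/s link rate mentioned elsewhere in this thread, and assume 8N1 UART framing (10 bits on the wire per byte, which is an assumption and not something stated here). The function name is made up for illustration.

```python
def cycles_per_byte(cpu_hz, bit_rate, links, bits_per_byte=10):
    """CPU cycles available per received byte when every link is saturated.

    bits_per_byte=10 assumes 8N1 framing (start bit + 8 data bits + stop bit).
    """
    bytes_per_second = links * bit_rate / bits_per_byte
    return cpu_hz / bytes_per_second


# 50 MHz core, six links at 1 Mbit/s each: roughly 83 cycles per byte.
budget = cycles_per_byte(50_000_000, 1_000_000, 6)
print(f"{budget:.1f} cycles per byte")
```

Those cycles have to cover taking the receive interrupt, deciding where the byte goes, and queuing it for retransmission, before any application code runs. Raising the link rate shrinks the budget proportionally.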
If you do decide to use a CPLD or FPGA, do not even think of it like a UART. Instead, I'd make a simple synchronous serial port (synchronous, not asynchronous), duplicate it 6 times, and add the routing logic in hardware based on bits in the frames. Then use an on-board SPI channel to the main CPU for any control and data that is destined for the local node. Data not destined for the local node would be transferred internally, with one bit-time of latency (after routing), between two I/O ports on the CPLD/FPGA. This kind of synchronous serial port with frame-level routing is dead easy to do in hardware. Via a shift register, you buffer enough bits of the frame to determine routing (the routing field should be in the first few bits), latch it, do a lookup in the hardware routing table, and transfer all bits of the frame to the next port until the end of frame; if the frame is for the local node, you shift in the whole frame, latch it to memory, then interrupt the main CPU. Other bits of combinatorial logic and register logic are needed for control functions, but it's easy.
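The routing steps just described (shift in the header, latch it, look up the output port, then pass the frame on or keep it) can be modelled in a few lines of software. This is only a behavioural sketch, not HDL and not cycle-accurate. The 4-bit address field, the 16-bit fixed frame length, and all names are assumptions chosen for illustration.

```python
ADDR_BITS = 4      # assumed width of the destination field at the start of a frame
FRAME_BITS = 16    # assumed fixed frame length, header included
LOCAL = "local"    # marker for frames handed to the main CPU


def route_frame(bits, routing_table, local_addr):
    """Decide where one frame goes.

    bits is a list of 0/1 values with the destination address first (MSB first).
    Returns (destination, bits), where destination is an output port number
    from routing_table, or LOCAL if the frame addresses this node.
    """
    if len(bits) != FRAME_BITS:
        raise ValueError("unexpected frame length")

    # Shift the first ADDR_BITS bits into a register, as the hardware would.
    shift_reg = 0
    for bit in bits[:ADDR_BITS]:
        shift_reg = (shift_reg << 1) | bit
    dest_addr = shift_reg  # latched once the header is complete

    if dest_addr == local_addr:
        return LOCAL, bits  # hardware would store the frame and interrupt the CPU

    # Routing table lookup: destination address -> output port.
    return routing_table[dest_addr], bits


table = {0b0001: 0, 0b0010: 3}              # hypothetical routing table
frame = [0, 0, 1, 0] + [1] * 12             # addressed to node 0b0010
print(route_frame(frame, table, local_addr=0b0111)[0])
```

In real hardware the forwarded bits would stream out of the selected port as they arrive instead of being returned as a list. The decision logic has the same shape either way.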
Synchronous frame-level board-to-board and board-to-off-board protocols like Serial RapidIO, InfiniBand, Aurora, etc. already exist for this, and there are (expensive!) hardware routing chips available for them too. Those chips are complete SoCs with user-level protocols and specs for 10+ Gbps multi-lane networking, but I don't think you need all that. You can still look to these protocols for guidance, but if all you are looking for is 1 Mbit/s point-to-point and you can define your own simple framing protocol, then you can get away with a simple roll-your-own hardware serial solution. You can add virtual circuit switching later if you want Node X to connect directly to Node Y in your mesh.
I'm writing this just to put forth another possible way for you to think about it.
And if you've never used an FPGA (and thus never used an HDL like VHDL or Verilog), then this is perhaps a simple enough reason to learn one.