If you have ever worked with AXI interfaces, you will know that the basic building block of an AXI interface is the pipe, which consists of three signals: `data`, `ready`, and `valid`. An AXI streaming interface is usually just a single pipe, while an AXI memory-mapped interface combines several address and data pipes to transfer read/write commands and responses.
The basic protocol of an AXI pipe is simple. When the source has data to send, it asserts `valid` and sets `data`. A transfer occurs when the sink also asserts `ready`. The protocol advancement condition is thus (`ready` and `valid`).
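The handshake can be illustrated with a small cycle-by-cycle Python model. This is a sketch under simplifying assumptions (a single pipe, a source that always has its next word available, and the sink's `ready` supplied as a per-cycle list); the function name `axi_transfers` is hypothetical.

```python
def axi_transfers(words, ready_pattern):
    """Simulate one AXI pipe cycle by cycle.

    The source presents `words` in order. ready_pattern[t] is the sink's
    ready on cycle t. Returns the list of (cycle, word) transfers.
    """
    transfers = []
    idx = 0  # index of the word currently driven on `data`
    for cycle, ready in enumerate(ready_pattern):
        valid = idx < len(words)      # source asserts valid while it has data
        if valid and ready:           # advancement condition: ready and valid
            transfers.append((cycle, words[idx]))
            idx += 1                  # only now may the source change `data`
    return transfers
```

For example, `axi_transfers(["a", "b", "c"], [1, 0, 1, 1, 1])` returns `[(0, "a"), (2, "b"), (3, "c")]`: word `"b"` is held on `data` through the cycle where `ready` is low.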
I have been implementing AXI-based peripherals and stream processors for some time now and have made the following observation. In a larger design where a pipe passes through many modules or stream processors, it is usually desirable to register the `ready` signal at many points along the stream in order to meet timing. Each stream processor may have its own logic to decide when to accept data, but it must always pause data flow as soon as the downstream `ready` signal is 0, which naturally means the `ready` signal can't simply be delayed. If `ready` is delayed by a register, the stream processor will continue for one more cycle after the sink deasserts `ready`, and that word of data will be lost.
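The data-loss case can be reproduced with a cycle-level model. In this hypothetical sketch (`registered_ready_loss` is an illustrative name), the upstream stage decides whether to advance from a copy of the sink's `ready` that has passed through one register, while the sink actually accepts a word only if its current `ready` is high.

```python
def registered_ready_loss(ready_pattern, n_words):
    """Upstream advances on a one-cycle-old copy of the sink's ready.

    ready_pattern[t] is the sink's actual ready on cycle t.
    Returns (accepted, lost) lists of word indices.
    """
    ready_q = False        # pipeline register on the ready path, reset low
    accepted, lost = [], []
    next_word = 0
    for ready in ready_pattern:
        if next_word < n_words and ready_q:
            # Upstream believes the transfer happened; the sink only
            # takes the word if its current ready is high.
            (accepted if ready else lost).append(next_word)
            next_word += 1
        ready_q = bool(ready)  # register captures ready at the clock edge
    return accepted, lost
```

For example, `registered_ready_loss([1, 1, 0, 0, 1, 1], 10)` returns `([0, 2], [1])`: word 1 is launched on the cycle the sink drops `ready`, because upstream still sees last cycle's value.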
The way around this is a shallow FIFO, also called a "register slice" or "skid buffer". The FIFO is often 2 or 4 entries deep and buffers incoming data when downstream is stalled but we are already committed to receiving data from upstream (because our `ready` signal is still asserted).
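A minimal sketch of a skid buffer, assuming a simple cycle-level convention: the downstream side pops before the upstream side pushes within a cycle, and the upstream source always has data until `n_words` have been sent. The names `skid_buffer`, `m_ready_pattern`, and `depth` are hypothetical. The `ready` presented upstream is a register computed only from the buffer's own occupancy, so downstream `ready` has no combinational path to the upstream stage, yet nothing is lost.

```python
from collections import deque


def skid_buffer(n_words, m_ready_pattern, depth=2):
    """Cycle-level sketch of a skid buffer with a registered upstream ready.

    m_ready_pattern[t] is the downstream ready on cycle t.
    Returns the list of (cycle, word) transfers on the downstream side.
    """
    fifo = deque()
    s_ready = True       # registered ready presented to the upstream stage
    sent = 0             # number of words accepted from upstream so far
    delivered = []
    for cycle, m_ready in enumerate(m_ready_pattern):
        # Downstream side: ordinary AXI handshake on the buffer's output.
        if fifo and m_ready:
            delivered.append((cycle, fifo.popleft()))
        # Upstream side: transfer when upstream valid and our registered ready.
        if sent < n_words and s_ready:
            fifo.append(sent)
            sent += 1
        assert len(fifo) <= depth, "skid buffer overflowed"
        # Clock edge: the next ready depends only on local occupancy,
        # so there is no combinational path from m_ready to upstream.
        s_ready = len(fifo) < depth
    return delivered
```

With `depth=2` and `ready` always high, this sketch passes one word per cycle, but each word spends one cycle in the buffer (pop happens before push), which is the kind of added latency the next paragraph refers to.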
The end result is that every time you want to register a `ready` signal, you need a skid buffer. In a larger pipeline this means many skid buffers spread throughout the design. Each skid buffer also adds cycles of data latency, because the FIFO is usually implemented with registers and a MUX, and another register after the MUX is desirable to improve timing.
That's when I realized that a pattern often emerges when I'm, e.g., implementing my own skid buffer or AXI stream arbiter. What I often end up doing is converting an AXI pipe into a different kind of pipe with signals `data`, `strobe`, and `ready`, where `strobe` serves a similar function to `valid`. I will call this interface OXI from now on. It has the following properties:
- The bus advancement condition is no longer (`ready` and `valid`), but simply `strobe`. In other words, whenever `strobe` is asserted the sink must accept a word of data; it isn't allowed to reject it by deasserting `ready`.
- `ready` controls data flow in the future. Asserting `ready` for one cycle allows one word through at some point in the future. A sink must assert `ready` without waiting for `strobe`. This is the other way around from AXI, where the sink may wait for `valid` before asserting `ready`, but the source may not wait for `ready` before asserting `valid`.
- An OXI source has an attribute "overdelivery": the maximum flow-control delay from `ready` to `data`/`strobe`.
- An OXI sink has an attribute "overacceptance": the maximum number of words it can still accept after it deasserts `ready`.
- A source can be connected to a sink only if the sink's overacceptance is >= the source's overdelivery.
- Registers can be inserted in the `ready` or `data`/`strobe` path at will, without requiring any skid buffer. However, each register you insert increases the source's overdelivery by 1.
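The connection rule amounts to simple bookkeeping. A hypothetical Python sketch (`OxiSource`, `OxiSink`, and `can_connect` are illustrative names, not an existing library), assuming a source with no registers starts at overdelivery 0 and each register on either path adds exactly one cycle:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OxiSource:
    overdelivery: int  # max cycles from ready to data/strobe

    def registered(self, ready_regs=0, data_regs=0):
        # Each register on either path adds one cycle of flow-control delay.
        return OxiSource(self.overdelivery + ready_regs + data_regs)


@dataclass(frozen=True)
class OxiSink:
    overacceptance: int  # words it can still take after deasserting ready


def can_connect(src: OxiSource, sink: OxiSink) -> bool:
    """Legal only if the sink can absorb everything already in flight."""
    return sink.overacceptance >= src.overdelivery
```

For example, a source with two registers on `ready` and one on `data`/`strobe` has overdelivery 3, so it may drive a sink with overacceptance 3 but not one with overacceptance 2.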
The end result is that you can get away with one bigger FIFO/skid buffer at the very end of the pipeline (the ultimate sink) rather than many small FIFOs scattered throughout. The FIFO deasserts its `ready` signal when it is M words short of full, hence it has an overacceptance of M. It then becomes possible to add pipeline registers to both the `data`/`strobe` and `ready` paths at any point in the pipeline as needed.
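The FIFO-terminated pipeline can be checked with a cycle-level Python model. This is a sketch, not RTL; `Pipe` and `oxi_run` are hypothetical names, and it assumes the source emits a word (with `strobe`) on every cycle it sees `ready` and has data, i.e. the AND of `valid` and `ready`. Overdelivery here is `ready_regs + data_regs`. The FIFO's output side is treated as an ordinary AXI interface (non-empty as `valid`, `drain[t]` as `ready`).

```python
from collections import deque


class Pipe:
    """`n` back-to-back pipeline registers; n == 0 is a plain wire."""

    def __init__(self, n, reset):
        self.regs = deque([reset] * n)

    def step(self, value):
        if not self.regs:
            return value               # no register: combinational path
        self.regs.append(value)        # value enters the first register
        return self.regs.popleft()     # value captured n cycles ago leaves


def oxi_run(n_words, ready_regs, data_regs, capacity, overacceptance, drain):
    """Simulate an OXI link ending in a FIFO sink for len(drain) cycles.

    Returns (words received in order, peak FIFO occupancy). A peak above
    `capacity` means the hardware FIFO would have overflowed.
    """
    ready_pipe = Pipe(ready_regs, False)
    data_pipe = Pipe(data_regs, None)      # None means "strobe low"
    fifo, received, peak, next_word = deque(), [], 0, 0
    for pop in drain:
        # Sink: deassert ready when M (overacceptance) words short of full.
        ready = len(fifo) < capacity - overacceptance
        # Source: acts on the delayed ready; strobe = valid AND ready.
        word = None
        if ready_pipe.step(ready) and next_word < n_words:
            word, next_word = next_word, next_word + 1
        # Sink: must take whatever arrives with strobe; no back-pressure.
        arriving = data_pipe.step(word)
        if pop and fifo:                   # AXI side of the FIFO
            received.append(fifo.popleft())
        if arriving is not None:
            fifo.append(arriving)
        peak = max(peak, len(fifo))
    return received, peak
```

With 2 registers on `ready` and 1 on `data`/`strobe` (overdelivery 3), a fully stalled 4-entry FIFO with M = 3 peaks at exactly 4 words, whereas M = 2 lets a fifth word arrive, which in hardware would be lost.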
The advantages of this protocol are lower resource usage (especially in FPGAs, where RAM is more area-efficient than many small FIFOs built from registers and MUXes) and lower total latency (because every skid buffer needs some extra pipeline registers). It isn't meant to replace AXI but rather to supplement it. It's trivial to convert an AXI source to an OXI source (one AND gate plus optional registering), and easy to adapt an OXI source to an AXI sink (a FIFO is one way). I might, for example, convert an AXI stream to OXI, pass it through a long pipeline, and convert it back to AXI by adding the FIFO required for overacceptance.
Has a protocol like this been devised before, and is there a good argument for immediate-handshake protocols like AXI? It seems to me that it's far easier to close timing with a "delay-acceptable" protocol like the one I just described while adding minimal latency, so why has the industry gone with an immediate-handshake approach?