Author Topic: Understanding Linear implementation of a round-robin arbiter  (Read 6054 times)

0 Members and 1 Guest are viewing this topic.

Offline promachTopic starter

  • Frequent Contributor
  • **
  • Posts: 878
  • Country: us
Understanding Linear implementation of a round-robin arbiter
« on: October 01, 2019, 09:56:52 am »
For Efficient microarchitecture for network-on-chip routers , do anyone know how this round-robin arbiter actually works ?


Note: The corresponding verilog codes seem to be located at c_rr_arbiter_base.v and c_rr_arbiter.v

 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #1 on: October 02, 2019, 06:48:07 am »
The top box offers priority to the the next input after the last one granted.

The the rest of the boxes are in a ring, and the first being request locks out the rest of the ring, until it gets back to the one being offered priority where the ring is broken.

Quite nifty.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #2 on: October 02, 2019, 07:17:01 am »
Reset needs to be though through to ensure that one and only one bit is set in the 'who to offer to first' register, preventing the whole thing from latching up.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online BrianHG

  • Super Contributor
  • ***
  • Posts: 8024
  • Country: ca
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #3 on: October 02, 2019, 08:03:45 am »
Damn that would have been useful a decade ago.  I did it the hard way...
I made a state machine.  And I also required a 2 tier override priority for 2 specific sources.
« Last Edit: October 02, 2019, 08:59:03 am by BrianHG »
 

Offline RoGeorge

  • Super Contributor
  • ***
  • Posts: 6637
  • Country: ro
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #4 on: October 02, 2019, 08:24:53 am »
do anyone know how this round-robin arbiter actually works ?

You can visualize it as a round table with a moderator, where only one person is allowed to talk, and only for a certain amount of time.  Then, the next person is allowed to talk, and so on.  The moderator allows them to speak one by one, in circles around the table.

What exactly was your question about?  The idea, the schematic, or the HDL implementation?

Later edit

Note:  Using the words "Verilog code" is very misleading, better think about it as a Verilog circuit, or schematic diagram.
« Last Edit: October 02, 2019, 08:31:38 am by RoGeorge »
 

Offline promachTopic starter

  • Frequent Contributor
  • **
  • Posts: 878
  • Country: us
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #5 on: October 02, 2019, 12:17:18 pm »
1) What is the purpose of the OR and AND gates in the second box ?

2) How is the above gate-level schematics different from https://github.com/promach/noc/blob/master/arbiter.v ?


By the way, See waveform of my verilog code implementation




Code: [Select]
// Credit : [url]https://www.reddit.com/r/FPGA/comments/dbr1xs/understanding_linear_implementation_of_a/[/url]

module arbiter2 #(parameter WIDTH = 4) (clk, reset, req, grant);

input clk, reset;
input [WIDTH-1:0] req;
output [WIDTH-1:0] grant;

// 'grant' is one-hot vector, which means only one client request is granted/given green light to proceed
// note that 'base' is one-hot vector,
// 'base' signal helps round-robin arbiter to decide which 'req' to start servicing
reg [WIDTH-1:0] base;

always @(posedge clk)
begin
if(reset) base <= 1;

else base <= (base[WIDTH-1]) ? 1 : (grant == 0) ? 1 : (grant << 1);
end

wire [WIDTH-1:0] priority_in;
reg [WIDTH-1:0] priority_out;

genvar index;
generate
for(index = 0; index < WIDTH; index = index + 1)
begin
if(index > 0) assign priority_in[index] = base[index] | priority_out[index-1];

else assign priority_in[index] = base[index] | priority_out[WIDTH-1];
end
endgenerate

assign grant = (reset) ? 1 : req & priority_in;

always @(posedge clk) priority_out <= (~req) & priority_in;

`ifdef FORMAL
initial assume(reset);

reg first_clock_had_passed;
initial first_clock_had_passed = 0;

always @(posedge clk) first_clock_had_passed <= 1;
always @(posedge clk) if(first_clock_had_passed && $past(first_clock_had_passed) && $past(first_clock_had_passed, 2) && $past(first_clock_had_passed, 3)) assume(req[WIDTH-1]); // ALL requests ON

// covers the ability to handle requests properly even with ALL requests ON
always @(posedge clk) cover(first_clock_had_passed);
`endif

endmodule
« Last Edit: October 03, 2019, 06:25:29 am by promach »
 

Offline promachTopic starter

  • Frequent Contributor
  • **
  • Posts: 878
  • Country: us
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #6 on: October 03, 2019, 07:19:22 am »
What is wrong with my updated verilog code that resulted in wrong waveform result ?




Code: [Select]
// Credit : [url]https://www.reddit.com/r/FPGA/comments/dbr1xs/understanding_linear_implementation_of_a/[/url]

module arbiter2 #(parameter WIDTH = 4) (clk, reset, req, grant);

input clk, reset;
input [WIDTH-1:0] req;
output [WIDTH-1:0] grant;

// 'grant' is one-hot vector, which means only one client request is granted/given green light to proceed
// note that 'base' is one-hot vector,
// 'base' signal helps round-robin arbiter to decide which 'req' to start servicing
reg [WIDTH-1:0] base;

always @(posedge clk)
begin
if(reset) base <= 1;

else base <= (base[WIDTH-1]) ? 1 : (grant == 0) ? 1 : (grant << 1);
end

wire [WIDTH-1:0] priority_in;
reg [WIDTH-1:0] priority_out;

genvar index;
generate
for(index = 0; index < WIDTH; index = index + 1)
begin
if(index > 0) assign priority_in[index] = base[index] | priority_out[index-1];

else assign priority_in[index] = base[index] | priority_out[WIDTH-1];
end
endgenerate

assign grant = (reset) ? 1 : req & priority_in;

always @(posedge clk) priority_out <= (~req) & priority_in;

`ifdef FORMAL
initial assume(reset);

reg first_clock_had_passed;
initial first_clock_had_passed = 0;

always @(posedge clk) first_clock_had_passed <= 1;
always @(posedge clk) if(first_clock_had_passed && $past(first_clock_had_passed) && $past(first_clock_had_passed, 2) && $past(first_clock_had_passed, 3)) assume(&req); // ALL requests ON

// covers the ability to handle requests properly even with ALL requests ON
always @(posedge clk) cover(first_clock_had_passed);
`endif

endmodule
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #7 on: October 03, 2019, 08:14:54 am »
The way I read the orignal diagram, this bit feels wrong:

Code: [Select]
always @(posedge clk)
begin
if(reset) base <= 1;

else base <= (base[WIDTH-1]) ? 1 : (grant == 0) ? 1 : (grant << 1);
end

Specifically:

(grant == 0) ? 1 : (grant << 1)

That should be more like

(grant == 0) ? base : { grant[WIDTH-2:0] , base[WIDTH-1] }

Or how ever you do a rotate in Verilog.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #8 on: October 03, 2019, 10:07:10 am »
Warning: I don't do Verilog, so this code might be very poorly written.

So this is that module in the picture, literally turned directly into Verilog:

Code: [Select]
`timescale 1ns / 1ps
// Please credit Mike Field / hamster@snap.net.nz

module arbit(
    input clk,
    input reset,
    input [7:0] request,
    output [7:0] grant
    );

reg [7:0] granting = 8'b0;
reg [7:0] first_offer = 8'b0;
reg [7:0] available = 8'b0;
reg [7:0] request_loop_in = 8'b0;
reg [7:0] request_loop_out = 8'b0;

assign grant = granting;

always @(posedge clk)
  begin
    if(reset) begin
      first_offer  = 1;
    end else begin
      if(granting != 0) begin
        first_offer = { granting[6:0], granting[7]};
      end
    end
  end

// This is yeachy async logic, that contains a loop, bit due to first_offer being
// one-hot the the loop is always broken.
always @(*) request_loop_in   = { request_loop_out[6:0], request_loop_out[7]};
always @(*) available         = request_loop_in | first_offer;
always @(*) granting          = available & request;
always @(*) request_loop_out  = available & ~request;

endmodule

And attached is the simulation that confirms to me that it is working as I expect and understand it should.  It may not be working as you expect it will.

If multiple requests are asserted at the same time it grants to the next highest, and wrapping back to zero.

As I expected, the reset condition is particularly dodgy - should you have an 'X' it will propagate around the loop in the logic and turn the whole thing to custard.
« Last Edit: October 03, 2019, 10:15:40 am by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline promachTopic starter

  • Frequent Contributor
  • **
  • Posts: 878
  • Country: us
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #9 on: October 03, 2019, 02:56:16 pm »
Quote
verilator -Wall --lint-only arbit.v
%Warning-UNOPTFLAT: arbit.v:15: Signal unoptimizable: Feedback to clock or circular logic: 'arbit.request_loop_out'
reg [7:0] request_loop_out = 8'b0;
          ^~~~~~~~~~~~~~~~
                    ... Use "/* verilator lint_off UNOPTFLAT */" and lint_on around source to disable this message.
                    arbit.v:15:      Example path: arbit.request_loop_out
                    arbit.v:36:      Example path: ALWAYS
                    arbit.v:14:      Example path: arbit.request_loop_in
                    arbit.v:37:      Example path: ALWAYS
                    arbit.v:13:      Example path: arbit.available
                    arbit.v:39:      Example path: ALWAYS
                    arbit.v:15:      Example path: arbit.request_loop_out
%Error: Exiting due to 1 warning(s)
%Error: Command Failed /usr/bin/verilator_bin -Wall --lint-only arbit.v

There is circular loop logic in your code...

Besides, can I say that first_offer is equivalent to p[] ?
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #10 on: October 03, 2019, 06:23:54 pm »
There is circular loop logic in your code...

That is the whole point. As there is a logic loop in the diagram, so there is a logic loop in the code.

Quote

Besides, can I say that first_offer is equivalent to p[] ?
Sure.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline promachTopic starter

  • Frequent Contributor
  • **
  • Posts: 878
  • Country: us
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #11 on: October 04, 2019, 02:00:48 am »
Sorry but yosys synthesis tool just does not allow synthesis for code that contains circular loop ...
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #12 on: October 04, 2019, 02:49:48 am »
Sorry but yosys synthesis tool just does not allow synthesis for code that contains circular loop ...
Don't be sorry for that.

From the diagram I could see that it is an interesting, useful piece of logic, that in practice actually works, but it was also apparent that due to the logic loop you would have difficulty to build it using standard FPGA/ASIC tools.

You should have taken the time to read the very next block of text after figure 2.2. in the document you linked to:


Commercially available timing analysis tools are typically unable to properly analyze circuits that contain combinational loops like the one generated by wrapping around the priority signals. This is particularly problematic when using a standard cell design flow, as it prevents synthesis from performing proper gate sizing.


Which is exactly the problem you are facing.
« Last Edit: October 04, 2019, 02:51:27 am by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: promach

Offline promachTopic starter

  • Frequent Contributor
  • **
  • Posts: 878
  • Country: us
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #13 on: October 04, 2019, 03:06:13 am »
Quote
due to the logic loop you would have difficulty to build it using standard FPGA/ASIC tools.

Do you have any practical workaround to get around this issue for synthesis tool ?
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #14 on: October 04, 2019, 03:15:23 am »
Quote
due to the logic loop you would have difficulty to build it using standard FPGA/ASIC tools.

Do you have any practical workaround to get around this issue for synthesis tool ?

Reading the next sentence after the one quoted on Page 16 might be a very good place to start:

We can avoid the combinational loop by severing the wraparound priority signal and connecting it to a chain of fixed-priority cells F as shown in Figure 2.32

You don't seem very interested in the topic if you can't be bothered reading the relevant parts of the paper you linked to in your original post.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline promachTopic starter

  • Frequent Contributor
  • **
  • Posts: 878
  • Country: us
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #15 on: October 04, 2019, 04:11:33 am »
Quote
We can avoid the combinational loop by severing the wraparound priority signal and connecting it to a chain of fixed-priority cells F as shown in Figure 2.3

What is actually the purpose of the chain of fixed-priority cells F ?

How does this chain actually help to eliminate the circular loop ?



Update: I have just finished implementing figure 2.3 in verilog code below :



Code: [Select]
// Credit : [url]https://www.reddit.com/r/FPGA/comments/dbr1xs/understanding_linear_implementation_of_a/[/url]

module arbiter2 #(parameter WIDTH = 4) (clk, reset, req, grant);

input clk, reset;
input [WIDTH-1:0] req;
output [WIDTH-1:0] grant;

// 'grant' is one-hot vector, which means only one client request is granted/given green light to proceed
// note that 'base' is one-hot vector,
// 'base' signal helps round-robin arbiter to decide which 'req' to start servicing
reg [WIDTH-1:0] base;

always @(posedge clk)
begin
if(reset) base <= 1;

else base <= (base[WIDTH-1]) ? 1 : (grant == 0) ? base : { grant[WIDTH-2:0] , base[WIDTH-1] };
end

wire [WIDTH-1:0] priority_in;
wire [(WIDTH << 1)-1:0] priority_out; // the two leftmost significant bit are left unused

wire [WIDTH-1:0] granting = req & priority_in;
wire [WIDTH-1:0] approval; // the most significant bit is left unused, we only have (N-1) block F

genvar index;
generate
for(index = 0; index < WIDTH; index = index + 1)
begin
if(index == WIDTH-1) assign grant[index] = granting[index];

else assign grant[index] = granting[index] | approval[index];


if(index > 0)
begin
assign priority_in[index] = base[index] | priority_out[index-1];
assign approval[index] = approval[index-1] & req[index];
end

else begin
assign priority_in[index] = base[index];
assign approval[index] = priority_out[WIDTH-1] & req[index];
end
end
endgenerate


genvar priority_index;
generate
for(priority_index = 0; priority_index < (WIDTH << 1); priority_index = priority_index + 1)
begin : out_priority

if(priority_index < (WIDTH))
assign priority_out[priority_index] = (~req[priority_index]) & priority_in[priority_index];

else assign priority_out[priority_index] = (~req[priority_index >> 1]) & priority_out[priority_index-1];
end
endgenerate

`ifdef FORMAL
initial assume(reset);

reg first_clock_had_passed;
initial first_clock_had_passed = 0;

always @(posedge clk) first_clock_had_passed <= 1;
always @(posedge clk) if(first_clock_had_passed && $past(first_clock_had_passed) && $past(first_clock_had_passed, 2) && $past(first_clock_had_passed, 3)) assume(&req); // ALL requests ON

// covers the ability to handle requests properly even with ALL requests ON
always @(posedge clk) cover(first_clock_had_passed);
`endif

endmodule
« Last Edit: October 04, 2019, 09:08:46 am by promach »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #16 on: October 04, 2019, 08:07:44 am »
Taking it really slow...

We have four 'things' trying to get granted access to a shared resource - CPUs accessing memory, people accessing a shower, whatever
Call them 'A', 'B', 'C', and 'D'. We want to give them all fair access.

So the following 'round robin' system is chosen:

If 'A' last had access then the priorities will be 'B','C','D' and lowest will be 'A'
If 'B' last had access then the priorities will be 'C','D','A' and lowest will be 'B'
If 'C' last had access then the priorities will be 'D','A','B' and lowest will be 'C'
If 'D' last had access then the priorities will be 'A','B','C' and lowest will be 'D'

To make life simpler, rather than remembering who had the resource last, we will remember who has the highest priority.

That is what figure 2.2 implements - the register in the top block holds who has priority to the resource, and the logic works out the new priorities.

You end up with these patterns - when there are competing requests you take the first in the list, and set the highest priority to be
the next one after the one taken:

Highest priority => complete list of priority, from highest to lowest.
A => ABCD
B => BCDA
C => CDAB
D => DABC

Hopefully you follow me this far.

The problem with the logic was the combinatorial loop caused by the priorities being reshuffled - the one that Verilator can't handle.

Here's how that is solved.

A N+(N-1) stage (7 stages where N = 4), fixed order arbiter is created, with the first N-1 being repeated.
For our four resources the priority will be a never-changing "ABCDABC". What does change is that the first three stages can be disabled depending on who is the priority to the resource:

A => ABCDABC
B => -BCDABC
C => --CDABC
D => ---DABC

You end up with the same effective priority as the round robin, but you have fixed, unchanging priorities in the logic, and no loop.

To make it clearer, the second repetition of a request has been removed (as the first one will always be granted as it has higher priority):

A => ABCD---
B => -BCDA--
C => --CDAB-
D => ---DABC

That is why in figure 2.3 the "request" lines for first N-1 inputs feed two blocks, and there are N-1 OR gates to merge the grant outputs.

Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: promach

Offline promachTopic starter

  • Frequent Contributor
  • **
  • Posts: 878
  • Country: us
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #17 on: October 04, 2019, 08:44:52 am »
I need some time to digest your previous post just above.

Could you comment on various ways of delay minimization for the gate schematics of Figure 2.3 ?

Note: The PhD thesis author also pointed out another (besides figure 2.3) novel implementation of Fast Arbiters for On-Chip Network Switches
« Last Edit: October 04, 2019, 09:02:44 am by promach »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #18 on: October 04, 2019, 09:48:10 am »
Could you comment on various ways of delay minimization for the gate schematics of Figure 2.3 ?

The opimization seems to be:

Split the N+(N-1) arbiter in (almost) half, so our for a four input arbiter ABCDABC becomes ABCD  and ABC,

Calculate both halves independently - the second half is very simple to calculate as  it doesn't have any dependency on the register that holds what signal was last granted:

    g[4] = A
    g[5] = ~A & B
    g[6] = ~A & ~B & C

And rather than using OR gates to combine g[2:0] with g[6:4], you must then use a 2:1 mux, selecting g[6:4] if the first four blocks of the arbiter haven't asserted an output.

For an 4-input arbiter implemented as shown the critical path from the path though 7 blocks (maybe 14 gates) and an OR gate

For this enhanced version the delay is 4 blocks (maybe 8 gates) and a 2:1 mux.

At least that is the gist of it - something like figure 3 in https://www.researchgate.net/publication/224371772_Fast_arbiters_for_on-chip_network_switches
« Last Edit: October 04, 2019, 10:13:09 am by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: promach

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #19 on: October 05, 2019, 01:44:23 am »
Literally a rainy day here so did some more experimentation, that will most likely only be of interest to me....  ;)

When implemented in a Spartan6 FPGA, and implemented as per the acyclic diagram, a 4-input round-robin arbiter uses 14 LUTs and 4 flipflops.

Implemented as below uses under half the logic resources (6 LUTs and 2 flipflops) - and is significantly faster too.

Code: [Select]
module arb2(
    input clk,
    input reset,
    input [3:0] request,
    output [3:0] grant
    );


   reg [5:0] lut_result;
   reg [1:0] hi_pri;
   
   assign grant = lut_result[3:0];
   
always @(posedge clk)
   begin
      if(reset) begin
         hi_pri = 2'b0;
      end else begin
         hi_pri = lut_result[5:4];
      end
   end

always @(*)
   begin
     
 case ( { hi_pri, request } )
      // Verify this table yourself - it was manually generated!
      // request[0] priority
      6'b000000: lut_result = 6'b000000;
      6'b000001: lut_result = 6'b010001;
      6'b000010: lut_result = 6'b100010;
      6'b000011: lut_result = 6'b010001;
      6'b000100: lut_result = 6'b110100;
      6'b000101: lut_result = 6'b010001;
      6'b000110: lut_result = 6'b100010;
      6'b000111: lut_result = 6'b010001;
      6'b001000: lut_result = 6'b001000;
      6'b001001: lut_result = 6'b010001;
      6'b001010: lut_result = 6'b100010;
      6'b001011: lut_result = 6'b010001;
      6'b001100: lut_result = 6'b110100;
      6'b001101: lut_result = 6'b010001;
      6'b001110: lut_result = 6'b110100;
      6'b001111: lut_result = 6'b010001;

      // request[1] has priority
      6'b010000: lut_result = 6'b010000;
      6'b010001: lut_result = 6'b100001;
      6'b010010: lut_result = 6'b100010;
      6'b010011: lut_result = 6'b100010;
      6'b010100: lut_result = 6'b110100;
      6'b010101: lut_result = 6'b110100;
      6'b010110: lut_result = 6'b100010;
      6'b010111: lut_result = 6'b100010;
      6'b011000: lut_result = 6'b001000;
      6'b011001: lut_result = 6'b001000;
      6'b011010: lut_result = 6'b100010;
      6'b011011: lut_result = 6'b100010;
      6'b011100: lut_result = 6'b110100;
      6'b011101: lut_result = 6'b110100;
      6'b011110: lut_result = 6'b100010;
      6'b011111: lut_result = 6'b100010;

      // request[2] has priority
      6'b100000: lut_result = 6'b100000;
      6'b100001: lut_result = 6'b010001;
      6'b100010: lut_result = 6'b100010;
      6'b100011: lut_result = 6'b010001;
      6'b100100: lut_result = 6'b110100;
      6'b100101: lut_result = 6'b110100;
      6'b100110: lut_result = 6'b110100;
      6'b100111: lut_result = 6'b110100;
      6'b101000: lut_result = 6'b001000;
      6'b101001: lut_result = 6'b001000;
      6'b101010: lut_result = 6'b001000;
      6'b101011: lut_result = 6'b001000;
      6'b101100: lut_result = 6'b110100;
      6'b101101: lut_result = 6'b110100;
      6'b101110: lut_result = 6'b110100;
      6'b101111: lut_result = 6'b110100;

      // request[3] has priority
      6'b110000: lut_result = 6'b110000;
      6'b110001: lut_result = 6'b010001;
      6'b110010: lut_result = 6'b100010;
      6'b110011: lut_result = 6'b010001;
      6'b110100: lut_result = 6'b110100;
      6'b110101: lut_result = 6'b010001;
      6'b110110: lut_result = 6'b100010;
      6'b110111: lut_result = 6'b010001;
      6'b111000: lut_result = 6'b001000;
      6'b111001: lut_result = 6'b001000;
      6'b111010: lut_result = 6'b001000;
      6'b111011: lut_result = 6'b001000;
      6'b111100: lut_result = 6'b001000;
      6'b111101: lut_result = 6'b001000;
      6'b111110: lut_result = 6'b001000;
      6'b111111: lut_result = 6'b001000;

      default  : lut_result = 6'b000000;
   endcase
                 
   end
endmodule

Of course it doesn't scale well to wider arbiters, but is a perfect match for a LUT6 FPGA
« Last Edit: October 05, 2019, 01:49:24 am by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline promachTopic starter

  • Frequent Contributor
  • **
  • Posts: 878
  • Country: us
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #20 on: October 05, 2019, 02:45:19 am »
I really need some more time to understand your explanation on how the round-robin arbiter in the PhD thesis works.

By the way, you were implementing it in pure lookup table which also resulted in the same resource usage as my code in post above.

Quote
5. Printing statistics.

=== arbiter2 ===

   Number of wires:                 17
   Number of wire bits:             32
   Number of public wires:           7
   Number of public wire bits:      22
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                 18
     SB_DFFESR                       2
     SB_DFFESS                       1
     SB_DFFSR                        1
     SB_LUT4                        14
 

Offline promachTopic starter

  • Frequent Contributor
  • **
  • Posts: 878
  • Country: us
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #21 on: October 05, 2019, 03:19:12 am »
Quote
A N+(N-1) stage (7 stages where N = 4), fixed order arbiter is created, with the first N-1 being repeated.
For our four resources the priority will be a never-changing "ABCDABC". What does change is that the first three stages can be disabled depending on who is the priority to the resource:

A => ABCDABC
B => -BCDABC
C => --CDABC
D => ---DABC

You end up with the same effective priority as the round robin, but you have fixed, unchanging priorities in the logic, and no loop.

To make it clearer, the second repetition of a request has been removed (as the first one will always be granted as it has higher priority):

A => ABCD---
B => -BCDA--
C => --CDAB-
D => ---DABC

I do not understand why disabling the first three stages ?
« Last Edit: October 05, 2019, 04:22:38 am by promach »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #22 on: October 05, 2019, 04:20:45 am »
Quote
A N+(N-1) stage (7 stages where N = 4), fixed order arbiter is created, with the first N-1 being repeated.
For our four resources the priority will be a never-changing "ABCDABC". What does change is that the first three stages can be disabled depending on who is the priority to the resource:

A => ABCDABC
B => -BCDABC
C => --CDABC
D => ---DABC

You end up with the same effective priority as the round robin, but you have fixed, unchanging priorities in the logic, and no loop.

To make it clearer, the second repetition of a request has been removed (as the first one will always be granted as it has higher priority):

A => ABCD---
B => -BCDA--
C => --CDAB-
D => ---DABC

I do not understand why disabling the first three stages ?

If it has just granted a request to A, you now want 'B" to have the highest effective priority, so you disable the first stage, giving B the highest priority.

If it has granted a request to B, you now want 'C" to have the highest effective priority, so you disable the first two stages, giving C the highest priority.

If it has granted a request to C, you now want 'D" to have the highest effective priority, so you disable the first three stages, giving D the highest priority.

If it has granted a request to D, you now want 'A" to have the highest effective priority, so you don't disable any, giving A the highest priority.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline promachTopic starter

  • Frequent Contributor
  • **
  • Posts: 878
  • Country: us
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #23 on: October 05, 2019, 04:22:49 am »
Quote
That is why in figure 2.3 the "request" lines for first N-1 inputs feed two blocks, and there are N-1 OR gates to merge the grant outputs.

And how does such disabling helps to eliminate the circular loop in figure 2.2 ?


Quote
Calculate both halves independently - the second half is very simple to calculate as  it doesn't have any dependency on the register that holds what signal was last granted:

    g[4] = A
    g[5] = ~A & B
    g[6] = ~A & ~B & C


No, note that the first block F requires Cn output from the Nth block R. Did you see that ?
« Last Edit: October 05, 2019, 04:54:35 am by promach »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #24 on: October 05, 2019, 05:57:13 am »
FIgure 2.2 has circular loops making it hard for tools to implement.

That is why the scheme in figure 2.3 is used - to get rid of the loops.

So how do you get rid of the loops in figure 2.2? You implement the functionally equivilent design that is in shown in figure 2.3!

(have I been secretly become trapped into a piece of performance art? Some avant garde trolling?)




Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline promachTopic starter

  • Frequent Contributor
  • **
  • Posts: 878
  • Country: us
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #25 on: October 05, 2019, 06:02:36 am »
Sorry , I will need to study figure 2.3 on my own then.

I am not doing any trolling here. Sorry again.
I am actually doing a personal research study on round-robin arbiter, hence what you meant by performance art.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #26 on: October 06, 2019, 10:16:23 am »
Sorry , I will need to study figure 2.3 on my own then.

I am not doing any trolling here. Sorry again.
I am actually doing a personal research study on round-robin arbiter, hence what you meant by performance art.

Sorry if I was a bit short - just wanted to say thanks because it got me to do my first FPGA project for a while, a 4-input Arbiter with two different modes:

http://hamsterworks.co.nz/mediawiki/index.php/Arbiter

Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline promachTopic starter

  • Frequent Contributor
  • **
  • Posts: 878
  • Country: us
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #27 on: October 22, 2019, 02:46:57 pm »
I have just formally verified the arbiter architecture (Figure 2.3) given in the thesis.

TODO :
1. Add more assert() formal coding for round-robin capability check
2. Investigate ways to reduce tcomb delay as described at the end of section 2.3 of the thesis
3. Includes the core ideas from Fast Arbiters for On-Chip Network Switches

Code: [Select]
// Credit : [url]https://www.eevblog.com/forum/fpga/understanding-linear-implementation-of-a-round-robin-arbiter/[/url]

module arbiter2 #(parameter WIDTH = 4) (clk, reset, req, grant);

input clk, reset;
input [WIDTH-1:0] req;
output [WIDTH-1:0] grant;

// 'grant' is one-hot vector, which means only one client request is granted/given green light to proceed
// note that 'base' is one-hot vector,
// 'base' signal helps round-robin arbiter to decide which 'req' to start servicing
reg [WIDTH-1:0] base;

always @(posedge clk)
begin
if(reset) base <= 1;

else base <= (grant[WIDTH-1]) ? 1 : (grant == 0) ? base : ( grant << 1 );
end

wire [WIDTH-1:0] priority_in;
wire [(WIDTH << 1)-1:0] priority_out; // the two leftmost significant bit are left unused

wire [WIDTH-1:0] granting = req & priority_in;
wire [WIDTH-2:0] approval; // we only have (WIDTH-1) block F

genvar index;
generate
for(index = 0; index < WIDTH; index = index + 1)
begin
if(index == WIDTH-1) assign grant[index] = (reset) ? 0 : granting[index];

else assign grant[index] = (reset) ? 0 : ( granting[index] | approval[index] );


if(index < (WIDTH-1)) assign approval[index] = ( priority_out[index+WIDTH-1] & req[index] );

if(index > 0) assign priority_in[index] = ( base[index] | priority_out[index-1] );

else assign priority_in[index] = base[index];
end
endgenerate


genvar priority_index;
generate
for(priority_index = 0; priority_index < (WIDTH << 1); priority_index = priority_index + 1)
begin : out_priority

if(priority_index < (WIDTH))
assign priority_out[priority_index] = (~req[priority_index]) & priority_in[priority_index];

else assign priority_out[priority_index] = (~req[priority_index-WIDTH]) & priority_out[priority_index-1];
end
endgenerate


`ifdef FORMAL

initial assume(reset);
initial assume(req == 0);  // only enable this assume() to test the cover() at line 100 properly

genvar grant_num;

generate
for(grant_num = 0; grant_num < WIDTH; grant_num = grant_num + 1)

always @(*) cover(first_clock_had_passed && grant[grant_num]);  // covers grants to each of the clients' request

endgenerate


reg [WIDTH-1:0] req_previous;
always @(posedge clk) req_previous <= req;

always @(posedge clk) cover(!$past(reset) && (grant == 0)); // covers the ability to go to an idle state

// covers the ability to handle requests properly even with ALL requests ON
always @(posedge clk) cover((&$past(req_previous)) && (&$past(req)) && (&req) && first_clock_had_passed && $past(first_clock_had_passed) && ((grant & $past(req)) == grant));


reg [WIDTH-1:0] grant_previous;
always @(posedge clk) grant_previous <= grant;

always @(posedge clk) cover(grant != grant_previous); // covers the ability to switch grants to any other requests

always @(posedge clk) cover(first_clock_had_passed && $past(first_clock_had_passed) && (&req) && (req_previous == 4'b1100) && ($past(req_previous) == 4'b1011)); // covers round-robin ability
`endif

`ifdef FORMAL

reg first_clock_had_passed;
initial first_clock_had_passed = 0;

always @(posedge clk) first_clock_had_passed <= 1;

// [url]https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2[/url]
wire grant_is_one_hot = (grant != 0) && ((grant & (grant - 1)) == 0);
wire base_is_one_hot = (base != 0) && ((base & (base - 1)) == 0);

// assertions for round-robin capability
always @(*)
begin
if(reset) assert(grant == 0);

else begin
if (|req) assert(grant_is_one_hot);

else assert(grant == 0);
end
end

always @(posedge clk)
begin
if(first_clock_had_passed)
begin
// starts round-robin arbiter with req #0 getting prioritized first
if($past(reset)) assert(base == 1);

// 'grant' is a one-hot signal, but 'req' is not a one-hot signal
// 'base' is a one-hot signal which rotates
//  after the corresponding 'req' had been granted/given permission to proceed)
//  Rotation wraps around upon reaching MSB

else begin
assert(base == $past(grant[WIDTH-1]) ? 1 : ($past(grant) == 0) ? $past(base) : ($past(grant) << 1) );
assert(base_is_one_hot);
end
end
end

genvar f_index;
generate

for(f_index = 0; f_index < WIDTH; f_index = f_index + 1)
begin
always @(*)
begin
if(reset) assert( grant[f_index] == 0 );

else begin
if(f_index == WIDTH-1) assert( grant[f_index] == granting[f_index] );

else assert( grant[f_index] == ( granting[f_index] | approval[f_index] ) );
end

if(f_index < (WIDTH-1)) assert( approval[f_index] == ( priority_out[f_index+WIDTH-1] & req[f_index] ));

if(f_index > 0) assert( priority_in[f_index] == (base[f_index] | priority_out[f_index-1] ));

else assert( priority_in[f_index] == base[f_index] );
end
end

endgenerate


genvar f_priority_index;
generate

for(f_priority_index = 0; f_priority_index < (WIDTH << 1); f_priority_index = f_priority_index + 1)
begin : out_priority
always @(*)
begin
if(f_priority_index < (WIDTH))
assert( priority_out[f_priority_index] ==
((~req[f_priority_index]) & priority_in[f_priority_index]) );

else assert( priority_out[f_priority_index] ==
((~req[f_priority_index-WIDTH]) & priority_out[f_priority_index-1]) );
end
end

endgenerate

`endif

endmodule
« Last Edit: March 01, 2020, 04:28:37 pm by promach »
 

Offline promachTopic starter

  • Frequent Contributor
  • **
  • Posts: 878
  • Country: us
Re: Understanding Linear implementation of a round-robin arbiter
« Reply #28 on: March 13, 2020, 03:33:52 am »
For Fast Arbiters for On-Chip Network Switches , what do you guys understand by GP and GR ?





« Last Edit: May 08, 2020, 03:57:37 pm by promach »
 


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf